[CP2K:4008] Segfault on ia32/MPI in H2O-hfx-1 testcase

Michael Banck mba... at debian.org
Thu Sep 6 20:25:56 UTC 2012


Hi,

On Wed, Sep 05, 2012 at 01:41:59AM +0200, Michael Banck wrote:
> I noticed that on ia32 when using mpiexec (and only then, if I run cp2k.popt
> without mpirun or mpiexec, it works fine) to run e.g.
> tests/QS/regtest-hfx/H2O-hfx-1.inp, I get a segfault:

OK, so whether MPI is used or not does not seem to matter much after
all; different builds fail differently.
 
> #0  STA (v=..., x=0xc4d0718, ovs=<optimized out>, aligned_like=<optimized out>) at ../../../simd-support/simd-sse2.h:125
> #1  n2fv_6 (ri=0xbffc8980, ii=0xbffc8988, ro=0xc4d06e8, io=0xc4d06f0, is=<optimized out>, os=<optimized out>, v=3, ivs=2, ovs=12) at ../common/n2fv_6.c:146
> #2  0xb6d5f8ad in dobatch (ego=ego@entry=0xc50c5e0, ri=ri@entry=0xc4d06e8, ii=0xc4d06f0, ro=ro@entry=0xc4d06e8, io=io@entry=0xc4d06f0, buf=buf@entry=0xbffc8980, batchsz=3) at direct.c:51
> #3  0xb6d5f9b7 in apply_buf (ego_=0xc50c5e0, ri=0xc4d06e8, ii=<optimized out>, ro=0xc4d06e8, io=0xc4d06f0) at direct.c:87
> #4  0xb6d5cb1c in apply_dit (ego_=0xc50c780, ri=ri@entry=0xc4d06e8, ii=ii@entry=0xc4d06f0, ro=ro@entry=0xc4d06e8, io=io@entry=0xc4d06f0) at ct.c:41
> #5  0xb6d62403 in apply (ego_=0xc50c800, ri=0xc4d06e8, ii=0xc4d06f0, ro=0xc4d06e8, io=0xc4d06f0) at vrank-geq1.c:62
> #6  0xb6d61e9c in apply (ego_=0xc4b0a00, ri=ri@entry=0xc4d06e8, ii=ii@entry=0xc4d06f0, ro=ro@entry=0xc4d06e8, io=io@entry=0xc4d06f0) at rank-geq2.c:48
> #7  0xb6d62403 in apply (ego_=0xc4afb40, ri=0xc4d06e8, ii=0xc4d06f0, ro=0xc4d06e8, io=0xc4d06f0) at vrank-geq1.c:62
> #8  0xb6d61e9c in apply (ego_=0xc4b2440, ri=0xc4d06e8, ii=0xc4d06f0, ro=0xc4d06e8, io=0xc4d06f0) at rank-geq2.c:48
> #9  0xb6e0458c in dfftw_execute_dft_ (p=p@entry=0xc4b389c, in=in@entry=0xc4d06e8, out=out@entry=0xc4d06e8) at f77funcs.h:178
> #10 0x09666738 in fftw33d (plan=..., scale=1, zin=..., zout=..., stat=1) at /build/cp2k-yMUnXy/cp2k-2.2.426/makefiles/../src/fft_lib/fftw3_lib.F:196

I had a look at this last night together with Debian's FFTW3 maintainer.
It turns out the problem is that unaligned arrays (i.e. not aligned to
16 bytes) are passed to FFTW3, which on Debian and Ubuntu has been using
SIMD/SSE2 where available on 32-bit for a couple of months now.

So locally recompiling FFTW3 without SSE2 support or adding
"FFTW_ARRAYS_ALIGNED F"[1] to the input file's global section makes CP2K
run the job correctly.

However, neither option is very practical: removing SSE2 support again
for all FFTW3 applications in Debian/Ubuntu is prohibitive, and
modifying every input file is not feasible either.  Having some DEFINE
flag to set in the arch/ files would make it easier to turn off the
alignment assumption only for the ia32 architecture and only for CP2K.

That aside, according to our reading of the FFTW documentation, FFTW3
should be able to detect unaligned arrays and fall back to the generic
(non-SIMD) code path[2]. That this does not happen here might indicate
that CP2K is reusing a plan with a different array: the initial array
was (likely by chance) properly aligned to 16 bytes, so SIMD
instructions were selected for the plan, while a follow-up array was
not, resulting in the segmentation fault.  Replanning the FFT for every
array is probably a non-starter for performance reasons, though?
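
To illustrate the plan-reuse hypothesis: frame #9 of the backtrace is
dfftw_execute_dft_, FFTW's new-array execute call, which according to
the FFTW documentation expects the new arrays to have the same alignment
as the arrays the plan was created with (unless the plan was created
with FFTW_UNALIGNED). Below is a minimal C sketch of such a check. The
helper names simd_misalignment and plan_reusable_for are made up for
illustration; they are neither FFTW nor CP2K functions, and the 16-byte
boundary is the SSE2 requirement assumed throughout this mail.

```c
#include <stdint.h>

/* Hypothetical helper: how many bytes p lies past the previous
 * 16-byte boundary (0 means suitable for aligned SSE2 loads/stores). */
unsigned simd_misalignment(const void *p)
{
    return (unsigned)((uintptr_t)p % 16u);
}

/* Hypothetical guard: a plan created for plan_buf should only be
 * re-executed on new_buf if both buffers share the same alignment.
 * Returns 1 if reuse looks safe, 0 otherwise. */
int plan_reusable_for(const void *plan_buf, const void *new_buf)
{
    return simd_misalignment(plan_buf) == simd_misalignment(new_buf);
}
```

When the guard returns 0, one option might be to fall back to a second
plan created with FFTW_UNALIGNED instead of replanning for every array.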

The root cause appears to be that gfortran is using glibc's malloc() for
ALLOCATE, which does not align the memory to 16 bytes on 32bit systems,
see e.g. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24261
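
The alignment behaviour of malloc() can be probed directly. The
following C sketch counts how many of a handful of live allocations
start off a 16-byte boundary; the function name count_unaligned_mallocs
and the MAX_PROBES limit are arbitrary choices for this illustration.
The result depends on the platform and C library, so treat the number
it returns as an observation, not a guarantee.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_PROBES 64

/* Allocate up to n blocks of `size` bytes, keep them all alive so that
 * malloc() has to hand out distinct addresses, and count how many do
 * not start on a 16-byte boundary. */
int count_unaligned_mallocs(size_t size, int n)
{
    void *blocks[MAX_PROBES];
    int unaligned = 0;
    int got;

    if (n > MAX_PROBES)
        n = MAX_PROBES;

    for (got = 0; got < n; got++) {
        blocks[got] = malloc(size);
        if (blocks[got] == NULL)
            break;                      /* stop probing on failure */
        if ((uintptr_t)blocks[got] % 16u != 0)
            unaligned++;
    }

    for (int i = 0; i < got; i++)
        free(blocks[i]);

    return unaligned;
}
```

A non-zero result on a given build would mean that arrays coming
straight from malloc() cannot be assumed to be SSE2-aligned there.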

One possibility to allocate memory which is properly aligned for SIMD
instructions might be to use FFTW3's fftw_malloc and associate the
resulting allocated memory with the Fortran array via Fortran 2003's C
interoperability instead of the standard ALLOCATE, as shown in the FFTW3
docs: http://fftw.org/doc/Allocating-aligned-memory-in-Fortran.html 
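
For illustration, here is what a 16-byte aligned allocation looks like
on the C side. This is only a stand-in built on plain malloc(), not
fftw_malloc itself, and the names aligned16_malloc and aligned16_free
are hypothetical. The Fortran route proposed above would call
fftw_malloc and attach the result to an array pointer as described in
the FFTW3 docs linked above; this sketch merely shows the
over-allocate-and-round-up idea behind aligned allocation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Return a block of `size` bytes starting on a 16-byte boundary, or
 * NULL on failure. The pointer returned by malloc() is stored just in
 * front of the aligned block so that aligned16_free() can find it. */
void *aligned16_malloc(size_t size)
{
    if (size > SIZE_MAX - 15 - sizeof(void *))
        return NULL;                    /* request would overflow */

    /* payload + worst-case padding + room for the saved raw pointer */
    void *raw = malloc(size + 15 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + 15) & ~(uintptr_t)15;  /* round up */

    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Release a block obtained from aligned16_malloc(); NULL is ignored. */
void aligned16_free(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

Memory obtained this way must be released with the matching free
routine, never with plain free(); the same pairing applies to
fftw_malloc and fftw_free.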

Has this been investigated and rejected in the past and/or would this be
a reasonable and welcome thing to send patches for?


Best regards,

Michael

[1] It did not help that the INSTALL file in the 2.2 branch misspells
"align", so the keyword did not show up in an initial grep through the
source tree.

[2] http://www.fftw.org/doc/SIMD-alignment-and-fftw_005fmalloc.html "If
the array happens not to be properly aligned, FFTW will not use the SIMD
extensions."


