Segmentation fault (x86-64)
Toon
Toon.Ver... at gmail.com
Mon Apr 30 17:29:18 UTC 2007
Hi Axel,
Your suggestion is indeed correct. When the stack limit is set to
unlimited (ulimit -S -H -s unlimited), the calculation runs fine on
any number of processes. I am one of the (de facto) sysadmins on these
machines, so there is no need for a hack. Thanks a lot for your help.
best regards,
Toon
On 30 apr, 16:56, akohlmey <akoh... at gmail.com> wrote:
> toon,
>
> please check your stack size (ulimit -s).
> the intel compiler requires much more stack than what
> is usually enabled on (redhat or derived) linux distributions.
> you can check this most easily for the serial version by
> doing this interactively. in parallel this needs to be set on
> all nodes.
>
> ideally, you have your sysadmin raise the default stack limit
> on all nodes to a more reasonable value. if this is not doable,
> i have a hack to the cp2k code to reset the value from within
> the cp2k run (and a similar one to re-enable coredumps which
> are intercepted by the (stupid) intel fortran runtime).
>
> cheers,
> axel.
>
> On Apr 30, 10:38 am, Toon <Toon.Ver... at gmail.com> wrote:
>
> > Hi,
>
> > We are compiling cp2k on an x86-64 machine. Both the serial and the
> > parallel version have the same problem that cp2k segfaults when it
> > starts an scf cycle. We have tried three FFT implementations (fftsg,
> > fftw3 and fftmkl). The error on the console looks like this:
>
> > rank 0 in job 1 moldyn48_36535 caused collective abort of all ranks
> > exit status of rank 0: killed by signal 11
>
> > Only when we specify a large number of parallel processes (-np 16),
> > the problem disappears. When we do the same test with H2O-32.inp from
> > the tests directory, 2 processes are sufficient, but one is not. This
> > is probably a memory-related problem because the memory requirements
> > per process decrease when we use a higher number of processes. In
> > principle this should not happen since we are working on 64bit
> > machines. Have other people experienced the same problem? Maybe our
> > arch files are not perfect? Our arch files and an example output file
> > can be found at:
>
> > https://molmod.ugent.be/~toon/
>
> > P.S. We have tried the -i8 option for the intel compiler but this had
> > no effect on the serial version and it broke the parallel version.
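
For readers who cannot change the system-wide default, the stack limit
discussed in this thread can also be inspected and raised from inside a
process before the real job is launched. The following is only a minimal
sketch in Python on Linux, not Axel's CP2K hack (which is not shown in the
thread). The function names describe, raise_stack_limit and
run_with_raised_stack are hypothetical, and the sketch assumes the hard
limit is already high enough; an unprivileged process can raise its soft
limit only up to the hard limit.

```python
import resource
import subprocess


def describe(limit):
    """Format an rlimit value the way `ulimit -s` would (kB or 'unlimited')."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} kB"


def raise_stack_limit():
    """Raise the soft stack limit to the hard limit.

    Returns the (soft, hard) pair that was in effect before the change.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    # Only the soft limit changes; the hard limit is left as it was.
    resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
    return soft, hard


def run_with_raised_stack(cmd):
    """Run `cmd` (a list of arguments) as a child process.

    Child processes inherit resource limits, so the child starts with
    the raised soft stack limit.
    """
    raise_stack_limit()
    return subprocess.run(cmd)


if __name__ == "__main__":
    old_soft, hard = raise_stack_limit()
    new_soft, _ = resource.getrlimit(resource.RLIMIT_STACK)
    print("soft stack limit before:", describe(old_soft))
    print("soft stack limit after: ", describe(new_soft))
    print("hard stack limit:       ", describe(hard))
```

In a parallel run, a wrapper like this would have to be started on every
node, which matches Axel's remark that the limit must be set on all nodes.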