[CP2K:2085] Re: Sigsegv error during cell optimization

Laino Teodoro teodor... at gmail.com
Mon May 18 22:32:30 CEST 2009


Dear Maricarmen,

I'm happy it was just a misunderstanding. Anyway, I posted the test
suite on both threads (I don't know why, but I thought they were
somehow correlated..).

Anyway, the important part of the message (which I hope has been
received) is that, without further help from your side, it is
extremely difficult to help you (and this applies to possible
segmentation faults as well as to features not working properly).

Regards,
Teo

On 18 May 2009, at 22:04, Maricarmen wrote:

>
> Dear Teo,
>
> I'm sorry but I think there has been a slight misunderstanding. You
> see, the issue with the file numbering in CELL_OPT runs is posted in a
> different thread, for it is completely independent from the issue
> posted here. What I'm referring to here is the issue with the
> segmentation fault signal that I've been getting in the latest
> machines I've tried to run CP2K on.
> The WALLTIME thing I mentioned was only to illustrate that the
> application is actually dying, even though the job keeps going on. I
> never meant that the flag itself doesn't work. Actually, I think it is a
> very useful feature.
>
>> The sad story is that I had to prepare this test by myself and I didn't
>> see anything back from your side (apart from complaints): that's a very
>> strange thing and I can tell you that it is very uncommon when somebody
>> asks for help that the helper has to guess even how to reproduce the
>> problem!!
>
> I'm sorry again. As I just said, I wasn't talking about the numbering
> issue. You didn't get any answer from me because I was planning to do
> as you suggested (create a test file) starting this week. You suggested
> that last week and I don't normally work on weekends, so I was going
> to get to it today. I'm sorry that you took the time to do what
> I was going to do shortly. I didn't mean for you to do
> what I was supposed to do.
> Also, I must say that I believe complaining is very different from
> asking for help. I'm very sorry if I sounded like I was complaining,
> that was never my intention. I thought this was a help forum and I was
> only reaching out to see if someone that might have had the same
> problem could know a possible solution. Actually, Juerg's idea is very
> logical and that's what I'll do. It is difficult to get some progress
> if you have no control on the system's setup and you have to keep
> asking others to change the configuration by trial and error in order to
> try and see if your problem has been solved or not. If you add the
> fact that they only give you limited calculation time, then the need
> for an easy and/or quick solution is maximized. That's why I chose to
> ask you first rather than my system's administrators (who had never
> heard of CP2K before).
> I guess next time I'll just ask them to try things out before posting
> here. I just thought maybe I could save some time and effort.
>
>>
>> For your convenience, keep also in mind the same suggestions we always
>> say: CP2K is not an easy code.
>> Highly demanding in terms of compilers/libraries.
>> If you have a very tight problem in terms of timing (I'm aware the world
>> today is based on timing!) keep in mind
>> that there are bunches of other codes.. much better documented...
>> more user-friendly.. that may possibly help you better
>> to achieve your goals.
>>
>
> Yes, I know that. Thanks for the suggestion. But we've chosen CP2K for
> a reason and we'll stick with it. I really value the work you all do,
> and I really appreciated how attentive you were to us during the
> Tutorial. I can tell that you really love what you do and that you
> really want to contribute to the scientific community. That is
> absolutely remarkable. I apologize if somehow I implied otherwise in
> my post.
>
> Best regards,
>
> Maricarmen
>
>
>> Regards,
>> Teo
>>
>> Maricarmen wrote:
>>> I guess I rushed in. It's NOT working. I'm just not getting the
>>> sigsegv signal, but CP2K just dies (usually when starting the second
>>> cellopt step, but in any case it is always when starting the SCF cycle),
>>> no matter how big or small the system is. It starts fine, and then
>>> after a few steps it hangs and stays there until the job is killed by
>>> an external signal because the time limit is reached. So now I'm spending
>>> all my calculation time without doing any better than before.
>>> May I add that I'm using the WALLTIME flag, but it is just not
>>> working. As I said, the job is killed by MPI.
>>> Pleeeease, could someone help me find out how to solve this? I'm not
>>> just wasting my calculation time for the year, but real time to get
>>> some useful results...
>>> I wouldn't want to bother the administrators again without knowing
>>> where the issue comes from. Should I tell them to try another
>>> compiler??
>>
>>> Maricarmen
>>
>>> On 11 May, 09:31, Maricarme... at cemes.fr wrote:
>>
>>>> Ciao everyone,
>>
>>>> I wanted to let you know that we have apparently solved the problem.
>>>> The machine administrators have recompiled the code with these
>>>> settings:
>>
>>>> - Classical optimization (-O2 -g) for INTEL 11 compilers
>>>> - SGI MPT 1.22 MPI library
>>>> - Intel MKL and Intel FFTW libraries
>>
>>>> I have been testing it the whole weekend and it looks like it works
>>>> again :)
>>>> Thanks a lot for your help.
>>
>>>> Cheers,
>>
>>>> Maricarmen
>>
>>>> On 6 May, 16:49, Axel <akoh... at gmail.com> wrote:
>>
>>>>> ciao maricarmen,
>>
>>>>> On May 6, 9:24 am, Maricarme... at cemes.fr wrote:
>>
>>>>>> Thanks Teo,
>>
>>>>>> Actually the Intel fortran compiler is version 10.1.017. I can't find
>>>>>> any comments on this particular version. I found something on 10.1.018
>>>>>> though, and it seemed to work fine.
>>>>>> In the machine there is also version 11.0.83, but I actually found
>>>>>> some message on the list reporting problems with the latest compilers
>>>>>> (e.g. versions 11).
>>
>>>>> hard to say, but the fact that it is up to patch level 83 is somewhat
>>>>> telling.
>>>>> i'd try the 10.1 first.
>>
>>>>>> For the plain popt CP2K version I'll have to ask the administrators to
>>>>>> recompile the code (they did it the first time), so I might as well
>>>>>> ask them to use the newer compiler this time. Otherwise, do you think
>>>>>> it's better to compile the popt version with the same compiler
>>>>>> (e.g. 10.1.017)?
>>
>>>>> i would suggest to first go a bit more conservative in optimization
>>>>> and replace '-O3 -xS' with '-O2'. using a less aggressive optimization
>>>>> frequently helps with intel compilers. since you seem to be on an
>>>>> itanium processor machine, you'll be seeing more problems, though.
>>>>> those compilers are generally lagging behind the x86 versions in
>>>>> reliability, independent of the individual version.
>>
>>>>> if you look through the files in the arch directory, there are several
>>>>> entries with exceptions for files that are better compiled without any
>>>>> optimizations to work around too aggressive compilers. i'd try to
>>>>> collect all of them into a special arch file in case you still are
>>>>> seeing problems.
>>
>>>>> finally, i'd have a closer look at the mpi manpage. on altix machines
>>>>> there are a few environment variables that can affect the stability and
>>>>> performance of parallel jobs. i remember having tinkered with that on a
>>>>> machine, but i have currently no access to it, and forgot to transfer
>>>>> the job scripts before that.
>>
>>>>> cheers,
>>>>>     axel.
>>
>>>>>> Ciao,
>>
>>>>>> Maricarmen
>>
>>>>>> On 6 May, 09:56, Teodoro Laino <teodor... at gmail.com> wrote:
>>
>>>>>>> Hi Maricarmen,
>>
>>>>>>> could you try a plain popt version without the smp support?
>>>>>>> Keep as well in the submission script ompthreads=1.
>>
>>>>>>> which version of intel compiler are you using? did you check on this
>>>>>>> mailing list that it is a "good one"?
>>>>>>> In case, do you have access to other compilers on that machine?
>>
>>>>>>> Teo
>>
>>>>>>> Maricarme... at cemes.fr wrote:
>>
>>>>>>>> Hello everyone,
>>
>>>>>>>> I'm running a DFT cell optimization for Mx-V4O11 crystals (M = Ag and
>>>>>>>> Cu). My cells are approximately 14x7x7 and about 260 atoms. Below is a
>>>>>>>> copy of one of my input files. The problem is I keep getting a SIGSEGV
>>>>>>>> (11) error, usually when starting the SCF cycles for the second cell
>>>>>>>> opt step (an extract from the output file is also below).
>>>>>>>> I'm running in parallel at a computing center
>>>>>>>> (http://www.cines.fr/spip.php?rubrique186), and the administrators have
>>>>>>>> already checked the stack size (which according to them is set to
>>>>>>>> unlimited). Below is also a copy of the job submission file, and of the
>>>>>>>> arch file.
>>>>>>>> I even tried to run a cell opt test for a smaller cell (14*3*3, about
>>>>>>>> 68 atoms), which I had already run at a different computing center
>>>>>>>> without any issues, and I still get the segmentation fault error.
>>>>>>>> This clearly indicates to me that the problem is associated with the
>>>>>>>> configuration of the machines, the way CP2K was installed, or
>>>>>>>> the job submission's characteristics (or something else??). I must
>>>>>>>> say I always get the exact same error during cell opt's second step,
>>>>>>>> no matter what the system is (small or big cell, Ag or Cu).
>>>>>>>> I tried running an Energy test on the smaller cell and it worked fine.
>>
>>>>>>>> I would really appreciate it if any of you could shed some light on
>>>>>>>> this, for I'm pretty stuck on it right now.
>>
>>>>>>>> Cheers,
>>
>>>>>>>> Maricarmen.
>>
>>>>>>>> Arch file:
>>
>>>>>>>> # by default some intel compilers put temporaries on the stack
>>>>>>>> # this might lead to segmentation faults if the stack limit is set too low
>>>>>>>> # stack limits can be increased by sysadmins or e.g with ulimit -s 256000
>>>>>>>> # Tested on a HPC non-Itanium clusters @ UDS (France)
>>>>>>>> # Note: -O2 produces an executable which is slightly faster than -O3
>>>>>>>> # and the compilation time was also much shorter.
>>>>>>>> CC       = icc -diag-disable remark
>>>>>>>> CPP      =
>>>>>>>> FC       = ifort -diag-disable remark -openmp
>>>>>>>> LD       = ifort -diag-disable remark -openmp
>>>>>>>> AR       = ar -r
>>
>>>>>>>> #Better with mkl (intel lapack/blas) only
>>>>>>>> #DFLAGS   = -D__INTEL -D__FFTSG -D__parallel
>>>>>>>> #If you want to use BLACS and SCALAPACK use the flags below
>>>>>>>> DFLAGS   = -D__INTEL -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3
>>>>>>>> CPPFLAGS =
>>>>>>>> FCFLAGS  = $(DFLAGS) -fpp -free -O3 -xS -I/opt/software/SGI/intel/mkl/10.0.3.020/include -I/opt/software/SGI/intel/mkl/10.0.3.020/include/fftw
>>>>>>>> LDFLAGS  =  -L/opt/software/SGI/intel/mkl/10.0.3.020/lib/em64t
>>>>>>>> #LIBS     = -lmkl -lm -lpthread -lguide -openmp
>>>>>>>> #If you want to use BLACS and SCALAPACK use the libraries below
>>>>>>>> LIBS     = -Wl,--allow-multiple-definition -lmkl_scalapack_lp64 /scratch/grisolia/blacsF77init_MPI-LINUX-0.a /scratch/grisolia/blacs_MPI-LINUX-0.a -lmpi -lmkl -lfftw3xf_intel -lmkl_blacs_lp64
>>
>>>>>>>> OBJECTS_ARCHITECTURE = machine_intel.o
>>
>>>>>>>> -------
>>
>>>>>>>> Job submission's file (getting the sigsegv error):
>>
>>>>>>>> #PBS -N cp2k
>>>>>>>> #PBS -l walltime=24:00:00
>>>>>>>> #PBS -S /bin/bash
>>>>>>>> #PBS -l select=8:ncpus=8:mpiprocs=8:ompthreads=1
>>>>>>>> #PBS -j oe
>>>>>>>> #PBS -M  gris... at cemes.fr -m abe
>>
>>>>>>>> PBS_O_WORKDIR=/scratch/grisolia/CuVO/Fixed/
>>
>>>>>>>> cd $PBS_O_WORKDIR
>>
>>>>>>>> export OMP_NUM_THREADS=1
>>>>>>>> export MKL_NUM_THREADS=1
>>>>>>>> export MPI_GROUP_MAX=512
>>
>>>>>>>> /usr/pbs/bin/mpiexec /scratch/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp
>>
>> ...
>>
>>  bug_report_cell.tgz (3363 K)
>



