[CP2K:2081] Re: Sigsegv error during cell optimization

Teodoro Laino teodor... at gmail.com
Mon May 18 11:41:36 CEST 2009


Maricarmen,

Please find attached a test suite that:

1) shows that restarting the step index of the cell_opt module works perfectly;
2) shows that the WALLTIME flag works perfectly.
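
A note on point 2, as a minimal sketch rather than anything CP2K ships:
WALLTIME is given in seconds, so it can be derived from the queue limit
minus some margin. The input quoted below uses WALLTIME 86000 under a PBS
walltime of 24:00:00, i.e. 86400 s minus 400 s. The helper names
hms_to_seconds and cp2k_walltime are made up for this illustration, and how
much margin a given system needs is an assumption you have to check yourself.

```shell
# Hypothetical helpers (not part of CP2K or PBS).

# Convert a queue limit written as HH:MM:SS into seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# $1 = queue limit as HH:MM:SS, $2 = safety margin in seconds.
# Prints the value to put after WALLTIME in the &GLOBAL section.
cp2k_walltime() {
    echo $(( $(hms_to_seconds "$1") - $2 ))
}
```

For example, cp2k_walltime 24:00:00 400 gives the 86000 used in the quoted
input file.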

No additional modifications (related to this topic) have been committed 
in the repository since my last message on this mailing list.
Instructions:
    You have two directories: RUN1 and RUN2.
    Go into RUN1 and run cp2k on the input file inp1.inp;
    it will stop because the WALLTIME flag is set to 3 seconds.

    Immediately afterwards, go into RUN2 and run cp2k on inp2.inp.
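
The steps above can be sketched as a small shell helper. The executable
name cp2k.popt, the CP2K_EXE variable and the run_case function are
assumptions for illustration only; substitute whatever binary you actually
have.

```shell
# Assumed executable name; override with: export CP2K_EXE=/path/to/your/cp2k
CP2K_EXE="${CP2K_EXE:-cp2k.popt}"

# Hypothetical helper: run one input file inside its own directory.
# $1 = run directory, $2 = input file name inside that directory.
# Output goes to a file named after the input (inp1.inp -> inp1.out).
run_case() {
    ( cd "$1" && "$CP2K_EXE" "$2" > "${2%.inp}.out" 2>&1 )
}
```

Calling run_case RUN1 inp1.inp and then run_case RUN2 inp2.inp reproduces
the two steps in order.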

You will see that the files are newly created and that there is "almost" no 
overlap with the indices of the RUN1 directory.
Of course the optimizer needs to evaluate the energy and forces at point 
0, which is the last point of RUN1, so you will
see duplicated files only for that index. But that's all...
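
One way to check the overlap is to list the file names that exist in both
directories. This is only a sketch: common_files is a hypothetical helper,
and it compares plain file names without assuming anything about how the
output files are named.

```shell
# Hypothetical helper: print every file name present in both directories.
# $1 = first directory (e.g. RUN1), $2 = second directory (e.g. RUN2).
common_files() {
    for f in "$1"/*; do
        name=$(basename "$f")
        if [ -e "$2/$name" ]; then
            echo "$name"
        fi
    done
}
```

Running common_files RUN1 RUN2 prints only the names found in both places.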

The sad story is that I had to prepare this test myself and I didn't 
see anything back from your side (apart from complaints). That's a very 
strange thing, and I can tell you it is very uncommon, when somebody 
asks for help, that the helper has to guess even how to reproduce the 
problem!!

For your convenience, keep also in mind the same suggestions we always 
give: CP2K is not an easy code.
It is highly demanding in terms of compilers/libraries.
If you are under tight time constraints (I'm aware the world 
today is based on timing!), keep in mind
that there are plenty of other codes, much better documented and 
more user-friendly, that may help you better
to achieve your goals.

Regards,
Teo




Maricarmen wrote:
> I guess I rushed in. It's NOT working. I'm no longer getting the
> sigsegv signal, but CP2K just dies (usually when starting the second
> cellopt step, but in any case it is always when starting the SCF cycle),
> no matter how big or small the system is. It starts fine, and then
> after a few steps it hangs and stays there until the job is killed by
> an external signal because the time limit is reached. So now I'm spending
> all my calculation time without doing any better than before.
> May I add that I'm using the WALLTIME flag, but it is just not
> working. As I said, the job is killed by MPI.
> Pleeeease, could someone help me find out how to solve this? I'm not
> just wasting my calculation time for the year, but real time to get
> some useful results...
> I wouldn't want to bother the administrators again without knowing
> where the issue comes from. Should I tell them to try another
> compiler??
>
> Maricarmen
>
>
> On 11 May, 09:31, Maricarme... at cemes.fr wrote:
>   
>> Ciao everyone,
>>
>> I wanted to let you know that we have apparently solved the problem.
>> The machine administrators have recompiled the code with these
>> settings:
>>
>> - Classical optimization (-O2 -g) for INTEL 11 compilers
>> - SGI MPT 1.22 MPI library
>> - Intel MKL and Intel FFTW libraries
>>
>> I have been testing it the whole weekend and it looks like it works
>> again :)
>> Thanks a lot for your help.
>>
>> Cheers,
>>
>> Maricarmen
>>
>> On 6 May, 16:49, Axel <akoh... at gmail.com> wrote:
>>
>>     
>>> ciao maricarmen,
>>>       
>>> On May 6, 9:24 am, Maricarme... at cemes.fr wrote:
>>>       
>>>> Thanks Teo,
>>>>         
>>>> Actually the Intel fortran compiler is version 10.1.017. I can't find
>>>> any comments on this particular version. I found something on 10.1.018
>>>> though, and it seemed to work fine.
>>>> In the machine there is also version 11.0.83, but I actually found
>>>> some messages on the list reporting problems with the latest compilers
>>>> (e.g. versions 11).
>>>>         
>>> hard to say, but the fact that it is up to patch level 83 is somewhat
>>> telling.
>>> i'd try the 10.1 first.
>>>       
>>>> For the plain popt CP2K version I'll have to ask the administrators to
>>>> recompile the code (they did it the first time), so I might as well
>>>> ask them to use the newer compiler this time. Otherwise, do you think
>>>> it's better to compile the popt version with the same compiler
>>>> (e.g. 10.1.017)?
>>>>         
>>> i would suggest to first go a bit more conservative in optimization and
>>> replace '-O3 -xS' with '-O2'. using a less aggressive optimization
>>> frequently helps with intel compilers. since you seem to be on an
>>> itanium processor machine, you'll be seeing more problems, though.
>>> those compilers are generally lagging behind the x86 versions in
>>> reliability, independent of the individual version.
>>>       
>>> if you look through the files in the arch directory, there are several
>>> entries with exceptions for files that are better compiled without any
>>> optimizations to work around too aggressive compilers. i'd try to
>>> collect all of them into a special arch file in case you still are
>>> seeing problems.
>>>       
>>> finally, i'd have a closer look at the mpi manpage. on altix machines
>>> there are a few environment variables that can affect the stability and
>>> performance of parallel jobs. i remember having tinkered with that on a
>>> machine, but i currently have no access to it, and forgot to transfer
>>> the job scripts before that.
>>>       
>>> cheers,
>>>     axel.
>>>       
>>>> Ciao,
>>>>         
>>>> Maricarmen
>>>>         
>>>> On 6 May, 09:56, Teodoro Laino <teodor... at gmail.com> wrote:
>>>>         
>>>>> Hi Maricarmen,
>>>>>           
>>>>> could you try a plain popt version without the smp support?
>>>>> Keep as well in the submission script ompthreads=1.
>>>>>           
>>>>> which version of intel compiler are you using? did you check on this
>>>>> mailing list that it is a "good one"?
>>>>> In case, do you have access to other compilers on that machine?
>>>>>           
>>>>> Teo
>>>>>           
>>>>> Maricarme... at cemes.fr wrote:
>>>>>           
>>>>>> Hello everyone,
>>>>>>             
>>>>>> I'm running a DFT cell optimization for Mx-V4O11 crystals (M = Ag and
>>>>>> Cu). My cells are approximately 14x7x7 and about 260 atoms. Below is a
>>>>>> copy of one of my input files. The problem is I keep getting a SIGSEGV
>>>>>> (11) error, usually when starting the SCF cycles for the second cell
>>>>>> opt step (an extract from the output file is also below).
>>>>>> I'm running in parallel at a computing center
>>>>>> (http://www.cines.fr/spip.php?rubrique186), and the administrators have
>>>>>> already checked the stack size (which according to them is set to
>>>>>> unlimited). Below is also a copy of the job submission file, and of the
>>>>>> arch file.
>>>>>> I even tried to run a cell opt test for a smaller cell (14*3*3, about
>>>>>> 68 atoms), which I had already run at a different computing center
>>>>>> without any issues, and I still get the segmentation fault error.
>>>>>> This clearly indicates to me that the problem is associated with the
>>>>>> configuration of the machines, with the way CP2K was installed, or with
>>>>>> the job submission's characteristics (or with something else??). I must
>>>>>> say I always get the exact same error during cell opt's second step,
>>>>>> no matter what the system is (small or big cell, Ag or Cu).
>>>>>> I tried running an Energy test on the smaller cell and it worked fine.
>>>>>>             
>>>>>> I would really appreciate it if any of you could shed some light on this,
>>>>>> for I'm pretty stuck on it right now.
>>>>>>             
>>>>>> Cheers,
>>>>>>             
>>>>>> Maricarmen.
>>>>>>             
>>>>>> Arch file:
>>>>>>             
>>>>>> # by default some intel compilers put temporaries on the stack
>>>>>> # this might lead to segmentation faults if the stack limit is set too low
>>>>>> # stack limits can be increased by sysadmins or e.g with ulimit -s 256000
>>>>>> # Tested on a HPC non-Itanium clusters @ UDS (France)
>>>>>> # Note: -O2 produces an executable which is slightly faster than -O3
>>>>>> # and the compilation time was also much shorter.
>>>>>> CC       = icc -diag-disable remark
>>>>>> CPP      =
>>>>>> FC       = ifort -diag-disable remark -openmp
>>>>>> LD       = ifort -diag-disable remark -openmp
>>>>>> AR       = ar -r
>>>>>>             
>>>>>> #Better with mkl (intel lapack/blas) only
>>>>>> #DFLAGS   = -D__INTEL -D__FFTSG -D__parallel
>>>>>> #If you want to use BLACS and SCALAPACK use the flags below
>>>>>> DFLAGS   = -D__INTEL -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3
>>>>>> CPPFLAGS =
>>>>>> FCFLAGS  = $(DFLAGS) -fpp -free -O3 -xS -I/opt/software/SGI/intel/mkl/10.0.3.020/include -I/opt/software/SGI/intel/mkl/10.0.3.020/include/fftw
>>>>>> LDFLAGS  =  -L/opt/software/SGI/intel/mkl/10.0.3.020/lib/em64t
>>>>>> #LIBS     = -lmkl -lm -lpthread -lguide -openmp
>>>>>> #If you want to use BLACS and SCALAPACK use the libraries below
>>>>>> LIBS     = -Wl,--allow-multiple-definition -lmkl_scalapack_lp64 /scratch/grisolia/blacsF77init_MPI-LINUX-0.a /scratch/grisolia/blacs_MPI-LINUX-0.a -lmpi -lmkl -lfftw3xf_intel -lmkl_blacs_lp64
>>>>>>             
>>>>>> OBJECTS_ARCHITECTURE = machine_intel.o
>>>>>>             
>>>>>> -------
>>>>>>             
>>>>>> Job submission file (getting the sigsegv error):
>>>>>>             
>>>>>> #PBS -N cp2k
>>>>>> #PBS -l walltime=24:00:00
>>>>>> #PBS -S /bin/bash
>>>>>> #PBS -l select=8:ncpus=8:mpiprocs=8:ompthreads=1
>>>>>> #PBS -j oe
>>>>>> #PBS -M  gris... at cemes.fr -m abe
>>>>>>             
>>>>>> PBS_O_WORKDIR=/scratch/grisolia/CuVO/Fixed/
>>>>>>             
>>>>>> cd $PBS_O_WORKDIR
>>>>>>             
>>>>>> export OMP_NUM_THREADS=1
>>>>>> export MKL_NUM_THREADS=1
>>>>>> export MPI_GROUP_MAX=512
>>>>>>             
>>>>>> /usr/pbs/bin/mpiexec /scratch/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp CuV4O11-CellOpt.inp
>>>>>>             
>>>>>> --------------
>>>>>>             
>>>>>> Input file:
>>>>>>             
>>>>>> &GLOBAL
>>>>>>   PROJECT     CuV4O11-CellOpt
>>>>>>   RUN_TYPE    CELL_OPT
>>>>>>   PRINT_LEVEL MEDIUM
>>>>>>   WALLTIME  86000
>>>>>> &END GLOBAL
>>>>>> &FORCE_EVAL
>>>>>>   METHOD Quickstep
>>>>>>   &DFT
>>>>>>     BASIS_SET_FILE_NAME /scratch/grisolia/cp2k/tests/QS/BASIS_MOLOPT
>>>>>>     POTENTIAL_FILE_NAME /scratch/grisolia/cp2k/tests/QS/GTH_POTENTIALS
>>>>>>     LSD
>>>>>>     &MGRID
>>>>>>       CUTOFF 280
>>>>>>       NGRIDS 5
>>>>>>     &END MGRID
>>>>>>     &QS
>>>>>>       EPS_DEFAULT   1.0E-10
>>>>>>       EXTRAPOLATION PS
>>>>>>       EXTRAPOLATION_ORDER 1
>>>>>>     &END QS
>>>>>>     &SCF
>>>>>>       SCF_GUESS RESTART
>>>>>>       EPS_SCF 2.0E-7
>>>>>>       MAX_SCF 30
>>>>>>       &OUTER_SCF
>>>>>>          EPS_SCF 2.0E-7
>>>>>>          MAX_SCF 15
>>>>>>       &END
>>>>>>       &OT
>>>>>>         MINIMIZER CG
>>>>>>         PRECONDITIONER FULL_SINGLE_INVERSE
>>>>>>         ENERGY_GAP 0.05
>>>>>>       &END
>>>>>>       &PRINT
>>>>>>          &RESTART
>>>>>>             FILENAME = CuV4O11-CellOpt.wfn
>>>>>>          &END
>>>>>>       &END
>>>>>>     &END SCF
>>>>>>     &XC
>>>>>>       &XC_FUNCTIONAL PBE
>>>>>>       &END XC_FUNCTIONAL
>>>>>>     &END XC
>>>>>>      &PRINT
>>>>>>        &MO_CUBES
>>>>>>           WRITE_CUBE F
>>>>>>           NLUMO      20
>>>>>>           NHOMO      20
>>>>>>        &END
>>>>>>      &END
>>>>>>   &END DFT
>>>>>>   &SUBSYS
>>>>>>     &CELL
>>>>>>       @INCLUDE CuV4O11-GeoOpt.cell
>>>>>>     &END CELL
>>>>>>     &COORD
>>>>>>       @INCLUDE CuV4O11-GeoOpt.coord
>>>>>>     &END COORD
>>>>>>     &KIND Cu
>>>>>>       BASIS_SET DZVP-MOLOPT-SR-GTH
>>>>>>       POTENTIAL GTH-PBE-q11
>>>>>>     &END KIND
>>>>>>     &KIND O
>>>>>>       BASIS_SET DZVP-MOLOPT-SR-GTH
>>>>>>       POTENTIAL GTH-PBE-q6
>>>>>>     &END KIND
>>>>>>     &KIND V
>>>>>>       BASIS_SET DZVP-MOLOPT-SR-GTH
>>>>>>       POTENTIAL GTH-PBE-q13
>>>>>>     &END KIND
>>>>>>   &END SUBSYS
>>>>>>   STRESS_TENSOR ANALYTICAL
>>>>>> &END FORCE_EVAL
>>>>>> &MOTION
>>>>>>   &MD
>>>>>>       TIMESTEP [fs] 0.5
>>>>>>       STEPS         10000
>>>>>>       TEMPERATURE   500
>>>>>>       ENSEMBLE      NVE
>>>>>>   &END
>>>>>>   &CELL_OPT
>>>>>>     TYPE GEO_OPT
>>>>>>     OPTIMIZER CG
>>>>>>     MAX_ITER 20
>>>>>>     EXTERNAL_PRESSURE [bar] 0.0
>>>>>>     MAX_DR 0.02
>>>>>>     RMS_DR 0.01
>>>>>>     MAX_FORCE 0.002
>>>>>>     RMS_FORCE 0.001
>>>>>>     KEEP_ANGLES T
>>>>>>     &CG
>>>>>>       &LINE_SEARCH
>>>>>>         TYPE 2PNT
>>>>>>         &2PNT
>>>>>>         &END
>>>>>>       &END
>>>>>>     &END
>>>>>>   &END
>>>>>>   &GEO_OPT
>>>>>>     MAX_ITER 300
>>>>>>     MINIMIZER LBFGS
>>>>>>   &END
>>>>>> &END
>>>>>>             
>>>>>> -------
>>>>>>             
>>>>>> Extract from the output file (sigsegv error):
>>>>>>             
>>>>>> MPI: On host r17i2n5, Program /scratch/cem6039/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp, Rank 0, Process 4568 received signal SIGSEGV
>>>>>>             
>> ...
>>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bug_report_cell.tgz
Type: application/octet-stream
Size: 2547831 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20090518/a1ef2e43/attachment.obj>

