[CP2K:2081] Re: Sigsegv error during cell optimization

Juerg Hutter hut... at pci.uzh.ch
Mon May 18 08:13:57 UTC 2009


Dear Maricarmen

if you can reproduce the problem on a small system, you
should be able to run it on another machine.
Assuming it works there correctly you can document
the setup (compiler version, libraries, etc.) and ask
the system admins to investigate their machine.
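
For example, something along these lines, run inside a batch job, would
capture the essentials (a minimal sketch; the binary path is the one
from your job script):

  ifort --version     # compiler used for the build
  ldd /scratch/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp   # libraries linked in
  ulimit -s           # stack limit as the job actually sees it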

regards

Juerg

----------------------------------------------------------
Juerg Hutter                   Phone : ++41 44 635 4491
Physical Chemistry Institute   FAX   : ++41 44 635 6838
University of Zurich           E-mail: hut... at pci.uzh.ch
Winterthurerstrasse 190
CH-8057 Zurich, Switzerland
----------------------------------------------------------


On Mon, 18 May 2009, Maricarmen wrote:

>
> I guess I rushed in. It's NOT working. I'm just not getting the
> SIGSEGV signal, but CP2K just dies (usually when starting the second
> cell opt step, but in any case always when starting the SCF cycle),
> no matter how big or small the system is. It starts fine, and then
> after a few steps it hangs and stays there until the job is killed by
> an external signal when the time limit is reached. So now I'm spending
> all my calculation time without doing any better than before.
> May I add that I'm using the WALLTIME flag, but it is just not
> working; as I said, the job is killed by MPI. (I suppose that if the
> code hangs inside an MPI call, it never reaches the point where the
> elapsed time is checked, so WALLTIME never triggers.)
> Pleeeease, could someone help me find out how to solve this? I'm not
> just wasting my calculation-time allocation for the year, but also
> real time needed to get some useful results...
> I wouldn't want to bother the administrators again without knowing
> where the issue comes from. Should I tell them to try another
> compiler??
>
> Maricarmen
>
>
> On 11 May, 09:31, Maricarme... at cemes.fr wrote:
>> Ciao everyone,
>>
>> I wanted to let you know that we have apparently solved the problem.
>> The machine administrators have recompiled the code with these
>> settings:
>>
>> - Standard optimization (-O2 -g) with the Intel 11 compilers
>> - SGI MPT 1.22 MPI library
>> - Intel MKL and Intel FFTW libraries
>>
>> I have been testing it the whole weekend and it looks like it works
>> again :)
>> Thanks a lot for your help.
>>
>> Cheers,
>>
>> Maricarmen
>>
>> On 6 mai, 16:49, Axel <akoh... at gmail.com> wrote:
>>
>>> ciao maricarmen,
>>
>>> On May 6, 9:24 am, Maricarme... at cemes.fr wrote:
>>
>>>> Thanks Teo,
>>
>>>> Actually the Intel Fortran compiler is version 10.1.017. I can't find
>>>> any comments on this particular version. I found something on 10.1.018
>>>> though, and it seemed to work fine.
>>>> On the machine there is also version 11.0.83, but I actually found
>>>> some messages on the list reporting problems with the latest compilers
>>>> (e.g. version 11).
>>
>>> hard to say, but the fact that it is already at patch level 83 is
>>> somewhat telling. i'd try the 10.1 first.
>>
>>>> For the plain popt CP2K version I'll have to ask the administrators to
>>>> recompile the code (they did it the first time), so I might as well
>>>> ask them to use the newer compiler this time. Otherwise, do you think
>>>> it's better to compile the popt version with the same compiler
>>>> (e.g. 10.1.017)?
>>
>>> i would suggest first going a bit more conservative with the
>>> optimization and replacing '-O3 -xS' with '-O2'. using less
>>> aggressive optimization frequently helps with intel compilers.
>>> since you seem to be on an itanium processor machine, you'll be
>>> seeing more problems, though. those compilers are generally lagging
>>> behind the x86 versions in reliability, independent of the
>>> individual version.
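>>
>>> e.g. something like this against the arch file (just a sketch; the
>>> arch file name is guessed from your exe directory, and the code
>>> needs a full rebuild afterwards):
>>
>>>   sed -i 's/-O3 -xS/-O2/' arch/Linux-x86-64-jade.psmp
>>>   cd makefiles && make ARCH=Linux-x86-64-jade VERSION=psmp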
>>
>>> if you look through the files in the arch directory, there are
>>> several entries with exceptions for files that are better compiled
>>> without any optimization, to work around too-aggressive compilers.
>>> i'd try to collect all of them into a special arch file in case you
>>> are still seeing problems.
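>>
>>> a quick way to spot those entries (just a grep sketch; the pattern
>>> may need tuning):
>>
>>>   grep -n -- '-O0\|-O1' arch/*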
>>
>>> finally, i'd have a closer look at the mpi manpage. on altix
>>> machines there are a few environment variables that can affect the
>>> stability and performance of parallel jobs. i remember having
>>> tinkered with those on one machine, but i currently have no access
>>> to it and forgot to transfer the job scripts beforehand.
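>>
>>> from memory, the knobs were along these lines (values illustrative,
>>> check the mpi(1) manpage before using them):
>>
>>>   export MPI_BUFS_PER_PROC=128    # more per-rank message buffers
>>>   export MPI_TYPE_MAX=65536       # raise the derived-datatype limit
>>>   export MPI_REQUEST_MAX=65536    # raise the outstanding-request limit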
>>
>>> cheers,
>>>     axel.
>>
>>>> Ciao,
>>
>>>> Maricarmen
>>
>>>> On 6 May, 09:56, Teodoro Laino <teodor... at gmail.com> wrote:
>>
>>>>> Hi Maricarmen,
>>
>>>>> could you try a plain popt version, without the SMP support?
>>>>> Also keep ompthreads=1 in the submission script.
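>>
>>>>> i.e., keep the job script as it is and just point it at the popt
>>>>> binary (assuming it gets built into the same exe directory):
>>
>>>>> /usr/pbs/bin/mpiexec /scratch/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.popt CuV4O11-CellOpt.inp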
>>
>>>>> which version of the intel compiler are you using? did you check on
>>>>> this mailing list that it is a "good one"?
>>>>> If needed, do you have access to other compilers on that machine?
>>
>>>>> Teo
>>
>>>>> Maricarme... at cemes.fr wrote:
>>>>>> Hello everyone,
>>
>>>>>> I'm running a DFT cell optimization for Mx-V4O11 crystals (M = Ag
>>>>>> and Cu). My cells are approximately 14x7x7 and contain about 260
>>>>>> atoms. Below is a copy of one of my input files. The problem is I
>>>>>> keep getting a SIGSEGV (11) error, usually when starting the SCF
>>>>>> cycles for the second cell opt step (an extract from the output
>>>>>> file is also below).
>>>>>> I'm running in parallel at a computing center (http://www.cines.fr/spip.php?rubrique186),
>>>>>> and the administrators have already checked the stack size (which
>>>>>> according to them is set to unlimited). Below are also copies of
>>>>>> the job submission file and of the arch file.
>>>>>> I even tried to run a cell opt test on a smaller cell (14*3*3,
>>>>>> about 68 atoms), which I had already run at a different computing
>>>>>> center without any issues, and I still get the segmentation fault
>>>>>> error. This clearly indicates that the problem is associated with
>>>>>> the configuration of the machines, with the way CP2K was
>>>>>> installed, or with the job submission settings (or with something
>>>>>> else??). I must say I always get the exact same error during the
>>>>>> second cell opt step, no matter what the system is (small or big
>>>>>> cell, Ag or Cu).
>>>>>> I tried running an Energy test on the smaller cell and it worked
>>>>>> fine.
>>
>>>>>> I would really appreciate it if any of you could shed some light
>>>>>> on this, as I'm pretty stuck right now.
>>
>>>>>> Cheers,
>>
>>>>>> Maricarmen.
>>
>>>>>> Arch file:
>>
>>>>>> # by default some intel compilers put temporaries on the stack;
>>>>>> # this might lead to segmentation faults if the stack limit is set too low.
>>>>>> # stack limits can be increased by sysadmins or e.g. with: ulimit -s 256000
>>>>>> # Tested on a non-Itanium HPC cluster @ UDS (France)
>>>>>> # Note: -O2 produces an executable which is slightly faster than -O3,
>>>>>> # and the compilation time was also much shorter.
>>>>>> CC       = icc -diag-disable remark
>>>>>> CPP      =
>>>>>> FC       = ifort -diag-disable remark -openmp
>>>>>> LD       = ifort -diag-disable remark -openmp
>>>>>> AR       = ar -r
>>
>>>>>> #Better with mkl (intel lapack/blas) only
>>>>>> #DFLAGS   = -D__INTEL -D__FFTSG -D__parallel
>>>>>> #If you want to use BLACS and SCALAPACK use the flags below
>>>>>> DFLAGS   = -D__INTEL -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3
>>>>>> CPPFLAGS =
>>>>>> FCFLAGS  = $(DFLAGS) -fpp -free -O3 -xS -I/opt/software/SGI/intel/mkl/10.0.3.020/include -I/opt/software/SGI/intel/mkl/10.0.3.020/include/fftw
>>>>>> LDFLAGS  = -L/opt/software/SGI/intel/mkl/10.0.3.020/lib/em64t
>>>>>> #LIBS     = -lmkl -lm -lpthread -lguide -openmp
>>>>>> #If you want to use BLACS and SCALAPACK use the libraries below
>>>>>> LIBS     = -Wl,--allow-multiple-definition -lmkl_scalapack_lp64 /scratch/grisolia/blacsF77init_MPI-LINUX-0.a /scratch/grisolia/blacs_MPI-LINUX-0.a -lmpi -lmkl -lfftw3xf_intel -lmkl_blacs_lp64
>>
>>>>>> OBJECTS_ARCHITECTURE = machine_intel.o
>>
>>>>>> -------
>>
>>>>>> Job submission file (which produces the SIGSEGV error):
>>
>>>>>> #PBS -N cp2k
>>>>>> #PBS -l walltime=24:00:00
>>>>>> #PBS -S /bin/bash
>>>>>> #PBS -l select=8:ncpus=8:mpiprocs=8:ompthreads=1
>>>>>> #PBS -j oe
>>>>>> #PBS -M  gris... at cemes.fr -m abe
>>
>>>>>> PBS_O_WORKDIR=/scratch/grisolia/CuVO/Fixed/
>>
>>>>>> cd $PBS_O_WORKDIR
>>
>>>>>> export OMP_NUM_THREADS=1
>>>>>> export MKL_NUM_THREADS=1
>>>>>> export MPI_GROUP_MAX=512
>>
>>>>>> /usr/pbs/bin/mpiexec /scratch/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp CuV4O11-CellOpt.inp
>>
>>>>>> --------------
>>
>>>>>> Input file:
>>
>>>>>> &GLOBAL
>>>>>>   PROJECT     CuV4O11-CellOpt
>>>>>>   RUN_TYPE    CELL_OPT
>>>>>>   PRINT_LEVEL MEDIUM
>>>>>>   WALLTIME  86000
>>>>>> &END GLOBAL
>>>>>> &FORCE_EVAL
>>>>>>   METHOD Quickstep
>>>>>>   &DFT
>>>>>>     BASIS_SET_FILE_NAME /scratch/grisolia/cp2k/tests/QS/BASIS_MOLOPT
>>>>>>     POTENTIAL_FILE_NAME /scratch/grisolia/cp2k/tests/QS/GTH_POTENTIALS
>>>>>>     LSD
>>>>>>     &MGRID
>>>>>>       CUTOFF 280
>>>>>>       NGRIDS 5
>>>>>>     &END MGRID
>>>>>>     &QS
>>>>>>       EPS_DEFAULT   1.0E-10
>>>>>>       EXTRAPOLATION PS
>>>>>>       EXTRAPOLATION_ORDER 1
>>>>>>     &END QS
>>>>>>     &SCF
>>>>>>       SCF_GUESS RESTART
>>>>>>       EPS_SCF 2.0E-7
>>>>>>       MAX_SCF 30
>>>>>>       &OUTER_SCF
>>>>>>          EPS_SCF 2.0E-7
>>>>>>          MAX_SCF 15
>>>>>>       &END
>>>>>>       &OT
>>>>>>         MINIMIZER CG
>>>>>>         PRECONDITIONER FULL_SINGLE_INVERSE
>>>>>>         ENERGY_GAP 0.05
>>>>>>       &END
>>>>>>       &PRINT
>>>>>>          &RESTART
>>>>>>             FILENAME = CuV4O11-CellOpt.wfn
>>>>>>          &END
>>>>>>       &END
>>>>>>     &END SCF
>>>>>>     &XC
>>>>>>       &XC_FUNCTIONAL PBE
>>>>>>       &END XC_FUNCTIONAL
>>>>>>     &END XC
>>>>>>      &PRINT
>>>>>>        &MO_CUBES
>>>>>>           WRITE_CUBE F
>>>>>>           NLUMO      20
>>>>>>           NHOMO      20
>>>>>>        &END
>>>>>>      &END
>>>>>>   &END DFT
>>>>>>   &SUBSYS
>>>>>>     &CELL
>>>>>>       @INCLUDE CuV4O11-GeoOpt.cell
>>>>>>     &END CELL
>>>>>>     &COORD
>>>>>>       @INCLUDE CuV4O11-GeoOpt.coord
>>>>>>     &END COORD
>>>>>>     &KIND Cu
>>>>>>       BASIS_SET DZVP-MOLOPT-SR-GTH
>>>>>>       POTENTIAL GTH-PBE-q11
>>>>>>     &END KIND
>>>>>>     &KIND O
>>>>>>       BASIS_SET DZVP-MOLOPT-SR-GTH
>>>>>>       POTENTIAL GTH-PBE-q6
>>>>>>     &END KIND
>>>>>>     &KIND V
>>>>>>       BASIS_SET DZVP-MOLOPT-SR-GTH
>>>>>>       POTENTIAL GTH-PBE-q13
>>>>>>     &END KIND
>>>>>>   &END SUBSYS
>>>>>>   STRESS_TENSOR ANALYTICAL
>>>>>> &END FORCE_EVAL
>>>>>> &MOTION
>>>>>>   &MD
>>>>>>       TIMESTEP [fs] 0.5
>>>>>>       STEPS         10000
>>>>>>       TEMPERATURE   500
>>>>>>       ENSEMBLE      NVE
>>>>>>   &END
>>>>>>   &CELL_OPT
>>>>>>     TYPE GEO_OPT
>>>>>>     OPTIMIZER CG
>>>>>>     MAX_ITER 20
>>>>>>     EXTERNAL_PRESSURE [bar] 0.0
>>>>>>     MAX_DR 0.02
>>>>>>     RMS_DR 0.01
>>>>>>     MAX_FORCE 0.002
>>>>>>     RMS_FORCE 0.001
>>>>>>     KEEP_ANGLES T
>>>>>>     &CG
>>>>>>       &LINE_SEARCH
>>>>>>         TYPE 2PNT
>>>>>>         &2PNT
>>>>>>         &END
>>>>>>       &END
>>>>>>     &END
>>>>>>   &END
>>>>>>   &GEO_OPT
>>>>>>     MAX_ITER 300
>>>>>>     MINIMIZER LBFGS
>>>>>>   &END
>>>>>> &END
>>
>>>>>> -------
>>
>>>>>> Extract from the output file (SIGSEGV error):
>>
>>>>>> MPI: On host r17i2n5, Program /scratch/cem6039/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp, Rank 0, Process 4568 received signal SIGSEGV
>>
>> ...

