Sigsegv error during cell optimization

Maricarme... at cemes.fr
Mon May 11 09:31:40 CEST 2009


Ciao everyone,

I wanted to let you know that we have apparently solved the problem.
The machine administrators have recompiled the code with these
settings:

- Classical optimization (-O2 -g) for INTEL 11 compilers
- SGI MPT 1.22 MPI library
- Intel MKL and Intel FFTW libraries
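
A minimal sketch of how the optimization part of this change could be
applied to an arch file like the one quoted further down. It only swaps
'-O3 -xS' for '-O2 -g'; the compiler, MPI and library changes listed above
are not covered, and this is not the administrators' actual procedure. The
function name relax_opt_flags and the file names in the usage line are
hypothetical.

```shell
#!/bin/sh
# relax_opt_flags IN OUT
# Copy arch file IN to OUT, replacing the aggressive "-O3 -xS" flags with
# the more conservative "-O2 -g". IN itself is left untouched.
# (Hypothetical helper, not part of CP2K.)
relax_opt_flags() {
    in_file=$1
    out_file=$2
    sed 's/-O3 -xS/-O2 -g/' "$in_file" > "$out_file"
}
```

Usage, assuming the arch file is named after the build directory seen in
the job script: relax_opt_flags Linux-x86-64-jade.psmp Linux-x86-64-jade-O2.psmp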

I have been testing it the whole weekend and it looks like it works
again :)
Thanks a lot for your help.

Cheers,

Maricarmen
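
For anyone hitting the same problem: the stack-size check and the MPI
environment variables discussed in the thread below can be recorded in the
job output before the run starts. This is only a sketch; the function name
report_job_env is hypothetical, and the three variables are simply the ones
that the job script in the original report already exports.

```shell
#!/bin/sh
# report_job_env
# Print the stack limit and the exported threading/MPI variables, so that
# they end up in the job's output file next to any later error message.
# (Hypothetical helper; only variables seen in the job script are listed.)
report_job_env() {
    echo "stack limit (ulimit -s): $(ulimit -s)"
    for name in OMP_NUM_THREADS MKL_NUM_THREADS MPI_GROUP_MAX; do
        # printenv only sees exported variables, which is what the
        # processes started by mpiexec would inherit
        value=$(printenv "$name") || value="(not set)"
        echo "$name=$value"
    done
}
```

Calling report_job_env just before the mpiexec line of the submission
script is enough to get the values into the log.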



On May 6, 16:49, Axel <akoh... at gmail.com> wrote:
> ciao maricarmen,
>
> On May 6, 9:24 am, Maricarme... at cemes.fr wrote:
>
> > Thanks Teo,
>
> > Actually the Intel Fortran compiler is version 10.1.017. I can't find
> > any comments on this particular version. I found something on 10.1.018
> > though, and it seemed to work fine.
> > On the machine there is also version 11.0.83, but I actually found
> > some messages on the list reporting problems with the latest compilers
> > (e.g. version 11).
>
> hard to say, but the fact that it is up to patch level 83 is somewhat
> telling.
> i'd try the 10.1 first.
>
> > For the plain popt CP2K version I'll have to ask the administrators to
> > recompile the code (they did it the first time), so I might as well
> > ask them to use the newer compiler this time. Otherwise, do you think
> > it's better to compile the popt version with the same compiler
> > (e.g. 10.1.017)?
>
> i would suggest to first go a bit more conservative in optimization and
> replace '-O3 -xS' with '-O2'. using a less aggressive optimization
> frequently helps with intel compilers. since you seem to be on an itanium
> processor machine, you'll be seeing more problems, though. those compilers
> are generally lagging behind the x86 versions in reliability, independent
> of the individual version.
>
> if you look through the files in the arch directory, there are several
> entries with exceptions for files that are better compiled without any
> optimizations to work around too aggressive compilers. i'd try to collect
> all of them into a special arch file in case you still are seeing problems.
>
> finally, i'd have a closer look at the mpi manpage. on altix machines there
> are a few environment variables that can affect the stability and
> performance of parallel jobs. i remember having tinkered with that on a
> machine, but i currently have no access to it, and forgot to transfer the
> job scripts before that.
>
> cheers,
>     axel.
>
>
>
> > Ciao,
>
> > Maricarmen
>
> > On May 6, 09:56, Teodoro Laino <teodor... at gmail.com> wrote:
>
> > > Hi Maricarmen,
>
> > > could you try a plain popt version without the smp support?
> > > Also keep ompthreads=1 in the submission script.
>
> > > which version of intel compiler are you using? did you check on this
> > > mailing list that it is a "good one"?
> > > If needed, do you have access to other compilers on that machine?
>
> > > Teo
>
> > > Maricarme... at cemes.fr wrote:
> > > > Hello everyone,
>
> > > > > I'm running a DFT cell optimization for Mx-V4O11 crystals (M = Ag and
> > > > > Cu). My cells are approximately 14x7x7 and contain about 260 atoms.
> > > > > Below is a copy of one of my input files. The problem is that I keep
> > > > > getting a SIGSEGV (11) error, usually when starting the SCF cycles for
> > > > > the second cell opt step (an extract from the output file is also below).
> > > > > I'm running in parallel at a computing center
> > > > > (http://www.cines.fr/spip.php?rubrique186), and the administrators have
> > > > > already checked the stack size (which according to them is set to
> > > > > unlimited). Below is also a copy of the job submission file and of the
> > > > > arch file.
> > > > > I even tried to run a cell opt test for a smaller cell (14*3*3, about
> > > > > 68 atoms), which I had already run at a different computing center
> > > > > without any issues, and I still get the segmentation fault error.
> > > > > This clearly indicates to me that the problem is related to the
> > > > > configuration of the machines, to the way CP2K was installed, or to
> > > > > the job submission settings (or to something else??). I must
> > > > > say I always get the exact same error during the second cell opt step,
> > > > > no matter what the system is (small or big cell, Ag or Cu).
> > > > > I tried running an Energy test on the smaller cell and it worked fine.
>
> > > > > I would really appreciate it if any of you could shed some light on this,
> > > > > for I'm pretty stuck on it right now.
>
> > > > Cheers,
>
> > > > Maricarmen.
>
> > > > Arch file:
>
> > > > > # by default some intel compilers put temporaries on the stack
> > > > > # this might lead to segmentation faults if the stack limit is set too low
> > > > > # stack limits can be increased by sysadmins or e.g. with ulimit -s 256000
> > > > > # Tested on an HPC non-Itanium cluster @ UDS (France)
> > > > > # Note: -O2 produces an executable which is slightly faster than -O3
> > > > > # and the compilation time was also much shorter.
> > > > CC       = icc -diag-disable remark
> > > > CPP      =
> > > > FC       = ifort -diag-disable remark -openmp
> > > > LD       = ifort -diag-disable remark -openmp
> > > > AR       = ar -r
>
> > > > #Better with mkl (intel lapack/blas) only
> > > > #DFLAGS   = -D__INTEL -D__FFTSG -D__parallel
> > > > #If you want to use BLACS and SCALAPACK use the flags below
> > > > > DFLAGS   = -D__INTEL -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3
> > > > CPPFLAGS =
> > > > > FCFLAGS  = $(DFLAGS) -fpp -free -O3 -xS -I/opt/software/SGI/intel/mkl/10.0.3.020/include -I/opt/software/SGI/intel/mkl/10.0.3.020/include/fftw
> > > > LDFLAGS  =  -L/opt/software/SGI/intel/mkl/10.0.3.020/lib/em64t
> > > > #LIBS     = -lmkl -lm -lpthread -lguide -openmp
> > > > #If you want to use BLACS and SCALAPACK use the libraries below
> > > > > LIBS     = -Wl,--allow-multiple-definition -lmkl_scalapack_lp64 /scratch/grisolia/blacsF77init_MPI-LINUX-0.a /scratch/grisolia/blacs_MPI-LINUX-0.a -lmpi -lmkl -lfftw3xf_intel -lmkl_blacs_lp64
>
> > > > OBJECTS_ARCHITECTURE = machine_intel.o
>
> > > > -------
>
> > > > > Job submission file (getting the SIGSEGV error):
>
> > > > #PBS -N cp2k
> > > > #PBS -l walltime=24:00:00
> > > > #PBS -S /bin/bash
> > > > #PBS -l select=8:ncpus=8:mpiprocs=8:ompthreads=1
> > > > #PBS -j oe
> > > > #PBS -M  gris... at cemes.fr -m abe
>
> > > > PBS_O_WORKDIR=/scratch/grisolia/CuVO/Fixed/
>
> > > > cd $PBS_O_WORKDIR
>
> > > > export OMP_NUM_THREADS=1
> > > > export MKL_NUM_THREADS=1
> > > > export MPI_GROUP_MAX=512
>
> > > > > /usr/pbs/bin/mpiexec /scratch/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp CuV4O11-CellOpt.inp
>
> > > > --------------
>
> > > > Input file:
>
> > > > &GLOBAL
> > > >   PROJECT     CuV4O11-CellOpt
> > > >   RUN_TYPE    CELL_OPT
> > > >   PRINT_LEVEL MEDIUM
> > > >   WALLTIME  86000
> > > > &END GLOBAL
> > > > &FORCE_EVAL
> > > >   METHOD Quickstep
> > > >   &DFT
> > > >     BASIS_SET_FILE_NAME /scratch/grisolia/cp2k/tests/QS/BASIS_MOLOPT
> > > >     POTENTIAL_FILE_NAME /scratch/grisolia/cp2k/tests/QS/GTH_POTENTIALS
> > > >     LSD
> > > >     &MGRID
> > > >       CUTOFF 280
> > > >       NGRIDS 5
> > > >     &END MGRID
> > > >     &QS
> > > >       EPS_DEFAULT   1.0E-10
> > > >       EXTRAPOLATION PS
> > > >       EXTRAPOLATION_ORDER 1
> > > >     &END QS
> > > >     &SCF
> > > >       SCF_GUESS RESTART
> > > >       EPS_SCF 2.0E-7
> > > >       MAX_SCF 30
> > > >       &OUTER_SCF
> > > >          EPS_SCF 2.0E-7
> > > >          MAX_SCF 15
> > > >       &END
> > > >       &OT
> > > >         MINIMIZER CG
> > > >         PRECONDITIONER FULL_SINGLE_INVERSE
> > > >         ENERGY_GAP 0.05
> > > >       &END
> > > >       &PRINT
> > > >          &RESTART
> > > >             FILENAME = CuV4O11-CellOpt.wfn
> > > >          &END
> > > >       &END
> > > >     &END SCF
> > > >     &XC
> > > >       &XC_FUNCTIONAL PBE
> > > >       &END XC_FUNCTIONAL
> > > >     &END XC
> > > >      &PRINT
> > > >        &MO_CUBES
> > > >           WRITE_CUBE F
> > > >           NLUMO      20
> > > >           NHOMO      20
> > > >        &END
> > > >      &END
> > > >   &END DFT
> > > >   &SUBSYS
> > > >     &CELL
> > > >       @INCLUDE CuV4O11-GeoOpt.cell
> > > >     &END CELL
> > > >     &COORD
> > > >       @INCLUDE CuV4O11-GeoOpt.coord
> > > >     &END COORD
> > > >     &END COORD
> > > >     &KIND Cu
> > > >       BASIS_SET DZVP-MOLOPT-SR-GTH
> > > >       POTENTIAL GTH-PBE-q11
> > > >     &END KIND
> > > >     &KIND O
> > > >       BASIS_SET DZVP-MOLOPT-SR-GTH
> > > >       POTENTIAL GTH-PBE-q6
> > > >     &END KIND
> > > >     &KIND V
> > > >       BASIS_SET DZVP-MOLOPT-SR-GTH
> > > >       POTENTIAL GTH-PBE-q13
> > > >     &END KIND
> > > >   &END SUBSYS
> > > >   STRESS_TENSOR ANALYTICAL
> > > > &END FORCE_EVAL
> > > > &MOTION
> > > >   &MD
> > > >       TIMESTEP [fs] 0.5
> > > >       STEPS         10000
> > > >       TEMPERATURE   500
> > > >       ENSEMBLE      NVE
> > > >   &END
> > > >   &CELL_OPT
> > > >     TYPE GEO_OPT
> > > >     OPTIMIZER CG
> > > >     MAX_ITER 20
> > > >     EXTERNAL_PRESSURE [bar] 0.0
> > > >     MAX_DR 0.02
> > > >     RMS_DR 0.01
> > > >     MAX_FORCE 0.002
> > > >     RMS_FORCE 0.001
> > > >     KEEP_ANGLES T
> > > >     &CG
> > > >       &LINE_SEARCH
> > > >         TYPE 2PNT
> > > >         &2PNT
> > > >         &END
> > > >       &END
> > > >     &END
> > > >   &END
> > > >   &GEO_OPT
> > > >     MAX_ITER 300
> > > >     MINIMIZER LBFGS
> > > >   &END
> > > > &END
>
> > > > -------
>
> > > > > Extract from the output file (SIGSEGV error):
>
> > > > > MPI: On host r17i2n5, Program /scratch/cem6039/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp, Rank 0, Process 4568 received signal SIGSEGV(11)
>
> > > > > MPI: --------stack traceback-------
> > > > > MPI: On host r17i3n12, Program /scratch/cem6039/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp, Rank 57, Process 29665 received signal SIGSEGV(11)
>
> > > > > MPI: --------stack traceback-------
> > > > > MPI: On host r17i3n0, Program /scratch/cem6039/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp, Rank 25, Process 542 received signal SIGSEGV(11)
>
> > > > > MPI: --------stack traceback-------
> > > > > MPI: On host r17i3n1, Program /scratch/cem6039/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp, Rank 32, Process 5057 received signal SIGSEGV(11)
>
> > > > MPI:

