Sigsegv error during cell optimization

Axel akoh... at gmail.com
Wed May 6 14:49:46 UTC 2009


ciao maricarmen,

On May 6, 9:24 am, Maricarme... at cemes.fr wrote:
> Thanks Teo,
>
> Actually the Intel Fortran compiler is version 10.1.017. I can't find
> any comments on this particular version. I found something on 10.1.018
> though, and it seemed to work fine.
> On the machine there is also version 11.0.83, but I actually found
> some messages on the list reporting problems with the latest compilers
> (e.g. version 11).

hard to say, but the fact that it is up to patch level 83 is somewhat
telling.
i'd try the 10.1 first.

> For the plain popt CP2K version I'll have to ask the administrators to
> recompile the code (they did it the first time), so I might as well
> ask them to use the newer compiler this time. Otherwise, do you think
> it's better to compile the popt version with the same compiler
> (e.g. 10.1.017)?

i would suggest to first go a bit more conservative in optimization and
replace '-O3 -xS' with '-O2'. using a less aggressive optimization
frequently helps with intel compilers. since you seem to be on an
itanium processor machine, you'll be seeing more problems, though.
those compilers are generally lagging behind the x86 versions in
reliability, independent of the individual version.
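
for reference, based on the FCFLAGS line from your arch file further
down (so please double-check the include paths against what you
actually have installed), the edited line would look something like:

  FCFLAGS  = $(DFLAGS) -fpp -free -O2 \
             -I/opt/software/SGI/intel/mkl/10.0.3.020/include \
             -I/opt/software/SGI/intel/mkl/10.0.3.020/include/fftw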

if you look through the files in the arch directory, there are several
entries with exceptions for files that are better compiled without any
optimization, to work around too aggressive compilers. i'd try to
collect all of them into a special arch file in case you are still
seeing problems; see the sketch below.
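
such an exception is just an extra make rule at the end of the arch
file that compiles the affected source file with reduced flags. a
minimal sketch only (the file name and the FCFLAGS_LOWOPT variable are
placeholders; take the real entries from the existing arch files, and
note that the command line of the rule must be indented with a tab):

  # compile files that trip the optimizer without optimization
  FCFLAGS_LOWOPT = $(DFLAGS) -fpp -free -O0
  some_problem_file.o: some_problem_file.F
          $(FC) -c $(FCFLAGS_LOWOPT) $<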

finally, i'd have a closer look at the mpi manpage. on altix machines
there are a few environment variables that can affect the stability
and performance of parallel jobs. i remember having tinkered with
those on a machine, but i currently have no access to it, and forgot
to transfer the job scripts before that.
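
the entries to look for in the mpi(1) manpage of sgi's mpt are the
buffer and limit settings, e.g. MPI_BUFS_PER_PROC, MPI_BUFS_PER_HOST,
MPI_TYPE_MAX and MPI_REQUEST_MAX. just as a sketch of how such lines
would sit in your pbs script next to the existing MPI_GROUP_MAX (the
values here are placeholders, not recommendations; check the manpage
for sensible numbers on your machine):

  export MPI_BUFS_PER_PROC=256
  export MPI_BUFS_PER_HOST=512
  export MPI_TYPE_MAX=65536
  export MPI_REQUEST_MAX=65536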

cheers,
    axel.
>
> Ciao,
>
> Maricarmen
>
> On May 6, 09:56, Teodoro Laino <teodor... at gmail.com> wrote:
>
> > Hi Maricarmen,
>
> > could you try a plain popt version without the smp support?
> > Also keep ompthreads=1 in the submission script.
>
> > which version of intel compiler are you using? did you check on this
> > mailing list that it is a "good one"?
> > In case, do you have access to other compilers on that machine?
>
> > Teo
>
> > Maricarme... at cemes.fr wrote:
> > > Hello everyone,
>
> > > I'm running a DFT cell optimization for Mx-V4O11 crystals (M = Ag and
> > > Cu). My cells are approximately 14x7x7 and about 260 atoms. Below is a
> > > copy of one of my input files. The problem is I keep getting a SIGSEGV
> > > (11) error, usually when starting the SCF cycles for the second cell
> > > opt step (an extract from the output file is also below).
> > > I'm running in parallel at a computing center
> > > (http://www.cines.fr/spip.php?rubrique186), and the administrators
> > > have already checked the stack size (which according to them is set to
> > > unlimited). Below is also a copy of the job submission file and of
> > > the arch file.
> > > I even tried to run a cell opt test for a smaller cell (14*3*3, about
> > > 68 atoms), which I had already run at a different computing center
> > > without any issues, and I still get the segmentation fault error.
> > > This clearly indicates to me that the problem is associated with the
> > > configuration of the machines, with the way CP2K was installed, or
> > > with the characteristics of the job submission (or with something
> > > else??). I must say I always get the exact same error during cell
> > > opt's second step, no matter what the system is (small or big cell,
> > > Ag or Cu).
> > > I tried running an Energy test on the smaller cell and it worked fine.
>
> > > I would really appreciate it if any of you could shed some light on
> > > this, as I'm pretty stuck on it right now.
>
> > > Cheers,
>
> > > Maricarmen.
>
> > > Arch file:
>
> > > # by default some intel compilers put temporaries on the stack
> > > # this might lead to segmentation faults if the stack limit is set too low
> > > # stack limits can be increased by sysadmins or e.g. with ulimit -s 256000
> > > # Tested on a non-Itanium HPC cluster @ UDS (France)
> > > # Note: -O2 produces an executable which is slightly faster than -O3
> > > # and the compilation time was also much shorter.
> > > CC       = icc -diag-disable remark
> > > CPP      =
> > > FC       = ifort -diag-disable remark -openmp
> > > LD       = ifort -diag-disable remark -openmp
> > > AR       = ar -r
>
> > > #Better with mkl (intel lapack/blas) only
> > > #DFLAGS   = -D__INTEL -D__FFTSG -D__parallel
> > > #If you want to use BLACS and SCALAPACK use the flags below
> > > DFLAGS   = -D__INTEL -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3
> > > CPPFLAGS =
> > > FCFLAGS  = $(DFLAGS) -fpp -free -O3 -xS -I/opt/software/SGI/intel/mkl/10.0.3.020/include -I/opt/software/SGI/intel/mkl/10.0.3.020/include/fftw
> > > LDFLAGS  =  -L/opt/software/SGI/intel/mkl/10.0.3.020/lib/em64t
> > > #LIBS     = -lmkl -lm -lpthread -lguide -openmp
> > > #If you want to use BLACS and SCALAPACK use the libraries below
> > > LIBS     = -Wl,--allow-multiple-definition -lmkl_scalapack_lp64 /scratch/grisolia/blacsF77init_MPI-LINUX-0.a /scratch/grisolia/blacs_MPI-LINUX-0.a -lmpi -lmkl -lfftw3xf_intel -lmkl_blacs_lp64
>
> > > OBJECTS_ARCHITECTURE = machine_intel.o
>
> > > -------
>
> > > Job submission file (getting the sigsegv error):
>
> > > #PBS -N cp2k
> > > #PBS -l walltime=24:00:00
> > > #PBS -S /bin/bash
> > > #PBS -l select=8:ncpus=8:mpiprocs=8:ompthreads=1
> > > #PBS -j oe
> > > #PBS -M  gris... at cemes.fr -m abe
>
> > > PBS_O_WORKDIR=/scratch/grisolia/CuVO/Fixed/
>
> > > cd $PBS_O_WORKDIR
>
> > > export OMP_NUM_THREADS=1
> > > export MKL_NUM_THREADS=1
> > > export MPI_GROUP_MAX=512
>
> > > /usr/pbs/bin/mpiexec /scratch/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp CuV4O11-CellOpt.inp
>
> > > --------------
>
> > > Input file:
>
> > > &GLOBAL
> > >   PROJECT     CuV4O11-CellOpt
> > >   RUN_TYPE    CELL_OPT
> > >   PRINT_LEVEL MEDIUM
> > >   WALLTIME  86000
> > > &END GLOBAL
> > > &FORCE_EVAL
> > >   METHOD Quickstep
> > >   &DFT
> > >     BASIS_SET_FILE_NAME /scratch/grisolia/cp2k/tests/QS/BASIS_MOLOPT
> > >     POTENTIAL_FILE_NAME /scratch/grisolia/cp2k/tests/QS/GTH_POTENTIALS
> > >     LSD
> > >     &MGRID
> > >       CUTOFF 280
> > >       NGRIDS 5
> > >     &END MGRID
> > >     &QS
> > >       EPS_DEFAULT   1.0E-10
> > >       EXTRAPOLATION PS
> > >       EXTRAPOLATION_ORDER 1
> > >     &END QS
> > >     &SCF
> > >       SCF_GUESS RESTART
> > >       EPS_SCF 2.0E-7
> > >       MAX_SCF 30
> > >       &OUTER_SCF
> > >          EPS_SCF 2.0E-7
> > >          MAX_SCF 15
> > >       &END
> > >       &OT
> > >         MINIMIZER CG
> > >         PRECONDITIONER FULL_SINGLE_INVERSE
> > >         ENERGY_GAP 0.05
> > >       &END
> > >       &PRINT
> > >          &RESTART
> > >             FILENAME = CuV4O11-CellOpt.wfn
> > >          &END
> > >       &END
> > >     &END SCF
> > >     &XC
> > >       &XC_FUNCTIONAL PBE
> > >       &END XC_FUNCTIONAL
> > >     &END XC
> > >      &PRINT
> > >        &MO_CUBES
> > >           WRITE_CUBE F
> > >           NLUMO      20
> > >           NHOMO      20
> > >        &END
> > >      &END
> > >   &END DFT
> > >   &SUBSYS
> > >     &CELL
> > >       @INCLUDE CuV4O11-GeoOpt.cell
> > >     &END CELL
> > >     &COORD
> > >       @INCLUDE CuV4O11-GeoOpt.coord
> > >     &END COORD
> > >     &KIND Cu
> > >       BASIS_SET DZVP-MOLOPT-SR-GTH
> > >       POTENTIAL GTH-PBE-q11
> > >     &END KIND
> > >     &KIND O
> > >       BASIS_SET DZVP-MOLOPT-SR-GTH
> > >       POTENTIAL GTH-PBE-q6
> > >     &END KIND
> > >     &KIND V
> > >       BASIS_SET DZVP-MOLOPT-SR-GTH
> > >       POTENTIAL GTH-PBE-q13
> > >     &END KIND
> > >   &END SUBSYS
> > >   STRESS_TENSOR ANALYTICAL
> > > &END FORCE_EVAL
> > > &MOTION
> > >   &MD
> > >       TIMESTEP [fs] 0.5
> > >       STEPS         10000
> > >       TEMPERATURE   500
> > >       ENSEMBLE      NVE
> > >   &END
> > >   &CELL_OPT
> > >     TYPE GEO_OPT
> > >     OPTIMIZER CG
> > >     MAX_ITER 20
> > >     EXTERNAL_PRESSURE [bar] 0.0
> > >     MAX_DR 0.02
> > >     RMS_DR 0.01
> > >     MAX_FORCE 0.002
> > >     RMS_FORCE 0.001
> > >     KEEP_ANGLES T
> > >     &CG
> > >       &LINE_SEARCH
> > >         TYPE 2PNT
> > >         &2PNT
> > >         &END
> > >       &END
> > >     &END
> > >   &END
> > >   &GEO_OPT
> > >     MAX_ITER 300
> > >     MINIMIZER LBFGS
> > >   &END
> > > &END
>
> > > -------
>
> > > Extract from the output file (sigsegv error):
>
> > > MPI: On host r17i2n5, Program /scratch/cem6039/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp, Rank 0, Process 4568 received signal SIGSEGV (11)
>
> > > MPI: --------stack traceback-------
> > > MPI: On host r17i3n12, Program /scratch/cem6039/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp, Rank 57, Process 29665 received signal SIGSEGV (11)
>
> > > MPI: --------stack traceback-------
> > > MPI: On host r17i3n0, Program /scratch/cem6039/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp, Rank 25, Process 542 received signal SIGSEGV (11)
>
> > > MPI: --------stack traceback-------
> > > MPI: On host r17i3n1, Program /scratch/cem6039/grisolia/cp2k/exe/Linux-x86-64-jade/cp2k.psmp, Rank 32, Process 5057 received signal SIGSEGV (11)
>
> > > MPI: --------stack traceback-------
> > > MPI: GNU gdb 6.6
> > > MPI: Copyright (C) 2006 Free Software Foundation, Inc.
> > > MPI: GDB is free software, covered by the GNU General Public License, and you are
> > > MPI: welcome to change it and/or distribute copies of it under certain conditions.
> > > MPI: Type "show copying" to see the conditions.
> > > MPI: There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > MPI: This GDB was configured as "x86_64-suse-linux"...
> > > MPI: Using host libthread_db library "/lib64/libthread_db.so.1".
> > > MPI: Attaching to program: /proc/4568/exe, process 4568
> > > MPI: [Thread debugging using libthread_db enabled]
> > > MPI: [New Thread 46912551614368 (LWP 4568)]
> > > MPI: [New Thread 1073809728 (LWP 4588)]
> > > MPI: 0x00002aaaad94073f in waitpid () from /lib64/libpthread.so.0
> > > MPI: (gdb) #0  0x00002aaaad94073f in waitpid () from /lib64/libpthread.so.0
> > > MPI: #1  0x00002aaaaadb5133 in MPI_SGI_stacktraceback () from /usr/lib64/libmpi.so
> > > MPI: #2  0x00002aaaaadb5773 in slave_sig_handler () from /usr/lib64/libmpi.so
> > > MPI: #3  <signal handler called>
> > > MPI: #4  0x00000000017f7ad0 in fftw_destroy_plan ()
> > > MPI: #5  0x00000000017f794d in dfftw_destroy_plan_ ()
> > > MPI: #6  0x000000000169332a in fftw3_destroy_plan_ ()
> > > MPI: #7  0x000000000169199e in fft_destroy_plan_ ()
> > > MPI: #8  0x000000000044229e in fft_tools_mp_deallocate_fft_scratch_type_ ()
> > > MPI: #9  0x00000000004678ae in fft_tools_mp_resize_fft_scratch_pool_ ()
> > > MPI: #10 0x00000000004556c8 in fft_tools_mp_get_fft_scratch_ ()
> > > MPI: #11 0x000000000046ca53 in fft_tools_mp_fft3d_ps_ ()
> > > MPI: #12 0x00000000007360c9 in pw_methods_mp_fft_wrap_pw1pw2_ ()
> > > MPI: #13 0x0000000000732a01 in pw_methods_mp_pw_transfer_ ()
> > > MPI: #14 0x0000000000778bcd in qs_collocate_density_mp_density_rs2pw_ ()
> > > MPI: #15 0x0000000000777ba3 in
>
> ...
>

