[CP2K:3043] Re: Segfault with psmp

Alin Marin Elena alinm... at gmail.com
Tue Jan 11 20:02:44 CET 2011


Hi Ondrej,


I had a look at it.
I compiled the cp2k 2_1 branch on my machine with the Intel compiler 11.1.073
and with the GNU compilers.

With Intel it crashes.
With the GNU compilers it works.

Why is a puzzle.
To make life simpler, I removed MKL and the Intel FFTW and compiled against
plain LAPACK/BLAS/FFTW3, and added debug flags and extra runtime checks.

Here is the error I get:
forrtl: severe (408): fort: (2): Subscript #3 of the array R has value 64 which is greater than the upper bound of 63

Image              PC                Routine            Line        Source             
cp2k.ssmp          00000000073BEFAD  Unknown               Unknown  Unknown
cp2k.ssmp          00000000073BDAB5  Unknown               Unknown  Unknown
cp2k.ssmp          0000000007359200  Unknown               Unknown  Unknown
cp2k.ssmp          000000000730845A  Unknown               Unknown  Unknown
cp2k.ssmp          0000000007308852  Unknown               Unknown  Unknown
cp2k.ssmp          0000000001C96A54  realspace_grid_ty        1993  realspace_grid_types.F
libiomp5.so        00007F277C231793  Unknown               Unknown  Unknown
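
Read literally, the message says that the third index of an array named R was 64 while that dimension only goes up to 63, so the access is one element past the end. A minimal Python sketch that pulls those numbers out of such a message is below. The helper name parse_bounds_error and the regular expression are illustrative assumptions, not part of CP2K or the Intel runtime, and the pattern only covers the upper-bound wording seen in this thread.

```python
import re

# Wording of the bounds-check message as it appears in this thread.
# Only the "greater than the upper bound" form is handled here.
BOUNDS_RE = re.compile(
    r"Subscript #(?P<dim>\d+) of the array (?P<array>\w+) "
    r"has value (?P<value>-?\d+) "
    r"which is greater than the upper bound of (?P<bound>-?\d+)"
)


def parse_bounds_error(message):
    """Return the fields of a bounds-check message as a dict, or None."""
    # Collapse line breaks and runs of spaces, since mail clients wrap lines.
    flat = " ".join(message.split())
    match = BOUNDS_RE.search(flat)
    if match is None:
        return None
    value = int(match.group("value"))
    bound = int(match.group("bound"))
    return {
        "dim": int(match.group("dim")),
        "array": match.group("array"),
        "value": value,
        "upper_bound": bound,
        "overshoot": value - bound,
    }


MESSAGE = """forrtl: severe (408): fort: (2): Subscript #3 of the array R has value 64
which is greater than the upper bound of 63"""

if __name__ == "__main__":
    print(parse_bounds_error(MESSAGE))
```

For the message in this mail it reports dimension 3, value 64 and upper bound 63, an overshoot of 1, which is at least consistent with an off-by-one near line 1993 of realspace_grid_types.F rather than a wild pointer.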


And here is my arch file:
alin@baphomet:~/lavello/cp2k/makefiles> cat ../arch/Linux-baphomet.ssmp 
# by default some intel compilers put temporaries on the stack
# this might lead to segmentation faults if the stack limit is set too low
# stack limits can be increased by sysadmins or e.g. with ulimit -s 256000
# furthermore new ifort (10.0?) compilers support the option
# -heap-arrays 64
# add this to the compilation flags if the other options do not work
# The following settings worked for:
# - AMD64 Opteron
# - SUSE Linux Enterprise Server 10.0 (x86_64)
# - Intel(R) Fortran Compiler for Intel(R) EM64T-based applications, Version 10.0
# - AMD acml library version 3.6.0
# - MPICH2-1.0.5p4
# - FFTW 3.1.2
#
CC       = icc
CPP      = 
FC       = ifort -FR -openmp -O0 -g -heap-arrays -check all -debug all -traceback
LD       = ifort -FR -openmp -O0 -g -heap-arrays -check all -debug all -traceback
AR       = ar -r
DFLAGS   = -D__INTEL -D__FFTSG -D__FFTW3
CPPFLAGS = -C -traditional $(DFLAGS) -I$(INTEL_INC)
FCFLAGS  = $(DFLAGS) -I$(INTEL_INC) 
LDFLAGS  = $(FCFLAGS)
LIBS     = -llapack -lblas -lfftw3

OBJECTS_ARCHITECTURE = machine_intel.o
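
The comments at the top of the arch file suggest raising the stack limit with ulimit -s 256000 as an alternative to -heap-arrays. A minimal sketch for checking the current soft limit from Python follows. The function names are made up for illustration, and it assumes a Unix-like system where the standard resource module is available and that ulimit -s counts in 1024-byte units, as it does in bash.

```python
import resource


def stack_limit_kb():
    """Return the soft stack limit in KiB, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return None
    # getrlimit reports bytes; ulimit -s is assumed to use 1024-byte units.
    return soft // 1024


def stack_is_at_least(wanted_kb, current_kb):
    """True if the current limit (None means unlimited) covers wanted_kb."""
    return current_kb is None or current_kb >= wanted_kb


if __name__ == "__main__":
    current = stack_limit_kb()
    shown = "unlimited" if current is None else f"{current} KiB"
    print(f"soft stack limit: {shown}")
    # 256000 is the value mentioned in the arch file comments.
    print("covers 256000 KiB:", stack_is_at_least(256000, current))
```

Since the build above already uses -heap-arrays, this is mainly useful for ruling the stack out when comparing against a build without that flag.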

Alin


On Tuesday 11 January 2011 14:49:37 Ondrej Marsalek wrote:
> Additional confusion: the problem occurs within an 'IF (nthread > 1)',
> even if I set OMP_NUM_THREADS=1. When I print nthread in that place, I
> always get '8'. Is cp2k supposed to honor the value of the env
> variable? If not, what is the proper way to set the number of threads?
> 
> Thanks,
> Ondrej
> 
> PS:
> The threads can also be seen in gdb:
> 
> Starting program: /home/andy/build/cp2k/cp2k/exe/Linux-x86-64-intel/cp2k.ssmp W3-H3O.inp
> [Thread debugging using libthread_db enabled]
> [New Thread 0x7ffff7fd3700 (LWP 32192)]
> 
>   **** **** ******  **  PROGRAM STARTED AT               2011-01-11 14:38:08.422
>  ***** ** ***  *** **   PROGRAM STARTED ON                             cassandra
>  **    ****   ******    PROGRAM STARTED BY                                  andy
>  ***** **    ** ** **   PROGRAM PROCESS ID                                 32189
>   **** **  *******  **  PROGRAM STARTED IN          /home/andy/W3-H3O-min-global
> 
>  CP2K| version string:                 CP2K version 2.2.94 (Development Version)
>  CP2K| is freely available from                          http://cp2k.berlios.de/
>  CP2K| Program compiled at                          Tue Jan 11 14:00:00 CET 2011
>  CP2K| Program compiled on                                             cassandra
>  CP2K| Program compiled for                                   Linux-x86-64-intel
>  CP2K| Last CVS entry
>  CP2K| Input file name                                                W3-H3O.inp
> [New Thread 0x7ffff4fe4700 (LWP 32193)]
> [New Thread 0x7ffff4be3700 (LWP 32194)]
> [New Thread 0x7ffff47e2700 (LWP 32195)]
> [New Thread 0x7fffeffff700 (LWP 32196)]
> [New Thread 0x7fffefbfe700 (LWP 32197)]
> [New Thread 0x7fffef7fd700 (LWP 32198)]
> [New Thread 0x7fffef3fc700 (LWP 32199)]
> 
> 
> On Tue, Jan 11, 2011 at 14:25, Ondrej Marsalek <ondrej.... at gmail.com> wrote:
> > I have simplified the problem further by doing an ssmp build. The arch
> > file can be found here:
> > 
> > http://marge.uochb.cas.cz/~marsalek/tmp/Linux-x86-64-intel.psmp
> > 
> > And the problem persists as described, regardless of the number of
> > OpenMP threads.
> > 
> > Any ideas how to get this working?
> > 
> > Thanks,
> > Ondrej
> > 
> > On Thu, Jan 6, 2011 at 14:04, Ondrej Marsalek <ondrej.... at gmail.com> wrote:
> >> Dear all,
> >> 
> >> I get a segfault with a psmp build of cp2k trunk. The corresponding
> >> popt works. The problem occurs even when run as a single process and
> >> with OMP_NUM_THREADS=1. This is what it looks like to gdb:
> >> 
> >> ==========
> >> ...
> >>  Spin 1
> >> 
> >>  Number of electrons:                                                 17
> >>  Number of occupied orbitals:                                         17
> >>  Number of molecular orbitals:                                        17
> >> 
> >>  Spin 2
> >> 
> >>  Number of electrons:                                                 16
> >>  Number of occupied orbitals:                                         16
> >>  Number of molecular orbitals:                                        16
> >> 
> >>  Number of orbital functions:                                        169
> >>  Number of independent orbital functions:                            169
> >> 
> >>  Extrapolation method: initial_guess
> >> 
> >> Program received signal SIGSEGV, Segmentation fault.
> >> [Switching to Thread 0x7fffef887700 (LWP 23493)]
> >> __libc_free (mem=0x2020202000000001) at malloc.c:3709
> >> 3709    malloc.c: No such file or directory.
> >>        in malloc.c
> >> (gdb) backtrace
> >> #0  __libc_free (mem=0x2020202000000001) at malloc.c:3709
> >> #1  0x000000000257a6ec in for_deallocate ()
> >> #2  0x00000000016d413b in QS_COLLOCATE_DENSITY::L_qs_collocate_density_mp_calculate_rho_elec__917__par_region0_2_110 ()
> >>    at /home/andy/build/cp2k/cp2k/makefiles/../src/qs_collocate_density.F:1163
> >> #3  0x0000000002634763 in L_kmp_invoke_pass_parms ()
> >> #4  0x00007fffffff53fc in ?? ()
> >> #5  0x00007fffffff53c4 in ?? ()
> >> #6  0x00007fffffff5380 in ?? ()
> >> ...
> >> ==========
> >> 
> >> The CPU is a Core i7. The arch file and the input that triggers the
> >> segfault (almost immediately after start) are here:
> >> 
> >> http://marge.uochb.cas.cz/~marsalek/tmp/W3-H3O-min-global.tar.gz
> >> http://marge.uochb.cas.cz/~marsalek/tmp/Linux-x86-64-intel.psmp
> >> 
> >> The versions used for the build are:
> >> cp2k trunk checked out today
> >> OpenMPI 1.5
> >> Intel Compiler 11.1.073 and corresponding MKL
> >> 
> >> I understand that this might be difficult or impossible to reproduce,
> >> but I would be grateful for any suggestions on how to resolve
> >> this.
> >> 
> >> Thanks,
> >> Ondrej
-- 
I force myself to contradict myself in order to avoid conforming to my own 
taste. -- Marcel Duchamp
Without Questions there are no Answers!
_____________________________________________________________________
Alin Marin ELENA
Advanced Molecular Simulation Research Laboratory  
School of Physics, University College Dublin
----  
Ardionsamblú Móilíneach Saotharlann Taighde
Scoil na Fisice, An Coláiste Ollscoile, Baile Átha Cliath
-----------------------------------------------------------------------------------
Address:
Room 318, UCD Engineering and Material Science Centre
University College Dublin
Belfield, Dublin 4, Ireland
-----------------------------------------------------------------------------------
http://alin.elenaworld.net
alin.... at ucdconnect.ie, alinm... at gmail.com
______________________________________________________________________


