too many steps in some test jobs

Jörg Saßmannshausen j.sassma... at ucl.ac.uk
Wed Feb 12 14:55:16 UTC 2014


Dear all,

I recently built cp2k version 2.4.0 on our large cluster here.
The regtest delivered the expected results; however, a number of jobs fail with 'too many
steps'. As an example, I am using the input file si8_noort_broy_wc_direct_ene.inp here.

I had previously built that version of cp2k on a different cluster without any problems, i.e. the test jobs in question
all passed there. I therefore used the same makefile to build it on the big cluster, assuming there would be no
major problems.
As the big cluster has some new Sandy Bridge nodes, I decided to go for the latest MKL library to make sure I get the
best performance. My makefile looks like this:

CC       = cc
CPP      = 

FC       = mpif90 
LD       = mpif90

AR       = ar -r

CPPFLAGS = 
DFLAGS   = -D__GFORTRAN -D__FFTSG -D__LIBINT -D__FFTW3 -D__parallel -D__SCALAPACK -D__BLACS -D__LIBXC2 
FCFLAGS  = -O3 -march=native -ffast-math -funroll-loops -g -ffree-form -mno-avx $(DFLAGS) \
            -I/shared/ucl/apps/fftw/gcc463/double/3.3.1/include
LDFLAGS  = $(FCFLAGS)  -L/home/uccajsa/build/cp2k/libint/lib -L/home/uccajsa/build/cp2k/libsmm-2.4.0/lib \
           -L/home/uccajsa/build/cp2k/libxc-2.0.2/lib \
           -L/shared/ucl/apps/fftw/gcc463/double/3.3.1/lib -Wl,--rpath=/shared/ucl/apps/fftw/gcc463/double/3.3.1/lib \
           -L/shared/ucl/apps/intel_cs_2013.0.028/composer_xe_2013.1.117/mkl/lib/intel64/ \
            -Wl,--rpath=/shared/ucl/apps/intel_cs_2013.0.028/composer_xe_2013.1.117/mkl/lib/intel64/ \
           -Wl,--rpath=/shared/ucl/apps/gcc/4.6.3/lib64 -Wl,--rpath=/shared/ucl/apps/openmpi/gcc463-blcr/1.6.5/lib/
LIBS     = -lsmm_dnn -lderiv -lint -lstdc++ -lfftw3 -Wl,--start-group -lmkl_gf_lp64 -lmkl_scalapack_lp64 \
           -lmkl_blacs_openmpi_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -lpthread -lxc

OBJECTS_ARCHITECTURE = machine_gfortran.o

with 
$ mpif90 --version
GNU Fortran (GCC) 4.6.3

However, that crashed:
 LOCALIZATION| Spin    1 :      4 orbitals in the selected energy range are localized.

 LOCALIZATION| Computing localization properties for OCCUPIED ORBITALS. Spin:  1
          Spread Functional     sum_in -w_i ln(|z_in|^2)    sum_in w_i(1-|z_in|^2)
    Initial Spread (Berry) :                 555.1772160672       29.6679932764
         Localization by direct minimization of the functiona; 
     Line search    Iteration          Functional           Tolerance    ds Min 
             1            1            1171.245428            2.896820     1.000
             2           24            1110.651481           36.466438     4.854
 !
             3           48            1078.151369           15.135590     1.010
 !
             4          108            1078.151369           15.135590     0.000
             5          136            1078.151369           15.135590     0.000
             6          164            1078.151369          131.500888     0.000
             7          214            1074.369358          101.703277     0.189
             8          234            1044.802384          102.474155     0.742
             9          260            1004.739801            7.722766     0.469
            10          284            1002.698494           17.776565     1.055
            11          309            1002.541945           14.320150     0.263
            12          335            1001.349272           26.357800     1.586
            13          360             992.840648          127.750219     0.643
            14          389             992.030878          143.798400     0.014
            15          414             951.305527           95.665960     0.596
            16          439             947.891529           60.783305     0.401
            17          467             947.888705           59.633759     0.020
            18          496             947.888702           59.600279     0.001
            19          524             947.888702           59.599317     0.000
            20          557             947.888702           59.599296     0.000
            21          586             947.888702           59.599288     0.000
            22          609             947.888702           59.599282     0.000
            23          635             947.888702           59.599280     0.000
            24          662             947.888702           59.599278     0.000
            25          688             947.888702           59.599273     0.000
            26          717             947.888702           47.920533     0.000
            27          758             946.459682           35.625909     0.427
            28          815             946.459682           35.625909     0.000
            29          838             946.459682           35.625909     0.000
STOP Too many


I therefore reasoned that the problem might lie with either libxc (I was using 2.0.2 here and 2.0.1 in the previous build)
or MKL. I tried an older version of libxc (same problem) and also an older version of MKL, which had worked well with the
previous build of cp2k (2.2.426) on the same machine. However, neither of these approaches solved the problem and
I am stuck here:

 LOCALIZATION| Spin    1 :      4 orbitals in the selected energy range are localized.

 LOCALIZATION| Computing localization properties for OCCUPIED ORBITALS. Spin:  1
          Spread Functional     sum_in -w_i ln(|z_in|^2)    sum_in w_i(1-|z_in|^2)
    Initial Spread (Berry) :                 555.1772158870       29.6679932764
         Localization by direct minimization of the functiona; 
     Line search    Iteration          Functional           Tolerance    ds Min 
             1            1            1171.245428            2.896820     1.000
             2           24            1110.651549           36.466437     4.854
 !
             3           48            1078.151394           15.135563     1.010
 !
             4          104            1078.151394           15.135563     0.0000
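
To compare such runs line by line, the line-search table can be pulled out of each output file with a short script. The following is only a rough sketch: the function names are made up for illustration, and the column layout (line search, iteration, functional, tolerance, ds min) is assumed from the output shown above, not taken from the cp2k sources.

```python
import re

# Matches table rows such as "   4   108   1078.151369   15.135590   0.000".
# Assumed column order: line search, iteration, functional, tolerance, ds min.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s*$"
)


def parse_line_search(text):
    """Return a list of (line_search, iteration, functional, tolerance, ds_min)."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), int(m.group(2)),
                         float(m.group(3)), float(m.group(4)),
                         float(m.group(5))))
    return rows


def stalled_steps(rows):
    """Line-search numbers whose functional is unchanged from the previous row."""
    return [cur[0] for prev, cur in zip(rows, rows[1:]) if cur[2] == prev[2]]
```

Calling stalled_steps(parse_line_search(open("run.out").read())) on each output (the file name is a placeholder) lists the line searches at which the functional stopped changing, which makes it easier to see where two builds start to diverge.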

I searched on Google, but that did not turn up much either.
Could somebody point me in the right direction here? In other words, is it a compiler or a library problem?

All the best from a wet and windy London

Jörg


-- 
*************************************************************
Jörg Saßmannshausen
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ 

email: j.sassma... at ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html



