too many steps in some test jobs
Jörg Saßmannshausen
j.sassma... at ucl.ac.uk
Wed Feb 12 14:55:16 UTC 2014
Dear all,
I recently built cp2k version 2.4.0 on our large cluster.
Running the regtests delivered the expected results, except that a number of jobs fail with 'too many
steps'. As an example, I am using the si8_noort_broy_wc_direct_ene.inp input file here.
I had previously built the same version of cp2k on a different cluster without any problems, i.e. the test jobs in question
all passed there. I therefore used the same makefile to build it on the big cluster, assuming there would be no
major problems.
As the big cluster has some new Sandy Bridge nodes, I decided to use the latest MKL library to get the
best performance. My makefile thus looked like this:
CC = cc
CPP =
FC = mpif90
LD = mpif90
AR = ar -r
CPPFLAGS =
DFLAGS = -D__GFORTRAN -D__FFTSG -D__LIBINT -D__FFTW3 -D__parallel -D__SCALAPACK -D__BLACS -D__LIBXC2
FCFLAGS = -O3 -march=native -ffast-math -funroll-loops -g -ffree-form -mno-avx $(DFLAGS) \
-I/shared/ucl/apps/fftw/gcc463/double/3.3.1/include
LDFLAGS = $(FCFLAGS) -L/home/uccajsa/build/cp2k/libint/lib -L/home/uccajsa/build/cp2k/libsmm-2.4.0/lib \
-L/home/uccajsa/build/cp2k/libxc-2.0.2/lib \
-L/shared/ucl/apps/fftw/gcc463/double/3.3.1/lib -Wl,--rpath=/shared/ucl/apps/fftw/gcc463/double/3.3.1/lib \
-L/shared/ucl/apps/intel_cs_2013.0.028/composer_xe_2013.1.117/mkl/lib/intel64/ \
-Wl,--rpath=/shared/ucl/apps/intel_cs_2013.0.028/composer_xe_2013.1.117/mkl/lib/intel64/ \
-Wl,--rpath=/shared/ucl/apps/gcc/4.6.3/lib64 -Wl,--rpath=/shared/ucl/apps/openmpi/gcc463-blcr/1.6.5/lib/
LIBS = -lsmm_dnn -lderiv -lint -lstdc++ -lfftw3 -Wl,--start-group -lmkl_gf_lp64 -lmkl_scalapack_lp64 \
-lmkl_blacs_openmpi_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -lpthread -lxc
OBJECTS_ARCHITECTURE = machine_gfortran.o
with
$ mpif90 --version
GNU Fortran (GCC) 4.6.3
However, that crashed:
LOCALIZATION| Spin 1 : 4 orbitals in the selected energy range are localized.
LOCALIZATION| Computing localization properties for OCCUPIED ORBITALS. Spin: 1
Spread Functional sum_in -w_i ln(|z_in|^2) sum_in w_i(1-|z_in|^2)
Initial Spread (Berry) : 555.1772160672 29.6679932764
Localization by direct minimization of the functiona;
Line search Iteration Functional Tolerance ds Min
1 1 1171.245428 2.896820 1.000
2 24 1110.651481 36.466438 4.854
!
3 48 1078.151369 15.135590 1.010
!
4 108 1078.151369 15.135590 0.000
5 136 1078.151369 15.135590 0.000
6 164 1078.151369 131.500888 0.000
7 214 1074.369358 101.703277 0.189
8 234 1044.802384 102.474155 0.742
9 260 1004.739801 7.722766 0.469
10 284 1002.698494 17.776565 1.055
11 309 1002.541945 14.320150 0.263
12 335 1001.349272 26.357800 1.586
13 360 992.840648 127.750219 0.643
14 389 992.030878 143.798400 0.014
15 414 951.305527 95.665960 0.596
16 439 947.891529 60.783305 0.401
17 467 947.888705 59.633759 0.020
18 496 947.888702 59.600279 0.001
19 524 947.888702 59.599317 0.000
20 557 947.888702 59.599296 0.000
21 586 947.888702 59.599288 0.000
22 609 947.888702 59.599282 0.000
23 635 947.888702 59.599280 0.000
24 662 947.888702 59.599278 0.000
25 688 947.888702 59.599273 0.000
26 717 947.888702 47.920533 0.000
27 758 946.459682 35.625909 0.427
28 815 946.459682 35.625909 0.000
29 838 946.459682 35.625909 0.000
STOP Too many
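In the output above the minimizer stalls: from line search 4 onward several steps report ds Min = 0.000 while the functional does not change, until the run stops. A minimal Python sketch for picking out such stalled steps from a saved output file, assuming the table rows always consist of five whitespace-separated numbers as shown above. The function names and the 1e-6 threshold are hypothetical choices for illustration, not part of cp2k.

```python
import re
from typing import List, NamedTuple


class Step(NamedTuple):
    line_search: int
    iteration: int
    functional: float
    tolerance: float
    ds_min: float


# Assumed row layout: "Line search  Iteration  Functional  Tolerance  ds Min",
# i.e. five numeric columns separated by whitespace. Lines such as "!" or
# "STOP Too many" do not match and are skipped.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)\s*$")


def parse_steps(log_text: str) -> List[Step]:
    """Extract the line-search table rows from the localization output."""
    steps = []
    for line in log_text.splitlines():
        m = ROW.match(line)
        if m:
            ls, it, f, tol, ds = m.groups()
            steps.append(Step(int(ls), int(it), float(f), float(tol), float(ds)))
    return steps


def stalled_steps(steps: List[Step], eps: float = 1e-6) -> List[int]:
    """Return line-search numbers whose functional equals the previous row's
    (within eps) and whose ds Min is printed as zero."""
    stalled = []
    for prev, cur in zip(steps, steps[1:]):
        if abs(cur.functional - prev.functional) < eps and cur.ds_min == 0.0:
            stalled.append(cur.line_search)
    return stalled
```

For example, feeding it line searches 3 to 7 of the first table flags line searches 4, 5 and 6.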
I therefore reasoned that the problem might lie either with libxc (I was using 2.0.2 here and 2.0.1 in the previous build)
or with MKL. I tried an older version of libxc (same problem) and also an older version of MKL, which had worked well for the
previous build of cp2k (2.2.426) on the same machine. However, neither approach solved the problem, and
I am stuck here:
LOCALIZATION| Spin 1 : 4 orbitals in the selected energy range are localized.
LOCALIZATION| Computing localization properties for OCCUPIED ORBITALS. Spin: 1
Spread Functional sum_in -w_i ln(|z_in|^2) sum_in w_i(1-|z_in|^2)
Initial Spread (Berry) : 555.1772158870 29.6679932764
Localization by direct minimization of the functiona;
Line search Iteration Functional Tolerance ds Min
1 1 1171.245428 2.896820 1.000
2 24 1110.651549 36.466437 4.854
!
3 48 1078.151394 15.135563 1.010
!
4 104 1078.151394 15.135563 0.0000
I have searched on Google, but that did not turn up much either.
Could somebody point me in the right direction? In other words, is this a compiler problem or a library problem?
All the best from a wet and windy London
Jörg
--
*************************************************************
Jörg Saßmannshausen
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ
email: j.sassma... at ucl.ac.uk
web: http://sassy.formativ.net
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html