THE ODYSSEY OF CP2K: THE LONG WAY BACK TO ITHACA

Francesco Filippone francesco... at ism.cnr.it
Fri May 27 10:02:49 UTC 2011


1. DOWNLOAD & COMPILE: THE ROOT OF ALL EVIL
(Contributions to configuration and compilation came from Luca Ferraro
@CASPUR and Emanuele Bovi @UniRoma1)

CP2K is an open source project (and since I'm an open source taliban,
I'm quite happy about it), aimed at providing a fast and reliable code
for quantum materials science calculations. It is developed by a group
of really smart people (some of whom I know personally, so I can tell),
and it is the evolution of the old and glorious CPMD.

In my 15+ year scientific career I have used plane wave codes, mostly
CPMD and Quantum-espresso. Such codes are nice and smart, they perform
DFT calculations efficiently and allow one to answer a lot of
interesting scientific questions; they are open source and can be
compiled on several platforms. Once upon a time we were using IBM
RS/6000 or DEC Alpha serial platforms, though now the most interesting
ones are based on the Linux/x86 parallel architecture.

Well, nowadays, after 20+ years of heavy deployment, standard DFT has
reached its limits of accuracy; for several interesting questions one
needs to resort to beyond-LDA methods, and one of them, in fashion
these years, is hybrid DFT. Where hybrid means "let's cook DFT with
some flavour of Hartree-Fock, it will taste much better". Well, somehow
it works: such an approach corrects the self-interaction errors and
gives interesting results. I am not too zealous, so let's give it a
try!
But, but, but. Exchange integrals in real space, required for
computing HF exchange, are much too heavy for plane wave codes, so
something different is needed: a code with spatially localized basis
functions. What about CP2K, then? It comes from the "same" factory as
CPMD, it is based on an extremely smart Gaussian + Plane Waves basis
set, from the publications it looks fast and accurate, and it is open
source. Wow, the right answer for my needs, go and fetch it!

I downloaded the source code for the stable branch, unpacked the tar.gz
file, got into the cp2k tree, and found the installation instructions.
Well, no configure script; rather, a subdirectory full of system-
dependent configuration files, to be adapted to the actual architecture
of my system.
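For reference, the whole procedure up to this point fits in a few lines
of shell (the archive name below is just a placeholder for whatever the
download page hands out):

$ tar xzf cp2k.tar.gz          # unpack the source tree (archive name may differ)
$ cd cp2k
$ ls arch/                     # one configuration file per platform/compiler/flavour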
I chose to use two machines as testbeds: my laptop, with Ubuntu 10.04,
and a 4-way Opteron workstation, with Debian 6.0, freshly upgraded from
5.0 (yes, it was also the testbed for the online upgrade procedure).
CP2K is reported to compile with Intel compilers (from version 9 on)
and gfortran (from version 4.4; the conservative policy of Debian rules
out my other boxes, still with 5.0 "Lenny", shipped with gfortran 4.3).
My machines all see the latest Intel compiler available for my license,
11.073 (the 12 series has been renamed "Intel Composers", quite
puzzling, I'd say; for me a composer was someone like Bach, Mozart or
Vivaldi, but we would rather go off-topic). After hacking the
configuration files a bit, both compilers succeed in building the
serial executable.
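To be explicit, once an arch file is in place the serial builds are
driven from the makefiles directory with CP2K's ARCH/VERSION make
syntax (at least in the version I'm using; the arch names match the
.sopt files just mentioned):

$ cd cp2k/makefiles
$ make ARCH=Linux-x86-64-intel    VERSION=sopt
$ make ARCH=Linux-x86-64-gfortran VERSION=sopt
$ ls ../exe/Linux-x86-64-intel/   # executables end up under exe/<arch>/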
Disappointingly enough, the configuration files are quite out of date,
especially where library linking is concerned. And here comes into play
the extremely useful Intel Math Kernel Library Link Line Advisor:
http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/
With it I could manage to link the MKL libraries correctly for both the
Linux-x86-64-intel.sopt and Linux-x86-64-gfortran.sopt architectures.

Of course I'm more interested in the parallel versions, both MPI and
OpenMP. First of all, I've been using MPICH2 for some years, quite
satisfied with it, but I see that most codes (libraries included) now
rely on OpenMPI; it should be the same, and it is much simpler to
download and compile a new parallel environment than to hack the calls
to the MPI libraries. Done, I now have openmpi-1.4.3, built with both
ifort and gfortran.
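Building the two OpenMPI stacks is standard autoconf fare; the only
care needed is pointing configure at the right compilers and keeping
the two installations in separate prefixes (the prefixes below are the
ones that appear later in my arch files; build each one in a clean
source tree):

$ ./configure --prefix=/opt/openmpi-1.4.3.i11.1 CC=icc CXX=icpc F77=ifort FC=ifort
$ make all install
$ ./configure --prefix=/opt/openmpi-1.4.3.g4.4 CC=gcc CXX=g++ F77=gfortran FC=gfortran
$ make all install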
And here comes the second piece of trouble. Parallel CP2K relies on
BLACS and ScaLAPACK, which I have never used, and the configuration
files give indications on how to link such libraries (freely
downloadable from netlib.org). But, hey, it's just a matter of
downloading and compiling two libraries, you have done it some hundreds
of times before! I also know that an optimized version of such
libraries is distributed with MKL, but "premature optimization is the
root of all evil" (Donald E. Knuth), so keep it simple, make it work,
and then optimize it!
Under netlib.org/scalapack I found a nice script, scalapack_installer;
it downloads and compiles ScaLAPACK and, if needed, also the BLACS,
LAPACK and BLAS libraries. Do not expect astonishing performance, nor
even complete libraries, but it is enough to get a working library on
your system. Automagically, the script did all the work for me; great!
Having a look at the installed libraries, though, their names did not
match the library names reported in the configuration files (have I
already said that such files are rather out of date? If yes... repetita
iuvant!). Uhm ... ok, after some further googling & hacking I found the
correspondence between the two forms, and got it.
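If you want to skip the guessing game, two commands are enough to
compare what actually got built against what the arch file expects (the
install paths below are just my layout, and the arch file name is the
one from my parallel gfortran build):

$ ls ~/SCALAPACK/BLAS/BLAS ~/SCALAPACK/LAPACK/lapack-3.3.1 /opt/blacs/gnu/ompi-1.4.3/lib
$ grep -E '\.a' arch/Linux-x86-64-gfortran-scalapack-libint.popt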

Well, almost. What I got at link time was a wealth of errors, like:
[...]
mltfftsg_tools.F:(.text+0x538): undefined reference to `dscal_'
/home/filippon/CP2K/13gen11/cp2k//lib/Linux-x86-64-gfortran-scalapack-libint/popt/libcp2k_base_lib.a(fast.o): In function `rankup_':
fast.F:(.text+0x1595): undefined reference to `zgeru_'
fast.F:(.text+0x15ac): undefined reference to `zscal_'
fast.F:(.text+0x15e1): undefined reference to `zgeru_'
collect2: ld returned 1 exit status
make[1]: *** [/home/filippon/CP2K/13gen11/cp2k//exe/Linux-x86-64-gfortran-scalapack-libint/cp2k.popt] Error 1
zgeru_, zscal_ and other such names are BLAS routines; I have the BLAS
library compiled, but the linker doesn't find them. Ok, let's check it:
filippon at amore9:~/CP2K/13gen11/cp2k/arch$ objdump -t ../../BLAS/BLAS/blas_LINUX.a | grep zgeru
zgeru.o:     file format elf64-x86-64
0000000000000000 l    df *ABS*  0000000000000000 zgeru.f
0000000000000000 g     F .text  0000000000000524 zgeru_
or:
filippon at amore9:~/CP2K/13gen11/cp2k/arch$ nm ../../../../SCALAPACK/BLAS/BLAS/blas_LINUX.a | grep zgeru_
0000000000000000 T zgeru_

This excerpt refers to the home-brewed BLAS routines, but it was the
same with the BLAS installed and compiled by scalapack_installer.
Moreover, I compiled LAPACK against that very BLAS library, and it
didn't complain. By the way, this is the configuration file (gnu
parallel environment):

#############################################################################
SCL_DIR   = /opt/blacs/gnu/ompi-1.4.3/lib/
BLAS_DIR  = /home/filippon/SCALAPACK/BLAS/BLAS
LPCK_DIR  = /home/filippon/SCALAPACK/LAPACK/lapack-3.3.1
MPI_DIR   = /opt/openmpi-1.4.3.g4.4/bin
FFTW_LIB  = /opt/fftw-3.2.2/lib/
LIBINT    = /opt/libint-1.1.4/
CP2KHOME  = /home/filippon/CP2K/13gen11/cp2k/

#
CC       = cc
CPP      =
FC       = $(MPI_DIR)/mpif90 -ffree-form -w -ffree-line-length-none
LD       = $(MPI_DIR)/mpif90 -ffree-form -w -ffree-line-length-none
AR       = ar -r
DFLAGS   = -D__GFORTRAN -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK \
           -D__FFTW3 -D__LIBINT -D__HAS_NO_ISO_C_BINDING
FCFLAGS  = $(DFLAGS) -I$(INTEL_INC) -I$(FFTWINCLUDE) -O2 -ffast-math \
           -funroll-loops -ftree-vectorize -march=native -ffree-form
CPPFLAGS = -I$(LIBINT)/include

LIBS     = \
           $(BLAS_DIR)/blas_LINUX.a \
           $(LPCK_DIR)/lapack_LINUX.a \
           $(SCL_DIR)/libscalapack.a \
           $(SCL_DIR)/blacsF77.a \
           $(SCL_DIR)/blacsC.a \
           $(SCL_DIR)/blacs.a \
           $(FFTW_LIB)/libfftw3.a\
           $(CP2KHOME)/tools/hfx_tools/libint_tools/libint_cpp_wrapper.o \
           $(LIBINT)/lib/libderiv.a \
           $(LIBINT)/lib/libint.a \
           -lstdc++

OBJECTS_ARCHITECTURE = machine_gfortran.o


graphcon.o: graphcon.F
        $(FC) -c $(FCFLAGS2) $<
#############################################################################
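In hindsight, before blaming the home-brewed libraries themselves, two
usual suspects are worth ruling out: that every path in LIBS above
really points at an existing archive, and that the static archives are
ordered so that each one comes after everything that references it
(ScaLAPACK needs BLACS and BLAS, LAPACK needs BLAS, and so on), since
the GNU linker resolves .a files left to right. A defensive, untested
rearrangement of the LIBS above would wrap the whole algebra stack in a
group:

LIBS     = -Wl,--start-group \
           $(SCL_DIR)/libscalapack.a \
           $(SCL_DIR)/blacsF77.a \
           $(SCL_DIR)/blacsC.a \
           $(SCL_DIR)/blacs.a \
           $(LPCK_DIR)/lapack_LINUX.a \
           $(BLAS_DIR)/blas_LINUX.a \
           -Wl,--end-group \
           ... (the rest as before)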
The simple way didn't work and I'm getting pessimistic, but, ok, let's
get more aggressive: with the Intel link line advisor I get the lines
needed to link the optimized MKL BLACS and ScaLAPACK.

IT WORKS!!!!

Well ... IT COMPILES!!!!

Here is the configuration file (intel parallel environment):

#############################################################################
MPI_DIR   = /opt/openmpi-1.4.3.i11.1/bin
INTEL_LIB = /opt/intel/Compiler/11.1/073/mkl/lib/em64t
INTEL_INC = /opt/intel/Compiler/11.1/073/include
FFTW_LIB  = /opt/fftw-3.2.2/lib/
LIBINT    = /opt/libint-1.1.4/
CP2KHOME  = /home/filippon/CP2K/13gen11/cp2k/

#
CC       = cc
CPP      =
FC       = $(MPI_DIR)/mpif90
LD       = $(MPI_DIR)/mpif90
AR       = ar -r
DFLAGS   = -D__INTEL -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK \
           -D__FFTW3 -D__LIBINT -D__HAS_ISO_C_BINDING
CPPFLAGS = -I$(LIBINT)/include
FCFLAGS  = $(DFLAGS) -I$(INTEL_INC) -O2 -xW -heap-arrays 64 \
           -funroll-loops -fpp -free
FCFLAGS2 = $(DFLAGS) -I$(INTEL_INC) -O1 -xW -heap-arrays 64 -fpp -free
LDFLAGS  = $(FCFLAGS) -I$(INTEL_INC) -lrt
LIBS     = $(INTEL_LIB)/libmkl_scalapack_lp64.a \
           -Wl,--start-group \
           $(INTEL_LIB)/libmkl_intel_lp64.a \
           $(INTEL_LIB)/libmkl_sequential.a \
           $(INTEL_LIB)/libmkl_core.a \
           $(INTEL_LIB)/libmkl_blacs_openmpi_lp64.a \
           -Wl,--end-group -lpthread \
           $(FFTW_LIB)/libfftw3.a \
           $(CP2KHOME)/tools/hfx_tools/libint_tools/libint_cpp_wrapper.o \
           $(LIBINT)/lib/libderiv.a \
           $(LIBINT)/lib/libint.a \
           -lstdc++

OBJECTS_ARCHITECTURE = machine_intel.o


graphcon.o: graphcon.F
        $(FC) -c $(FCFLAGS2) $<
#############################################################################

Half-way between the serial and the parallel paradigms there is
OpenMP; nice, it parallelizes where possible (taking advantage of
multi-core architectures and/or SMP machines) but doesn't need a
parallel environment. After the experience with the home-brewed
libraries I decided to hack the configuration files straight away to
link against the Intel MKL.

First trial: the Intel compiler (some say it is more buggy than a Dune
Buggy). Ehm ... well ... the Intel compiler refuses some of CP2K's
OpenMP constructs:
[...]
/home/filippon/CP2K/13gen11/cp2k/makefiles/../src/lib/dbcsr_methods.F(1181): error #7636: This statement or directive is not permitted within the body of an OpenMP SINGLE directive
    FORALL (t = 1:nrows)
----^
/home/filippon/CP2K/13gen11/cp2k/makefiles/../src/lib/dbcsr_methods.F(1183): error #7636: This statement or directive is not permitted within the body of an OpenMP SINGLE directive
    END FORALL
----^
compilation aborted for /home/filippon/CP2K/13gen11/cp2k/makefiles/../src/lib/dbcsr_methods.F (code 1)
make[1]: *** [dbcsr_methods.o] Error 1
[...]
And here is the guilty party, my configuration for Linux-x86-64-intel.ssmp:
#############################################################################
INTEL_LIB = /opt/intel/Compiler/11.1/073/mkl/lib/em64t
FFTW_LIB  = /opt/fftw-3.2.2/lib/
LIBINT_LIB= /opt/libint-1.1.4/
WRAPPER_DIR = /home/filippon/CP2K/13gen11/cp2k/tools/hfx_tools/libint_tools/

CC       = icc
CPP      =
FC       = ifort -FR -openmp -O0
LD       = ifort -FR -openmp -O0
AR       = ar -r
DFLAGS   = -D__INTEL -D__FFTSG -D__FFTW3 -D__LIBINT \
           -D__HAS_NO_ISO_C_BINDING
CPPFLAGS = -C -traditional $(DFLAGS) -I$(INTEL_INC)
FCFLAGS  = $(DFLAGS) -I$(INTEL_INC) -O2 -xW -heap-arrays 64 -fpp -free
FCFLAGS2 = $(DFLAGS) -I$(INTEL_INC) -O1 -xW -heap-arrays 64 -fpp -free
LDFLAGS  = $(FCFLAGS) -lrt
LIBS     = -Wl,--start-group \
           $(INTEL_LIB)/libmkl_intel_lp64.a \
           $(INTEL_LIB)/libmkl_sequential.a \
           $(INTEL_LIB)/libmkl_core.a \
           -Wl,--end-group -lpthread \
           $(FFTW_LIB)/libfftw3.a \
           $(WRAPPER_DIR)/libint_cpp_wrapper.o \
           $(LIBINT_LIB)/lib/libderiv.a\
           $(LIBINT_LIB)/lib/libint.a\
           -lstdc++

OBJECTS_ARCHITECTURE = machine_intel.o

graphcon.o: graphcon.F
        $(FC) -c $(FCFLAGS2) $<
#############################################################################


Epic fail! But, since there seems to be no big performance difference
between ifort and gfortran, we can go with the...

Second trial: gfortran. Oh, nice, ... IT COMPILES!!!! Just one word:
gfortran seems not to like parallel compilation too much, so instead of
make -j 4 ARCH=..., rather use make ARCH=... (the full invocations I
ended up with are recapped after the configuration file). Here is my
wonderful working configuration:
#############################################################################
# local variables
INTEL_LIB = /opt/intel/Compiler/11.1/064/mkl/lib/em64t
INTEL_INC = /opt/intel/Compiler/11.1/064/include
FFTWLIB = /opt/fftw-3.2.2/lib/
FFTWINCLUDE = /opt/fftw-3.2.2/include/
LIBINT = /opt/libint-1.1.4.g4.4/lib
CP2KHOME = $(HOME)/CP2K/13gen11/cp2k/
# CP2K variables
CC       = cc
CPP      =
FC       = gfortran -ffree-form -w -fno-second-underscore \
           -ffree-line-length-none $(FCFLAGS)
LD       = gfortran -ffree-form -w -fno-second-underscore \
           -ffree-line-length-none
AR       = ar -r
DFLAGS   = -D__GFORTRAN -D__FFTSG -D__FFTW3 -D__LIBINT
CPPFLAGS =
FCFLAGS  = $(DFLAGS) -I$(FFTWINCLUDE) -O2 -fopenmp -ffast-math \
           -funroll-loops -ftree-vectorize -march=native -ffree-form
LDFLAGS  = $(FCFLAGS)
LIBS     = -Wl,--start-group \
           $(INTEL_LIB)/libmkl_intel_lp64.a \
           $(INTEL_LIB)/libmkl_sequential.a \
           $(INTEL_LIB)/libmkl_core.a \
           -Wl,--end-group -lpthread \
           $(FFTWLIB)/libfftw3.a \
           $(CP2KHOME)/tools/hfx_tools/libint_tools/libint_cpp_wrapper.o \
           $(LIBINT)/libderiv.a \
           $(LIBINT)/libint.a \
           -lstdc++


OBJECTS_ARCHITECTURE = machine_gfortran.o


graphcon.o: graphcon.F
        $(FC) -c $(FCFLAGS2) $<
#############################################################################

Just for some bookkeeping: compiling CP2K took several hours of
brain-time and some days of elapsed time. Compare that with 15 minutes
of brain-time and less than one hour (depending on the machine) for the
venerated CPMD, and, if you want, a few minutes of brain-time and less
than one hour (machine-dependent too) for Quantum-espresso.
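For the record, all the builds above are driven with the same make
syntax, only the ARCH/VERSION pair changes; the arch names below are my
file names under arch/ (yours may differ), and the ssmp build is the
one I run without -j:

$ cd cp2k/makefiles
$ make -j 4 ARCH=Linux-x86-64-gfortran-scalapack-libint VERSION=popt   # MPI build
$ make      ARCH=Linux-x86-64-gfortran                  VERSION=ssmp   # OpenMP build, serial make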

Well done, guy, you succeeded in compiling; that was your first task.
Now go and test ...

2. REGRESSION TESTS: THE STATISTICAL TRUTH

Every code comes with its own tests, usually a suite of short runs
with a double task: on one hand to check that all results are identical
to the reference ones, on the other hand to give a sort of primer to
the newbie, something to play with before starting one's own
calculations.

CP2K comes with a giant collection of tests: 2060!!!! After all, CP2K
joins several different frameworks of calculation (DFT, MD, MC, QM/MM,
and so on), so the number has to be big.

Well, let's start with the regression tests! I expected a (more or
less) 100% success rate, even allowing for rounding errors, FFT
inconsistencies, and so on.
The numbers of failed tests I get with the working makefiles described
above range from 0 (intel.sopt), to 68 (intel.popt), to 86
(gfortran.popt), to 325 (gfortran.sopt), to 351 (gfortran.ssmp).
Oh, fine, intel serial has a 100% score! But it is serial, and
presumably one wants something more. 351 failed tests is around 17%,
quite a lot. Interestingly enough, gfortran serial scores almost like
OpenMP, really far from Intel serial (quite surprising indeed), while
intel.popt and gfortran.popt score almost equally. Around 70 failed
tests (out of 2060) is about 3.5%, but but but ... digging into 70
tests to discover what's wrong looks a bit too much, even for a "power
user" as I could define myself (allow me to do that, please).
And, having a look at some of the "failed" examples (most of them are
QS examples), one finds that they often actually finish the job, but
are not able to finalize it.
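A quick and dirty way to separate the two failure modes (mismatched
numbers versus runs that died before the end) is to look for the final
banner that a CP2K output prints when it terminates cleanly; assuming
the outputs of the test run are named *.out, something like this lists
the ones that never got there:

$ find . -name '*.out' | xargs grep -L 'PROGRAM ENDED' > not_finished.txt
$ wc -l < not_finished.txt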
So, what I see from the regression tests is that there is no absolute
truth, rather a statistical one! Either you accept some margin of
possible error with a parallel version and run jobs on interesting
materials, or you use the serial version; but serial is serial...

3. WURTZITE GAN: THE PARALLEL UNIVERSE

After banging my head against the regression tests, I decided to start
with a single test, physically meaningful to me, on a system I know a
bit: GaN. For practical reasons I have been simulating the wurtzite
structure in an orthorhombic 96-atom supercell, somewhat simpler than a
72-atom hexagonal supercell. For such a computational setup the Gamma
point is not the best choice to represent the Brillouin zone
(especially if you plan to study charged defects), but for my present
purpose (i.e., to see how fast and accurate CP2K is in describing such
a system, even with the Gamma bias) it is certainly enough.

The first test in my mind was a lattice parameter check, and I take
Quantum-espresso as the reference for this setup. After optimizing the
lattice parameters with the 4-atom hexagonal cell, the PBE
exchange-correlation functional and Vanderbilt pseudopotentials, I
build up the 96-atom orthorhombic supercell and check the equilibrium
parameters with QE, first with a special point away from Gamma, then
with Gamma. In both runs I perform 9 geometry optimizations, varying
the a and c lattice parameters between the equilibrium values +/- 2%
(in the orthorhombic setup the parameter b is only figuratively
independent: it must keep a fixed ratio to a).
The equilibrium parameters are the same for both setups; the curvature
of the E vs. V surface changes a bit, but the qualitative behaviour is
similar, and that's enough for my present purposes.
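For the curious, the 3x3 (a, c) grid is just the central values scaled
by 0.98, 1.00 and 1.02, with b locked to 0.866024*a; a throwaway shell
loop (bc does the arithmetic) reproduces the geometry columns of the
tables below:

$ a0=11.152650284; c0=10.495023107
$ for fa in 0.98 1.00 1.02; do
>   for fc in 0.98 1.00 1.02; do
>     a=$(echo "$a0*$fa" | bc -l); c=$(echo "$c0*$fc" | bc -l)
>     b=$(echo "0.866024*$a" | bc -l)
>     echo "$a $b $c"
>   done
> done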

The hunt for the CP2K lattice check starts. First of all, there are
not many choices for the Ga basis functions; only two are available,
SZV-MOLOPT-SR-GTH and DZVP-MOLOPT-SR-GTH, so, since I think the
calculations will not take too long, I would like to check them both.
The N basis functions are taken from the same basis set family as Ga.
Let's start with gfortran and the SZV functions. With the serial
executable everything seems to go quite smoothly, from the
computational point of view. Looking at the E vs. [a,V] surface with
gnuplot, one sees that the energies do not seem to be converged and
that the lattice parameters want to be slightly longer. No problem at
all; we all know that a different computational approach can give
different results (not too different, of course) and that some trial
and error is needed to find a satisfactorily converged setup.
Here follows the E vs. V data file:
# gf.szv.sopt
#  a         b=0.866024*a  c            V               Etot (a.u.)
10.929597278 9.465293553 10.285122644 1064.014930564 -4057.458008944183803
10.929597278 9.465293553 10.495023107 1085.729521074 -4057.533586053476483
10.929597278 9.465293553 10.704923569 1107.444111481 -4057.507101145449269
11.152650284 9.658462809 10.285122644 1107.887266316 -4057.463073096792868
11.152650284 9.658462809 10.495023107 1130.497210621 -4057.526841242022783
11.152650284 9.658462809 10.704923569 1153.107154819 -4057.492749061978429
11.375703289 9.851632065 10.285122644 1152.645911785 -4057.497072713405032
11.375703289 9.851632065 10.495023107 1176.169297839 -4057.542569216006086
11.375703289 9.851632065 10.704923569 1199.692683780 -4057.498076395957469

And now let's do a parallel run!
# gf.szv.popt
#  a         b=0.866024*a  c            V               Etot (a.u.)
10.929597278 9.465293553 10.285122644 1064.014930564 -4057.457522775173857
10.929597278 9.465293553 10.495023107 1085.729521074 -4057.537624651848091
10.929597278 9.465293553 10.704923569 1107.444111481 -4057.521115877077136
11.152650284 9.658462809 10.285122644 1107.887266316 -4057.462881508553892
11.152650284 9.658462809 10.495023107 1130.497210621 -4057.525928152936103
11.152650284 9.658462809 10.704923569 1153.107154819 -4057.493003056738416
11.375703289 9.851632065 10.285122644 1152.645911785 -4057.496485318506529
11.375703289 9.851632065 10.495023107 1176.169297839 -4057.540900292964125
11.375703289 9.851632065 10.704923569 1199.692683780 -4057.498267115213821

The two data sets are quite similar, as they should be. Just a couple
of points differ by more than some rounding, but I don't think this is
an issue. What should be noted, instead, is that all 9 runs (all 9!)
crashed after the first WF optimization. Restarting them from that
point produces a smooth optimization for each parameter set; but the
user needs to know it, and not leave a nice weekend script unattended
on Friday afternoon...
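For what it's worth, resuming was painless because the *-1.restart file
CP2K writes during the optimization is itself a complete input file (at
least in the version I'm using), so each dead job could be brought back
with something like this (project name and core count are just my
setup):

$ mpirun -np 4 ./cp2k.popt GaN_96-1.restart > GaN_96.restart.out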
Let's check the corresponding results with the Intel compilers:
# if.szv.sopt
#  a         b=0.866024*a  c            V               Etot (a.u.)
10.929597278 9.465293553 10.285122644 1064.014930564 -4057.458008977281679
10.929597278 9.465293553 10.495023107 1085.729521074 -4057.537003824978001
10.929597278 9.465293553 10.704923569 1107.444111481 -4057.513785199121230
11.152650284 9.658462809 10.285122644 1107.887266316 -4057.463073095510481
11.152650284 9.658462809 10.495023107 1130.497210621 -4057.526841383722513
11.152650284 9.658462809 10.704923569 1153.107154819 -4057.492261957391747
11.375703289 9.851632065 10.285122644 1152.645911785 -4057.497072713427315
11.375703289 9.851632065 10.495023107 1176.169297839 -4057.542569215584990
11.375703289 9.851632065 10.704923569 1199.692683780 -4057.497992974855151

# if.szv.popt
#  a         b=0.866024*a  c            V               Etot (a.u.)
10.929597278 9.465293553 10.285122644 1064.014930564 -4057.198682399558948
10.929597278 9.465293553 10.495023107 1085.729521074 -4057.258843991621688
10.929597278 9.465293553 10.704923569 1107.444111481 -4057.262900019371045
11.152650284 9.658462809 10.285122644 1107.887266316 -4057.464431621376661
11.152650284 9.658462809 10.495023107 1130.497210621 -4057.527029648703774
11.152650284 9.658462809 10.704923569 1153.107154819 -4057.492472888854536
11.375703289 9.851632065 10.285122644 1152.645911785 -4057.337460866624042
11.375703289 9.851632065 10.495023107 1176.169297839 -4057.395782342439361
11.375703289 9.851632065 10.704923569 1199.692683780 -4057.470720146232907

The serial runs are almost identical to both the gfortran ones, while
noticeable differences show up in the energy values that do not sit at
the "equilibrium" a parameter; when compressing or expanding a, the
energy rises much more than before. What's going on? After the first
death, exactly as with the gfortran executables, the jobs were
resuscitated by an extremely careful and good user. And then, well,
some runs died with somewhat less regularity than the gfortran ones,
just here and there, and having left them unattended (you know,
background jobs?) I didn't realize it right away.
BUT BUT BUT: wait a minute! In some runs there are SCF WF optimizations
that do not converge in some of the cycles. Nevertheless, the geometry
optimization goes on. Since, in principle, a soft-exit procedure should
kick in before moving the ions if the convergence criteria are not
fulfilled, and since it is self-evident that such a procedure does not
work here, there must be some hard implementation problem. Otherwise,
and this is what I obtained, in such cases the geometry optimizations
go bananas...
And now let's have a look at the OpenMP (4 threads) runs:
# gf.szv.ssmp (4 threads)
#  a         b=0.866024*a  c            V               Etot (a.u.)
10.929597278 9.465293553 10.285122644 1064.014930564 -4893.469403900166071
10.929597278 9.465293553 10.495023107 1085.729521074 -4893.637186283114715
10.929597278 9.465293553 10.704923569 1107.444111481 -4057.507134425624372
11.152650284 9.658462809 10.285122644 1107.887266316 -4961.091937232511555
11.152650284 9.658462809 10.495023107 1130.497210621 -4962.164469275494412
11.152650284 9.658462809 10.704923569 1153.107154819 -4139.210135050992903
11.375703289 9.851632065 10.285122644 1152.645911785 -4970.055752688032044
11.375703289 9.851632065 10.495023107 1176.169297839 -4972.716349927931333
11.375703289 9.851632065 10.704923569 1199.692683780 -4979.617441504382441

Ehm ... perhaps Fleischmann & Pons were right, there must be some cold
fusion when we change the lattice parameters: energy differences of
around 850-900 a.u. are around 23-24 keV!!! If you take a look at the
energies you find that only one job converged, the third in the table,
a-2%_c+2%; one wonders why. Perhaps there are still troubles with the
OpenMP parallelization; it is conceivable, after all.
Moreover, also in this case there were NOT CONVERGED SCF runs that let
the GEOOPT run continue silently. Both problems, of course, could
happen in the same job ...

As a further check, let us perform the OpenMP calculation with just one
thread; if the issue is in the parallel distribution of the work, now
it must show up.
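Running the very same ssmp binary with the thread count pinned to one
is just a matter of the standard OpenMP environment variable (the input
file name is, again, just my test case):

$ export OMP_NUM_THREADS=1
$ ./cp2k.ssmp GaN_96_szv.inp > GaN_96_szv.1thread.out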
# gf.szv.ssmp (1 thread)
#  a         b=0.866024*a  c            V               Etot (a.u.)
10.929597278 9.465293553 10.285122644 1064.014930564 -4893.530979378364464
10.929597278 9.465293553 10.495023107 1085.729521074 -4893.584138374711983
10.929597278 9.465293553 10.704923569 1107.444111481 -4057.507112433383554
11.152650284 9.658462809 10.285122644 1107.887266316 -4961.410387787046602
11.152650284 9.658462809 10.495023107 1130.497210621 -4961.751357043548524
11.152650284 9.658462809 10.704923569 1153.107154819 -4140.418679071214683
11.375703289 9.851632065 10.285122644 1152.645911785 -4971.225380472770667
11.375703289 9.851632065 10.495023107 1176.169297839 -4972.673542198795985
11.375703289 9.851632065 10.704923569 1199.692683780 -4979.094830069981981
The global behaviour is extremely similar to the previous one, so the
problem must be in the OpenMP build itself rather than in the number of
threads. As you remember, ifort refused CP2K's OpenMP code, so there is
no Intel counterpart here.
The DZVP runs are still, well, walking rather than running, but the
global behaviour does not seem to change; for instance, the erratic
OpenMP energies still occur (even worse, I would say):

# gf.dzvp.ssmp (4 thread)
#  a         b=0.866024*a  c            V               Etot (a.u.)
10.929597278 9.465293553 10.285122644 1064.014930564 -5179.732446189901566
10.929597278 9.465293553 10.495023107 1085.729521074 -5304.687979865198031
10.929597278 9.465293553 10.704923569 1107.444111481 -4060.111633893895487
11.152650284 9.658462809 10.285122644 1107.887266316 -4782.529215675342130
11.152650284 9.658462809 10.495023107 1130.497210621 -5438.867725617709766
11.152650284 9.658462809 10.704923569 1153.107154819
11.375703289 9.851632065 10.285122644 1152.645911785 -5075.273964402318597
11.375703289 9.851632065 10.495023107 1176.169297839 -5129.076182202904420
11.375703289 9.851632065 10.704923569 1199.692683780 -4572.932829343702906

With parallel Intel, instead, we find again that serial and parallel
results look similar, but ... parallel jobs crash randomly ...

4. PENELOPE UNRAVELLING HER WEB

At the end of this long journey through the stormy seas of compilation
I found that most of my work looked like Penelope unravelling her web:
one day of conjectures and compilations, one night of job crashes and
segmentation faults. Finally, perhaps, I have obtained a working
version of CP2K, to be used on my local machines to get acquainted with
it. And unfortunately, I was not really helped by the web (no, not
Penelope's, the world-wide web), because information is often hidden
behind a lot of "numerical noise" and, mainly, the configuration files
lag behind the knowledge that people before me have applied to this
problem. There is no need to reinvent the wheel, also because the
wheels available nowadays are not square, for sure...

Anyway, these are my two pennies towards improving the architecture
file archive and, maybe, stimulating some discussion about the issue of
working versions and debugging of the code.


Cheers,
F.






