cp2k 3.0 H2O-64 benchmark on small cluster

Cohen, Ronald rco... at carnegiescience.edu
Thu Mar 24 16:50:58 UTC 2016

Attached are my benchmark results. Do you think this could be further
improved? The machine has 40 nodes (n001-n040) with 16 cores per node,
640 cores in total: Intel® Xeon® E5-2665 processors at 2.4 GHz, 544 GB
of DDR3 1600 MHz ECC registered memory (1 GB per core), and 4x FDR
InfiniBand (Mellanox). The best performance is with 16 MPI processes of
4 threads each on 4 nodes, for a speedup of ~32.
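
As a quick sanity check on those numbers, here is a minimal Python sketch
of the layout arithmetic. The function names are hypothetical, and it
assumes the speedup is measured relative to a single-core run, which is
not stated above.

```python
def hybrid_layout(mpi_ranks, threads_per_rank, cores_per_node):
    """Return (total_cores, nodes_needed, ranks_per_node) for a hybrid
    MPI/OpenMP run that uses one core per thread."""
    total_cores = mpi_ranks * threads_per_rank
    # Ceiling division: a partially used node still counts as a node.
    nodes_needed = -(-total_cores // cores_per_node)
    ranks_per_node = cores_per_node // threads_per_rank
    return total_cores, nodes_needed, ranks_per_node


def parallel_efficiency(speedup, cores):
    """Speedup divided by core count (1.0 would be ideal scaling).
    Assumes the speedup baseline is one core."""
    return speedup / cores


# Best case reported above: 16 MPI processes x 4 threads, 16 cores per node.
total_cores, nodes, ranks_per_node = hybrid_layout(16, 4, 16)
efficiency = parallel_efficiency(32, total_cores)
print(total_cores, nodes, ranks_per_node, efficiency)
```

Under that assumption, 16 x 4 = 64 cores fill exactly 4 nodes with 4
ranks per node, and a speedup of ~32 corresponds to roughly 50% parallel
efficiency.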

My archfile is

CC   = gcc
CPP  =
FC   = mpif90
LD   = mpif90
AR   = ar -r
PREFIX   = /home/rcohen
FFTW_INC   = $(PREFIX)/include
FFTW_LIB   = $(PREFIX)/lib
LIBINT_INC = $(PREFIX)/include
LIBXC_INC  = $(PREFIX)/include
GCC_LIB = $(PREFIX)/gcc-trunk/lib
GCC_LIB64  = $(PREFIX)/gcc-trunk/lib64
GCC_INC = $(PREFIX)/gcc-trunk/include
    -D__parallel -D__SCALAPACK -D__HAS_smm_dnn -D__ELPA3
FCFLAGS = $(DFLAGS) -O2 -ffast-math -ffree-form -ffree-line-length-none\
    -fopenmp -ftree-vectorize -funroll-loops\
    -mtune=native  \
     -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC) -I$(MKLROOT)/include \
     -I$(GCC_INC) -I$(PREFIX)/include/elpa_openmp-2015.11.001/modules
LIBS    =  \
$(PREFIX)/lib/libsmm_dnn_sandybridge-2015-11-10.a \
    $(PREFIX)/lib/liblapack.a  $(PREFIX)/lib/libtmglib.a \
$(PREFIX)/lib/libgomp.a  \
    $(PREFIX)/lib/libderiv.a $(PREFIX)/lib/libint.a  -lelpa_openmp -lgomp
LDFLAGS = $(FCFLAGS)  -L$(GCC_LIB64) -L$(GCC_LIB) -static-libgfortran

gcc is 6.0.0
openmpi 1.10.2
scalapack 2.0.2
libint 1.1.5 (I tried libint 2.0.3, but it seems to be missing derivs)
libxc 2.2.2
openblas xianyi-OpenBLAS-c679dd1
fftw 3.3.4

[image: Inline image 1]
"Speedup total" is for the whole benchmark: setup plus 30 time steps.
"Speedup step" is for the last time step only.
The system is 64 H2O molecules, FPMD in the NVT ensemble within LDA.
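
For anyone reproducing the two speedup figures, the following Python
sketch shows how they could be computed from wall-clock times. The
function names and the timings in the example call are hypothetical
placeholders, not values from my runs, and the reference run is assumed
to be whatever baseline you choose.

```python
def speedup(t_reference, t_parallel):
    """Ratio of reference wall time to parallel wall time."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_reference / t_parallel


def both_speedups(ref_total, ref_last_step, run_total, run_last_step):
    """'total' uses the whole benchmark (setup plus all time steps);
    'step' uses only the time of the last time step."""
    return {
        "total": speedup(ref_total, run_total),
        "step": speedup(ref_last_step, run_last_step),
    }


# Hypothetical timings in seconds, for illustration only.
example = both_speedups(ref_total=3200.0, ref_last_step=100.0,
                        run_total=160.0, run_last_step=4.0)
print(example)  # {'total': 20.0, 'step': 25.0}
```

The step speedup leaves out the one-time setup cost, so it is usually
the better indicator of how a long production run would scale.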

Ronald Cohen
Geophysical Laboratory
Carnegie Institution
5251 Broad Branch Rd., N.W.
Washington, D.C. 20015
rco... at carnegiescience.edu
office: 202-478-8937
skype: ronaldcohen
