cp2k 3.0 H2O-64 benchmark on small cluster

Cohen, Ronald rco... at carnegiescience.edu
Thu Mar 24 16:50:58 UTC 2016


Attached are my benchmark results. Do you think they could be improved
further? The machine has 40 nodes (n001-n040) with 16 cores per node,
640 cores in total: Intel® Xeon® E5-2665 2.4 GHz processors, 544 GB DDR3
1600 MHz ECC REG system memory (1 GB of memory per compute core), and
4x FDR InfiniBand (Mellanox). The best performance is with 16 MPI
processes and 4 OpenMP threads per process on 4 nodes, a speedup of ~32.
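
As a rough cross-check of the best case quoted above, here is a minimal Python sketch that lists the MPI/OpenMP layouts which exactly fill a given number of 16-core nodes and estimates the parallel efficiency. The function names are hypothetical, and the efficiency assumes the speedup is measured against a single core, which the post does not state.

```python
# Sketch only: helper names are illustrative, not part of CP2K or Open MPI.
CORES_PER_NODE = 16  # from the machine description above


def layouts(nodes, cores_per_node=CORES_PER_NODE):
    """Return (mpi_ranks, omp_threads) pairs that use every core on `nodes`
    nodes while keeping all threads of one rank on a single node."""
    total_cores = nodes * cores_per_node
    return [
        (total_cores // threads, threads)
        for threads in range(1, cores_per_node + 1)
        if cores_per_node % threads == 0
    ]


def parallel_efficiency(speedup, cores, baseline_cores=1):
    """Speedup divided by the ideal speedup relative to the baseline run.
    ASSUMPTION: the baseline is one core unless stated otherwise."""
    return speedup / (cores / baseline_cores)


if __name__ == "__main__":
    for ranks, threads in layouts(4):
        print(f"4 nodes: {ranks:3d} MPI ranks x {threads:2d} threads")
    # Best case reported above: 16 ranks x 4 threads = 64 cores, speedup ~32.
    print("efficiency:", parallel_efficiency(32, 16 * 4))
```

Under that single-core assumption, a speedup of ~32 on 64 cores corresponds to an efficiency of about 0.5; if the reference run used more than one core, the efficiency is correspondingly higher.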

My archfile is

CC   = gcc
CPP  =
FC   = mpif90
LD   = mpif90
AR   = ar -r
PREFIX   = /home/rcohen
FFTW_INC   = $(PREFIX)/include
FFTW_LIB   = $(PREFIX)/lib
LIBINT_INC = $(PREFIX)/include
LIBINT_LIB = $(PREFIX)/lib
LIBXC_INC  = $(PREFIX)/include
LIBXC_LIB  = $(PREFIX)/lib
GCC_LIB = $(PREFIX)/gcc-trunk/lib
GCC_LIB64  = $(PREFIX)/gcc-trunk/lib64
GCC_INC = $(PREFIX)/gcc-trunk/include
DFLAGS  = -D__FFTW3 -D__LIBINT -D__LIBXC2\
    -D__LIBINT_MAX_AM=7 -D__LIBDERIV_MAX_AM1=6 -D__MAX_CONTR=4\
    -D__parallel -D__SCALAPACK -D__HAS_smm_dnn -D__ELPA3
CPPFLAGS   =
FCFLAGS = $(DFLAGS) -O2 -ffast-math -ffree-form -ffree-line-length-none\
    -fopenmp -ftree-vectorize -funroll-loops\
    -mtune=native  \
     -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC) -I$(MKLROOT)/include \
     -I$(GCC_INC) -I$(PREFIX)/include/elpa_openmp-2015.11.001/modules
LIBS    =  \
    $(PREFIX)/lib/libscalapack.a $(PREFIX)/lib/libsmm_dnn_sandybridge-2015-11-10.a \
    $(FFTW_LIB)/libfftw3.a\
    $(FFTW_LIB)/libfftw3_threads.a\
    $(LIBXC_LIB)/libxcf90.a\
    $(LIBXC_LIB)/libxc.a\
    $(PREFIX)/lib/liblapack.a  $(PREFIX)/lib/libtmglib.a $(PREFIX)/lib/libgomp.a  \
    $(PREFIX)/lib/libderiv.a $(PREFIX)/lib/libint.a  -lelpa_openmp -lgomp -lopenblas
LDFLAGS = $(FCFLAGS)  -L$(GCC_LIB64) -L$(GCC_LIB) -static-libgfortran -L$(PREFIX)/lib

gcc 6.0.0 (gfortran)
openmpi 1.10.2
scalapack 2.0.2
elpa-2015.11.001
libint 1.1.5 (I tried libint 2.0.3 but seems to be missing derivs)
libxc 2.2.2
openblas xianyi-OpenBLAS-c679dd1
smm_dnn_sandybridge-2015-11-10
fftw 3.3.4

[image: Inline image 1]
"Speedup total" is for the whole benchmark: setup plus 30 time steps.
"Speedup step" is for the time taken by the last time step only.
The system is 64 H2O molecules, FPMD in the NVT ensemble within LDA.
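
The two speedup measures described above can be sketched as follows. This is an illustration only: the timings are made-up placeholders, not values from the attached spreadsheet, and the function and dictionary names are hypothetical.

```python
# Sketch only: placeholder timings, NOT measured data from this benchmark.
def speedup(t_reference, t_run):
    """Ratio of the reference wall time to the wall time of the scaled run."""
    if t_reference <= 0 or t_run <= 0:
        raise ValueError("wall times must be positive")
    return t_reference / t_run


# Hypothetical wall times in seconds for a reference run and a scaled run.
reference = {"total": 6400.0, "last_step": 200.0}
scaled = {"total": 250.0, "last_step": 6.25}

# "Speedup total": whole benchmark, i.e. setup plus all 30 time steps.
speedup_total = speedup(reference["total"], scaled["total"])
# "Speedup step": wall time of the last time step only.
speedup_step = speedup(reference["last_step"], scaled["last_step"])

if __name__ == "__main__":
    print(f"speedup total: {speedup_total:.1f}")
    print(f"speedup step:  {speedup_step:.1f}")
```

Because the total figure includes the setup phase, it need not match the per-step figure, which is why the two are reported separately.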

---
Ronald Cohen
Geophysical Laboratory
Carnegie Institution
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 14716 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20160324/025d25e2/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: deepcarbon_timings2.xlsx
Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 40276 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20160324/025d25e2/attachment.xlsx>