cp2k 3.0 H2O-64 benchmark on small cluster
Cohen, Ronald
rco... at carnegiescience.edu
Thu Mar 24 16:50:58 UTC 2016
Attached are my benchmark results. Do you think this could be further
improved? The machine has 40 nodes (n001-n040) with 16 cores per node,
640 cores in total: Intel® Xeon® E5-2665 processors at 2.4 GHz, 544 GB
of DDR3 1600 MHz ECC registered system memory (1 GB per core), and
4x FDR InfiniBand (Mellanox). Best performance is with 4 threads per MPI
process, 16 MPI processes, and 4 nodes, for a speedup of ~32.
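For reference, the core counts behind that best case can be checked with a short sketch. It assumes that all 16 cores of each node are used, that the 16 MPI processes are the total across the 4 nodes, and that the speedup of ~32 is measured against a single-core run. None of this is stated explicitly above, and the function names are illustrative only.

```python
CORES_PER_NODE = 16  # from the machine description above


def hybrid_layouts(nodes, cores_per_node=CORES_PER_NODE):
    """List (mpi_ranks, omp_threads) pairs that fill every core.

    Only thread counts that divide the cores per node are kept, so that
    the threads of one MPI rank never span two nodes.
    """
    total_cores = nodes * cores_per_node
    return [(total_cores // threads, threads)
            for threads in range(1, cores_per_node + 1)
            if cores_per_node % threads == 0]


def parallel_efficiency(speedup, cores):
    """Speedup divided by core count (assumes a single-core reference run)."""
    return speedup / cores


print(hybrid_layouts(4))            # [(64, 1), (32, 2), (16, 4), (8, 8), (4, 16)]
print(parallel_efficiency(32, 64))  # 0.5
```

Under those assumptions, 16 MPI processes with 4 threads each fill exactly the 64 cores of 4 nodes, and a speedup of ~32 corresponds to roughly 50% parallel efficiency.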
My archfile is
CC = gcc
CPP =
FC = mpif90
LD = mpif90
AR = ar -r
PREFIX = /home/rcohen
FFTW_INC = $(PREFIX)/include
FFTW_LIB = $(PREFIX)/lib
LIBINT_INC = $(PREFIX)/include
LIBINT_LIB = $(PREFIX)/lib
LIBXC_INC = $(PREFIX)/include
LIBXC_LIB = $(PREFIX)/lib
GCC_LIB = $(PREFIX)/gcc-trunk/lib
GCC_LIB64 = $(PREFIX)/gcc-trunk/lib64
GCC_INC = $(PREFIX)/gcc-trunk/include
DFLAGS = -D__FFTW3 -D__LIBINT -D__LIBXC2\
-D__LIBINT_MAX_AM=7 -D__LIBDERIV_MAX_AM1=6 -D__MAX_CONTR=4\
-D__parallel -D__SCALAPACK -D__HAS_smm_dnn -D__ELPA3
CPPFLAGS =
FCFLAGS = $(DFLAGS) -O2 -ffast-math -ffree-form -ffree-line-length-none\
-fopenmp -ftree-vectorize -funroll-loops\
-mtune=native \
-I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC) -I$(MKLROOT)/include \
-I$(GCC_INC) -I$(PREFIX)/include/elpa_openmp-2015.11.001/modules
LIBS = \
$(PREFIX)/lib/libscalapack.a \
$(PREFIX)/lib/libsmm_dnn_sandybridge-2015-11-10.a \
$(FFTW_LIB)/libfftw3.a \
$(FFTW_LIB)/libfftw3_threads.a \
$(LIBXC_LIB)/libxcf90.a \
$(LIBXC_LIB)/libxc.a \
$(PREFIX)/lib/liblapack.a $(PREFIX)/lib/libtmglib.a \
$(PREFIX)/lib/libgomp.a \
$(PREFIX)/lib/libderiv.a $(PREFIX)/lib/libint.a -lelpa_openmp -lgomp \
-lopenblas
LDFLAGS = $(FCFLAGS) -L$(GCC_LIB64) -L$(GCC_LIB) -static-libgfortran \
-L$(PREFIX)/lib
gcc and gfortran 6.0.0
openmpi 1.10.2
scalapack 2.0.2
elpa-2015.11.001
libint 1.1.5 (I tried libint 2.0.3, but it seems to be missing derivs)
libxc 2.2.2
openblas xianyi-OpenBLAS-c679dd1
smm_dnn_sandybridge-2015-11-10
fftw 3.3.4
[image: Inline image 1]
"Speedup total" is for the whole benchmark: setup plus 30 time steps.
"Speedup step" is for the last time step only.
64 Molecules H2O FPMD NVT within LDA
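The two speedup measures can be written down as a small sketch. It assumes the usual definition, reference wall time divided by parallel wall time; which run serves as the reference is not stated above. The function name and the timings are made-up placeholders, not values from the attached results.

```python
def speedup(t_ref, t_par):
    """Speedup as reference wall time over parallel wall time (assumed definition)."""
    if t_ref <= 0 or t_par <= 0:
        raise ValueError("timings must be positive")
    return t_ref / t_par


# Placeholder wall times in seconds; NOT the measured benchmark values.
reference = {"total": 1000.0, "last_step": 30.0}
parallel = {"total": 50.0, "last_step": 1.5}

# "Speedup total": whole benchmark (setup plus all time steps).
speedup_total = speedup(reference["total"], parallel["total"])
# "Speedup step": last time step only.
speedup_step = speedup(reference["last_step"], parallel["last_step"])

print(f"speedup total: {speedup_total:.1f}")
print(f"speedup step:  {speedup_step:.1f}")
```

Since the total includes the setup phase, the two numbers need not agree for the same run.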
---
Ronald Cohen
Geophysical Laboratory
Carnegie Institution
5251 Broad Branch Rd., N.W.
Washington, D.C. 20015
rco... at carnegiescience.edu
office: 202-478-8937
skype: ronaldcohen
https://twitter.com/recohen3
https://www.linkedin.com/profile/view?id=163327727
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 14716 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20160324/025d25e2/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: deepcarbon_timings2.xlsx
Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 40276 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20160324/025d25e2/attachment.xlsx>