cp2k speedup on multicore machines
Axel
akoh... at gmail.com
Wed Jan 30 01:37:58 UTC 2008
hi!
sorry for the delay, it took me a while to get some numbers together.
my machine is a dual processor intel xeon 5150 @ 2.66GHz (woodcrest).
first off, contrary to fawzi's statement, cpu affinity in OpenMPI has
to be explicitly enabled (e.g. via setting mpi_paffinity_alone=1 in
~/.openmpi/mca-params.conf). however, what both LAM/MPI and particularly
OpenMPI do have enabled by default are algorithms that can take advantage
of locality, and those require the correct specification of the nodes.
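for reference, turning affinity on would look roughly like this (the
mpirun line and the input file name are only placeholders, and the
parameter name is the one from the openmpi 1.2 series, so check
ompi_info on your install):

# in ~/.openmpi/mca-params.conf:
mpi_paffinity_alone = 1

# or only for a single run, on the command line:
mpirun --mca mpi_paffinity_alone 1 -np 4 cp2k.popt input.inp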
to have better control, i'm not using the paffinity feature in OpenMPI
but use 'numactl --physcpubind=xx bash' to restrict the cpus available
to mpirun. with my setup i can either run one copy of cp2k.popt on
each of the two cpus, or both copies on the two cores of the same cpu.
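concretely, that looks roughly like this. the core numbering below is
an assumption, since how cores map to sockets depends on the
bios/kernel enumeration, so check /proc/cpuinfo or 'numactl --hardware'
first; the input file name is again only a placeholder.

# both MPI tasks on the two cores of the same physical cpu:
numactl --physcpubind=0,1 bash -c 'mpirun -np 2 cp2k.popt input.inp'

# one MPI task on each of the two physical cpus:
numactl --physcpubind=0,2 bash -c 'mpirun -np 2 cp2k.popt input.inp'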
i get the following walltimes (all with the cp2k.popt binary):
1 cpu                              1010.39s
2 cpu (2 cores on the same cpu)     624.90s
2 cpu (1 core on each of 2 cpus)    550.87s
4 cpu (with processor affinity)     362.83s
so there is some significant speedup to be had, although it is reduced
further in the case of quad core machines. we are currently buying
them anyway, because they cost almost the same as dual core, and with
the dual-dual-core layout one effectively has twice the cpu cache when
using only half the cores.
i guess speed thus depends a lot on the setup of the machine,
particularly on the speed of the memory vs. the speed of the CPU, the
quality of the MPI implementation, and the cache efficiency of the
compiled code (higher optimization and vectorization rarely help for
large package codes).
here are my compiler settings:
CC = cc
CPP =
FC = mpif90 -FR
LD = mpif90
AR = ar -r
DFLAGS = -D__INTEL -D__FFTSG -D__FFTW3 \
-D__parallel -D__BLACS -D__SCALAPACK
CPPFLAGS = -traditional -C $(DFLAGS) -P
FCFLAGS = $(DFLAGS) -O2 -unroll -tpp6 -pc64 -fpp
LDFLAGS = -i-static -openmp $(FCFLAGS) -L/opt/intel/mkl/9.0/lib/em64t \
          -Wl,-rpath,/opt/intel/fce/9.1.040/lib:$(HOME)/openmpi/lib
LIBS    = $(HOME)/lib/libscalapack.a $(HOME)/lib/blacsF77init_MPI-LINUX-0.a \
          $(HOME)/lib/blacs_MPI-LINUX-0.a -lmkl_lapack -lmkl_em64t -lfftw3
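for completeness, with an arch file like this i build the parallel
binary roughly as follows (the arch name Linux-x86-64-intel is only a
placeholder for whatever your arch file is called, and make has to be
run from wherever the cp2k toplevel makefile sits in your tree):

# arch name is a placeholder, match it to your arch file
make ARCH=Linux-x86-64-intel VERSION=popt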
cheers,
axel.
On Jan 26, 3:40 pm, Matt W <MattWa... at gmail.com> wrote:
> Do you have an example of any code running well on 2 cores?
>
> Matt
>
> On Jan 26, 7:31 pm, cavallo <lcav... at unisa.it> wrote:
>
> > Thanks to all.
>
> > Yes, 8MB is a typo, the machine is 8GB ram. It is a HP proliant dl140,
> > with 2 em64t Intel Xeon CPU 5160 @ 3.00GHz from /proc/cpuinfo.
>
> > This is the kernel/compilers/libraries I used to compile cp2k, and
> > after this you can find the compiling options I used.
> > I prepared an mpich2 machine file as
> > 10.10.10.119 cpu=2 (or with two lines with the ip), and after I
> > mpdboot -n 2 -f mpi.file I see this with ps x.
>
> > python2.5 /home/programs/mpich2/64/1.0.6p1/bin/mpd.py --ncpus=2 -e -d
>
> > Any idea ?
> > Thanks,
> > Luigi
>
> > Linux k119 2.6.21-1.3194.fc7 #1 SMP Wed May 23 22:47:07 EDT 2007
> > x86_64 x86_64 x86_64 GNU/Linux
> > gcc version 4.1.2 20070925 (Red Hat 4.1.2-27)
> > ifort Build 20070613 Package ID: l_fc_c_10.0.025
> > intel mkl 10.0.1.014
> > fftw-3.1.2
> > mpich2-1.0.6p1
>
> > INTEL_INC= /home/programs/intel/64/fce/10.0.025/include/
> > FFTW3_INC= /home/programs/fftw3/em64t/include/
>
> > MKL_LIB= /home/programs/intel/64/mkl/10.0.1.014/lib/em64t/
> > FFTW3_LIB= /home/programs/fftw3/em64t/lib/
>
> > CC = cc
> > CPP =
> > FC = mpif90
> > LD = mpif90
> > AR = ar -r
> > DFLAGS = -D__INTEL -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3
> > CPPFLAGS = -I$(INTEL_INC) -I$(FFTW3_INC)
> > FCFLAGS = $(DFLAGS) -I$(INTEL_INC) -I$(FFTW3_INC) -O3 -xW -heap-arrays 64 -funroll-loops -fpp -free
> > LDFLAGS = $(FCFLAGS) -I$(INTEL_INC) -L$(MKL_LIB)
>
> > LIBS = $(MKL_LIB)/libmkl_scalapack.a \
> > $(MKL_LIB)/libmkl_blacs.a \
> > $(MKL_LIB)/libmkl_em64t.a \
> > $(MKL_LIB)/libguide.a \
> > -lpthread \
> > $(FFTW3_LIB)/libfftw3.a