cp2k bulk water benchmarks, intel xeon quad-core w/ infiniband

Axel akoh... at gmail.com
Mon Sep 1 17:54:06 UTC 2008


On Sep 1, 10:26 am, cavallo <lcav... at unisa.it> wrote:
> Dear Axel,

dear luigi,

> thank you for the very interesting post. I just have one question.
> From my limited HW knowledge, the bottleneck when using all the
> cores is RAM access. I understand that amd outperforms intel on this;
> at least, this is what some posts on other lists claim. In that case,

yes, amd cpus are better in that respect. however, the next-generation
intel cpus ("nehalem") should also be better here (thanks to "quickpath").

> could it be that cp2k on the opterons scales better than on the intel,
> so that all 8 cores could be used? Any experience/
> comments on this?

well, there is scaling, there is performance, and there is
"bang for the buck". if you use 8 cores per node, you also have
to factor in that all 8 cores share the same infiniband HCA for
communication, and that seems to have an impact as well. i can see
it in the timings: the MPI_Alltoall is quite slow, and doing the
MPI_Alltoall in single precision improves performance quite a bit.
you can also see from the graphs that when only a few nodes are used,
8 cores/node still gives the better performance. this problem does
not go away with amd.
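
to illustrate the single precision trick, here is a sketch of the idea
in plain C/MPI (not the actual cp2k code; in cp2k itself this is
switched on via an input keyword, ALLTOALL_SGL in &GLOBAL, if i
remember correctly). the buffers are converted to single precision
around the all-to-all, so only half the bytes have to pass through the
shared HCA:

#include <mpi.h>
#include <stdlib.h>

/* all-to-all of 'count' doubles per rank, done in single precision:
   the payload on the wire is halved at the cost of some precision. */
void alltoall_sp(const double *send, double *recv, int count, MPI_Comm comm)
{
    int i, nprocs, ntot;
    float *sbuf, *rbuf;

    MPI_Comm_size(comm, &nprocs);
    ntot = nprocs * count;

    sbuf = malloc((size_t) ntot * sizeof *sbuf);
    rbuf = malloc((size_t) ntot * sizeof *rbuf);

    for (i = 0; i < ntot; i++)          /* double -> float */
        sbuf[i] = (float) send[i];

    /* same collective, but with MPI_FLOAT instead of MPI_DOUBLE,
       so only half the data has to cross the interconnect */
    MPI_Alltoall(sbuf, count, MPI_FLOAT, rbuf, count, MPI_FLOAT, comm);

    for (i = 0; i < ntot; i++)          /* float -> double */
        recv[i] = (double) rbuf[i];

    free(sbuf);
    free(rbuf);
}

of course this trades precision for bandwidth, so it is not something
to switch on blindly.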

with that in mind, i originally spec'ed the machine to have dual-core
cpus, but then the quad-cores turned out to be cheaper, so in my book
we basically have dual-cores with double the cache memory. ;-)

also, at the time of purchase no bidder had a competitive amd-based
offer (intel has obviously been giving heavy discounts on its
quad-cores to regain sales).

we'll soon be outfitting our old cluster with dual-core opterons and an
(even older) myrinet interconnect, and then i can hopefully provide
some info on that. my colleague ben has access to the TACC ranger
machine (with 4x quad-core amd cpus per node), so perhaps i can ask him
to redo those benchmarks there as well for comparison.

so in short, considering all options and maximizing the performance of
the machine for cp2k, we got a very good deal: even though we are
"wasting" cpu cores, we get more work done this way.
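
in case it is useful, this is roughly how we get the "half the cores"
layout with openmpi (only a sketch: the node names and the input file
name are examples, and the --mca flags are the ones from the setup
quoted below). the hostfile simply advertises 4 of the 8 cores on
each node:

# hostfile: 4 slots per 8-core node
node01 slots=4
node02 slots=4
# ... one line per node

# e.g. 8 nodes x 4 ranks/node = 32 mpi tasks
mpirun --hostfile hostfile -np 32 \
       --mca btl_openib_use_srq 1 --mca mpi_paffinity_alone 1 \
       cp2k.popt H2O-64.inp > H2O-64.out

whether the 4 pinned ranks then end up spread over both sockets or
packed onto one depends on how the kernel numbers the cores, so that
is worth double-checking.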

cheers,
    axel.

>
> Thanks,
> Luigi
>
> Axel wrote:
> > hi everybody,
>
> > since there were several discussions on the performance and
> > scaling of cp2k, i just ran a series of benchmark runs on our
> > new cluster and uploaded a graph with the resulting data to:
>
> > http://groups.google.com/group/cp2k/web/cp2k-water-bench-cmm.png
>
> > scaling is quite ok for the larger systems. the main result is that
> > with quad-core nodes it is almost always better to use only half
> > the cores (this was expected, but at the time of purchase the
> > quad-cores were cheaper than the available dual-core cpus).
>
> > a few notes on hardware and software:
> > each node has 8GB RAM and two 2.66GHz intel xeon E5430 quad-core cpus (45nm)
> > the nodes were manufactured by dell
> > OS: Scientific Linux 5.1
> > infiniband HCA (from /sbin/lspci): Mellanox Technologies MT25204 [InfiniHost III Lx HCA]
> > infiniband speed: 4x DDR (20 Gb/sec)
> > infiniband software: ofed-1.3.1
> > MPI: OpenMPI 1.2.7, run with --mca btl_openib_use_srq 1 and --mca mpi_paffinity_alone 1
> > compiler: intel 10.1.015, optimization flags: -O2 -unroll -march=pentiumpro -pc64
> > scalapack/blacs/lapack/blas: intel mkl 10.0.1.014
> > cp2k version: cvs as of 2008-08-29
> > linker flags:
> > LDFLAGS  = $(FCFLAGS) -i-static
> > LIBS     = -L/cmm/pkg/intel/mkl/default/lib/em64t/ -Wl,-rpath,/cmm/pkg/intel/mkl/default/lib/em64t/ \
> >            -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core \
> >            /cmm/pkg/lib/libfftw3.a
>
> > cheers,
> >    axel.

