Increasing nb of cores per node degrade drastically the performance of cp2k.popt
Maria Shadrina
shadr... at gmail.com
Thu Jul 28 18:24:17 UTC 2011
Dear Axel,
Thank you very much for your reply.
We solved the memory problem by passing the option
"--mca mpi_paffinity_alone 1" to mpirun (Open MPI).
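For reference, the full launch line would look roughly like this; only the --mca flag and the file names come from this thread, the rank count is a placeholder (a sketch, not the exact command used):

```shell
# Pin each MPI task to a processor so tasks do not migrate between
# cores (Open MPI 1.x affinity switch mentioned above).
# "-np 16" is illustrative; use your own rank count.
mpirun --mca mpi_paffinity_alone 1 -np 16 cp2k.popt -i 32H2O-md.inp
```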
Now for test 32H2O-md.inp we have:
nodes x cores/node    CPU time per iteration
2x2                   8 sec
2x8                   4 sec
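A quick sanity check on these numbers: going from 4 to 16 cores halves the time per iteration, i.e. about 50% parallel efficiency, which is consistent with a memory-bandwidth-limited run (a small illustrative calculation, not part of the original message):

```python
# Relative parallel efficiency between the two runs in the table above.
t_small, cores_small = 8.0, 4    # 2 nodes x 2 cores/node
t_large, cores_large = 4.0, 16   # 2 nodes x 8 cores/node

speedup = t_small / t_large           # 2.0x faster
core_ratio = cores_large / cores_small  # 4x more cores
efficiency = speedup / core_ratio     # 0.5, i.e. 50%

print(f"speedup={speedup:.1f}x over {core_ratio:.0f}x cores "
      f"-> efficiency {efficiency:.0%}")
# -> speedup=2.0x over 4x cores -> efficiency 50%
```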
Best regards, Maria.
On Jul 12, 7:05 pm, Axel <akoh... at gmail.com> wrote:
> On Tuesday, July 12, 2011 6:37:36 PM UTC-4, Maria Shadrina wrote:
>
> > The CPU is Intel-Xeon-E5462 (Harpertown). Is that a normal situation
> > for this type of xeon processor?
>
> yes, as i already mentioned. those processors are extremely limited
> in their available memory bandwidth by two CPUs having to share a
> single memory controller per mainboard. all communication and data
> access has to go through this single bottleneck.
>
> typically, memory-bandwidth-hungry applications (cp2k among them, but
> not limited to it) run quite efficiently with 4 cores per (dual-processor)
> node. this does require the use of processor affinity, though, so that MPI
> tasks are evenly spread out across the cpus and cores. the reason for
> this lies in the design of the CPU which is not a single quad-core CPU
> but two dual core CPUs glued into one case. each of the dual-core CPUs
> has a fairly large L2-cache which will help to make memory accesses
> more efficient. by using only half the cores, you effectively double the
> cache available to each task, and that helps massively. thus you are
> still getting most out of all the CPUs and wasting less capacity than
> you might think. sometimes, the impact of communication on the memory
> bandwidth contention can be reduced by using a hybrid OpenMP+MPI binary,
> but i never tried it for harpertown cpus.
>
> cheers,
> axel.
>
> > Best regards, Maria.
>
> > > this happens a lot with CPUs that have very limited memory bandwidth.
> > > the quickstep algorithm is very demanding in terms of memory bandwidth.
>
> > > what type of xeon processors exactly? that makes all the difference.
> > > the 56xx (westmere) and 55xx (nehalem) series ones for example have
> > > _much_ more memory bandwidth than 54xx (harpertown) series ones.
>
> > > cheers,
> > > axel.
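The half-populated placement described above (4 tasks per dual-socket node, spread evenly across sockets) can be requested explicitly at launch time. A sketch, not from the thread itself; the exact flag names depend on the Open MPI version:

```shell
# Run only 2 MPI tasks per socket (4 per dual-socket Harpertown node),
# pinned to cores, so each task sees a larger share of the L2 cache and
# memory bandwidth. Flags shown are for newer Open MPI releases; older
# 1.4.x versions used "-npersocket 2 -bind-to-core" instead.
mpirun --map-by ppr:2:socket --bind-to core cp2k.popt -i 32H2O-md.inp
```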
More information about the CP2K-user mailing list