Increasing the number of cores per node drastically degrades the performance of cp2k.popt

Maria Shadrina shadr... at gmail.com
Thu Jul 28 18:24:17 UTC 2011


Dear Axel,
Thank you very much for your reply.

We succeeded in solving the memory problem by passing the option
"--mca mpi_paffinity_alone 1" to mpirun (Open MPI).
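
For reference, a minimal sketch of such an invocation (the rank count
and the way the input file is passed on the command line are
illustrative assumptions, not the exact submission script):

  # pin each MPI task to a processor (Open MPI 1.x affinity flag)
  mpirun --mca mpi_paffinity_alone 1 -np 16 cp2k.popt 32H2O-md.inp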
Now, for the test 32H2O-md.inp, we have:

 nodes x cores/node   CPU time per iteration
 2 x 2                8 sec
 2 x 8                4 sec

Best regards, Maria.



On Jul 12, 7:05 pm, Axel <akoh... at gmail.com> wrote:
> On Tuesday, July 12, 2011 6:37:36 PM UTC-4, Maria Shadrina wrote:
>
> > The CPU is the Intel Xeon E5462 (Harpertown). Is that a normal
> > situation for this type of Xeon processor?
>
> Yes, as I already mentioned: those processors are severely limited in
> their available memory bandwidth, because the two CPUs on a mainboard
> have to share a single memory controller. All communication and data
> access has to go through this single bottleneck.
>
> Typically, memory-bandwidth-hungry applications (like CP2K, but not
> only CP2K) run quite efficiently with 4 cores per (dual-processor)
> node. This does require the use of processor affinity, though, so that
> the MPI tasks are spread evenly across the CPUs and cores. The reason
> lies in the design of the CPU, which is not a single quad-core CPU but
> two dual-core CPUs glued into one package. Each of the dual-core dies
> has a fairly large L2 cache, which helps to make memory accesses more
> efficient. By using only half the cores, you effectively double the
> cache available per task, and that helps massively. Thus you are still
> making good use of both CPUs and wasting less than you might think.
> Sometimes the impact of the communication on the memory bandwidth
> contention can be reduced by using a hybrid OpenMP+MPI binary, but I
> never tried it for Harpertown CPUs.
>
> cheers,
>     axel.
>
> > Best regards, Maria.
>
> > > This happens a lot with CPUs that have very limited memory bandwidth.
> > > The Quickstep algorithm is very demanding in terms of memory bandwidth.
>
> > > What type of Xeon processors exactly? That makes all the difference.
> > > The 56xx (Westmere) and 55xx (Nehalem) series, for example, have
> > > _much_ more memory bandwidth than the 54xx (Harpertown) series.
>
> > > cheers,
> > >      axel.
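
Regarding the advice quoted above to run memory-bandwidth-bound jobs on
only 4 of the 8 cores of a dual-socket Harpertown node: a hypothetical
Open MPI 1.4-era command line that spreads the ranks evenly over both
sockets and pins them to cores might look like the following (the rank
counts and option choice are illustrative assumptions, not a command
taken from this thread):

  # 2 nodes x 4 ranks per node; place ranks round-robin over the two
  # sockets and pin each rank to a core
  mpirun -np 8 -npernode 4 --bysocket --bind-to-core cp2k.popt 32H2O-md.inp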
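
Similarly, for the hybrid OpenMP+MPI route Axel mentions (which he had
not tried on Harpertown), a minimal sketch would use the OpenMP-enabled
cp2k.psmp build with one MPI rank per socket and a few threads per
rank; again, the exact counts are assumptions for illustration:

  # 2 nodes x 2 sockets = 4 MPI ranks, 4 OpenMP threads per rank
  export OMP_NUM_THREADS=4
  mpirun -np 4 -npernode 2 --bind-to-socket cp2k.psmp 32H2O-md.inp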

