Increasing nb of cores per node degrade drastically the performance of cp2k.popt

Axel akoh... at
Tue Jul 12 23:05:28 UTC 2011

On Tuesday, July 12, 2011 6:37:36 PM UTC-4, Maria Shadrina wrote:
> The CPU is Intel-Xeon-E5462 (Harpertown). Is that normal situation for 
> this type of xeon processors? 

yes, as i already mentioned. those processors are extremely limited
in their available memory bandwidth by two CPUs having to share a 
single memory controller per mainboard. all communication and data
access have to go through this single bottleneck.

typically, the memory bandwidth hungry applications (like cp2k but
not limited to it) run quite efficiently with 4 cores per (dual-CPU)
node. this does require the use of processor affinity, though, so that MPI 
tasks are evenly spread out across the cpus and cores. the reason for
this lies in the design of the CPU which is not a single quad-core CPU 
but two dual core CPUs glued into one case. each of the dual-core CPUs
has a fairly large L2-cache which will help to make memory accesses
more efficient. by using only half the cores, you effectively double the
size of the cache available to each task, and that helps massively. thus
you are still using most of what each CPU has to offer and wasting less
than you might think. sometimes, the impact of communication on memory
bandwidth contention can be reduced by using a hybrid OpenMP+MPI binary,
but i never tried it for harpertown cpus.
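
to illustrate the affinity idea above, here is a minimal python sketch of
picking one core per L2 cache and pinning a process to it. the core
numbering in L2_GROUPS and the helper names cores_for_tasks and
pin_local_rank are hypothetical; the real core-to-cache layout depends on
the BIOS and kernel and has to be checked on the actual machine. in
practice the MPI launcher's own binding options would normally do this
job, so treat this only as a picture of what "evenly spread out" means.

```python
import os

# ASSUMPTION: a dual-socket harpertown node with 8 cores, where each pair
# of cores shares one L2 cache. This numbering is made up for illustration;
# verify the real layout (e.g. under /sys/devices/system/cpu/) first.
L2_GROUPS = [(0, 1), (2, 3), (4, 5), (6, 7)]


def cores_for_tasks(l2_groups, ntasks):
    """Pick one core per L2 cache so that no two tasks share a cache."""
    if ntasks > len(l2_groups):
        raise ValueError("more tasks than L2 caches; tasks would share a cache")
    return [group[0] for group in l2_groups[:ntasks]]


def pin_local_rank(local_rank, l2_groups, ntasks):
    """Pin the calling process to the core chosen for its node-local rank.

    Uses os.sched_setaffinity, which is only available on Linux.
    """
    core = cores_for_tasks(l2_groups, ntasks)[local_rank]
    os.sched_setaffinity(0, {core})  # pid 0 means the calling process
    return core


# with 4 tasks per node, each task gets a whole L2 cache to itself
print(cores_for_tasks(L2_GROUPS, 4))
```

with the assumed layout, 4 tasks land on cores 0, 2, 4 and 6, i.e. one per
dual-core die, which is the placement described above. asking for more
than 4 tasks raises an error, because at that point tasks would have to
share a cache again.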


> Best regards, Maria. 
> > 
> > this happens a lot with CPUs that have very limited memory bandwidth. 
> > the quickstep algorithm is very demanding in terms of memory bandwidth. 
> > 
> > 
> > 
> > what type of xeon processors exactly? that makes all the difference. 
> > the 56xx (westmere) and 55xx (nehalem) series ones for example have 
> > _much_ more memory bandwidth than 54xx (harpertown) series ones. 
> > 
> > cheers, 
> >      axel. 