how to improve CP2K scaling for small systems

labicia lab... at yahoo.it
Fri Mar 14 11:58:32 UTC 2014


Good morning!

Our group is running BOMD simulations on one unit cell of the ZIF-8 crystal 
(276 atoms, cubic cell with a side of 16.9856 Å).

With the following setup:
PBE with the GTH-TZV2P basis set and GTH pseudopotentials (700 Ry cutoff),
we have noticed that there is little benefit in using more than one core of a 
quad-core processor;
on the other hand, the code scales very well when increasing the number of 
separate processors (sockets).

Example of non-scaling:
On a machine with four processors (sockets), each with four 
cores (Xeon X7350),
running 4 MPI processes, one per socket, takes 55 seconds per OT-DIIS,
running 8 MPI processes, two per socket, takes 41 seconds,
and running 16 MPI processes, using all cores, takes 40 seconds.
The same non-scaling behavior across multiple cores of the same processor 
has also been observed
on an Intel i5-2550K processor.
Moreover, the behavior is the same when using threads or when mixing MPI and threads 
(OpenMP).
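
To put numbers on this, here is a minimal sketch that turns the three timings 
quoted above into speedup and parallel efficiency relative to the 4-process run. 
The function name is hypothetical and the only inputs are the timings from 
this message; nothing here comes from CP2K itself.

```python
# Seconds per OT-DIIS for each MPI process count, as reported in this message.
timings = {4: 55.0, 8: 41.0, 16: 40.0}

def scaling_table(timings):
    """Return (processes, speedup, efficiency) relative to the smallest run."""
    base_procs = min(timings)
    base_time = timings[base_procs]
    rows = []
    for procs in sorted(timings):
        speedup = base_time / timings[procs]   # how much faster than the baseline
        ideal = procs / base_procs             # speedup expected from perfect scaling
        rows.append((procs, speedup, speedup / ideal))
    return rows

for procs, speedup, efficiency in scaling_table(timings):
    print(f"{procs:2d} MPI: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Going from 4 to 16 processes gives a speedup of about 1.4x where perfect 
scaling would give 4x, i.e. a parallel efficiency of roughly one third.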

Example of scaling:
The same system, using hybrid functionals,
scales more than linearly with the number of cores:
16 MPI processes run more than 4 times faster than 4 MPI processes.
(Even though the time required for an MD step becomes
too long to consider doing such calculations!)

Considerations:
On this basis, it seems that our PBE simulations are not limited by 
raw computation.
The limiting factor seems to be the cache, i.e. increasing the number of
sockets increases the total cache available, and so does the speed of the simulations.
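
One rough way to probe this interpretation, offered only as a sketch: assume 
the time per OT-DIIS splits into a part that shrinks with the number of MPI 
processes and a part that does not (for instance, time bound by a resource 
shared within each socket). This two-term form is our assumption, not 
something CP2K reports, and the function name is hypothetical. The model is 
fitted to the 4- and 8-process timings and then compared with the 16-process one.

```python
def fit_two_term_model(p1, t1, p2, t2):
    """Solve t = fixed + scalable / p from two (processes, seconds) points.

    Assumed model: 'fixed' does not shrink with more processes on the same
    sockets, 'scalable' is divided evenly among the processes.
    """
    scalable = (t1 - t2) / (1.0 / p1 - 1.0 / p2)
    fixed = t1 - scalable / p1
    return fixed, scalable

# Timings from this message: 55 s with 4 MPI processes, 41 s with 8.
fixed, scalable = fit_two_term_model(4, 55.0, 8, 41.0)
predicted_16 = fixed + scalable / 16

print(f"fixed part: {fixed:.0f} s, scalable part: {scalable:.0f} s")
print(f"predicted for 16 MPI: {predicted_16:.0f} s (measured: 40 s)")
```

Under these assumptions the fit gives a fixed part of 27 s and predicts 34 s 
for 16 processes, while 40 s was measured. The non-scaling part therefore 
appears to grow as the cores of each socket fill up, which would be consistent 
with contention for a shared resource such as cache or memory bandwidth, 
although this simple model cannot tell which.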

Questions:
Do you agree with this analysis?
Is there a way to improve the speed of our PBE computations (for example, 
some option to reduce the
amount of data to be transferred between RAM and cache)?

Thanks a lot for your attention,
best regards,
Marco and Andrea