mixed openMP-MPI for HF exchange

flo fsch... at pci.uzh.ch
Wed Nov 9 11:26:18 UTC 2011

Hi Simone,

usually it should be the other way around: OMP should allow more
integrals to be stored in core. The main reason is that the full
density matrix gets replicated on every MPI task. For big systems this
can require quite a bit of memory and is therefore rather inefficient.
Could you check that your input file has the correct setting for
HF%MEMORY%MAX_MEMORY? This keyword specifies how much memory every MPI
task can use to store integrals. The default is 512 MB. For 64 cores
that means:
64 MPI tasks (1 thread each): 64 x 512 MB = 32768 MB for integrals
16 MPI tasks (4 threads each): 16 x 512 MB = 8192 MB for integrals
8 MPI tasks (8 threads each): 8 x 512 MB = 4096 MB for integrals
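A quick back-of-the-envelope check of these numbers, assuming a fixed pool of 64 cores and the 512 MB per-task default (the function name here is just for illustration, not anything in CP2K):

```python
# Aggregate in-core integral storage on a fixed pool of cores, assuming
# CP2K's HF%MEMORY%MAX_MEMORY default of 512 MB per MPI task: each MPI
# rank gets one allowance, while OMP threads within a rank share it.
MAX_MEMORY_MB = 512
TOTAL_CORES = 64

def integral_storage_mb(threads_per_task,
                        total_cores=TOTAL_CORES,
                        per_task_mb=MAX_MEMORY_MB):
    """Total MB available for integrals across all MPI tasks."""
    mpi_tasks = total_cores // threads_per_task
    return mpi_tasks * per_task_mb

for threads in (1, 4, 8):
    print(f"{threads} thread(s): {integral_storage_mb(threads)} MB")
```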

I think this explains why the pure-MPI runs always fit the integrals
in core, while OMP with 4 threads only fits from 128 cores on and OMP
with 8 threads from 256.
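If you want to raise the per-task limit, here is a minimal sketch of where the keyword sits in the input tree; the value 2048 is only an example, and the surrounding section names follow the usual CP2K &XC/&HF layout:

```
&XC
  &HF
    &MEMORY
      ! Memory per MPI task for in-core integral storage, in MB
      MAX_MEMORY 2048
    &END MEMORY
  &END HF
&END XC
```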

About the difference in speed, it is hard to give a definitive answer.
Everything depends on the system size, which routines dominate the
timings, and the load balancing. In your case my guess is that for
such a small system (30 atoms on 64 processors) the GGA part simply
doesn't scale with MPI anymore. Furthermore, load balancing gets more
difficult with more MPI tasks, which might decrease the performance
further.

Unfortunately there is no perfect strategy for determining the best
combination. Nevertheless, what I usually do is check on how many MPI
tasks GGA still scales and then add OMP threads until I get a decent
speed-up.

Hope that helps
