comparison of psmp and popt (with and without openmp)

Ronald Cohen rco... at carnegiescience.edu
Wed Mar 23 20:28:19 UTC 2016


So I finally got decent performance with gfortran, Open MPI, and OpenBLAS 
across InfiniBand. Now I find that using OpenMP with 
half the number of MPI processes seems to give better performance for the 
64-molecule H2O test case. Is that reasonable? To make the popt version, I 
recompiled everything, including BLAS, ScaLAPACK, etc., without -fopenmp.

I find the following run times, in seconds:

Nodes  MPI procs  Build  OMP_NUM_THREADS  Time (s)
1      16         psmp   1                834
1      16         popt   1                836
2      16         psmp   2                266
2      32         popt   1                430
4      64         popt   1                331
4      32         psmp   2                189
4      64         psmp   4                166

So you see there is no overhead from using psmp built with OpenMP and 
setting the number of threads to 1.
Using OpenMP threads greatly improves performance over just increasing the 
number of MPI processes.
This may be because this machine has only 1 GB of memory per core, but even 4 
threads is better than 2, so it seems OpenMP 
is more efficient than MPI.
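
As a rough way to compare these timings, the sketch below computes the speedup of each run relative to the 1-node run of the same build, and a per-node efficiency (speedup divided by node count). The run times are copied from the numbers above; the function names and the choice of the 1-node run as baseline are assumptions made for illustration, not part of any CP2K tooling.

```python
# Run times (seconds) copied from the timings reported above.
# Each entry: (nodes, MPI procs, build, OMP_NUM_THREADS, seconds)
runs = [
    (1, 16, "psmp", 1, 834),
    (1, 16, "popt", 1, 836),
    (2, 16, "psmp", 2, 266),
    (2, 32, "popt", 1, 430),
    (4, 64, "popt", 1, 331),
    (4, 32, "psmp", 2, 189),
    (4, 64, "psmp", 4, 166),
]


def speedup(baseline_s, run_s):
    """Ratio of baseline run time to this run's time."""
    return baseline_s / run_s


# Assumption: use the 1-node run of each build as that build's baseline.
baseline = {build: secs for nodes, _, build, _, secs in runs if nodes == 1}


def summarize(runs):
    """Return one row per run with speedup and per-node efficiency appended."""
    rows = []
    for nodes, procs, build, threads, secs in runs:
        s = speedup(baseline[build], secs)
        rows.append((nodes, procs, build, threads, secs, s, s / nodes))
    return rows


for row in summarize(runs):
    print("%d nodes %3d procs %s x%d threads: %4d s  speedup %.2f  efficiency %.2f" % row)
```

With these numbers the psmp runs on 2 and 4 nodes show a speedup larger than the node count, while the popt runs stay below it. A speedup above the node count would be consistent with the memory-per-core explanation, though the timings alone do not prove it.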

There is still room for improvement, though. Any ideas on how to get better 
performance?


Ron

 