Comparison of psmp and popt (with and without OpenMP)

Cohen, Ronald rco... at carnegiescience.edu
Fri Mar 25 17:01:38 UTC 2016


I am finding a very strange dependence of the benchmark timing on how I run
under Open MPI. Does anyone have any insight?

cp2k 3.0

If I simply use:

mpirun  -n 16 cp2k.psmp H2O-64.inp >> H2O-64_REC.log

with

#PBS -l nodes=n013.cluster.com:ppn=4+n014.cluster.com:ppn=4+n015.cluster.com:ppn=4+n016.cluster.com:ppn=4
for example.

The timing is 165 seconds, and for

#PBS -l nodes=4:ppn=16,pmem=1gb
mpirun  --map-by ppr:4:node -n 16  cp2k.psmp H2O-64.inp >> H2O-64_REC.log
it is 368 seconds!
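
One way to see why the two submissions behave so differently is to ask Open
MPI to print its process map and per-rank bindings. The extra flags below are
standard Open MPI 1.8+ options; this is only a sketch for diagnosing the
placement, not a recommended production setting:

# Print where each rank lands and which cores it is bound to,
# using the same 4-ranks-per-node mapping as above.
mpirun --report-bindings --display-map \
       --map-by ppr:4:node --bind-to core \
       -n 16 cp2k.psmp H2O-64.inp >> H2O-64_REC.log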

Ron


---
Ronald Cohen
Geophysical Laboratory
Carnegie Institution
5251 Broad Branch Rd., N.W.
Washington, D.C. 20015
rco... at carnegiescience.edu
office: 202-478-8937
skype: ronaldcohen
https://twitter.com/recohen3
https://www.linkedin.com/profile/view?id=163327727

On Wed, Mar 23, 2016 at 4:28 PM, Ronald Cohen <rco... at carnegiescience.edu>
wrote:

> So I finally got decent performance with gfortran, Open MPI, and OpenBLAS
> across InfiniBand. Now I find that using OpenMP with half the number of MPI
> processes seems to give better performance for the 64-molecule H2O test
> case. Is that reasonable? I recompiled everything, including BLAS,
> ScaLAPACK, etc., without -fopenmp to make the popt version.
>
> I find in seconds:
>
> 1 node   16 MPI procs   psmp   OMP_NUM_THREADS=1   834
> 1 node   16 MPI procs   popt   OMP_NUM_THREADS=1   836
> 2 nodes  16 MPI procs   psmp   OMP_NUM_THREADS=2   266
> 2 nodes  32 MPI procs   popt   OMP_NUM_THREADS=1   430
> 4 nodes  64 MPI procs   popt   OMP_NUM_THREADS=1   331
> 4 nodes  32 MPI procs   psmp   OMP_NUM_THREADS=2   189
> 4 nodes  64 MPI procs   psmp   OMP_NUM_THREADS=4   166
>
> So you can see there is no overhead from the psmp binary built with OpenMP
> when the thread count is set to 1. Using OpenMP threads greatly improves
> performance over just increasing the number of MPI processes. This may be
> because this machine has only 1 GB of memory per core, but even 4 threads
> are better than 2, so OpenMP seems to be more efficient than MPI here.
>
> Still room for improvement, though. Any ideas on how to squeeze out better
> performance?
>
>
> Ron
>
>
>
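
For the hybrid psmp runs quoted above, a common approach is to reserve a
block of cores for each rank so that its OpenMP threads do not share cores
with other ranks. Below is a minimal sketch for the 4-node, 32-rank, 2-thread
case, assuming 16-core nodes and Open MPI 1.8+ syntax; the flags and counts
are illustrative, not how the timings above were actually obtained:

# 8 ranks per node, 2 cores reserved per rank (the PE=2 modifier also
# binds each rank to its 2 cores); -x exports OMP_NUM_THREADS to all ranks.
export OMP_NUM_THREADS=2
mpirun --map-by ppr:8:node:PE=2 \
       -x OMP_NUM_THREADS -n 32 cp2k.psmp H2O-64.inp >> H2O-64_REC.log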