comparison of psmp and popt (with and without openmp)

Cohen, Ronald rco... at
Fri Mar 25 17:14:41 UTC 2016

It seems our cluster is slower today than yesterday: when I ran the -n
16 benchmark again, I got the slower time of 371 seconds rather than 165.
Very strange. I can reproduce today's number, but not yesterday's. I have
attached the log files.


Ronald Cohen
Geophysical Laboratory
Carnegie Institution
5251 Broad Branch Rd., N.W.
Washington, D.C. 20015
rco... at
office: 202-478-8937
skype: ronaldcohen

On Fri, Mar 25, 2016 at 1:01 PM, Cohen, Ronald <rco... at> wrote:

> I am finding a very strange dependence of the benchmark timing on how I run
> under Open MPI. Does anyone have any insight?
> This is with CP2K 3.0.
> If I simply use:
> mpirun  -n 16 cp2k.psmp H2O-64.inp >> H2O-64_REC.log
> with
> #PBS -l
> for example.
> The timing is 165 seconds, and for
> #PBS -l nodes=4:ppn=16,pmem=1gb
> mpirun  --map-by ppr:4:node -n 16  cp2k.psmp H2O-64.inp >> H2O-64_REC.log
> it is 368 seconds!
> Ron
> On Wed, Mar 23, 2016 at 4:28 PM, Ronald Cohen <rco... at>
> wrote:
>> So I finally got decent performance with gfortran, Open MPI, and OpenBLAS
>> across InfiniBand. Now I find that using OpenMP with
>> half the number of MPI processes seems to give better performance for the
>> 64-molecule H2O test case. Is that reasonable? I recompiled everything,
>> including BLAS, ScaLAPACK, etc., without -fopenmp to make the popt
>> version.
>> I find in seconds:
>> nodes  MPI procs  build  OMP_NUM_THREADS  time (s)
>>     1         16  psmp                 1       834
>>     1         16  popt                 1       836
>>     2         16  psmp                 2       266
>>     2         32  popt                 1       430
>>     4         64  popt                 1       331
>>     4         32  psmp                 2       189
>>     4         64  psmp                 4       166
>> So you see there is no overhead from using psmp built with OpenMP and
>> setting threads to 1.
>> Using OpenMP threads greatly improves performance compared with just
>> increasing the number of MPI processes.
>> This may be because this machine has only 1 GB of memory per core, but even
>> 4 threads is better than 2, so it seems OpenMP
>> is more efficient than MPI.
>> Still room for improvement though. Any ideas on how to tweak out better
>> performance?
>> Ron
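
To make the timing table quoted above easier to compare, here is a minimal sketch that computes the speedup of each run relative to the 1-node run of the same build, plus a per-node efficiency (speedup divided by node count). The timings are copied as reported; the names RUNS and scaling, and the choice of the 1-node run as baseline, are only illustrative assumptions and not part of CP2K or Open MPI.

```python
# Timings copied from the quoted table:
# (nodes, MPI ranks, build, OMP_NUM_THREADS, wall time in seconds).
RUNS = [
    (1, 16, "psmp", 1, 834),
    (1, 16, "popt", 1, 836),
    (2, 16, "psmp", 2, 266),
    (2, 32, "popt", 1, 430),
    (4, 64, "popt", 1, 331),
    (4, 32, "psmp", 2, 189),
    (4, 64, "psmp", 4, 166),
]


def scaling(runs):
    """Return rows of (nodes, ranks, build, threads, secs, speedup, efficiency).

    Speedup is measured against the 1-node run of the same build
    (an assumed baseline); efficiency is speedup divided by node count.
    """
    baseline = {build: secs for nodes, _, build, _, secs in runs if nodes == 1}
    rows = []
    for nodes, ranks, build, threads, secs in runs:
        speedup = baseline[build] / secs
        rows.append((nodes, ranks, build, threads, secs, speedup, speedup / nodes))
    return rows


if __name__ == "__main__":
    header = ("nodes", "ranks", "build", "omp", "secs", "speedup", "eff")
    print("{:>5} {:>5} {:>5} {:>3} {:>5} {:>7} {:>5}".format(*header))
    for nodes, ranks, build, threads, secs, speedup, eff in scaling(RUNS):
        print(f"{nodes:>5} {ranks:>5} {build:>5} {threads:>3} {secs:>5} "
              f"{speedup:>7.2f} {eff:>5.2f}")
```

An efficiency above 1 for a multi-node run means it scaled better than linearly against the single-node baseline. That would be consistent with the single-node case being limited by something other than core count, such as memory, but these numbers alone do not establish the cause.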
-------------- next part --------------
A non-text attachment was scrubbed...
Name: H2O-64_REC_slow.log
Type: application/octet-stream
Size: 162228 bytes
Desc: not available
URL: <>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: H2O-64_REC_fast.log
Type: application/octet-stream
Size: 161457 bytes
Desc: not available
URL: <>