Comparison of psmp and popt (with and without OpenMP)

Cohen, Ronald rco... at carnegiescience.edu
Fri Mar 25 17:14:41 UTC 2016


It seems our cluster is slower today than yesterday: when I ran the -n
16 benchmark again I got the slower time of 371 seconds rather than 165.
Very strange. I can reproduce today's number, but not yesterday's. I have
attached the log files.
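
For anyone who wants to dig in, the timing summaries at the end of the two
logs can be compared directly. A minimal sketch, assuming the standard CP2K
"T I M I N G" report appears near the end of each log (file names as in the
attachments below):

  # extract the final timing report from each run and compare side by side
  grep -A 40 'T I M I N G' H2O-64_REC_fast.log > fast_timing.txt
  grep -A 40 'T I M I N G' H2O-64_REC_slow.log > slow_timing.txt
  diff -y fast_timing.txt slow_timing.txt | less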

Ron


---
Ronald Cohen
Geophysical Laboratory
Carnegie Institution
5251 Broad Branch Rd., N.W.
Washington, D.C. 20015
rco... at carnegiescience.edu
office: 202-478-8937
skype: ronaldcohen
https://twitter.com/recohen3
https://www.linkedin.com/profile/view?id=163327727

On Fri, Mar 25, 2016 at 1:01 PM, Cohen, Ronald <rco... at carnegiescience.edu>
wrote:

> I am finding a very strange dependence of the benchmark timing on how I run
> it under Open MPI. Does anyone have any insight?
>
> cp2k 3.0
>
> If I simply use:
>
> mpirun  -n 16 cp2k.psmp H2O-64.inp >> H2O-64_REC.log
>
> with
>
> #PBS -l nodes=n013.cluster.com:ppn=4+n014.cluster.com:ppn=4+n015.cluster.com:ppn=4+n016.cluster.com:ppn=4
> for example.
>
> The timing is 165 seconds, and for
>
> #PBS -l nodes=4:ppn=16,pmem=1gb
> mpirun  --map-by ppr:4:node -n 16  cp2k.psmp H2O-64.inp >> H2O-64_REC.log
> it is 368 seconds!
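>
> One thing that might explain the gap is where the 16 ranks actually end up:
> with the explicit host list they are spread 4 per node, and --map-by
> ppr:4:node should also give 4 per node, so the difference may be in how the
> ranks are bound within each node (or whether they really land on 4 different
> nodes). A minimal check, just a sketch assuming Open MPI's --report-bindings
> option, would be to rerun both variants with the bindings printed:
>
> # print the rank placement/binding for each way of launching
> mpirun --report-bindings -n 16 cp2k.psmp H2O-64.inp >> H2O-64_REC.log
> mpirun --report-bindings --map-by ppr:4:node -n 16 cp2k.psmp H2O-64.inp >> H2O-64_REC.log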
>
> Ron
>
>
> ---
> Ronald Cohen
> Geophysical Laboratory
> Carnegie Institution
> 5251 Broad Branch Rd., N.W.
> Washington, D.C. 20015
> rco... at carnegiescience.edu
> office: 202-478-8937
> skype: ronaldcohen
> https://twitter.com/recohen3
> https://www.linkedin.com/profile/view?id=163327727
>
> On Wed, Mar 23, 2016 at 4:28 PM, Ronald Cohen <rco... at carnegiescience.edu>
> wrote:
>
>> So I finally got decent performance with gfortran, Open MPI, and OpenBLAS
>> across InfiniBand. Now I find that using OpenMP with
>> half the number of MPI processes seems to give better performance for the
>> 64-molecule H2O test case. Is that reasonable? I recompiled everything,
>> including BLAS, ScaLAPACK, etc., without -fopenmp to make the popt
>> version.
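>>
>> (By the two builds I mean roughly the following; just a sketch, and the
>> arch file name is only an example that will differ with the toolchain:)
>>
>> # popt = pure MPI; psmp = the same code built with OpenMP enabled
>> make -j ARCH=Linux-x86-64-gfortran VERSION=popt
>> make -j ARCH=Linux-x86-64-gfortran VERSION=psmp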
>>
>> I find in seconds:
>>
>> 1 node    16 MPI procs   psmp   OMP_NUM_THREADS=1   834
>> 1 node    16 MPI procs   popt   OMP_NUM_THREADS=1   836
>> 2 nodes   16 MPI procs   psmp   OMP_NUM_THREADS=2   266
>> 2 nodes   32 MPI procs   popt   OMP_NUM_THREADS=1   430
>> 4 nodes   64 MPI procs   popt   OMP_NUM_THREADS=1   331
>> 4 nodes   32 MPI procs   psmp   OMP_NUM_THREADS=2   189
>> 4 nodes   64 MPI procs   psmp   OMP_NUM_THREADS=4   166
>>
>> So you see there is no overhead from using psmp built with OpenMP when the
>> thread count is set to 1.
>> Using OpenMP threads greatly improves performance over just increasing the
>> number of MPI processes.
>> This may be because this machine has only 1 GB of memory per core, but even
>> 4 threads is better than 2, so it seems OpenMP is more efficient here
>> than MPI.
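>>
>> (By "using OpenMP threads" I mean launching along these lines; just a
>> sketch, the exact mapping may need adjusting for these 16-core nodes, and
>> --report-bindings shows whether each rank really gets room for its threads:)
>>
>> # 4 nodes x 8 ranks per node x 2 OpenMP threads per rank = 64 cores
>> export OMP_NUM_THREADS=2
>> mpirun -n 32 --map-by ppr:8:node --report-bindings cp2k.psmp H2O-64.inp >> H2O-64_REC.log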
>>
>> Still room for improvement, though. Any ideas on how to squeeze out better
>> performance?
>>
>>
>> Ron
>>
>>
>>
>
>
-------------- next part --------------
Attachments (the two log files referenced above):
  H2O-64_REC_slow.log  (application/octet-stream, 162228 bytes)
  <https://lists.cp2k.org/archives/cp2k-user/attachments/20160325/0381ed88/attachment.obj>
  H2O-64_REC_fast.log  (application/octet-stream, 161457 bytes)
  <https://lists.cp2k.org/archives/cp2k-user/attachments/20160325/0381ed88/attachment-0001.obj>

