[CP2K:7568] terrible performance across infiniband
Glen MacLachlan
mac... at gwu.edu
Mon Mar 21 21:36:23 UTC 2016
It's hard to judge performance when you set OMP_NUM_THREADS=1, because there is
so much overhead associated with the OpenMP runtime that launching a single
thread is almost always a performance killer. An OpenMP build run with
OMP_NUM_THREADS=1 never matches a plain single-threaded build, precisely because
of that overhead. Nobody sets OMP_NUM_THREADS=1 except to play around; we never
do that in production jobs. How about when you scale up to 4 or 8 threads?
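
For example, something like this (a rough sketch only; I'm assuming an
Open MPI-style mpirun and the cp2k.psmp MPI+OpenMP binary, so adjust the
flags and binary name to your stack) for H2O-64 on two 16-core nodes:

    # 2 nodes x 4 ranks per node x 4 OpenMP threads = all 32 cores busy
    mpirun -np 8 -npernode 4 -x OMP_NUM_THREADS=4 ./cp2k.psmp -i H2O-64.inp -o H2O-64.out

Fewer, fatter ranks mean less traffic that has to cross the fabric at all.
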
Glen
P.S. I see you're in DC...so am I. I support CP2K for the chemists at GWU.
Hope you aren't using Metro to get around the DMV :p
On Mar 21, 2016 5:11 PM, "Cohen, Ronald" <rco... at carnegiescience.edu> wrote:
> Yes, I am using hybrid mode. But even if I set OMP_NUM_THREADS=1,
> performance is terrible.
>
> ---
> Ronald Cohen
> Geophysical Laboratory
> Carnegie Institution
> 5251 Broad Branch Rd., N.W.
> Washington, D.C. 20015
> rco... at carnegiescience.edu
> office: 202-478-8937
> skype: ronaldcohen
> https://twitter.com/recohen3
> https://www.linkedin.com/profile/view?id=163327727
>
> On Mon, Mar 21, 2016 at 5:04 PM, Glen MacLachlan <mac... at gwu.edu> wrote:
>
>> Are you conflating MPI with OpenMP? OMP_NUM_THREADS sets the number of
>> threads used by OpenMP, and OpenMP doesn't work across a distributed-memory
>> environment unless you piggyback on MPI, which would be hybrid use. I'm not
>> sure CP2K ever worked optimally in hybrid mode, or at least that's the
>> impression I've gotten from reading the comments in the source code.
>>
>> As for MPI, are you sure your MPI stack was compiled with IB bindings? I
>> had similar issues and the problem was that I wasn't actually using IB. If
>> you can, disable eth and leave only IB and see what happens.
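>>
>> For example (assuming Open MPI here; other MPI stacks have different knobs),
>> you could check whether the openib BTL was built in and then force IB only:
>>
>>    ompi_info | grep -i openib    # should list an openib BTL component
>>    mpirun --mca btl self,sm,openib -np 16 ./cp2k.psmp -i H2O-64.inp -o H2O-64.out
>>
>> If ompi_info shows no openib component, or the forced run refuses to start,
>> the job has been falling back to TCP over ethernet.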
>>
>> Glen
>> On Mar 21, 2016 4:48 PM, "Ronald Cohen" <rco... at carnegiescience.edu>
>> wrote:
>>
>>> On the dco machine deepcarbon I find decent single-node MPI performance, but
>>> running on the same number of processors across two nodes is terrible, even
>>> with the InfiniBand interconnect. This is the CP2K H2O-64 benchmark:
>>>
>>>
>>>
>>> On 16 cores on 1 node: total time 530 seconds
>>> SUBROUTINE           CALLS  ASD        SELF TIME          TOTAL TIME
>>>                    MAXIMUM      AVERAGE  MAXIMUM    AVERAGE    MAXIMUM
>>> CP2K                     1  1.0   0.015    0.019    530.306    530.306
>>>
>>> -------------------------------------------------------------------------------
>>> -                                                                             -
>>> -                      MESSAGE PASSING PERFORMANCE                            -
>>> -                                                                             -
>>> -------------------------------------------------------------------------------
>>>
>>> ROUTINE            CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]
>>> MP_Group               5         0.000
>>> MP_Bcast            4103         0.029              44140.             6191.05
>>> MP_Allreduce       21860         7.077                263.                0.81
>>> MP_Gather             62         0.008                320.                2.53
>>> MP_Sync               54         0.001
>>> MP_Alltoall        19407        26.839             648289.              468.77
>>> MP_ISendRecv       21600         0.091              94533.            22371.25
>>> MP_Wait           238786        50.545
>>> MP_comm_split         50         0.004
>>> MP_ISend            97572        0.741             239205.            31518.68
>>> MP_IRecv            97572        8.605             239170.             2711.98
>>> MP_Memory          167778       45.018
>>> -------------------------------------------------------------------------------
>>>
>>>
>>> On 16 cores on 2 nodes: total time 5053 seconds!!
>>>
>>> SUBROUTINE           CALLS  ASD        SELF TIME          TOTAL TIME
>>>                    MAXIMUM      AVERAGE  MAXIMUM    AVERAGE    MAXIMUM
>>> CP2K                     1  1.0   0.311    0.363   5052.904   5052.909
>>>
>>> -------------------------------------------------------------------------------
>>> -                                                                             -
>>> -                      MESSAGE PASSING PERFORMANCE                            -
>>> -                                                                             -
>>> -------------------------------------------------------------------------------
>>>
>>> ROUTINE            CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]
>>> MP_Group               5         0.000
>>> MP_Bcast            4119         0.258              43968.              700.70
>>> MP_Allreduce       21892      1546.186                263.                0.00
>>> MP_Gather             62         0.049                320.                0.40
>>> MP_Sync               54         0.071
>>> MP_Alltoall        19407      1507.024             648289.                8.35
>>> MP_ISendRecv       21600         0.104              94533.            19656.44
>>> MP_Wait           238786       513.507
>>> MP_comm_split         50         4.096
>>> MP_ISend            97572        1.102             239206.            21176.09
>>> MP_IRecv            97572        2.739             239171.             8520.75
>>> MP_Memory          167778       18.845
>>> -------------------------------------------------------------------------------
>>>
>>> Any ideas? The code was built with the latest gfortran and I built all
>>> of the dependencies, using this arch file.
>>>
>>> CC = gcc
>>> CPP =
>>> FC = mpif90
>>> LD = mpif90
>>> AR = ar -r
>>> PREFIX = /home/rcohen
>>> FFTW_INC = $(PREFIX)/include
>>> FFTW_LIB = $(PREFIX)/lib
>>> LIBINT_INC = $(PREFIX)/include
>>> LIBINT_LIB = $(PREFIX)/lib
>>> LIBXC_INC = $(PREFIX)/include
>>> LIBXC_LIB = $(PREFIX)/lib
>>> GCC_LIB = $(PREFIX)/gcc-trunk/lib
>>> GCC_LIB64 = $(PREFIX)/gcc-trunk/lib64
>>> GCC_INC = $(PREFIX)/gcc-trunk/include
>>> DFLAGS = -D__FFTW3 -D__LIBINT -D__LIBXC2\
>>> -D__LIBINT_MAX_AM=7 -D__LIBDERIV_MAX_AM1=6 -D__MAX_CONTR=4\
>>> -D__parallel -D__SCALAPACK -D__HAS_smm_dnn -D__ELPA3
>>> CPPFLAGS =
>>> FCFLAGS = $(DFLAGS) -O2 -ffast-math -ffree-form -ffree-line-length-none\
>>> -fopenmp -ftree-vectorize -funroll-loops\
>>> -mtune=native \
>>> -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC) -I$(MKLROOT)/include \
>>> -I$(GCC_INC) -I$(PREFIX)/include/elpa_openmp-2015.11.001/modules
>>> LIBS = \
>>>        $(PREFIX)/lib/libscalapack.a \
>>>        $(PREFIX)/lib/libsmm_dnn_sandybridge-2015-11-10.a \
>>>        $(FFTW_LIB)/libfftw3.a \
>>>        $(FFTW_LIB)/libfftw3_threads.a \
>>>        $(LIBXC_LIB)/libxcf90.a \
>>>        $(LIBXC_LIB)/libxc.a \
>>>        $(PREFIX)/lib/liblapack.a $(PREFIX)/lib/libtmglib.a \
>>>        $(PREFIX)/lib/libgomp.a \
>>>        $(PREFIX)/lib/libderiv.a $(PREFIX)/lib/libint.a -lelpa_openmp \
>>>        -lgomp -lopenblas
>>> LDFLAGS = $(FCFLAGS) -L$(GCC_LIB64) -L$(GCC_LIB) -static-libgfortran \
>>>        -L$(PREFIX)/lib
>>>
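>>> (For reference: with an arch file like this saved as, say,
>>> arch/Linux-x86-64-gfortran.psmp, the hybrid binary is built from the
>>> cp2k/makefiles directory with roughly
>>>
>>>    make -j 16 ARCH=Linux-x86-64-gfortran VERSION=psmp
>>>
>>> which ends up in exe/Linux-x86-64-gfortran/cp2k.psmp.)
>>>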
>>> The run on two nodes used OMP_NUM_THREADS=2; the run on one node used
>>> OMP_NUM_THREADS=1.
>>>
>>> I am now checking whether OMP_NUM_THREADS=1 on two nodes is faster than
>>> OMP_NUM_THREADS=2, but I do not think it will be.
>>>
>>> Ron Cohen
>>>
>>>
>>>