[CP2K:7568] terrible performance across infiniband
Glen MacLachlan
mac... at gwu.edu
Mon Mar 21 21:36:23 UTC 2016
It's hard to judge performance when you set OMP_NUM_THREADS=1, because there is
so much overhead associated with the OpenMP runtime that launching a single
thread is almost always a performance killer. An OpenMP build run with
OMP_NUM_THREADS=1 never matches a plain single-threaded build, precisely because
of that overhead. Nobody sets OMP_NUM_THREADS=1 except to play around; we never
do that in production jobs. How about when you scale up to 4 or 8 threads?
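
For example, something like this (a rough sketch only; I'm assuming an
Open MPI-style mpirun and the cp2k.psmp MPI+OpenMP binary, so adjust the
flags and binary name to your stack) for H2O-64 on two 16-core nodes:

    # 2 nodes x 4 ranks per node x 4 OpenMP threads = all 32 cores busy
    mpirun -np 8 -npernode 4 -x OMP_NUM_THREADS=4 ./cp2k.psmp -i H2O-64.inp -o H2O-64.out

Fewer, fatter ranks mean less traffic that has to cross the fabric at all.
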
Glen
P.S. I see you're in DC...so am I. I support CP2K for the chemists at GWU.
Hope you aren't using Metro to get around the DMV :p
On Mar 21, 2016 5:11 PM, "Cohen, Ronald" <rco... at carnegiescience.edu> wrote:
> Yes, I am using hybrid mode. But even if I set OMP_NUM_THREADS=1,
> performance is terrible.
>
> ---
> Ronald Cohen
> Geophysical Laboratory
> Carnegie Institution
> 5251 Broad Branch Rd., N.W.
> Washington, D.C. 20015
> rco... at carnegiescience.edu
> office: 202-478-8937
> skype: ronaldcohen
> https://twitter.com/recohen3
> https://www.linkedin.com/profile/view?id=163327727
>
> On Mon, Mar 21, 2016 at 5:04 PM, Glen MacLachlan <mac... at gwu.edu> wrote:
>
>> Are you conflating MPI with OpenMP? OMP_NUM_THREADS sets the number of
>> threads used by OpenMP, and OpenMP doesn't work across a distributed-memory
>> environment unless you piggyback on MPI, which would be hybrid use. I'm not
>> sure CP2K ever worked optimally in hybrid mode, or at least that's the
>> impression I've gotten from reading the comments in the source code.
>>
>> As for MPI, are you sure your MPI stack was compiled with IB bindings? I
>> had similar issues and the problem was that I wasn't actually using IB. If
>> you can, disable eth and leave only IB and see what happens.
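>>
>> For example (assuming Open MPI here; other MPI stacks have different knobs),
>> you could check whether the openib BTL was built in and then force IB only:
>>
>>    ompi_info | grep -i openib    # should list an openib BTL component
>>    mpirun --mca btl self,sm,openib -np 16 ./cp2k.psmp -i H2O-64.inp -o H2O-64.out
>>
>> If ompi_info shows no openib component, or the forced run refuses to start,
>> the job has been falling back to TCP over ethernet.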
>>
>> Glen
>> On Mar 21, 2016 4:48 PM, "Ronald Cohen" <rco... at carnegiescience.edu>
>> wrote:
>>
>>> On the dco machine deepcarbon I find decent single-node MPI performance, but
>>> running on the same number of processors across two nodes is terrible, even
>>> with the InfiniBand interconnect. This is the CP2K H2O-64 benchmark:
>>>
>>>
>>>
>>> On 16 cores on 1 node: total time 530 seconds
>>> SUBROUTINE           CALLS  ASD        SELF TIME          TOTAL TIME
>>>                    MAXIMUM      AVERAGE  MAXIMUM    AVERAGE    MAXIMUM
>>> CP2K                     1  1.0   0.015    0.019    530.306    530.306
>>>
>>> -------------------------------------------------------------------------------
>>> -                                                                             -
>>> -                      MESSAGE PASSING PERFORMANCE                            -
>>> -                                                                             -
>>> -------------------------------------------------------------------------------
>>>
>>> ROUTINE            CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]
>>> MP_Group               5         0.000
>>> MP_Bcast            4103         0.029              44140.             6191.05
>>> MP_Allreduce       21860         7.077                263.                0.81
>>> MP_Gather             62         0.008                320.                2.53
>>> MP_Sync               54         0.001
>>> MP_Alltoall        19407        26.839             648289.              468.77
>>> MP_ISendRecv       21600         0.091              94533.            22371.25
>>> MP_Wait           238786        50.545
>>> MP_comm_split         50         0.004
>>> MP_ISend            97572        0.741             239205.            31518.68
>>> MP_IRecv            97572        8.605             239170.             2711.98
>>> MP_Memory          167778       45.018
>>> -------------------------------------------------------------------------------
>>>
>>>
>>> On 16 cores on 2 nodes: total time 5053 seconds!!
>>>
>>> SUBROUTINE           CALLS  ASD        SELF TIME          TOTAL TIME
>>>                    MAXIMUM      AVERAGE  MAXIMUM    AVERAGE    MAXIMUM
>>> CP2K                     1  1.0   0.311    0.363   5052.904   5052.909
>>>
>>> -------------------------------------------------------------------------------
>>> -                                                                             -
>>> -                      MESSAGE PASSING PERFORMANCE                            -
>>> -                                                                             -
>>> -------------------------------------------------------------------------------
>>>
>>> ROUTINE            CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]
>>> MP_Group               5         0.000
>>> MP_Bcast            4119         0.258              43968.              700.70
>>> MP_Allreduce       21892      1546.186                263.                0.00
>>> MP_Gather             62         0.049                320.                0.40
>>> MP_Sync               54         0.071
>>> MP_Alltoall        19407      1507.024             648289.                8.35
>>> MP_ISendRecv       21600         0.104              94533.            19656.44
>>> MP_Wait           238786       513.507
>>> MP_comm_split         50         4.096
>>> MP_ISend            97572        1.102             239206.            21176.09
>>> MP_IRecv            97572        2.739             239171.             8520.75
>>> MP_Memory          167778       18.845
>>> -------------------------------------------------------------------------------
>>>
>>> Any ideas? The code was built with the latest gfortran and I built all
>>> of the dependencies, using this arch file.
>>>
>>> CC = gcc
>>> CPP =
>>> FC = mpif90
>>> LD = mpif90
>>> AR = ar -r
>>> PREFIX = /home/rcohen
>>> FFTW_INC = $(PREFIX)/include
>>> FFTW_LIB = $(PREFIX)/lib
>>> LIBINT_INC = $(PREFIX)/include
>>> LIBINT_LIB = $(PREFIX)/lib
>>> LIBXC_INC = $(PREFIX)/include
>>> LIBXC_LIB = $(PREFIX)/lib
>>> GCC_LIB = $(PREFIX)/gcc-trunk/lib
>>> GCC_LIB64 = $(PREFIX)/gcc-trunk/lib64
>>> GCC_INC = $(PREFIX)/gcc-trunk/include
>>> DFLAGS = -D__FFTW3 -D__LIBINT -D__LIBXC2\
>>> -D__LIBINT_MAX_AM=7 -D__LIBDERIV_MAX_AM1=6 -D__MAX_CONTR=4\
>>> -D__parallel -D__SCALAPACK -D__HAS_smm_dnn -D__ELPA3
>>> CPPFLAGS =
>>> FCFLAGS = $(DFLAGS) -O2 -ffast-math -ffree-form -ffree-line-length-none\
>>> -fopenmp -ftree-vectorize -funroll-loops\
>>> -mtune=native \
>>> -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC) -I$(MKLROOT)/include \
>>> -I$(GCC_INC) -I$(PREFIX)/include/elpa_openmp-2015.11.001/modules
>>> LIBS = \
>>>        $(PREFIX)/lib/libscalapack.a \
>>>        $(PREFIX)/lib/libsmm_dnn_sandybridge-2015-11-10.a \
>>>        $(FFTW_LIB)/libfftw3.a \
>>>        $(FFTW_LIB)/libfftw3_threads.a \
>>>        $(LIBXC_LIB)/libxcf90.a \
>>>        $(LIBXC_LIB)/libxc.a \
>>>        $(PREFIX)/lib/liblapack.a $(PREFIX)/lib/libtmglib.a \
>>>        $(PREFIX)/lib/libgomp.a \
>>>        $(PREFIX)/lib/libderiv.a $(PREFIX)/lib/libint.a -lelpa_openmp \
>>>        -lgomp -lopenblas
>>> LDFLAGS = $(FCFLAGS) -L$(GCC_LIB64) -L$(GCC_LIB) -static-libgfortran \
>>>        -L$(PREFIX)/lib
>>>
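>>> (For reference: with an arch file like this saved as, say,
>>> arch/Linux-x86-64-gfortran.psmp, the hybrid binary is built from the
>>> cp2k/makefiles directory with roughly
>>>
>>>    make -j 16 ARCH=Linux-x86-64-gfortran VERSION=psmp
>>>
>>> which ends up in exe/Linux-x86-64-gfortran/cp2k.psmp.)
>>>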
>>> The run on two nodes used OMP_NUM_THREADS=2; the run on one node used
>>> OMP_NUM_THREADS=1.
>>>
>>> I am now checking whether OMP_NUM_THREADS=1 on two nodes is faster than
>>> OMP_NUM_THREADS=2, but I do not think it will be.
>>>
>>> Ron Cohen
>>>
>>>
>>>