[CP2K:7571] terrible performance across infiniband
Glen MacLachlan
mac... at gwu.edu
Tue Mar 22 14:32:46 UTC 2016
Hi Ron,
There's a chance that Open MPI wasn't configured to use IB properly. Why
don't you disable TCP and see whether you are actually using IB? It's easy:
mpirun --mca btl ^tcp ...
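
A complementary check is to look at which BTL components the Open MPI build contains at all, since `ompi_info` lists them and in the Open MPI 1.x series the InfiniBand transport is the `openib` BTL. The snippet below is only a sketch of that check: it parses text in the style of `ompi_info` output. The sample text, its version numbers, and the helper names `btl_components` and `has_infiniband_btl` are made up for illustration and do not come from Ron's machine.

```python
import re

# Hypothetical excerpt in the style of `ompi_info` output. The component list
# and version numbers are illustrative only, not a log from a real system.
SAMPLE_OMPI_INFO = """\
                 MCA btl: self (MCA v2.0.0, API v2.0.0, Component v1.10.2)
                 MCA btl: sm (MCA v2.0.0, API v2.0.0, Component v1.10.2)
                 MCA btl: tcp (MCA v2.0.0, API v2.0.0, Component v1.10.2)
                 MCA pml: ob1 (MCA v2.0.0, API v2.0.0, Component v1.10.2)
"""


def btl_components(ompi_info_text):
    """Return the names of the BTL components listed in ompi_info-style text."""
    pattern = re.compile(r"^\s*MCA btl:\s*(\S+)", re.MULTILINE)
    return pattern.findall(ompi_info_text)


def has_infiniband_btl(ompi_info_text):
    """True if the openib BTL (InfiniBand transport in Open MPI 1.x) is listed."""
    return "openib" in btl_components(ompi_info_text)


print("BTLs found:", btl_components(SAMPLE_OMPI_INFO))
print("openib present:", has_infiniband_btl(SAMPLE_OMPI_INFO))
```

On a real system the text would come from running `ompi_info` itself, for example via `subprocess.run(["ompi_info"], capture_output=True, text=True).stdout`. If `openib` is missing from the list, the build has no InfiniBand BTL to fall back on once TCP is excluded.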
Regarding OpenMP:
I'm not sure we're converging on the same discussion anymore, but setting
OMP_NUM_THREADS=1 does *not* disable multithreading overhead -- you need to
compile without -fopenmp to get a measure of true single-threaded
performance.
Best,
Glen
==========================================
Glen MacLachlan, PhD
HPC Specialist for Physical Sciences &
Professorial Lecturer, Data Sciences
Office of Technology Services
The George Washington University
725 21st Street
Washington, DC 20052
Suite 211, Corcoran Hall
==========================================
On Mon, Mar 21, 2016 at 5:05 PM, Ronald Cohen <rco... at carnegiescience.edu>
wrote:
> In my experience in general, and according to the CP2K web pages in
> particular, that is not the case. Please see the CP2K performance page.
> I am now sure the problem is that the Open MPI build is not using the
> proper InfiniBand libraries or drivers.
>
> Thank you!
>
> Ron
>
> Sent from my iPad
>
> On Mar 21, 2016, at 5:36 PM, Glen MacLachlan <mac... at gwu.edu> wrote:
>
> It's hard to talk about performance when you set OMP_NUM_THREADS=1
> because there is so much overhead associated with OpenMP that launching 1
> thread is almost always a performance killer. In fact, OMP_NUM_THREADS=1
> never rivals true single-threaded performance because of that overhead. No
> one ever sets OMP_NUM_THREADS=1 unless they are playing around. We never
> do that in production jobs. How about when you scale up to 4 or 8 threads?
>
> Glen
>
> P.S. I see you're in DC...so am I. I support CP2K for the chemists at GWU.
> Hope you aren't using Metro to get around the DMV :p
> On Mar 21, 2016 5:11 PM, "Cohen, Ronald" <rco... at carnegiescience.edu>
> wrote:
>
>> Yes I am using hybrid mode. But even if I set OMP_NUM_THREADS=1
>> performance is terrible.
>>
>> ---
>> Ronald Cohen
>> Geophysical Laboratory
>> Carnegie Institution
>> 5251 Broad Branch Rd., N.W.
>> Washington, D.C. 20015
>> rco... at carnegiescience.edu
>> office: 202-478-8937
>> skype: ronaldcohen
>> https://twitter.com/recohen3
>> https://www.linkedin.com/profile/view?id=163327727
>>
>> On Mon, Mar 21, 2016 at 5:04 PM, Glen MacLachlan <mac... at gwu.edu> wrote:
>>
>>> Are you conflating MPI with OpenMP? OMP_NUM_THREADS sets the number of
>>> threads used by OpenMP, and OpenMP doesn't work in a distributed-memory
>>> environment unless you piggyback on MPI, which would be hybrid use. I'm
>>> not sure CP2K ever worked optimally in hybrid mode, or at least that's
>>> what I've gathered from reading the comments in the source code.
>>>
>>> As for MPI, are you sure your MPI stack was compiled with IB bindings? I
>>> had similar issues and the problem was that I wasn't actually using IB. If
>>> you can, disable eth and leave only IB and see what happens.
>>>
>>> Glen
>>> On Mar 21, 2016 4:48 PM, "Ronald Cohen" <rco... at carnegiescience.edu>
>>> wrote:
>>>
>>>> On the dco machine deepcarbon I find decent single-node MPI
>>>> performance, but running on the same number of processors across two nodes
>>>> is terrible, even with the InfiniBand interconnect. This is the CP2K H2O-64
>>>> benchmark:
>>>>
>>>>
>>>>
>>>> On 16 cores on 1 node: total time 530 seconds
>>>>
>>>> SUBROUTINE    CALLS   ASD         SELF TIME            TOTAL TIME
>>>>             MAXIMUM        AVERAGE  MAXIMUM    AVERAGE    MAXIMUM
>>>> CP2K              1   1.0    0.015    0.019    530.306    530.306
>>>>
>>>> -------------------------------------------------------------------------------
>>>> MESSAGE PASSING PERFORMANCE
>>>> -------------------------------------------------------------------------------
>>>>
>>>> ROUTINE          CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]
>>>> MP_Group             5         0.000
>>>> MP_Bcast          4103         0.029              44140.             6191.05
>>>> MP_Allreduce     21860         7.077                263.                0.81
>>>> MP_Gather           62         0.008                320.                2.53
>>>> MP_Sync             54         0.001
>>>> MP_Alltoall      19407        26.839             648289.              468.77
>>>> MP_ISendRecv     21600         0.091              94533.            22371.25
>>>> MP_Wait         238786        50.545
>>>> MP_comm_split       50         0.004
>>>> MP_ISend         97572         0.741             239205.            31518.68
>>>> MP_IRecv         97572         8.605             239170.             2711.98
>>>> MP_Memory       167778        45.018
>>>>
>>>> -------------------------------------------------------------------------------
>>>>
>>>>
>>>> On 16 cores on 2 nodes: total time 5053 seconds !!
>>>>
>>>> SUBROUTINE    CALLS   ASD         SELF TIME            TOTAL TIME
>>>>             MAXIMUM        AVERAGE  MAXIMUM    AVERAGE    MAXIMUM
>>>> CP2K              1   1.0    0.311    0.363   5052.904   5052.909
>>>>
>>>> -------------------------------------------------------------------------------
>>>> MESSAGE PASSING PERFORMANCE
>>>> -------------------------------------------------------------------------------
>>>>
>>>> ROUTINE          CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]
>>>> MP_Group             5         0.000
>>>> MP_Bcast          4119         0.258              43968.              700.70
>>>> MP_Allreduce     21892      1546.186                263.                0.00
>>>> MP_Gather           62         0.049                320.                0.40
>>>> MP_Sync             54         0.071
>>>> MP_Alltoall      19407      1507.024             648289.                8.35
>>>> MP_ISendRecv     21600         0.104              94533.            19656.44
>>>> MP_Wait         238786       513.507
>>>> MP_comm_split       50         4.096
>>>> MP_ISend         97572         1.102             239206.            21176.09
>>>> MP_IRecv         97572         2.739             239171.             8520.75
>>>> MP_Memory       167778        18.845
>>>>
>>>> -------------------------------------------------------------------------------
>>>>
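
To see where the extra time goes, the two MESSAGE PASSING PERFORMANCE tables can be compared routine by routine. The short sketch below does only that arithmetic, using call counts and total times copied from the tables above. The helper names `slowdown` and `ms_per_call` are introduced here for illustration, and how CP2K aggregates these times across ranks is not stated in the thread, so the numbers are best read as relative indicators.

```python
# Figures copied from the two MESSAGE PASSING PERFORMANCE tables in the post:
# routine -> (number of calls, total time in seconds).
ONE_NODE = {
    "MP_Allreduce": (21860, 7.077),
    "MP_Alltoall": (19407, 26.839),
    "MP_Wait": (238786, 50.545),
    "MP_ISend": (97572, 0.741),
    "MP_IRecv": (97572, 8.605),
}
TWO_NODES = {
    "MP_Allreduce": (21892, 1546.186),
    "MP_Alltoall": (19407, 1507.024),
    "MP_Wait": (238786, 513.507),
    "MP_ISend": (97572, 1.102),
    "MP_IRecv": (97572, 2.739),
}


def slowdown(routine):
    """Ratio of two-node total time to one-node total time for a routine."""
    return TWO_NODES[routine][1] / ONE_NODE[routine][1]


def ms_per_call(table, routine):
    """Average time per call in milliseconds."""
    calls, seconds = table[routine]
    return 1000.0 * seconds / calls


for routine in ONE_NODE:
    print(
        f"{routine:<13} {slowdown(routine):8.1f}x  "
        f"{ms_per_call(ONE_NODE, routine):8.3f} ms/call -> "
        f"{ms_per_call(TWO_NODES, routine):8.3f} ms/call"
    )
```

The collectives account for most of the change: MP_Allreduce and MP_Alltoall are slower by roughly two orders of magnitude, while MP_ISend and MP_IRecv barely move. Since the average MP_Allreduce message is only 263 bytes, that pattern looks more like a latency problem between the nodes than a bandwidth limit, which would be consistent with traffic not going over IB.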
>>>> Any ideas? The code was built with the latest gfortran and I built all
>>>> of the dependencies, using this arch file.
>>>>
>>>> CC = gcc
>>>> CPP =
>>>> FC = mpif90
>>>> LD = mpif90
>>>> AR = ar -r
>>>> PREFIX = /home/rcohen
>>>> FFTW_INC = $(PREFIX)/include
>>>> FFTW_LIB = $(PREFIX)/lib
>>>> LIBINT_INC = $(PREFIX)/include
>>>> LIBINT_LIB = $(PREFIX)/lib
>>>> LIBXC_INC = $(PREFIX)/include
>>>> LIBXC_LIB = $(PREFIX)/lib
>>>> GCC_LIB = $(PREFIX)/gcc-trunk/lib
>>>> GCC_LIB64 = $(PREFIX)/gcc-trunk/lib64
>>>> GCC_INC = $(PREFIX)/gcc-trunk/include
>>>> DFLAGS = -D__FFTW3 -D__LIBINT -D__LIBXC2\
>>>> -D__LIBINT_MAX_AM=7 -D__LIBDERIV_MAX_AM1=6 -D__MAX_CONTR=4\
>>>> -D__parallel -D__SCALAPACK -D__HAS_smm_dnn -D__ELPA3
>>>> CPPFLAGS =
>>>> FCFLAGS = $(DFLAGS) -O2 -ffast-math -ffree-form -ffree-line-length-none\
>>>> -fopenmp -ftree-vectorize -funroll-loops\
>>>> -mtune=native \
>>>> -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC) -I$(MKLROOT)/include \
>>>> -I$(GCC_INC) -I$(PREFIX)/include/elpa_openmp-2015.11.001/modules
>>>> LIBS = \
>>>> $(PREFIX)/lib/libscalapack.a \
>>>> $(PREFIX)/lib/libsmm_dnn_sandybridge-2015-11-10.a \
>>>> $(FFTW_LIB)/libfftw3.a\
>>>> $(FFTW_LIB)/libfftw3_threads.a\
>>>> $(LIBXC_LIB)/libxcf90.a\
>>>> $(LIBXC_LIB)/libxc.a\
>>>> $(PREFIX)/lib/liblapack.a $(PREFIX)/lib/libtmglib.a \
>>>> $(PREFIX)/lib/libgomp.a \
>>>> $(PREFIX)/lib/libderiv.a $(PREFIX)/lib/libint.a -lelpa_openmp \
>>>> -lgomp -lopenblas
>>>> LDFLAGS = $(FCFLAGS) -L$(GCC_LIB64) -L$(GCC_LIB) -static-libgfortran \
>>>> -L$(PREFIX)/lib
>>>>
>>>> It was run with OMP_NUM_THREADS=2 on the two nodes and OMP_NUM_THREADS=1
>>>> on the one node.
>>>>
>>>> I am now checking whether OMP_NUM_THREADS=1 on two nodes is faster
>>>> than OMP_NUM_THREADS=2, but I do not think so.
>>>>
>>>> Ron Cohen
>>>>
>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "cp2k" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to cp2k+uns... at googlegroups.com.
>>>> To post to this group, send email to cp... at googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/cp2k.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>> --
>>> You received this message because you are subscribed to a topic in the
>>> Google Groups "cp2k" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/cp2k/lVLso0oseHU/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> cp2k+uns... at googlegroups.com.
>>> To post to this group, send email to cp... at googlegroups.com.
>>> Visit this group at https://groups.google.com/group/cp2k.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>