[CP2K:7568] terrible performance across infiniband
Cohen, Ronald
rco... at carnegiescience.edu
Mon Mar 21 21:13:05 UTC 2016
Sorry, regarding the second question: I ran configure with openmpi-1.10.2 and
it seemed to detect the InfiniBand. But perhaps it is not set up properly to
build correctly on this machine. It is a good point.
Ron
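One way to check whether the Open MPI build actually contains InfiniBand support is to look for the openib BTL component in the output of ompi_info. The sketch below only parses text that has already been captured. The sample string is an illustrative assumption about the "MCA btl:" line format, not output from deepcarbon, and the function names are hypothetical.

```python
import re

# Illustrative sample only: an assumed shape for `ompi_info` component lines,
# not real output from the machine discussed in this thread.
SAMPLE_OMPI_INFO = """\
                 MCA btl: openib (MCA v2.0.0, API v2.0.0, Component v1.10.2)
                 MCA btl: self (MCA v2.0.0, API v2.0.0, Component v1.10.2)
                 MCA btl: tcp (MCA v2.0.0, API v2.0.0, Component v1.10.2)
                 MCA pml: ob1 (MCA v2.0.0, API v2.0.0, Component v1.10.2)
"""

# Matches lines such as "MCA btl: openib (...)" and captures the component name.
_BTL_LINE = re.compile(r"^\s*MCA\s+btl:\s*(\S+)")


def list_btl_components(ompi_info_text):
    """Return the BTL component names found in `ompi_info` text, in order."""
    names = []
    for line in ompi_info_text.splitlines():
        match = _BTL_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


def has_openib(ompi_info_text):
    """True if an `openib` BTL component is listed in the given text."""
    return "openib" in list_btl_components(ompi_info_text)


print(list_btl_components(SAMPLE_OMPI_INFO))
print("openib present:", has_openib(SAMPLE_OMPI_INFO))
```

On a real system the text could come from subprocess.run(["ompi_info"], capture_output=True, text=True).stdout. Finding openib in the list only shows that the component was built; it does not prove that a given job used it at run time.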
---
Ronald Cohen
Geophysical Laboratory
Carnegie Institution
5251 Broad Branch Rd., N.W.
Washington, D.C. 20015
rco... at carnegiescience.edu
office: 202-478-8937
skype: ronaldcohen
https://twitter.com/recohen3
https://www.linkedin.com/profile/view?id=163327727
On Mon, Mar 21, 2016 at 5:04 PM, Glen MacLachlan <mac... at gwu.edu> wrote:
> Are you conflating MPI with OpenMP? OMP_NUM_THREADS sets the number of
> threads used by OpenMP, and OpenMP doesn't work in a distributed-memory
> environment unless you piggyback on MPI, which would be hybrid use. I'm
> not sure CP2K ever worked optimally in hybrid mode, or at least that's what
> I've gathered from reading the comments in the source code.
>
> As for MPI, are you sure your MPI stack was compiled with IB bindings? I
> had similar issues and the problem was that I wasn't actually using IB. If
> you can, disable eth and leave only IB and see what happens.
>
> Glen
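A way to test this without touching the Ethernet interface is to restrict Open MPI to the InfiniBand, shared-memory, and loopback transports through the btl MCA parameter, so that a run with no usable IB path should fail instead of quietly falling back to TCP. The sketch below only assembles such a command line. The executable name cp2k.popt, the input file H2O-64.inp, and the helper name are assumptions for illustration, and the BTL names are the ones used by the Open MPI 1.x series.

```python
import shlex


def build_mpirun_cmd(nprocs, executable, input_file, btls=("openib", "self", "sm")):
    """Assemble an mpirun argument list that allows only the listed BTLs.

    The executable and input names are placeholders supplied by the caller.
    """
    return [
        "mpirun",
        "-np", str(nprocs),
        "--mca", "btl", ",".join(btls),  # only these transports may be used
        executable,
        "-i", input_file,
    ]


# Hypothetical example: 16 ranks, CP2K MPI-only binary, H2O-64 benchmark input.
cmd = build_mpirun_cmd(16, "./cp2k.popt", "H2O-64.inp")
print(shlex.join(cmd))
```

Passing btls=("tcp", "self") instead gives a TCP-only run to compare against; if the timings match the "InfiniBand" run, the original job was probably not using IB.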
> On Mar 21, 2016 4:48 PM, "Ronald Cohen" <rco... at carnegiescience.edu>
> wrote:
>
>> On the dco machine deepcarbon I find decent single-node MPI performance,
>> but running on the same number of processors across two nodes is terrible,
>> even with the InfiniBand interconnect. This is the CP2K H2O-64 benchmark:
>>
>>
>>
>> On 16 cores on 1 node: total time 530 seconds
>> SUBROUTINE        CALLS  ASD           SELF TIME          TOTAL TIME
>>                 MAXIMUM        AVERAGE   MAXIMUM   AVERAGE   MAXIMUM
>> CP2K                  1  1.0     0.015     0.019   530.306   530.306
>>
>> -------------------------------------------------------------------------------
>> -                         MESSAGE PASSING PERFORMANCE                         -
>> -------------------------------------------------------------------------------
>>
>> ROUTINE           CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]
>> MP_Group              5         0.000
>> MP_Bcast           4103         0.029              44140.             6191.05
>> MP_Allreduce      21860         7.077                263.                0.81
>> MP_Gather            62         0.008                320.                2.53
>> MP_Sync              54         0.001
>> MP_Alltoall       19407        26.839             648289.              468.77
>> MP_ISendRecv      21600         0.091              94533.            22371.25
>> MP_Wait          238786        50.545
>> MP_comm_split        50         0.004
>> MP_ISend          97572         0.741             239205.            31518.68
>> MP_IRecv          97572         8.605             239170.             2711.98
>> MP_Memory        167778        45.018
>>
>> -------------------------------------------------------------------------------
>>
>>
>> On 16 cores on 2 nodes: total time 5053 seconds !!
>>
>> SUBROUTINE        CALLS  ASD           SELF TIME          TOTAL TIME
>>                 MAXIMUM        AVERAGE   MAXIMUM   AVERAGE   MAXIMUM
>> CP2K                  1  1.0     0.311     0.363  5052.904  5052.909
>>
>> -------------------------------------------------------------------------------
>> -                         MESSAGE PASSING PERFORMANCE                         -
>> -------------------------------------------------------------------------------
>>
>> ROUTINE           CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]
>> MP_Group              5         0.000
>> MP_Bcast           4119         0.258              43968.              700.70
>> MP_Allreduce      21892      1546.186                263.                0.00
>> MP_Gather            62         0.049                320.                0.40
>> MP_Sync              54         0.071
>> MP_Alltoall       19407      1507.024             648289.                8.35
>> MP_ISendRecv      21600         0.104              94533.            19656.44
>> MP_Wait          238786       513.507
>> MP_comm_split        50         4.096
>> MP_ISend          97572         1.102             239206.            21176.09
>> MP_IRecv          97572         2.739             239171.             8520.75
>> MP_Memory        167778        18.845
>>
>> -------------------------------------------------------------------------------
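To see where the extra time goes, the totals from the two runs can be compared routine by routine. The short sketch below uses only numbers copied from the timing output quoted here; the helper names are illustrative.

```python
# (calls, total time in seconds) copied from the MESSAGE PASSING PERFORMANCE output
ONE_NODE = {
    "MP_Allreduce": (21860, 7.077),
    "MP_Alltoall": (19407, 26.839),
    "MP_Wait": (238786, 50.545),
}
TWO_NODES = {
    "MP_Allreduce": (21892, 1546.186),
    "MP_Alltoall": (19407, 1507.024),
    "MP_Wait": (238786, 513.507),
}
# Total CP2K run time in seconds for each case
TOTAL_ONE_NODE = 530.306
TOTAL_TWO_NODES = 5052.904


def slowdown(routine):
    """Ratio of two-node to one-node total time for one routine."""
    return TWO_NODES[routine][1] / ONE_NODE[routine][1]


def ms_per_call(table, routine):
    """Average time per call in milliseconds."""
    calls, total_seconds = table[routine]
    return 1000.0 * total_seconds / calls


print(f"whole run: {TOTAL_TWO_NODES / TOTAL_ONE_NODE:.1f}x slower")
for name in ONE_NODE:
    print(
        f"{name}: {slowdown(name):.1f}x slower, "
        f"{ms_per_call(ONE_NODE, name):.3f} -> {ms_per_call(TWO_NODES, name):.3f} ms/call"
    )
```

The collectives stand out: MP_Allreduce is roughly 218 times slower and MP_Alltoall roughly 56 times slower on two nodes, while the run as a whole is about 9.5 times slower. A 263-byte MP_Allreduce averages about 71 ms per call on two nodes, against about 0.3 ms on one.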
>>
>> Any ideas? The code was built with the latest gfortran and I built all of
>> the dependencies, using this arch file.
>>
>> CC = gcc
>> CPP =
>> FC = mpif90
>> LD = mpif90
>> AR = ar -r
>> PREFIX = /home/rcohen
>> FFTW_INC = $(PREFIX)/include
>> FFTW_LIB = $(PREFIX)/lib
>> LIBINT_INC = $(PREFIX)/include
>> LIBINT_LIB = $(PREFIX)/lib
>> LIBXC_INC = $(PREFIX)/include
>> LIBXC_LIB = $(PREFIX)/lib
>> GCC_LIB = $(PREFIX)/gcc-trunk/lib
>> GCC_LIB64 = $(PREFIX)/gcc-trunk/lib64
>> GCC_INC = $(PREFIX)/gcc-trunk/include
>> DFLAGS = -D__FFTW3 -D__LIBINT -D__LIBXC2\
>> -D__LIBINT_MAX_AM=7 -D__LIBDERIV_MAX_AM1=6 -D__MAX_CONTR=4\
>> -D__parallel -D__SCALAPACK -D__HAS_smm_dnn -D__ELPA3
>> CPPFLAGS =
>> FCFLAGS = $(DFLAGS) -O2 -ffast-math -ffree-form -ffree-line-length-none\
>> -fopenmp -ftree-vectorize -funroll-loops\
>> -mtune=native \
>> -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC) -I$(MKLROOT)/include \
>> -I$(GCC_INC) -I$(PREFIX)/include/elpa_openmp-2015.11.001/modules
>> LIBS = \
>> $(PREFIX)/lib/libscalapack.a \
>> $(PREFIX)/lib/libsmm_dnn_sandybridge-2015-11-10.a \
>> $(FFTW_LIB)/libfftw3.a\
>> $(FFTW_LIB)/libfftw3_threads.a\
>> $(LIBXC_LIB)/libxcf90.a\
>> $(LIBXC_LIB)/libxc.a\
>> $(PREFIX)/lib/liblapack.a $(PREFIX)/lib/libtmglib.a $(PREFIX)/lib/libgomp.a \
>> $(PREFIX)/lib/libderiv.a $(PREFIX)/lib/libint.a -lelpa_openmp -lgomp -lopenblas
>> LDFLAGS = $(FCFLAGS) -L$(GCC_LIB64) -L$(GCC_LIB) -static-libgfortran -L$(PREFIX)/lib
>>
>> It was run with OMP_NUM_THREADS=2 on the two nodes and OMP_NUM_THREADS=1
>> on the one node.
>>
>> I am now running with OMP_NUM_THREADS=1 on two nodes to check whether that
>> is faster than OMP_NUM_THREADS=2, but I do not think it will be.
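Since the thread count multiplies the number of cores each MPI rank occupies, it can help to write the layout down explicitly. The following sketch is only bookkeeping under stated assumptions: the message does not say how many MPI ranks were started with OMP_NUM_THREADS=2, and the figure of 16 cores per node is inferred from the single-node run, so the example layouts and the function name are hypothetical.

```python
def hybrid_layout(total_ranks, nodes, threads_per_rank, cores_per_node):
    """Summarise how MPI ranks and OpenMP threads map onto the nodes.

    Assumes ranks are spread evenly; raises ValueError otherwise.
    """
    ranks_per_node, leftover = divmod(total_ranks, nodes)
    if leftover:
        raise ValueError("ranks do not divide evenly across nodes")
    threads_per_node = ranks_per_node * threads_per_rank
    return {
        "ranks_per_node": ranks_per_node,
        "threads_per_node": threads_per_node,
        # More threads than cores on a node means the cores are shared.
        "oversubscribed": threads_per_node > cores_per_node,
    }


# Hypothetical layouts, assuming 16 cores per node:
print(hybrid_layout(8, 2, 2, 16))   # 8 ranks x 2 threads over two nodes
print(hybrid_layout(16, 2, 2, 16))  # 16 ranks x 2 threads over two nodes
print(hybrid_layout(16, 1, 2, 16))  # 16 ranks x 2 threads on one node
```

Comparing the layout that was actually launched against these cases shows whether the two-node run used the same number of cores as the single-node run, or more threads than cores.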
>>
>> Ron Cohen
>>
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "cp2k" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to cp2k+uns... at googlegroups.com.
>> To post to this group, send email to cp... at googlegroups.com.
>> Visit this group at https://groups.google.com/group/cp2k.
>> For more options, visit https://groups.google.com/d/optout.
>>