[CP2K:7567] terrible performance across infiniband
Glen MacLachlan
mac... at gwu.edu
Mon Mar 21 21:04:00 UTC 2016
Are you conflating MPI with OpenMP? OMP_NUM_THREADS sets the number of
threads used by OpenMP, and OpenMP doesn't work in a distributed-memory
environment unless you piggyback on MPI, which would be hybrid use. I'm
not sure CP2K ever worked optimally in hybrid mode, or at least that's
the impression I've gotten from reading the comments in the source code.
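To make the distinction concrete, here is a minimal sketch for the H2O-64
benchmark, assuming an Open MPI-style mpirun and the standard CP2K binary
names (cp2k.popt for MPI-only, cp2k.psmp for hybrid MPI+OpenMP); adjust
paths and names for your build:

  # pure MPI: 16 ranks, one thread each (OpenMP plays no role here)
  export OMP_NUM_THREADS=1
  mpirun -np 16 cp2k.popt -i H2O-64.inp -o H2O-64.out

  # hybrid MPI+OpenMP: 8 ranks x 2 threads = the same 16 cores
  export OMP_NUM_THREADS=2
  mpirun -np 8 cp2k.psmp -i H2O-64.inp -o H2O-64.out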
As for MPI, are you sure your MPI stack was compiled with IB bindings? I
had similar issues, and the problem was that I wasn't actually using IB.
If you can, disable eth, leave only IB, and see what happens.
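If you're on Open MPI, a couple of quick checks along those lines (a
sketch; MVAPICH2 and Intel MPI have their own equivalents):

  # is the InfiniBand (openib) BTL even built into your MPI stack?
  ompi_info | grep -i openib

  # force IB and fail loudly rather than silently falling back to TCP
  mpirun --mca btl openib,self,sm -np 16 cp2k.popt -i H2O-64.inp -o H2O-64.out

If the forced-IB run aborts, your stack has no IB support and your
two-node traffic has been going over ethernet all along.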
Glen
On Mar 21, 2016 4:48 PM, "Ronald Cohen" <rco... at carnegiescience.edu> wrote:
> On the dco machine deepcarbon I find decent single-node MPI performance,
> but running on the same number of processors across two nodes is terrible,
> even with the InfiniBand interconnect. This is the CP2K H2O-64 benchmark:
>
>
>
> On 16 cores on 1 node: total time 530 seconds
>
> SUBROUTINE     CALLS  ASD     SELF TIME          TOTAL TIME
>              MAXIMUM      AVERAGE  MAXIMUM   AVERAGE   MAXIMUM
> CP2K               1  1.0    0.015    0.019   530.306   530.306
>
> -------------------------------------------------------------------------------
> -                                                                             -
> -                        MESSAGE PASSING PERFORMANCE                          -
> -                                                                             -
> -------------------------------------------------------------------------------
>
> ROUTINE          CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]
> MP_Group             5         0.000
> MP_Bcast          4103         0.029              44140.             6191.05
> MP_Allreduce     21860         7.077                263.                0.81
> MP_Gather           62         0.008                320.                2.53
> MP_Sync             54         0.001
> MP_Alltoall      19407        26.839             648289.              468.77
> MP_ISendRecv     21600         0.091              94533.            22371.25
> MP_Wait         238786        50.545
> MP_comm_split       50         0.004
> MP_ISend         97572         0.741             239205.            31518.68
> MP_IRecv         97572         8.605             239170.             2711.98
> MP_Memory       167778        45.018
> -------------------------------------------------------------------------------
>
>
> On 16 cores on 2 nodes: total time 5053 seconds!!
>
> SUBROUTINE     CALLS  ASD     SELF TIME          TOTAL TIME
>              MAXIMUM      AVERAGE  MAXIMUM   AVERAGE   MAXIMUM
> CP2K               1  1.0    0.311    0.363  5052.904  5052.909
>
> -------------------------------------------------------------------------------
> -                                                                             -
> -                        MESSAGE PASSING PERFORMANCE                          -
> -                                                                             -
> -------------------------------------------------------------------------------
>
> ROUTINE          CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]
> MP_Group             5         0.000
> MP_Bcast          4119         0.258              43968.              700.70
> MP_Allreduce     21892      1546.186                263.                0.00
> MP_Gather           62         0.049                320.                0.40
> MP_Sync             54         0.071
> MP_Alltoall      19407      1507.024             648289.                8.35
> MP_ISendRecv     21600         0.104              94533.            19656.44
> MP_Wait         238786       513.507
> MP_comm_split       50         4.096
> MP_ISend         97572         1.102             239206.            21176.09
> MP_IRecv         97572         2.739             239171.             8520.75
> MP_Memory       167778        18.845
> -------------------------------------------------------------------------------
>
> Any ideas? The code was built with the latest gfortran, and I built all of
> the dependencies myself, using this arch file:
>
> CC = gcc
> CPP =
> FC = mpif90
> LD = mpif90
> AR = ar -r
> PREFIX = /home/rcohen
> FFTW_INC = $(PREFIX)/include
> FFTW_LIB = $(PREFIX)/lib
> LIBINT_INC = $(PREFIX)/include
> LIBINT_LIB = $(PREFIX)/lib
> LIBXC_INC = $(PREFIX)/include
> LIBXC_LIB = $(PREFIX)/lib
> GCC_LIB = $(PREFIX)/gcc-trunk/lib
> GCC_LIB64 = $(PREFIX)/gcc-trunk/lib64
> GCC_INC = $(PREFIX)/gcc-trunk/include
> DFLAGS = -D__FFTW3 -D__LIBINT -D__LIBXC2\
> -D__LIBINT_MAX_AM=7 -D__LIBDERIV_MAX_AM1=6 -D__MAX_CONTR=4\
> -D__parallel -D__SCALAPACK -D__HAS_smm_dnn -D__ELPA3
> CPPFLAGS =
> FCFLAGS = $(DFLAGS) -O2 -ffast-math -ffree-form -ffree-line-length-none\
> -fopenmp -ftree-vectorize -funroll-loops\
> -mtune=native \
> -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC) -I$(MKLROOT)/include \
> -I$(GCC_INC) -I$(PREFIX)/include/elpa_openmp-2015.11.001/modules
> LIBS = \
>        $(PREFIX)/lib/libscalapack.a \
>        $(PREFIX)/lib/libsmm_dnn_sandybridge-2015-11-10.a \
>        $(FFTW_LIB)/libfftw3.a \
>        $(FFTW_LIB)/libfftw3_threads.a \
>        $(LIBXC_LIB)/libxcf90.a \
>        $(LIBXC_LIB)/libxc.a \
>        $(PREFIX)/lib/liblapack.a $(PREFIX)/lib/libtmglib.a \
>        $(PREFIX)/lib/libgomp.a \
>        $(PREFIX)/lib/libderiv.a $(PREFIX)/lib/libint.a \
>        -lelpa_openmp -lgomp -lopenblas
> LDFLAGS = $(FCFLAGS) -L$(GCC_LIB64) -L$(GCC_LIB) -static-libgfortran -L$(PREFIX)/lib
>
> The two-node run used OMP_NUM_THREADS=2; the one-node run used
> OMP_NUM_THREADS=1. I am now checking whether OMP_NUM_THREADS=1 on two
> nodes is faster than OMP_NUM_THREADS=2, but I do not think it will be.
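>
> For reference, the two layouts I am comparing look roughly like this (a
> sketch; I am assuming Open MPI's -npernode, other stacks spell it
> differently):
>
>   # two nodes, 16 ranks total, 1 OpenMP thread per rank
>   export OMP_NUM_THREADS=1
>   mpirun -np 16 -npernode 8 cp2k.psmp -i H2O-64.inp -o H2O-64.out
>
>   # two nodes, 8 ranks total, 2 OpenMP threads per rank (same 16 cores)
>   export OMP_NUM_THREADS=2
>   mpirun -np 8 -npernode 4 cp2k.psmp -i H2O-64.inp -o H2O-64.out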
>
> Ron Cohen
>
>
>