[CP2K:7579] terrible performance across infiniband
Glen MacLachlan
mac... at gwu.edu
Tue Mar 22 16:15:18 UTC 2016
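Sorry, it's more accurate to say the circumflex "^" works like the negation
in a regex character class: it *reverses* the match. With "--mca btl ^tcp",
Open MPI considers every available BTL component except tcp.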
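
To make the exclusion behaviour concrete, here is a minimal Python sketch of
how a comma-separated component list with a leading "^" can be read. It is
only a model of the semantics, not Open MPI's code; the function name
select_components and the component list are hypothetical and chosen for
illustration.

```python
def select_components(spec, available):
    """Model a comma-separated include/exclude component list.

    A leading "^" turns the whole list into an exclusion list; without it,
    only the named components are kept. Illustrative model only.
    """
    if spec.startswith("^"):
        excluded = set(spec[1:].split(","))
        return [name for name in available if name not in excluded]
    included = set(spec.split(","))
    return [name for name in available if name in included]


# Hypothetical set of transports, for illustration only.
available = ["self", "sm", "tcp", "openib"]

print(select_components("^tcp", available))         # everything except tcp
print(select_components("openib,self", available))  # only the named ones
```

Read this way, "^tcp" leaves the choice among the remaining transports to
the library, whereas an explicit list such as "openib,self" pins it down.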
Best,
Glen
==========================================
Glen MacLachlan, PhD
*HPC Specialist *
*for Physical Sciences &*
*Professorial Lecturer, Data Sciences*
Office of Technology Services
The George Washington University
725 21st Street
Washington, DC 20052
Suite 211, Corcoran Hall
==========================================
On Tue, Mar 22, 2016 at 11:12 AM, Glen MacLachlan <mac... at gwu.edu> wrote:
> Yeah, the ^ is a regular expression character that means ignore what comes
> after -- think of it as a negation.
>
> Best,
> Glen
>
> On Tue, Mar 22, 2016 at 11:04 AM, Cohen, Ronald <
> rco... at carnegiescience.edu> wrote:
>
>> Thank you so much. It is a bit difficult because I did not set up this
>> machine and do not have root access, but I know it is a mess. I backed up
>> to just try the HPL benchmark.
>> I am finding 100 GFLOPS one-node performance on N=2000 and 16 cores, and
>> 1.5 GFLOPS using two nodes, 8 cores per node. So there is definitely
>> something really wrong. I need to get this working before I can worry about
>> threads or cp2k.
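
For scale, the two HPL rates quoted above can be turned into rough run times.
This is a back-of-the-envelope sketch that assumes only the leading-order
(2/3)N^3 operation count for a dense LU solve; the rate HPL itself reports
also includes lower-order terms, and the helper names are hypothetical.

```python
def lu_flops(n):
    """Leading-order floating-point operation count for an n x n LU solve."""
    return (2.0 / 3.0) * n ** 3


def runtime_seconds(n, gflops):
    """Time to perform lu_flops(n) operations at the given rate in GFLOPS."""
    return lu_flops(n) / (gflops * 1e9)


n = 2000                                 # problem size quoted in the message
one_node = runtime_seconds(n, 100.0)     # 16 cores on one node
two_nodes = runtime_seconds(n, 1.5)      # 8 cores on each of two nodes

print(f"one node:  {one_node:.3f} s")
print(f"two nodes: {two_nodes:.3f} s")
print(f"slowdown:  {two_nodes / one_node:.1f}x")
```

At N=2000 the whole solve is a few billion operations, so both runs are
short and the two-node figure is likely dominated by communication cost
rather than arithmetic.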
>> Was that a caret in your command above:
>>
>> mpirun --mca btl ^tcp
>>
>> ?
>>
>> I looked through my openmpi build and it seems to have found the
>> infiniband includes such as they exist on the machine, but I could not find
>> the expected mxm or Mellanox drivers anywhere on the machine.
>>
>> I am CCing Peter Fox, the person who volunteers his time for this
>> machine, and who has root access!
>>
>> Sincerely,
>>
>> Ron
>>
>>
>> ---
>> Ronald Cohen
>> Geophysical Laboratory
>> Carnegie Institution
>> 5251 Broad Branch Rd., N.W.
>> Washington, D.C. 20015
>> rco... at carnegiescience.edu
>> office: 202-478-8937
>> skype: ronaldcohen
>> https://twitter.com/recohen3
>> https://www.linkedin.com/profile/view?id=163327727
>>
>> On Tue, Mar 22, 2016 at 10:32 AM, Glen MacLachlan <mac... at gwu.edu>
>> wrote:
>>
>>> Hi Ron,
>>>
>>> There's a chance that OpenMPI wasn't configured to use IB properly. Why
>>> don't you disable tcp and see if you are using IB? It's easy:
>>>
>>> mpirun --mca btl ^tcp ...
>>>
>>>
>>> Regarding OpenMP:
>>> I'm not sure we're converging on the same discussion anymore, but setting
>>> OMP_NUM_THREADS=1 does *not* disable multithreading overhead -- you
>>> need to compile without -fopenmp to get a measure of true single-thread
>>> performance.
>>>
>>>
>>> Best,
>>> Glen
>>>
>>> On Mon, Mar 21, 2016 at 5:05 PM, Ronald Cohen <
>>> rco... at carnegiescience.edu> wrote:
>>>
>>>> That is not the case, according to either my experience in general or the
>>>> cp2k web pages in particular. Please see the performance page for
>>>> cp2k. I am now sure the problem is that the openmpi build is not using the
>>>> proper infiniband libraries or drivers.
>>>>
>>>> Thank you!
>>>>
>>>> Ron
>>>>
>>>> On Mar 21, 2016, at 5:36 PM, Glen MacLachlan <mac... at gwu.edu> wrote:
>>>>
>>>> It's hard to talk about performance when you set OMP_NUM_THREADS=1
>>>> because there is so much overhead associated with OpenMP that launching 1
>>>> thread is almost always a performance killer. In fact, OMP_NUM_THREADS=1
>>>> never rivals single-threaded performance because of that overhead. No
>>>> one ever sets OMP_NUM_THREADS=1 unless they are playing around. We never
>>>> do that in production jobs. How about when you scale up to 4 or 8 threads?
>>>>
>>>> Glen
>>>>
>>>> P.S. I see you're in DC...so am I. I support CP2K for the chemists at
>>>> GWU. Hope you aren't using Metro to get around the DMV :p
>>>> On Mar 21, 2016 5:11 PM, "Cohen, Ronald" <rco... at carnegiescience.edu>
>>>> wrote:
>>>>
>>>>> Yes I am using hybrid mode. But even if I set OMP_NUM_THREADS=1
>>>>> performance is terrible.
>>>>>
>>>>> On Mon, Mar 21, 2016 at 5:04 PM, Glen MacLachlan <mac... at gwu.edu>
>>>>> wrote:
>>>>>
>>>>>> Are you conflating MPI with OpenMP? OMP_NUM_THREADS sets the number
>>>>>> of threads used by OpenMP, and OpenMP doesn't work in a distributed-memory
>>>>>> environment unless you piggyback on MPI, which would be hybrid use. I'm
>>>>>> not sure CP2K ever worked optimally in hybrid mode, or at least that's
>>>>>> what I've gathered from reading the comments in the source code.
>>>>>>
>>>>>> As for MPI, are you sure your MPI stack was compiled with IB
>>>>>> bindings? I had similar issues and the problem was that I wasn't actually
>>>>>> using IB. If you can, disable eth and leave only IB and see what happens.
>>>>>>
>>>>>> Glen
>>>>>> On Mar 21, 2016 4:48 PM, "Ronald Cohen" <rco... at carnegiescience.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> On the dco machine deepcarbon I find decent single-node MPI
>>>>>>> performance, but running on the same number of processors across two nodes
>>>>>>> is terrible, even with the infiniband interconnect. This is the cp2k H2O-64
>>>>>>> benchmark:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 16 cores on 1 node: total time 530 seconds
>>>>>>>
>>>>>>> SUBROUTINE    CALLS   ASD      SELF TIME           TOTAL TIME
>>>>>>>             MAXIMUM          AVERAGE  MAXIMUM     AVERAGE   MAXIMUM
>>>>>>> CP2K              1   1.0      0.015    0.019     530.306   530.306
>>>>>>>
>>>>>>> -------------------------------------------------------------------------------
>>>>>>> -                         MESSAGE PASSING PERFORMANCE                         -
>>>>>>> -------------------------------------------------------------------------------
>>>>>>>
>>>>>>> ROUTINE          CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]
>>>>>>> MP_Group             5         0.000
>>>>>>> MP_Bcast          4103         0.029              44140.             6191.05
>>>>>>> MP_Allreduce     21860         7.077                263.                0.81
>>>>>>> MP_Gather           62         0.008                320.                2.53
>>>>>>> MP_Sync             54         0.001
>>>>>>> MP_Alltoall      19407        26.839             648289.              468.77
>>>>>>> MP_ISendRecv     21600         0.091              94533.            22371.25
>>>>>>> MP_Wait         238786        50.545
>>>>>>> MP_comm_split       50         0.004
>>>>>>> MP_ISend         97572         0.741             239205.            31518.68
>>>>>>> MP_IRecv         97572         8.605             239170.             2711.98
>>>>>>> MP_Memory       167778        45.018
>>>>>>>
>>>>>>> -------------------------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>> On 16 cores on 2 nodes: total time 5053 seconds !!
>>>>>>>
>>>>>>> SUBROUTINE    CALLS   ASD      SELF TIME           TOTAL TIME
>>>>>>>             MAXIMUM          AVERAGE  MAXIMUM     AVERAGE   MAXIMUM
>>>>>>> CP2K              1   1.0      0.311    0.363    5052.904  5052.909
>>>>>>>
>>>>>>> -------------------------------------------------------------------------------
>>>>>>> -                         MESSAGE PASSING PERFORMANCE                         -
>>>>>>> -------------------------------------------------------------------------------
>>>>>>>
>>>>>>> ROUTINE          CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]
>>>>>>> MP_Group             5         0.000
>>>>>>> MP_Bcast          4119         0.258              43968.              700.70
>>>>>>> MP_Allreduce     21892      1546.186                263.                0.00
>>>>>>> MP_Gather           62         0.049                320.                0.40
>>>>>>> MP_Sync             54         0.071
>>>>>>> MP_Alltoall      19407      1507.024             648289.                8.35
>>>>>>> MP_ISendRecv     21600         0.104              94533.            19656.44
>>>>>>> MP_Wait         238786       513.507
>>>>>>> MP_comm_split       50         4.096
>>>>>>> MP_ISend         97572         1.102             239206.            21176.09
>>>>>>> MP_IRecv         97572         2.739             239171.             8520.75
>>>>>>> MP_Memory       167778        18.845
>>>>>>>
>>>>>>> -------------------------------------------------------------------------------
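
The two message passing tables above are easier to compare as ratios. The
sketch below copies the TOT TIME values for a few routines from both runs
and ranks them by slowdown. The last figure assumes the per-routine times
can be compared directly with the wall-clock totals, which the output does
not guarantee; the variable and function names are illustrative only.

```python
# (routine, seconds on 1 node, seconds on 2 nodes), copied from the tables.
timings = [
    ("MP_Bcast", 0.029, 0.258),
    ("MP_Allreduce", 7.077, 1546.186),
    ("MP_Alltoall", 26.839, 1507.024),
    ("MP_Wait", 50.545, 513.507),
    ("MP_comm_split", 0.004, 4.096),
    ("MP_ISend", 0.741, 1.102),
    ("MP_IRecv", 8.605, 2.739),
]
total_one_node, total_two_nodes = 530.306, 5052.904


def slowdowns(rows):
    """Return (routine, two-node time / one-node time), largest ratio first."""
    ratios = [(name, two / one) for name, one, two in rows]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


for name, factor in slowdowns(timings):
    print(f"{name:<14}{factor:8.1f}x")

# Extra time spent in the two big collectives versus extra wall-clock time.
extra_total = total_two_nodes - total_one_node
extra_collectives = (1546.186 - 7.077) + (1507.024 - 26.839)
print(f"overall slowdown: {total_two_nodes / total_one_node:.1f}x")
print(f"Allreduce + Alltoall share of extra time: "
      f"{extra_collectives / extra_total:.0%}")
```

Under that assumption, MP_Allreduce and MP_Alltoall together cover roughly
two thirds of the extra time, while the point-to-point calls (MP_ISend,
MP_IRecv) barely change. That pattern points at the collectives' path
across the interconnect rather than at CP2K itself.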
>>>>>>>
>>>>>>> Any ideas? The code was built with the latest gfortran and I built
>>>>>>> all of the dependencies, using this arch file.
>>>>>>>
>>>>>>> CC = gcc
>>>>>>> CPP =
>>>>>>> FC = mpif90
>>>>>>> LD = mpif90
>>>>>>> AR = ar -r
>>>>>>> PREFIX = /home/rcohen
>>>>>>> FFTW_INC = $(PREFIX)/include
>>>>>>> FFTW_LIB = $(PREFIX)/lib
>>>>>>> LIBINT_INC = $(PREFIX)/include
>>>>>>> LIBINT_LIB = $(PREFIX)/lib
>>>>>>> LIBXC_INC = $(PREFIX)/include
>>>>>>> LIBXC_LIB = $(PREFIX)/lib
>>>>>>> GCC_LIB = $(PREFIX)/gcc-trunk/lib
>>>>>>> GCC_LIB64 = $(PREFIX)/gcc-trunk/lib64
>>>>>>> GCC_INC = $(PREFIX)/gcc-trunk/include
>>>>>>> DFLAGS = -D__FFTW3 -D__LIBINT -D__LIBXC2\
>>>>>>> -D__LIBINT_MAX_AM=7 -D__LIBDERIV_MAX_AM1=6 -D__MAX_CONTR=4\
>>>>>>> -D__parallel -D__SCALAPACK -D__HAS_smm_dnn -D__ELPA3
>>>>>>> CPPFLAGS =
>>>>>>> FCFLAGS = $(DFLAGS) -O2 -ffast-math -ffree-form -ffree-line-length-none\
>>>>>>> -fopenmp -ftree-vectorize -funroll-loops\
>>>>>>> -mtune=native \
>>>>>>> -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC) -I$(MKLROOT)/include \
>>>>>>> -I$(GCC_INC) -I$(PREFIX)/include/elpa_openmp-2015.11.001/modules
>>>>>>> LIBS = \
>>>>>>> $(PREFIX)/lib/libscalapack.a $(PREFIX)/lib/libsmm_dnn_sandybridge-2015-11-10.a \
>>>>>>> $(FFTW_LIB)/libfftw3.a\
>>>>>>> $(FFTW_LIB)/libfftw3_threads.a\
>>>>>>> $(LIBXC_LIB)/libxcf90.a\
>>>>>>> $(LIBXC_LIB)/libxc.a\
>>>>>>> $(PREFIX)/lib/liblapack.a $(PREFIX)/lib/libtmglib.a $(PREFIX)/lib/libgomp.a \
>>>>>>> $(PREFIX)/lib/libderiv.a $(PREFIX)/lib/libint.a -lelpa_openmp -lgomp -lopenblas
>>>>>>> LDFLAGS = $(FCFLAGS) -L$(GCC_LIB64) -L$(GCC_LIB) -static-libgfortran -L$(PREFIX)/lib
>>>>>>>
>>>>>>> It was run with OMP_NUM_THREADS=2 on the two nodes and OMP_NUM_THREADS=1
>>>>>>> on the one node.
>>>>>>>
>>>>>>> I am now checking whether OMP_NUM_THREADS=1 on two nodes is faster
>>>>>>> than OMP_NUM_THREADS=2, but I do not think so.
>>>>>>>
>>>>>>> Ron Cohen
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "cp2k" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to cp2k+uns... at googlegroups.com.
>>>>>>> To post to this group, send email to cp... at googlegroups.com.
>>>>>>> Visit this group at https://groups.google.com/group/cp2k.
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>