<div dir="ltr">Sorry, on the second question: I ran configure for openmpi-1.10.2 and it appeared to detect the InfiniBand. But perhaps it was not set up properly to build correctly on this machine. It is a good point.<div><br></div><div>Ron</div><div><br></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature">---<br>Ronald Cohen<br>Geophysical Laboratory<br>Carnegie Institution<br>5251 Broad Branch Rd., N.W.<br>Washington, D.C. 20015<br><a href="mailto:rco...@carnegiescience.edu" target="_blank">rco...@carnegiescience.edu</a><br>office: 202-478-8937<br>skype: ronaldcohen<br><a href="https://twitter.com/recohen3" target="_blank">https://twitter.com/recohen3</a><br><a href="https://www.linkedin.com/profile/view?id=163327727" target="_blank">https://www.linkedin.com/profile/view?id=163327727</a><br></div></div>
<br><div class="gmail_quote">On Mon, Mar 21, 2016 at 5:04 PM, Glen MacLachlan <span dir="ltr"><<a href="mailto:mac...@gwu.edu" target="_blank">mac...@gwu.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><p dir="ltr">Are you conflating MPI with OpenMP? OMP_NUM_THREADS sets the number of threads used by OpenMP, and OpenMP doesn't work in a distributed-memory environment unless you piggyback on MPI, which would be hybrid use. I'm not sure CP2K ever worked optimally in hybrid mode, or at least that's what I've gathered from reading the comments in the source code. </p>
<p dir="ltr">As for MPI, are you sure your MPI stack was compiled with IB bindings? I had similar issues and the problem was that I wasn't actually using IB. If you can, disable eth and leave only IB and see what happens.</p>
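<p dir="ltr">One rough way to check the first point, sketched in Python: Open MPI's <code>ompi_info</code> lists the components compiled into a build, and a build with InfiniBand verbs support normally shows an <code>openib</code> BTL line. The helper name <code>has_openib_btl</code> and the sample text are illustrative assumptions, not output from the deepcarbon machine.</p>

```python
import re


def has_openib_btl(ompi_info_text):
    """Return True if the given ompi_info output lists an 'openib' BTL component."""
    # ompi_info prints one line per component, e.g.
    #   MCA btl: openib (MCA v2.0.0, API v2.0.0, Component v1.10.2)
    pattern = re.compile(r"^\s*MCA\s+btl:\s+openib\b", re.MULTILINE)
    return bool(pattern.search(ompi_info_text))


# Illustrative sample only (assumed format, NOT real output from this cluster).
SAMPLE = """\
                 MCA btl: self (MCA v2.0.0, API v2.0.0, Component v1.10.2)
                 MCA btl: openib (MCA v2.0.0, API v2.0.0, Component v1.10.2)
                 MCA btl: tcp (MCA v2.0.0, API v2.0.0, Component v1.10.2)
"""

if __name__ == "__main__":
    print(has_openib_btl(SAMPLE))
```

<p dir="ltr">In practice the text would come from something like <code>subprocess.run(["ompi_info"], capture_output=True, text=True).stdout</code>. A positive result only shows that the component was built; it does not prove that it is selected at run time, so restricting the transports on the <code>mpirun</code> command line (for example with <code>--mca btl openib,self,sm</code> on Open MPI 1.x) is still worth trying.</p>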
<p dir="ltr">Glen </p>
<div class="gmail_quote">On Mar 21, 2016 4:48 PM, "Ronald Cohen" <<a href="mailto:rco...@carnegiescience.edu" target="_blank">rco...@carnegiescience.edu</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div style="font-family:arial,sans-serif;font-size:12.8px">On the dco machine deepcarbon I find decent single-node MPI performance, but running on the same number of processors across two nodes is terrible, even with the InfiniBand interconnect. This is the CP2K H2O-64 benchmark:</div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px"> </div><div style="font-family:arial,sans-serif;font-size:12.8px">On 16 cores on 1 node: total time 530 seconds</div><div style="font-family:arial,sans-serif;font-size:12.8px"><div> SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME</div><div>                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM</div><div> CP2K                                 1  1.0    0.015    0.019  530.306  530.306</div></div><div style="font-family:arial,sans-serif;font-size:12.8px"><div> -                                                                             -</div><div> -                         MESSAGE PASSING PERFORMANCE                         -</div><div> -                                                                             -</div><div> -------------------------------------------------------------------------------</div><div><br></div><div> ROUTINE             CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]</div><div> MP_Group                5         0.000</div><div> MP_Bcast             4103         0.029              44140.             6191.05</div><div> MP_Allreduce        21860         7.077                263.                
0.81</div><div> MP_Gather              62         0.008                320.                2.53</div><div> MP_Sync                54         0.001</div><div> MP_Alltoall         19407        26.839             648289.              468.77</div><div> MP_ISendRecv        21600         0.091              94533.            22371.25</div><div> MP_Wait            238786        50.545</div><div> MP_comm_split          50         0.004</div><div> MP_ISend            97572         0.741             239205.            31518.68</div><div> MP_IRecv            97572         8.605             239170.             2711.98</div><div> MP_Memory          167778        45.018</div><div> -------------------------------------------------------------------------------</div></div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px">on 16 cores on 2 nodes: total time 5053 seconds !!</div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px"><div>SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME</div><div>                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM</div><div> CP2K                                 1  1.0    0.311    0.363 5052.904 5052.909</div></div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px"><div>-------------------------------------------------------------------------------</div><div> -                                                                             -</div><div> -                         MESSAGE PASSING PERFORMANCE                         -</div><div> -                                                                             -</div><div> 
-------------------------------------------------------------------------------</div><div><br></div><div> ROUTINE             CALLS  TOT TIME [s]  AVE VOLUME [Bytes]  PERFORMANCE [MB/s]</div><div> MP_Group                5         0.000</div><div> MP_Bcast             4119         0.258              43968.              700.70</div><div> MP_Allreduce        21892      1546.186                263.                0.00</div><div> MP_Gather              62         0.049                320.                0.40</div><div> MP_Sync                54         0.071</div><div> MP_Alltoall         19407      1507.024             648289.                8.35</div><div> MP_ISendRecv        21600         0.104              94533.            19656.44</div><div> MP_Wait            238786       513.507</div><div> MP_comm_split          50         4.096</div><div> MP_ISend            97572         1.102             239206.            21176.09</div><div> MP_IRecv            97572         2.739             239171.             8520.75</div><div> MP_Memory          167778        18.845</div><div> -------------------------------------------------------------------------------</div></div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px">Any ideas? 
The code was built with the latest gfortran and I built all of the dependencies, using this arch file.</div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px"><div style="font-size:12.8px">CC   = gcc</div><div style="font-size:12.8px">CPP  =</div><div style="font-size:12.8px">FC   = mpif90</div><div style="font-size:12.8px">LD   = mpif90</div><div style="font-size:12.8px">AR   = ar -r</div><div style="font-size:12.8px">PREFIX   = /home/rcohen</div><div style="font-size:12.8px">FFTW_INC   = $(PREFIX)/include</div><div style="font-size:12.8px">FFTW_LIB   = $(PREFIX)/lib</div><div style="font-size:12.8px">LIBINT_INC = $(PREFIX)/include</div><div style="font-size:12.8px">LIBINT_LIB = $(PREFIX)/lib</div><div style="font-size:12.8px">LIBXC_INC  = $(PREFIX)/include</div><div style="font-size:12.8px">LIBXC_LIB  = $(PREFIX)/lib</div><div style="font-size:12.8px">GCC_LIB = $(PREFIX)/gcc-trunk/lib</div><div style="font-size:12.8px">GCC_LIB64  = $(PREFIX)/gcc-trunk/lib64</div><div style="font-size:12.8px">GCC_INC = $(PREFIX)/gcc-trunk/include</div><div style="font-size:12.8px">DFLAGS  = -D__FFTW3 -D__LIBINT -D__LIBXC2\</div><div style="font-size:12.8px">    -D__LIBINT_MAX_AM=7 -D__LIBDERIV_MAX_AM1=6 -D__MAX_CONTR=4\</div><div style="font-size:12.8px">    -D__parallel -D__SCALAPACK -D__HAS_smm_dnn -D__ELPA3 </div><div style="font-size:12.8px">CPPFLAGS   =</div><div style="font-size:12.8px">FCFLAGS = $(DFLAGS) -O2 -ffast-math -ffree-form -ffree-line-length-none\</div><div style="font-size:12.8px">    -fopenmp -ftree-vectorize -funroll-loops\</div><div style="font-size:12.8px">    -mtune=native  \</div><div style="font-size:12.8px">     -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC) -I$(MKLROOT)/include \</div><div style="font-size:12.8px">     -I$(GCC_INC) -I$(PREFIX)/include/elpa_openmp-2015.11.001/modules</div><div style="font-size:12.8px">LIBS    =  \</div><div style="font-size:12.8px">    
$(PREFIX)/lib/libscalapack.a $(PREFIX)/lib/libsmm_dnn_sandybridge-2015-11-10.a \</div><div style="font-size:12.8px">    $(FFTW_LIB)/libfftw3.a\</div><div style="font-size:12.8px">    $(FFTW_LIB)/libfftw3_threads.a\</div><div style="font-size:12.8px">    $(LIBXC_LIB)/libxcf90.a\</div><div style="font-size:12.8px">    $(LIBXC_LIB)/libxc.a\</div><div style="font-size:12.8px">    $(PREFIX)/lib/liblapack.a  $(PREFIX)/lib/libtmglib.a $(PREFIX)/lib/libgomp.a  \</div><div style="font-size:12.8px">    $(PREFIX)/lib/libderiv.a $(PREFIX)/lib/libint.a  -lelpa_openmp -lgomp -lopenblas</div><div style="font-size:12.8px">LDFLAGS = $(FCFLAGS)  -L$(GCC_LIB64) -L$(GCC_LIB) -static-libgfortran -L$(PREFIX)/lib </div><div><br></div><div>It was run with  <span style="font-size:12.8px">OMP_NUM_THREADS=2 on the two nodes</span> and  OMP_NUM_THREADS=1 on the one node.</div><div>Running with  OMP_NUM_THREADS=1 on two nodes .</div><div><br></div><div>I am now checking whether <span style="font-size:12.8px">OMP_NUM_THREADS=1 on two nodes is faster than </span><span style="font-size:12.8px">OMP_NUM_THREADS=2 , but I do not think so.</span></div><div><br></div><div>Ron Cohen</div><div><br></div></div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><span class="HOEnZb"><font color="#888888"><div><br></div></font></span></div><span class="HOEnZb"><font color="#888888">

<p></p>

-- <br>
You received this message because you are subscribed to the Google Groups "cp2k" group.<br>
To unsubscribe from this group and stop receiving emails from it, send an email to <a href="mailto:cp2k+uns...@googlegroups.com" target="_blank">cp2k+uns...@googlegroups.com</a>.<br>
To post to this group, send email to <a href="mailto:cp...@googlegroups.com" target="_blank">cp...@googlegroups.com</a>.<br>
Visit this group at <a href="https://groups.google.com/group/cp2k" target="_blank">https://groups.google.com/group/cp2k</a>.<br>
For more options, visit <a href="https://groups.google.com/d/optout" target="_blank">https://groups.google.com/d/optout</a>.<br>
</font></span></blockquote></div><span class="HOEnZb"><font color="#888888">

</font></span></blockquote></div><br></div>
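<p dir="ltr">To see where the extra time goes, the two MESSAGE PASSING PERFORMANCE tables quoted above can be compared routine by routine. The following is a minimal Python sketch; the TOT TIME values are copied from those tables, while the names <code>ONE_NODE</code>, <code>TWO_NODES</code> and <code>slowdown</code> are illustrative and not part of CP2K.</p>

```python
# TOT TIME [s] per routine, copied from the two timing tables quoted in this thread.
ONE_NODE = {
    "MP_Bcast": 0.029,
    "MP_Allreduce": 7.077,
    "MP_Alltoall": 26.839,
    "MP_Wait": 50.545,
    "MP_ISend": 0.741,
    "MP_IRecv": 8.605,
}
TWO_NODES = {
    "MP_Bcast": 0.258,
    "MP_Allreduce": 1546.186,
    "MP_Alltoall": 1507.024,
    "MP_Wait": 513.507,
    "MP_ISend": 1.102,
    "MP_IRecv": 2.739,
}


def slowdown(one_node, two_nodes):
    """Return {routine: two-node time / one-node time}, largest ratio first."""
    ratios = {name: two_nodes[name] / one_node[name] for name in one_node}
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    for name, ratio in slowdown(ONE_NODE, TWO_NODES).items():
        print(f"{name:14s} {ratio:8.1f}x")
```

<p dir="ltr">On these numbers the collectives (MP_Allreduce, MP_Alltoall) degrade far more than the point-to-point MP_ISend and MP_IRecv calls, so the collectives are probably the first place to look when checking whether the interconnect is actually being used.</p>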