<div dir="ltr">Thank you so much. It is a bit difficult because I did not set up this machine and do not have root access, but I know it is a mess. I backed up to just try the HPL benchmark.<div>I am finding 100 GFLOPS one node performance on N=2000 and 16 cores, and 1.5 GFLOPS using two nodes, 8 cores per node. So there is definately something really wrong. I need to getthis working before I can worry about threads or cp2k.</div><div>Was that a caret in your command above:</div><div><br></div><div><span style="font-family:monospace,monospace;font-size:12.8px">mpirun --mca btl ^tcp</span><br></div><div><span style="font-family:monospace,monospace;font-size:12.8px"><br></span></div><div><span style="font-family:monospace,monospace;font-size:12.8px">?</span></div><div><span style="font-family:monospace,monospace;font-size:12.8px"><br></span></div><div><font face="monospace, monospace"><span style="font-size:12.8px">I looked through my openmpi build and it seems to have found the infiniband includes such as they exist on the machine, but I could not the expected mxm or Mellanox drivers anywhere on the machine. </span></font></div><div><font face="monospace, monospace"><span style="font-size:12.8px"><br></span></font></div><div><font face="monospace, monospace"><span style="font-size:12.8px">I am CCing Peter Fox, the person who volunteers his time for this machine, and who has root access!</span></font></div><div><font face="monospace, monospace"><span style="font-size:12.8px"><br></span></font></div><div><font face="monospace, monospace"><span style="font-size:12.8px">Sincerely,</span></font></div><div><font face="monospace, monospace"><span style="font-size:12.8px"><br></span></font></div><div><font face="monospace, monospace"><span style="font-size:12.8px">Ron</span></font></div><div><font face="monospace, monospace"><span style="font-size:12.8px"><br></span></font></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature">---<br>Ronald Cohen<br>Geophysical Laboratory<br>Carnegie Institution<br>5251 Broad Branch Rd., N.W.<br>Washington, D.C. 20015<br><a href="mailto:rco...@carnegiescience.edu" target="_blank">rco...@carnegiescience.edu</a><br>office: 202-478-8937<br>skype: ronaldcohen<br><a href="https://twitter.com/recohen3" target="_blank">https://twitter.com/recohen3</a><br><a href="https://www.linkedin.com/profile/view?id=163327727" target="_blank">https://www.linkedin.com/profile/view?id=163327727</a><br></div></div>
<br><div class="gmail_quote">On Tue, Mar 22, 2016 at 10:32 AM, Glen MacLachlan <span dir="ltr"><<a href="mailto:mac...@gwu.edu" target="_blank">mac...@gwu.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi Ron, </div><div><br></div>There's a chance that OpenMPI wasn't configured to use IB properly. Why don't you disable tcp and see if you are using IB? It's easy<blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><font face="monospace, monospace">mpirun --mca btl ^tcp ...</font></div></blockquote><div><br></div><div>Regarding OpenMP:</div><div>I'm not sure we're converging on the same discussion anymore but setting OMP_NUM_THREADS=1 does <u>not</u> disable multithreading overhead -- you need to compile without the fopenmp to get a measure of true single thread performance. </div><div><br></div><div class="gmail_extra"><br clear="all"><div><div><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><span style="color:rgb(0,0,0);font-size:small">Best,</span><br></div><div><span style="font-size:small"><font color="#000000">Glen</font></span></div><div><span style="color:rgb(7,55,99);font-size:small"><br></span></div><div><span style="color:rgb(7,55,99);font-size:small">==========================================</span></div><div><div style="font-size:13px"><font color="#073763">Glen MacLachlan, PhD</font></div></div><div style="font-size:13px"><div><font color="#073763"><i>HPC Specialist </i><i>for Physical Sciences &<br></i></font></div><div><font color="#073763"><i>Professorial Lecturer, Data Sciences<br></i></font></div></div></div></div></div></div></div></div></div></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><div><div><div><div><div><div><div><div><div style="font-size:13px"><font color="#073763">Office of Technology Services</font></div></div></div></div></div></div></div></div></div></div><div><div><div><div><div><div><div><div><div><div style="font-size:13px"><font color="#073763">The George Washington University</font></div></div></div></div></div></div></div></div></div></div><div><div><div><div><div><div><div><div><div><div style="font-size:13px"><font color="#073763">725 21st Street</font></div></div></div></div></div></div></div></div></div></div><div><div><div><div><div><div><div><div><div><div style="font-size:13px"><font color="#073763">Washington, DC 20052</font></div></div></div></div></div></div></div></div></div></div><div><div><div><div><div><div><div><div><div><div style="font-size:13px"><font color="#073763">Suite 211, Corcoran Hall</font></div></div></div></div></div></div></div></div></div></div></blockquote><span style="color:rgb(7,55,99);font-size:small">==========================================</span><br><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><div style="font-size:13px"><div><br></div></div></div></blockquote><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><span style="color:rgb(102,102,102)"><span style="border-collapse:collapse"><div style="font-size:13px;font-family:arial,sans-serif"><div><br></div></div></span></span></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div>
<br><div class="gmail_quote">On Mon, Mar 21, 2016 at 5:05 PM, Ronald Cohen <span dir="ltr"><<a href="mailto:rco...@carnegiescience.edu" target="_blank">rco...@carnegiescience.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="auto"><div>According to my experience in general, or the cp2k web pages in particular that is not the case. Please see the performance page for cp2k. The problem I am sure now is with the openmpi build not using the proper infiniband libraries or drivers.</div><div><br></div><div>Thank you!</div><div><br></div><div>Ron</div><div><br>Sent from my iPad</div><div><div><div><br>On Mar 21, 2016, at 5:36 PM, Glen MacLachlan <<a href="mailto:mac...@gwu.edu" target="_blank">mac...@gwu.edu</a>> wrote:<br><br></div><blockquote type="cite"><div><p dir="ltr">It's hard to talk about the performance when you set OMP_NUM_THREADS = 1 because there is so much overhead associated with OpenMP that launching 1 thread almost always is a performance killer. In fact, OMP_NUM_THREADS=1 never rivals single-threaded performance-wise because of that overhead. No one ever sets OMP_NUM_THREADS=1 unless they are playing around...We never do that in production jobs. How about when you scale up to 4 or 8 threads? </p>
<p dir="ltr">Glen</p>
<p dir="ltr">P.S. I see you're in DC...so am I. I support CP2K for the chemists at GWU. Hope you aren't using Metro to get around the DMV :p</p>
<div class="gmail_quote">On Mar 21, 2016 5:11 PM, "Cohen, Ronald" <<a href="mailto:rco...@carnegiescience.edu" target="_blank">rco...@carnegiescience.edu</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Yes I am using hybrid mode. But even if I set OMP_NUM_THREADS=1 performance is terrible.</div><div class="gmail_extra"><br clear="all"><div><div>---<br>Ronald Cohen<br>Geophysical Laboratory<br>Carnegie Institution<br>5251 Broad Branch Rd., N.W.<br>Washington, D.C. 20015<br><a href="mailto:rco...@carnegiescience.edu" target="_blank">rco...@carnegiescience.edu</a><br>office: <a href="tel:202-478-8937" value="+12024788937" target="_blank">202-478-8937</a><br>skype: ronaldcohen<br><a href="https://twitter.com/recohen3" target="_blank">https://twitter.com/recohen3</a><br><a href="https://www.linkedin.com/profile/view?id=163327727" target="_blank">https://www.linkedin.com/profile/view?id=163327727</a><br></div></div>
<br><div class="gmail_quote">On Mon, Mar 21, 2016 at 5:04 PM, Glen MacLachlan <span dir="ltr"><<a href="mailto:mac...@gwu.edu" target="_blank">mac...@gwu.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><p dir="ltr">Are you conflating MPI with OpenMP? OMP_NUM_THREADS sets the number of threads used by OpenMP and OpenMP doesn't work on a distributed memory environment unless you piggyback on MPI which would be a hybrid use and I'm not sure CP2K ever worked optimally in hybrid mode or at least that's what I've gotten from reading the comments on the source code. </p>
<p dir="ltr">As for MPI, are you sure your MPI stack was compiled with IB bindings? I had similar issues and the problem was that I wasn't actually using IB. If you can, disable eth and leave only IB and see what happens.</p>
<p dir="ltr">Glen </p>
<div class="gmail_quote">On Mar 21, 2016 4:48 PM, "Ronald Cohen" <<a href="mailto:rco...@carnegiescience.edu" target="_blank">rco...@carnegiescience.edu</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div style="font-family:arial,sans-serif;font-size:12.8px">On the dco machine deepcarbon I find decent single node mpi performnace, but running on the same number of processors across two nodes is terrible, even with the infiniband interconect. This is the cp2k H2O-64 benchmark:</div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px"> </div><div style="font-family:arial,sans-serif;font-size:12.8px">On 16 cores on 1 node: total time 530 seconds</div><div style="font-family:arial,sans-serif;font-size:12.8px"><div> SUBROUTINE CALLS ASD SELF TIME TOTAL TIME</div><div> MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM</div><div> CP2K 1 1.0 0.015 0.019 530.306 530.306</div></div><div style="font-family:arial,sans-serif;font-size:12.8px"><div> - -</div><div> - MESSAGE PASSING PERFORMANCE -</div><div> - -</div><div> -------------------------------------------------------------------------------</div><div><br></div><div> ROUTINE CALLS TOT TIME [s] AVE VOLUME [Bytes] PERFORMANCE [MB/s]</div><div> MP_Group 5 0.000</div><div> MP_Bcast 4103 0.029 44140. 6191.05</div><div> MP_Allreduce 21860 7.077 263. 0.81</div><div> MP_Gather 62 0.008 320. 2.53</div><div> MP_Sync 54 0.001</div><div> MP_Alltoall 19407 26.839 648289. 468.77</div><div> MP_ISendRecv 21600 0.091 94533. 22371.25</div><div> MP_Wait 238786 50.545</div><div> MP_comm_split 50 0.004</div><div> MP_ISend 97572 0.741 239205. 31518.68</div><div> MP_IRecv 97572 8.605 239170. 2711.98</div><div> MP_Memory 167778 45.018</div><div> -------------------------------------------------------------------------------</div></div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px">on 16 cores on 2 nodes: total time 5053 seconds !!</div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px"><div>SUBROUTINE CALLS ASD SELF TIME TOTAL TIME</div><div> MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM</div><div> CP2K 1 1.0 0.311 0.363 5052.904 5052.909</div></div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px"><div>-------------------------------------------------------------------------------</div><div> - -</div><div> - MESSAGE PASSING PERFORMANCE -</div><div> - -</div><div> -------------------------------------------------------------------------------</div><div><br></div><div> ROUTINE CALLS TOT TIME [s] AVE VOLUME [Bytes] PERFORMANCE [MB/s]</div><div> MP_Group 5 0.000</div><div> MP_Bcast 4119 0.258 43968. 700.70</div><div> MP_Allreduce 21892 1546.186 263. 0.00</div><div> MP_Gather 62 0.049 320. 0.40</div><div> MP_Sync 54 0.071</div><div> MP_Alltoall 19407 1507.024 648289. 8.35</div><div> MP_ISendRecv 21600 0.104 94533. 19656.44</div><div> MP_Wait 238786 513.507</div><div> MP_comm_split 50 4.096</div><div> MP_ISend 97572 1.102 239206. 21176.09</div><div> MP_IRecv 97572 2.739 239171. 
8520.75</div><div> MP_Memory 167778 18.845</div><div> -------------------------------------------------------------------------------</div></div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px">Any ideas? The code was built with the latest gfortran and I built all of the dependencies, using this arch file.</div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><div style="font-family:arial,sans-serif;font-size:12.8px"><div style="font-size:12.8px">CC = gcc</div><div style="font-size:12.8px">CPP =</div><div style="font-size:12.8px">FC = mpif90</div><div style="font-size:12.8px">LD = mpif90</div><div style="font-size:12.8px">AR = ar -r</div><div style="font-size:12.8px">PREFIX = /home/rcohen</div><div style="font-size:12.8px">FFTW_INC = $(PREFIX)/include</div><div style="font-size:12.8px">FFTW_LIB = $(PREFIX)/lib</div><div style="font-size:12.8px">LIBINT_INC = $(PREFIX)/include</div><div style="font-size:12.8px">LIBINT_LIB = $(PREFIX)/lib</div><div style="font-size:12.8px">LIBXC_INC = $(PREFIX)/include</div><div style="font-size:12.8px">LIBXC_LIB = $(PREFIX)/lib</div><div style="font-size:12.8px">GCC_LIB = $(PREFIX)/gcc-trunk/lib</div><div style="font-size:12.8px">GCC_LIB64 = $(PREFIX)/gcc-trunk/lib64</div><div style="font-size:12.8px">GCC_INC = $(PREFIX)/gcc-trunk/include</div><div style="font-size:12.8px">DFLAGS = -D__FFTW3 -D__LIBINT -D__LIBXC2\</div><div style="font-size:12.8px"> -D__LIBINT_MAX_AM=7 -D__LIBDERIV_MAX_AM1=6 -D__MAX_CONTR=4\</div><div style="font-size:12.8px"> -D__parallel -D__SCALAPACK -D__HAS_smm_dnn -D__ELPA3 </div><div style="font-size:12.8px">CPPFLAGS =</div><div style="font-size:12.8px">FCFLAGS = $(DFLAGS) -O2 -ffast-math -ffree-form -ffree-line-length-none\</div><div style="font-size:12.8px"> -fopenmp -ftree-vectorize -funroll-loops\</div><div style="font-size:12.8px"> -mtune=native \</div><div style="font-size:12.8px"> -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC) -I$(MKLROOT)/include \</div><div style="font-size:12.8px"> -I$(GCC_INC) -I$(PREFIX)/include/elpa_openmp-2015.11.001/modules</div><div style="font-size:12.8px">LIBS = \</div><div style="font-size:12.8px"> $(PREFIX)/lib/libscalapack.a $(PREFIX)/lib/libsmm_dnn_sandybridge-2015-11-10.a \</div><div style="font-size:12.8px"> $(FFTW_LIB)/libfftw3.a\</div><div style="font-size:12.8px"> $(FFTW_LIB)/libfftw3_threads.a\</div><div style="font-size:12.8px"> $(LIBXC_LIB)/libxcf90.a\</div><div style="font-size:12.8px"> $(LIBXC_LIB)/libxc.a\</div><div style="font-size:12.8px"> $(PREFIX)/lib/liblapack.a $(PREFIX)/lib/libtmglib.a $(PREFIX)/lib/libgomp.a \</div><div style="font-size:12.8px"> $(PREFIX)/lib/libderiv.a $(PREFIX)/lib/libint.a -lelpa_openmp -lgomp -lopenblas</div><div style="font-size:12.8px">LDFLAGS = $(FCFLAGS) -L$(GCC_LIB64) -L$(GCC_LIB) -static-libgfortran -L$(PREFIX)/lib </div><div><br></div><div>It was run with <span style="font-size:12.8px">OMP_NUM_THREADS=2 on the two nodes</span> and OMP_NUM_THREADS=1 on the one node.</div><div>Running with OMP_NUM_THREADS=1 on two nodes .</div><div><br></div><div>I am now checking whether <span style="font-size:12.8px">OMP_NUM_THREADS=1 on two nodes is faster than </span><span style="font-size:12.8px">OMP_NUM_THREADS=2 , but I do not think so.</span></div><div><br></div><div>Ron Cohen</div><div><br></div></div><div style="font-family:arial,sans-serif;font-size:12.8px"><br></div><span><font color="#888888"><div><br></div></font></span></div><span><font color="#888888">
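A minimal sketch of how that two-node, OMP_NUM_THREADS=1 check might be launched with explicit placement and binding (assuming Open MPI's --hostfile/--map-by/--bind-to options and the cp2k.psmp hybrid binary; the hostfile and node names are placeholders):

    # hostfile "hosts" contents (node names are placeholders):
    #   node1 slots=8
    #   node2 slots=8

    # 8 MPI ranks per node on 2 nodes, one OpenMP thread per rank
    export OMP_NUM_THREADS=1
    mpirun -np 16 --hostfile hosts --map-by ppr:8:node --bind-to core \
           ./cp2k.psmp -i H2O-64.inp -o H2O-64.out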