<div dir="ltr"><div>Thanks for that Tiziano,</div><div>I'll give it a go later today.</div><div>kind regards,</div><div>Chris<br><br>On Friday, 31 May 2019 09:24:51 UTC+1, Tiziano Müller  wrote:</div><blockquote class="gmail_quote" style="margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">Hi Chris,
<br>
<br>An arch file for CP2K with P100 GPUs can be found as part of the
<br>regtester output from Piz Daint here:
<br>
<br><a onmousedown="this.href='https://www.google.com/url?q\x3dhttps%3A%2F%2Fwww.cp2k.org%2Fstatic%2Fregtest%2Ftrunk%2Fcscs-daint-xc50_gpu%2FCRAY_XC50-gfortran_gpu.psmp.out\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEV3E1CGYSCg7z-ZdV_-UzUvyDJKg';return true;" onclick="this.href='https://www.google.com/url?q\x3dhttps%3A%2F%2Fwww.cp2k.org%2Fstatic%2Fregtest%2Ftrunk%2Fcscs-daint-xc50_gpu%2FCRAY_XC50-gfortran_gpu.psmp.out\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEV3E1CGYSCg7z-ZdV_-UzUvyDJKg';return true;" href="https://www.cp2k.org/static/regtest/trunk/cscs-daint-xc50_gpu/CRAY_XC50-gfortran_gpu.psmp.out" target="_blank" rel="nofollow">https://www.cp2k.org/static/<wbr>regtest/trunk/cscs-daint-xc50_<wbr>gpu/CRAY_XC50-gfortran_gpu.<wbr>psmp.out</a>
<br>
<br>Those outputs are usually available from here:
<br>
<br>  <a onmousedown="this.href='https://www.google.com/url?q\x3dhttps%3A%2F%2Fdashboard.cp2k.org%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHgqILIfQ8_hWhJX0qYPMTOuEJwCQ';return true;" onclick="this.href='https://www.google.com/url?q\x3dhttps%3A%2F%2Fdashboard.cp2k.org%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHgqILIfQ8_hWhJX0qYPMTOuEJwCQ';return true;" href="https://dashboard.cp2k.org/" target="_blank" rel="nofollow">https://dashboard.cp2k.org/</a>
<br>
<br>
<br>(click the link in the Status column)
<br>
<br>Best regards,
<br>Tiziano
<br>
<br>
<br>On 30.05.19 at 10:15, CNelson wrote:
<br>> Hi Both,
<br>> would it be possible to get a copy of the ARCH file you used to build
<br>> CP2K with the new V100 GPUs?
<br>> cheers,
<br>> Chris.
<br>> 
<br>> On Sunday, 4 November 2018 21:06:12 UTC, Alfio Lazzaro wrote:
<br>> 
<br>>     OK, the best approach is to attach the arch file, the input file,
<br>>     and the output that you got from CP2K.
<br>>     The only GPU-accelerated part of CP2K is DBCSR, so it may be that you
<br>>     are bound by something else.
<br>> 
<br>>     I agree with you that the reoptimization is not that important at
<br>>     this stage...
<br>> 
<br>>     Alfio
<br>> 
<br>> 
<br>>     On Sunday, 4 November 2018 19:13:02 UTC+1, <a>fo...@gmail.com</a>
<br>>     wrote:
<br>> 
<br>>         Thanks Alfio for the response.
<br>> 
<br>>         Yes, 8 V100 GPUs is extreme. The test I used takes around
<br>>         500 seconds on a system with Intel Skylake Gold 6148 CPUs, 40
<br>>         cores (20 cores/socket). Do you think this test is not large
<br>>         enough to run on GPUs? If so, can you recommend a test from the
<br>>         CP2K tests folder?
<br>> 
<br>>         I also tried runs with 1 and 2 V100 GPUs. Both were slower
<br>>         than the run with 8 V100 GPUs.
<br>> 
<br>>         CP2K was able to recognize all 8 GPUs, as per "DBCSR| ACC:
<br>>         Number of devices/node".
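<br>
<br>The check described above can be automated. The following is a minimal sketch, assuming the device count appears after the "DBCSR| ACC: Number of devices/node" label exactly as quoted in this thread (possibly wrapped onto the next line). The function name and the sample text are hypothetical, not part of CP2K.

```python
import re

# Pattern for the line quoted in this thread. Treating the whitespace
# (including a possible line wrap) before the number as flexible is an assumption.
DEVICES_RE = re.compile(r"DBCSR\|\s*ACC:\s*Number of devices/node\s+(\d+)")


def devices_per_node(output_text):
    """Return the device count reported by DBCSR, or None if the line is absent."""
    match = DEVICES_RE.search(output_text)
    return int(match.group(1)) if match else None


# Hypothetical excerpt from the start of a CP2K output file.
sample = """
 DBCSR| ACC: Number of devices/node
                                8
"""

print(devices_per_node(sample))  # 8
```

<br>If the function returns None, the line was not found, which would suggest that the binary was built without GPU support or that the output format differs from the one assumed here.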
<br>> 
<br>>         I tried reoptimizing the kernels for the V100, but could not
<br>>         determine which block size values have to be passed to the
<br>>         tune.py script.
<br>> 
<br>>         CP2K-6.1 already has optimized kernel parameters for the P100,
<br>>         yet even a run with 2 P100 GPUs was slower than the CPU-only
<br>>         benchmark.
<br>> 
<br>>         On Sunday, November 4, 2018 at 2:33:11 PM UTC+5:30, Alfio
<br>>         Lazzaro wrote:
<br>> 
<br>>             You may take a look at this issue on
<br>>             GitHub: <a onmousedown="this.href='https://www.google.com/url?q\x3dhttps%3A%2F%2Fgithub.com%2Fcp2k%2Fcp2k%2Fissues%2F73\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHay9GWtMchhngkD6hFPONyIcg7Ww';return true;" onclick="this.href='https://www.google.com/url?q\x3dhttps%3A%2F%2Fgithub.com%2Fcp2k%2Fcp2k%2Fissues%2F73\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHay9GWtMchhngkD6hFPONyIcg7Ww';return true;" href="https://github.com/cp2k/cp2k/issues/73" target="_blank" rel="nofollow">https://github.com/<wbr>cp2k/cp2k/issues/73</a>
<br>> 
<br>>             In your particular case, a setup with 8 V100s is pretty
<br>>             extreme and would require a large computation. Which test
<br>>             are you using for benchmarking?
<br>> 
<br>>             Your setup of 8 ranks with 5 threads each should be OK. CP2K
<br>>             assigns ranks to GPUs in a round-robin manner, so in
<br>>             your case there is one rank talking to each GPU.
<br>>             We don't have much experience with multi-GPU nodes, so I
<br>>             would suggest running a scalability test with 1 rank,
<br>>             2 ranks, ... 8 ranks (always 5 threads) to check how the
<br>>             performance scales. BTW, make sure CP2K is able to recognize
<br>>             all 8 GPUs by checking the following output at the beginning:
<br>> 
<br>>              DBCSR| ACC: Number of devices/node                         
<br>>                               1
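<br>
<br>The round-robin assignment and the suggested scaling test can be sketched as follows. This is only an illustration under stated assumptions: "round-robin" is read here as rank modulo number of GPUs, which is not taken from the CP2K source, and the mpirun launcher, the input file name and the output file names are hypothetical placeholders to adapt to your system.

```python
def round_robin_gpu(rank, n_gpus):
    """Assumed round-robin rule: rank r talks to GPU r modulo n_gpus."""
    return rank % n_gpus


def scaling_commands(max_ranks=8, threads=5, n_gpus=8, input_file="benchmark.inp"):
    """Build one launch command per rank count, from 1 up to max_ranks.

    The launcher (mpirun) and all file names are placeholders, not values
    taken from the thread.
    """
    runs = []
    for n_ranks in range(1, max_ranks + 1):
        # Which GPU each rank would use under the assumed rule.
        mapping = [round_robin_gpu(rank, n_gpus) for rank in range(n_ranks)]
        command = (
            f"OMP_NUM_THREADS={threads} mpirun -np {n_ranks} "
            f"cp2k.psmp -i {input_file} -o scaling_{n_ranks}ranks.out"
        )
        runs.append((n_ranks, mapping, command))
    return runs


for n_ranks, mapping, command in scaling_commands():
    print(f"{n_ranks} rank(s) -> GPUs {mapping}")
    print(f"  {command}")
```

<br>With 8 ranks and 8 GPUs this rule gives each rank its own device. With fewer GPUs than ranks, several ranks would share a device, which is worth keeping in mind when comparing timings across the sweep.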
<br>> 
<br>>             Eventually, you might consider reoptimizing the kernels for
<br>>             the V100, but this is not a priority...
<br>> 
<br>>             Alfio
<br>> 
<br>> 
<br>> 
<br>>             On Saturday, 3 November 2018 07:55:09 UTC+1,
<br>>             <a>fo...@gmail.com</a> wrote:
<br>> 
<br>>                 Hi,
<br>> 
<br>>                 How is the CP2K performance on GPUs in general?
<br>> 
<br>>                 I'm getting very low performance on GPUs (Nvidia V100
<br>>                 SXM2). It is a single-node benchmark with 8 GPUs and
<br>>                 dual Intel Skylake Gold 6148 processors.
<br>> 
<br>>                 The CP2K time on 8 GPUs (CP2K-6.1 psmp version,
<br>>                 ifort-2017, CUDA-9.2, 8 MPI ranks + 5 threads per rank)
<br>>                 is still slower than the CP2K time of the CPU-only benchmark.
<br>> 
<br>>                 For CPU runs, CP2K-6.1 is built with LIBXSMM-1.8.3.
<br>> 
<br>>                 For GPU runs, I have tried both with and without LIBXSMM.
<br>>                 There is no performance difference, and both builds are
<br>>                 still slower than the CPU-only benchmark even
<br>>                 when using all 8 GPUs and all 40 CPU cores. Can
<br>>                 someone please share their experience with CP2K
<br>>                 performance on GPUs?
<br>> 
<br>>                 The CUDA-specific DFLAGS used are: -D__ACC -D__DBCSR_ACC
<br>>                 -D__PW_CUDA.
<br>> 
<br>
<br>-- 
<br>Tiziano Müller
<br>University of Zurich
<br>Department of Chemistry
<br>Winterthurerstrasse 190
<br>CH-8057 Zürich
<br>
<br>Tel: +41 44 63 54234
<br><a onmousedown="this.href='http://www.google.com/url?q\x3dhttp%3A%2F%2Fwww.chem.uzh.ch\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEaSd80M-dQwlA8q5GEKkKY5wRCiQ';return true;" onclick="this.href='http://www.google.com/url?q\x3dhttp%3A%2F%2Fwww.chem.uzh.ch\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEaSd80M-dQwlA8q5GEKkKY5wRCiQ';return true;" href="http://www.chem.uzh.ch" target="_blank" rel="nofollow">www.chem.uzh.ch</a>
<br><a onmousedown="this.href='javascript:';return true;" onclick="this.href='javascript:';return true;" href="javascript:" target="_blank" rel="nofollow" gdf-obfuscated-mailto="0iRqnhQgCQAJ">tiz...@chem.uzh.ch</a>
<br></blockquote></div>