<div dir="ltr">Nico, Juerg,<div><br></div><div>Thank you. That all makes sense.</div><div><br></div><div>Best regards,</div><div>Jerry<br><br>On Friday, May 4, 2018 at 5:47:54 AM UTC-4, Nico Holmberg wrote:<blockquote class="gmail_quote" style="margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;"><div dir="ltr">Hi Jerry,<br><br>Just to expand on Juerg's answer, it appears that you are using the OT method together with the FULL_SINGLE_INVERSE <a href="https://manual.cp2k.org/trunk/CP2K_INPUT/FORCE_EVAL/DFT/SCF/OT.html#list_PRECONDITIONER" target="_blank" rel="nofollow" onmousedown="this.href='https://www.google.com/url?q\x3dhttps%3A%2F%2Fmanual.cp2k.org%2Ftrunk%2FCP2K_INPUT%2FFORCE_EVAL%2FDFT%2FSCF%2FOT.html%23list_PRECONDITIONER\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNFdZ91DJVf4MKF1GWu385w5txrGlQ';return true;" onclick="this.href='https://www.google.com/url?q\x3dhttps%3A%2F%2Fmanual.cp2k.org%2Ftrunk%2FCP2K_INPUT%2FFORCE_EVAL%2FDFT%2FSCF%2FOT.html%23list_PRECONDITIONER\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNFdZ91DJVf4MKF1GWu385w5txrGlQ';return true;">preconditioner</a>. ELPA won't accelerate such simulations because no direct diagonalization is performed during the SCF run.<br><br>If you swap the preconditioner to FULL_ALL, you should notice a ~50 % improvement in the <b>first step</b> of each inner SCF loop iteration when you use ELPA instead of ScaLAPACK. Furthermore, if you switch from OT to a standard diagonalization-based SCF solver, each SCF iteration should be accelerated by the same factor. I would, however, recommend that you always use OT if the system is suitable for the method.<br><br><br>BR,<br><br>Nico <br><br>On Thursday, May 3, 2018 at 7:24:34 PM UTC+3, jgh wrote:<blockquote class="gmail_quote" style="margin:0;margin-left:0.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi
<br>
<br>How much time is spent on diagonalization in these runs?
<br>I would guess it is a rather small fraction. We developed the OT
<br>method especially to avoid diagonalization as much as possible.
<br>If this is in fact the case, no speedup can be expected by using
<br>ELPA.
<br>
<br>best regards
<br>
<br>Juerg
<br>------------------------------<wbr>------------------------------<wbr>--
<br>Juerg Hutter Phone : ++41 44 635 4491
<br>Institut für Chemie C FAX : ++41 44 635 6838
<br>Universität Zürich E-mail: <a rel="nofollow">hut...@chem.uzh.ch</a>
<br>Winterthurerstrasse 190
<br>CH-8057 Zürich, Switzerland
<br>------------------------------<wbr>------------------------------<wbr>---
<br>
<br>-----<a rel="nofollow">cp...@googlegroups.com</a> wrote: -----
<br>To: cp2k <<a rel="nofollow">cp...@googlegroups.com</a>>
<br>From: Jerry Tanoury
<br>Sent by: <a rel="nofollow">cp...@googlegroups.com</a>
<br>Date: 05/03/2018 06:15PM
<br>Subject: [CP2K:10266] Re: ELPA speed-up with Intel-compiled code
<br>
<br>Dear Alfio,
<br>My apologies for not attaching these at the beginning. The input and output files are now attached. Please note that the file names (coordinate files, etc.) have been changed in the input and output files for proprietary reasons. Also, the ELPA run was killed rather soon after it began because no speed-up was observed.
<br>
<br>Best regards,
<br>Jerry
<br>
<br>On Wednesday, May 2, 2018 at 2:57:12 PM UTC-4, Jerry Tanoury wrote:
<br>Dear forum,
<br>I am running CP2K version 5.1 compiled with the Intel 2017 update 5 compilers and the corresponding MKL libraries. The arch file is attached. Everything runs as expected. I then built ELPA-2016.05.004 as shown below and built an ELPA-enabled CP2K version according to the attached arch file:
<br>
<br> ../configure --prefix=/cluster/home/<wbr>tanoury/CP2K/intelbuilt_<wbr>packages/2017u5/elpa-2016.05.<wbr>004 FC=mpiifort FCFLAGS=-O2 -xHost CC=mpiicc CFLAGS=-O2 -xHost --enable-option-checking=fatal --enable-static=yes --enable-avx2=no --enable-avx=no SCALAPACK_LDFLAGS=-L/cluster/<wbr>home/tanoury/intel/2017u5/<wbr>compilers_and_libraries_2017.<wbr>5.239/linux/mkl/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -Wl,-rpath,/cluster/home/<wbr>tanoury/intel/2017u5/<wbr>compilers_and_libraries_2017.<wbr>5.239/linux/mkl/lib/intel64 SCALAPACK_FCFLAGS=-L/cluster/<wbr>home/tanoury/intel/2017u5/<wbr>compilers_and_libraries_2017.<wbr>5.239/linux/mkl/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -I/cluster/home/tanoury/intel/<wbr>2017u5/compilers_and_<wbr>libraries_2017.5.239/linux/<wbr>mkl/include/intel64/lp64
<br>
<br>When doing a speed test on 80 cores, I saw no speed-up from ELPA. Is this unexpected? Did I build ELPA correctly? Perhaps I need to run on hundreds of cores.
<br>
<br>Thank you for the help,
<br>Jerry
<br>
<br>
<br>
<br>
<br>[attachment "CP2K_Test.inp" removed by Jürg Hutter/at/UZH]
<br>[attachment "CP2K_Test1-ELPA.inp" removed by Jürg Hutter/at/UZH]
<br>[attachment "CP2K_Test1-ELPA.output" removed by Jürg Hutter/at/UZH]
<br></blockquote></div></blockquote></div></div>
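<div dir="ltr">The change Nico describes above can be sketched as the following CP2K input fragment. This is only a minimal sketch: it assumes the CP2K 5.1 keyword names PREFERRED_DIAG_LIBRARY (in the GLOBAL section) and PRECONDITIONER (in the OT section), and it is not taken from Jerry's input files, which were removed from the thread. All other settings are left out and would stay as in the existing input.</div>
<pre>
&amp;GLOBAL
  ! Ask CP2K to use ELPA instead of ScaLAPACK for diagonalization.
  ! This only has an effect if the binary was built with ELPA support.
  PREFERRED_DIAG_LIBRARY ELPA
&amp;END GLOBAL

&amp;FORCE_EVAL
  &amp;DFT
    &amp;SCF
      &amp;OT
        ! FULL_ALL builds the preconditioner via a diagonalization,
        ! which is the step ELPA can accelerate.
        PRECONDITIONER FULL_ALL
      &amp;END OT
    &amp;END SCF
  &amp;END DFT
&amp;END FORCE_EVAL
</pre>
<div dir="ltr">Comparing the timing report at the end of the ScaLAPACK and ELPA outputs should then show whether diagonalization takes a large enough share of the run time for ELPA to matter, which is the question Juerg raises.</div>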