<div>As Juerg said, it should be faster with fewer MPI processes but more OpenMP threads. <br /></div><div><br /></div><div>I ran your input with 96bff0e and get a non-divergent temperature; see the attached files. <br /></div><div><br /></div><div><br /></div><div>        77           38.500000         0.299839326       330.476990188     -1103.969230863     -1103.657189273        17.857932705<br />
        78           39.000000         0.307346340       338.751073029     -1103.977498885     -1103.657531080        18.164155295<br />
        79           39.500000         0.314567900       346.710533754     -1103.983238691     -1103.657830726        17.741509376<br />
        80           40.000000         0.317580114       350.030536001     -1103.986168791     -1103.658073710        18.011582585<br />
        81           40.500000         0.317927662       350.413597467     -1103.986493519     -1103.658253882        17.798200017<br />
        82           41.000000         0.316336645       348.660009515     -1103.984850934     -1103.658375196        18.344547175<br />
        83           41.500000         0.313430498       345.456911738     -1103.982150123     -1103.658446039        17.976945195<br />
        84           42.000000         0.310384529       342.099704129     -1103.979372481     -1103.658476537        17.793025376<br />
        85           42.500000         0.308448632       339.965996766     -1103.977407540     -1103.658479115        18.275814425<br />
        86           43.000000         0.307327612       338.730430643     -1103.976918117     -1103.658470852        18.830695687<br />
        87           43.500000         0.309050165       340.628994637     -1103.978149971     -1103.658471897        18.735918805<br />
        88           44.000000         0.311776515       343.633923974     -1103.980768492     -1103.658492562        18.843529105<br />
        89           44.500000         0.314976909       347.161335500     -1103.983874737     -1103.658524800        17.970038306<br />
        90           45.000000         0.318084374       350.586321635     -1103.986269644     -1103.658549288        17.953601286<br />
        91           45.500000         0.318574343       351.126356036     -1103.986913690     -1103.658546746        18.015417816<br />
        92           46.000000         0.317332281       349.757380146     -1103.985352951     -1103.658512315        17.927721675<br />
        93           46.500000         0.313819386       345.885535701     -1103.981876150     -1103.658460342        17.877082366<br />
        94           47.000000         0.309418485       341.034950355     -1103.977405356     -1103.658419075        17.874089465<br />
        95           47.500000         0.305410022       336.616901517     -1103.973131047     -1103.658418859        17.901049895<br />
        96           48.000000         0.302510382       333.420975189     -1103.970125214     -1103.658479706        17.981440485<br />
        97           48.500000         0.301197664       331.974123573     -1103.969074798     -1103.658605806        17.807046037<br />
        98           49.000000         0.302529523       333.442072165     -1103.970206549     -1103.658788096        17.922524827<br />
        99           49.500000         0.305364537       336.566768459     -1103.973353332     -1103.659013102        18.066427635<br />
       100           50.000000         0.309864212       341.526221524     -1103.978088370     -1103.659269171        18.189648336<br /></div><div><br /></div><div>I think your problem is that you compile your own gcc but use openmpi and openblas from the system, a combination that is prone to errors. 
<br /></div><div><br /></div><div>best, Johann<br /></div><br /><div class="gmail_quote"><div dir="auto" class="gmail_attr">On Tuesday, August 13, 2024 at 10:02:33 AM UTC+2 jgh wrote:<br/></div><blockquote class="gmail_quote" style="margin: 0 0 0 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Hi<div>I forgot to mention that the D4 library allows for OpenMP parallelism. You can probably get</div><div>some speedups by running with 4 or 8 OpenMP threads.</div><div>regards</div><div>JH<br><br></div><div class="gmail_quote"><div dir="auto" class="gmail_attr">On Tuesday, August 13, 2024 at 9:41:23 AM UTC+2 jgh wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi
<br>
<br>CP2K uses an external library for D4. Currently, this library is not running in parallel and the
<br>number of parameters that can be adjusted is minimal. You can change the cutoff radius (D4_CUTOFF)
<br>and the coordination cutoff (D4_CN_CUTOFF) for possible speedups.
<br>In the CP2K internal implementation for D3, there are some additional features used to speed up
<br>the calculation (besides parallelization), e.g. constant C6 terms for the 3-body terms (REFERENCE_C9_TERM).
<br>
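<br>A minimal sketch of where these keywords would go, assuming a CP2K build whose &PAIR_POTENTIAL section accepts them. The keyword names are the ones given above; the numeric values are placeholders for illustration only, and their units and defaults are assumptions to check against the manual for your version.
<br>

```
&VDW_POTENTIAL
  POTENTIAL_TYPE  PAIR_POTENTIAL
  &PAIR_POTENTIAL
    TYPE  DFTD4
    REFERENCE_FUNCTIONAL revPBE
    ! Placeholder values, not recommendations
    D4_CUTOFF     10.0
    D4_CN_CUTOFF  10.0
  &END PAIR_POTENTIAL
&END VDW_POTENTIAL
```

<br>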
<br>regards
<br>JH
<br>
<br>________________________________________
<br>From: <a rel="nofollow">cp...@googlegroups.com</a> <<a rel="nofollow">cp...@googlegroups.com</a>> on behalf of <a rel="nofollow">mayank...@gmail.com</a> <<a rel="nofollow">mayank...@gmail.com</a>>
<br>Sent: Friday, August 9, 2024 6:17 PM
<br>To: cp2k
<br>Subject: [CP2K:20540] DFT-D4 issues
<br>
<br>Hi,
<br>
<br>I am using CP2K-2024.1 (git:96bff0e) installed via toolchain as
<br>
<br>./install_cp2k_toolchain.sh --with-gcc=install --with-openmpi=system --with-openblas=system --with-sirius=no --with-gsl=no --with-spfft=no --with-spla=no --with-spglib=no --with-spla=no --with-dftd4=install
<br>
<br>I was testing the dft-d4 module in CP2K for DFT NVT BOMD of 64 water molecules at 300K. The input file is a modified version of H2O-64.inp, which is attached below. For reference, when using the GGA revPBE-D3(BJ) functional, the simulation runs with no issues.
<br>
<br>598          299.000000         0.299941422       330.589518254     -1104.202651786     -1103.888925173         6.335769739
<br>599          299.500000         0.301541403       332.352985832     -1104.203091753     -1103.888929962         6.343083243
<br>600          300.000000         0.304829862       335.977460517     -1104.206316541     -1103.888980814         6.294001573
<br>601          300.500000         0.309924126       341.592257168     -1104.211543588     -1103.889060265         6.321073636
<br>602          301.000000         0.316551623       348.896953508     -1104.217560779     -1103.889142976         5.964969439
<br>603          301.500000         0.322822897       355.809028278     -1104.223085700     -1103.889206038         6.120740931
<br>604          302.000000         0.327538416       361.006380378     -1104.227137720     -1103.889238373         5.925923496
<br>
<br>The computation time per step here is ~6 s on a 32-core Intel(R) Xeon(R) Silver 4210. The temperature increase of ~60 K is expected because of the change of DFT functional. However, if I switch to DFT-D4, I get:
<br>
<br>77           38.500000         0.862684099       950.833395090     -1103.274695245     -1102.386378043        91.672277213
<br>78           39.000000         0.881930821       972.046751741     -1103.282855444     -1102.374307902        90.513970271
<br>79           39.500000         0.882748826       972.948340710     -1103.287092185     -1102.380456423        90.920328229
<br>80           40.000000         0.896756672       988.387512978     -1103.287828468     -1102.367459596        91.310964988
<br>81           40.500000         0.958453301      1056.388319792     -1103.284416570     -1102.302414268        90.900099128
<br>82           41.000000         1.038365692      1144.466180877     -1103.270000131     -1102.208050576        91.362470460
<br>83           41.500000         1.079534923      1189.842095377     -1103.244176025     -1102.140472954        91.729197830
<br>84           42.000000         1.097126299      1209.230962278     -1103.212461950     -1102.090361488        90.937799281
<br>85           42.500000         1.105902625      1218.904056354     -1103.178147196     -1102.046945536        90.644135027
<br>86           43.000000         1.101998771      1214.601305186     -1103.141896114     -1102.013002638        90.902946438
<br>
<br>The energy diverges, and the computation time is stable at ~91 s per step, which in my experience is more than for the equivalent revPBE0-D3(BJ) ADMM simulation.
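<br>
<br>The comparison above can be checked from the rows themselves. The sketch below assumes the usual column order of a CP2K MD .ener file (step, time in fs, kinetic energy, temperature in K, potential energy, conserved quantity, time per step in s). The helper names parse_ener and summarize are made up for this illustration, and only three rows from each run quoted above are used.
<br>

```python
# Columns assumed (CP2K MD .ener layout): step, time [fs], kinetic energy [a.u.],
# temperature [K], potential energy [a.u.], conserved quantity [a.u.], time per step [s].
D3_ROWS = """
598  299.000000  0.299941422  330.589518254  -1104.202651786  -1103.888925173  6.335769739
601  300.500000  0.309924126  341.592257168  -1104.211543588  -1103.889060265  6.321073636
604  302.000000  0.327538416  361.006380378  -1104.227137720  -1103.889238373  5.925923496
"""
D4_ROWS = """
77  38.500000  0.862684099   950.833395090  -1103.274695245  -1102.386378043  91.672277213
81  40.500000  0.958453301  1056.388319792  -1103.284416570  -1102.302414268  90.900099128
86  43.000000  1.101998771  1214.601305186  -1103.141896114  -1102.013002638  90.902946438
"""


def parse_ener(text):
    """Return one tuple of floats per 7-column data line; skip blanks and '#' headers."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 7 or parts[0].startswith("#"):
            continue
        rows.append(tuple(float(p) for p in parts))
    return rows


def summarize(rows):
    """Conserved-quantity change (last minus first row) and mean time per step."""
    drift = rows[-1][5] - rows[0][5]
    mean_time = sum(r[6] for r in rows) / len(rows)
    return drift, mean_time


d3_drift, d3_time = summarize(parse_ener(D3_ROWS))
d4_drift, d4_time = summarize(parse_ener(D4_ROWS))
print(f"D3: drift {d3_drift:+.6f} a.u., {d3_time:.1f} s/step")
print(f"D4: drift {d4_drift:+.6f} a.u., {d4_time:.1f} s/step")
print(f"time ratio D4/D3: {d4_time / d3_time:.1f}")
```

<br>For these rows the D4 run takes roughly 15 times longer per step and its conserved quantity changes by about 0.37 a.u., while that of the D3 run changes by well under 0.001 a.u.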
<br>
<br>The relevant sections of the input files are:
<br>DFT-D3
<br>
<br>       &VDW_POTENTIAL
<br>         POTENTIAL_TYPE  PAIR_POTENTIAL
<br>         &PAIR_POTENTIAL
<br>           R_CUTOFF     1.0000000000000009E+01
<br>           TYPE  DFTD3(BJ)
<br>           PARAMETER_FILE_NAME ./dftd3.dat
<br>           REFERENCE_FUNCTIONAL revPBE
<br>           CALCULATE_C9_TERM  T
<br>           REFERENCE_C9_TERM  T
<br>           LONG_RANGE_CORRECTION  F
<br>         &END PAIR_POTENTIAL
<br>       &END VDW_POTENTIAL
<br>
<br>DFT-D4
<br>
<br>       &VDW_POTENTIAL
<br>         POTENTIAL_TYPE  PAIR_POTENTIAL
<br>         &PAIR_POTENTIAL
<br>           R_CUTOFF     1.0000000000000009E+01
<br>           !TYPE  DFTD3(BJ)
<br>           TYPE  DFTD4
<br>           !PARAMETER_FILE_NAME ./dftd3.dat
<br>           REFERENCE_FUNCTIONAL revPBE
<br>           !CALCULATE_C9_TERM  T
<br>           !REFERENCE_C9_TERM  T
<br>           LONG_RANGE_CORRECTION  F
<br>         &END PAIR_POTENTIAL
<br>       &END VDW_POTENTIAL
<br>
<br>I haven't used DFT-D4 before, so I am not sure whether I need to enter any additional parameters in the dispersion block. But the 15:1 computation time ratio between DFT-D4 and DFT-D3(BJ) suggests there is some issue in the simulation setup. Can you suggest any changes I need to make to resolve this?
<br>
<br>Best Regards,
<br>Mayank
<br>
<br>
<br>--
<br>You received this message because you are subscribed to the Google Groups "cp2k" group.
<br>To unsubscribe from this group and stop receiving emails from it, send an email to <a rel="nofollow">cp2k+uns...@googlegroups.com</a>.
<br>To view this discussion on the web visit <a href="https://groups.google.com/d/msgid/cp2k/49eaaddb-c3f9-45ed-ac49-c91bdf6c73d7n%40googlegroups.com" rel="nofollow" target="_blank" data-saferedirecturl="https://www.google.com/url?hl=en&q=https://groups.google.com/d/msgid/cp2k/49eaaddb-c3f9-45ed-ac49-c91bdf6c73d7n%2540googlegroups.com&source=gmail&ust=1723720190053000&usg=AOvVaw1N79yL0FvXqyKb57dJJKlI">https://groups.google.com/d/msgid/cp2k/49eaaddb-c3f9-45ed-ac49-c91bdf6c73d7n%40googlegroups.com</a>.
<br></blockquote></div></blockquote></div>

<p></p>
