[CP2K-user] massive slowdown in SCF timings during MD

Sean Fischer safisc... at gmail.com
Thu Oct 17 11:09:49 UTC 2019


The issue I was having appears to be tied to the version of OpenMPI that I 
was using. I'm not sure if it is just my particular build of OpenMPI 
v1.10.2 or if there is a bug somewhere, but moving to more recent versions 
of OpenMPI appears to have eliminated the increasing time per step.
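
For anyone who wants to check which OpenMPI version their own runs pick up, here is a minimal sketch. It assumes the launcher is called `mpirun` and that `mpirun --version` prints a line of the form `mpirun (Open MPI) X.Y.Z`; the helper names are made up for illustration. The only version it compares against is the 1.10.2 build mentioned above, since I have not pinned down which later release first behaves correctly.

```python
import re
import shutil
import subprocess

# The one version reported in this thread as showing the slowdown.
REPORTED_SLOW_VERSION = (1, 10, 2)


def parse_openmpi_version(text):
    """Return (major, minor, patch) from `mpirun --version` style output, or None.

    Assumes the output contains "(Open MPI) X.Y.Z"; other MPI launchers
    print something different and yield None here.
    """
    match = re.search(r"\(Open MPI\)\s+(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_openmpi_version(launcher="mpirun"):
    """Ask the launcher found on PATH for its version (None if unavailable)."""
    if shutil.which(launcher) is None:
        return None
    result = subprocess.run(
        [launcher, "--version"], capture_output=True, text=True, check=False
    )
    return parse_openmpi_version(result.stdout + result.stderr)


def is_at_or_below_reported(version):
    """True if the version is no newer than the build that was slow for me."""
    return version is not None and version <= REPORTED_SLOW_VERSION
```

Calling `is_at_or_below_reported(installed_openmpi_version())` on a compute node (not just the login node, since the two can differ) tells you whether the job would run with a build as old as the one that was slow here.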
 

On Friday, October 11, 2019 at 1:28:21 PM UTC-4, Sean Fischer wrote:
>
> Hi Matt,
>
> I agree that some of the behavior looks as if my resources are being 
> oversubscribed, but I am sure that I have dedicated access to the nodes I 
> am using, and top shows nothing unusual on them. In another test I did 
> not get the large fluctuations but still saw the linear trend, though 
> that run was stable for longer before the increase. See below; the input 
> file for that run is attached (same initial geometry but some changes to 
> the other settings).
>
> [image: md_timings2.png]
>
> Best,
> Sean
>
> On Friday, October 11, 2019 at 12:10:01 PM UTC-4, Matt W wrote:
>>
>> Hi Sean,
>>
>> are you sure you have dedicated access to the resource you run on? Those 
>> large fluctuations look likely to be caused by other jobs running on the 
>> node(s) you are using (though the general linearish trend is surprising).
>>
>> Matt 
>>
>> On Friday, October 11, 2019 at 1:41:03 PM UTC+1, Sean Fischer wrote:
>>>
>>> I'm attempting to run MD using QS on a 64 molecule water box. So far I 
>>> have been unable to achieve consistent performance across the MD 
>>> trajectory. After a while the time per MD step dramatically increases, but 
>>> the number of SCF iterations per step stays basically constant, as you can 
>>> see in the included image.
>>>
>>> [image: md_timing.png]
>>>
>>> To further illustrate this, the following is from one of the first steps 
>>> in the trajectory:
>>>
>>>
>>>   ----------------------------------- OT ---------------------------------------
>>>   Minimizer      : DIIS                : direct inversion
>>>                                          in the iterative subspace
>>>                                          using   7 DIIS vectors
>>>                                          safer DIIS on
>>>   Preconditioner : FULL_ALL            : diagonalization, state selective
>>>   Precond_solver : DEFAULT
>>>   stepsize       :    0.15000000                  energy_gap     :    0.08000000
>>>   eps_taylor     :   0.10000E-15                  max_taylor     :             4
>>>   ----------------------------------- OT ---------------------------------------
>>>
>>>   Step     Update method      Time    Convergence         Total energy    Change
>>>   ------------------------------------------------------------------------------
>>>      1 OT DIIS     0.15E+00    2.3     0.00030148     -1107.9047914512 -1.11E+03
>>>      2 OT DIIS     0.15E+00    1.6     0.00012826     -1107.9089080136 -4.12E-03
>>>      3 OT DIIS     0.15E+00    1.6     0.00007994     -1107.9095257164 -6.18E-04
>>>      4 OT DIIS     0.15E+00    1.6     0.00000744     -1107.9096948950 -1.69E-04
>>>      5 OT DIIS     0.15E+00    2.0     0.00000377     -1107.9096964458 -1.55E-06
>>>      6 OT DIIS     0.15E+00    1.6     0.00000067     -1107.9096967653 -3.19E-07
>>>
>>>   *** SCF run converged in     6 steps ***
>>>
>>>   Electronic density on regular grids:       -511.9999993695        0.0000006305
>>>   Core density on regular grids:              511.9999999846       -0.0000000154
>>>   Total charge density on r-space grids:        0.0000006151
>>>   Total charge density g-space grids:           0.0000006151
>>>
>>>
>>>
>>> After ~2500 steps, the same section looks like this:
>>>
>>>
>>>   ----------------------------------- OT ---------------------------------------
>>>   Minimizer      : DIIS                : direct inversion
>>>                                          in the iterative subspace
>>>                                          using   7 DIIS vectors
>>>                                          safer DIIS on
>>>   Preconditioner : FULL_ALL            : diagonalization, state selective
>>>   Precond_solver : DEFAULT
>>>   stepsize       :    0.15000000                  energy_gap     :    0.08000000
>>>   eps_taylor     :   0.10000E-15                  max_taylor     :             4
>>>   ----------------------------------- OT ---------------------------------------
>>>
>>>   Step     Update method      Time    Convergence         Total energy    Change
>>>   ------------------------------------------------------------------------------
>>>      1 OT DIIS     0.15E+00    6.3     0.00003184     -1108.0316770158 -1.11E+03
>>>      2 OT DIIS     0.15E+00   17.0     0.00001599     -1108.0317180830 -4.11E-05
>>>      3 OT DIIS     0.15E+00   18.4     0.00000956     -1108.0317247395 -6.66E-06
>>>      4 OT DIIS     0.15E+00   21.9     0.00000119     -1108.0317273970 -2.66E-06
>>>      5 OT DIIS     0.15E+00    6.8     0.00000063     -1108.0317274283 -3.14E-08
>>>
>>>   *** SCF run converged in     5 steps ***
>>>
>>>   Electronic density on regular grids:       -511.9999994472        0.0000005528
>>>   Core density on regular grids:              511.9999999845       -0.0000000155
>>>   Total charge density on r-space grids:        0.0000005373
>>>   Total charge density g-space grids:           0.0000005373
>>>
>>>
>>>
>>> This has happened with different functionals, pseudopotentials, and SCF 
>>> parameters (PS vs ASPC extrapolation, different orders of extrapolation, OT 
>>> DIIS vs DIIS/Diag., different preconditioners). I recompiled the whole code 
>>> with the toolchain install script generating all dependencies and got the 
>>> same result. There is no indication from the dynamics of the system itself 
>>> as to what the problem might be as the potential and kinetic energies are 
>>> well behaved, and the constant of motion for the dynamics is reasonably 
>>> conserved. I've watched the resulting trajectory itself, and there is 
>>> nothing unusual happening (no bond breaking, etc.). The input file I 
>>> used for the run in the plot above is attached. The starting geometry for the 
>>> water molecules was taken from the 64 molecule benchmark in the tests/QS 
>>> directory.
>>>
>>>
>>> If anyone has any suggestions on possible causes or where I can look for 
>>> more information, I would be very appreciative.
>>>
>>
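
The two SCF tables quoted in the first message can be compared directly by pulling out the Time column. This is a rough sketch, not a general CP2K log parser: the regular expression only targets OT iteration lines laid out like the ones in this thread, and the function names are made up for illustration. Leading `>` quote markers are tolerated so that lines can be pasted straight from the mail.

```python
import re

# Start of an OT SCF iteration line as printed in this thread:
#   step, "OT", minimizer, stepsize, time, convergence, total energy
# Leading whitespace and mail quote markers ('>') are skipped.
SCF_LINE = re.compile(
    r"^[>\s]*(\d+)\s+OT\s+\S+\s+\S+\s+(\d+(?:\.\d+)?)\s+(\d+\.\d+)\s+(-?\d+\.\d+)"
)

# Iteration lines copied from the first message (early step and ~2500 steps in).
EARLY = """
     1 OT DIIS     0.15E+00    2.3     0.00030148     -1107.9047914512 -1.11E+03
     2 OT DIIS     0.15E+00    1.6     0.00012826     -1107.9089080136 -4.12E-03
     3 OT DIIS     0.15E+00    1.6     0.00007994     -1107.9095257164 -6.18E-04
     4 OT DIIS     0.15E+00    1.6     0.00000744     -1107.9096948950 -1.69E-04
     5 OT DIIS     0.15E+00    2.0     0.00000377     -1107.9096964458 -1.55E-06
     6 OT DIIS     0.15E+00    1.6     0.00000067     -1107.9096967653 -3.19E-07
"""

LATE = """
     1 OT DIIS     0.15E+00    6.3     0.00003184     -1108.0316770158 -1.11E+03
     2 OT DIIS     0.15E+00   17.0     0.00001599     -1108.0317180830 -4.11E-05
     3 OT DIIS     0.15E+00   18.4     0.00000956     -1108.0317247395 -6.66E-06
     4 OT DIIS     0.15E+00   21.9     0.00000119     -1108.0317273970 -2.66E-06
     5 OT DIIS     0.15E+00    6.8     0.00000063     -1108.0317274283 -3.14E-08
"""


def scf_iteration_times(log_text):
    """Return the Time column of every OT iteration line found in log_text."""
    times = []
    for line in log_text.splitlines():
        match = SCF_LINE.match(line)
        if match:
            times.append(float(match.group(2)))
    return times


def summarize(times):
    """Return (number of iterations, total time, mean time per iteration)."""
    if not times:
        return 0, 0.0, 0.0
    total = sum(times)
    return len(times), total, total / len(times)


if __name__ == "__main__":
    for label, text in (("early", EARLY), ("late", LATE)):
        count, total, mean = summarize(scf_iteration_times(text))
        print(f"{label}: {count} iterations, {total:.1f} total, {mean:.2f} per iteration")
```

For the two excerpts above this gives 6 iterations at about 1.8 per iteration early on versus 5 iterations at about 14.1 per iteration later, so the cost of each iteration grew roughly eightfold while the iteration count barely changed.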