[CP2K-user] [CP2K:13065] Job setup: how to run multiple single point calculations efficiently?

hut... at chem.uzh.ch hut... at chem.uzh.ch
Mon Apr 6 17:33:07 UTC 2020


Hi 

you are looking for

 CP2K_INPUT / MOTION / MD
    ENSEMBLE REFTRAJ

to rerun a pre-calculated set of molecular coordinates.
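
For orientation, a minimal sketch of the relevant input (alongside your usual FORCE_EVAL / DFT section) might look like the following; the trajectory file name, number of steps, and snapshot range are placeholders, so please check the CP2K_INPUT reference for MOTION / MD / REFTRAJ for the keywords available in your version:

  &GLOBAL
    PROJECT reftraj_rerun
    RUN_TYPE MD
  &END GLOBAL
  &MOTION
    &MD
      ENSEMBLE REFTRAJ
      STEPS 1000                      ! one MD "step" per snapshot to be evaluated
      &REFTRAJ
        TRAJ_FILE_NAME frames.xyz     ! placeholder: xyz file containing all configurations
        EVAL_ENERGY_FORCES .TRUE.     ! compute energy and forces at every snapshot
        FIRST_SNAPSHOT 1
        LAST_SNAPSHOT 1000
      &END REFTRAJ
    &END MD
  &END MOTION

Because the snapshots are processed sequentially within a single run, the converged wavefunction of one frame can serve as the initial guess for the next, which should also address point 3 of the original message below.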

regards

Juerg Hutter
--------------------------------------------------------------
Juerg Hutter                         Phone : ++41 44 635 4491
Institut für Chemie C                FAX   : ++41 44 635 6838
Universität Zürich                   E-mail: hut... at chem.uzh.ch
Winterthurerstrasse 190
CH-8057 Zürich, Switzerland
---------------------------------------------------------------

-----cp... at googlegroups.com wrote: -----
To: "cp2k" <cp... at googlegroups.com>
From: "Sam Niblett" 
Sent by: cp... at googlegroups.com
Date: 04/06/2020 06:54PM
Subject: [CP2K:13065] Job setup: how to run multiple single point calculations efficiently?

Dear all,

I need to perform a reasonably large number of single-point energy+force calculations for distinct configurations of a single molecular system (~1000-10000 in the first instance, but many more later on). I'm using DFT with the revPBE functional, running a pre-compiled CP2K module on a large supercomputer. I'm on a fairly tight budget of core hours, so minimising the runtime is my main concern.

Is there a keyword that will instruct CP2K to perform the same operation on a series of different starting configurations (preferably read from a single xyz file, for example)? Something along the lines of LAMMPS' rerun command. 



I have looked through the CP2K_INPUT documentation and the best option I could find is to use FARMING to perform a separate ENERGY_FORCE job on each starting configuration. This works, but it is undesirable for three reasons:

1) It requires creating a separate input folder for each configuration (not a big problem, but it's annoying and inelegant given that all the input except the system coordinates is identical for every job).

2) This method appears to reallocate and reinitialise the functionals and system details for every set of input data, which is unnecessary overhead given that those details are the same each time.

3) My starting configurations are similar enough that the converged electron density of one should be a good starting point for the next. By analogy with AIMD calculations I have run on the same system, I estimate that using this information could decrease the cost of the calculation by up to 80%. But FARMING doesn't know that, so each calculation starts completely from scratch.

The result is that FARMING only gives a small speedup compared to running separate CP2K jobs for each input point. Does anyone know of a better (i.e. more efficient) way to set up these calculations? 
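
For completeness, a minimal sketch of the FARMING setup described above; directory names, file names, and the number of groups are placeholders:

  &GLOBAL
    PROJECT farming_run
    RUN_TYPE FARMING
  &END GLOBAL
  &FARMING
    NGROUPS 4                          ! placeholder: how many jobs run concurrently
    &JOB
      DIRECTORY config_0001            ! placeholder: folder holding coordinates for frame 1
      INPUT_FILE_NAME energy_force.inp ! the shared ENERGY_FORCE input
    &END JOB
    &JOB
      DIRECTORY config_0002
      INPUT_FILE_NAME energy_force.inp
    &END JOB
    ! ... one &JOB block per configuration
  &END FARMING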



The only other thing I could think of would be using BAND with 0 optimisation steps and K_SPRING 0, treating each configuration of my input as a separate replica. I don't know if that would give me the information I want, but if it did, it would fix at least points 1 and 2 of my list above. If anyone has tried something like that before, please let me know how you got on.
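
In case anyone wants to picture that idea, here is a very rough sketch (coordinate file names are placeholders, and I have not verified that zero optimisation steps behaves as intended):

  &GLOBAL
    RUN_TYPE BAND
  &END GLOBAL
  &MOTION
    &BAND
      K_SPRING 0.0
      NUMBER_OF_REPLICA 2
      &OPTIMIZE_BAND
        OPT_TYPE DIIS
        &DIIS
          MAX_STEPS 0                  ! intent: evaluate each replica without moving it
        &END DIIS
      &END OPTIMIZE_BAND
      &REPLICA
        COORD_FILE_NAME config_0001.xyz   ! placeholder coordinate files, one per replica
      &END REPLICA
      &REPLICA
        COORD_FILE_NAME config_0002.xyz
      &END REPLICA
    &END BAND
  &END MOTION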

I'm hoping that there's a straightforward way to perform this task and I just haven't found it in the documentation. Please point me to the relevant page if so.

Thanks, and best wishes,

Sam
  