[CP2K-user] Creating/Finding a small but realistic MD model for my research
Tue Boesen
aly... at gmail.com
Tue May 18 15:38:46 UTC 2021
I’m completely new to CP2K and have only just installed it today, because I
learned that it was used to generate the MD17 dataset, which I am
interested in.
I’m currently starting up a neural network approach to molecular dynamics,
and for that I need a dataset. The ideal dataset for my research is
essentially the MD17 dataset, found here:
<http://www.quantum-machine.org/datasets/#md-datasets>. However, this
dataset has a problem for my use case. As stated in the originating
article, the MD17 dataset was created as follows:
"The data used for training the DFT models were created running ab initio MD
in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a
200 ps simulation with a resolution of 0.5 fs. We computed forces and
energies using all-electrons at the generalized gradient approximation
level of theory with the Perdew-Burke-Ernzerhof (PBE) [65]
exchange-correlation functional, treating van der Waals interactions with
the Tkatchenko-Scheffler (TS) method [66]. All calculations were performed
with FHI-aims [67]. The final training data was generated by subsampling the
full trajectory under preservation of the Maxwell-Boltzmann distribution
for the energies.
To create the coupled cluster datasets, we reused the same geometries as
for the DFT models and recomputed energies and forces using all-electron
coupled cluster with single, double, and perturbative triple excitations
(CCSD(T)). The Dunning’s correlation-consistent basis set cc-pVTZ was used
for ethanol, cc-pVDZ for toluene and malonaldehyde and CCSD/cc-pVDZ for
aspirin. All calculations were performed with the Psi4 [68] software suite."
So the data has been subsampled, meaning that consecutive samples in the
MD17 dataset are not separated by a constant time step, and a constant
time step is what I need for my work.
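To make the issue concrete, here is a minimal Python sketch using NumPy and
synthetic frame indices, not the MD17 data itself. The step count follows
from the quoted 200 ps at 0.5 fs. The stride of 4 and the random selection
are assumptions for illustration only: the random draw merely stands in for
whatever subsampling scheme was actually used, which I do not know in detail.

```python
import numpy as np

TIMESTEP_FS = 0.5          # resolution stated in the quoted article
TOTAL_TIME_FS = 200_000.0  # 200 ps, as stated in the quoted article

# Number of MD steps in the full (unsubsampled) trajectory.
n_steps = int(round(TOTAL_TIME_FS / TIMESTEP_FS))

def time_gaps(frame_indices, timestep_fs=TIMESTEP_FS):
    """Return the time gaps (in fs) between consecutive retained frames."""
    idx = np.sort(np.asarray(frame_indices))
    return np.diff(idx) * timestep_fs

# Keeping every 4th frame (assumed stride) preserves a constant spacing.
strided = np.arange(0, n_steps, 4)

# Hypothetical stand-in for subsampling: random frames without replacement.
rng = np.random.default_rng(0)
random_subset = rng.choice(n_steps, size=strided.size, replace=False)

uniform_gaps = time_gaps(strided)
random_gaps = time_gaps(random_subset)

print("steps in full trajectory:", n_steps)
print("distinct gaps, strided:", np.unique(uniform_gaps))
print("gap range, random subset:", random_gaps.min(), random_gaps.max())
```

A strided selection gives a single gap value, whereas any selection that
picks frames irregularly gives a spread of gaps, which is the property that
makes the published samples unsuitable for my purpose.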
So my questions are:
Is there any way of generating this dataset again given the above
information? I have tried contacting the author, but haven’t heard anything
back yet.
Or alternatively, are there any other simple systems like this available
online, or does anyone have any scripts/tutorials for generating a
realistic molecular system dataset?
What I need are the atomic positions at each step, and ideally also the
atomic velocities and force vectors. I would like to generate at least
100k-500k time steps, since I need quite a lot of data for the neural
network training.
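For reference, this is roughly how I would expect to load such data, as a
minimal Python sketch. It assumes the positions (and likewise velocities and
forces) are written as multi-frame XYZ-style text: an atom count, a comment
line, then one line per atom with an element symbol and three numbers. The
function name `read_xyz_frames` and the two-atom example are hypothetical and
only illustrate the array layout I am after.

```python
import io
import numpy as np

def read_xyz_frames(handle):
    """Parse a multi-frame XYZ stream.

    Returns (symbols, data) where data has shape (n_frames, n_atoms, 3).
    Assumes every frame has the same atoms in the same order and that
    frames are not separated by blank lines.
    """
    frames = []
    symbols = None
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of file (or a blank line) ends parsing
        n_atoms = int(header.split()[0])
        handle.readline()  # skip the per-frame comment line
        rows = [handle.readline().split() for _ in range(n_atoms)]
        if symbols is None:
            symbols = [row[0] for row in rows]
        frames.append([[float(x) for x in row[1:4]] for row in rows])
    return symbols, np.array(frames)

# Tiny made-up two-frame example, for illustration only.
example = """2
frame 0
H 0.0 0.0 0.0
H 0.0 0.0 0.74
2
frame 1
H 0.0 0.0 0.01
H 0.0 0.0 0.75
"""

symbols, positions = read_xyz_frames(io.StringIO(example))
print(symbols, positions.shape)
```

With positions, velocities and forces each loaded this way, every time step
would be one row along the first axis, at a constant spacing.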
Any insight from experienced CP2K users or people in the field of molecular
dynamics would be greatly appreciated.