[CP2K:6545] Re: reading dcd files

Steve Schmerler elco... at gmail.com
Mon Jun 1 18:35:08 UTC 2015

On May 22 01:54 -0700, Ole Schütt wrote:
> Yes, exactly. Maybe as http://cp2k.org/tools:dumpdcd or 
> http://cp2k.org/howto:read_dcd_files .
> For what solution did you settle now? Are you using a combination of seek() 
> and numpy.fromfile() to parse the DCD file directly, or are you calling 
> dumpdcd in a pipe?

The numpy.fromfile() idea didn't actually occur to me even though it is pretty
obvious, thanks. Based on this I wrote a simple numpy-only version. It uses a
rather simple approach, is not super-fast and can certainly be optimized. It
took me a while to read the header correctly since it has "marker" integers
(84, 164, ...) around each block of data, which are the size (in bytes) of the
block they enclose. In Fortran, one can apparently just read the data without
reading the markers explicitly. Again, VMDs dcdplugin.c saved the day.

The dcd reader is part of my "pwtools" project [1], but the dcd part is (almost)
independent from the rest - you only need the file "dcp.py" [2]. The Python
version is called "read_dcd_data_py":
    $ python
    >>> import dcd
    >>> cc,co = dcd.read_dcd_data_py('cp2k.dcd')

where "cc" is a (nstep,6) array (float64) of unit cell parameters
(a,b,c,alpha,beta,gamma) -- make sure you use dcd_aligned_cell in cp2k. "co" is
a (nstep,natoms,3) array (float32) of cartesian coordinates in Angstrom. Have a
look at the doc string for how to read lammps-style files (keyword

I also adapted an earlier Fortran + f2py version to cp2k files. It is
much faster than the simple Python version, even though it walks the
file twice (once to find nstep, rather brute-force, might change later).
To use this stand-alone, copy dcd.f90 somewhere and run f2py (part of
numpy) to compile the extension module into "_dcd.so", which you import
with "import _dcd":
    $ f2py -c dcd.f90 -m _dcd
    $ python
    >>> import _dcd
    >>> nstep, natoms, timestep = _dcd.get_dcd_file_info('cp2k.dcd', nstephdr=False)
    >>> cc,co = _dcd.read_dcd_data('cp2k.dcd', nstep, natoms, convang=False)

There is also a wrapper function in dcd.py called "read_dcd_data_f", which does
exactly this. Again, please have a look at the doc string for more examples.

I don't use this regularly on large files at the moment, so there may be issues
that I'm not aware of. If you find bugs, have better / faster versions etc,
please let me know.


[1] https://bitbucket.org/elcorto/pwtools 
    Get it:
        hg clone https://bitbucket.org/elcorto/pwtools
        wget https://bitbucket.org/elcorto/pwtools/get/e591501fa367.zip

[2] https://bitbucket.org/elcorto/pwtools/src/e591501fa3671cab7f7583ad747541cb14512d55/dcd.py?at=default

Steve Schmerler
Institut für Theoretische Physik
TU Freiberg, Germany

More information about the CP2K-user mailing list