[CP2K:6545] Re: reading dcd files
Ole Schütt
o... at schuett.name
Tue Jun 2 12:51:24 UTC 2015
Hi Steve,
I did not realize that you had such an elaborate library project going on,
pretty cool :-)
I added your project our collections of tools: http://cp2k.org/tools:pwtools
Feel free to elaborate on the page.
> The numpy.fromfile() idea didn't actually occur to me even though it is
pretty obvious, thanks.
You're welcome. I came up with that trick when I wrote a parser for binary
gromacs trajectories, a couple years ago.
> I also adapted an earlier Fortran + f2py version to cp2k files. It is
> much faster than the simple Python version, even though it walks the
> file twice (once to find nstep, rather brute-force, might change later).
Hmm, looks like your are still doing quite a bit of processing in python.
In principle one should be able to get pretty close to the Fortran
performance by just using numpy routines. But I guess you are aware of that
;-)
-Ole
Am Montag, 1. Juni 2015 20:35:14 UTC+2 schrieb Steve Schmerler:
>
> On May 22 01:54 -0700, Ole Schütt wrote:
> > Yes, exactly. Maybe as http://cp2k.org/tools:dumpdcd or
> > http://cp2k.org/howto:read_dcd_files .
> >
> > For what solution did you settle now? Are you using a combination of
> seek()
> > and numpy.fromfile() to parse the DCD file directly, or are you calling
> > dumpdcd in a pipe?
>
> The numpy.fromfile() idea didn't actually occur to me even though it is
> pretty
> obvious, thanks. Based on this I wrote a simple numpy-only version. It
> uses a
> rather simple approach, is not super-fast and can certainly be optimized.
> It
> took me a while to read the header correctly since it has "marker"
> integers
> (84, 164, ...) around each block of data, which are the size (in bytes) of
> the
> block they enclose. In Fortran, one can apparently just read the data
> without
> reading the markers explicitly. Again, VMDs dcdplugin.c saved the day.
>
> The dcd reader is part of my "pwtools" project [1], but the dcd part is
> (almost)
> independent from the rest - you only need the file "dcp.py" [2]. The
> Python
> version is called "read_dcd_data_py":
>
> $ python
> >>> import dcd
> >>> cc,co = dcd.read_dcd_data_py('cp2k.dcd')
>
> where "cc" is a (nstep,6) array (float64) of unit cell parameters
> (a,b,c,alpha,beta,gamma) -- make sure you use dcd_aligned_cell in cp2k.
> "co" is
> a (nstep,natoms,3) array (float32) of cartesian coordinates in Angstrom.
> Have a
> look at the doc string for how to read lammps-style files (keyword
> convang=True).
>
> I also adapted an earlier Fortran + f2py version to cp2k files. It is
> much faster than the simple Python version, even though it walks the
> file twice (once to find nstep, rather brute-force, might change later).
> To use this stand-alone, copy dcd.f90 somewhere and run f2py (part of
> numpy) to compile the extension module into "_dcd.so", which you import
> with "import _dcd":
>
> $ f2py -c dcd.f90 -m _dcd
> $ python
> >>> import _dcd
> >>> nstep, natoms, timestep = _dcd.get_dcd_file_info('cp2k.dcd',
> nstephdr=False)
> >>> cc,co = _dcd.read_dcd_data('cp2k.dcd', nstep, natoms,
> convang=False)
>
> There is also a wrapper function in dcd.py called "read_dcd_data_f", which
> does
> exactly this. Again, please have a look at the doc string for more
> examples.
>
> I don't use this regularly on large files at the moment, so there may be
> issues
> that I'm not aware of. If you find bugs, have better / faster versions
> etc,
> please let me know.
>
> best,
> Steve
>
> [1] https://bitbucket.org/elcorto/pwtools
> Get it:
> hg clone https://bitbucket.org/elcorto/pwtools
> or
> wget https://bitbucket.org/elcorto/pwtools/get/e591501fa367.zip
>
> [2]
> https://bitbucket.org/elcorto/pwtools/src/e591501fa3671cab7f7583ad747541cb14512d55/dcd.py?at=default
>
> --
> Steve Schmerler
> Institut für Theoretische Physik
> TU Freiberg, Germany
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20150602/a193afbb/attachment.htm>
More information about the CP2K-user
mailing list