[CP2K:6545] Re: reading dcd files

Ole Schütt o... at schuett.name
Tue Jun 2 12:51:24 UTC 2015


Hi Steve,

I did not realize that you had such an elaborate library project going on, 
pretty cool :-)
I added your project our collections of tools: http://cp2k.org/tools:pwtools
Feel free to elaborate on the page.

> The numpy.fromfile() idea didn't actually occur to me even though it is 
pretty obvious, thanks.

You're welcome. I came up with that trick when I wrote a parser for binary 
gromacs trajectories, a couple years ago.

> I also adapted an earlier Fortran + f2py version to cp2k files. It is 
> much faster than the simple Python version, even though it walks the 
> file twice (once to find nstep, rather brute-force, might change later). 

Hmm, looks like your are still doing quite a bit of processing in python. 
In principle one should be able to get pretty close to the Fortran 
performance by just using numpy routines. But I guess you are aware of that 
;-)

-Ole

Am Montag, 1. Juni 2015 20:35:14 UTC+2 schrieb Steve Schmerler:
>
> On May 22 01:54 -0700, Ole Schütt wrote: 
> > Yes, exactly. Maybe as http://cp2k.org/tools:dumpdcd or 
> > http://cp2k.org/howto:read_dcd_files . 
> > 
> > For what solution did you settle now? Are you using a combination of 
> seek() 
> > and numpy.fromfile() to parse the DCD file directly, or are you calling 
> > dumpdcd in a pipe? 
>
> The numpy.fromfile() idea didn't actually occur to me even though it is 
> pretty 
> obvious, thanks. Based on this I wrote a simple numpy-only version. It 
> uses a 
> rather simple approach, is not super-fast and can certainly be optimized. 
> It 
> took me a while to read the header correctly since it has "marker" 
> integers 
> (84, 164, ...) around each block of data, which are the size (in bytes) of 
> the 
> block they enclose. In Fortran, one can apparently just read the data 
> without 
> reading the markers explicitly. Again, VMDs dcdplugin.c saved the day. 
>
> The dcd reader is part of my "pwtools" project [1], but the dcd part is 
> (almost) 
> independent from the rest - you only need the file "dcp.py" [2]. The 
> Python 
> version is called "read_dcd_data_py": 
>     
>     $ python 
>     >>> import dcd 
>     >>> cc,co = dcd.read_dcd_data_py('cp2k.dcd') 
>
> where "cc" is a (nstep,6) array (float64) of unit cell parameters 
> (a,b,c,alpha,beta,gamma) -- make sure you use dcd_aligned_cell in cp2k. 
> "co" is 
> a (nstep,natoms,3) array (float32) of cartesian coordinates in Angstrom. 
> Have a 
> look at the doc string for how to read lammps-style files (keyword 
> convang=True). 
>
> I also adapted an earlier Fortran + f2py version to cp2k files. It is 
> much faster than the simple Python version, even though it walks the 
> file twice (once to find nstep, rather brute-force, might change later). 
> To use this stand-alone, copy dcd.f90 somewhere and run f2py (part of 
> numpy) to compile the extension module into "_dcd.so", which you import 
> with "import _dcd": 
>     
>     $ f2py -c dcd.f90 -m _dcd 
>     $ python 
>     >>> import _dcd 
>     >>> nstep, natoms, timestep = _dcd.get_dcd_file_info('cp2k.dcd', 
> nstephdr=False) 
>     >>> cc,co = _dcd.read_dcd_data('cp2k.dcd', nstep, natoms, 
> convang=False) 
>
> There is also a wrapper function in dcd.py called "read_dcd_data_f", which 
> does 
> exactly this. Again, please have a look at the doc string for more 
> examples. 
>
> I don't use this regularly on large files at the moment, so there may be 
> issues 
> that I'm not aware of. If you find bugs, have better / faster versions 
> etc, 
> please let me know. 
>
> best, 
> Steve 
>
> [1] https://bitbucket.org/elcorto/pwtools 
>     Get it: 
>         hg clone https://bitbucket.org/elcorto/pwtools 
>         or 
>         wget https://bitbucket.org/elcorto/pwtools/get/e591501fa367.zip 
>
> [2] 
> https://bitbucket.org/elcorto/pwtools/src/e591501fa3671cab7f7583ad747541cb14512d55/dcd.py?at=default 
>
> -- 
> Steve Schmerler 
> Institut für Theoretische Physik 
> TU Freiberg, Germany 
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20150602/a193afbb/attachment.htm>


More information about the CP2K-user mailing list