merged wannier center and coordinate output file

Axel akoh... at gmail.com
Mon Jul 9 17:31:34 UTC 2007


On Jul 9, 12:33 pm, Teodoro Laino <teodor... at gmail.com> wrote:
> On 9 Jul 2007, at 18:03, Axel wrote:

[...]

> IONS+CENTER.xyz is a text file as far as I understand from the CPMD..
> So writing a text file takes as much time in cp2k as in cpmd..

in my (hacked) version of cpmd i can write .dcd. ;-)

> Now the question arises.. why create a new format?

sorry, but this is _not_ a new format. this is about writing an
alternative output in a well-supported format (.dcd, .xyz, .pdb).
i'd rather call the current cp2k output of the wannier centers a
new format, with the spread written in extra columns after the
coordinates, since plain .xyz has only the coordinates.
granted, almost all visualization programs ignore everything beyond
the 4th column, so in practice it is no problem and even provides
desirable additional information.
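
just to illustrate (made-up numbers; the exact cp2k layout may
differ): a plain .xyz data line is element plus three coordinates,
while a center line with the spread tacked on gets extra columns
that most viewers silently drop:

    O    0.000000   0.000000   0.117300
    X   -0.031000   0.122000   0.051000   0.740000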

as to why write an _alternate_ output file: convenience and
consistency. with postprocessing i always run the risk of mixing
up the wrong files and having to re-do something; that seems much
easier to handle during the run itself.

> The world is already full of a mess of formats (many of them not
> even documented (look at the time we both spent together on
> the PSF))..

exactly! why should i have to write a program to postprocess my
data, when cp2k can write it in a well-supported format right away?


> But since we live in a democratic world, if you want a single file
> this is what you have to do:
>
> Immediately after you have your wannier centers you can create a fake
> particle_set with the dimension of particles+wannier centers
> (you can do this locally.. no need to have one allocated from the
> very beginning of the calculation..
> the cost of allocating/deallocating this particle_set is negligible
> w.r.t. the QS calculation)
> and fill the particle_set with all the information of the real particles
> and the fake information from the wannier centers..
> You can limit yourself to filling in only the information that is
> actually printed by the routines that write the coordinates..
> Once the particle_set is filled you can call the routine that dumps
> the atomic coordinates (at the moment it supports only XYZ and DCD, not PDB)..
> Remember to apply PBC (if you want them) because normally the
> particle_set is never processed with respect to PBC...
> In this way you get one single file with all the information you need...

thanks a lot. i'll look into it.
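
just to check that i read the recipe correctly, roughly something
like this? (pseudo-fortran; every type/routine name below is a
placeholder i made up, not the actual cp2k api)

    ! pseudo-fortran sketch of the recipe above; all names are
    ! placeholders for whatever the real cp2k types/routines are called
    subroutine dump_ions_plus_centers(real_particles, centers, cell, iw)
      type(particle_type), pointer :: real_particles(:)  ! the real atoms
      real(kind=dp), intent(in)    :: centers(:,:)       ! (3, n_centers); dp = working precision
      type(cell_type), pointer     :: cell
      integer, intent(in)          :: iw                 ! output unit

      type(particle_type), pointer :: fake_set(:)
      integer :: n_atoms, n_centers, i

      n_atoms   = size(real_particles)
      n_centers = size(centers, 2)

      ! temporary merged set; allocate/deallocate cost is negligible
      ! compared to the qs step
      allocate(fake_set(n_atoms + n_centers))

      ! copy only the fields the coordinate writer actually prints
      do i = 1, n_atoms
        fake_set(i) = real_particles(i)
      end do
      do i = 1, n_centers
        fake_set(n_atoms + i)%r = pbc(centers(:, i), cell)  ! fold into the box
        ! ... plus a fake element label (e.g. "X") for the center
      end do

      ! reuse the existing coordinate dump (xyz or dcd) on the merged set
      call write_particle_coordinates(fake_set, iw)   ! placeholder call

      deallocate(fake_set)
    end subroutine dump_ions_plus_centers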

[...]

> > apropos parallel performance, would there be a way to tell
> > cp2k to keep files open during a run? on several machines
> > that we are running on, frequent open/close of files can
> > have serious impact on (parallel) performance.
>
> This cannot *easily* be avoided due to the general idea behind the
> print_keys and the flexibility they offer..
> Let me just say that in general I/O (even without continuously opening/
> closing the unit) has a great impact on the
> parallel performance (since we don't do parallel IO).. So people are
> strongly encouraged to write to disk only
> at a reasonable frequency.. obviously there are cases in which you

i totally agree. this is exactly what i am currently experimenting
with, hence the many questions about i/o.

> have to write with a high frequency..
> Well, in these cases I would never have imagined that opening/closing
> a unit has a greater impact on performance
> than writing the data to the files..

please factor in i/o buffering. a close forces a flush and a sync
of the file. if you keep the file open, writes go to a buffer, and
only when the buffer is full does it get written out to the
file system.

> Just out of curiosity, can you provide some numbers regarding this
> behavior?

no numbers with cp2k. i've seen significant to dramatic changes
with cpmd and particularly quantum espresso (since the wavefunction
files there are temporary, one can even create a pseudo-ramdisk by
using a buffer huge enough to hold the whole file and intercepting
all flushes and file closes). this currently only matters on machines
like the cray xt3 with no local disk at all, where the iobuf module
from cray allows me to manage file buffers on a per-file(name)
basis and to intercept flushes and closes. however, intercepting
the close under those circumstances works only for scratch files, as
there is then no final flush/close at the end of the job. so the
frequent open/close in cp2k would render all optimizations in that
direction meaningless.
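
for reference, such an iobuf setup looks roughly like the following
(option syntax quoted from memory, so treat the details as my
assumption and check cray's iobuf man page):

    module load iobuf      # relink the code against the iobuf library
    export IOBUF_PARAMS='*.wfc*:size=256M:count=1'  # one big buffer per matching file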

this may not be a big issue for most current machines and users,
but i expect that more and more machines will have to use parallel
file systems like lustre, GPFS and the like, and at the same time
people will want to run larger jobs faster on their new fancy
machines, so this may become a much more important issue over the
next few years.

e.g. at the moment, i have managed to get the 64 water QS benchmark
example (without localization; it may be interesting to try running
that on a separate set of nodes, btw) down to about 10 seconds per
MD step without tampering with the potentials, basis set, cutoff etc.,
and the scaling seems to level off at around 64 cpus (=32 dual-core
nodes) on the xt3 in pittsburgh. this is already _very_ nice, but on
an 'extreme' machine like the xt3, one should be able to go a little
further (1 second/MD step ???). i'm not thinking short term here
(as you know, i rarely have time to do anything short term), but
about what to do when we get access to true petascale hardware, and
for that it seems reasonable to me to first evaluate how far one can
push the existing software on high-end hardware with some minimal(?)
modifications.

cheers,
   axel.

> ciao,
> Teo
>
>
>
> > ciao,
> >    axel.



