[CP2K:2213] Re: determinism of CP2K runs
Laino Teodoro
teodor... at gmail.com
Fri Aug 7 20:46:40 UTC 2009
Hi Noam,
I understand your point completely (determinism is a feature
that is helpful in many situations).
There could be uninitialized variables in the parallel code.
If the machine and libraries have been tested and give the same
numbers to machine precision with other
codes, independently of the computing cores, then it is very probably
a bug in the parallel code (or in library routines, most plausibly
ScaLAPACK, that those other codes do not use).
Anyway, as you said, let's proceed step by step and first see
whether you can reproduce the same error in serial.
See you soon,
Teo
On 7 Aug 2009, at 22:36, Noam Bernstein wrote:
>
> Hi Teo - I thought of all the things you mentioned, but I doubt that
> they are the cause. I'll explain why briefly now (and also why
> I need it to be deterministic, unfortunately :), and later I'll send
> a more complete explanation and, hopefully, a better (smaller, maybe
> usable in serial) test case.
>
> First the reason I need it to be deterministic: I'm running MD, and
> it's chaotic (in the technical sense), so unless I have deterministic
> runs I can't reproduce a trajectory (for example with different output
> options), even from the same input file.
>
> As to why I doubt that it's differences in machines, I've run many
> electronic structure codes on this cluster, and I've never seen
> non-determinism (for a fixed _number_ of processes) except for uses
> of random variables or uninitialized variables (i.e. inadvertently
> random variables). To me this means that the machines are
> reasonably reliable, and that the MPI implementation is deterministic
> (although since floating-point math isn't associative, changing
> the _number_ of processes does change the answer at the level
> of roundoff). I also don't think it's a memory error/cosmic ray
> for the same reason - CP2K is never the same twice (for my test
> input), while other (MPI+ScaLAPACK electronic structure) codes
> always are.
>
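The roundoff point in the paragraph above can be illustrated with a small sketch. It is not CP2K or MPI code: chunked_sum is a hypothetical stand-in for a reduction in which each process sums its own slice and the partial sums are then combined in a fixed order, and the input values are made up. Under those assumptions, repeating the sum with the same number of chunks is bitwise reproducible, while changing the number of chunks regroups the additions and may change the last few bits.

```python
import random

def chunked_sum(values, n_chunks):
    """Sum `values` as n_chunks partial sums, combined in a fixed order.

    Hypothetical stand-in for a parallel reduction: each chunk plays the
    role of one process's local data.
    """
    size = (len(values) + n_chunks - 1) // n_chunks  # ceiling division
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)

# Floating-point addition is not associative: grouping changes the result.
left_grouped = (0.1 + 0.2) + 0.3
right_grouped = 0.1 + (0.2 + 0.3)

# Made-up data with a fixed seed, so the script itself is repeatable.
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) for _ in range(10000)]

same_layout_a = chunked_sum(values, 4)
same_layout_b = chunked_sum(values, 4)  # same "process count", same order
other_layout = chunked_sum(values, 7)   # different "process count"

print("grouping matters:", left_grouped != right_grouped)
print("same layout reproducible:", same_layout_a == same_layout_b)
print("difference between layouts:", other_layout - same_layout_a)
```

If a real run differs between two executions with the same process count and the same input, regrouping of this kind is not enough to explain it, unless the reduction order itself varies from run to run.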
> Anyway, I'll work on reproducing the issue either in serial or at
> least in parallel on the same set of nodes, so feel free to ignore me
> until then.
>
> thanks,
> Noam