Absolute validation
Toon Verstraelen
Toon.Ver... at UGent.be
Wed Nov 3 21:50:01 UTC 2010
Hi All,
I mentioned in a previous post that the reliability of CP2K is
sometimes disappointing. There, I put part of the blame on CVS and the
centralized development approach. Another part of the problem is the
lack of absolute validation.
CP2K has regression tests that are helpful when coding, but not for
validation. do_regtest flags differences between the outputs generated
by two versions of CP2K, but it does not tell which version is wrong.
When coding in Fist, I notice that about 20% of the regtest errors are
due to bugs in the old version and 80% are due to bugs in my own patches.
Conclusion: one needs absolute tests for validation, and not a
comparison between different versions. Based on a single execution of
the tests, one must be able to say: "yes, feature X works" or "no,
feature X fails."
Writing absolute tests is not as easy as writing regression tests, but
they reduce the time wasted on debugging code. In the end, they also pay
off from the developer's perspective. There are several ways to make
absolute tests. Just a few random ideas:
- The run_type debug has a few nice examples. One may add more types of
consistency tests: 'debug conserved quantity', 'debug functional
derivative (for QS)', 'debug electrostatic potential', etc.
- The restart feature is suitable for absolute testing in a rather
trivial way.
- For Fist, it is easy to create simple inputs for which the output can
be reproduced with a few lines of Python.
- Parts of the QMMM output can be compared directly with independent
Fist and QS jobs. The same can be done with multiple force evals.
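To illustrate the Fist idea, here is a minimal Python sketch of what
such an absolute test could look like. It is not taken from any existing
framework. The function names, the parameter values and the "reported"
energy are all hypothetical; in practice the reported value would be
parsed from the output of a single run. The reference is the
Lennard-Jones pair energy, which equals -epsilon at the minimum
r = 2^(1/6) * sigma.

```python
def lj_energy(r, epsilon, sigma):
    """Lennard-Jones pair energy: 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    x = (sigma / r) ** 6
    return 4.0 * epsilon * (x * x - x)


def check_energy(reported, r, epsilon, sigma, tol=1e-8):
    """Return True if the reported energy matches the independent reference."""
    expected = lj_energy(r, epsilon, sigma)
    return abs(reported - expected) <= tol * max(1.0, abs(expected))


# Hypothetical test case: a dimer at the Lennard-Jones minimum.
epsilon, sigma = 0.5, 3.0
r_min = 2.0 ** (1.0 / 6.0) * sigma

# Stand-in for a value that would be parsed from the program output.
reported_energy = -0.5

if check_energy(reported_energy, r_min, epsilon, sigma):
    print("yes, the pair energy works")
else:
    print("no, the pair energy fails")
```

The verdict depends only on one run and an independently computed
number, not on the output of an older version.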
There are probably many other possibilities. It is just a matter of
creativity.
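A consistency test in the spirit of the debug-type checks mentioned
above can be sketched as follows: compare an analytical force with a
central finite difference of the energy. This is only an illustration
under simple assumptions (a Coulomb pair in reduced units, hypothetical
function names, and arbitrarily chosen step size and tolerance), not a
description of how CP2K implements it.

```python
def coulomb_energy(r, q1=1.0, q2=-1.0):
    """Coulomb pair energy in reduced units: q1*q2/r."""
    return q1 * q2 / r


def coulomb_force(r, q1=1.0, q2=-1.0):
    """Analytical radial force, -dE/dr = q1*q2/r**2."""
    return q1 * q2 / r ** 2


def numerical_force(energy, r, h=1e-5):
    """Central finite-difference estimate of -dE/dr."""
    return -(energy(r + h) - energy(r - h)) / (2.0 * h)


def forces_consistent(energy, force, r, tol=1e-6):
    """Return True if analytical and numerical forces agree within tol."""
    return abs(force(r) - numerical_force(energy, r)) <= tol


for r in (0.8, 1.0, 2.5):
    verdict = "ok" if forces_consistent(coulomb_energy, coulomb_force, r) else "FAILED"
    print(f"r = {r}: {verdict}")
```

A force routine with a wrong sign or prefactor fails this check without
any reference to a previous version of the code.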
I've been using my own absolute testing framework to validate patches
for Fist. I can work on a more general version that also handles the
regression tests and hooks into the debug run_type. It should be good
enough to replace do_regtest. Is there any chance that other developers
would start using it? Otherwise, it does not make sense to work on it.
cheers,
Toon