Absolute validation

Toon Verstraelen Toon.Ver... at UGent.be
Wed Nov 3 21:50:01 UTC 2010


Hi All,

I've mentioned in a previous post that the reliability of CP2K is 
sometimes disappointing. In part, I blamed CVS and the centralized 
development approach. (See previous post.) Part of it is due to the lack 
of absolute validation.

CP2K has regression tests that are helpful when coding, but not for 
validation. do_regtest flags differences between the outputs generated 
by two versions of CP2K, but it does not tell which version is wrong. 
When coding in Fist, I notice that about 20% of the regtest errors are 
due to bugs in the old version and 80% are due to bugs in my own 
patches. Conclusion: one needs absolute tests for validation, not a 
comparison between different versions. Based on a single execution of 
the tests, one must be able to say: "yes, feature X works" or "no, 
feature X fails."
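
To make the distinction concrete, here is a minimal sketch in Python. 
The function names and the tolerance are hypothetical and not part of 
do_regtest or CP2K. The point is only that the regression check needs an 
older output and cannot say which side is wrong, while the absolute 
check needs one run plus an independently known value.

```python
def regression_check(new_value, old_value, tol=1e-8):
    """Compare two versions. A mismatch does not tell which one is wrong."""
    if abs(new_value - old_value) <= tol * max(1.0, abs(old_value)):
        return "no difference"
    return "outputs differ (old or new version may be wrong)"


def absolute_check(feature, computed, expected, tol=1e-8):
    """Give a verdict from a single run and an independent expected value."""
    if abs(computed - expected) <= tol * max(1.0, abs(expected)):
        return "yes, feature %s works" % feature
    return "no, feature %s fails" % feature
```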

Writing absolute tests is not as easy as making regression tests, but 
they reduce the time wasted on debugging code. In the end it also pays 
off from the developer's perspective. There are several ways to make 
absolute tests. Just a few random ideas:

- The run_type debug has a few nice examples. One may add more types of 
consistency tests: 'debug conserved quantity', 'debug functional 
derivative (for QS)', 'debug electrostatic potential', etc.

- The restart feature is suitable for absolute testing in a rather 
trivial way.

- For Fist, it is easy to create simple inputs for which the output can 
be reproduced with a few lines of Python.

- Parts of the QMMM output can be compared directly with independent 
Fist and QS jobs. The same can be done with multiple force evals.
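
As an illustration of the Fist idea and of a debug-style consistency 
test, the sketch below computes a reference energy for a small cluster 
and checks analytic forces against finite differences of the energy. It 
assumes the textbook 12-6 Lennard-Jones form with no cutoff or shift. 
Whether that matches a given Fist input (units, cutoffs, mixing rules) 
has to be checked case by case. All function names are hypothetical, and 
extracting the energy from a CP2K output file is not shown.

```python
def lj_energy(positions, epsilon, sigma):
    """Total 12-6 Lennard-Jones energy over all pairs (no cutoff)."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            x = (sigma * sigma / r2) ** 3  # (sigma/r)^6
            energy += 4.0 * epsilon * (x * x - x)
    return energy


def lj_forces(positions, epsilon, sigma):
    """Analytic forces, i.e. minus the gradient of lj_energy."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = [a - b for a, b in zip(positions[i], positions[j])]
            r2 = sum(c * c for c in d)
            x = (sigma * sigma / r2) ** 3
            pref = 24.0 * epsilon * (2.0 * x * x - x) / r2
            for k in range(3):
                forces[i][k] += pref * d[k]
                forces[j][k] -= pref * d[k]
    return forces


def max_force_error(positions, epsilon, sigma, h=1e-5):
    """Largest deviation between analytic and central-difference forces."""
    analytic = lj_forces(positions, epsilon, sigma)
    worst = 0.0
    for i in range(len(positions)):
        for k in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][k] += h
            minus[i][k] -= h
            numeric = -(lj_energy(plus, epsilon, sigma)
                        - lj_energy(minus, epsilon, sigma)) / (2.0 * h)
            worst = max(worst, abs(numeric - analytic[i][k]))
    return worst
```

The energy from a Fist job on the same geometry could then be compared 
with lj_energy after unit conversion, and a small max_force_error says 
that forces and energy are mutually consistent without looking at any 
older output.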

There are probably many other possibilities. It is just a matter of 
creativity.
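
The restart idea can be sketched in the same spirit: a run of N steps 
should end in the same state as a run that is stopped halfway, written 
to a restart file, and resumed. The toy below uses a 1D harmonic 
oscillator with velocity Verlet and a JSON round trip standing in for 
the restart file. None of this is CP2K code, and all names are 
hypothetical. With a real MD code one would likely compare within a 
tolerance rather than demand exact equality.

```python
import json


def step(state, dt=0.01, k=1.0, m=1.0):
    """One velocity Verlet step for a 1D harmonic oscillator."""
    x, v = state["x"], state["v"]
    a = -k * x / m
    x_new = x + v * dt + 0.5 * a * dt * dt
    a_new = -k * x_new / m
    v_new = v + 0.5 * (a + a_new) * dt
    return {"x": x_new, "v": v_new}


def run(state, nsteps):
    """Propagate a state for nsteps steps and return the final state."""
    for _ in range(nsteps):
        state = step(state)
    return state


def restart_is_consistent(nsteps, split):
    """Compare a straight run with a run interrupted after `split` steps."""
    start = {"x": 1.0, "v": 0.0}
    straight = run(start, nsteps)
    # Write and read back the intermediate state, as a restart file would.
    checkpoint = json.loads(json.dumps(run(start, split)))
    resumed = run(checkpoint, nsteps - split)
    return straight == resumed
```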

I've been using my own absolute testing framework to validate patches 
for Fist. I can work on a more general version that also handles the 
regression tests and hooks into the debug run_type. It should be good 
enough to replace do_regtest. Is there any chance that other developers 
would start using it? Otherwise it does not make sense to work on it.

cheers,

Toon
