cp2k crashes nodes?
Jörg Saßmannshausen
j.sassma... at ucl.ac.uk
Thu Jul 28 09:55:43 UTC 2011
Dear all,
this may be less of a CP2K-specific problem, but I was wondering whether anybody
on the list has had similar experiences and could comment on it.
We have 18 somewhat dated InfiniBand nodes with dual-core Opteron 2220 CPUs in
one cluster. For the last six weeks or so, cp2k runs have tended to crash the
node (i.e. kernel panic). To rule out any OS-related problems, I upgraded the OS
to Debian Squeeze and compiled the latest version of cp2k on it using the
gfortran 4.4.5 compiler with the Intel MKL that comes with the Intel Fortran
Compiler 11.1.073. Compilation on that node went without problems. However,
running even a small test (H2O-xrd.inp) on one core crashes the node with an
error message that I would attribute to a memory problem. I ran memtest on it
before upgrading the OS, and after 9 cycles it found no problems. Right now,
about 50% of the nodes crash on a regular basis when cp2k is running on them.
I know that plane-wave codes are quite memory-intensive, but I find it a bit odd
that memtest runs fine while cp2k crashes the nodes. Before blaming the
hardware, I would like to rule out every other possibility. It is easy for me to
say that one or two nodes are gone (due to hardware problems beyond repair), but
writing off a complete cluster is a bit more difficult to explain.
Has anybody had similar experiences and would not mind sharing them with me?
Feel free to reply off-list if you prefer.
All the best from a sunny London!
Jörg
--
*************************************************************
Jörg Saßmannshausen
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ
email: j.sassma... at ucl.ac.uk
web: http://sassy.formativ.net
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html