cp2k crashes nodes?

Jörg Saßmannshausen j.sassma... at ucl.ac.uk
Thu Jul 28 09:55:43 UTC 2011


Dear all,

This may be less of a cp2k-specific problem, but I was wondering if somebody on
the list has had similar experiences and could comment on it.

We have 18 somewhat dated InfiniBand nodes with dual-core Opteron 2220 CPUs in
one cluster. For the last 6 weeks or so, cp2k runs have tended to crash the node
(i.e. kernel panic). In order to rule out any OS-related problems I have
upgraded the OS to Debian Squeeze and compiled the latest version of cp2k on it
using the gfortran 4.4.5 compiler with the Intel MKL which comes with the
Intel Fortran Compiler 11.1.073. Compilation on that node went without
problems. However, running even a small test (H2O-xrd.inp) on one core crashes
the node with an error message that I would relate to a memory problem. I ran
memtest on it before I upgraded the OS, and after 9 cycles it had not found any
problems. Right now, about 50 % of the nodes crash on a regular basis
when cp2k is running on them.

I know that plane-wave code is quite memory intensive, but I find it a bit odd
that memtest runs OK while cp2k crashes the nodes. I would like to rule out every
possibility other than a hardware problem. It is easy for me to say that one or
two nodes are gone (due to hardware problems beyond repair), but writing off a
complete cluster is a bit more difficult to explain.

Has anybody had similar experiences and would not mind sharing them with me?
Replies can be off-list if you prefer.

All the best from a sunny London!

Jörg

-- 
*************************************************************
Jörg Saßmannshausen
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ 

email: j.sassma... at ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html


