[CP2K:3388] Re: cp2k crashes nodes?

Jörg Saßmannshausen j.sassma... at ucl.ac.uk
Mon Aug 1 08:34:01 UTC 2011


Dear Axel,

thanks for your feedback. 

I am new to IB, so I am still struggling to get the hang of it. We have

InfiniBand: QLogic, Corp. IBA6120 InfiniBand HCA (rev 02)

cards, and ibswitches reports

ports 24 "InfiniCon Systems InfinIO9024" enhanced port 0 lid 1 lmc 0

I hope that helps a bit. Don't ask me which firmware is running. I most
certainly did not do any upgrades here. The old OS is Rocks 4.3 (Mars
Hill), which is causing all sorts of problems, and I have also tried Debian
Squeeze as the 'to be installed' OS.

I think these nodes are mainly used for cp2k and not so much for other 
programs.

How did you cure your problem, apart from simply not running cp2k on these 
nodes?

All the best from a sunny London!

Jörg


On Thursday 28 July 2011 16:38:51 Axel wrote:
> On Thursday, July 28, 2011 5:55:43 AM UTC-4, sassy wrote:
> > Dear all,
> > 
> > maybe less of a specific cp2k problem but I was wondering if somebody on the
> > list has had similar experiences before and could comment on them.
> > 
> > We have a number (18) of somewhat dated InfiniBand Opteron dual-core 2220
> > nodes in one cluster. For the last 6 weeks or so cp2k runs tend to crash the
> > node (i.e. kernel panic). In order to rule out any OS-related problems I have
> > upgraded the OS to Debian Squeeze and compiled the latest version of cp2k on
> > it using the gfortran 4.4.5 compiler with the Intel MKL which comes with the
> > Intel Fortran Compiler 11.1.073. Compilation on that node went without
> 
> you didn't say which version of OFED (or whatever else) you are running to
> drive the IB cards, or what type of IB cards you have to begin with.
> 
> > I know that plane wave code is quite memory intensive but I find it a bit odd
> > that memtest runs ok and cp2k crashes the nodes. I would like to rule out any
> > other possibility but hardware problems. It is easy for me to say that one or
> > two nodes are gone (due to hardware problems beyond repair) but writing off a
> > complete cluster is a bit more difficult to explain.
> > 
> > Has anybody had similar experiences and would not mind sharing them with me?
> > It can be off-list if they prefer.
> 
> i've seen similar behavior with infinipath DDR-IB HCAs on some of our
> nodes. all applications would run well, but the communication pattern of
> cp2k would overload the kernel part of the IB driver and lead to
> intermittent crashes.
> 
> cheers,
>     axel.
> 
> > All the best from a sunny London!
> > 
> > Jörg
> > 
> > 
> > email: j.sas... at ucl.ac.uk
> > web: http://sassy.formativ.net
> > 
> > Please avoid sending me Word or PowerPoint attachments.
> > See http://www.gnu.org/philosophy/no-word-attachments.html

-- 
*************************************************************
Jörg Saßmannshausen
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ 

email: j.sassma... at ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html


