[CP2K:471] Re: mpich? problems on a linux cluster
carlo antonio pignedoli
c.pig... at gmail.com
Fri Dec 14 13:05:25 UTC 2007
Dear Axel,
sorry for the late reply, but I'm still running tests.
We set the stack ulimit to unlimited and activated jumbo frames;
these two changes seem to have solved the problem, but I want to
run more stringent tests.
As for the scaling:
of course we are not talking about
InfiniBand or a Blue Gene or anything like that, but it's not that bad.
A rough estimate going from 10 to 20 nodes (40 to 80 CPUs) gives a
time reduction of 37% instead of the ideal 50%, which is more than
I expected for my present needs.
Thank you very much for your help.
Carlo
On Dec 7, 2007 8:45 PM, Axel <akoh... at gmail.com> wrote:
>
> hi carlo,
>
> On Dec 7, 11:10 am, "carlo antonio pignedoli" <c.pig... at gmail.com> wrote:
> > Dear Axel
> > we are using the normal gigabit.
>
> please check your scaling. my hunch is that you'll find out
> that there is little or no improvement from going over the network.
>
> it is also quite likely that you overload your network (or the
> switch).
> it depends a bit on your input, tho.
>
> > I ran dmesg and... well, I'm not an expert, but I got something that
> > looks like an error.
>
> well, you'll get at least a message indicating the segfault.
> hard to tell without seeing it. posting only the last 10-20 lines
> will probably do.
>
> > for ulimit -a I get:
> >
> > core file size (blocks, -c) 0
> > data seg size (kbytes, -d) unlimited
> > file size (blocks, -f) unlimited
> > pending signals (-i) 38912
> > max locked memory (kbytes, -l) 32
> > max memory size (kbytes, -m) unlimited
> > open files (-n) 1024
> > pipe size (512 bytes, -p) 8
> > POSIX message queues (bytes, -q) 819200
> > stack size (kbytes, -s) 8192
>
> this is "small". it is thus possible that your
> executable was running ok until the first completed
> SCF and then needed some extra memory which was not
> available from the stack.
>
> > cpu time (seconds, -t) unlimited
> > max user processes (-u) 38912
> > virtual memory (kbytes, -v) unlimited
> > file locks (-x) unlimited
> >
> > what value would you suggest as reasonable?
>
> at least 10 times as large. i usually set it to 1GB
> or unlimited if i test on a machine where i have to
> change this.
>
> ciao,
> axel.
>
> >
> > Thanks a lot
> >
> > Carlo
>
> >
> > On Dec 7, 2007 4:38 PM, Axel <akoh... at gmail.com> wrote:
> >
> > > carlo,
> >
> > > one more thing that may be important: what interconnect
> > > do you have and is it working correctly under high load?
> >
> > > cp2k is very demanding and i've run across multiple machines
> > > (myrinet/infiniband) where the MPI runtime settings needed to
> > > be tweaked to have the job run reliably. i suggest you log into
> > > the failing node and have a look at the kernel message buffer
> > > with "dmesg" and see if there is anything suspicious.
> >
> > > the second likely cause of segmentation faults with intel
> > > compilers is an insufficient stack size. for historical
> > > reasons, the intel fortran frontend allocates temporary arrays
> > > by default on the stack instead of the heap. please check your
> > > cluster nodes for whether the stack segment is large enough
> > > (ulimit -a), and have the sysadmins increase it if needed.
> >
> > > a second option is to reset the stack size from within cp2k, but
> > > that requires some (ugly?) modifications of the code and they need
> > > to be in c. i'll put an updated version of those into the files
> > > section later.
> >
> > > the third option is to use the -heap-arrays flag, which is only
> > > supported by intel compilers 10.0 and later.
> >
> > > hope that helps,
> > > axel.
> >
> > > On Dec 7, 7:59 am, "carlo antonio pignedoli" <c.pig... at gmail.com> wrote:
> > > > Ciao Teo,
> >
> > > > we are using the cmkl libraries from the Intel Cluster Toolkit
> > > > for Linux, version 9.1.
> >
>
More information about the CP2K-user mailing list