parallel distribution of data

Axel akoh... at gmail.com
Mon Mar 10 22:49:26 UTC 2008



On Mar 10, 5:49 pm, "Nichols A. Romero" <naro... at gmail.com> wrote:
> Teo,
>
> I was just able to reproduce this on another machine: http://www.mhpcc.hpc.mil/doc/jaws.html
>
> I just ran it on 256 processors. Compiled it with ifort 9.1.045 and mvapich
> 1.2.7.
> I attach the arch file.

nick,

here's another caveat which most likely has nothing to do
with the immediate error you are seeing, but may bite
you later.

when running on large infiniband clusters, you may have to limit
the number of MPI processes per node. the way openfabrics seems
to work (at least at the moment), you need _physical_ memory
as "backing store" for each RDMA connection, i.e. for each MPI
task you lose some physical memory regardless of the memory
requirements of your job. i've seen this on the NCSA 'abe' cluster,
where i ran out of memory for rather small jobs, despite having
1GB/core, simply by increasing the requested number of cpus.
also, you may get better performance by using only half of the
cpu cores on each node you requested; for really big jobs i had
to go down to a quarter (abe is dual quad-core, though). :-(
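
fwiw, here is a minimal sketch (not from the original mail) of how one
could under-populate nodes under PBS with an MPICH-style launcher: it
assumes the batch system writes $PBS_NODEFILE with one hostname per
allocated slot (site-specific!), and it writes a thinned host file that
keeps only every other slot per node. the output file name
"machinefile.thinned" and the keep-every-2nd-slot ratio are just
illustrative choices, not anything mandated by cp2k or mvapich.

#!/usr/bin/env python
# hypothetical helper: thin out $PBS_NODEFILE so that only every
# KEEP_EVERY-th slot per host is kept, i.e. fewer MPI tasks per node.
# assumes PBS lists one hostname per allocated core (site-specific).
import os
from collections import defaultdict

KEEP_EVERY = 2  # assumption: keep half of the slots on each node

nodefile = os.environ["PBS_NODEFILE"]   # provided by the batch system
counts = defaultdict(int)               # slots seen so far per host
kept = []

with open(nodefile) as f:
    for line in f:
        host = line.strip()
        if not host:
            continue
        if counts[host] % KEEP_EVERY == 0:   # keep 1 of every KEEP_EVERY slots
            kept.append(host)
        counts[host] += 1

with open("machinefile.thinned", "w") as out:  # hypothetical output name
    out.write("\n".join(kept) + "\n")

print("kept %d of %d slots" % (len(kept), sum(counts.values())))

you would then launch with the reduced task count and the thinned
machine file (the exact mpirun/mpirun_rsh flags depend on how your
mvapich was built, so check your site docs).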

cheers,
    axel.




>
> Here is the error that I am seeing.
>
> Out of memory ...
>
>  *
>  *** ERROR in get_my_tasks  ***
>  *
>
>  *** The memory allocation for the data object <send_buf_r> failed. The  ***
>  *** requested memory size is 1931215 Kbytes                             ***
>

[...]


> --
> Nichols A. Romero, Ph.D.
> DoD User Productivity Enhancement and Technology Transfer (PET) Group
> High Performance Technologies, Inc.
> Reston, VA
> 443-567-8328 (C)
> 410-278-2692 (O)
>
> (attachment: Linux-x86-64-intel.popt, 1K)

