libdbcsr, MPI error

nadler rod... at gmx.ch
Fri Nov 12 18:57:41 UTC 2010


Dear CP2K community,

I have a problem executing jobs on the BladeCenter JS20 and BladeCenter
JS21 machines installed at the supercomputer center in Madrid. I
asked the support there to install a "recent" version of CP2K and they
referred me to the installed version 2.1.397, as of June 2010. Now,
when running jobs, after around 110-120 hours of computation time the
jobs abort with the message:
 libdbcsr|  MPI error 1641487 in mpi_alltoall @ mp_alltoall_i44 :
Other MPI error, error stack:
MPI_Alltoall(711).................: MPI_Alltoall(sbuf=0x4001f33e260,
scount=6, MPI_INTEGER, rbuf=0x4001f367e60, rcount=6, MPI_INTEGER,
MPI_COMM_WORLD) failed
MPIR_Alltoall(175)................:
MPI_Type_create_indexed_block(166):
MPI_Type_create_indexed_block(count=8, blocklength=6,
array_of_displacements=0x4001f368de0, MPI_INTEGER,
newtype=0xfffffdf62a8) failed
MPID_Type_vector(57)..............: Out of memory
 libdbcsr| Abnormal program termination, stopped by process number 9
 libdbcsr|  MPI error 1008274447 in mpi_alltoall @ mp_alltoall_i44 :
Other MPI error, error stack:
[...]

I checked the CVS tree and there are a lot of dead files related to
dbcsr*, saying that libdbcsr is now completely independent. Does this
have something to do with the problems I am seeing?

I have uploaded the *inp and *out files to the board
(cp2k-Magerit-problems.tar.gz). I can't say how the program was
compiled. As the support claims that the program or the input is
buggy, I am asking here what I should ask them to do, or what I can
do, to prevent this error from happening. The same job terminates
without any problem on our own cluster, where version 2.2.23 is
installed.

Thanks a lot for your help.
Cheers,
Roger

