[CP2K:2921] libdbcsr, MPI error

Urban Borštnik urban.b... at gmail.com
Thu Nov 18 15:06:39 UTC 2010


Hello,


On Fri, 2010-11-12 at 10:57 -0800, nadler wrote:
> Dear CP2K community,
> 
> I have a problem executing jobs on BladeCenter JS20 and BladeCenter
> JS21 which are installed at the supercomputer center in Madrid. I
> asked the support there to install a "recent" version of CP2K and they
> referred me to the installed version 2.1.397, as of June 2010. Now,
> when running jobs after around 110-120 hours of computation time the
> jobs abort with the message:

> [...] I can't say how the program was compiled. As the support claims the
> program or the input to be buggy I ask here for what I should ask them
> to do or what I can do to prevent this error from happening. The same job
> terminates without any problem on our own cluster where version 2.2.23
> is installed.
> 
It is hard to say what is wrong without knowing more about the
computer and the compile options.  Off the top of my head, I would
recommend checking that none of the following three flags
-D__c_bindings
-D__mpi_f_bindings
-D__cray_pointers
is set in the ARCH file used to compile the program.
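
As a rough illustration of that check, here is a minimal Python sketch
that scans the text of an ARCH file for the three flags listed above.
The function name find_suspect_flags and the example ARCH contents are
hypothetical and only for illustration; the sketch also assumes the
flags are spelled exactly as written here and that "#" starts a comment.

```python
import re

# The three preprocessor flags named in the message above.
SUSPECT_FLAGS = ("-D__c_bindings", "-D__mpi_f_bindings", "-D__cray_pointers")


def find_suspect_flags(arch_text):
    """Return the suspect flags found outside comments in ARCH file text."""
    found = []
    for line in arch_text.splitlines():
        # Assumption: '#' starts a comment, as in a Makefile.
        code = line.split("#", 1)[0]
        for flag in SUSPECT_FLAGS:
            # Match the flag as a whole whitespace-delimited token.
            pattern = r"(?<!\S)" + re.escape(flag) + r"(?!\S)"
            if re.search(pattern, code) and flag not in found:
                found.append(flag)
    return found


# Hypothetical ARCH file contents, for illustration only.
example_arch = """\
CC       = cc
FC       = mpif90
DFLAGS   = -D__parallel -D__c_bindings -D__FFTW3
# DFLAGS += -D__cray_pointers
"""

print(find_suspect_flags(example_arch))
```

If the returned list is not empty, those flags would be the first thing
to ask the support staff to remove before recompiling.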

Best regards,
Urban


On Fri, 2010-11-12 at 10:57 -0800, nadler wrote:
> Dear CP2K community,
> 
> I have a problem executing jobs on BladeCenter JS20 and BladeCenter
> JS21 which are installed at the supercomputer center in Madrid. I
> asked the support there to install a "recent" version of CP2K and they
> referred me to the installed version 2.1.397, as of June 2010. Now,
> when running jobs after around 110-120 hours of computation time the
> jobs abort with the message:
>  libdbcsr|  MPI error 1641487 in mpi_alltoall @ mp_alltoall_i44 :
> Other MPI error, error stack:
> MPI_Alltoall(711).................: MPI_Alltoall(sbuf=0x4001f33e260,
> scount=6, MPI_INTEGER, rbuf=0x4001f367e60, rcount=6, MPI_INTEGER,
> MPI_COMM_WORLD) failed
> MPIR_Alltoall(175)................:
> MPI_Type_create_indexed_block(166):
> MPI_Type_create_indexed_block(count=8, blocklength=6,
> array_of_displacements=0x4001f368de0, MPI_INTEGER,
> newtype=0xfffffdf62a8) failed
> MPID_Type_vector(57)..............: Out of memory
>  libdbcsr| Abnormal program termination, stopped by process number 9
>  libdbcsr|  MPI error 1008274447 in mpi_alltoall @ mp_alltoall_i44 :
> Other MPI error, error stack:
> [...]
> 
> I checked the cvs tree and there are a lot of dead files related to
> dbcsr* saying that libdbcsr is now completely independent. Has this
> something to do with the problems I suffer?
> 
> I upload *inp and *out to the board (cp2k-Magerit-problems.tar.gz). I
> can't say how the program was compiled. As the support claims the
> program or the input to be buggy I ask here for what I should ask them
> to do or what I can do to prevent this error from happening. The same job
> terminates without any problem on our own cluster where version 2.2.23
> is installed.
> 
> Thanks a lot for help.
> Cheers,
> Roger
> 



