[CP2K:380] Re: Problem running MPI version of NPT

Teodoro Laino teodor... at gmail.com
Sat Nov 3 07:29:44 UTC 2007


Together with Rad, we investigated the problem a bit further:

1) Rad's job was stopping after ~180 MD steps with that error message.
2) Independently of the number of procs used (32/64/128/256), it always
   crashes at the same point.
3) The regtest-norotho/graphite2 sample is giving him the error:

  Total charge density (g-space):                               0.0086530397

 *****************************************************************************
 *** 10:24:03 ERRORL2 in cp_fm_cholesky:cp_fm_cholesky_decompose processor ***
 ***      0  err=-300  condition FAILED at line 116                        ***
 *****************************************************************************

  ===== Routine Calling Stack =====

             8 cp_fm_cholesky_decompose
             7 make_preconditioner
             6 init_scf_loop
             5 scf_env_do_scf
             4 qs_energies
             3 qs_forces
             2 qs_mol_dyn_low
             1 CP2K

  CP2K| Stopped by process number                                            0
  CP2K| Abnormal program termination
 ======================================================================================

I tried to run Rad's same input file on another machine (an XT3, 32
procs); in 1 hour the job reached 430 MD steps
and I did not observe any crash due to MPI-related errors.

Given this summary (and point (3) in particular), I believe that on the
machine where Rad was originally running his job,
SCALAPACK or MPI (or both, since error (3) comes from SCALAPACK and does
not show up on other machines)
may not be properly installed.
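
For reference, what that ERRORL2 means: cp_fm_cholesky_decompose aborts
when the Cholesky routine of the underlying linear-algebra library reports
a failure (presumably the condition at line 116 is the check that the
routine returned info = 0). Below is a minimal serial sanity check of that
library; this is a hypothetical standalone test of mine (potrf_check.c is
just a name I made up), not part of CP2K, and the parallel code path
additionally goes through ScaLAPACK/BLACS on top of MPI, which a serial
test of course does not exercise.

/*
 * Hypothetical standalone check, not part of CP2K: factor a small
 * symmetric positive-definite matrix with LAPACK's dpotrf and inspect
 * the "info" flag. A broken serial BLAS/LAPACK shows up here already.
 * Build with something like:  cc potrf_check.c -llapack -o potrf_check
 */
#include <stdio.h>

/* Fortran LAPACK symbol; the trailing underscore is the usual mangling. */
extern void dpotrf_(const char *uplo, const int *n, double *a,
                    const int *lda, int *info);

int main(void)
{
    /* 3x3 SPD matrix, column-major as LAPACK expects. */
    double a[9] = { 4.0, 1.0, 1.0,
                    1.0, 3.0, 0.0,
                    1.0, 0.0, 2.0 };
    int n = 3, lda = 3, info = 0;

    dpotrf_("U", &n, a, &lda, &info);

    if (info == 0)
        printf("Cholesky OK\n");
    else
        printf("dpotrf failed, info = %d\n", info);  /* the kind of failure
                                                        CP2K reports as ERRORL2 */
    return 0;
}

If even this fails on Rad's machine, the serial library is broken; if it
passes, the suspicion moves to the ScaLAPACK/BLACS/MPI layer, which fits
the summary above.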

Teo

p.s.: as a further check, I asked Rad to run a simple NVE example on
the same machine to see what happens (it should also abort
with the same error message).
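
p.p.s.: regarding Juerg's remark below about unmatched frees: the
MPI_GROUP_MAX message Rad quotes means the MPI library ran out of internal
group slots, which happens either when the application keeps creating
groups/communicators without freeing them, or when the installed MPI does
not actually release the slots on MPI_Comm_free/MPI_Group_free. Here is a
small standalone probe (my own sketch, not CP2K code; group_probe.c is
just a name I made up) that creates and frees communicators in a loop,
roughly the pattern Juerg describes. On a healthy installation it should
complete all iterations; on a broken one it should hit the same
MPI_GROUP_MAX limit:

/*
 * Hypothetical standalone probe, not CP2K code: repeatedly create and
 * free groups/communicators. If the frees are honoured, the run completes
 * within a fixed number of group slots; if not, the internal group table
 * fills up and the run dies with the "MPI_GROUP_MAX" message.
 * Build/run, e.g.:  mpicc group_probe.c -o group_probe
 *                   mpirun -np 4 ./group_probe
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < 100000; i++) {
        MPI_Group world_group, sub_group;
        MPI_Comm  split_comm;

        /* create a group and a communicator ... */
        MPI_Comm_group(MPI_COMM_WORLD, &world_group);
        MPI_Group_incl(world_group, 1, &rank, &sub_group);
        MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &split_comm);

        /* ... and free them again; comment these out to mimic the leak */
        MPI_Comm_free(&split_comm);
        MPI_Group_free(&sub_group);
        MPI_Group_free(&world_group);

        if (rank == 0 && i % 10000 == 0)
            printf("iteration %d ok\n", i);
    }

    MPI_Finalize();
    return 0;
}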

On 2 Nov 2007, at 14:00, Nichols A. Romero wrote:

> Rad,
>
> Is this NPT issue reproducible on other computer platforms?
>
> Please test that for us if you can.
>
> On 11/2/07, Juerg Hutter < hut... at pci.uzh.ch> wrote:
>
> Hi
>
> this could be a problem of CP2K or the compiler (or the
> MPI installation).
> If it is a problem of CP2K, the obvious question is why
> it didn't show up before. Can you run a small system
> with NPT in parallel? If the error persists, please send the
> input. Another thing to test would be whether the error
> depends on the number of CPUs.
> CP2K creates and frees MPI groups during the calculation.
> If the frees do not match the creations, it is possible that
> the number of groups keeps increasing (similar to a
> memory leak). It is possible that your input triggers a
> new code path where this happens.
>
> Another possibility is that either the compiler or
> the installed MPI does not correctly implement
> the freeing of communicators.
>
> regards
>
> Juerg Hutter
>
> ----------------------------------------------------------
> Juerg Hutter                   Phone : ++41 44 635 4491
> Physical Chemistry Institute   FAX   : ++41 44 635 6838
> University of Zurich           E-mail: hut... at pci.uzh.ch
> Winterthurerstrasse 190
> CH-8057 Zurich, Switzerland
> ----------------------------------------------------------
>
>
> On Thu, 1 Nov 2007, Rad wrote:
>
> >
> > Dear All,
> >
> > I am trying to run an NPT ensemble with an MPI-compiled code and
> > run into the following error:
> >
> > Please set the environment variable MPI_GROUP_MAX for additional
> > space.
> > MPI has run out of internal group entries.
> > Please set the environment variable MPI_GROUP_MAX for additional
> > space.
> > The current value of MPI_GROUP_MAX is 512
> >
> > I have no problem running the calculation with the serially compiled
> > code (I tried both NPT_I and NPT_F). I tried the MPI run with a cell
> > of 56 atoms expanded to a supercell of 224 atoms, changed the number
> > of ranks to 64, 32, 16, 8, the temperature to 2.5 K, 200 K, 300 K, and
> > the pressure to various values (1 bar, 50 bar), etc., and I get the
> > same error.
> >
> > The code is compiled on an IA64 Linux cluster using the Intel compiler
> > (version 9.1).
> >
> > Please let me know if you have any suggestions. I would also like to
> > know whether the NPT portion has been tested on different MPI
> > architectures. If it has been tested on a particular arch, let me know
> > and I will run it on the same arch.
> >
> > Thanks
> > Rad
> >
>
> Nichols A. Romero, Ph.D.
> DoD User Productivity Enhancement and Technology Transfer (PET) Group
> High Performance Technologies, Inc.
> Reston, VA
> 443-567-8328 (C)
> 410-278-2692 (O)
