[CP2K:769] Re: parallel distribution of data

Nichols A. Romero naro... at gmail.com
Mon Mar 10 21:49:40 UTC 2008


Teo,

I was just able to reproduce this on another machine.
http://www.mhpcc.hpc.mil/doc/jaws.html

I just ran it on 256 processors. It was compiled with ifort 9.1.045 and
mvapich 1.2.7. I have attached the arch file.

Here is the error that I am seeing.

Out of memory ...


 *
 *** ERROR in get_my_tasks  ***
 *


 *** The memory allocation for the data object <send_buf_r> failed. The  ***
 *** requested memory size is 1931215 Kbytes                             ***


 ===== Routine Calling Stack =====

            8 distribute_matrix
            7 calculate_rho_elec
            6 scf_env_initial_rho_setup
            5 init_scf_run
            4 qs_energies
            3 qs_forces
            2 qs_mol_dyn_low
            1 CP2K
 CP2K| Abnormal program termination, stopped by process number 231
[231] [MPI Abort by user] Aborting Program!


 *
 *** ERROR in pack_matrix almost there  ***
 *


 *** Matrix block not found  ***


 ===== Routine Calling Stack =====

            8 distribute_matrix
            7 calculate_rho_elec
            6 scf_env_initial_rho_setup
            5 init_scf_run
            4 qs_energies
            3 qs_forces
            2 qs_mol_dyn_low
            1 CP2K
 CP2K| Abnormal program termination, stopped by process number 212
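
(For scale: 1931215 KB is nearly 2 GB on a single MPI rank.) The first message
just reports a failed ALLOCATE for <send_buf_r>. A minimal sketch of that
checked-allocation pattern, not CP2K's actual code, with the name and size
taken from the output above:

program checked_allocation
  implicit none
  integer, parameter :: dp = selected_real_kind(14, 200)
  integer, parameter :: int_8 = selected_int_kind(18)
  real(kind=dp), allocatable :: send_buf_r(:)
  integer(kind=int_8) :: n
  integer :: istat

  ! ~1931215 KB of double-precision reals, as in the error above
  n = 1931215_int_8 * 1024_int_8 / 8_int_8
  allocate (send_buf_r(n), stat=istat)
  if (istat /= 0) then
     print '(A,I0,A)', 'Allocation of <send_buf_r> failed; requested ', &
           8_int_8*n/1024_int_8, ' KB'
     stop
  end if
  print *, 'Allocation of <send_buf_r> succeeded'
end program checked_allocation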


On Mon, Mar 10, 2008 at 5:38 PM, Teodoro Laino <teodor... at gmail.com>
wrote:

> Ciao Nick,
> I can run it.. I need to know the exact setup of your job, i.e. the
> number of procs you're using for H2O-2048..
>
> Teo
>
> On 10 Mar 2008, at 21:00, Nichols A. Romero wrote:
>
> Guys,
>
> I've been doing some testing with some of the standard benchmark cases:
> H2O-1024, 2048, 4096, etc.
>
> H2O-1024 runs with both the distributed and replicated distributions.
> H2O-2048 runs only with the replicated one.
>
> Can someone else try to run the H2O-2048 with the distributed data
> algorithm to see if they get the same error?
> It happens right after the initial guess, before the OT starts.
>
>  *
>  *** ERROR in pack_matrix almost there  ***
>  *
>
>  *** Matrix block not found  ***
>
>
>
> On Fri, Feb 29, 2008 at 5:31 PM, Matt W <MattWa... at gmail.com> wrote:
>
> >
> > Engaging brain a bit harder (it is Friday night here), a possible cause
> > is an overflow in the routine pair2int (realspace_task_selection, line
> > 72)... but I hope not.
> >
> > If any processor writes res as negative then this is the problem...
> >
> > Matt
> >
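
For reference, a minimal Fortran sketch of the check Matt suggests. The
pairing formula below is purely illustrative (the real pair2int in
realspace_task_selection may combine different indices), but it shows how a
packing into a default 32-bit integer wraps negative on overflow:

program pairing_overflow_check
  implicit none
  integer, parameter :: int_8 = selected_int_kind(18)
  integer :: i, j, n, res
  integer(kind=int_8) :: res_safe

  n = 60000                      ! hypothetical index range
  i = n
  j = n
  ! 32-bit arithmetic wraps (on common compilers) once (i-1)*n + j > 2**31 - 1
  res = (i - 1)*n + j
  res_safe = (int(i, int_8) - 1)*int(n, int_8) + int(j, int_8)

  print '(A,I12)', '32-bit res      = ', res
  print '(A,I12)', '64-bit res_safe = ', res_safe
  if (res < 0) print *, 'res is negative: the 32-bit packing overflowed'
end program pairing_overflow_check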
> > On Feb 29, 9:23 pm, "Nichols A. Romero" <naro... at gmail.com> wrote:
> > > Guys,
> > >
> > > Thanks for all the help. I would need to ask permission from the user
> > > before I can post the input file, and he is gone for the day already.
> > > It is also very large, so we should try to reproduce the error with
> > > something much smaller. At the moment, the 4096 & 8192 water benchmark
> > > input files (from the CP2K test cases) suffer from the same error
> > > message. It has nothing to do with periodicity.
> > >
> > > Have others been able to run the 4096 & 8192 water benchmarks with
> > > distribution_type distributed? If so, then maybe it is something
> > > computer specific. A much smaller test case of 3x3 water molecules ran
> > > without a problem. I've also run some other calculations in the
> > > 2000-atom range.
> > >
> > > To answer Matt's questions: I don't know. I will run again to find out.
> > >
> > > Teo, I will work on the replicated case and get back to you.
> > >
> > >
> > >
> > > On Fri, Feb 29, 2008 at 4:01 PM, Matt W <MattWa... at gmail.com> wrote:
> > >
> > > > Hi Nichols,
> > >
> > > > as Teo says, extra details would help. The slightly esoteric message
> > > > indicates that the processor involved doesn't possess the density
> > > > matrix block that it thinks it should have. Is the crash immediate?
> > > > Do you get an initial density?
> > >
> > > > It's not a problem with the size of vacuum, but it could be with the
> > > > non-periodic boundary conditions.
> > >
> > > > Matt
> > >
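
As a rough illustration of what "Matrix block not found" means (this is not
CP2K's data structure, just a toy sketch): each process holds a local list of
density-matrix blocks, and the error corresponds to being asked to pack a
(row, col) block that is missing from that list:

program block_lookup_sketch
  implicit none
  integer, parameter :: nblk = 3
  integer :: owned(2, nblk), want(2), k
  logical :: found

  ! (row, col) blocks held by this process (hypothetical values)
  owned = reshape([1, 1,  2, 3,  4, 2], [2, nblk])
  want  = [3, 3]          ! block this process is asked to pack

  found = .false.
  do k = 1, nblk
     if (all(owned(:, k) == want)) found = .true.
  end do
  if (.not. found) print *, 'ERROR in pack_matrix: Matrix block not found'
end program block_lookup_sketch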
> > > > On Feb 29, 8:46 pm, Teodoro Laino <teodor... at gmail.com> wrote:
> > > > > Hi Nick,
> > >
> > > > > for DFT this is the section you need to check:
> > >
> > > > > http://cp2k.berlios.de/input/InputReference~__ROOT__~FORCE_EVAL~DFT~MGRID~RS_GRID.html
> > >
> > > > > in particular distribution_type therein..
> > > > > Ciao,
> > > > > teo
> > >
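
For anyone else looking for it, the switch Teo refers to lives in that
RS_GRID section of the input file. A minimal snippet (keyword spelling taken
from the reference page above; I have not double-checked the defaults, so
treat this as a sketch):

  &FORCE_EVAL
    &DFT
      &MGRID
        &RS_GRID
          DISTRIBUTION_TYPE REPLICATED   # or DISTRIBUTED, as discussed above
        &END RS_GRID
      &END MGRID
    &END DFT
  &END FORCE_EVAL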
> > > > > On 29 Feb 2008, at 21:43, Nichols A. Romero wrote:
> > >
> > > > > > Ciao Teo,
> > >
> > > > > > I cannot find the keyword to do realspace replicated. Can you
> > > > > > help?
> > >
> > > > > > On Fri, Feb 29, 2008 at 3:31 PM, Teodoro Laino
> > > > > > <teodor... at gmail.com> wrote:
> > > > > > Ciao Nick,
> > >
> > > > > > Looks like it is a problem with the new real-space distribution...
> > > > > > can you try the realspace distribution -> replicated?
> > > > > > Does it work?
> > >
> > > > > > In that case, I guess the people working on it will need an input
> > > > > > file (even a fake one) that reproduces the same error, to debug
> > > > > > the problem.
> > >
> > > > > > Thanks Nick!
> > > > > > teo
> > >
> > > > > > On 29 Feb 2008, at 21:23, Nichols A. Romero wrote:
> > >
> > > > > >> Hi,
> > >
> > > > > >> We are working on a very large system, ~4000 atoms. It is a
> > > > > >> finite system and there is about 20 Bohr of vacuum on all sides
> > > > > >> (probably overkill).
> > >
> > > > > >> I think the error that I am receiving has to do with the parallel
> > > > > >> distribution of the data. Would the distribution algorithm perhaps
> > > > > >> fail if there is too much vacuum?
> > >
> > > > > >> Here is the error message. BTW, we seem to be able to run the
> > > > > >> 4096 & 8192 test cases.
> > >
> > > > > >>   Extrapolation method: initial_guess
> > >
> > > > > >>  *
> > > > > >>  *** ERROR in pack_matrix almost there  ***
> > > > > >>  *
> > >
> > > > > >>  *** Matrix block not found  ***
> > >
> > > > > >> --
> > > > > >> Nichols A. Romero, Ph.D.
> > > > > >> DoD User Productivity Enhancement and Technology Transfer (PET) Group
> > > > > >> High Performance Technologies, Inc.
> > > > > >> Reston, VA
> > > > > >> 443-567-8328 (C)
> > > > > >> 410-278-2692 (O)
> > >
> > > > > > --
> > > > > > Nichols A. Romero, Ph.D.
> > > > > > DoD User Productivity Enhancement and Technology Transfer (PET) Group
> > > > > > High Performance Technologies, Inc.
> > > > > > Reston, VA
> > > > > > 443-567-8328 (C)
> > > > > > 410-278-2692 (O)
> > >
> > > --
> > > Nichols A. Romero, Ph.D.
> > > DoD User Productivity Enhancement and Technology Transfer (PET) Group
> > > High Performance Technologies, Inc.
> > > Reston, VA
> > > 443-567-8328 (C)
> > > 410-278-2692 (O)
> >
> >
>
>
> --
> Nichols A. Romero, Ph.D.
> DoD User Productivity Enhancement and Technology Transfer (PET) Group
> High Performance Technologies, Inc.
> Reston, VA
> 443-567-8328 (C)
> 410-278-2692 (O)


-- 
Nichols A. Romero, Ph.D.
DoD User Productivity Enhancement and Technology Transfer (PET) Group
High Performance Technologies, Inc.
Reston, VA
443-567-8328 (C)
410-278-2692 (O)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Linux-x86-64-intel.popt
Type: application/octet-stream
Size: 1178 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20080310/8d211599/attachment.obj>

