Teo,

I was just able to reproduce this on another machine:
http://www.mhpcc.hpc.mil/doc/jaws.html

I just ran it on 256 processors, compiled with ifort 9.1.045 and mvapich 1.2.7. I attach the arch file.

Here is the error that I am seeing:

Out of memory ...

 *
 *** ERROR in get_my_tasks ***
 *

 *** The memory allocation for the data object <send_buf_r> failed. The ***
 *** requested memory size is 1931215 Kbytes                            ***

 ===== Routine Calling Stack =====

    8 distribute_matrix
    7 calculate_rho_elec
    6 scf_env_initial_rho_setup
    5 init_scf_run
    4 qs_energies
    3 qs_forces
    2 qs_mol_dyn_low
    1 CP2K
 CP2K| Abnormal program termination, stopped by process number 231
[231] [MPI Abort by user] Aborting Program!

 *
 *** ERROR in pack_matrix almost there ***
 *

 *** Matrix block not found ***

 ===== Routine Calling Stack =====

    8 distribute_matrix
    7 calculate_rho_elec
    6 scf_env_initial_rho_setup
    5 init_scf_run
    4 qs_energies
    3 qs_forces
    2 qs_mol_dyn_low
    1 CP2K
 CP2K| Abnormal program termination, stopped by process number 212
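(For scale: the failed request of 1931215 Kbytes is roughly 1931215 x 1024 bytes, i.e. about 1.8 GiB for send_buf_r on a single MPI rank, assuming Kbytes here means KiB.)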
On Mon, Mar 10, 2008 at 5:38 PM, Teodoro Laino <teodor...@gmail.com> wrote:

Ciao Nick,

I can run it.
I need to know exactly the setup of your job, i.e. the number of procs you're using for the H2O-2048.

Teo

On 10 Mar 2008, at 21:00, Nichols A. Romero wrote:

Guys,

I've been doing some testing with some of the standard benchmark cases: H2O-1024, 2048, 4096, etc.

H2O-1024 runs with both the distributed and the replicated grids; H2O-2048 runs only with the replicated one.

Can someone else try to run H2O-2048 with the distributed data algorithm to see if they get the same error? It happens right after the initial guess, before the OT starts.

 *
 *** ERROR in pack_matrix almost there ***
 *

 *** Matrix block not found ***

On Fri, Feb 29, 2008 at 5:31 PM, Matt W <MattWa...@gmail.com> wrote:
Engaging brain a bit harder (it is Friday night here), a possible cause is an overflow in the routine pair2int (realspace_task_selection, line 72)... but I hope not.

If any processor writes res as negative then this is the problem...

Matt
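(A minimal sketch of the kind of overflow Matt describes: packing an atom pair into a default 32-bit integer can wrap negative for large systems. The (iatom-1)*natoms + jatom formula and the numbers below are only an assumed illustration; the actual pair2int in realspace_task_selection may differ.)

program pair2int_overflow_sketch
  implicit none
  integer, parameter  :: int_8 = selected_int_kind(18)
  integer             :: iatom, jatom, natoms, res32
  integer(kind=int_8) :: res64

  natoms = 50000        ! hypothetical atom count, chosen so the product overflows
  iatom  = natoms
  jatom  = natoms

  ! default-integer arithmetic: on most compilers the product wraps silently
  res32 = (iatom - 1)*natoms + jatom
  ! the same packing done in 64-bit arithmetic, for comparison
  res64 = int(iatom - 1, int_8)*int(natoms, int_8) + int(jatom, int_8)

  if (res32 < 0) then
    print *, 'res went negative (32-bit overflow):', res32, ' 64-bit value:', res64
  else
    print *, 'no overflow at this size:', res32
  end if
end program pair2int_overflow_sketch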
On Feb 29, 9:23 pm, "Nichols A. Romero" <naro...@gmail.com> wrote:
> Guys,
>
> Thanks for all the help. I would need to ask permission from the user before I can
> post the input file. He is gone for the day already, though. It is also very large, and we
> should try to reproduce the error with something much smaller. At the moment, the
> 4096 & 8192 water benchmark input files (from the CP2K test cases) suffer from the
> same error message. It has nothing to do with periodicity.
>
> Have others been able to run the 4096 & 8192 water benchmarks with
> distribution_type distributed? If so, then maybe it is something that is computer
> specific. A much smaller test case of 3x3 water molecules ran without a problem.
> I've also run some other calculations in the 2000-atom range.
>
> To answer Matt's questions, I don't know. I will run again to find out.
>
> Teo, I will work on the replicated case and get back to you.
>
> On Fri, Feb 29, 2008 at 4:01 PM, Matt W <MattWa...@gmail.com> wrote:
>
> > Hi Nichols,
> >
> > As Teo says, extra details would help. The slightly esoteric message indicates
> > that the processor involved doesn't possess the density matrix block that it
> > thinks it should have. Is the crash immediate, do you get an initial density....
> >
> > It's not a problem with the size of the vacuum, but it could be with the
> > non-periodic boundary conditions.
> >
> > Matt
>
> > On Feb 29, 8:46 pm, Teodoro Laino <teodor...@gmail.com> wrote:
> > > Hi Nick,
> > >
> > > For DFT, this is the section you need to check:
> > > http://cp2k.berlios.de/input/InputReference~__ROOT__~FORCE_EVAL~DFT~MGRID~RS_GRID.html
> > > in particular distribution_type therein..
> > >
> > > Ciao,
> > > teo
> > >
> > > On 29 Feb 2008, at 21:43, Nichols A. Romero wrote:
> > >
> > > > Ciao Teo,
> > > >
> > > > I cannot find the keyword to do realspace replicated. Can you help?
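(A minimal sketch of the block in question, following the section path in Teo's link above; the exact value names are an assumption based on the thread, which mentions distribution_type distributed and replicated.)

&FORCE_EVAL
  &DFT
    &MGRID
      &RS_GRID
        ! switch between REPLICATED and DISTRIBUTED realspace grids
        DISTRIBUTION_TYPE REPLICATED
      &END RS_GRID
    &END MGRID
  &END DFT
&END FORCE_EVAL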
>
> > > > On Fri, Feb 29, 2008 at 3:31 PM, Teodoro Laino <teodor...@gmail.com> wrote:
> > > >
> > > > Ciao Nick,
> > > >
> > > > Looks like it is a problem with the new real-space distribution..
> > > > Can you try the realspace distribution -> replicated?
> > > > Does it work?
> > > >
> > > > In any case, I guess the people working on that need an input file (even a
> > > > fake one) reproducing the same error to debug the problem..
> > > >
> > > > Thanks Nick!
> > > > teo
> > > >
> > > > On 29 Feb 2008, at 21:23, Nichols A. Romero wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> We are working on a very large system, ~ 4000 atoms. It is a finite
> > > >> system and there is about 20 Bohr of vacuum on all sides (probably
> > > >> overkill).
> > > >>
> > > >> I think the error that I am receiving has to do with the parallel
> > > >> distribution of the data. Would the distribution algorithm fail if
> > > >> there is too much vacuum, perhaps?
> > > >>
> > > >> Here is the error message. BTW, we seem to be able to run the 4096
> > > >> & 8192 test cases.
> > > >>
> > > >> Extrapolation method: initial_guess
> > > >>
> > > >>  *
> > > >>  *** ERROR in pack_matrix almost there ***
> > > >>  *
> > > >>
> > > >>  *** Matrix block not found ***
> > > >>
> > > >> --
> > > >> Nichols A. Romero, Ph.D.
> > > >> DoD User Productivity Enhancement and Technology Transfer (PET) Group
> > > >> High Performance Technologies, Inc.
> > > >> Reston, VA
> > > >> 443-567-8328 (C)
> > > >> 410-278-2692 (O)
> > > >
> > > > --
> > > > Nichols A. Romero, Ph.D.
> > > > DoD User Productivity Enhancement and Technology Transfer (PET) Group
> > > > High Performance Technologies, Inc.
> > > > Reston, VA
> > > > 443-567-8328 (C)
> > > > 410-278-2692 (O)
>
> --
> Nichols A. Romero, Ph.D.
> DoD User Productivity Enhancement and Technology Transfer (PET) Group
> High Performance Technologies, Inc.
> Reston, VA
> 443-567-8328 (C)
> 410-278-2692 (O)
--
Nichols A. Romero, Ph.D.
DoD User Productivity Enhancement and Technology Transfer (PET) Group
High Performance Technologies, Inc.
Reston, VA
443-567-8328 (C)
410-278-2692 (O)