[CP2K:3634] NEB: number of replicas, cpus and input files

Jörg Saßmannshausen j.sassma... at ucl.ac.uk
Thu Nov 24 11:29:27 CET 2011


Hi Theo,

many thanks for your explanation. 

I was thinking along similar lines last night (and noticed that 17 is a bad 
case for 32 compute_slaves) but I think I was too tired and took a wrong 
mental turn here. At least my fog has now lifted and I think the one in London 
is slowly lifting as well, after 3 days of fog ;-)

All the best from a dull London!

Jörg

On Thursday 24 November 2011 09:42:50 Teodoro Laino wrote:
> Jörg,
> 
> I got the problem.. I had overlooked your (and Carlo's) previous posts.
> I will try to explain the concept now in a clearer way (hopefully).
> 
> In a NEB calculation (the same applies to frequency calculations) you have
> several independent calculations of energy/forces. The number of NEB
> images (i.e. frames) is called NUMBER_OF_REPLICA (maybe an improper
> name.. but just think of it as NUMBER_OF_IMAGES). This is the reason why,
> if you provide 17 images (or replicas) in the input, you can NOT ask the
> BAND calculation to use 8! And that's why you get the error in the
> generation of the images.
> 
> Now, if you run in parallel without specifying NPROC_REP, the energy/forces
> are calculated sequentially and each REPLICA (or IMAGE) uses the full
> bunch of procs available.
> Of course, and this is a suggestion for all folks reading this e-mail, if
> you want to make your supercomputer center happy (i.e. running in the top
> 3% of the jobs that normally allocate resources larger than 1K procs on
> average) the key is to allocate a huge number of procs (in principle even
> the whole machine) and split the energy/forces calculations in parallel.
> How do you set that?
> With the NPROC_REP keyword. This keyword controls the number of
> computational boxes (let's call them computing slaves) that are
> computing energy/forces. By default there is 1 computing slave. Ideally you
> would like to have NUMBER_OF_REPLICA computing slaves. To do that you will
> have to allocate a number of mpi tasks equal to
> NUMBER_OF_REPLICA*NPROC_REP, and set the variable NPROC_REP properly (to a
> meaningful value).
> 
> To recap, in your case:
> -) you provided 17 initial frames. This means that the number of replicas
>    (controlled by NUMBER_OF_REPLICA) cannot be less than 17. Larger is ok:
>    the additional frames are generated by interpolation.
> -) set NPROC_REP to 8
> -) allocate 32 mpi_tasks.
> 
> This will give you 4 computing slaves (determined by (number of mpi
> tasks)/NPROC_REP). So energy/forces will be computed cycling over these 4.
> Of course 17 is the worst choice with 32 mpi tasks, because in the last
> step only 1 image is left: 24 procs will be idle and only 8 will be
> working. Unless there is a reason for having 17, I would recommend 16.
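
The arithmetic in the recap above can be checked with a few lines of Python. This is only a sketch of the counting described in this message, not CP2K code: the function name `neb_layout` and its arguments are made up for illustration, and it assumes the number of mpi tasks is an exact multiple of NPROC_REP.

```python
import math


def neb_layout(n_replica, nproc_rep, mpi_tasks, n_input_frames=0):
    """Return (groups, rounds, idle_procs_in_last_round) for one band evaluation.

    'groups' is what the message calls computing slaves.
    Hypothetical helper for illustration; it does not call or mimic CP2K internals.
    """
    if n_replica < n_input_frames:
        raise ValueError("NUMBER_OF_REPLICA cannot be less than the number of input frames")
    if mpi_tasks < nproc_rep or mpi_tasks % nproc_rep != 0:
        # Assumption of this sketch: only exact multiples are handled.
        raise ValueError("sketch assumes mpi_tasks is a positive multiple of NPROC_REP")

    groups = mpi_tasks // nproc_rep                  # replicas computed at the same time
    rounds = math.ceil(n_replica / groups)           # passes needed to cover all replicas
    busy_last = n_replica - (rounds - 1) * groups    # groups with work in the last pass
    idle_last = (groups - busy_last) * nproc_rep     # procs with nothing to do in that pass
    return groups, rounds, idle_last


print(neb_layout(17, 8, 32, n_input_frames=17))  # (4, 5, 24)
print(neb_layout(16, 8, 32))                     # (4, 4, 0)
```

With 17 replicas the fifth pass has a single image to compute, so 24 of the 32 procs wait; with 16 replicas every pass keeps all 4 groups busy.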
> 
> Hope this is clear now.
> Best,
> Teo
> 
> On Nov 24, 2011, at 10:15 AM, Jörg Saßmannshausen wrote:
> > Hi Theo,
> > 
> > sure, no problem. I have attached the complete input file, minus the
> > coordinates, as a text file.
> > 
> > I did not post it as the question originally was independent of the input
> > file.
> > 
> > All the best from London
> > 
> > Jörg
> > 
> > On Thursday 24 November 2011 05:39:05 Teodoro Laino wrote:
> >> Hi Jörg,
> >> 
> >> both you and Carlo are correct. That's the way replica/nproc_rep
> >> work. If the error you get is in neb_utils:build_replica_coords,
> >> then your problem is somewhere else.
> >> 
> >> Posting the motion section online would help a lot!
> >> Thanks,
> >> Teo
> >> 
> >> On Nov 24, 2011, at 12:48 AM, Jörg Saßmannshausen wrote:
> >>> Hi Carlo,
> >>> 
> >>> gosh that was quick!
> >>> 
> >>> I got 8 cores per box, so that is where the cores come from.
> >>> I got 17 points I want to run, hence the 17 files.
> >>> 
> >>> Right now I only got 32 cores altogether.
> >>> 
> >>> So my train of thought was similar to yours: 8 replicas, each
> >>> using 4 cores, adds up to 32 cores.
> >>> However, that crashed with that error message:
> >>> ERRORL2 in neb_utils:build_replica_coords processor
> >>> 
> >>> So in the end I decided to use 32 replicas and 32 nproc_rep for the 32
> >>> cores I have. I guess one could do better. ,-)
> >>> 
> >>> All the best
> >>> 
> >>> Jörg
> >>> 
> >>> On Wednesday 23 November 2011 c.pignedoli wrote:
> >>>> Ciao Joerg,
> >>>> why 17 files and 8 replicas?
> >>>> 
> >>>> If I remember well:
> >>>> if you have 8 replicas and specify 100 nproc_rep
> >>>> then if you run with 200 cores
> >>>> 
> >>>> you will have two replicas done at a time (each with 100 cores), then
> >>>> another two, then another two, and finally the last two
> >>>> 
> >>>> and you cycle.
> >>>> 
> >>>> if you run with 800 cores all replicas will proceed in parallel
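
The batching in this example can be written out explicitly. The following is a minimal sketch, not CP2K code: `replica_batches` is a hypothetical helper, and assigning replicas to batches in numerical order is an assumption made only to have something concrete to print.

```python
def replica_batches(n_replica, nproc_rep, cores):
    """List which replicas are computed together, given cores and NPROC_REP.

    Hypothetical helper; the in-order assignment is an illustrative assumption.
    """
    at_a_time = cores // nproc_rep           # replicas that fit on the machine at once
    ids = list(range(1, n_replica + 1))      # replicas numbered 1..n_replica
    return [ids[i:i + at_a_time] for i in range(0, n_replica, at_a_time)]


print(replica_batches(8, 100, 200))  # [[1, 2], [3, 4], [5, 6], [7, 8]]
print(replica_batches(8, 100, 800))  # [[1, 2, 3, 4, 5, 6, 7, 8]]
```

With 200 cores there are four batches of two replicas; with 800 cores a single batch holds all eight.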
> >>>> 
> >>>> Ciao
> >>>> 
> >>>> Carlo
> >>>> 
> >>>> On 24 Nov 2011, at 00:05, Jörg Saßmannshausen
> >>>> <j.sassma... at ucl.ac.uk> wrote:
> >>>>> Dear all,
> >>>>> 
> >>>>> I am a bit confused.
> >>>>> When I am running a frequency calculation or a NEB calculation, I can
> >>>>> specify the number of replicas (NUMBER_OF_REPLICA) and the number of
> >>>>> processors (NPROC_REP) in the MOTION section. Now, let's say I got 32
> >>>>> cores on my hands and I got 17 files which I want to use for the NEB
> >>>>> calculation. Now, my understanding was that I could run say 8
> >>>>> replicas and each replica is using 4 cores. That would add up to the
> >>>>> 32 cores I have altogether. However, that does not work out. It
> >>>>> appears I have to use 32 replicas and 32 cores for all 32 cores I
> >>>>> have available. What is the relationship between replicas and cores
> >>>>> (in the input file) and the cores I have available on the machine
> >>>>> (and in relation to the number of coordinate files I have)? A similar
> >>>>> observation was made for a frequency calculation.
> >>>>> 
> >>>>> I seem to have somehow a wrong understanding here :-/
> >>>>> 
> >>>>> All the best from a foggy London
> >>>>> 
> >>>>> Jörg
> >>>>> 
> >>>>> 
> >>>>> --
> >>>>> *************************************************************
> >>>>> Jörg Saßmannshausen
> >>>>> University College London
> >>>>> Department of Chemistry
> >>>>> Gordon Street
> >>>>> London
> >>>>> WC1H 0AJ
> >>>>> 
> >>>>> email: j.sassma... at ucl.ac.uk
> >>>>> web: http://sassy.formativ.net
> >>>>> 
> >>>>> Please avoid sending me Word or PowerPoint attachments.
> >>>>> See http://www.gnu.org/philosophy/no-word-attachments.html
> >>>>> 



