[CP2K:3633] NEB: number of replicas, cpus and input files

Teodoro Laino teodor... at gmail.com
Thu Nov 24 09:42:50 UTC 2011


Jörg,

I see the problem now: I had overlooked your (and Carlo's) previous posts.
Let me try to explain the concept more clearly.

In a NEB calculation (the same applies to frequency calculations) you have several independent energy/force calculations.
The number of NEB images (i.e. frames) is called NUMBER_OF_REPLICA (perhaps not the best name; just think of it as NUMBER_OF_IMAGES).
This is why, if you provide 17 images (or replicas) in the input, you can NOT ask the BAND calculation to use 8!
And that's why you get the error when the images are generated.

Now, if you run in parallel without specifying NPROC_REP, the energies/forces are calculated sequentially and each REPLICA (or IMAGE) uses all the available procs.
Of course (and this is a suggestion for everyone reading this e-mail), if you want to make your supercomputer center happy (i.e. run in the top 3% of jobs, which normally allocate more than 1K procs on average), the key is to allocate a huge number of procs (in principle even the whole machine) and run the energy/force calculations of the replicas in parallel.
How do you set that?
With the NPROC_REP keyword. This keyword sets the number of procs used per replica, and so controls the number of computational boxes (let's call them computing slaves) that compute energy/forces. By default there is 1 computing slave. Ideally you would like to have NUMBER_OF_REPLICA computing slaves. To get that, you have to allocate a number of MPI tasks equal to NUMBER_OF_REPLICA*NPROC_REP and set NPROC_REP to a meaningful value.
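
As a rough illustration of this bookkeeping, here is a small Python sketch. It is not CP2K code: the function names are made up for this example, and the divisibility check is an assumption of the sketch rather than a documented CP2K rule.

```python
def ideal_mpi_tasks(number_of_replica: int, nproc_rep: int) -> int:
    """MPI tasks needed so that every replica gets its own group of nproc_rep procs."""
    return number_of_replica * nproc_rep


def computing_groups(mpi_tasks: int, nproc_rep: int) -> int:
    """Number of groups that can evaluate energy/forces at the same time."""
    # Assumption of this sketch: the allocation is a whole multiple of nproc_rep.
    if mpi_tasks % nproc_rep != 0:
        raise ValueError("mpi_tasks is assumed to be a multiple of nproc_rep")
    return mpi_tasks // nproc_rep


# 16 replicas with 8 procs each, all running at once
print(ideal_mpi_tasks(16, 8))
# groups available with 32 MPI tasks and NPROC_REP 8
print(computing_groups(32, 8))
```

With 32 MPI tasks and NPROC_REP 8 only 4 groups exist, so the replicas have to take turns.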

To recap, in your case:
-) you provided 17 initial frames. This means that the number of replicas (controlled by NUMBER_OF_REPLICA) cannot be less than 17. Larger is OK: the additional frames are generated by interpolation.
-) set NPROC_REP to 8
-) allocate 32 MPI tasks.

This will give you 4 computing slaves (determined by (number of MPI tasks)/NPROC_REP), so energy/forces will be computed by cycling the replicas over these 4. Of course 17 is the worst choice with 32 MPI tasks, because in the last round 24 procs will be idle and only 8 will be working.
Unless there is a reason for having 17, I would recommend 16.
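
The difference between 17 and 16 replicas can be checked with a short Python sketch. It only mimics the round-robin picture described above; `band_schedule` is a hypothetical helper for this example, not something CP2K provides, and it assumes the replicas are simply processed in rounds of one replica per group.

```python
import math


def band_schedule(number_of_replica: int, mpi_tasks: int, nproc_rep: int):
    """Return (groups, rounds, idle procs in the last round)."""
    groups = mpi_tasks // nproc_rep                  # replicas computed at once
    rounds = math.ceil(number_of_replica / groups)   # passes needed per NEB step
    busy_last = number_of_replica - (rounds - 1) * groups
    idle_last = (groups - busy_last) * nproc_rep     # procs waiting in the last round
    return groups, rounds, idle_last


for n in (17, 16):
    groups, rounds, idle = band_schedule(n, mpi_tasks=32, nproc_rep=8)
    print(f"{n} replicas: {groups} groups, {rounds} rounds, {idle} procs idle in the last round")
```

With 17 replicas a fifth round is needed for a single replica; with 16 every round keeps all 32 procs busy.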

Hope this is clear now.
Best,
Teo


On Nov 24, 2011, at 10:15 AM, Jörg Saßmannshausen wrote:

> Hi Theo,
> 
> sure, no problem. I have attached the complete input file, minus the 
> coordinates, as a text file.
> 
> I did not post it as the question originally was independent of the input file.
> 
> All the best from London
> 
> Jörg
> 
> On Thursday 24 November 2011 05:39:05 Teodoro Laino wrote:
>> Hi Jörg,
>> 
>> both you and Carlo are correct. That's how replica/nproc_rep work.
>> If the error you get is in neb_utils:build_replica_coords, then
>> your problem is somewhere else.
>> 
>> Posting the motion section online would help a lot!
>> Thanks,
>> Teo
>> 
>> On Nov 24, 2011, at 12:48 AM, Jörg Saßmannshausen wrote:
>>> Hi Carlo,
>>> 
>>> gosh that was quick!
>>> 
>>> I got 8 cores per box, so that is where the cores come from.
>>> I got 17 points I want to run, hence the 17 files.
>>> 
>>> Right now I only have 32 cores altogether.
>>> 
>>> So my train of thought was similar to yours: 8 replicas, with each
>>> replica using 4 cores, adds up to 32 cores.
>>> However, that crashed with that error message:
>>> ERRORL2 in neb_utils:build_replica_coords processor
>>> 
>>> So in the end I decided to use 32 replicas and 32 nproc_rep for the 32
>>> cores I have. I guess one could do better. ,-)
>>> 
>>> All the best
>>> 
>>> Jörg
>>> 
>>> On Wednesday 23 November 2011 c.pignedoli wrote:
>>>> Ciao Joerg,
>>>> why 17 files and 8 replicas?
>>>> 
>>>> If I remember well:
>>>> if you have 8 replicas and specify 100 nproc_rep
>>>> then if you run with 200 cores
>>>> 
>>>> you will have two replicas done at a time (each with 100 cores), then
>>>> another two, then another two, and finally the last two
>>>> 
>>>> and you cycle.
>>>> 
>>>> if you run with 800 cores all replicas will proceed in parallel
>>>> 
>>>> Ciao
>>>> 
>>>> Carlo
>>>> 
>>>> On 24/nov/2011, at 00:05, Jörg Saßmannshausen <j.sassma... at ucl.ac.uk> wrote:
>>>>> Dear all,
>>>>> 
>>>>> I am a bit confused.
>>>>> When I am running a frequency calculation or a NEB calculation, I can
>>>>> specify the number of replicas (NUMBER_OF_REPLICA) and the number of
>>>>> processors (NPROC_REP) in the MOTION section. Now, let's say I have 32
>>>>> cores on my hands and I have 17 files which I want to use for the NEB
>>>>> calculation. Now, my understanding was that I could run say 8 replicas
>>>>> and each replica is using 4 cores. That would add up to the 32 cores I
>>>>> have altogether. However, that does not work out. It appears I have to
>>>>> use 32 replicas and 32 cores for all 32 cores I have available. What is
>>>>> the relationship between replicas and cores (in the input file) with
>>>>> the cores I have available on the machine (and in relation to the
>>>>> number of coordinate files I have). A similar observation was made for
>>>>> a frequency calculation.
>>>>> 
>>>>> I seem to have somehow a wrong understanding here :-/
>>>>> 
>>>>> All the best from a foggy London
>>>>> 
>>>>> Jörg
>>>>> 
>>>>> 
>>>>> --
>>>>> *************************************************************
>>>>> Jörg Saßmannshausen
>>>>> University College London
>>>>> Department of Chemistry
>>>>> Gordon Street
>>>>> London
>>>>> WC1H 0AJ
>>>>> 
>>>>> email: j.sassma... at ucl.ac.uk
>>>>> web: http://sassy.formativ.net
>>>>> 
>>>>> Please avoid sending me Word or PowerPoint attachments.
>>>>> See http://www.gnu.org/philosophy/no-word-attachments.html
>>>>> 
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "cp2k" group. To post to this group, send email to
>>>>> cp... at googlegroups.com. To unsubscribe from this group, send email to
>>>>> cp2k+uns... at googlegroups.com. For more options, visit this group
>>>>> at http://groups.google.com/group/cp2k?hl=en.
> 
> <neb-testjob.inp>



