[CP2K-user] [CP2K:19039] Running Cp2k in parallel using thread in a PC

Léon Luntadila Lufungula Leon.luntadilalufungula at uantwerpen.be
Thu Jun 29 20:01:05 UTC 2023


Hi Corrado,

I have a similar (somewhat older) single-node server with two Xeon Gold 
6152 CPUs @ 2.10GHz, 22 cores each, 44 in total, 88 threads (hyperthreading 
enabled). I am currently running calculations with only OMP 
parallelization, as I only have a limited amount of RAM (32GB), but I was 
wondering whether I could also benefit from using mpirun with the --bind-to 
hwthread option, perhaps with -np 2 and OMP_NUM_THREADS=44?
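
Concretely, the hybrid launch I have in mind would look roughly like the 
sketch below (untested; it assumes Open MPI and the psmp binary, and the 
file names are placeholders). I suspect the binding would need to be widened 
to the socket so that each rank's 44 OpenMP threads are not all pinned to a 
single hardware thread:

# one MPI rank per socket, 44 OpenMP threads per rank (untested sketch)
export OMP_NUM_THREADS=44
mpirun -np 2 --bind-to socket --map-by socket cp2k.psmp -i cp2k.inp -o cp2k.out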

Out of interest, how much RAM do you have in your machine? I was thinking 
of suggesting that we put some more RAM into our machine so that I can do 
heavier calculations, because I easily hit the memory limit... For some 
calculations I'm running now I have to leave half of my processors idle so 
that a 44-core run can use all the RAM on the node...

Kind regards,
Léon

On Tuesday, 14 February 2023 at 11:54:21 UTC+1 Corrado Bacchiocchi wrote:

> Hi Everyone,
>
> thanks for the many suggestions.
>
> I have a single-node server with two Xeon Gold 6154 CPUs @ 3.00GHz, 18 
> cores each, 36 in total, 72 threads.
> I have found that the following launch command:
>
> mpirun -np 72 --bind-to hwthread  cp2k.psmp -i cp2k.inp -o cp2k.out 
>
> performs about 5x faster than
>
> mpirun -n 2 --bind-to numa --map-by numa -display-map cp2k.psmp -i 
> cp2k.inp -o cp2k.out
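>
> For completeness, the 2-rank case might also be retested with the OpenMP 
> thread count per rank set explicitly, roughly like this (a sketch, assuming 
> 18 physical cores per socket and the same Open MPI options):
>
> # 2 ranks (one per NUMA domain), 18 OpenMP threads each (sketch)
> export OMP_NUM_THREADS=18
> mpirun -n 2 --bind-to numa --map-by numa -display-map cp2k.psmp -i cp2k.inp -o cp2k.out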
>
> Regards
> Corrado
> On Monday, June 6, 2022 at 1:58:42 PM UTC+2 Matthew Graneri wrote:
>
>> Hi Pierre,
>>
>> Sorry it's taken so long to reply. Your reply really helped. Thank you!
>>
>> Regards,
>>
>> Matthew
>>
>> On Friday, May 20, 2022 at 7:16:43 PM UTC+8 wave... at gmail.com wrote:
>>
>>> Hi Everyone
>>>
>>> While I haven't figured out the GPU side of things (by the way, only part of 
>>> CP2K is GPU-accelerated), I found the following approach useful for mpirun. 
>>> Note that many people do not recommend hyper-threading for this kind of 
>>> application, so the command below does not use it.
>>>
>>>      mpirun -n 2 --bind-to numa --map-by numa -display-map cp2k.psmp -i 
>>> my-cp2k-run.inp > my-cp2k-run.out
>>>
>>>
>>>    1. The 'bind-to numa' and 'map-by numa' options make use of the OS's 
>>>    understanding of the processor topology.
>>>    2. Together they neatly place one MPI rank on each CPU socket (see the 
>>>    sketch below).
>>>    3. The '-display-map' option writes the MPI rank assignments at the 
>>>    beginning of the output.
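>>>
>>> With the psmp binary one would typically also fix the OpenMP thread count 
>>> per rank, roughly like this (a sketch only, assuming a two-socket node with 
>>> 18 physical cores per socket; adjust the numbers to your hardware):
>>>
>>>      # 2 MPI ranks (one per NUMA domain), 18 OpenMP threads each (sketch)
>>>      export OMP_NUM_THREADS=18
>>>      mpirun -n 2 --bind-to numa --map-by numa -display-map cp2k.psmp -i my-cp2k-run.inp > my-cp2k-run.out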
>>>
>>> Hope this helps!
>>>
>>> Kind Regards
>>>
>>> Sam
>>> On Wednesday, May 18, 2022 at 12:23:50 PM UTC+2 pierre.an... at gmail.com 
>>> wrote:
>>>
>>>> Hi Matthew,
>>>>
>>>>  
>>>>
>>>> Unfortunately, there's no single way to determine the best MPI/OpenMP 
>>>> load. It is system, calculation type, and hardware dependent. I recommend 
>>>> testing the performance. The first thing you could try is to check whether 
>>>> your CPUs are multithreaded. For example, if they have 34 cores and 2 
>>>> virtual cores per physical core (68 virtual cores in total), you could try 
>>>> OMP_NUM_THREADS=2 and keep your mpirun -np at 34*#nodes.
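>>>>
>>>> As a concrete illustration (a sketch only; the file names are placeholders 
>>>> and the numbers assume the 34-core example above on a single node):
>>>>
>>>> # 34 MPI ranks, 2 OpenMP threads per rank = 68 virtual cores (sketch)
>>>> export OMP_NUM_THREADS=2
>>>> mpirun -np 34 cp2k.psmp -i input.inp -o output.out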
>>>>
>>>>  
>>>>
>>>> Roughly speaking, MPI creates multiple replicas of the calculation 
>>>> (called processes), each replica dealing with part of the calculation. CP2K 
>>>> is efficiently parallelized with MPI. OpenMP generates multiple threads on 
>>>> the fly, generally to parallelize a loop. OpenMP can be used within an MPI 
>>>> process but not the other way around. Typically, having more MPI processes 
>>>> consumes more memory than the same number of OpenMP threads. To use 
>>>> multiple nodes, MPI is mandatory and more efficient. These are generalities 
>>>> and, again, combining both is best, but the ideal ratio varies. Testing is 
>>>> the best course of action: check which combination yields the largest 
>>>> number of ps/day with the minimum hardware resources. Doubling the hardware 
>>>> does not double the output, so increasing the number of nodes becomes a 
>>>> waste of resources at some point. As a rule of thumb, if the increase in 
>>>> output is less than 75-80% of the ideal case, then it is not worth it.
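>>>>
>>>> For instance, with hypothetical numbers: 10 ps/day on one node and 17 
>>>> ps/day on two nodes is 17/20 = 85% of the ideal doubling, so the second 
>>>> node is still worth using; 14 ps/day (70%) would not be. A quick shell check:
>>>>
>>>> # hypothetical throughputs: 10 ps/day on 1 node, 17 ps/day on 2 nodes
>>>> awk 'BEGIN { printf "scaling efficiency = %.0f%%\n", 100*17/(2*10) }'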
>>>>
>>>>  
>>>>
>>>> As you can see, there is a lot of trial and error; no systematic rule, I 
>>>> am afraid.
>>>>
>>>>  
>>>>
>>>> Regards,
>>>>
>>>> Pierre
>>>>
>>>>  
>>>>
>>>>  
>>>>
>>>>  
>>>>
>>>> *From: *cp... at googlegroups.com <cp... at googlegroups.com> on behalf of 
>>>> Matthew Graneri <mhvg... at gmail.com>
>>>> *Date: *Wednesday, 18 May 2022 at 10:35
>>>> *To: *cp2k <cp... at googlegroups.com>
>>>> *Subject: *Re: [CP2K:16997] Running Cp2k in parallel using thread in a 
>>>> PC
>>>>
>>>> Hi Pierre,
>>>>
>>>>  
>>>>
>>>> I found this really valuable! Unfortunately, being very new to AIMD and 
>>>> very unfamiliar with computation in general, I was wondering if I might be 
>>>> able to get some advice? We have an HPC at my university where each node has 
>>>> 34 processors and ~750 GB of RAM available for use. It runs on a Slurm 
>>>> queuing system.
>>>>
>>>>  
>>>>
>>>> Until now, I've run all my jobs using: mpirun -np $SLURM_NTASKS 
>>>> cp2k.popt -i input.inp -o output.out
>>>>
>>>> where $SLURM_NTASKS is whatever number of processors I've allocated to 
>>>> the job via the --ntasks=x flag.
>>>>
>>>>  
>>>>
>>>> So instead, I'm thinking it might be more appropriate to use the .psmp 
>>>> executable, but I'm not sure what the difference between the OpenMP and the 
>>>> MPI threads is, what ratio of OMP to MPI threads would be most effective 
>>>> for speeding up an AIMD job, or how many threads of each type you can add 
>>>> before the parallelisation becomes less efficient.
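>>>>
>>>> Something like the following is what I imagine, though I don't know how to 
>>>> choose the split (an untested sketch; it assumes the cluster's mpirun picks 
>>>> up the Slurm allocation, and the 17x2 split is just one possibility on a 
>>>> 34-core node):
>>>>
>>>> #!/bin/bash
>>>> #SBATCH --ntasks=17
>>>> #SBATCH --cpus-per-task=2
>>>> export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
>>>> mpirun -np $SLURM_NTASKS cp2k.psmp -i input.inp -o output.out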
>>>>
>>>>  
>>>>
>>>> Do you (or anyone else) have any advice on the matter? Is it better to 
>>>> have more OMP or MPI threads? And how many OMP threads per MPI thread would 
>>>> be appropriate? What kinds of ratios are most effective at speeding up 
>>>> calculations?
>>>>
>>>>  
>>>>
>>>> I would really appreciate any help I can get!
>>>>
>>>>  
>>>>
>>>> Regards,
>>>>
>>>>  
>>>>
>>>> Matthew
>>>>
>>>> On Friday, September 20, 2019 at 10:45:55 PM UTC+8 
>>>> pierre.an... at gmail.com wrote:
>>>>
>>>> Hello Nikhil,
>>>>
>>>> With the command "mpirun -n 42 cp2k.popt -i inp.inp -o out.out", you are 
>>>> requesting 42 MPI processes and not 42 OpenMP threads. MPI usually relies on 
>>>> replicated data, which means that, for a poorly programmed piece of software, 
>>>> it will request a total amount of memory which is the amount of memory 
>>>> required by a scalar execution times the number of processes. This can very 
>>>> quickly become problematic, in particular for QM calculations. OpenMP, 
>>>> however, relies on shared memory: the data is normally not replicated but 
>>>> shared between threads and therefore, in an ideal scenario, the amount of 
>>>> memory needed for 42 OpenMP threads is the same as for a single one.
>>>>
>>>> This might explain why your calculation freezes: you are out of memory. 
>>>> On your workstation, you should only use the executable "cp2k.ssmp", which 
>>>> is the OpenMP version. Then you don't need the mpirun command:
>>>>
>>>> cp2k.ssmp -i inp.inp -o out.out
>>>>
>>>> To control the number of OpenMP threads, set the env variable: 
>>>> OMP_NUM_THREADS, e.g. in bash, export OMP_NUM_THREADS=48
>>>>
>>>> Now, if you need to balance between MPI and OpenMP, you should use the 
>>>> executable named cp2k.psmp. Here is such an example:
>>>>
>>>> export OMP_NUM_THREADS=24
>>>> mpirun -n 2 cp2k.psmp -i inp.inp -o out.out
>>>>
>>>> In this example, I am requesting two MPI processes, and each of them can 
>>>> use up to 24 OpenMP threads.
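>>>>
>>>> One caveat (an assumption on my side about the MPI launcher): with Open 
>>>> MPI, a run with only two ranks may by default be bound per core, which 
>>>> would squeeze each rank's 24 OpenMP threads onto a single core, so it can 
>>>> help to widen the binding, e.g.:
>>>>
>>>> # bind each rank to a whole socket so its OpenMP threads can spread out (sketch, Open MPI syntax)
>>>> export OMP_NUM_THREADS=24
>>>> mpirun -n 2 --bind-to socket cp2k.psmp -i inp.inp -o out.out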
>>>>
>>>> Hope this clarifies things for you.
>>>>
>>>> Regards,
>>>> Pierre
>>>>
>>>>  
>>>>
>>>> On 20/09/2019 14:09, Nikhil Maroli wrote:
>>>>
>>>> Dear all, 
>>>>
>>>>  
>>>>
>>>> I have installed all the versions of CP2K on my workstation, which has 
>>>> 2 x 12-core processors (48 threads in total).
>>>>
>>>>  
>>>>
>>>> I wanted to run CP2K in parallel using 42 threads; can anyone share the 
>>>> commands that I can use?
>>>>
>>>>  
>>>>
>>>> I have tried 
>>>>
>>>>  
>>>>
>>>> mpirun -n 42 cp2k.popt -i inp.inp -o out.out
>>>>
>>>>  
>>>>
>>>> After this command, memory usage rises to 100% and the whole 
>>>> system freezes (I have 128 GB of RAM).
>>>>
>>>>  
>>>>
>>>> Any suggestion will be greatly appreciated,
>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>>
>>>> Dr Pierre Cazade, PhD
>>>>
>>>> AD3-023, Bernal Institute,
>>>>
>>>> University of Limerick,
>>>>
>>>> Plassey Park Road,
>>>>
>>>> Castletroy, co. Limerick,
>>>>
>>>> Ireland
>>>>
>>>
