[CP2K-user] [CP2K:15203] Re: Does CP2K allow a multi-GPU run?

ASSIDUO Network lenardc... at gmail.com
Sun Apr 25 21:06:04 UTC 2021


Thanks for letting me know. I forwarded this to the admin and he confirmed 
that COSMA was compiled with CP2K, even though he specified it not to be. 
He has fixed this now and I've been able to use all my GPU resources.

Thanks for the help.

On Friday, April 23, 2021 at 10:24:57 AM UTC+2 Alfio Lazzaro wrote:

> The error in the log says that COSMA is used:
>
> #7  0x2dfb43b in check_runtime_status
> at 
> /apps/chpc/chem/gpu/cp2k/8.1.0/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/util.hpp:17
> #8  0x2dfb43b in _ZNK3gpu13device_stream13enqueue_eventEv
> at 
> /apps/chpc/chem/gpu/cp2k/8.1.0/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/device_stream.hpp:62
> #9  0x2dfb43b in 
> _ZN3gpu11round_robinIdEEvRNS_12tiled_matrixIT_EES4_S4_RNS_13device_bufferIS2_EES7_S7_iiiS2_S2_RNS_9mm_handleIS2_EE
> at 
> /apps/chpc/chem/gpu/cp2k/8.1.0/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:248
>
> ....
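
A quick way to see which toolchain libraries appear in a backtrace like the one above is to extract the directory name that follows `toolchain/build/` in each path. This is only a sketch, assuming the error output was saved to a file; the function name `trace_libs` and the temporary file are hypothetical.

```shell
# List the toolchain libraries named in a backtrace file (one per line).
trace_libs() {
    grep -o 'toolchain/build/[^/]*' "$1" | sed 's|.*/||' | sort -u
}

# Sample path copied from the backtrace quoted in this thread.
trace_file=$(mktemp)
cat > "$trace_file" <<'EOF'
/apps/chpc/chem/gpu/cp2k/8.1.0/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/util.hpp:17
EOF

trace_libs "$trace_file"
rm -f "$trace_file"
```

If a library shows up here that the build was not supposed to include, that is worth raising with whoever compiled the binary.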
>
> On Friday, April 23, 2021 at 10:00:35 AM UTC+2 ASSIDUO Network wrote:
>
>> Dear Fabian. COSMA wasn't installed with CP2K, so that can't be the 
>> issue. The HPC system is not CRAY, but I did ask the HPC admin to look into 
>> it.
>>
>> On Thu, Apr 22, 2021 at 8:02 PM fa... at gmail.com <fa... at gmail.com> 
>> wrote:
>>
>>> Hi,
>>>
>>> cp2k is crashing when COSMA tries to access a gpu ("error: GPU API call 
>>> : invalid resource handle"). On cray systems there is the environment 
>>> variable "export CRAY_CUDA_MPS=1" that has to be set. Otherwise only one 
>>> mpi rank can access a specific GPU device. Maybe there is a similar setting 
>>> for your cluster?
>>>
>>> Also cp2k can be memory hungry. Setting "ulimit -s unlimited" is often 
>>> needed.
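
The two settings mentioned above could go near the top of a job script, before CP2K is launched. This is a sketch only: `CRAY_CUDA_MPS` is the Cray-specific variable named in the message and is assumed to have no effect on other systems, and the fallback message is my own addition.

```shell
# Cray systems only (per the message above): let several MPI ranks share one GPU.
export CRAY_CUDA_MPS=1

# Try to lift the stack size limit; the system may refuse if the hard limit is lower.
ulimit -s unlimited 2>/dev/null || echo "could not raise the stack limit"

# Show what is actually in effect for this shell.
echo "CRAY_CUDA_MPS=$CRAY_CUDA_MPS stack=$(ulimit -s)"
```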
>>>
>>> I hope this helps,
>>> Fabian
>>>
>>> On Thursday, 22 April 2021 at 19:36:35 UTC+2 ASSIDUO Network wrote:
>>>
>>>> Oh you meant the error file. Please find it attached.
>>>>
>>>> I have run on CPU only and one GPU. It works.
>>>>
>>>> On Thu, Apr 22, 2021 at 7:31 PM Alfio Lazzaro <al... at gmail.com> 
>>>> wrote:
>>>>
>>>>> I'm sorry, I cannot assist you with that; I'm not an expert on how to use 
>>>>> CP2K (I'm not a domain scientist). Without the full log, I cannot help you...
>>>>> I assume you have a log file from PBS where you can see the 
>>>>> error message. My guess is that it is a memory limit.
>>>>> Have you run on CPU only?
>>>>>
>>>>>
>>>>>
>>>>> On Thursday, April 22, 2021 at 5:45:06 PM UTC+2 ASSIDUO Network wrote:
>>>>>
>>>>>> Here's the log file. The job ended prematurely.
>>>>>>
>>>>>> On Thu, Apr 22, 2021 at 3:23 PM Lenard Carroll <len... at gmail.com> 
>>>>>> wrote:
>>>>>>
>>>>>>> Not sure yet. The job is still in the queue. As soon as it is 
>>>>>>> finished I'll post the log file info here.
>>>>>>>
>>>>>>> On Thu, Apr 22, 2021 at 3:15 PM Alfio Lazzaro <al... at gmail.com> 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> And does it work? Check the output and the performance... It may be 
>>>>>>>> that your particular test case doesn't use the GPU at all, so could you 
>>>>>>>> attach the log (at least the final part of it)?
>>>>>>>>
>>>>>>>> On Thursday, April 22, 2021 at 1:42:16 PM UTC+2 ASSIDUO Network wrote:
>>>>>>>>
>>>>>>>>> I am using 30 threads now over 3 GPUs, so I used:
>>>>>>>>>
>>>>>>>>> export OMP_NUM_THREADS=10
>>>>>>>>> mpiexec -n 3 cp2k.psmp -i gold50.inp -o gold50.out
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Apr 22, 2021 at 1:34 PM Alfio Lazzaro <al... at gmail.com> 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Wait, I see you have 32 threads in total, so you need 32/4 = 
>>>>>>>>>> 8 threads per rank.
>>>>>>>>>> Please change to
>>>>>>>>>>
>>>>>>>>>> export OMP_NUM_THREADS=8
>>>>>>>>>>
>>>>>>>>>> On Thursday, April 22, 2021 at 1:27:59 PM UTC+2 ASSIDUO Network wrote:
>>>>>>>>>>
>>>>>>>>>>> Shall do. I already set it up, but it's in a long queue.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 22, 2021 at 1:22 PM Alfio Lazzaro <
>>>>>>>>>>> al... at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Could you try what I suggested:
>>>>>>>>>>>>
>>>>>>>>>>>> export OMP_NUM_THREADS=10
>>>>>>>>>>>> mpirun -np 4 ./cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>
>>>>>>>>>>>> Please check the corresponding log.
>>>>>>>>>>>>
>>>>>>>>>>>> As I said above, you need one MPI rank per GPU, and you told us 
>>>>>>>>>>>> that you have 4 GPUs, so you need 4 ranks (or a multiple of 4). With 
>>>>>>>>>>>> 10 ranks the load is unbalanced.
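
To illustrate the imbalance, the sketch below counts how many ranks end up on each GPU. It assumes ranks are mapped to GPUs round-robin (rank number modulo GPU count), which is an assumption made for illustration and not something stated in this thread; the function name `ranks_per_gpu` is hypothetical.

```shell
# Print the number of MPI ranks per GPU, assuming round-robin placement
# (rank modulo number of GPUs). The placement rule is an assumption.
ranks_per_gpu() {
    rpg_ranks=$1
    rpg_gpus=$2
    rpg_g=0
    rpg_out=""
    while [ "$rpg_g" -lt "$rpg_gpus" ]; do
        rpg_n=$(( rpg_ranks / rpg_gpus ))
        # GPUs with an index below the remainder receive one extra rank.
        if [ "$rpg_g" -lt $(( rpg_ranks % rpg_gpus )) ]; then
            rpg_n=$(( rpg_n + 1 ))
        fi
        rpg_out="$rpg_out$rpg_n "
        rpg_g=$(( rpg_g + 1 ))
    done
    echo "${rpg_out% }"
}

ranks_per_gpu 10 4   # prints: 3 3 2 2
ranks_per_gpu 4 4    # prints: 1 1 1 1
```

With 10 ranks on 4 GPUs, two devices carry three ranks and two carry only two, whereas a multiple of 4 gives every device the same share.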
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thursday, April 22, 2021 at 10:17:27 AM UTC+2 ASSIDUO Network wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Correction, he told me to use:
>>>>>>>>>>>>>
>>>>>>>>>>>>> mpirun -np 10 cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>>
>>>>>>>>>>>>> but it didn't run correctly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Apr 22, 2021 at 9:51 AM Lenard Carroll <
>>>>>>>>>>>>> len... at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> He suggested I try out:
>>>>>>>>>>>>>> mpirun -n 10 cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> as he is hoping that will spread 10 CPUs over the 4 selected 
>>>>>>>>>>>>>> GPUs instead of using only 1 GPU.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Apr 22, 2021 at 9:48 AM Alfio Lazzaro <
>>>>>>>>>>>>>> al... at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> Your command to run CP2K doesn't mention MPI (mpirun, 
>>>>>>>>>>>>>>> mpiexec, ...). Are you running with multiple ranks?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You can check those lines in the output:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  GLOBAL| Total number of message passing processes                    32
>>>>>>>>>>>>>>>  GLOBAL| Number of threads for this process                            4
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And check your numbers.
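
Those two lines can be pulled out of an output file with a small helper. This is a sketch, assuming the labels appear in the output exactly as quoted above and that the value is the last field on the line; the function name `cp2k_layout` and the sample file are hypothetical.

```shell
# Report the MPI rank count and threads per rank found in a CP2K output file.
cp2k_layout() {
    awk '
        index($0, "GLOBAL| Total number of message passing processes") { r = $NF }
        index($0, "GLOBAL| Number of threads for this process")        { t = $NF }
        END { print "ranks=" r " threads=" t }
    ' "$1"
}

# Sample input using the example values quoted above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 GLOBAL| Total number of message passing processes                    32
 GLOBAL| Number of threads for this process                            4
EOF

cp2k_layout "$sample"   # prints: ranks=32 threads=4
rm -f "$sample"
```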
>>>>>>>>>>>>>>> I can guess you have 1 rank and 40 threads.
>>>>>>>>>>>>>>> To use 4 GPUs you need 4 ranks (and fewer threads per rank), 
>>>>>>>>>>>>>>> i.e. something like
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> export OMP_NUM_THREADS=10
>>>>>>>>>>>>>>> mpiexec -n 4 ./cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please check with your sysadmin on how to run with multiple 
>>>>>>>>>>>>>>> MPI ranks.
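
The same numbers can be derived from the resources requested in the PBS file further down (`select=1:ncpus=40:ngpus=4`) and not typed in by hand. The sketch below is a dry run that only prints the launch line; the variable names are hypothetical, and whether `mpiexec` needs extra binding options on a given cluster is a question for the sysadmin.

```shell
# Resources as requested in the PBS file: 40 CPU cores and 4 GPUs on one node.
NCPUS=40
NGPUS=4

# One MPI rank per GPU; the cores are split evenly among the ranks.
RANKS=$NGPUS
export OMP_NUM_THREADS=$(( NCPUS / RANKS ))

# Build the launch line. It is only printed here (dry run); in a real job
# script the command would be executed instead of echoed.
LAUNCH="mpiexec -n $RANKS cp2k.psmp -i gold.inp -o gold_pbc.out"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
echo "$LAUNCH"
```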
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hope it helps.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alfio
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wednesday, April 21, 2021 at 9:26:53 AM UTC+2 ASSIDUO Network wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is what my PBS file looks like:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> #!/bin/bash
>>>>>>>>>>>>>>>> #PBS -P <PROJECT>
>>>>>>>>>>>>>>>> #PBS -N <JOBNAME>
>>>>>>>>>>>>>>>> #PBS -l select=1:ncpus=40:ngpus=4
>>>>>>>>>>>>>>>> #PBS -l walltime=08:00:00
>>>>>>>>>>>>>>>> #PBS -q gpu_4
>>>>>>>>>>>>>>>> #PBS -m be
>>>>>>>>>>>>>>>> #PBS -M none
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> module purge
>>>>>>>>>>>>>>>> module load chpc/cp2k/8.1.0/cuda10.1/openmpi-4.0.0/gcc-7.3.0
>>>>>>>>>>>>>>>> source $SETUP
>>>>>>>>>>>>>>>> cd $PBS_O_WORKDIR
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Apr 21, 2021 at 9:22 AM Alfio Lazzaro <
>>>>>>>>>>>>>>>> al... at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The way to use 4 GPUs per node is to use 4 MPI ranks. How 
>>>>>>>>>>>>>>>>> many ranks are you using?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tuesday, April 20, 2021 at 7:44:15 PM UTC+2 ASSIDUO Network wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm asking, since the administrator running my country's 
>>>>>>>>>>>>>>>>>> HPC is saying that although I'm requesting access to 4 GPUs, CP2K is only 
>>>>>>>>>>>>>>>>>> using 1. I checked the following output:
>>>>>>>>>>>>>>>>>>  DBCSR| ACC: Number of devices/node                                    4
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> And it shows that CP2K is picking up 4 GPUs.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tuesday, April 20, 2021 at 3:00:17 PM UTC+2 ASSIDUO 
>>>>>>>>>>>>>>>>>> Network wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I currently have access to 4 GPUs to run an AIMD 
>>>>>>>>>>>>>>>>>>> simulation, but only one of the GPUs is being used. Is there a way to use 
>>>>>>>>>>>>>>>>>>> the other 3, and if so, can you tell me how to set it up with a PBS job?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>> You received this message because you are subscribed to 
>>>>>>>>>>>>>>>>> the Google Groups "cp2k" group.
>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails 
>>>>>>>>>>>>>>>>> from it, send an email to cp... at googlegroups.com.
>>>>>>>>>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/cp2k/70ba0fce-8636-4b75-940d-133ce4dbf0can%40googlegroups.com 
>>>>>>>>>>>>>>>>> <https://groups.google.com/d/msgid/cp2k/70ba0fce-8636-4b75-940d-133ce4dbf0can%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>>>>