[CP2K-user] [CP2K:15203] Re: Does CP2K allow a multi-GPU run?

Alfio Lazzaro alfio.... at gmail.com
Fri Apr 23 08:24:57 UTC 2021


The error in the log says that COSMA is used:

#7  0x2dfb43b in check_runtime_status
at 
/apps/chpc/chem/gpu/cp2k/8.1.0/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/util.hpp:17
#8  0x2dfb43b in _ZNK3gpu13device_stream13enqueue_eventEv
at 
/apps/chpc/chem/gpu/cp2k/8.1.0/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/device_stream.hpp:62
#9  0x2dfb43b in 
_ZN3gpu11round_robinIdEEvRNS_12tiled_matrixIT_EES4_S4_RNS_13device_bufferIS2_EES7_S7_iiiS2_S2_RNS_9mm_handleIS2_EE
at 
/apps/chpc/chem/gpu/cp2k/8.1.0/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:248

....
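
Whether a given CP2K binary was actually built with COSMA can be checked 
from the feature flags it reports; a minimal sketch, assuming cp2k.psmp is 
in the PATH and that its --version output lists the compiled-in features 
("cp2kflags"):

# print the feature list and look for "cosma" in it
cp2k.psmp --version | grep -i cosma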

On Friday, 23 April 2021 at 10:00:35 UTC+2 ASSIDUO Network wrote:

> Dear Fabian, COSMA wasn't installed with CP2K, so that can't be the issue. 
> The HPC system is not a Cray, but I did ask the HPC admin to look into it.
>
> On Thu, Apr 22, 2021 at 8:02 PM fa... at gmail.com <fa... at gmail.com> 
> wrote:
>
>> Hi,
>>
>> CP2K is crashing when COSMA tries to access a GPU ("error: GPU API call 
>> : invalid resource handle"). On Cray systems the environment variable 
>> "export CRAY_CUDA_MPS=1" has to be set; otherwise only one MPI rank can 
>> access a given GPU device. Maybe there is a similar setting for your 
>> cluster?
>>
>> Also, CP2K can be memory hungry; setting "ulimit -s unlimited" is often 
>> needed.
>>
>> I hope this helps,
>> Fabian
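
On non-Cray clusters with NVIDIA GPUs, the closest equivalent of the 
CRAY_CUDA_MPS hint above is usually the CUDA Multi-Process Service (MPS). 
A minimal, untested sketch of what the job script could do, assuming 
nvidia-cuda-mps-control is available on the compute node (the exact setup 
is site-specific, so the HPC admin should confirm it):

# start the CUDA MPS control daemon so several MPI ranks can share one GPU
nvidia-cuda-mps-control -d

# raise the stack limit, as suggested above
ulimit -s unlimited

# ... run CP2K here ...

# shut the MPS daemon down at the end of the job
echo quit | nvidia-cuda-mps-control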
>>
>> On Thursday, 22 April 2021 at 19:36:35 UTC+2 ASSIDUO Network wrote:
>>
>>> Oh you meant the error file. Please find it attached.
>>>
>>> I have run on CPU only and one GPU. It works.
>>>
>>> On Thu, Apr 22, 2021 at 7:31 PM Alfio Lazzaro <al... at gmail.com> 
>>> wrote:
>>>
>>>> I'm sorry, I cannot assist you; I'm not an expert on how to use CP2K 
>>>> (I'm not a domain scientist). Without the full log, I cannot help you...
>>>> I assume you have a log file from PBS where you can see the error 
>>>> message. My guess is that it is a memory limit.
>>>> Have you executed on CPU only?
>>>>
>>>>
>>>>
>>>> On Thursday, 22 April 2021 at 17:45:06 UTC+2 ASSIDUO Network wrote:
>>>>
>>>>> Here's the log file. The job ended prematurely.
>>>>>
>>>>> On Thu, Apr 22, 2021 at 3:23 PM Lenard Carroll <len... at gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> Not sure yet. The job is still in the queue. As soon as it is 
>>>>>> finished I'll post the log file info here.
>>>>>>
>>>>>> On Thu, Apr 22, 2021 at 3:15 PM Alfio Lazzaro <al... at gmail.com> 
>>>>>> wrote:
>>>>>>
>>>>>>> And does it work? Check the output and the performance... It could be 
>>>>>>> that your particular test case doesn't use the GPU at all, so could you 
>>>>>>> attach the log (at least the final part of it)?
>>>>>>>
>>>>>>> On Thursday, 22 April 2021 at 13:42:16 UTC+2 ASSIDUO Network wrote:
>>>>>>>
>>>>>>>> I am using 30 threads now over 3 GPUs, so I used:
>>>>>>>>
>>>>>>>> export OMP_NUM_THREADS=10
>>>>>>>> mpiexec -n 3 cp2k.psmp -i gold50.inp -o gold50.out
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 22, 2021 at 1:34 PM Alfio Lazzaro <al... at gmail.com> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Wait, I see you have 32 threads in total, so you need 32/4 = 8 
>>>>>>>>> threads per rank.
>>>>>>>>> Please change to:
>>>>>>>>>
>>>>>>>>> export OMP_NUM_THREADS=8
>>>>>>>>>
>>>>>>>>> On Thursday, 22 April 2021 at 13:27:59 UTC+2 ASSIDUO Network wrote:
>>>>>>>>>
>>>>>>>>>> Shall do. I already set it up, but it's in a long queue.
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 22, 2021 at 1:22 PM Alfio Lazzaro <
>>>>>>>>>> al... at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Could you try what I suggested:
>>>>>>>>>>>
>>>>>>>>>>> export OMP_NUM_THREADS=10
>>>>>>>>>>> mpirun -np 4 ./cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>
>>>>>>>>>>> Please check the corresponding log.
>>>>>>>>>>>
>>>>>>>>>>> As I said above, you need one MPI rank per GPU, and you told us 
>>>>>>>>>>> that you have 4 GPUs, so you need 4 ranks (or a multiple of 4). 
>>>>>>>>>>> With 10 ranks you get an imbalance.
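
One common way to enforce this one-rank-per-GPU mapping (and to avoid the 
"invalid resource handle" error that can appear when several ranks hit the 
same device without MPS) is a small launcher wrapper that sets 
CUDA_VISIBLE_DEVICES from the local rank id. A sketch, assuming OpenMPI 
(which exports OMPI_COMM_WORLD_LOCAL_RANK) and a hypothetical file name 
gpu_bind.sh:

#!/bin/bash
# gpu_bind.sh - give each local MPI rank its own GPU (illustrative helper)
# OMPI_COMM_WORLD_LOCAL_RANK is set by OpenMPI for every launched process.
export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}
exec "$@"

It would then be launched as, e.g.,
mpirun -np 4 ./gpu_bind.sh cp2k.psmp -i gold.inp -o gold_pbc.out
(after chmod +x gpu_bind.sh).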
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thursday, 22 April 2021 at 10:17:27 UTC+2 ASSIDUO Network wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Correction, he told me to use:
>>>>>>>>>>>>
>>>>>>>>>>>> mpirun -np 10 cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>
>>>>>>>>>>>> but it didn't run correctly.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Apr 22, 2021 at 9:51 AM Lenard Carroll <
>>>>>>>>>>>> len... at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> He suggested I try out:
>>>>>>>>>>>>> mpirun -n 10 cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>>
>>>>>>>>>>>>> as he is hoping that will spread the 10 CPU processes over the 
>>>>>>>>>>>>> selected 4 GPUs.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Apr 22, 2021 at 9:48 AM Alfio Lazzaro <
>>>>>>>>>>>>> al... at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> Your command to run CP2K doesn't mention MPI (mpirun, mpiexec, 
>>>>>>>>>>>>>> ...). Are you running with multiple ranks?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You can check those lines in the output:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  GLOBAL| Total number of message passing processes                          32
>>>>>>>>>>>>>>  GLOBAL| Number of threads for this process                                   4
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And check your numbers.
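
A quick way to make that check is to grep the two lines out of the run log; 
a small sketch, assuming the output file name used elsewhere in this thread:

# ranks x threads should not exceed the 40 cores requested from PBS
grep "message passing processes" gold_pbc.out
grep "Number of threads for this process" gold_pbc.out
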
>>>>>>>>>>>>>> My guess is that you have 1 rank and 40 threads.
>>>>>>>>>>>>>> To use 4 GPUs you need 4 ranks (and fewer threads per rank), 
>>>>>>>>>>>>>> i.e. something like:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> export OMP_NUM_THREADS=10
>>>>>>>>>>>>>> mpiexec -n 4 ./cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please check with your sysadmin on how to run with multiple 
>>>>>>>>>>>>>> MPI ranks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hope it helps.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alfio
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wednesday, 21 April 2021 at 09:26:53 UTC+2 ASSIDUO Network 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is what my PBS file looks like:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> #!/bin/bash
>>>>>>>>>>>>>>> #PBS -P <PROJECT>
>>>>>>>>>>>>>>> #PBS -N <JOBNAME>
>>>>>>>>>>>>>>> #PBS -l select=1:ncpus=40:ngpus=4
>>>>>>>>>>>>>>> #PBS -l walltime=08:00:00
>>>>>>>>>>>>>>> #PBS -q gpu_4
>>>>>>>>>>>>>>> #PBS -m be
>>>>>>>>>>>>>>> #PBS -M none
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> module purge
>>>>>>>>>>>>>>> module load chpc/cp2k/8.1.0/cuda10.1/openmpi-4.0.0/gcc-7.3.0
>>>>>>>>>>>>>>> source $SETUP
>>>>>>>>>>>>>>> cd $PBS_O_WORKDIR
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> cp2k.psmp -i gold.inp -o gold_pbc.out
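
Putting the advice from this thread together, the run section of such a PBS 
script would presumably end up looking roughly like the untested sketch below 
(4 MPI ranks for the 4 requested GPUs, 10 OpenMP threads each to fill the 40 
cores; the exact mpirun options depend on the site's OpenMPI installation):

#!/bin/bash
#PBS -l select=1:ncpus=40:ngpus=4
#PBS -l walltime=08:00:00
#PBS -q gpu_4

module purge
module load chpc/cp2k/8.1.0/cuda10.1/openmpi-4.0.0/gcc-7.3.0
cd $PBS_O_WORKDIR

ulimit -s unlimited           # CP2K can be memory hungry (see above)
export OMP_NUM_THREADS=10     # 4 ranks x 10 threads = 40 cores
mpirun -np 4 cp2k.psmp -i gold.inp -o gold_pbc.out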
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Apr 21, 2021 at 9:22 AM Alfio Lazzaro <
>>>>>>>>>>>>>>> al... at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The way to use 4 GPUs per node is to use 4 MPI ranks. How 
>>>>>>>>>>>>>>>> many ranks are you using?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tuesday, 20 April 2021 at 19:44:15 UTC+2 ASSIDUO Network 
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm asking, since the administrator running my country's 
>>>>>>>>>>>>>>>>> HPC is saying that although I'm requesting access to 4 GPUs, CP2K is only 
>>>>>>>>>>>>>>>>> using 1. I checked the following output:
>>>>>>>>>>>>>>>>>  DBCSR| ACC: Number of devices/node                                         4
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> And it shows that CP2K is picking up 4 GPUs.
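
Note that the DBCSR| line above only reports how many devices are visible on 
the node, not how many are actually doing work. Actual per-GPU load during a 
run can be watched with nvidia-smi on the compute node, for example:

# print index, utilization and memory use of every GPU every 5 seconds
nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 5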
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tuesday, April 20, 2021 at 3:00:17 PM UTC+2 ASSIDUO 
>>>>>>>>>>>>>>>>> Network wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I currently have access to 4 GPUs to run an AIMD 
>>>>>>>>>>>>>>>>>> simulation, but only one of the GPUs is being used. Is there a way to use 
>>>>>>>>>>>>>>>>>> the other 3, and if so, can you tell me how to set it up with a PBS job?
>>>>>>>>>>>>>>>>>