[CP2K-user] [CP2K:15217] Re: Does CP2K allow a multi-GPU run?

Lenard Carroll lenardc... at gmail.com
Fri Apr 23 08:51:56 UTC 2021


I see. The HPC admin said he compiled it without COSMA. I have informed him
of this.

On Fri, Apr 23, 2021 at 10:25 AM Alfio Lazzaro <alfio.... at gmail.com>
wrote:

> The error in the log says that COSMA is used:
>
> #7  0x2dfb43b in check_runtime_status
> at
> /apps/chpc/chem/gpu/cp2k/8.1.0/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/util.hpp:17
> #8  0x2dfb43b in _ZNK3gpu13device_stream13enqueue_eventEv
> at
> /apps/chpc/chem/gpu/cp2k/8.1.0/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/device_stream.hpp:62
> #9  0x2dfb43b in
> _ZN3gpu11round_robinIdEEvRNS_12tiled_matrixIT_EES4_S4_RNS_13device_bufferIS2_EES7_S7_iiiS2_S2_RNS_9mm_handleIS2_EE
> at
> /apps/chpc/chem/gpu/cp2k/8.1.0/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:248
>
> ....
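A quick way to confirm this from the error file is to search the backtrace for COSMA build paths. This is a minimal sketch, assuming the backtrace was saved to a text file. The function names are illustrative, and the `/cosma-` pattern is simply taken from the toolchain path in the frames quoted above.

```shell
# Succeeds if any line of the given file refers to a COSMA build path.
log_mentions_cosma() {
    grep -q '/cosma-' "$1"
}

# Print each distinct source path under a cosma-* directory found in the file.
cosma_frames() {
    grep -o '[^ ]*/cosma-[^ :]*' "$1" | sort -u
}
```

For example, `log_mentions_cosma job.err && cosma_frames job.err` (with `job.err` a placeholder name) would list the Tiled-MM sources that appear in the trace.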
>
> On Friday, 23 April 2021 at 10:00:35 UTC+2 ASSIDUO Network wrote:
>
>> Dear Fabian. COSMA wasn't installed with CP2K, so that can't be the
>> issue. The HPC system is not a Cray, but I did ask the HPC admin to look
>> into it.
>>
>> On Thu, Apr 22, 2021 at 8:02 PM fa... at gmail.com <fa... at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> CP2K is crashing when COSMA tries to access a GPU ("error: GPU API call
>>> : invalid resource handle"). On Cray systems the environment variable
>>> CRAY_CUDA_MPS has to be set ("export CRAY_CUDA_MPS=1"); otherwise only one
>>> MPI rank can access a specific GPU device. Maybe there is a similar setting
>>> for your cluster?
>>>
>>> Also, CP2K can be memory hungry. Setting "ulimit -s unlimited" is often
>>> needed.
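In a job script, the stack limit would be raised before the line that launches CP2K. This is a minimal sketch, assuming a POSIX-like shell. The function name is made up for illustration, and whether the limit can be raised at all depends on the hard limit set by the site.

```shell
# Try to lift the soft stack limit for this shell and its child processes.
# If "unlimited" is refused, fall back to the hard limit; then report the
# value that is actually in effect.
raise_stack_limit() {
    if ! ulimit -S -s unlimited 2>/dev/null; then
        ulimit -S -s "$(ulimit -H -s)" 2>/dev/null || true
    fi
    ulimit -s
}
```

The function has to be called directly in the job script (not inside `$(...)`), since a limit changed in a subshell does not carry over to the shell that later starts the MPI launcher.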
>>>
>>> I hope this helps,
>>> Fabian
>>>
>>> On Thursday, 22 April 2021 at 19:36:35 UTC+2 ASSIDUO Network wrote:
>>>
>>>> Oh you meant the error file. Please find it attached.
>>>>
>>>> I have run on CPU only and one GPU. It works.
>>>>
>>>> On Thu, Apr 22, 2021 at 7:31 PM Alfio Lazzaro <al... at gmail.com>
>>>> wrote:
>>>>
>>>>> I'm sorry, I cannot assist you: I'm not an expert on how to use CP2K
>>>>> (I'm not a domain scientist). Without the full log, I cannot help you...
>>>>> I assume you have a log file from PBS where you can see the
>>>>> error message. My guess is that it is a memory limit.
>>>>> Have you run it on CPU only?
>>>>>
>>>>>
>>>>>
>>>>> On Thursday, 22 April 2021 at 17:45:06 UTC+2 ASSIDUO Network wrote:
>>>>>
>>>>>> Here's the log file. The job ended prematurely.
>>>>>>
>>>>>> On Thu, Apr 22, 2021 at 3:23 PM Lenard Carroll <len... at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Not sure yet. The job is still in the queue. As soon as it is
>>>>>>> finished I'll post the log file info here.
>>>>>>>
>>>>>>> On Thu, Apr 22, 2021 at 3:15 PM Alfio Lazzaro <al... at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> And does it work? Check the output and the performance... It may be
>>>>>>>> that your particular test case doesn't use the GPU at all, so could you
>>>>>>>> attach the log (at least the final part of it)?
>>>>>>>>
>>>>>>>> On Thursday, 22 April 2021 at 13:42:16 UTC+2 ASSIDUO Network wrote:
>>>>>>>>
>>>>>>>>> I am using 30 threads now over 3 GPUs, so I used:
>>>>>>>>>
>>>>>>>>> export OMP_NUM_THREADS=10
>>>>>>>>> mpiexec -n 3 cp2k.psmp -i gold50.inp -o gold50.out
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Apr 22, 2021 at 1:34 PM Alfio Lazzaro <al... at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Wait, I see you have 32 threads in total, so you need 32/4 =
>>>>>>>>>> 8 threads per rank.
>>>>>>>>>> Please change to:
>>>>>>>>>>
>>>>>>>>>> export OMP_NUM_THREADS=8
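The arithmetic here is just the total thread count divided by the number of MPI ranks (one rank per GPU). A minimal sketch, with a helper name chosen only for illustration:

```shell
# Print the OpenMP thread count per MPI rank: total threads / ranks,
# rounded down so the node is never oversubscribed.
threads_per_rank() {
    total_threads="$1"
    ranks="$2"
    if [ "$ranks" -le 0 ]; then
        echo "ranks must be positive" >&2
        return 1
    fi
    echo $(( total_threads / ranks ))
}
```

With the numbers from this thread, `threads_per_rank 32 4` gives 8 and `threads_per_rank 40 4` gives 10; the result would then be exported as `OMP_NUM_THREADS`.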
>>>>>>>>>>
>>>>>>>>>> On Thursday, 22 April 2021 at 13:27:59 UTC+2 ASSIDUO Network wrote:
>>>>>>>>>>
>>>>>>>>>>> Shall do. I already set it up, but it's in a long queue.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 22, 2021 at 1:22 PM Alfio Lazzaro <
>>>>>>>>>>> al... at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Could you try what I suggested:
>>>>>>>>>>>>
>>>>>>>>>>>> export OMP_NUM_THREADS=10
>>>>>>>>>>>> mpirun -np 4 ./cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>
>>>>>>>>>>>> Please check the corresponding log.
>>>>>>>>>>>>
>>>>>>>>>>>> As I said above, you need one MPI rank per GPU, and you told us
>>>>>>>>>>>> that you have 4 GPUs, so you need 4 ranks (or a multiple of 4). With 10
>>>>>>>>>>>> ranks the load across the GPUs is unbalanced.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thursday, 22 April 2021 at 10:17:27 UTC+2 ASSIDUO Network wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Correction, he told me to use:
>>>>>>>>>>>>>
>>>>>>>>>>>>> mpirun -np 10 cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>>
>>>>>>>>>>>>> but it didn't run correctly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Apr 22, 2021 at 9:51 AM Lenard Carroll <
>>>>>>>>>>>>> len... at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> He suggested I try out:
>>>>>>>>>>>>>> mpirun -n 10 cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> as he is hoping that will cause the 1 GPU to use 10 CPUs over
>>>>>>>>>>>>>> the selected 4 GPUs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Apr 22, 2021 at 9:48 AM Alfio Lazzaro <
>>>>>>>>>>>>>> al... at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> Your command to run CP2K doesn't mention MPI (mpirun,
>>>>>>>>>>>>>>> mpiexec, ...). Are you running with multiple ranks?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You can check those lines in the output:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  GLOBAL| Total number of message passing processes           32
>>>>>>>>>>>>>>>  GLOBAL| Number of threads for this process                   4
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And check your numbers.
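Pulling those two numbers out of an output file can be scripted. This is a rough sketch, assuming the `GLOBAL|` labels appear in the output exactly as quoted in this message. The function name is hypothetical, and the value is accepted either at the end of the label line or on the line that follows it.

```shell
# Print the number reported for a given "GLOBAL| ..." label in a CP2K output.
# Usage: cp2k_global_value LABEL FILE
cp2k_global_value() {
    label="$1"
    file="$2"
    awk -v label="$label" '
        pending && NF { print $NF; exit }      # value found on the following line
        index($0, label) {
            if ($NF ~ /^[0-9]+$/) { print $NF; exit }   # value on the same line
            pending = 1
        }
    ' "$file"
}

# Example calls (gold_pbc.out is the output file named in this thread):
#   cp2k_global_value "Total number of message passing processes" gold_pbc.out
#   cp2k_global_value "Number of threads for this process" gold_pbc.out
```

If the first call prints 1 on a node with 4 GPUs, the run was started without multiple MPI ranks.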
>>>>>>>>>>>>>>> I guess you have 1 rank and 40 threads.
>>>>>>>>>>>>>>> To use 4 GPUs you need 4 ranks (and fewer threads per rank),
>>>>>>>>>>>>>>> e.g. something like
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> export OMP_NUM_THREADS=10
>>>>>>>>>>>>>>> mpiexec -n 4 ./cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please check with your sysadmin on how to run with multiple
>>>>>>>>>>>>>>> MPI ranks.
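Put together with the PBS file quoted further down in this thread, the suggestion could look like the sketch below. It is only an illustration: the generator function `make_gpu_job` is made up, the queue, module and `$SETUP` lines are copied from that PBS file, and how `mpiexec` interacts with the PBS allocation and how ranks are mapped to GPUs are site-specific assumptions that the sysadmin would need to confirm.

```shell
# Emit a PBS job script that starts one MPI rank per GPU and splits the
# requested CPU cores evenly between the ranks as OpenMP threads.
# Usage: make_gpu_job NCPUS NGPUS INPUT OUTPUT
make_gpu_job() {
    ncpus="$1"; ngpus="$2"; input="$3"; output="$4"
    threads=$(( ncpus / ngpus ))   # OpenMP threads available to each rank
    cat <<EOF
#!/bin/bash
#PBS -P <PROJECT>
#PBS -N <JOBNAME>
#PBS -l select=1:ncpus=${ncpus}:ngpus=${ngpus}
#PBS -l walltime=08:00:00
#PBS -q gpu_4

module purge
module load chpc/cp2k/8.1.0/cuda10.1/openmpi-4.0.0/gcc-7.3.0
source \$SETUP
cd \$PBS_O_WORKDIR

export OMP_NUM_THREADS=${threads}
mpiexec -n ${ngpus} cp2k.psmp -i ${input} -o ${output}
EOF
}
```

For instance, `make_gpu_job 40 4 gold.inp gold_pbc.out > job.pbs` (with `job.pbs` a placeholder name) writes a script that launches 4 ranks with 10 threads each.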
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hope it helps.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alfio
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wednesday, 21 April 2021 at 09:26:53 UTC+2 ASSIDUO Network wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is what my PBS file looks like:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> #!/bin/bash
>>>>>>>>>>>>>>>> #PBS -P <PROJECT>
>>>>>>>>>>>>>>>> #PBS -N <JOBNAME>
>>>>>>>>>>>>>>>> #PBS -l select=1:ncpus=40:ngpus=4
>>>>>>>>>>>>>>>> #PBS -l walltime=08:00:00
>>>>>>>>>>>>>>>> #PBS -q gpu_4
>>>>>>>>>>>>>>>> #PBS -m be
>>>>>>>>>>>>>>>> #PBS -M none
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> module purge
>>>>>>>>>>>>>>>> module load chpc/cp2k/8.1.0/cuda10.1/openmpi-4.0.0/gcc-7.3.0
>>>>>>>>>>>>>>>> source $SETUP
>>>>>>>>>>>>>>>> cd $PBS_O_WORKDIR
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Apr 21, 2021 at 9:22 AM Alfio Lazzaro <
>>>>>>>>>>>>>>>> al... at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The way to use 4 GPUs per node is to use 4 MPI ranks. How
>>>>>>>>>>>>>>>>> many ranks are you using?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tuesday, 20 April 2021 at 19:44:15 UTC+2 ASSIDUO Network wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm asking, since the administrator running my country's
>>>>>>>>>>>>>>>>>> HPC is saying that although I'm requesting access to 4 GPUs, CP2K is only
>>>>>>>>>>>>>>>>>> using 1. I checked the following output:
>>>>>>>>>>>>>>>>>>  DBCSR| ACC: Number of devices/node                        4
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> And it shows that CP2K is picking up 4 GPUs.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tuesday, April 20, 2021 at 3:00:17 PM UTC+2 ASSIDUO
>>>>>>>>>>>>>>>>>> Network wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I currently have access to 4 GPUs to run an AIMD
>>>>>>>>>>>>>>>>>>> simulation, but only one of the GPUs is being used. Is there a way to use
>>>>>>>>>>>>>>>>>>> the other 3, and if so, can you tell me how to set it up with a PBS job?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> You received this message because you are subscribed to
>>>>>>>>>>>>>>>>> the Google Groups "cp2k" group.
>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails
>>>>>>>>>>>>>>>>> from it, send an email to cp... at googlegroups.com.
>>>>>>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/cp2k/70ba0fce-8636-4b75-940d-133ce4dbf0can%40googlegroups.com
>>>>>>>>>>>>>>>>> <https://groups.google.com/d/msgid/cp2k/70ba0fce-8636-4b75-940d-133ce4dbf0can%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>>>>>>> .