[CP2K-user] [CP2K:15200] Re: Does CP2K allow a multi-GPU run?
fa...@gmail.com
fabia... at gmail.com
Thu Apr 22 18:02:00 UTC 2021
Hi,
cp2k is crashing when COSMA tries to access a GPU ("error: GPU API call :
invalid resource handle"). On Cray systems the environment variable
CRAY_CUDA_MPS has to be set ("export CRAY_CUDA_MPS=1"); otherwise only one
MPI rank can access a given GPU device. Maybe there is a similar setting
for your cluster?
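On a non-Cray cluster with NVIDIA GPUs, one thing that may be worth checking is the GPU compute mode, since an exclusive-process mode also restricts each device to a single process. This is only a sketch under that assumption; the variable HAVE_SMI is illustrative, and whether the mode can be changed is up to your sysadmin:

```shell
# Assumption: NVIDIA GPUs with the driver tools installed on the compute node.
if command -v nvidia-smi >/dev/null 2>&1; then
    HAVE_SMI=yes
    # "Default" lets several processes share a GPU;
    # "Exclusive_Process" allows only one process per GPU.
    nvidia-smi --query-gpu=index,name,compute_mode --format=csv \
        || echo "nvidia-smi failed; is a GPU visible on this node?"
else
    HAVE_SMI=no
    echo "nvidia-smi not found; ask your sysadmin about the GPU compute mode"
fi
```

Run it inside the PBS job rather than on the login node, so that it reports the GPUs the job actually sees.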
Also, cp2k can be memory hungry. Setting "ulimit -s unlimited" is often
needed.
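Putting the advice from this thread together, the launch part of a job script might look roughly like the dry run below. The 40 cores and 4 GPUs come from the PBS request quoted further down; the variable names are illustrative only, and the launch line is printed rather than executed so the snippet can be tried anywhere:

```shell
# Values taken from the PBS request in this thread (select=1:ncpus=40:ngpus=4).
NCPUS=40
NGPUS=4

# One MPI rank per GPU; the cores are split evenly into OpenMP threads per rank.
RANKS=$NGPUS
THREADS=$((NCPUS / RANKS))

# Raise the stack limit if permitted; a hard limit on the node may forbid it.
ulimit -s unlimited 2>/dev/null || echo "could not raise the stack limit"

export OMP_NUM_THREADS=$THREADS

# Dry run: print the launch line instead of starting CP2K.
LAUNCH="mpiexec -n $RANKS cp2k.psmp -i gold.inp -o gold_pbc.out"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
echo "$LAUNCH"
```

In a real job script, the last echo would be replaced by the command itself, and the CP2K output header should then report 4 message passing processes.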
I hope this helps,
Fabian
On Thursday, 22 April 2021 at 19:36:35 UTC+2 ASSIDUO Network wrote:
> Oh you meant the error file. Please find it attached.
>
> I have run on CPU only and one GPU. It works.
>
> On Thu, Apr 22, 2021 at 7:31 PM Alfio Lazzaro <al... at gmail.com> wrote:
>
>> I'm sorry, I cannot assist you; I'm not an expert on how to use CP2K (I'm
>> not a domain scientist). Without the full log, I cannot help you...
>> I assume you have a log file from PBS where you can see the error
>> message. My guess is that it is a memory limit.
>> Have you run it on CPU only?
>>
>>
>>
>> On Thursday, 22 April 2021 at 17:45:06 UTC+2 ASSIDUO Network wrote:
>>
>>> Here's the log file. The job ended prematurely.
>>>
>>> On Thu, Apr 22, 2021 at 3:23 PM Lenard Carroll <len... at gmail.com>
>>> wrote:
>>>
>>>> Not sure yet. The job is still in the queue. As soon as it is finished
>>>> I'll post the log file info here.
>>>>
>>>> On Thu, Apr 22, 2021 at 3:15 PM Alfio Lazzaro <al... at gmail.com>
>>>> wrote:
>>>>
>>>>> And does it work? Check the output and the performance... It may be that
>>>>> your particular test case doesn't use the GPU at all, so could you attach
>>>>> the log (at least the final part of it)?
>>>>>
>>>>> On Thursday, 22 April 2021 at 13:42:16 UTC+2 ASSIDUO Network wrote:
>>>>>
>>>>>> I am using 30 threads now over 3 GPUs, so I used:
>>>>>>
>>>>>> export OMP_NUM_THREADS=10
>>>>>> mpiexec -n 3 cp2k.psmp -i gold50.inp -o gold50.out
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 22, 2021 at 1:34 PM Alfio Lazzaro <al... at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Wait, I see you have 32 threads in total, so you need 32/4 = 8
>>>>>>> threads per rank.
>>>>>>> Please change
>>>>>>>
>>>>>>> export OMP_NUM_THREADS=8
>>>>>>>
>>>>>>> On Thursday, 22 April 2021 at 13:27:59 UTC+2 ASSIDUO Network wrote:
>>>>>>>
>>>>>>>> Shall do. I already set it up, but it's in a long queue.
>>>>>>>>
>>>>>>>> On Thu, Apr 22, 2021 at 1:22 PM Alfio Lazzaro <al... at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Could you try what I suggested:
>>>>>>>>>
>>>>>>>>> export OMP_NUM_THREADS=10
>>>>>>>>> mpirun -np 4 ./cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>
>>>>>>>>> Please check the corresponding log.
>>>>>>>>>
>>>>>>>>> As I said above, you need one MPI rank per GPU, and you told us that
>>>>>>>>> you have 4 GPUs, so you need 4 ranks (or a multiple of 4). With 10
>>>>>>>>> ranks the load is unbalanced.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thursday, 22 April 2021 at 10:17:27 UTC+2 ASSIDUO Network wrote:
>>>>>>>>>
>>>>>>>>>> Correction, he told me to use:
>>>>>>>>>>
>>>>>>>>>> mpirun -np 10 cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>
>>>>>>>>>> but it didn't run correctly.
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 22, 2021 at 9:51 AM Lenard Carroll <
>>>>>>>>>> len... at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> He suggested I try out:
>>>>>>>>>>> mpirun -n 10 cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>
>>>>>>>>>>> as he is hoping that will make each GPU use 10 CPUs across
>>>>>>>>>>> the 4 selected GPUs.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 22, 2021 at 9:48 AM Alfio Lazzaro <
>>>>>>>>>>> al... at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> Your command to run CP2K doesn't mention MPI (mpirun, mpiexec,
>>>>>>>>>>>> ...). Are you running with multiple ranks?
>>>>>>>>>>>>
>>>>>>>>>>>> You can check these lines in the output:
>>>>>>>>>>>>
>>>>>>>>>>>> GLOBAL| Total number of message passing processes    32
>>>>>>>>>>>> GLOBAL| Number of threads for this process            4
>>>>>>>>>>>>
>>>>>>>>>>>> And check your numbers.
>>>>>>>>>>>> My guess is that you have 1 rank and 40 threads.
>>>>>>>>>>>> To use 4 GPUs you need 4 ranks (and fewer threads per rank),
>>>>>>>>>>>> i.e. something like
>>>>>>>>>>>>
>>>>>>>>>>>> export OMP_NUM_THREADS=10
>>>>>>>>>>>> mpiexec -n 4 ./cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>
>>>>>>>>>>>> Please check with your sysadmin on how to run with multiple MPI
>>>>>>>>>>>> ranks.
>>>>>>>>>>>>
>>>>>>>>>>>> Hope it helps.
>>>>>>>>>>>>
>>>>>>>>>>>> Alfio
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wednesday, 21 April 2021 at 09:26:53 UTC+2 ASSIDUO
>>>>>>>>>>>> Network wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> This is what my PBS file looks like:
>>>>>>>>>>>>>
>>>>>>>>>>>>> #!/bin/bash
>>>>>>>>>>>>> #PBS -P <PROJECT>
>>>>>>>>>>>>> #PBS -N <JOBNAME>
>>>>>>>>>>>>> #PBS -l select=1:ncpus=40:ngpus=4
>>>>>>>>>>>>> #PBS -l walltime=08:00:00
>>>>>>>>>>>>> #PBS -q gpu_4
>>>>>>>>>>>>> #PBS -m be
>>>>>>>>>>>>> #PBS -M none
>>>>>>>>>>>>>
>>>>>>>>>>>>> module purge
>>>>>>>>>>>>> module load chpc/cp2k/8.1.0/cuda10.1/openmpi-4.0.0/gcc-7.3.0
>>>>>>>>>>>>> source $SETUP
>>>>>>>>>>>>> cd $PBS_O_WORKDIR
>>>>>>>>>>>>>
>>>>>>>>>>>>> cp2k.psmp -i gold.inp -o gold_pbc.out
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 21, 2021 at 9:22 AM Alfio Lazzaro <
>>>>>>>>>>>>> al... at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The way to use 4 GPUs per node is to use 4 MPI ranks. How
>>>>>>>>>>>>>> many ranks are you using?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tuesday, 20 April 2021 at 19:44:15 UTC+2 ASSIDUO
>>>>>>>>>>>>>> Network wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm asking, since the administrator running my country's HPC
>>>>>>>>>>>>>>> is saying that although I'm requesting access to 4 GPUs, CP2K is only using
>>>>>>>>>>>>>>> 1. I checked the following output:
>>>>>>>>>>>>>>> DBCSR| ACC: Number of devices/node    4
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And it shows that CP2K is picking up 4 GPUs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tuesday, April 20, 2021 at 3:00:17 PM UTC+2 ASSIDUO
>>>>>>>>>>>>>>> Network wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I currently have access to 4 GPUs to run an AIMD
>>>>>>>>>>>>>>>> simulation, but only one of the GPUs is being used. Is there a way to use
>>>>>>>>>>>>>>>> the other 3, and if so, can you tell me how to set it up with a PBS job?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>>>> Google Groups "cp2k" group.
>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>>>> it, send an email to cp... at googlegroups.com.
>>>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>>>> https://groups.google.com/d/msgid/cp2k/70ba0fce-8636-4b75-940d-133ce4dbf0can%40googlegroups.com
>>>>>>>>>>>>>> <https://groups.google.com/d/msgid/cp2k/70ba0fce-8636-4b75-940d-133ce4dbf0can%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>