[CP2K-user] CP2K performance on GPUs
foru... at gmail.com
Sun Nov 4 18:13:01 UTC 2018
Thanks Alfio for the response.
Yes, 8 V100 GPUs is extreme. The test I used takes around 500 seconds
on a system with Intel Skylake Gold 6148 processors (40 cores, 20
cores/socket). Do you think this test is too small to benefit from GPUs?
If so, can you recommend a test from the CP2K tests folder?
I also tried runs with 1 and 2 V100 GPUs. Both were slower than the
8-GPU run.
CP2K recognized all 8 GPUs, according to the "DBCSR| ACC: Number of
devices/node" line in the output.
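For repeated runs, that check can be scripted. The following is only a minimal sketch: the label is copied from this thread, but the function name is made up for illustration, and the assumption that the count follows the label as an integer (on the same line or wrapped onto the next one) is a guess about the log layout rather than something taken from the CP2K sources.

```python
import re

# Label copied from the thread. The position of the value (same line, or
# wrapped onto the next line) is an assumption about the log layout.
_DEVICES_LINE = re.compile(r"DBCSR\|\s*ACC:\s*Number of devices/node\s+(\d+)")


def dbcsr_device_count(log_text):
    """Return the device count reported in the log text, or None if absent."""
    match = _DEVICES_LINE.search(log_text)
    return int(match.group(1)) if match else None
```

It would be called on the text of a CP2K output file, for example `dbcsr_device_count(open("cp2k.out").read())`, where `cp2k.out` is a placeholder file name.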
I tried reoptimizing the kernels for the V100, but could not determine
which block size values to pass to the tune.py script.
CP2K-6.1 already ships optimized kernel parameters for the P100, yet even a
run on 2 P100 GPUs was slower than the CPU-only benchmark.
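To make the rank-to-GPU layout in these runs concrete, the round-robin attachment that Alfio describes in the quoted reply below can be pictured with a plain modulo rule. This is only an illustrative sketch: the function names are hypothetical, and the modulo rule is an assumption about what "round-robin" means here, not the actual CP2K/DBCSR code.

```python
def round_robin_devices(n_ranks, n_devices):
    """Return a list where entry i is the device index assumed for rank i."""
    if n_devices < 1:
        raise ValueError("need at least one device")
    # Assumed rule: rank i uses device i modulo the number of devices.
    return [rank % n_devices for rank in range(n_ranks)]


def ranks_per_device(n_ranks, n_devices):
    """Count how many ranks end up sharing each device under that rule."""
    counts = [0] * n_devices
    for device in round_robin_devices(n_ranks, n_devices):
        counts[device] += 1
    return counts
```

Under this assumption, 8 ranks on 8 GPUs gives one rank per GPU, while 8 ranks on 2 GPUs gives four ranks sharing each GPU (`ranks_per_device(8, 2)` returns `[4, 4]`).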
On Sunday, November 4, 2018 at 2:33:11 PM UTC+5:30, Alfio Lazzaro wrote:
>
> You may take a look at this issue on github:
> https://github.com/cp2k/cp2k/issues/73
>
> In your particular case, your setup of 8 V100 is pretty extreme and it
> would require a large computation. Which test are you using for
> benchmarking?
>
> Then, your setup of 8 ranks + 5 threads should be OK. CP2K attaches ranks
> to GPUs in a round-robin manner, so in your case there is one rank
> talking to each GPU.
> We don't have much experience with multi-GPU nodes, hence I would suggest
> doing a scalability test by running 1 rank, 2 ranks, ... 8 ranks (always
> 5 threads) to check how the performance scales. BTW, make sure CP2K is able
> to recognize 8 GPUs by checking the following output at the beginning:
>
> DBCSR| ACC: Number of devices/node                                    1
>
> Eventually, you might consider reoptimizing the kernels for the V100, but
> this is not a priority...
>
> Alfio
>
>
>
> On Saturday, November 3, 2018 at 07:55:09 UTC+1, for... at gmail.com
> wrote:
>>
>> Hi,
>>
>> How is the CP2K performance on GPUs in general?
>>
>> I'm getting very low performance on GPUs (Nvidia V100 SXM2). It is a
>> single-node benchmark with 8 GPUs and dual Intel Skylake Gold 6148
>> processors.
>>
>> The CP2K time on 8 GPUs (CP2K-6.1 psmp version, ifort-2017, CUDA-9.2,
>> 8 MPI ranks + 5 threads per rank) is still slower than the CP2K time of
>> the CPU-only benchmark.
>>
>> For CPU runs, CP2K-6.1 is built with LIBXSMM-1.8.3.
>>
>> For GPU runs, I have tried both with and without LIBXSMM. There is no
>> performance difference, and both are still slower than the CPU-only
>> benchmark even when using all 8 GPUs and all 40 CPU cores. Can
>> someone please share their experience of CP2K performance with GPUs?
>>
>> The CUDA specific DFLAGS used are: -D__ACC -D__DBCSR_ACC -D__PW_CUDA.
>>
>>