[CP2K-user] CP2K not effectively using GPUs
Fabian Ducry
fabia... at gmail.com
Wed Apr 15 14:15:43 UTC 2020
Hi everyone,
I have noticed that in some cases CP2K does not use the GPUs on the node
effectively, while for a similar atomic configuration (and an identical
input file, attached below) the GPUs are used. What causes this
difference?
Both simulations are run with the psmp version using 256 MPI ranks with
3 OpenMP threads each and 64 GPUs on the Piz Daint cluster. The numbers of
atoms (3655 and 3746) differ slightly, but the species are the same:
Pt, Hf and O. The sizes of the matrix blocks to process are also the same.
From the summary at the end of the output we see that in the first
simulation the GPUs account for 99.9% of the flops, while in the second one
only 6% of the flops are performed on the GPUs.
COUNTER                                  TOTAL     BLAS      SMM      ACC
...
flops 32 x 32 x 13             100202445398016     0.0%     0.0%   100.0%
flops 10 x 32 x 32             182711840276480     0.0%     0.4%    99.6%
flops 10 x 32 x 10             217680464691200     0.0%     0.0%   100.0%
flops 32 x 32 x 10             296933113405440     0.0%     0.0%   100.0%
flops inhomo. stacks                         0     0.0%     0.0%     0.0%
flops total                       1.179570E+15     0.0%     0.1%    99.9%
flops max/rank                    4.725263E+12     0.0%     0.1%    99.9%
matmuls inhomo. stacks                       0     0.0%     0.0%     0.0%
matmuls total                      95474737068     0.0%     0.0%   100.0%
number of processed stacks             3880884     0.0%     0.1%    99.9%
average stack size                                  0.0  20228.0  24603.7
In the second simulation, by contrast, the GPUs are hardly used:
COUNTER                                  TOTAL     BLAS      SMM      ACC
...
flops 32 x 32 x 13              86563418093568     0.0%   100.0%     0.0%
flops 10 x 32 x 32             178933868625920     0.0%   100.0%     0.0%
flops 10 x 32 x 10             229355051443200     0.0%    90.9%     9.1%
flops 32 x 32 x 10             290793635389440     0.0%   100.0%     0.0%
flops inhomo. stacks                         0     0.0%     0.0%     0.0%
flops total                       1.134639E+15     0.0%    94.0%     6.0%
flops max/rank                    4.704995E+12     0.0%    92.0%     8.0%
matmuls inhomo. stacks                       0     0.0%     0.0%     0.0%
matmuls total                      93442871379     0.0%    94.0%     6.0%
number of processed stacks             3893648     0.0%    93.3%     6.7%
average stack size                                  0.0  24183.7  21442.3
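To compare the two summaries without reading the tables by eye, the BLAS/SMM/ACC shares can be pulled out of the output with a few lines of Python. This is only a minimal sketch: the helper name `parse_summary` and the regular expression are hypothetical, and it assumes each counter row sits on a single line in the form "name, total, three percentages", as in the actual output file rather than in the line-wrapped email. The sample rows are copied from the two summaries above.

```python
import re

# Assumed row layout (one line per counter), e.g.
#   "flops total   1.134639E+15   0.0%   94.0%   6.0%"
# Rows without three percentage columns (e.g. "average stack size") are skipped.
ROW = re.compile(
    r"^\s*(?P<counter>.+?)\s+(?P<total>[0-9][0-9.E+]*)"
    r"\s+(?P<blas>[\d.]+)%\s+(?P<smm>[\d.]+)%\s+(?P<acc>[\d.]+)%\s*$"
)


def parse_summary(text):
    """Return {counter: (total, blas_pct, smm_pct, acc_pct)} for percentage rows."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            # Collapse repeated spaces so "flops  10 x  32" and "flops 10 x 32" agree.
            name = " ".join(m["counter"].split())
            rows[name] = (
                float(m["total"]),
                float(m["blas"]),
                float(m["smm"]),
                float(m["acc"]),
            )
    return rows


# Two rows from each run, copied from the summaries in this message.
run1 = """\
flops 10 x 32 x 10 217680464691200 0.0% 0.0% 100.0%
flops total 1.179570E+15 0.0% 0.1% 99.9%
"""
run2 = """\
flops 10 x 32 x 10 229355051443200 0.0% 90.9% 9.1%
flops total 1.134639E+15 0.0% 94.0% 6.0%
"""

for name, text in (("run 1", run1), ("run 2", run2)):
    total, blas, smm, acc = parse_summary(text)["flops total"]
    print(f"{name}: {acc:.1f}% of {total:.3e} flops on ACC, {smm:.1f}% on SMM")
```

Running the same function over every "flops M x N x K" row would show which block sizes fall back from ACC to SMM in the second run.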
I have set some GPU-related options in the GLOBAL section:
&GLOBAL
  PROJECT negf-step-282
  RUN_TYPE ENERGY
  PRINT_LEVEL MEDIUM
  EXTENDED_FFT_LENGTHS
  WALLTIME 17600
  &FM
    FORCE_BLOCK_SIZE
    TYPE_OF_MATRIX_MULTIPLICATION DBCSR_MM
  &END FM
&END GLOBAL
The full input file is attached, as are the outputs of both simulations.
I would be grateful for any pointers on how I should change the settings.
Best,
Fabian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cp2k.inp
Type: chemical/x-gamess-input
Size: 1704 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20200415/f3dc39e2/attachment.inp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cp2k1.out
Type: application/octet-stream
Size: 375866 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20200415/f3dc39e2/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cp2k2.out
Type: application/octet-stream
Size: 368657 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20200415/f3dc39e2/attachment-0001.obj>