[CP2K-user] DBCSR performance issues

Jing Lan lan... at mail2.sysu.edu.cn
Thu May 13 09:42:06 UTC 2021


Hi everyone,

I'm deploying and optimizing CP2K on an AMD cluster with 4 AMD GPUs per 
node. I have both the DBCSR and the PW parts running on the GPUs, but I 
have some questions about the performance. I am testing the QS/H2O-256.inp 
and QS_DM_LS/H2O-dft-ls.NREP2.inp benchmarks.
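
At the moment I launch them with something like this (just a rough sketch of 
my setup; the thread count and binary name are whatever my build uses):

    # One MPI rank per GPU for now; OpenMP threads fill the remaining cores.
    export OMP_NUM_THREADS=8
    mpirun -np 4 ./cp2k.psmp -i H2O-256.inp          -o H2O-256.out
    mpirun -np 4 ./cp2k.psmp -i H2O-dft-ls.NREP2.inp -o H2O-dft-ls.NREP2.out

Two questions about this setup: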

1. To use all 4 GPUs, is there any problem with launching more than 4 MPI 
processes per node (e.g., 8 or 16)? A sketch of what I have in mind follows 
this list.
2. I saw a clear performance boost in the QS_DM_LS tests, since all DBCSR 
operations were assigned to the GPUs, but there was little improvement in 
the QS tests. I checked the output logs and found that only about 2% of the 
flops were processed by the GPUs (the DBCSR statistics block is pasted 
below). How can I get more of the computation onto the GPUs? I understand 
that this scheduling is decided by the program.
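
For question 1, what I have in mind is a small wrapper script that maps each 
node-local rank onto one of the 4 GPUs round-robin, so that 8 or 16 ranks can 
share them (only a sketch, assuming OpenMPI and ROCm; the script name is mine):

    #!/bin/bash
    # gpu_bind.sh -- placed in front of the cp2k binary on the mpirun line;
    # assigns each node-local rank to one of the 4 GPUs round-robin.
    # OpenMPI exports OMPI_COMM_WORLD_LOCAL_RANK; under srun use SLURM_LOCALID.
    local_rank=${OMPI_COMM_WORLD_LOCAL_RANK:-0}
    export ROCR_VISIBLE_DEVICES=$(( local_rank % 4 ))
    exec "$@"

which would then be used as, e.g.,

    mpirun -np 8 ./gpu_bind.sh ./cp2k.psmp -i H2O-256.inp -o H2O-256.out

Here is the DBCSR statistics block from the QS run mentioned in question 2: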

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                DBCSR STATISTICS                             -
 -                                                                             -
 -------------------------------------------------------------------------------
 COUNTER                                    TOTAL       BLAS       SMM       ACC
 flops     9 x     9 x    32        1430456039424     100.0%      0.0%      0.0%
 flops    32 x    32 x    32        1962800054272       0.0%      0.0%    100.0%
 flops    22 x     9 x    32        1986255912960     100.0%      0.0%      0.0%
 flops     9 x    22 x    32        1992003932160     100.0%      0.0%      0.0%
 flops    22 x    22 x    32        2753958699008     100.0%      0.0%      0.0%
 flops    32 x    32 x     9        4454954827776     100.0%      0.0%      0.0%
 flops    32 x    32 x    22        5444944789504     100.0%      0.0%      0.0%
 flops     9 x    32 x    32        5492290093056     100.0%      0.0%      0.0%
 flops    22 x    32 x    32        6712799002624     100.0%      0.0%      0.0%
 flops     9 x    32 x     9       11613072052224     100.0%      0.0%      0.0%
 flops    22 x    32 x     9       15239176077312     100.0%      0.0%      0.0%
 flops     9 x    32 x    22       15239176077312     100.0%      0.0%      0.0%
 flops    22 x    32 x    22       19911132921856     100.0%      0.0%      0.0%
 flops inhomo. stacks                           0       0.0%      0.0%      0.0%
 flops total                        94.233020E+12      97.9%      0.0%      2.1%
 flops max/rank                      5.910120E+12      97.9%      0.0%      2.1%
 matmuls inhomo. stacks                         0       0.0%      0.0%      0.0%
 matmuls total                         6806383904      99.6%      0.0%      0.4%
 number of processed stacks                728928      84.0%      0.0%     16.0%
 average stack size                                 11073.8       0.0     256.0
 marketing flops                   145.650931E+12

The QS test takes a long time to run, so I figure this is the critical case. 
Thanks.

