[CP2K-user] [CP2K:21520] High CPU usage in QS calculation for 128 water molecule cluster using docker image

Rui Li walter299792458 at gmail.com
Tue Jun 3 00:37:18 UTC 2025


Hi CP2K team,

This is a duplicate of the issue I posted on GitHub 
<https://github.com/cp2k/cp2k/issues/4226>, but I would like to ask here 
as well.

I am trying to profile CP2K performance, especially exact diagonalization, 
on an A100 GPU with an AMD EPYC 7763 64-core processor. For the 128-water 
cluster provided in the benchmarks (using the LDA functional), I get ~18 
seconds per SCF cycle. For comparison, OT takes ~1 second per cycle.

The environment is:

   - using docker image: cp2k/cp2k:2025.1_mpich_generic_cuda_P100_psmp
   - 1 GPU per node, 1 task per node, 1 GPU per task. I also set cpu-bind 
   to none in the Slurm job.

My questions are:

   - Is this expected behavior for exact diagonalization (more than a dozen 
   times slower than OT per cycle)? Although I understand that the 
   algorithms are different, direct diagonalization of a ~5000x5000 matrix 
   should not take more than a second on an A100 GPU, as far as I have 
   tested, yet that does not seem to be the case in the output. It would 
   also be helpful if you could point out which function corresponds to the 
   Fock matrix build.
   - When I profile with nsys, I see heavy CPU utilization throughout the 
   whole run. Is it possible that ELPA is using the CPU instead of the GPU 
   because I did not explicitly request the GPU? Or is it just because the 
   system is too small and the GPU is under-utilized?
   - Is using the docker image preventing me from utilizing the GPU? Would 
   I achieve better performance if I compiled manually (especially 
   considering the different CUDA architectures for P100 vs A100)?
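
As a rough sanity check on the sub-second expectation in the first question, 
here is a back-of-envelope sketch in Python. It is an estimate, not a 
benchmark: the 10*n^3 FLOP prefactor and the efficiency fractions are 
assumptions chosen for illustration, and the function names are hypothetical. 
The 9.7 TFLOP/s value is the FP64 peak NVIDIA quotes for the A100 without 
tensor cores.

```python
# Back-of-envelope estimate, not a benchmark: the prefactor and the
# efficiency fractions below are assumptions, not measured values.


def eigensolve_flops(n, flops_per_n3=10.0):
    """Rough FLOP count for a dense symmetric eigensolve with eigenvectors.

    The n**3 scaling is standard; the prefactor (assumed to be 10 here)
    depends on the algorithm and is only an order-of-magnitude guess.
    """
    return flops_per_n3 * n ** 3


def estimated_seconds(n, peak_flops, efficiency):
    """Estimated wall time when running at a given fraction of peak."""
    return eigensolve_flops(n) / (peak_flops * efficiency)


N = 5000                 # approximate matrix dimension quoted above
A100_FP64_PEAK = 9.7e12  # FLOP/s, FP64 peak without tensor cores

# Print the estimate for a few assumed fractions of peak throughput.
for eff in (1.0, 0.25, 0.05):
    t = estimated_seconds(N, A100_FP64_PEAK, eff)
    print(f"efficiency {eff:4.0%}: ~{t:.2f} s per diagonalization")
```

Under these assumptions, a single diagonalization stays below one second 
even at a quarter of peak throughput, so an 18-second SCF cycle would point 
to time spent outside the GPU eigensolve itself (or to a solver that is not 
running on the GPU).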
   
   
I am attaching the output file and an nsys profile screenshot, and can 
provide more details to help investigate the issue.

As a side note, I see that the NGC container uses ScaLAPACK instead of 
GPU-accelerated ELPA for diagonalization, and it takes ~23 seconds per 
SCF cycle. So I guess that for exact diagonalization, the newest docker 
image is faster.

Best,
Rui

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nsys screenshot.png
Type: image/png
Size: 484736 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20250602/ee9cbefc/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: H2O_128_LDA_exact_diag.out
Type: application/octet-stream
Size: 250088 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20250602/ee9cbefc/attachment-0001.obj>