Dear Rafael,

In general, your test is quite small. You should see more of a benefit with a larger test such as H2O-512.

The routines where CP2K spent most of its time in your case (see "self time" in the "timing" section at the end of the output) are grid_collocate_task_list, grid_integrate_task_list and cp_fm_cholesky_invert. The last one is performed with ScaLAPACK, which is not GPU-accelerated but should run efficiently on the CPU. The first two run mostly on the GPU (see the "grid statistics" section in your output file) but, as mentioned, they will not be accelerated much by your GPU; only the higher memory bandwidth may actually improve performance. Operations within DBCSR also employ the GPU.

Since all of these operations use double-precision numbers, the efficiency on your GPU is poor; it will be better on a GPGPU (compare https://dashboard.cp2k.org/archive/perf-openmp/commit_2064daf5fd3962f4cfa5dcce2bfa3d6108bed819.txt for a CPU test with https://dashboard.cp2k.org/archive/perf-cuda-volta/commit_e68d6fd0baf8cfa324767cbe0a05190d11f10215.txt for a V100 test).

If you want full CPU usage, you may consider this keyword, which selects the backend of the grid library: https://manual.cp2k.org/cp2k-2025_2-branch/CP2K_INPUT/GLOBAL/GRID.html#CP2K_INPUT.GLOBAL.GRID.BACKEND
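For example, a minimal sketch of such an input (only the &GRID block is the addition; keep the rest of your &GLOBAL section as it is, and see the manual page above for the full list of backend values):

    &GLOBAL
      ! ... your existing PROJECT, RUN_TYPE, ... settings ...
      &GRID
        ! BACKEND defaults to AUTO; CPU keeps collocate/integrate off the GPU
        ! so they run on the (OpenMP-parallel) host implementation.
        BACKEND CPU
      &END GRID
    &END GLOBAL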
Best,
Frederick

rafa...@gmail.com wrote on Tuesday, 17 February 2026 at 23:37:24 UTC+1:

Thanks for the reply, Frederick.

I had hoped that a CPU+GPU job would be at least as quick as a CPU-only run. It seems my GPU job is only using 1 CPU core, whereas I expected it to utilize all 4 cores (my GPU run is on par with a 1-core CPU run). During a GPU run, my GPU utilization periodically spikes to 100% but idles at 0% most of the time; the GPU job appears to be bottlenecked by the partial CPU utilization while the GPU sits idle.

This leaves me wondering how CP2K distributes the workload between the GPU and the CPU, and whether I can expect full CPU utilization during the CPU portion of a CPU+GPU run.

On Saturday, February 7, 2026 at 12:20:19 PM UTC-8, Frederick Stein wrote:

Dear Rafael,

With your GPU there will be no acceleration: consumer cards will not speed up CP2K no matter the workload, because CP2K relies on double-precision floating-point numbers for accuracy, and these are not well supported by consumer cards such as the NVIDIA RTX series.

GPU performance has improved since then (grid library, PDGEMM in RPA, DGEMM in MP2, ...), so some comments in the linked issue are no longer correct.

I can't tell how much memory (CPU or GPU) you need for this test.

If you are interested in using the latest version of CP2K, be aware that you need to switch to the CMake-based (or Spack or EasyBuild) build system.

Best,
Frederick

rafa...@gmail.com wrote on Saturday, 7 February 2026 at 19:31:35 UTC+1:

Hello, I'm testing CP2K performance on an older workstation PC and I'm finding that the CPU version of CP2K 2025.2 is faster than the GPU version. My understanding is that many consumer GPUs do not have great double-precision performance, but I can't tell whether the slower GPU timing is normal for my system or whether there is anything I can improve. For example, a CPU-only H2O-32.inp benchmark is twice as fast as a GPU run. The timings show that grid_collocate_task_list and grid_integrate_task_list are the most time-consuming steps.

I came across a similar thread from 2018 (https://github.com/cp2k/cp2k/issues/73), but I wonder how those comments hold up for CP2K 2025.2. Should I expect any performance gains from a GPU on small systems (<250 atoms)? I attached the ARCH files I used to build the CPU and GPU versions of CP2K, along with the output files from the H2O-32.inp benchmarks.

My system has a hyperthreaded 4-core AMD Ryzen 5 2400G CPU, an NVIDIA RTX 3050 6 GB GPU, and 16 GB RAM.

For CPU runs I use 4 MPI ranks with 2 OMP threads each to get full CPU utilization. For GPU runs I use 1 MPI rank with 2 OMP threads; increasing OMP_NUM_THREADS to 4, 6 or 8 does not increase CPU utilization during a GPU run.
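Concretely, the launch commands are roughly the following (the binary name cp2k.psmp and the plain mpirun syntax are assumptions; exact names may differ per build and MPI installation):

    # CPU-only run: 4 MPI ranks x 2 OpenMP threads = 8 hardware threads
    export OMP_NUM_THREADS=2
    mpirun -np 4 cp2k.psmp -i H2O-32.inp -o H2O-32-cpu.out

    # GPU run: 1 MPI rank; raising OMP_NUM_THREADS did not raise CPU usage
    export OMP_NUM_THREADS=2
    mpirun -np 1 cp2k.psmp -i H2O-32.inp -o H2O-32-gpu.out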
(I am unable to run H2O-64.inp on the GPU because of a CUDA "launch out of resources" error: ERROR: "cudaErrorLaunchOutOfResources" at /home/raf/cp2k-home/cp2k-colordiffusion/cp2k-2025.2/src/grid/gpu/grid_gpu_collocate.cu:387 )

Thanks,
Rafal