[CP2K-user] [CP2K:22104] Re: GPU vs CPU performance on consumer workstation

rafa...@gmail.com rafaldzie at gmail.com
Tue Feb 17 21:28:29 UTC 2026


Thanks for the reply, Frederick.

I had hoped that a CPU+GPU job would be at least as fast as a CPU-only run. 
It seems my GPU job is using only 1 CPU core, whereas I expected it to use 
all 4 cores (my GPU run is on par with a 1-core CPU run). During a GPU run, 
GPU utilization periodically spikes to 100% but sits at 0% most of the 
time, so the job appears to be bottlenecked by the single-core CPU portion 
while the GPU idles.
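
As a rough illustration of that bottleneck, here is an Amdahl-style 
estimate in Python. It is only a sketch: the function names are 
hypothetical, the offloaded fraction (0.5) and GPU speedup (5x) are made-up 
numbers rather than measurements from these runs, and it assumes the 
CPU-only part scales perfectly with the number of cores, which a real run 
will not do.

```python
def estimated_runtime(t_single_core, gpu_fraction, gpu_speedup, cpu_cores):
    """Amdahl-style wall-time estimate for a CPU+GPU run.

    t_single_core: wall time of the whole job on one CPU core
    gpu_fraction:  share of that time spent in offloadable kernels (0..1)
    gpu_speedup:   how much faster the GPU runs those kernels than one core
    cpu_cores:     cores working on the remaining, CPU-only part
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    if gpu_speedup <= 0 or cpu_cores < 1:
        raise ValueError("gpu_speedup must be > 0 and cpu_cores >= 1")
    # Part that stays on the CPU, assumed to scale ideally with cores.
    cpu_part = t_single_core * (1.0 - gpu_fraction) / cpu_cores
    # Part that is offloaded; the CPU is assumed to wait for the GPU.
    gpu_part = t_single_core * gpu_fraction / gpu_speedup
    return cpu_part + gpu_part


def cpu_only_runtime(t_single_core, cpu_cores):
    """Ideal-scaling wall time for a CPU-only run."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be >= 1")
    return t_single_core / cpu_cores


# Illustrative, made-up inputs (NOT taken from the benchmark outputs).
T1 = 100.0
gpu_with_1_core = estimated_runtime(T1, 0.5, 5.0, 1)
gpu_with_4_cores = estimated_runtime(T1, 0.5, 5.0, 4)
cpu_only_4_cores = cpu_only_runtime(T1, 4)

print("GPU run, 1 CPU core :", gpu_with_1_core)
print("GPU run, 4 CPU cores:", gpu_with_4_cores)
print("CPU-only, 4 cores   :", cpu_only_4_cores)
```

With these invented inputs, the GPU run that keeps only one core busy ends 
up well behind the 4-core CPU-only run, and it only pulls ahead once the 
CPU portion is spread over all 4 cores. The real fractions would have to 
come from the CP2K timing report.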

This leaves me wondering how CP2K distributes the workload between the GPU 
and the CPU. Can I expect full CPU utilization during the CPU portion of a 
CPU+GPU run?

On Saturday, February 7, 2026 at 12:20:19 PM UTC-8 Frederick Stein wrote:

> Dear Rafael,
> consumer GPU cards like yours will not accelerate CP2K regardless of the 
> workload, because CP2K relies on double-precision floating-point numbers 
> for accuracy, and these are not well supported by consumer cards such as 
> the NVIDIA RTX series.
> GPU performance has improved since the linked 2018 issue (grid library, 
> PDGEMM in RPA, DGEMM in MP2, ...), so some comments there are no longer 
> correct.
> I can't tell how much memory (CPU or GPU) you need for this test.
> If you are interested in using the latest version of CP2K, be aware that 
> you need to switch to the CMake-based (or Spack or EasyBuild) build system.
> Best,
> Frederick
>
> rafa... at gmail.com wrote on Saturday, February 7, 2026 at 19:31:35 UTC+1:
>
>> Hello, I'm testing CP2K performance on an older workstation PC and I'm 
>> finding that the CPU version of CP2K 2025.2 is faster than the GPU 
>> version. My understanding is that many consumer GPUs do not have great 
>> double-precision performance, but I can't tell whether the slower GPU 
>> timing is normal for my system or whether there is anything I can 
>> improve. For example, a CPU-only H2O-32.inp benchmark is twice as fast as 
>> a GPU run. The timings 
>> show that "grid_collocate_task_list" and "grid_integrate_task_list" are the 
>> most time consuming steps.
>>
>> I came across a similar thread from 2018, issue #73 
>> <https://github.com/cp2k/cp2k/issues/73>, but I wonder how those 
>> comments hold up for CP2K 2025.2. Should I expect any 
>> performance gains from a GPU on small systems (<250 atoms)? I attached the 
>> ARCH files I used to build the CPU and GPU versions of CP2K along with the 
>> output files from the H2O-32.inp benchmarks.
>>
>> My system has a hyperthreaded 4-core AMD Ryzen 5 2400G CPU, an NVIDIA RTX 
>> 3050 6 GB GPU, and 16 GB of RAM.
>>
>> For CPU runs I use 4 MPI ranks with 2 OMP threads each to get full CPU 
>> utilization. For GPU runs I use 1 MPI rank with 2 OMP threads; increasing 
>> OMP_NUM_THREADS to 4, 6, or 8 does not increase CPU utilization during a 
>> GPU run.
>>
>> (I am unable to run H2O-64.inp on the GPU because of a CUDA OOM error: 
>> ERROR: "cudaErrorLaunchOutOfResources" at 
>> /home/raf/cp2k-home/cp2k-colordiffusion/cp2k-2025.2/src/grid/gpu/grid_gpu_collocate.cu:387 )
>>
>> Thanks,
>> Rafal
>>
>
