[CP2K-user] [CP2K:22104] Re: GPU vs CPU performance on consumer workstation
rafa...@gmail.com
rafaldzie at gmail.com
Tue Feb 17 21:28:29 UTC 2026
Thanks for the reply, Frederick.
I hoped that a CPU+GPU job would be at least as quick as a CPU-only run. It
seems my GPU job is only using 1 CPU core, whereas I expected it to utilize
all 4 cores (my GPU run is on par with a single-core CPU run). During a GPU
run, GPU utilization periodically spikes to 100% but sits at 0% most of the
time; it seems the GPU job is bottlenecked by the partial CPU utilization
while the GPU is idle.
This leaves me wondering how CP2K distributes the workload between the GPU
and CPU, and whether I can expect full CPU utilization during the CPU
portion of a CPU+GPU run.
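To put rough numbers on that suspicion, here is a back-of-the-envelope
sketch in Python. It is not based on how CP2K actually schedules work: it
assumes the host part scales perfectly across cores, the GPU part does not
overlap with the host part, and every number (the 100 s baseline, the 50%
offload fraction, the GPU speedups) is made up for illustration.

```python
def estimated_wall_time(t_single_core, offload_fraction, gpu_speedup, cpu_cores):
    """Amdahl-style estimate of wall time for a CPU+GPU run.

    t_single_core    : wall time of the whole job on one CPU core (seconds)
    offload_fraction : share of that work moved to the GPU (0..1), hypothetical
    gpu_speedup      : GPU speed on the offloaded work relative to one core
    cpu_cores        : cores used for the work that stays on the host
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    if gpu_speedup <= 0 or cpu_cores < 1:
        raise ValueError("gpu_speedup must be > 0 and cpu_cores >= 1")
    # Assumption: host work scales perfectly with the number of cores.
    host_time = t_single_core * (1.0 - offload_fraction) / cpu_cores
    # Assumption: GPU work runs strictly after/before host work (no overlap).
    device_time = t_single_core * offload_fraction / gpu_speedup
    return host_time + device_time


T1 = 100.0  # hypothetical single-core wall time in seconds

cpu_only_4_cores = estimated_wall_time(T1, 0.0, 1.0, 4)
gpu_1_core_slow_fp64 = estimated_wall_time(T1, 0.5, 1.0, 1)
gpu_1_core_fast_fp64 = estimated_wall_time(T1, 0.5, 4.0, 1)
gpu_4_cores_fast_fp64 = estimated_wall_time(T1, 0.5, 4.0, 4)

for label, t in [
    ("CPU only, 4 cores", cpu_only_4_cores),
    ("GPU + 1 core, GPU as fast as 1 core", gpu_1_core_slow_fp64),
    ("GPU + 1 core, GPU 4x faster than 1 core", gpu_1_core_fast_fp64),
    ("GPU + 4 cores, GPU 4x faster than 1 core", gpu_4_cores_fast_fp64),
]:
    print(f"{label}: {t:.1f} s")
```

Under these assumptions, a GPU run that keeps the host part on a single core
stays slower than the 4-core CPU-only run even when the offloaded kernels
are several times faster than one core, which would be consistent with what
I am seeing.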
On Saturday, February 7, 2026 at 12:20:19 PM UTC-8 Frederick Stein wrote:
> Dear Rafael,
> consumer GPU cards will not provide any acceleration for CP2K, no matter
> the workload, because CP2K relies on double-precision floating-point
> numbers for accuracy, which are not well supported by consumer cards such
> as NVIDIA RTX.
> The GPU performance has improved since then (grid library, PDGEMM in RPA,
> DGEMM in MP2, ...), so some comments in the linked issue are no longer
> correct.
> I can't tell how much memory (CPU or GPU) you need for this test.
> If you are interested in using the latest version of CP2K, be aware that
> you need to switch to the CMake-based (or Spack or EasyBuild) build system.
> Best,
> Frederick
>
> rafa... at gmail.com schrieb am Samstag, 7. Februar 2026 um 19:31:35 UTC+1:
>
>> Hello, I'm testing CP2K performance on an older workstation PC and I'm
>> finding that the CPU version of CP2K 2025.2 is faster than the GPU
>> version. My understanding is that many consumer GPUs do not have great
>> double precision performance, but I can't tell if the slower GPU timing is
>> normal for my system or if there is anything I can improve? For example, a
>> CPU-only H2O-32.inp benchmark is twice as fast as a GPU run. The timings
>> show that "grid_collocate_task_list" and "grid_integrate_task_list" are the
>> most time consuming steps.
>>
>> I came across a similar thread from 2018 issue73
>> <https://github.com/cp2k/cp2k/issues/73>, but I wonder how those
>> comments hold up for the 2025.2 CP2K version? Should I expect any
>> performance gains from a GPU on small systems (<250 atoms)? I attached the
>> ARCH files I used to build the CPU and GPU versions of CP2K along with the
>> output files from the H2O-32.inp benchmarks.
>>
>> My system has a hyperthreaded 4-core AMD Ryzen 5 2400G CPU, an NVIDIA RTX
>> 3050 6 GB GPU, and 16 GB of RAM.
>>
>> For CPU runs I use 4 MPI ranks with 2 OMP threads each to get full CPU
>> utilization. For GPU runs I use 1 MPI rank with 2 OMP threads; increasing
>> OMP_NUM_THREADS to 4, 6, or 8 does not increase CPU utilization during a
>> GPU run.
>>
>> (I am unable to run H2O-64.inp on GPU because of a CUDA OOM error: ERROR:
>> "cudaErrorLaunchOutOfResources" at
>> /home/raf/cp2k-home/cp2k-colordiffusion/cp2k-2025.2/src/grid/gpu/grid_gpu_collocate.cu:387 )
>>
>> Thanks,
>> Rafal
>>
>