<div dir="ltr"><div dir="ltr">OK, thanks for emailing me the required data. There are a number of issues.<br><br>
First, only matrix multiplications and FFTs can currently be accelerated by GPUs. Looking at the timing section, your calculation is dominated by CPU-only parts:<br><br>
Total time:<br>
CP2K 609.771<br><br>
Main bottlenecks:<br>
integrate_v_rspace 268.894<br>
calculate_rho_elec 205.875<br><br>
Together these two routines account for about 78% of the total run time. This is normal for smaller calculations; GPUs become more useful for systems with 1000+ atoms.<br><br>
The second problem is that only a small part (12.4%) of your multiplications is ported to the GPU:<br><br>
COUNTER CPU GPU GPU%<br>
number of processed stacks 179436 25344 12.4<br><br>
This is because there are no kernels for your basis set. You will have to add them manually:<br><br>
Open: src/dbcsr/libsmm_acc/libcusmm/generate.py<br>
There is a section with triples near the top of the file. Add to it:<br>
triples += combinations(7,9,16,22)<br><br>
Best,<br>
Samuel<br><br>
P.S.: The main parameter that determines the speed of the calculations you want to do is the CUTOFF parameter in <big class="uctt">CP2K_INPUT/FORCE_EVAL/DFT/</big>MGRID.
</div></div>
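To give a rough idea of what the added line requests, here is a minimal sketch. It assumes that the script's `combinations` helper enumerates every (m, n, k) block-size triple that can be formed from the given sizes. That is an assumption about its behaviour and not a copy of its code; the name `all_mnk_triples` below is hypothetical and used only for illustration.

```python
from itertools import product


def all_mnk_triples(*sizes):
    """Hypothetical stand-in for the script's `combinations` helper.

    Assumption: every ordered (m, n, k) triple drawn from `sizes`
    (with repetition) needs its own multiplication kernel.
    """
    return list(product(sizes, repeat=3))


# Block sizes taken from the line quoted in the message above.
new_triples = all_mnk_triples(7, 9, 16, 22)

# Four sizes in three positions give 4**3 = 64 triples.
print(len(new_triples))
print(new_triples[:3])
```

Under this assumption, the four block sizes expand to 64 kernel shapes, so adding even a few sizes can noticeably increase the number of kernels that have to be generated.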