<div dir="ltr"><div dir="ltr">Ok, thanks for emailing me the required data. There are a number of issues.<br><br>First,
only matrix multiplications and FFTs can currently be accelerated by
GPUs. Looking at the timing sections, your calculation is dominated by
CPU parts:<br>Total time: CP2K <wbr> 609.771<br>Main bottlenecks: integrate_v_rspace <wbr> 268.894<br> calculate_rho_elec <wbr> 205.875<br><br>This is normal for smaller calculations; GPUs become more useful for systems with 1000+ atoms.<br><br>The second problem is that only a small fraction (12.4%) of your multiplication stacks is processed on the GPU:<br><br> COUNTER <wbr> CPU GPU GPU%<br> number of processed stacks 179436 25344 12.4<br><br>This happens because there are no GPU kernels for your basis set. You will have to add them manually:<br><br>Open: src/dbcsr/libsmm_acc/libcusmm/<wbr>generate.py<br><br>There is a section that defines the triples at the top of the file. Add this line to it:<br>triples += combinations(7,9,16,22)<br><br>Best<br><br>Samuel<br><br>P.S.: The main parameter that determines the speed of the calculations you want to do is the CUTOFF parameter in <font size="2">CP2K_INPUT/FORCE_EVAL/DFT/MGRID</font>.<br></div></div>
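As a rough cross-check of the numbers quoted above, here is a minimal Python sketch. The timings and stack counts are copied from the report in this message. The `combinations` function is only a hypothetical stand-in: it assumes the helper in the generator script expands the listed block sizes into every (m, n, k) triple, which should be confirmed against the actual file before relying on it.

```python
from itertools import product

# Timings in seconds, copied from the timing report quoted in this message.
total_time = 609.771
cpu_only = {"integrate_v_rspace": 268.894, "calculate_rho_elec": 205.875}

# Share of the total run time spent in the two CPU-only bottlenecks.
cpu_share = sum(cpu_only.values()) / total_time

# Stack counts copied from the COUNTER table.
cpu_stacks, gpu_stacks = 179436, 25344
gpu_share = gpu_stacks / (cpu_stacks + gpu_stacks)


def combinations(*sizes):
    """ASSUMPTION: illustrative stand-in, not the real helper.

    Returns every (m, n, k) triple that can be formed from the given
    block sizes, i.e. the Cartesian product of the sizes with themselves.
    """
    return list(product(sizes, repeat=3))


# Under that assumption, the suggested line would request this many kernels.
new_triples = combinations(7, 9, 16, 22)

print(f"CPU-only share of run time: {cpu_share:.1%}")
print(f"Stacks processed on GPU:    {gpu_share:.1%}")
print(f"Triples requested:          {len(new_triples)}")
```

If the assumption about `combinations` holds, the single added line covers all mixed block shapes such as (7, 9, 16), not only the cubic ones such as (7, 7, 7).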