CUDA-DBCSR Statistics

Samuel Andermatt samuel.a... at student.ethz.ch
Thu Oct 2 12:06:36 UTC 2014


1. I think CP2K needs at least one MPI rank per GPU, so if you want to use
multiple GPUs you should run a popt or psmp build, not sopt. Also, a single
core would likely not be able to fully load 4 GPUs.
2. There are currently no kernels for your block sizes. In the file
cp2k/src/dbcsr/libsmm_acc/libcusmm/generate.py you will have to add the line
   triples += combinations(1, 4, 8, 16)
3. Some of your blocks are extremely small (one of the dimensions is 1),
which might lead to poor performance of the GPU code.
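Point 2 asks for one line to be added to generate.py. The Python sketch below shows how many (m, n, k) kernels that line would request. It assumes that the `combinations` helper in generate.py returns every (m, n, k) triple that can be built from the listed sizes, which is a Cartesian product. The function here is a hypothetical stand-in written for illustration and is not copied from CP2K.

```python
from itertools import product


def combinations(*sizes):
    """Hypothetical stand-in for the generate.py helper (assumption):
    return every (m, n, k) triple that can be formed from the given sizes."""
    return list(product(sizes, repeat=3))


triples = []
triples += combinations(1, 4, 8, 16)

print(len(triples))  # 64 triples, i.e. 4 sizes cubed
print(triples[:3])   # [(1, 1, 1), (1, 1, 4), (1, 1, 8)]
```

Under this assumption, listing four sizes requests 4^3 = 64 kernels. Every extra size increases that count cubically, along with the time needed to generate and compile the kernels.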
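Point 3 can be made concrete with a rough arithmetic-intensity estimate: floating-point operations per byte of memory traffic for one block product C(m x n) += A(m x k) * B(k x n) in double precision. This is a simplified model chosen for illustration. It assumes A and B are read once, C is read and written once, and any caching or data reuse across blocks is ignored.

```python
def arithmetic_intensity(m, n, k, bytes_per_elem=8):
    """Flops per byte for one C(m x n) += A(m x k) * B(k x n) block product,
    assuming A and B are read once and C is read and written once."""
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + 2 * m * n)
    return flops / bytes_moved


for m, n, k in [(1, 4, 8), (4, 8, 16), (16, 16, 16)]:
    print((m, n, k), round(arithmetic_intensity(m, n, k), 3))
# (1, 4, 8) 0.167
# (4, 8, 16) 0.5
# (16, 16, 16) 1.0
```

In this model, a block with a dimension of 1 does roughly six times less arithmetic per byte than a 16x16x16 block. That is why such blocks tend to be limited by memory traffic rather than by the GPU's compute throughput.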


More information about the CP2K-user mailing list