CUDA-DBCSR Statistics

Abhishek Bagusetty abhishek... at gmail.com
Wed Oct 1 16:43:47 UTC 2014


Hi Users & Developers,

I am trying to run an MD simulation using a CUDA build (sopt) of CP2K-2.5.1 
with DBCSR-CUDA support. 

*CASE-1*

When I use all 4 GPUs on the node (CPU:GPU = 1:4), the simulation runs, 
but the DBCSR statistics indicate that GPU usage is 0 and that all the 
matrix computations are carried out on the CPU. I have configured the input 
file so that DBCSR uses the accelerators, yet this does not happen. 
(Input snippet at the bottom.) 

 -------------------------------------------------------------------------------
 COUNTER                                      CPU                  GPU      GPU%
 number of processed stacks                 10296                    0       0.0
 matmuls inhomo. stacks                         0                    0       0.0
 matmuls total                           11895585                    0       0.0
 flops   1 x    1 x    8                  4664400                    0       0.0
 flops   1 x    8 x    1                  3900000                    0       0.0
 flops   1 x    1 x   16                 37315200                    0       0.0
 flops   1 x   16 x    1                 31200000                    0       0.0
 flops   1 x    4 x    8                 20092800                    0       0.0
 flops   4 x    1 x    8                 20092800                    0       0.0
 flops   1 x    8 x    4                 17472000                    0       0.0
 flops   4 x    8 x    1                 17472000                    0       0.0
 flops   1 x    4 x   16                160742400                    0       0.0
 flops   4 x    1 x   16                160742400                    0       0.0
 flops   1 x   16 x    4                139776000                    0       0.0
 flops   4 x   16 x    1                139776000                    0       0.0
 flops   4 x    4 x    8                 93230592                    0       0.0
 flops   4 x    8 x    4                 78274560                    0       0.0
 flops   4 x    4 x   16                745844736                    0       0.0
 flops   4 x   16 x    4                626196480                    0       0.0
 flops total                           2296792368                    0       0.0
 marketing flops                       3478421232
 -------------------------------------------------------------------------------

My guess is that CP2K-2.5.1 is not yet fully able to handle multiple 
on-node GPUs, and that things go wrong when more than one GPU is 
available. I am not sure whether this interpretation makes sense. 


*CASE-2*

I then tried selecting 1 of the 4 on-node GPUs (IDs: 0, 1, 2, 3), i.e. 
CPU:GPU = 1:1, and ran the same simulation. 
I get these errors from DBCSR initialization:

Error report :
 dbcsr_cuda_stream_create failed
 libdbcsr| dbcsr_cuda_stream_create failed
 libdbcsr| Abnormal program termination, stopped by process number 0
CUDA Error: all CUDA-capable devices are busy or unavailable
 
My guess for this case is that the cluster scheduler might have assigned a 
GPU device ID other than 0, while CP2K-2.5.1 might have the GPU device ID 
hard-coded to 0. The requested device would then be unavailable when the 
streams are created, resulting in the CUDA error. 
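
One way to test this guess would be to print what the job actually sees 
before CP2K starts. Below is a minimal sketch in Python, assuming the 
scheduler passes the GPU assignment through the CUDA_VISIBLE_DEVICES 
environment variable (not confirmed for this cluster). The function names 
are hypothetical and not part of CP2K or DBCSR. When that variable is set, 
the CUDA runtime renumbers the visible devices from 0, so a job granted 
physical GPU 2 would still address it as device 0.

```python
import os


def visible_cuda_devices(env=None):
    """Return the entries of CUDA_VISIBLE_DEVICES as strings, or None if unset."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # unset: the CUDA runtime sees every device on the node
    # Entries are kept as strings because they may be indices or UUIDs.
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


def describe(env=None):
    """Summarise which physical devices the job is allowed to use."""
    ids = visible_cuda_devices(env)
    if ids is None:
        return "CUDA_VISIBLE_DEVICES is unset: all devices are visible"
    if not ids:
        return "CUDA_VISIBLE_DEVICES is empty: no device is visible"
    # Inside the process the visible devices are renumbered starting at 0.
    mapping = ", ".join(f"runtime {i} -> physical {p}" for i, p in enumerate(ids))
    return f"{len(ids)} device(s) visible: {mapping}"


if __name__ == "__main__":
    print(describe())
```

Running this inside the batch job for both cases would show whether the 
scheduler restricts the visible devices at all. If the variable is unset in 
Case-2, a device held in an exclusive compute mode by another process would 
be another possible source of the "busy or unavailable" error. 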


I do not understand what is going wrong: the CPU:GPU = 1:4 case shows no 
GPU statistics for DBCSR, and the CPU:GPU = 1:1 case fails with a CUDA 
error. My interpretations may be completely wrong, and I would appreciate 
it if someone could give some insight into what is going on.

*Architecture* : Super-Computing Cluster
Every node has 8 cores and 4 in-house GPU cards. 

*JOB details* : 1 Node, 1 Core and 4 GPUs (Case-1) / 1 GPU(Case-2) (on-node 
multiGPUs)

*Input File Snippet :* 

&GLOBAL
  PRINT_LEVEL LOW
  PROJECT_NAME PROTON_HOP
  RUN_TYPE MD
  &MACHINE_ARCH
    PRINT_FULL TRUE
  &END MACHINE_ARCH
  &DBCSR
    MM_DRIVER CUDA
    &CUDA
      PROCESS_INHOMOGENOUS TRUE
      PRIORITY_STREAMS 4
    &END CUDA
  &END DBCSR
&END GLOBAL



Thanks for your time and efforts,
Abhishek 

-----------------------------------------------------------------------------------------------------------
Abhishek Bagusetty
PhD Student, Computational Modeling & Simulation
Center for Simulation and Modeling
Department of Chemical & Petroleum Engineering
University of Pittsburgh
Pittsburgh, PA - 15261
Office : 920 Benedum Hall
-----------------------------------------------------------------------------------------------------------
