CUDA-DBCSR Statistics
Abhishek Bagusetty
abhishek... at gmail.com
Wed Oct 1 16:43:47 UTC 2014
Hi Users & Developers,
I am trying to do an MD run using CP2K-2.5.1 - CUDA build (sopt) with
DBCSR-CUDA support.
*CASE-1*
When I use all 4 GPUs on the node, in other words CPU:GPU = 1:4, the
simulation runs, but the DBCSR statistics indicate that the GPU usage is 0
and that all the matrix computations are carried out on the CPU.
I have configured the input file so that DBCSR uses the accelerators, yet
this does not happen as expected. (Input snippet at the bottom.)
-------------------------------------------------------------------------------
COUNTER                              CPU   GPU   GPU%
number of processed stacks         10296     0    0.0
matmuls inhomo. stacks                 0     0    0.0
matmuls total                   11895585     0    0.0
flops 1 x 1 x 8                  4664400     0    0.0
flops 1 x 8 x 1                  3900000     0    0.0
flops 1 x 1 x 16                37315200     0    0.0
flops 1 x 16 x 1                31200000     0    0.0
flops 1 x 4 x 8                 20092800     0    0.0
flops 4 x 1 x 8                 20092800     0    0.0
flops 1 x 8 x 4                 17472000     0    0.0
flops 4 x 8 x 1                 17472000     0    0.0
flops 1 x 4 x 16               160742400     0    0.0
flops 4 x 1 x 16               160742400     0    0.0
flops 1 x 16 x 4               139776000     0    0.0
flops 4 x 16 x 1               139776000     0    0.0
flops 4 x 4 x 8                 93230592     0    0.0
flops 4 x 8 x 4                 78274560     0    0.0
flops 4 x 4 x 16               745844736     0    0.0
flops 4 x 16 x 4               626196480     0    0.0
flops total                   2296792368     0    0.0
marketing flops               3478421232
-------------------------------------------------------------------------------
My guess is that CP2K-2.5.1 does not yet fully support multiple GPUs on a
single node, so that the GPU path is not used when several GPUs are
available. I am not sure whether this interpretation makes sense.
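
As a sanity check on the statistics, here is a minimal Python sketch that sums
the per-block-size flops rows and computes the GPU share. The row layout in
STATS (counter, CPU count, GPU count on one line) and the names STATS,
REPORTED_CPU_TOTAL and summarize are assumptions made for illustration; this
is not a CP2K tool and does not reflect CP2K's exact output format.

```python
import re

# Flops rows taken from the statistics in this message, written as
# "flops M x N x K  <CPU count>  <GPU count>". This layout is assumed
# for illustration only.
STATS = """\
flops 1 x 1 x 8 4664400 0
flops 1 x 8 x 1 3900000 0
flops 1 x 1 x 16 37315200 0
flops 1 x 16 x 1 31200000 0
flops 1 x 4 x 8 20092800 0
flops 4 x 1 x 8 20092800 0
flops 1 x 8 x 4 17472000 0
flops 4 x 8 x 1 17472000 0
flops 1 x 4 x 16 160742400 0
flops 4 x 1 x 16 160742400 0
flops 1 x 16 x 4 139776000 0
flops 4 x 16 x 1 139776000 0
flops 4 x 4 x 8 93230592 0
flops 4 x 8 x 4 78274560 0
flops 4 x 4 x 16 745844736 0
flops 4 x 16 x 4 626196480 0
"""

# The "flops total" value printed on the CPU side of the statistics.
REPORTED_CPU_TOTAL = 2296792368

# Matches one per-block-size row and captures the CPU and GPU counts.
ROW = re.compile(r"^flops\s+\d+ x \d+ x \d+\s+(\d+)\s+(\d+)$")


def summarize(text):
    """Return (cpu_flops, gpu_flops, gpu_share_percent) over all flops rows."""
    cpu = gpu = 0
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            cpu += int(match.group(1))
            gpu += int(match.group(2))
    total = cpu + gpu
    share = 100.0 * gpu / total if total else 0.0
    return cpu, gpu, share


cpu, gpu, share = summarize(STATS)
print("CPU flops:", cpu, "matches reported total:", cpu == REPORTED_CPU_TOTAL)
print("GPU flops:", gpu, "GPU share (%):", share)
```

The per-row CPU counts add up to the reported "flops total", so the table is
internally consistent and every multiplication was accounted to the CPU.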
*CASE-2*
I have tried choosing 1 of the 4 GPUs (IDs: 0,1,2,3) on the node (CPU:GPU =
1:1) and running the same simulation.
I get these errors from the DBCSR initialization:
Error report :
dbcsr_cuda_stream_create failed
libdbcsr| dbcsr_cuda_stream_create failed
libdbcsr| Abnormal program termination, stopped by process number 0
CUDA Error: all CUDA-capable devices are busy or unavailable
My guess for this case is that the scheduler on the cluster might have
assigned a GPU device ID other than 0, while CP2K-2.5.1 might have the GPU
device ID hard-coded to 0. The required device would then be unavailable
when the streams are created, resulting in the CUDA error.
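
One way to test this guess from inside the job is to look at which devices
the job is allowed to see. The minimal Python sketch below reads the standard
CUDA_VISIBLE_DEVICES environment variable; whether the scheduler on this
cluster sets that variable at all is an assumption, and the helper name
visible_devices is hypothetical.

```python
import os


def visible_devices(env=None):
    """Return the entries of CUDA_VISIBLE_DEVICES as a list of strings.

    Returns None when the variable is unset, in which case the CUDA
    runtime does not restrict the devices it enumerates.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Entries are comma-separated; drop empty tokens and stray spaces.
    return [token.strip() for token in raw.split(",") if token.strip()]


# Assumption: this runs in the same job environment as the CP2K process.
print("CUDA_VISIBLE_DEVICES ->", visible_devices())
```

If the variable is set, the CUDA runtime renumbers the listed devices starting
from 0 within the process, so an application asking for device 0 gets the
first listed device. An empty list means no device is visible to the job.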
I do not understand what is going wrong: why the CPU:GPU = 1:4 case shows
no GPU statistics for DBCSR, and why the CPU:GPU = 1:1 case fails with a
CUDA error. My interpretations may be completely wrong, and I would
appreciate it if someone could give some insight into what is going on.
*Architecture* : Super-Computing Cluster
Every node has 8 cores and 4 in-house GPU cards.
*JOB details* : 1 Node, 1 Core and 4 GPUs (Case-1) / 1 GPU(Case-2) (on-node
multiGPUs)
*Input File Snippet :*
&GLOBAL
  PRINT_LEVEL LOW
  PROJECT_NAME PROTON_HOP
  RUN_TYPE MD
  &MACHINE_ARCH
    PRINT_FULL TRUE
  &END MACHINE_ARCH
  &DBCSR
    MM_DRIVER CUDA
    &CUDA
      PROCESS_INHOMOGENOUS TRUE
      PRIORITY_STREAMS 4
    &END CUDA
  &END DBCSR
&END GLOBAL
Thanks for your time and efforts,
Abhishek
-----------------------------------------------------------------------------------------------------------
Abhishek Bagusetty
PhD Student, Computational Modeling & Simulation
Center for Simulation and Modeling
Department of Chemical & Petroleum Engineering
University of Pittsburgh
Pittsburgh, PA - 15261
Office : 920 Benedum Hall
-----------------------------------------------------------------------------------------------------------