<div dir="ltr"><div>Hi Users & Developers,</div>
<div>I am trying to do an MD run using a CUDA build (sopt) of CP2K-2.5.1 with DBCSR-CUDA support.</div>
<div>CASE-1</div>
<div>When I use all 4 GPUs on the node (CPU:GPU = 1:4), the simulation runs, but the DBCSR statistics report a GPU usage of 0: all the matrix multiplications are carried out on the CPU. I have configured the input file so that DBCSR uses the accelerators (input snippet at the bottom), so this is not what I expected.</div>
<pre>
--------------------------------------------------------
COUNTER                              CPU     GPU    GPU%
number of processed stacks         10296       0     0.0
matmuls inhomo. stacks                 0       0     0.0
matmuls total                   11895585       0     0.0
flops 1 x 1 x 8                  4664400       0     0.0
flops 1 x 8 x 1                  3900000       0     0.0
flops 1 x 1 x 16                37315200       0     0.0
flops 1 x 16 x 1                31200000       0     0.0
flops 1 x 4 x 8                 20092800       0     0.0
flops 4 x 1 x 8                 20092800       0     0.0
flops 1 x 8 x 4                 17472000       0     0.0
flops 4 x 8 x 1                 17472000       0     0.0
flops 1 x 4 x 16               160742400       0     0.0
flops 4 x 1 x 16               160742400       0     0.0
flops 1 x 16 x 4               139776000       0     0.0
flops 4 x 16 x 1               139776000       0     0.0
flops 4 x 4 x 8                 93230592       0     0.0
flops 4 x 8 x 4                 78274560       0     0.0
flops 4 x 4 x 16               745844736       0     0.0
flops 4 x 16 x 4               626196480       0     0.0
flops total                   2296792368       0     0.0
marketing flops               3478421232
--------------------------------------------------------
</pre>
<div>My guess is that CP2K-2.5.1 is not yet fully set up to handle multiple GPUs on one node, and that things go wrong when more than one GPU is available. I am not sure whether this interpretation makes sense.</div>
<div>CASE-2</div>
I then selected 1 of the 4 on-node GPUs (IDs: 0, 1, 2, 3), i.e. CPU:GPU = 1:1, and ran the same simulation.
I get these errors from the DBCSR initialization:
<pre>
Error report:
dbcsr_cuda_stream_create failed
libdbcsr| dbcsr_cuda_stream_create failed
libdbcsr| Abnormal program termination, stopped by process number 0
CUDA Error: all CUDA-capable devices are busy or unavailable
</pre>
<div>My guess for this case is that the cluster scheduler might have assigned a GPU whose device ID is not 0, while CP2K-2.5.1 might have the GPU device ID hard-coded to 0. The requested device would then be unavailable when the streams are created, resulting in the CUDA error.</div>
<div>I do not understand what is going wrong: why the CPU:GPU = 1:4 case shows no GPU activity in the DBCSR statistics, and why the CPU:GPU = 1:1 case fails with a CUDA error. My interpretations may be completely wrong, and I would appreciate it if someone could give some insight into what is going on.</div>
<div>Architecture: supercomputing cluster. Every node has 8 cores and 4 in-house GPU cards.</div>
<div>Job details: 1 node, 1 core and 4 GPUs (Case-1) / 1 GPU (Case-2) (on-node multi-GPU)</div>
<div>Input file snippet:</div>
<pre>
&GLOBAL
  PRINT_LEVEL LOW
  PROJECT_NAME PROTON_HOP
  RUN_TYPE MD
  &MACHINE_ARCH
    PRINT_FULL TRUE
  &END MACHINE_ARCH
  &DBCSR
    MM_DRIVER CUDA
    &CUDA
      PROCESS_INHOMOGENOUS TRUE
      PRIORITY_STREAMS 4
    &END CUDA
  &END DBCSR
&END GLOBAL
</pre>
<div>Thanks for your time and efforts,</div>
<div>Abhishek</div>
<div><div dir="ltr"><div><div><div><div><div><div>----------------------------------------------------------------------------------------------------------- Abhishek Bagusetty </div>PhD Student, Computational Modeling & Simulation </div>Center for Simulation and Modeling </div>Department of Chemical & Petroleum Engineering </div>University of Pittsburgh Pittsburgh, PA - 15261</div>Office : 920 Benedum Hall </div><div><div>-----------------------------------------------------------------------------------------------------------</div></div></div></div> </div>
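<div>To test the device-ID guess from CASE-2, a small check like the one below could be run inside the batch job before launching CP2K. It is only a sketch: it assumes the scheduler restricts GPUs through the CUDA_VISIBLE_DEVICES environment variable (which the CUDA runtime honours by renumbering the listed devices starting from 0), the helper names parse_visible_devices and report are made up for illustration, and invalid entries in the list are not handled.</div>

```python
import os


def parse_visible_devices(value):
    """Interpret a CUDA_VISIBLE_DEVICES value (simplified, assumes valid entries).

    Returns None when the variable is unset (no restriction), otherwise the
    list of device tokens in order. The first token is what the process
    would see as device 0, the second as device 1, and so on.
    """
    if value is None:
        return None
    return [tok.strip() for tok in value.split(",") if tok.strip()]


def report(env=os.environ):
    """Print how in-process device numbers map to the scheduler's selection."""
    devices = parse_visible_devices(env.get("CUDA_VISIBLE_DEVICES"))
    if devices is None:
        print("CUDA_VISIBLE_DEVICES is unset: all GPUs on the node are visible")
    elif not devices:
        print("CUDA_VISIBLE_DEVICES is empty: no GPU is visible to this process")
    else:
        for logical, physical in enumerate(devices):
            print(f"in-process device {logical} -> physical GPU {physical}")


if __name__ == "__main__":
    report()
```

<div>If the job reports a single physical GPU mapped to in-process device 0, a device ID hard-coded to 0 would not by itself explain the error. In that case it may be worth checking the compute mode of the cards with nvidia-smi -q -d COMPUTE, since a card in an exclusive mode that is already held by another process can be reported as busy or unavailable.</div>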