[CP2K-user] CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle

singlebook chenw... at gmail.com
Fri Feb 5 08:24:33 UTC 2021


Hello Alfio,

Yes, there are 12 MPI ranks, and each rank has only one thread.
The output file is too large to upload, so I have included only the header
information from the CPU version here. The output files from the GPU runs
have not been saved for now. I will run more tests whenever the workstation
is idle.

 DBCSR| CPU Multiplication driver                                           XSMM
 DBCSR| Multrec recursion limit                                              512
 DBCSR| Multiplication stack size                                           1000
 DBCSR| Maximum elements for images                                    UNLIMITED
 DBCSR| Multiplicative factor virtual images                                   1
 DBCSR| Use multiplication densification                                       T
 DBCSR| Multiplication size stacks                                             3
 DBCSR| Use memory pool for CPU allocation                                     F
 DBCSR| Number of 3D layers                                               SINGLE
 DBCSR| Use MPI memory allocation                                              F
 DBCSR| Use RMA algorithm                                                      F
 DBCSR| Use Communication thread                                               T
 DBCSR| Communication thread load                                             87
 DBCSR| MPI: My node id                                                        0
 DBCSR| MPI: Number of nodes                                                  48
 DBCSR| OMP: Current number of threads                                         1
 DBCSR| OMP: Max number of threads                                             1
 DBCSR| Split modifier for TAS multiplication algorithm                  1.0E+00

  **** **** ******  **  PROGRAM STARTED AT               2021-02-04 09:18:01.088
 ***** ** ***  *** **   PROGRAM STARTED ON                                  k172
 **    ****   ******    PROGRAM STARTED BY                               chenwei
 ***** **    ** ** **   PROGRAM PROCESS ID                                 52126
  **** **  *******  **  PROGRAM STARTED IN /ncsfs02/chenwei/Machine Learning/CP2
                                           K/SiC

 CP2K| version string:                                          CP2K version 8.1
 CP2K| source code revision number:                                  git:0b61f2f
 CP2K| cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack xsmm plume
 CP2K|            d2 spglib libvori libbqb
 CP2K| is freely available from                            https://www.cp2k.org/
 CP2K| Program compiled at                          Thu Feb  4 08:49:28 CST 2021
 CP2K| Program compiled on                                                  k172
 CP2K| Program compiled for                                                local
 CP2K| Data directory path                       /home/chenwei/src/cp2k-8.1/data
 CP2K| Input file name                                                   SiC.inp

 GLOBAL| Force Environment number                                              1
 GLOBAL| Basis set file name                                           BASIS_SET
 GLOBAL| Potential file name                                      GTH_POTENTIALS
 GLOBAL| MM Potential file name                                     MM_POTENTIAL
 GLOBAL| Coordinate file name                                      __STD_INPUT__
 GLOBAL| Method name                                                        CP2K
 GLOBAL| Project name                                                   SiC_AIMD
 GLOBAL| Preferred FFT library                                             FFTW3
 GLOBAL| Preferred diagonalization lib.                                     ELPA
 GLOBAL| Run type                                                             MD
 GLOBAL| All-to-all communication in single precision                          F
 GLOBAL| FFTs using library dependent lengths                                  F
 GLOBAL| Global print level                                                  LOW
 GLOBAL| MPI I/O enabled                                                       T
 GLOBAL| Total number of message passing processes                            48
 GLOBAL| Number of threads for this process                                    1
 GLOBAL| This output is from process                                           0
 GLOBAL| CPU model name                Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
 GLOBAL| CPUID                                                              1002

 MEMORY| system memory details [Kb]
 MEMORY|                        rank 0           min           max       average
 MEMORY| MemTotal            131748504     131748504     131748504     131748504
 MEMORY| MemFree              67523260      67523260      67523260      67523260
 MEMORY| Buffers                  4712          4712          4712          4712
 MEMORY| Cached               56159648      56159648      56159648      56159648
 MEMORY| Slab                  2740508       2740508       2740508       2740508
 MEMORY| SReclaimable          2447544       2447544       2447544       2447544
 MEMORY| MemLikelyFree       126135164     126135164     126135164     126135164

 GENERATE|  Preliminary Number of Bonds generated:                             0
 GENERATE|  Achieved consistency in connectivity generation.

 SCF WAVEFUNCTION OPTIMIZATION

  Step     Update method      Time    Convergence         Total energy    Change
  ------------------------------------------------------------------------------
     1 NoMix/Diag. 0.40E+00    0.3     3.80220882      -317.7175159821 -3.18E+02
     2 Broy./Diag. 0.40E+00    0.6     0.43368094      -291.0370906460  2.67E+01
     3 Broy./Diag. 0.40E+00    0.6     0.23506554      -308.2043627628 -1.72E+01
     4 Broy./Diag. 0.40E+00    0.6     0.26390650      -309.7756477106 -1.57E+00
     5 Broy./Diag. 0.40E+00    0.6     0.00311711      -310.0196552337 -2.44E-01
     6 Broy./Diag. 0.40E+00    0.6     0.01762115      -309.8687051316  1.51E-01
     7 Broy./Diag. 0.40E+00    0.6     0.00055086      -309.8505587170  1.81E-02
     8 Broy./Diag. 0.40E+00    0.6     0.00030811      -309.8516271774 -1.07E-03
     9 Broy./Diag. 0.40E+00    0.6     0.00001506      -309.8519055144 -2.78E-04
    10 Broy./Diag. 0.40E+00    0.6     0.00000129      -309.8519255844 -2.01E-05
    11 Broy./Diag. 0.40E+00    0.6     0.00000032      -309.8519300365 -4.45E-06
    12 Broy./Diag. 0.40E+00    0.6     0.00000002      -309.8519304271 -3.91E-07

  *** SCF run converged in    12 steps ***

Best wishes,

Wei