[CP2K-user] CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle

Alfio Lazzaro alfio.... at gmail.com
Fri Feb 5 12:50:58 UTC 2021


OK, thanks for the timers.
I assume you sent me the CPU timers.
As suspected, the run is massively dominated by the non-GPU parts. I cannot 
even see any COSMA calls. 
These are the main parts where the time goes (maximum self time in seconds, 
with the total run time last):

fft_wrap_pw1pw2_150              228.660
fft3d_ps                         858.660
rs_pw_transfer_RS2PW_150        1206.580
mp_waitall_1                    1749.180
mp_sum_d                        1821.390
build_core_ppnl_forces          2032.140
rs_pw_transfer_PW2RS_150        2063.730
mp_alltoall_d11v                2399.420
mp_waitany                      6457.620
cp_fm_diag_elpa_base            6729.030
grid_integrate_task_list       16394.840
grid_collocate_task_list       18769.980
CP2K_Total                     66823.040

More than half of the total time (66823.040 s) is in the grid_* functions: 
grid_collocate_task_list and grid_integrate_task_list together take about 
35165 s, i.e. roughly 53%. 
BTW, for this kind of test, I suggest using fewer steps...
I suspect you are hitting the CPU and GPU performance problem reported for 
CP2K 8.1 (see https://github.com/cp2k/cp2k/issues/1323 ).
I suggest trying CP2K 7.1...
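
The share of time spent in the grid_* functions can be checked directly from the timers listed above. A minimal Python sketch, with the values copied by hand from that list (the variable names are only illustrative):

```python
# Self times in seconds, copied from the timer list in this message.
timers = {
    "grid_integrate_task_list": 16394.840,
    "grid_collocate_task_list": 18769.980,
}
total = 66823.040  # CP2K_Total

# Sum the two grid_* entries and compare against the total run time.
grid_time = sum(timers.values())
grid_share = grid_time / total
print(f"grid_* time: {grid_time:.2f} s ({grid_share:.1%} of total)")
```

The same dictionary can be extended with the other entries to see how the remaining time is split between MPI waits, ELPA and the FFTs.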

Alfio





On Friday, February 5, 2021 at 10:54:47 UTC+1 singlebook wrote:

>  DBCSR| CPU Multiplication 
> driver                                           XSMM
>  DBCSR| Multrec recursion 
> limit                                              512
>  DBCSR| Multiplication stack 
> size                                           1000
>  DBCSR| Maximum elements for images                                    
> UNLIMITED
>  DBCSR| Multiplicative factor virtual 
> images                                   1
>  DBCSR| Use multiplication 
> densification                                       T
>  DBCSR| Multiplication size 
> stacks                                             3
>  DBCSR| Use memory pool for CPU 
> allocation                                     F
>  DBCSR| Number of 3D layers                                               
> SINGLE
>  DBCSR| Use MPI memory 
> allocation                                              F
>  DBCSR| Use RMA 
> algorithm                                                      F
>  DBCSR| Use Communication 
> thread                                               T
>  DBCSR| Communication thread 
> load                                             87
>  DBCSR| MPI: My node 
> id                                                        0
>  DBCSR| MPI: Number of 
> nodes                                                  48
>  DBCSR| OMP: Current number of 
> threads                                         1
>  DBCSR| OMP: Max number of 
> threads                                             1
>  DBCSR| Split modifier for TAS multiplication algorithm                  
> 1.0E+00
>
>
>   **** **** ******  **  PROGRAM STARTED AT               2021-02-04 
> 09:18:01.088
>  ***** ** ***  *** **   PROGRAM STARTED 
> ON                                  k172
>  **    ****   ******    PROGRAM STARTED BY                               
> chenwei
>  ***** **    ** ** **   PROGRAM PROCESS ID                                 
> 52126
>   **** **  *******  **  PROGRAM STARTED IN /ncsfs02/chenwei/Machine 
> Learning/CP2
>                                            K/SiC
>
>  CP2K| version string:                                          CP2K 
> version 8.1
>  CP2K| source code revision number:                                  
> git:0b61f2f
>  CP2K| cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack xsmm 
> plume
>  CP2K|            d2 spglib libvori libbqb
>  CP2K| is freely available from                            
> https://www.cp2k.org/
>  CP2K| Program compiled at                          Thu Feb  4 08:49:28 
> CST 2021
>  CP2K| Program compiled 
> on                                                  k172
>  CP2K| Program compiled for                                                
> local
>  CP2K| Data directory path                       
> /home/chenwei/src/cp2k-8.1/data
>  CP2K| Input file name                                                   
> SiC.inp
>
>  GLOBAL| Force Environment 
> number                                              1
>  GLOBAL| Basis set file name                                           
> BASIS_SET
>  GLOBAL| Potential file name                                      
> GTH_POTENTIALS
>  GLOBAL| MM Potential file name                                     
> MM_POTENTIAL
>  GLOBAL| Coordinate file name                                      
> __STD_INPUT__
>  GLOBAL| Method 
> name                                                        CP2K
>  GLOBAL| Project name                                                   
> SiC_AIMD
>  GLOBAL| Preferred FFT library                                             
> FFTW3
>  GLOBAL| Preferred diagonalization 
> lib.                                     ELPA
>  GLOBAL| Run 
> type                                                             MD
>  GLOBAL| All-to-all communication in single 
> precision                          F
>  GLOBAL| FFTs using library dependent 
> lengths                                  F
>  GLOBAL| Global print 
> level                                                  LOW
>  GLOBAL| MPI I/O 
> enabled                                                       T
>  GLOBAL| Total number of message passing 
> processes                            48
>  GLOBAL| Number of threads for this 
> process                                    1
>  GLOBAL| This output is from 
> process                                           0
>  GLOBAL| CPU model name                Intel(R) Xeon(R) CPU E5-2680 v4 @ 
> 2.40GHz
>  GLOBAL| 
> CPUID                                                              1002
>
>  MEMORY| system memory details [Kb]
>  MEMORY|                        rank 0           min           max       
> average
>  MEMORY| MemTotal            131748504     131748504     131748504     
> 131748504
>  MEMORY| MemFree              67523260      67523260      67523260      
> 67523260
>  MEMORY| Buffers                  4712          4712          
> 4712          4712
>  MEMORY| Cached               56159648      56159648      56159648      
> 56159648
>  MEMORY| Slab                  2740508       2740508       2740508       
> 2740508
>  MEMORY| SReclaimable          2447544       2447544       2447544       
> 2447544
>  MEMORY| MemLikelyFree       126135164     126135164     126135164     
> 126135164
>
>
>  GENERATE|  Preliminary Number of Bonds 
> generated:                             0
>  GENERATE|  Achieved consistency in connectivity generation.
>
>
>  *******************************************************************************
>
>  *******************************************************************************
>  **                                                                           
> **
>  **     #####                         ##              
> ##                      **
>  **    ##   ##            ##          ##              
> ##                      **
>  **   ##     ##                       ##            
> ######                    **
>  **   ##     ##  ##   ##  ##   #####  ##  ##   ####   ##    #####    
> #####    **
>  **   ##     ##  ##   ##  ##  ##      ## ##   ##      ##   ##   ##  ##   
> ##   **
>  **   ##  ## ##  ##   ##  ##  ##      ####     ###    ##   ######   
> ######    **
>  **    ##  ###   ##   ##  ##  ##      ## ##      ##   ##   ##       
> ##        **
>  **     #######   #####   ##   #####  ##  ##  ####    ##    #####   
> ##        **
>  **           ##                                                    
> ##        **
>  **                                                                           
> **
>  **                                                ... make the atoms 
> dance   **
>  **                                                                           
> **
>  **            Copyright (C) by CP2K developers group (2000 - 
> 2020)           **
>  **                      J. Chem. Phys. 152, 194103 
> (2020)                    **
>  **                                                                           
> **
>
>  *******************************************************************************
>
>
>  TOTAL NUMBERS AND MAXIMUM NUMBERS
>
>   Total number of            - Atomic 
> kinds:                                   2
>                              - 
> Atoms:                                         64
>                              - Shell 
> sets:                                   128
>                              - 
> Shells:                                       320
>                              - Primitive Cartesian 
> functions:                320
>                              - Cartesian basis 
> functions:                    896
>                              - Spherical basis 
> functions:                    832
>
>   Maximum angular momentum of- Orbital basis 
> functions:                        2
>                              - Local part of the GTH 
> pseudopotential:          2
>                              - Non-local part of the GTH 
> pseudopotential:      2
>
>
>  SCF PARAMETERS         Density guess:                                    
> ATOMIC
>                         
> --------------------------------------------------------
>                         
> max_scf:                                             300
>                         
> max_scf_history:                                       0
>                         
> max_diis:                                              4
>                         
> --------------------------------------------------------
>                         eps_scf:                                        
> 1.00E-07
>                         eps_scf_history:                                
> 0.00E+00
>                         eps_diis:                                       
> 1.00E-01
>                         eps_eigval:                                     
> 1.00E-05
>                         
> --------------------------------------------------------
>                         level_shift 
> [a.u.]:                                 0.00
>                         
> --------------------------------------------------------
>                         Mixing method:                            
> BROYDEN_MIXING
>                                                 charge density mixing in 
> g-space
>                         
> --------------------------------------------------------
>                         No outer SCF
>
>  PW_GRID| Information for grid 
> number                                          1
>  PW_GRID| Grid distributed over                                    48 
> processors
>  PW_GRID| Real space group dimensions                                    
> 48    1
>  PW_GRID| the grid is 
> blocked:                                                NO
>  PW_GRID| Cutoff [a.u.]                                                    
> 150.0
>  PW_GRID| spherical 
> cutoff:                                                   NO
>  PW_GRID|   Bounds   1            -48      47                
> Points:          96
>  PW_GRID|   Bounds   2            -48      47                
> Points:          96
>  PW_GRID|   Bounds   3            -48      47                
> Points:          96
>  PW_GRID| Volume element (a.u.^3)  0.5016E-02     Volume (a.u.^3)      
> 4437.6722
>  PW_GRID| Grid span                                                    
> FULLSPACE
>  PW_GRID|   Distribution                         Average         
> Max         Min
>  PW_GRID|   G-Vectors                            18432.0       18432       
> 18432
>  PW_GRID|   G-Rays                                 192.0         
> 192         192
>  PW_GRID|   Real Space Points                    18432.0       18432       
> 18432
>
>  PW_GRID| Information for grid 
> number                                          2
>  PW_GRID| Number of the reference 
> grid                                         1
>  PW_GRID| Grid distributed over                                    48 
> processors
>  PW_GRID| Real space group dimensions                                    
> 48    1
>  PW_GRID| the grid is 
> blocked:                                                NO
>  PW_GRID| Cutoff 
> [a.u.]                                                     50.0
>  PW_GRID| spherical 
> cutoff:                                                   NO
>  PW_GRID|   Bounds   1            -27      26                
> Points:          54
>  PW_GRID|   Bounds   2            -27      26                
> Points:          54
>  PW_GRID|   Bounds   3            -27      26                
> Points:          54
>  PW_GRID| Volume element (a.u.^3)  0.2818E-01     Volume (a.u.^3)      
> 4437.6722
>  PW_GRID| Grid span                                                    
> FULLSPACE
>  PW_GRID|   Distribution                         Average         
> Max         Min
>  PW_GRID|   G-Vectors                             3280.5        
> 3402        3186
>  PW_GRID|   G-Rays                                  60.8          
> 63          59
>  PW_GRID|   Real Space Points                     3280.5        
> 5832        2916
>
>  PW_GRID| Information for grid 
> number                                          3
>  PW_GRID| Number of the reference 
> grid                                         1
>  PW_GRID| Grid distributed over                                    48 
> processors
>  PW_GRID| Real space group dimensions                                     
> 6    8
>  PW_GRID| the grid is 
> blocked:                                                NO
>  PW_GRID| Cutoff 
> [a.u.]                                                     16.7
>  PW_GRID| spherical 
> cutoff:                                                   NO
>  PW_GRID|   Bounds   1            -16      15                
> Points:          32
>  PW_GRID|   Bounds   2            -16      15                
> Points:          32
>  PW_GRID|   Bounds   3            -16      15                
> Points:          32
>  PW_GRID| Volume element (a.u.^3)  0.1354         Volume (a.u.^3)      
> 4437.6722
>  PW_GRID| Grid span                                                    
> FULLSPACE
>  PW_GRID|   Distribution                         Average         
> Max         Min
>  PW_GRID|   G-Vectors                              682.7         
> 704         640
>  PW_GRID|   G-Rays                                  21.3          
> 22          20
>  PW_GRID|   Real Space Points                      682.7         
> 768         640
>
>  PW_GRID| Information for grid 
> number                                          4
>  PW_GRID| Number of the reference 
> grid                                         1
>  PW_GRID| Grid distributed over                                    48 
> processors
>  PW_GRID| Real space group dimensions                                     
> 6    8
>  PW_GRID| the grid is 
> blocked:                                                NO
>  PW_GRID| Cutoff 
> [a.u.]                                                      5.6
>  PW_GRID| spherical 
> cutoff:                                                   NO
>  PW_GRID|   Bounds   1             -9       8                
> Points:          18
>  PW_GRID|   Bounds   2             -9       8                
> Points:          18
>  PW_GRID|   Bounds   3             -9       8                
> Points:          18
>  PW_GRID| Volume element (a.u.^3)  0.7609         Volume (a.u.^3)      
> 4437.6722
>  PW_GRID| Grid span                                                    
> FULLSPACE
>  PW_GRID|   Distribution                         Average         
> Max         Min
>  PW_GRID|   G-Vectors                              121.5         
> 144         108
>  PW_GRID|   G-Rays                                   6.8           
> 8           6
>  PW_GRID|   Real Space Points                      121.5         
> 162         108
>
>  POISSON| Solver                                                        
> PERIODIC
>  POISSON| 
> Periodicity                                                        XYZ
>
>  RS_GRID| Information for grid 
> number                                          1
>  RS_GRID|   Bounds   1            -48      47                
> Points:          96
>  RS_GRID|   Bounds   2            -48      47                
> Points:          96
>  RS_GRID|   Bounds   3            -48      47                
> Points:          96
>  RS_GRID| Real space distribution over                                  6 
> groups
>  RS_GRID| Real space distribution along 
> direction                              2
>  RS_GRID| Border 
> size                                                         26
>  RS_GRID| Real space distribution over                                  8 
> groups
>  RS_GRID| Real space distribution along 
> direction                              3
>  RS_GRID| Border 
> size                                                         26
>  RS_GRID|   Distribution                         Average         
> Max         Min
>  RS_GRID|   Planes                                  68.0          
> 68          68
>  RS_GRID|   Distribution                         Average         
> Max         Min
>  RS_GRID|   Planes                                  64.0          
> 64          64
>
>  RS_GRID| Information for grid 
> number                                          2
>  RS_GRID|   Bounds   1            -27      26                
> Points:          54
>  RS_GRID|   Bounds   2            -27      26                
> Points:          54
>  RS_GRID|   Bounds   3            -27      26                
> Points:          54
>  RS_GRID| Real space fully replicated
>  RS_GRID| Group 
> size                                                           1
>
>  RS_GRID| Information for grid 
> number                                          3
>  RS_GRID|   Bounds   1            -16      15                
> Points:          32
>  RS_GRID|   Bounds   2            -16      15                
> Points:          32
>  RS_GRID|   Bounds   3            -16      15                
> Points:          32
>  RS_GRID| Real space fully replicated
>  RS_GRID| Group 
> size                                                           1
>
>  RS_GRID| Information for grid 
> number                                          4
>  RS_GRID|   Bounds   1             -9       8                
> Points:          18
>  RS_GRID|   Bounds   2             -9       8                
> Points:          18
>  RS_GRID|   Bounds   3             -9       8                
> Points:          18
>  RS_GRID| Real space fully replicated
>  RS_GRID| Group 
> size                                                           1
>
>  MD_PAR| Molecular dynamics protocol (MD input parameters)
>  MD_PAR| Ensemble 
> type                                                       NVT
>  MD_PAR| Number of time steps                                              
> 10000
>  MD_PAR| Time step [fs]                                                 
> 0.500000
>  MD_PAR| Temperature [K]                                              
> 300.000000
>  MD_PAR| Temperature tolerance [K]                                      
> 0.000000
>  MD_PAR| Print MD information every                                   10 
> step(s)
>  MD_PAR| File type   Print frequency [steps]                          File 
> names
>  MD_PAR| Coordinates         10                               
> SiC_AIMD-pos-1.xyz
>  MD_PAR| Velocities          10                               
> SiC_AIMD-vel-1.xyz
>  MD_PAR| Energies            10                                  
> SiC_AIMD-1.ener
>  MD_PAR| Dump                20                               
> SiC_AIMD-1.restart
>
>  ROT| Rotational analysis information
>  ROT| Principal axes and moments of inertia [a.u.]
>  ROT|                           1                   2                   3
>  ROT| Eigenvalues      9.86893119935E+07   1.19427476747E+08   
> 1.19427476747E+08
>  ROT|      x              0.577350269190     -0.408248290464      
> 0.707106781187
>  ROT|      y              0.577350269190     -0.408248290464     
> -0.707106781187
>  ROT|      z              0.577350269190      0.816496580928      
> 0.000000000000
>  ROT| Number of rotovibrational 
> vectors                                        6
>
>  DOF| Calculation of degrees of freedom
>  DOF| Number of 
> atoms                                                         64
>  DOF| Number of intramolecular 
> constraints                                     0
>  DOF| Number of intermolecular 
> constraints                                     0
>  DOF| Invariants (translations + 
> rotations)                                    3
>  DOF| Degrees of 
> freedom                                                     189
>
>  DOF| Restraints information
>  DOF| Number of intramolecular 
> restraints                                      0
>  DOF| Number of intermolecular 
> restraints                                      0
>
>  THERMOSTAT| Thermostat information for PARTICLES
>  THERMOSTAT| Type of thermostat                               
> Nose-Hoover-Chains
>  THERMOSTAT| Nose-Hoover-Chain 
> length                                          3
>  THERMOSTAT| Nose-Hoover-Chain time constant [fs]                    
> 1000.000000
>  THERMOSTAT| Order of Yoshida 
> integrator                                       3
>  THERMOSTAT| Number of multiple time 
> steps                                     2
>  THERMOSTAT| Initial potential energy                         
> 0.000000000000E+00
>  THERMOSTAT| Initial kinetic energy                           
> 0.475022301493E-03
>  THERMOSTAT| End of thermostat information for PARTICLES
>
>  MD_VEL| Velocities initialization
>  MD_VEL| Initial temperature [K]                                      
> 300.000000
>  MD_VEL| COM velocity             0.0000000000    -0.0000000000    
> -0.0000000000
>
>  Number of 
> electrons:                                                        256
>  Number of occupied 
> orbitals:                                                128
>  Number of molecular 
> orbitals:                                               128
>
>  Number of orbital 
> functions:                                                832
>  Number of independent orbital 
> functions:                                    832
>
>  Extrapolation method: initial_guess
>
>
>
>
>  -------------------------------------------------------------------------------
>  -                                                                             
> -
>  -                                DBCSR 
> STATISTICS                             -
>  -                                                                             
> -
>
>  -------------------------------------------------------------------------------
>  COUNTER                                    TOTAL       BLAS       
> SMM       ACC
>  flops    13 x    32 x    13        7086601666560       0.0%    
> 100.0%      0.0%
>  flops    13 x    13 x    32        9891694059520       0.0%    
> 100.0%      0.0%
>  flops inhomo. stacks                           0       0.0%      
> 0.0%      0.0%
>  flops total                        16.978296E+12       0.0%    
> 100.0%      0.0%
>  flops max/rank                    732.153860E+09       0.0%    
> 100.0%      0.0%
>  matmuls inhomo. stacks                         0       0.0%      
> 0.0%      0.0%
>  matmuls total                         1569738880       0.0%    
> 100.0%      0.0%
>  number of processed stacks              28782912       0.0%    
> 100.0%      0.0%
>  average stack size                                     0.0      
> 54.5       0.0
>  marketing flops                    26.595494E+12
>
>  -------------------------------------------------------------------------------
>  # multiplications                         149911
>  max memory usage/rank             153.088000E+06
>  # max total images/rank                        3
>  # max 3D layers                                1
>  # MPI messages exchanged               143914560
>  MPI messages size (bytes):
>   total size                         3.855411E+12
>   min size                           0.000000E+00
>   max size                         137.904000E+03
>   average size                      26.789580E+03
>  MPI breakdown and total messages size (bytes):
>              size <=      128            81866560                        0
>        128 < size <=     8192                   0                        0
>       8192 < size <=    32768            21587184             383158124544
>      32768 < size <=   131072            36941696            2980859518208
>     131072 < size <=  4194304             3519120             485300724480
>    4194304 < size <= 16777216                   0                        0
>   16777216 < size                               0                        0
>
>  -------------------------------------------------------------------------------
>
>  *** WARNING in dbcsr_mm.F:294 :: Using a non-square number of MPI ranks ***
>  *** might lead to poor performance. Used ranks: 48 Suggested: 49 100    ***
>
>
>  -------------------------------------------------------------------------------
>  -                                                                             
> -
>  -                      DBCSR MESSAGE PASSING 
> PERFORMANCE                      -
>  -                                                                             
> -
>
>  -------------------------------------------------------------------------------
>  ROUTINE             CALLS      AVE VOLUME [Bytes]
>  MP_Bcast                3                     12.
>  MP_Allreduce       869441                      8.
>  MP_Alltoall       3098138                  32851.
>  MP_ISend          7195728                  12717.
>  MP_IRecv          7195728                  11224.
>
>  -------------------------------------------------------------------------------
>
>
>  -------------------------------------------------------------------------------
>  -                                                                             
> -
>  -                                GRID 
> STATISTICS                              -
>  -                                                                             
> -
>
>  -------------------------------------------------------------------------------
>  LP    KERNEL             BACKEND                              COUNT     PERCENT
>  2     collocate ortho    REF                             9708713949      36.60%
>  4     integrate ortho    REF                              529879041       2.00%
>  4     collocate ortho    REF                              221635148       0.84%
>  2     integrate ortho    REF                             8736976861      32.94%
>  0     collocate general  REF                               30723072       0.12%
>  1     integrate general  REF                               30723072       0.12%
>  5     integrate ortho    REF                               22183061       0.08%
>  3     integrate ortho    REF                             3942635281      14.86%
>  3     collocate ortho    REF                             3301325147      12.45%
>
>  -------------------------------------------------------------------------------
>
>  MEMORY| Estimated peak process memory 
> [MiB]                                 146
>
>
>  -------------------------------------------------------------------------------
>  ----                             MULTIGRID 
> INFO                            ----
>
>  -------------------------------------------------------------------------------
>  count for grid        1:      110066116          cutoff [a.u.]          
> 150.00
>  count for grid        2:      519820015          cutoff [a.u.]           
> 50.00
>  count for grid        3:      459986613          cutoff [a.u.]           
> 16.67
>  count for grid        4:      235051958          cutoff [a.u.]            
> 5.56
>  total gridlevel count  :     1324924702
>
>
>  -------------------------------------------------------------------------------
>  -                                                                             
> -
>  -                         MESSAGE PASSING 
> PERFORMANCE                         -
>  -                                                                             
> -
>
>  -------------------------------------------------------------------------------
>
>  ROUTINE             CALLS      AVE VOLUME [Bytes]
>  MP_Group                4
>  MP_Bcast           203792                   2218.
>  MP_Allreduce      1459647                    265.
>  MP_Sync                 4
>  MP_Alltoall       1818671                 396307.
>  MP_ISendRecv     28177722                  18032.
>  MP_Wait          42247738
>  MP_ISend         12750952                  57626.
>  MP_IRecv         12750952                  57626.
>
>  -------------------------------------------------------------------------------
>
>
>
>  -------------------------------------------------------------------------------
>  -                                                                             
> -
>  -                                T I M I N 
> G                                  -
>  -                                                                             
> -
>
>  -------------------------------------------------------------------------------
>  SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
>                                 MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
>  CP2K                                 1  1.0     0.01     0.01 66822.69 66823.04
>  qs_mol_dyn_low                       1  2.0     0.34     0.37 66822.51 66822.86
>  velocity_verlet                  10000  3.0     1.48     5.04 66810.62 66811.08
>  qs_forces                        10001  4.0     0.98     1.02 66806.91 66807.26
>  qs_energies                      10001  5.0     0.88     1.24 59685.56 59686.71
>  scf_env_do_scf                   10001  6.0     0.94     1.73 54615.83 54617.31
>  scf_env_do_scf_inner_loop        89920  7.0     4.83    26.14 54614.78 54616.21
>  rebuild_ks_matrix                99921  8.7     0.40     0.46 25783.42 25795.09
>  qs_ks_build_kohn_sham_matrix     99921  9.7    13.65    14.24 25783.02 25794.65
>  qs_rho_update_rho                99921  8.1     0.53     0.65 25411.34 25412.68
>  calculate_rho_elec               99921  9.1    10.26    10.68 25410.81 25412.19
>  sum_up_and_integrate             99921 10.7    10.04    11.19 24320.21 24334.14
>  integrate_v_rspace               99921 11.7     3.82     4.21 24309.99 24324.85
>  qs_ks_update_qs_env              89920  8.0     0.78     0.91 22462.31 22473.54
>  grid_collocate_task_list         99921 10.1 18451.53 18769.98 18451.53 18769.98
>  grid_integrate_task_list         99921 12.7 16303.94 16394.84 16303.94 16394.84
>  rs_pw_transfer                  819370 12.3    15.23    17.78 11655.48 12071.19
>  qs_scf_new_mos                   89920  8.0     1.71     1.94  8270.35  8321.12
>  eigensolver                      89920  9.0     5.28     7.69  7862.09  7870.32
>  density_rs2pw                    99921 10.1     6.01     6.82  6836.50  7045.41
>  cp_fm_diag_elpa                  89920 10.0     0.64     0.79  6757.80  6804.53
>  cp_fm_diag_elpa_base             89920 11.0  6676.81  6729.03  6756.91  6803.67
>  mp_waitany                     ******* 14.1  5758.84  6457.62  5758.84  6457.62
>  potential_pw2rs                  99921 12.7     6.04     6.56  5839.37  5848.24
>  rs_pw_transfer_RS2PW_150        109922 11.9  1068.20  1206.58  5210.54  5627.18
>  rs_pw_transfer_PW2RS_150        109922 14.3  1943.71  2063.73  4455.92  4497.89
>  build_core_hamiltonian_matrix_   10001  5.0     0.39     0.44  2865.88  3438.38
>  qs_ks_update_qs_env_forces       10001  5.0     0.05     0.06  3365.19  3366.37
>  init_scf_run                     10001  6.0     0.61     0.93  3252.05  3253.43
>  scf_env_initial_rho_setup        10001  7.0     0.24     1.03  3175.29  3176.49
>  wfi_extrapolate                  10001  8.0     0.91     1.00  3104.21  3104.23
>  pw_transfer                    1288972 11.8    67.54    70.98  2676.70  2707.44
>  fft_wrap_pw1pw2                1089130 12.8    10.61    11.18  2555.45  2585.55
>  mp_alltoall_d11v               1529045 12.0  2279.64  2399.42  2279.64  2399.42
>  fft_wrap_pw1pw2_150             489604 13.2   220.23   228.66  2227.19  2283.38
>  rs_gather_matrices               99921 12.7    10.55    14.72  2150.73  2276.05
>  build_core_ppnl_forces           10001  6.0  1724.02  2032.14  1724.02  2032.14
>  fft3d_ps                       1089130 14.8   824.46   858.66  1971.84  1994.23
>  mp_sum_d                        869728 10.8  1050.61  1821.39  1050.61  1821.39
>  qs_energies_init_hamiltonians    10001  6.0     0.17     0.19  1767.07  1767.08
>  mp_waitall_1                   ******* 14.6  1405.52  1749.18  1405.52  1749.18
>  calculate_ecore_overlap          20002  6.0     0.24     0.35   885.01  1685.36
>
>  -------------------------------------------------------------------------------
>
>  The number of warnings for this run is : 1
>
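The share of the run time spent in the grid routines can be checked from the timing rows above. The following is only a minimal sketch: it assumes the usual CP2K timing column order (SUBROUTINE, CALLS, ASD, SELF TIME average and maximum, TOTAL TIME average and maximum) and takes the CP2K_Total value of 66823.04 s from the summary at the top of this thread. The names `ROWS`, `parse_row` and `grid_share` are made up for this illustration and are not part of CP2K.

```python
# A few rows copied from the timing report, with wrapped lines rejoined.
ROWS = """\
grid_collocate_task_list         99921 10.1 18451.53 18769.98 18451.53 18769.98
grid_integrate_task_list         99921 12.7 16303.94 16394.84 16303.94 16394.84
cp_fm_diag_elpa_base             89920 11.0  6676.81  6729.03  6756.91  6803.67
mp_waitany                     ******* 14.1  5758.84  6457.62  5758.84  6457.62
"""


def parse_row(line):
    """Return (routine name, maximum self time in seconds) for one row.

    The CALLS field can overflow to '*******', so only the four time
    columns at the end of the row are converted to numbers.
    """
    fields = line.split()
    self_avg, self_max, total_avg, total_max = map(float, fields[-4:])
    return fields[0], self_max


def grid_share(rows, total_seconds):
    """Fraction of total_seconds spent in routines whose name starts with grid_."""
    parsed = (parse_row(line) for line in rows.splitlines() if line.strip())
    grid_seconds = sum(t for name, t in parsed if name.startswith("grid_"))
    return grid_seconds / total_seconds


if __name__ == "__main__":
    # 66823.04 is the CP2K_Total maximum time quoted earlier in the thread.
    print(f"grid_* share of total time: {grid_share(ROWS, 66823.04):.1%}")
```

With these numbers the two grid routines account for slightly more than half of the total, which is consistent with the diagnosis given in the reply at the top.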
> On Friday, February 5, 2021 at 5:43:48 PM UTC+8 Alfio Lazzaro wrote:
>
>> Well, what I need is the top (let's say up to "SCF WAVEFUNCTION 
>> OPTIMIZATION") and the bottom of the logs (starting at "DBCSR STATISTICS").
>>
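For anyone who needs to send only those two parts of a large output file, something like the sketch below could do the trimming. It is an illustration, not a CP2K tool: the function name `trim_log` and the `[...]` separator are invented here, and the only assumption about the log is that it contains the two marker strings mentioned above.

```python
def trim_log(lines,
             top_marker="SCF WAVEFUNCTION OPTIMIZATION",
             bottom_marker="DBCSR STATISTICS"):
    """Keep the top and the bottom of a log, dropping the middle.

    Returns the lines up to and including the first line containing
    top_marker, a '[...]' separator, and the lines from the last line
    containing bottom_marker to the end.
    """
    top_end = next(
        (i for i, line in enumerate(lines) if top_marker in line), None)
    bottom_start = next(
        (i for i in reversed(range(len(lines))) if bottom_marker in lines[i]),
        None)
    if top_end is None or bottom_start is None:
        raise ValueError("marker not found in log")
    head = lines[: top_end + 1]
    # Guard against overlap if the bottom marker appears before the top one.
    tail = lines[max(bottom_start, top_end + 1):]
    return head + ["[...]"] + tail
```

It can be applied to the lines of an output file read with `open(path).read().splitlines()`, where `path` is whatever the output file is called, and the result written back out with `"\n".join(...)`.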
>> On Friday, February 5, 2021 at 09:24:34 UTC+1 singlebook wrote:
>>
>>> Hello, Alfio,
>>>
>>> Yes, there are 12 MPI ranks, and each rank has only one thread.
>>> The output file is too large to upload, so I have only put the header 
>>> information for the CPU version here; the output files for the GPU runs 
>>> have not been saved for now. Whenever the workstation is idle, I will do 
>>> more tests.
>>>
>>>  DBCSR| CPU Multiplication driver                                           XSMM
>>>  DBCSR| Multrec recursion limit                                              512
>>>  DBCSR| Multiplication stack size                                           1000
>>>  DBCSR| Maximum elements for images                                    UNLIMITED
>>>  DBCSR| Multiplicative factor virtual images                                   1
>>>  DBCSR| Use multiplication densification                                       T
>>>  DBCSR| Multiplication size stacks                                             3
>>>  DBCSR| Use memory pool for CPU allocation                                     F
>>>  DBCSR| Number of 3D layers                                               SINGLE
>>>  DBCSR| Use MPI memory allocation                                              F
>>>  DBCSR| Use RMA algorithm                                                      F
>>>  DBCSR| Use Communication thread                                               T
>>>  DBCSR| Communication thread load                                             87
>>>  DBCSR| MPI: My node id                                                        0
>>>  DBCSR| MPI: Number of nodes                                                  48
>>>  DBCSR| OMP: Current number of threads                                         1
>>>  DBCSR| OMP: Max number of threads                                             1
>>>  DBCSR| Split modifier for TAS multiplication algorithm                  1.0E+00
>>>
>>>   **** **** ******  **  PROGRAM STARTED AT               2021-02-04 09:18:01.088
>>>  ***** ** ***  *** **   PROGRAM STARTED ON                                  k172
>>>  **    ****   ******    PROGRAM STARTED BY                               chenwei
>>>  ***** **    ** ** **   PROGRAM PROCESS ID                                 52126
>>>   **** **  *******  **  PROGRAM STARTED IN /ncsfs02/chenwei/Machine Learning/CP2
>>>                                            K/SiC
>>>
>>>  CP2K| version string:                                          CP2K version 8.1
>>>  CP2K| source code revision number:                                  git:0b61f2f
>>>  CP2K| cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack xsmm plume
>>>  CP2K|            d2 spglib libvori libbqb
>>>  CP2K| is freely available from                            https://www.cp2k.org/
>>>  CP2K| Program compiled at                          Thu Feb  4 08:49:28 CST 2021
>>>  CP2K| Program compiled on                                                  k172
>>>  CP2K| Program compiled for                                                local
>>>  CP2K| Data directory path                       /home/chenwei/src/cp2k-8.1/data
>>>  CP2K| Input file name                                                   SiC.inp
>>>
>>>  GLOBAL| Force Environment number                                              1
>>>  GLOBAL| Basis set file name                                           BASIS_SET
>>>  GLOBAL| Potential file name                                      GTH_POTENTIALS
>>>  GLOBAL| MM Potential file name                                     MM_POTENTIAL
>>>  GLOBAL| Coordinate file name                                      __STD_INPUT__
>>>  GLOBAL| Method name                                                        CP2K
>>>  GLOBAL| Project name                                                   SiC_AIMD
>>>  GLOBAL| Preferred FFT library                                             FFTW3
>>>  GLOBAL| Preferred diagonalization lib.                                     ELPA
>>>  GLOBAL| Run type                                                             MD
>>>  GLOBAL| All-to-all communication in single precision                          F
>>>  GLOBAL| FFTs using library dependent lengths                                  F
>>>  GLOBAL| Global print level                                                  LOW
>>>  GLOBAL| MPI I/O enabled                                                       T
>>>  GLOBAL| Total number of message passing processes                            48
>>>  GLOBAL| Number of threads for this process                                    1
>>>  GLOBAL| This output is from process                                           0
>>>  GLOBAL| CPU model name                Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
>>>  GLOBAL| CPUID                                                              1002
>>>
>>>  MEMORY| system memory details [Kb]
>>>  MEMORY|                        rank 0           min           max       average
>>>  MEMORY| MemTotal            131748504     131748504     131748504     131748504
>>>  MEMORY| MemFree              67523260      67523260      67523260      67523260
>>>  MEMORY| Buffers                  4712          4712          4712          4712
>>>  MEMORY| Cached               56159648      56159648      56159648      56159648
>>>  MEMORY| Slab                  2740508       2740508       2740508       2740508
>>>  MEMORY| SReclaimable          2447544       2447544       2447544       2447544
>>>  MEMORY| MemLikelyFree       126135164     126135164     126135164     126135164
>>>
>>>  GENERATE|  Preliminary Number of Bonds generated:                             0
>>>  GENERATE|  Achieved consistency in connectivity generation.
>>>
>>>  SCF WAVEFUNCTION OPTIMIZATION
>>>
>>>   Step     Update method      Time    Convergence         Total energy    Change
>>>   ------------------------------------------------------------------------------
>>>      1 NoMix/Diag. 0.40E+00    0.3     3.80220882      -317.7175159821 -3.18E+02
>>>      2 Broy./Diag. 0.40E+00    0.6     0.43368094      -291.0370906460  2.67E+01
>>>      3 Broy./Diag. 0.40E+00    0.6     0.23506554      -308.2043627628 -1.72E+01
>>>      4 Broy./Diag. 0.40E+00    0.6     0.26390650      -309.7756477106 -1.57E+00
>>>      5 Broy./Diag. 0.40E+00    0.6     0.00311711      -310.0196552337 -2.44E-01
>>>      6 Broy./Diag. 0.40E+00    0.6     0.01762115      -309.8687051316  1.51E-01
>>>      7 Broy./Diag. 0.40E+00    0.6     0.00055086      -309.8505587170  1.81E-02
>>>      8 Broy./Diag. 0.40E+00    0.6     0.00030811      -309.8516271774 -1.07E-03
>>>      9 Broy./Diag. 0.40E+00    0.6     0.00001506      -309.8519055144 -2.78E-04
>>>     10 Broy./Diag. 0.40E+00    0.6     0.00000129      -309.8519255844 -2.01E-05
>>>     11 Broy./Diag. 0.40E+00    0.6     0.00000032      -309.8519300365 -4.45E-06
>>>     12 Broy./Diag. 0.40E+00    0.6     0.00000002      -309.8519304271 -3.91E-07
>>>
>>>   *** SCF run converged in    12 steps ***
>>>
>>> Best wishes,
>>>
>>> Wei
>>>
>>>