[CP2K-user] CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle

Wei Chen chenw... at gmail.com
Sat Feb 6 00:25:10 UTC 2021


Thank you very much for your reply. 

Best wishes,

Wei

On Friday, February 5, 2021 at 8:50:59 PM UTC+8 Alfio Lazzaro wrote:

> OK, Thanks for the timers.
> I assume you sent me the CPU timers.
> As suspected, the run is massively dominated by the non-GPU parts. I cannot 
> even see any COSMA activity. 
> These are the main parts where the time goes:
>
> fft_wrap_pw1pw2_150              228.660
> fft3d_ps                         858.660
> rs_pw_transfer_RS2PW_150        1206.580
> mp_waitall_1                    1749.180
> mp_sum_d                        1821.390
> build_core_ppnl_forces          2032.140
> rs_pw_transfer_PW2RS_150        2063.730
> mp_alltoall_d11v                2399.420
> mp_waitany                      6457.620
> cp_fm_diag_elpa_base            6729.030
> grid_integrate_task_list       16394.840
> grid_collocate_task_list       18769.980
> CP2K_Total                     66823.040
>
> More than half of the total time (66823.040 s) is in the grid_* functions. 
> BTW, for this kind of test, I suggest using fewer steps...
> I suspect you are hitting the CPU and GPU performance problem 
> reported for CP2K 8.1 (see https://github.com/cp2k/cp2k/issues/1323 ).
> I suggest trying CP2K 7.1...
>
> Alfio
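
For reference, the "more than half" figure can be checked directly from the timers listed above. The following is a minimal Python sketch and not part of CP2K; the names `timers`, `total` and `share` are made up for illustration. The values are the per-routine self-time maxima over ranks as quoted, so summed shares are only approximate.

```python
# Self-time maxima in seconds, copied from the timer list quoted above.
timers = {
    "fft_wrap_pw1pw2_150": 228.660,
    "fft3d_ps": 858.660,
    "rs_pw_transfer_RS2PW_150": 1206.580,
    "mp_waitall_1": 1749.180,
    "mp_sum_d": 1821.390,
    "build_core_ppnl_forces": 2032.140,
    "rs_pw_transfer_PW2RS_150": 2063.730,
    "mp_alltoall_d11v": 2399.420,
    "mp_waitany": 6457.620,
    "cp_fm_diag_elpa_base": 6729.030,
    "grid_integrate_task_list": 16394.840,
    "grid_collocate_task_list": 18769.980,
}
total = 66823.040  # CP2K_Total from the same list


def share(prefix):
    """Fraction of the total run time spent in routines starting with prefix."""
    return sum(t for name, t in timers.items() if name.startswith(prefix)) / total


grid_share = share("grid_")  # collocate + integrate kernels
mpi_share = share("mp_")     # MPI wait/sum/alltoall wrappers

print(f"grid_* share of total: {grid_share:.1%}")
print(f"mp_*   share of total: {mpi_share:.1%}")
```

With the numbers above, the grid_* routines come out at a bit over half of the total, which matches the statement in the reply.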
>
>
>
>
>
> On Friday, February 5, 2021 at 10:54:47 UTC+1 singlebook wrote:
>
>>  DBCSR| CPU Multiplication 
>> driver                                           XSMM
>>  DBCSR| Multrec recursion 
>> limit                                              512
>>  DBCSR| Multiplication stack 
>> size                                           1000
>>  DBCSR| Maximum elements for images                                    
>> UNLIMITED
>>  DBCSR| Multiplicative factor virtual 
>> images                                   1
>>  DBCSR| Use multiplication 
>> densification                                       T
>>  DBCSR| Multiplication size 
>> stacks                                             3
>>  DBCSR| Use memory pool for CPU 
>> allocation                                     F
>>  DBCSR| Number of 3D layers                                               
>> SINGLE
>>  DBCSR| Use MPI memory 
>> allocation                                              F
>>  DBCSR| Use RMA 
>> algorithm                                                      F
>>  DBCSR| Use Communication 
>> thread                                               T
>>  DBCSR| Communication thread 
>> load                                             87
>>  DBCSR| MPI: My node 
>> id                                                        0
>>  DBCSR| MPI: Number of 
>> nodes                                                  48
>>  DBCSR| OMP: Current number of 
>> threads                                         1
>>  DBCSR| OMP: Max number of 
>> threads                                             1
>>  DBCSR| Split modifier for TAS multiplication algorithm                  
>> 1.0E+00
>>
>>
>>   **** **** ******  **  PROGRAM STARTED AT               2021-02-04 
>> 09:18:01.088
>>  ***** ** ***  *** **   PROGRAM STARTED 
>> ON                                  k172
>>  **    ****   ******    PROGRAM STARTED BY                               
>> chenwei
>>  ***** **    ** ** **   PROGRAM PROCESS 
>> ID                                 52126
>>   **** **  *******  **  PROGRAM STARTED IN /ncsfs02/chenwei/Machine 
>> Learning/CP2
>>                                            K/SiC
>>
>>  CP2K| version string:                                          CP2K 
>> version 8.1
>>  CP2K| source code revision number:                                  
>> git:0b61f2f
>>  CP2K| cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack 
>> xsmm plume
>>  CP2K|            d2 spglib libvori libbqb
>>  CP2K| is freely available from                            
>> https://www.cp2k.org/
>>  CP2K| Program compiled at                          Thu Feb  4 08:49:28 
>> CST 2021
>>  CP2K| Program compiled 
>> on                                                  k172
>>  CP2K| Program compiled 
>> for                                                local
>>  CP2K| Data directory path                       
>> /home/chenwei/src/cp2k-8.1/data
>>  CP2K| Input file name                                                   
>> SiC.inp
>>
>>  GLOBAL| Force Environment 
>> number                                              1
>>  GLOBAL| Basis set file name                                           
>> BASIS_SET
>>  GLOBAL| Potential file name                                      
>> GTH_POTENTIALS
>>  GLOBAL| MM Potential file name                                     
>> MM_POTENTIAL
>>  GLOBAL| Coordinate file name                                      
>> __STD_INPUT__
>>  GLOBAL| Method 
>> name                                                        CP2K
>>  GLOBAL| Project name                                                   
>> SiC_AIMD
>>  GLOBAL| Preferred FFT 
>> library                                             FFTW3
>>  GLOBAL| Preferred diagonalization 
>> lib.                                     ELPA
>>  GLOBAL| Run 
>> type                                                             MD
>>  GLOBAL| All-to-all communication in single 
>> precision                          F
>>  GLOBAL| FFTs using library dependent 
>> lengths                                  F
>>  GLOBAL| Global print 
>> level                                                  LOW
>>  GLOBAL| MPI I/O 
>> enabled                                                       T
>>  GLOBAL| Total number of message passing 
>> processes                            48
>>  GLOBAL| Number of threads for this 
>> process                                    1
>>  GLOBAL| This output is from 
>> process                                           0
>>  GLOBAL| CPU model name                Intel(R) Xeon(R) CPU E5-2680 v4 @ 
>> 2.40GHz
>>  GLOBAL| 
>> CPUID                                                              1002
>>
>>  MEMORY| system memory details [Kb]
>>  MEMORY|                        rank 0           min           max       
>> average
>>  MEMORY| MemTotal            131748504     131748504     131748504     
>> 131748504
>>  MEMORY| MemFree              67523260      67523260      67523260      
>> 67523260
>>  MEMORY| Buffers                  4712          4712          
>> 4712          4712
>>  MEMORY| Cached               56159648      56159648      56159648      
>> 56159648
>>  MEMORY| Slab                  2740508       2740508       2740508       
>> 2740508
>>  MEMORY| SReclaimable          2447544       2447544       2447544       
>> 2447544
>>  MEMORY| MemLikelyFree       126135164     126135164     126135164     
>> 126135164
>>
>>
>>  GENERATE|  Preliminary Number of Bonds 
>> generated:                             0
>>  GENERATE|  Achieved consistency in connectivity generation.
>>
>>
>>  *******************************************************************************
>>
>>  *******************************************************************************
>>  **                                                                           
>> **
>>  **     #####                         ##              
>> ##                      **
>>  **    ##   ##            ##          ##              
>> ##                      **
>>  **   ##     ##                       ##            
>> ######                    **
>>  **   ##     ##  ##   ##  ##   #####  ##  ##   ####   ##    #####    
>> #####    **
>>  **   ##     ##  ##   ##  ##  ##      ## ##   ##      ##   ##   ##  ##   
>> ##   **
>>  **   ##  ## ##  ##   ##  ##  ##      ####     ###    ##   ######   
>> ######    **
>>  **    ##  ###   ##   ##  ##  ##      ## ##      ##   ##   ##       
>> ##        **
>>  **     #######   #####   ##   #####  ##  ##  ####    ##    #####   
>> ##        **
>>  **           ##                                                    
>> ##        **
>>  **                                                                           
>> **
>>  **                                                ... make the atoms 
>> dance   **
>>  **                                                                           
>> **
>>  **            Copyright (C) by CP2K developers group (2000 - 
>> 2020)           **
>>  **                      J. Chem. Phys. 152, 194103 
>> (2020)                    **
>>  **                                                                           
>> **
>>
>>  *******************************************************************************
>>
>>
>>  TOTAL NUMBERS AND MAXIMUM NUMBERS
>>
>>   Total number of            - Atomic 
>> kinds:                                   2
>>                              - 
>> Atoms:                                         64
>>                              - Shell 
>> sets:                                   128
>>                              - 
>> Shells:                                       320
>>                              - Primitive Cartesian 
>> functions:                320
>>                              - Cartesian basis 
>> functions:                    896
>>                              - Spherical basis 
>> functions:                    832
>>
>>   Maximum angular momentum of- Orbital basis 
>> functions:                        2
>>                              - Local part of the GTH 
>> pseudopotential:          2
>>                              - Non-local part of the GTH 
>> pseudopotential:      2
>>
>>
>>  SCF PARAMETERS         Density guess:                                    
>> ATOMIC
>>                         
>> --------------------------------------------------------
>>                         
>> max_scf:                                             300
>>                         
>> max_scf_history:                                       0
>>                         
>> max_diis:                                              4
>>                         
>> --------------------------------------------------------
>>                         eps_scf:                                        
>> 1.00E-07
>>                         eps_scf_history:                                
>> 0.00E+00
>>                         eps_diis:                                       
>> 1.00E-01
>>                         eps_eigval:                                     
>> 1.00E-05
>>                         
>> --------------------------------------------------------
>>                         level_shift 
>> [a.u.]:                                 0.00
>>                         
>> --------------------------------------------------------
>>                         Mixing method:                            
>> BROYDEN_MIXING
>>                                                 charge density mixing in 
>> g-space
>>                         
>> --------------------------------------------------------
>>                         No outer SCF
>>
>>  PW_GRID| Information for grid 
>> number                                          1
>>  PW_GRID| Grid distributed over                                    48 
>> processors
>>  PW_GRID| Real space group dimensions                                    
>> 48    1
>>  PW_GRID| the grid is 
>> blocked:                                                NO
>>  PW_GRID| Cutoff 
>> [a.u.]                                                    150.0
>>  PW_GRID| spherical 
>> cutoff:                                                   NO
>>  PW_GRID|   Bounds   1            -48      47                
>> Points:          96
>>  PW_GRID|   Bounds   2            -48      47                
>> Points:          96
>>  PW_GRID|   Bounds   3            -48      47                
>> Points:          96
>>  PW_GRID| Volume element (a.u.^3)  0.5016E-02     Volume (a.u.^3)      
>> 4437.6722
>>  PW_GRID| Grid span                                                    
>> FULLSPACE
>>  PW_GRID|   Distribution                         Average         
>> Max         Min
>>  PW_GRID|   G-Vectors                            18432.0       
>> 18432       18432
>>  PW_GRID|   G-Rays                                 192.0         
>> 192         192
>>  PW_GRID|   Real Space Points                    18432.0       
>> 18432       18432
>>
>>  PW_GRID| Information for grid 
>> number                                          2
>>  PW_GRID| Number of the reference 
>> grid                                         1
>>  PW_GRID| Grid distributed over                                    48 
>> processors
>>  PW_GRID| Real space group dimensions                                    
>> 48    1
>>  PW_GRID| the grid is 
>> blocked:                                                NO
>>  PW_GRID| Cutoff 
>> [a.u.]                                                     50.0
>>  PW_GRID| spherical 
>> cutoff:                                                   NO
>>  PW_GRID|   Bounds   1            -27      26                
>> Points:          54
>>  PW_GRID|   Bounds   2            -27      26                
>> Points:          54
>>  PW_GRID|   Bounds   3            -27      26                
>> Points:          54
>>  PW_GRID| Volume element (a.u.^3)  0.2818E-01     Volume (a.u.^3)      
>> 4437.6722
>>  PW_GRID| Grid span                                                    
>> FULLSPACE
>>  PW_GRID|   Distribution                         Average         
>> Max         Min
>>  PW_GRID|   G-Vectors                             3280.5        
>> 3402        3186
>>  PW_GRID|   G-Rays                                  60.8          
>> 63          59
>>  PW_GRID|   Real Space Points                     3280.5        
>> 5832        2916
>>
>>  PW_GRID| Information for grid 
>> number                                          3
>>  PW_GRID| Number of the reference 
>> grid                                         1
>>  PW_GRID| Grid distributed over                                    48 
>> processors
>>  PW_GRID| Real space group dimensions                                     
>> 6    8
>>  PW_GRID| the grid is 
>> blocked:                                                NO
>>  PW_GRID| Cutoff 
>> [a.u.]                                                     16.7
>>  PW_GRID| spherical 
>> cutoff:                                                   NO
>>  PW_GRID|   Bounds   1            -16      15                
>> Points:          32
>>  PW_GRID|   Bounds   2            -16      15                
>> Points:          32
>>  PW_GRID|   Bounds   3            -16      15                
>> Points:          32
>>  PW_GRID| Volume element (a.u.^3)  0.1354         Volume (a.u.^3)      
>> 4437.6722
>>  PW_GRID| Grid span                                                    
>> FULLSPACE
>>  PW_GRID|   Distribution                         Average         
>> Max         Min
>>  PW_GRID|   G-Vectors                              682.7         
>> 704         640
>>  PW_GRID|   G-Rays                                  21.3          
>> 22          20
>>  PW_GRID|   Real Space Points                      682.7         
>> 768         640
>>
>>  PW_GRID| Information for grid 
>> number                                          4
>>  PW_GRID| Number of the reference 
>> grid                                         1
>>  PW_GRID| Grid distributed over                                    48 
>> processors
>>  PW_GRID| Real space group dimensions                                     
>> 6    8
>>  PW_GRID| the grid is 
>> blocked:                                                NO
>>  PW_GRID| Cutoff 
>> [a.u.]                                                      5.6
>>  PW_GRID| spherical 
>> cutoff:                                                   NO
>>  PW_GRID|   Bounds   1             -9       8                
>> Points:          18
>>  PW_GRID|   Bounds   2             -9       8                
>> Points:          18
>>  PW_GRID|   Bounds   3             -9       8                
>> Points:          18
>>  PW_GRID| Volume element (a.u.^3)  0.7609         Volume (a.u.^3)      
>> 4437.6722
>>  PW_GRID| Grid span                                                    
>> FULLSPACE
>>  PW_GRID|   Distribution                         Average         
>> Max         Min
>>  PW_GRID|   G-Vectors                              121.5         
>> 144         108
>>  PW_GRID|   G-Rays                                   6.8           
>> 8           6
>>  PW_GRID|   Real Space Points                      121.5         
>> 162         108
>>
>>  POISSON| Solver                                                        
>> PERIODIC
>>  POISSON| 
>> Periodicity                                                        XYZ
>>
>>  RS_GRID| Information for grid 
>> number                                          1
>>  RS_GRID|   Bounds   1            -48      47                
>> Points:          96
>>  RS_GRID|   Bounds   2            -48      47                
>> Points:          96
>>  RS_GRID|   Bounds   3            -48      47                
>> Points:          96
>>  RS_GRID| Real space distribution over                                  6 
>> groups
>>  RS_GRID| Real space distribution along 
>> direction                              2
>>  RS_GRID| Border 
>> size                                                         26
>>  RS_GRID| Real space distribution over                                  8 
>> groups
>>  RS_GRID| Real space distribution along 
>> direction                              3
>>  RS_GRID| Border 
>> size                                                         26
>>  RS_GRID|   Distribution                         Average         
>> Max         Min
>>  RS_GRID|   Planes                                  68.0          
>> 68          68
>>  RS_GRID|   Distribution                         Average         
>> Max         Min
>>  RS_GRID|   Planes                                  64.0          
>> 64          64
>>
>>  RS_GRID| Information for grid 
>> number                                          2
>>  RS_GRID|   Bounds   1            -27      26                
>> Points:          54
>>  RS_GRID|   Bounds   2            -27      26                
>> Points:          54
>>  RS_GRID|   Bounds   3            -27      26                
>> Points:          54
>>  RS_GRID| Real space fully replicated
>>  RS_GRID| Group 
>> size                                                           1
>>
>>  RS_GRID| Information for grid 
>> number                                          3
>>  RS_GRID|   Bounds   1            -16      15                
>> Points:          32
>>  RS_GRID|   Bounds   2            -16      15                
>> Points:          32
>>  RS_GRID|   Bounds   3            -16      15                
>> Points:          32
>>  RS_GRID| Real space fully replicated
>>  RS_GRID| Group 
>> size                                                           1
>>
>>  RS_GRID| Information for grid 
>> number                                          4
>>  RS_GRID|   Bounds   1             -9       8                
>> Points:          18
>>  RS_GRID|   Bounds   2             -9       8                
>> Points:          18
>>  RS_GRID|   Bounds   3             -9       8                
>> Points:          18
>>  RS_GRID| Real space fully replicated
>>  RS_GRID| Group 
>> size                                                           1
>>
>>  MD_PAR| Molecular dynamics protocol (MD input parameters)
>>  MD_PAR| Ensemble 
>> type                                                       NVT
>>  MD_PAR| Number of time 
>> steps                                              10000
>>  MD_PAR| Time step [fs]                                                 
>> 0.500000
>>  MD_PAR| Temperature [K]                                              
>> 300.000000
>>  MD_PAR| Temperature tolerance [K]                                      
>> 0.000000
>>  MD_PAR| Print MD information every                                   10 
>> step(s)
>>  MD_PAR| File type   Print frequency [steps]                          
>> File names
>>  MD_PAR| Coordinates         10                               
>> SiC_AIMD-pos-1.xyz
>>  MD_PAR| Velocities          10                               
>> SiC_AIMD-vel-1.xyz
>>  MD_PAR| Energies            10                                  
>> SiC_AIMD-1.ener
>>  MD_PAR| Dump                20                               
>> SiC_AIMD-1.restart
>>
>>  ROT| Rotational analysis information
>>  ROT| Principal axes and moments of inertia [a.u.]
>>  ROT|                           1                   2                   3
>>  ROT| Eigenvalues      9.86893119935E+07   1.19427476747E+08   
>> 1.19427476747E+08
>>  ROT|      x              0.577350269190     -0.408248290464      
>> 0.707106781187
>>  ROT|      y              0.577350269190     -0.408248290464     
>> -0.707106781187
>>  ROT|      z              0.577350269190      0.816496580928      
>> 0.000000000000
>>  ROT| Number of rotovibrational 
>> vectors                                        6
>>
>>  DOF| Calculation of degrees of freedom
>>  DOF| Number of 
>> atoms                                                         64
>>  DOF| Number of intramolecular 
>> constraints                                     0
>>  DOF| Number of intermolecular 
>> constraints                                     0
>>  DOF| Invariants (translations + 
>> rotations)                                    3
>>  DOF| Degrees of 
>> freedom                                                     189
>>
>>  DOF| Restraints information
>>  DOF| Number of intramolecular 
>> restraints                                      0
>>  DOF| Number of intermolecular 
>> restraints                                      0
>>
>>  THERMOSTAT| Thermostat information for PARTICLES
>>  THERMOSTAT| Type of thermostat                               
>> Nose-Hoover-Chains
>>  THERMOSTAT| Nose-Hoover-Chain 
>> length                                          3
>>  THERMOSTAT| Nose-Hoover-Chain time constant [fs]                    
>> 1000.000000
>>  THERMOSTAT| Order of Yoshida 
>> integrator                                       3
>>  THERMOSTAT| Number of multiple time 
>> steps                                     2
>>  THERMOSTAT| Initial potential energy                         
>> 0.000000000000E+00
>>  THERMOSTAT| Initial kinetic energy                           
>> 0.475022301493E-03
>>  THERMOSTAT| End of thermostat information for PARTICLES
>>
>>  MD_VEL| Velocities initialization
>>  MD_VEL| Initial temperature [K]                                      
>> 300.000000
>>  MD_VEL| COM velocity             0.0000000000    -0.0000000000    
>> -0.0000000000
>>
>>  Number of 
>> electrons:                                                        256
>>  Number of occupied 
>> orbitals:                                                128
>>  Number of molecular 
>> orbitals:                                               128
>>
>>  Number of orbital 
>> functions:                                                832
>>  Number of independent orbital 
>> functions:                                    832
>>
>>  Extrapolation method: initial_guess
>>
>>
>>
>>
>>  -------------------------------------------------------------------------------
>>  -                                                                             
>> -
>>  -                                DBCSR 
>> STATISTICS                             -
>>  -                                                                             
>> -
>>
>>  -------------------------------------------------------------------------------
>>  COUNTER                                    TOTAL       BLAS       
>> SMM       ACC
>>  flops    13 x    32 x    13        7086601666560       0.0%    
>> 100.0%      0.0%
>>  flops    13 x    13 x    32        9891694059520       0.0%    
>> 100.0%      0.0%
>>  flops inhomo. stacks                           0       0.0%      
>> 0.0%      0.0%
>>  flops total                        16.978296E+12       0.0%    
>> 100.0%      0.0%
>>  flops max/rank                    732.153860E+09       0.0%    
>> 100.0%      0.0%
>>  matmuls inhomo. stacks                         0       0.0%      
>> 0.0%      0.0%
>>  matmuls total                         1569738880       0.0%    
>> 100.0%      0.0%
>>  number of processed stacks              28782912       0.0%    
>> 100.0%      0.0%
>>  average stack size                                     0.0      
>> 54.5       0.0
>>  marketing flops                    26.595494E+12
>>
>>  -------------------------------------------------------------------------------
>>  # multiplications                         149911
>>  max memory usage/rank             153.088000E+06
>>  # max total images/rank                        3
>>  # max 3D layers                                1
>>  # MPI messages exchanged               143914560
>>  MPI messages size (bytes):
>>   total size                         3.855411E+12
>>   min size                           0.000000E+00
>>   max size                         137.904000E+03
>>   average size                      26.789580E+03
>>  MPI breakdown and total messages size (bytes):
>>              size <=      128            81866560                        0
>>        128 < size <=     8192                   0                        0
>>       8192 < size <=    32768            21587184             383158124544
>>      32768 < size <=   131072            36941696            2980859518208
>>     131072 < size <=  4194304             3519120             485300724480
>>    4194304 < size <= 16777216                   0                        0
>>   16777216 < size                               0                        0
>>
>>  -------------------------------------------------------------------------------
>>
>>  *** WARNING in dbcsr_mm.F:294 :: Using a non-square number of MPI ranks ***
>>  *** might lead to poor performance. Used ranks: 48 Suggested: 49 100    ***
>>
>>
>>  -------------------------------------------------------------------------------
>>  -                                                                             
>> -
>>  -                      DBCSR MESSAGE PASSING 
>> PERFORMANCE                      -
>>  -                                                                             
>> -
>>
>>  -------------------------------------------------------------------------------
>>  ROUTINE             CALLS      AVE VOLUME [Bytes]
>>  MP_Bcast                3                     12.
>>  MP_Allreduce       869441                      8.
>>  MP_Alltoall       3098138                  32851.
>>  MP_ISend          7195728                  12717.
>>  MP_IRecv          7195728                  11224.
>>
>>  -------------------------------------------------------------------------------
>>
>>
>>  -------------------------------------------------------------------------------
>>  -                                                                             
>> -
>>  -                                GRID 
>> STATISTICS                              -
>>  -                                                                             
>> -
>>
>>  -------------------------------------------------------------------------------
>>  LP    KERNEL             BACKEND                              COUNT     PERCENT
>>  2     collocate ortho    REF                             9708713949      36.60%
>>  4     integrate ortho    REF                              529879041       2.00%
>>  4     collocate ortho    REF                              221635148       0.84%
>>  2     integrate ortho    REF                             8736976861      32.94%
>>  0     collocate general  REF                               30723072       0.12%
>>  1     integrate general  REF                               30723072       0.12%
>>  5     integrate ortho    REF                               22183061       0.08%
>>  3     integrate ortho    REF                             3942635281      14.86%
>>  3     collocate ortho    REF                             3301325147      12.45%
>>
>>  -------------------------------------------------------------------------------
>>
>>  MEMORY| Estimated peak process memory 
>> [MiB]                                 146
>>
>>
>>  -------------------------------------------------------------------------------
>>  ----                             MULTIGRID 
>> INFO                            ----
>>
>>  -------------------------------------------------------------------------------
>>  count for grid        1:      110066116          cutoff [a.u.]          150.00
>>  count for grid        2:      519820015          cutoff [a.u.]           50.00
>>  count for grid        3:      459986613          cutoff [a.u.]           16.67
>>  count for grid        4:      235051958          cutoff [a.u.]            5.56
>>  total gridlevel count  :     1324924702
>>
>>
>>  -------------------------------------------------------------------------------
>>  -                                                                             
>> -
>>  -                         MESSAGE PASSING 
>> PERFORMANCE                         -
>>  -                                                                             
>> -
>>
>>  -------------------------------------------------------------------------------
>>
>>  ROUTINE             CALLS      AVE VOLUME [Bytes]
>>  MP_Group                4
>>  MP_Bcast           203792                   2218.
>>  MP_Allreduce      1459647                    265.
>>  MP_Sync                 4
>>  MP_Alltoall       1818671                 396307.
>>  MP_ISendRecv     28177722                  18032.
>>  MP_Wait          42247738
>>  MP_ISend         12750952                  57626.
>>  MP_IRecv         12750952                  57626.
>>
>>  -------------------------------------------------------------------------------
>>
>>
>>
>>  -------------------------------------------------------------------------------
>>  -                                                                             -
>>  -                                T I M I N G                                  -
>>  -                                                                             -
>>  -------------------------------------------------------------------------------
>>  SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
>>                                 MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
>>  CP2K                                 1  1.0     0.01     0.01 66822.69 66823.04
>>  qs_mol_dyn_low                       1  2.0     0.34     0.37 66822.51 66822.86
>>  velocity_verlet                  10000  3.0     1.48     5.04 66810.62 66811.08
>>  qs_forces                        10001  4.0     0.98     1.02 66806.91 66807.26
>>  qs_energies                      10001  5.0     0.88     1.24 59685.56 59686.71
>>  scf_env_do_scf                   10001  6.0     0.94     1.73 54615.83 54617.31
>>  scf_env_do_scf_inner_loop        89920  7.0     4.83    26.14 54614.78 54616.21
>>  rebuild_ks_matrix                99921  8.7     0.40     0.46 25783.42 25795.09
>>  qs_ks_build_kohn_sham_matrix     99921  9.7    13.65    14.24 25783.02 25794.65
>>  qs_rho_update_rho                99921  8.1     0.53     0.65 25411.34 25412.68
>>  calculate_rho_elec               99921  9.1    10.26    10.68 25410.81 25412.19
>>  sum_up_and_integrate             99921 10.7    10.04    11.19 24320.21 24334.14
>>  integrate_v_rspace               99921 11.7     3.82     4.21 24309.99 24324.85
>>  qs_ks_update_qs_env              89920  8.0     0.78     0.91 22462.31 22473.54
>>  grid_collocate_task_list         99921 10.1 18451.53 18769.98 18451.53 18769.98
>>  grid_integrate_task_list         99921 12.7 16303.94 16394.84 16303.94 16394.84
>>  rs_pw_transfer                  819370 12.3    15.23    17.78 11655.48 12071.19
>>  qs_scf_new_mos                   89920  8.0     1.71     1.94  8270.35  8321.12
>>  eigensolver                      89920  9.0     5.28     7.69  7862.09  7870.32
>>  density_rs2pw                    99921 10.1     6.01     6.82  6836.50  7045.41
>>  cp_fm_diag_elpa                  89920 10.0     0.64     0.79  6757.80  6804.53
>>  cp_fm_diag_elpa_base             89920 11.0  6676.81  6729.03  6756.91  6803.67
>>  mp_waitany                     ******* 14.1  5758.84  6457.62  5758.84  6457.62
>>  potential_pw2rs                  99921 12.7     6.04     6.56  5839.37  5848.24
>>  rs_pw_transfer_RS2PW_150        109922 11.9  1068.20  1206.58  5210.54  5627.18
>>  rs_pw_transfer_PW2RS_150        109922 14.3  1943.71  2063.73  4455.92  4497.89
>>  build_core_hamiltonian_matrix_   10001  5.0     0.39     0.44  2865.88  3438.38
>>  qs_ks_update_qs_env_forces       10001  5.0     0.05     0.06  3365.19  3366.37
>>  init_scf_run                     10001  6.0     0.61     0.93  3252.05  3253.43
>>  scf_env_initial_rho_setup        10001  7.0     0.24     1.03  3175.29  3176.49
>>  wfi_extrapolate                  10001  8.0     0.91     1.00  3104.21  3104.23
>>  pw_transfer                    1288972 11.8    67.54    70.98  2676.70  2707.44
>>  fft_wrap_pw1pw2                1089130 12.8    10.61    11.18  2555.45  2585.55
>>  mp_alltoall_d11v               1529045 12.0  2279.64  2399.42  2279.64  2399.42
>>  fft_wrap_pw1pw2_150             489604 13.2   220.23   228.66  2227.19  2283.38
>>  rs_gather_matrices               99921 12.7    10.55    14.72  2150.73  2276.05
>>  build_core_ppnl_forces           10001  6.0  1724.02  2032.14  1724.02  2032.14
>>  fft3d_ps                       1089130 14.8   824.46   858.66  1971.84  1994.23
>>  mp_sum_d                        869728 10.8  1050.61  1821.39  1050.61  1821.39
>>  qs_energies_init_hamiltonians    10001  6.0     0.17     0.19  1767.07  1767.08
>>  mp_waitall_1                   ******* 14.6  1405.52  1749.18  1405.52  1749.18
>>  calculate_ecore_overlap          20002  6.0     0.24     0.35   885.01  1685.36
>>
>>  -------------------------------------------------------------------------------
>>
>>  The number of warnings for this run is : 1
>>
>> On Friday, February 5, 2021 at 5:43:48 PM UTC+8 Alfio Lazzaro wrote:
>>
>>> Well, what I need is the top (let's say up to "SCF WAVEFUNCTION 
>>> OPTIMIZATION") and the bottom of the logs (starting at "DBCSR STATISTICS").
>>>
>>> On Friday, 5 February 2021 at 09:24:34 UTC+1, singlebook wrote:
>>>
>>>> Hello, Alfio,
>>>>
>>>> Yes, there are 12 MPI ranks, and each rank has only one thread.
>>>> The output file is too large to upload, so I have included only the header 
>>>> of the CPU run here; the output files for the GPU runs have not been saved 
>>>> for the moment. Whenever the workstation is idle, I will do more tests.
>>>>
>>>>  DBCSR| CPU Multiplication driver                                           XSMM
>>>>  DBCSR| Multrec recursion limit                                              512
>>>>  DBCSR| Multiplication stack size                                           1000
>>>>  DBCSR| Maximum elements for images                                    UNLIMITED
>>>>  DBCSR| Multiplicative factor virtual images                                   1
>>>>  DBCSR| Use multiplication densification                                       T
>>>>  DBCSR| Multiplication size stacks                                             3
>>>>  DBCSR| Use memory pool for CPU allocation                                     F
>>>>  DBCSR| Number of 3D layers                                               SINGLE
>>>>  DBCSR| Use MPI memory allocation                                              F
>>>>  DBCSR| Use RMA algorithm                                                      F
>>>>  DBCSR| Use Communication thread                                               T
>>>>  DBCSR| Communication thread load                                             87
>>>>  DBCSR| MPI: My node id                                                        0
>>>>  DBCSR| MPI: Number of nodes                                                  48
>>>>  DBCSR| OMP: Current number of threads                                         1
>>>>  DBCSR| OMP: Max number of threads                                             1
>>>>  DBCSR| Split modifier for TAS multiplication algorithm                  1.0E+00
>>>>
>>>>   **** **** ******  **  PROGRAM STARTED AT               2021-02-04 09:18:01.088
>>>>  ***** ** ***  *** **   PROGRAM STARTED ON                                  k172
>>>>  **    ****   ******    PROGRAM STARTED BY                               chenwei
>>>>  ***** **    ** ** **   PROGRAM PROCESS ID                                 52126
>>>>   **** **  *******  **  PROGRAM STARTED IN /ncsfs02/chenwei/Machine Learning/CP2
>>>>                                            K/SiC
>>>>
>>>>  CP2K| version string:                                          CP2K version 8.1
>>>>  CP2K| source code revision number:                                  git:0b61f2f
>>>>  CP2K| cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack xsmm plume
>>>>  CP2K|            d2 spglib libvori libbqb
>>>>  CP2K| is freely available from                            https://www.cp2k.org/
>>>>  CP2K| Program compiled at                          Thu Feb  4 08:49:28 CST 2021
>>>>  CP2K| Program compiled on                                                  k172
>>>>  CP2K| Program compiled for                                                local
>>>>  CP2K| Data directory path                       /home/chenwei/src/cp2k-8.1/data
>>>>  CP2K| Input file name                                                   SiC.inp
>>>>
>>>>  GLOBAL| Force Environment number                                              1
>>>>  GLOBAL| Basis set file name                                           BASIS_SET
>>>>  GLOBAL| Potential file name                                      GTH_POTENTIALS
>>>>  GLOBAL| MM Potential file name                                     MM_POTENTIAL
>>>>  GLOBAL| Coordinate file name                                      __STD_INPUT__
>>>>  GLOBAL| Method name                                                        CP2K
>>>>  GLOBAL| Project name                                                   SiC_AIMD
>>>>  GLOBAL| Preferred FFT library                                             FFTW3
>>>>  GLOBAL| Preferred diagonalization lib.                                     ELPA
>>>>  GLOBAL| Run type                                                             MD
>>>>  GLOBAL| All-to-all communication in single precision                          F
>>>>  GLOBAL| FFTs using library dependent lengths                                  F
>>>>  GLOBAL| Global print level                                                  LOW
>>>>  GLOBAL| MPI I/O enabled                                                       T
>>>>  GLOBAL| Total number of message passing processes                            48
>>>>  GLOBAL| Number of threads for this process                                    1
>>>>  GLOBAL| This output is from process                                           0
>>>>  GLOBAL| CPU model name                Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
>>>>  GLOBAL| CPUID                                                              1002
>>>>
>>>>  MEMORY| system memory details [Kb]
>>>>  MEMORY|                        rank 0           min           max       average
>>>>  MEMORY| MemTotal            131748504     131748504     131748504     131748504
>>>>  MEMORY| MemFree              67523260      67523260      67523260      67523260
>>>>  MEMORY| Buffers                  4712          4712          4712          4712
>>>>  MEMORY| Cached               56159648      56159648      56159648      56159648
>>>>  MEMORY| Slab                  2740508       2740508       2740508       2740508
>>>>  MEMORY| SReclaimable          2447544       2447544       2447544       2447544
>>>>  MEMORY| MemLikelyFree       126135164     126135164     126135164     126135164
>>>>
>>>>  GENERATE|  Preliminary Number of Bonds generated:                             0
>>>>  GENERATE|  Achieved consistency in connectivity generation.
>>>>
>>>>  SCF WAVEFUNCTION OPTIMIZATION
>>>>
>>>>   Step     Update method      Time    Convergence         Total energy    Change
>>>>   ------------------------------------------------------------------------------
>>>>      1 NoMix/Diag. 0.40E+00    0.3     3.80220882      -317.7175159821 -3.18E+02
>>>>      2 Broy./Diag. 0.40E+00    0.6     0.43368094      -291.0370906460  2.67E+01
>>>>      3 Broy./Diag. 0.40E+00    0.6     0.23506554      -308.2043627628 -1.72E+01
>>>>      4 Broy./Diag. 0.40E+00    0.6     0.26390650      -309.7756477106 -1.57E+00
>>>>      5 Broy./Diag. 0.40E+00    0.6     0.00311711      -310.0196552337 -2.44E-01
>>>>      6 Broy./Diag. 0.40E+00    0.6     0.01762115      -309.8687051316  1.51E-01
>>>>      7 Broy./Diag. 0.40E+00    0.6     0.00055086      -309.8505587170  1.81E-02
>>>>      8 Broy./Diag. 0.40E+00    0.6     0.00030811      -309.8516271774 -1.07E-03
>>>>      9 Broy./Diag. 0.40E+00    0.6     0.00001506      -309.8519055144 -2.78E-04
>>>>     10 Broy./Diag. 0.40E+00    0.6     0.00000129      -309.8519255844 -2.01E-05
>>>>     11 Broy./Diag. 0.40E+00    0.6     0.00000032      -309.8519300365 -4.45E-06
>>>>     12 Broy./Diag. 0.40E+00    0.6     0.00000002      -309.8519304271 -3.91E-07
>>>>
>>>>   *** SCF run converged in    12 steps ***
>>>>
>>>> Best wishes,
>>>>
>>>> Wei
>>>>
>>>>
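
For readers who want to check where the time goes in the T I M I N G table quoted in this thread, here is a minimal Python sketch. It is not part of CP2K or of the original messages; the dictionary and the helper `share` are hypothetical names chosen for illustration, and the numbers are the SELF TIME MAXIMUM values copied by hand from the table.

```python
# Self-time maxima in seconds, copied from the quoted T I M I N G table.
# The dictionary and the helper below are illustrative, not a CP2K API.
self_time_max = {
    "grid_collocate_task_list": 18769.98,
    "grid_integrate_task_list": 16394.84,
    "cp_fm_diag_elpa_base": 6729.03,
    "mp_waitany": 6457.62,
}
total_max = 66823.04  # "CP2K" row, TOTAL TIME MAXIMUM


def share(names, table, total):
    """Return the fraction of `total` spent in the listed routines."""
    return sum(table[name] for name in names) / total


# Select the grid_* routines and compute their combined share of the run.
grid_routines = [name for name in self_time_max if name.startswith("grid_")]
grid_share = share(grid_routines, self_time_max, total_max)
print(f"grid_* share of total run time: {grid_share:.1%}")  # 52.6%
```

With these values the two grid_* routines account for slightly more than half of the total run time.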
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20210205/d02835cf/attachment.htm>