[CP2K-user] CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle
Wei Chen
chenw... at gmail.com
Sat Feb 6 00:25:10 UTC 2021
Thank you very much for your reply.
Best wishes,
Wei
On Friday, February 5, 2021 at 8:50:59 PM UTC+8 Alfio Lazzaro wrote:
> OK, thanks for the timers.
> I assume you sent me the CPU timers.
> As suspected, the run is massively dominated by the non-GPU parts. I cannot
> even see any COSMA calls.
> These are the main parts where the time goes (maximum self time, in seconds):
>
> fft_wrap_pw1pw2_150           228.660
> fft3d_ps                      858.660
> rs_pw_transfer_RS2PW_150     1206.580
> mp_waitall_1                 1749.180
> mp_sum_d                     1821.390
> build_core_ppnl_forces       2032.140
> rs_pw_transfer_PW2RS_150     2063.730
> mp_alltoall_d11v             2399.420
> mp_waitany                   6457.620
> cp_fm_diag_elpa_base         6729.030
> grid_integrate_task_list    16394.840
> grid_collocate_task_list    18769.980
> CP2K_Total                  66823.040
>
> More than half of the total time (66823.040) is spent in the grid_* functions.
> BTW, for this kind of test, I suggest using fewer steps...
> I suspect you are hitting the CPU and GPU performance problem reported for
> CP2K 8.1 (see https://github.com/cp2k/cp2k/issues/1323 ).
> I suggest trying CP2K 7.1...
>
> Alfio
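
The "more than half" estimate above can be checked directly from the quoted timers. This is a minimal sketch that only reuses the numbers listed in the message; treating the two grid_* maximum self times as additive is an assumption, since maxima over ranks need not come from the same rank.

```python
# Maximum self times in seconds, copied from the timer list above.
timers = {
    "fft_wrap_pw1pw2_150": 228.660,
    "fft3d_ps": 858.660,
    "rs_pw_transfer_RS2PW_150": 1206.580,
    "mp_waitall_1": 1749.180,
    "mp_sum_d": 1821.390,
    "build_core_ppnl_forces": 2032.140,
    "rs_pw_transfer_PW2RS_150": 2063.730,
    "mp_alltoall_d11v": 2399.420,
    "mp_waitany": 6457.620,
    "cp_fm_diag_elpa_base": 6729.030,
    "grid_integrate_task_list": 16394.840,
    "grid_collocate_task_list": 18769.980,
}
total = 66823.040  # CP2K_Total

# Sum every timer whose name starts with "grid_" (assumed additive, see above).
grid_time = sum(t for name, t in timers.items() if name.startswith("grid_"))
grid_share = grid_time / total

# Print the timers from largest to smallest with their share of the total.
for name, t in sorted(timers.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<28}{t:>12.2f} s {t / total:>7.1%}")
print(f"grid_* combined: {grid_time:.2f} s ({grid_share:.1%} of CP2K_Total)")
```

With these inputs the two grid_* routines together account for a little over half of CP2K_Total, which agrees with the statement in the message.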
>
> On Friday, February 5, 2021 at 10:54:47 AM UTC+1 singlebook wrote:
>
>> DBCSR| CPU Multiplication driver                        XSMM
>> DBCSR| Multrec recursion limit                          512
>> DBCSR| Multiplication stack size                        1000
>> DBCSR| Maximum elements for images                      UNLIMITED
>> DBCSR| Multiplicative factor virtual images             1
>> DBCSR| Use multiplication densification                 T
>> DBCSR| Multiplication size stacks                       3
>> DBCSR| Use memory pool for CPU allocation               F
>> DBCSR| Number of 3D layers                              SINGLE
>> DBCSR| Use MPI memory allocation                        F
>> DBCSR| Use RMA algorithm                                F
>> DBCSR| Use Communication thread                         T
>> DBCSR| Communication thread load                        87
>> DBCSR| MPI: My node id                                  0
>> DBCSR| MPI: Number of nodes                             48
>> DBCSR| OMP: Current number of threads                   1
>> DBCSR| OMP: Max number of threads                       1
>> DBCSR| Split modifier for TAS multiplication algorithm  1.0E+00
>>
>>
>>   **** **** ******  **  PROGRAM STARTED AT  2021-02-04 09:18:01.088
>>  ***** ** ***  *** **   PROGRAM STARTED ON  k172
>>  **    ****   ******    PROGRAM STARTED BY  chenwei
>>  ***** **    ** ** **   PROGRAM PROCESS ID  52126
>>   **** **  *******  **  PROGRAM STARTED IN  /ncsfs02/chenwei/Machine Learning/CP2
>>                                             K/SiC
>>
>> CP2K| version string:                 CP2K version 8.1
>> CP2K| source code revision number:    git:0b61f2f
>> CP2K| cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack xsmm plume
>> CP2K|            d2 spglib libvori libbqb
>> CP2K| is freely available from        https://www.cp2k.org/
>> CP2K| Program compiled at             Thu Feb 4 08:49:28 CST 2021
>> CP2K| Program compiled on             k172
>> CP2K| Program compiled for            local
>> CP2K| Data directory path             /home/chenwei/src/cp2k-8.1/data
>> CP2K| Input file name                 SiC.inp
>>
>> GLOBAL| Force Environment number                       1
>> GLOBAL| Basis set file name                            BASIS_SET
>> GLOBAL| Potential file name                            GTH_POTENTIALS
>> GLOBAL| MM Potential file name                         MM_POTENTIAL
>> GLOBAL| Coordinate file name                           __STD_INPUT__
>> GLOBAL| Method name                                    CP2K
>> GLOBAL| Project name                                   SiC_AIMD
>> GLOBAL| Preferred FFT library                          FFTW3
>> GLOBAL| Preferred diagonalization lib.                 ELPA
>> GLOBAL| Run type                                       MD
>> GLOBAL| All-to-all communication in single precision   F
>> GLOBAL| FFTs using library dependent lengths           F
>> GLOBAL| Global print level                             LOW
>> GLOBAL| MPI I/O enabled                                T
>> GLOBAL| Total number of message passing processes      48
>> GLOBAL| Number of threads for this process             1
>> GLOBAL| This output is from process                    0
>> GLOBAL| CPU model name                                 Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
>> GLOBAL| CPUID                                          1002
>>
>> MEMORY| system memory details [Kb]
>> MEMORY|                      rank 0          min          max      average
>> MEMORY| MemTotal          131748504    131748504    131748504    131748504
>> MEMORY| MemFree            67523260     67523260     67523260     67523260
>> MEMORY| Buffers                4712         4712         4712         4712
>> MEMORY| Cached             56159648     56159648     56159648     56159648
>> MEMORY| Slab                2740508      2740508      2740508      2740508
>> MEMORY| SReclaimable        2447544      2447544      2447544      2447544
>> MEMORY| MemLikelyFree     126135164    126135164    126135164    126135164
>>
>>
>> GENERATE| Preliminary Number of Bonds generated:       0
>> GENERATE| Achieved consistency in connectivity generation.
>>
>>
>> *******************************************************************************
>> *******************************************************************************
>> **                                                                           **
>> **     #####                         ##               ##                     **
>> **    ##   ##            ##          ##               ##                     **
>> **   ##     ##                       ##             ######                   **
>> **   ##     ##  ##   ##  ##   #####  ##  ##   ####    ##     #####    #####  **
>> **   ##     ##  ##   ##  ##  ##      ## ##   ##       ##    ##   ##  ##   ## **
>> **   ##  ## ##  ##   ##  ##  ##      ####     ###     ##    ######   ######  **
>> **    ##  ###   ##   ##  ##  ##      ## ##      ##    ##    ##       ##      **
>> **     #######   #####   ##   #####  ##  ##  ####     ##     #####   ##      **
>> **           ##                                                      ##      **
>> **                                                                           **
>> **                                                ... make the atoms dance   **
>> **                                                                           **
>> **           Copyright (C) by CP2K developers group (2000 - 2020)            **
>> **                     J. Chem. Phys. 152, 194103 (2020)                     **
>> **                                                                           **
>> *******************************************************************************
>>
>>
>> TOTAL NUMBERS AND MAXIMUM NUMBERS
>>
>>  Total number of             - Atomic kinds:                                2
>>                              - Atoms:                                      64
>>                              - Shell sets:                                128
>>                              - Shells:                                    320
>>                              - Primitive Cartesian functions:             320
>>                              - Cartesian basis functions:                 896
>>                              - Spherical basis functions:                 832
>>
>>  Maximum angular momentum of - Orbital basis functions:                     2
>>                              - Local part of the GTH pseudopotential:       2
>>                              - Non-local part of the GTH pseudopotential:   2
>>
>>
>> SCF PARAMETERS   Density guess:         ATOMIC
>>                  --------------------------------------------------------
>>                  max_scf:               300
>>                  max_scf_history:       0
>>                  max_diis:              4
>>                  --------------------------------------------------------
>>                  eps_scf:               1.00E-07
>>                  eps_scf_history:       0.00E+00
>>                  eps_diis:              1.00E-01
>>                  eps_eigval:            1.00E-05
>>                  --------------------------------------------------------
>>                  level_shift [a.u.]:    0.00
>>                  --------------------------------------------------------
>>                  Mixing method:         BROYDEN_MIXING
>>                                         charge density mixing in g-space
>>                  --------------------------------------------------------
>>                  No outer SCF
>>
>> PW_GRID| Information for grid number            1
>> PW_GRID| Grid distributed over                  48 processors
>> PW_GRID| Real space group dimensions            48     1
>> PW_GRID| the grid is blocked:                   NO
>> PW_GRID| Cutoff [a.u.]                          150.0
>> PW_GRID| spherical cutoff:                      NO
>> PW_GRID|   Bounds   1     -48      47           Points:    96
>> PW_GRID|   Bounds   2     -48      47           Points:    96
>> PW_GRID|   Bounds   3     -48      47           Points:    96
>> PW_GRID| Volume element (a.u.^3)  0.5016E-02    Volume (a.u.^3)  4437.6722
>> PW_GRID| Grid span                              FULLSPACE
>> PW_GRID|   Distribution               Average         Max         Min
>> PW_GRID|   G-Vectors                  18432.0       18432       18432
>> PW_GRID|   G-Rays                       192.0         192         192
>> PW_GRID|   Real Space Points          18432.0       18432       18432
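
The distribution figures for grid 1 follow from simple arithmetic on the printed bounds. The sketch below recomputes them using only values from the log; the variable names are illustrative and not CP2K identifiers.

```python
# Values copied from the PW_GRID block for grid number 1.
points_per_dim = 96      # "Points: 96" for each of the three bounds
n_ranks = 48             # "Grid distributed over 48 processors"
cell_volume = 4437.6722  # "Volume (a.u.^3)"

total_points = points_per_dim ** 3
points_per_rank = total_points / n_ranks
volume_element = cell_volume / total_points

print(f"total grid points : {total_points}")
print(f"points per rank   : {points_per_rank:.1f}")   # log: 18432.0
print(f"volume element    : {volume_element:.4E}")    # log: 0.5016E-02
```

Because 96^3 divides evenly by 48, the average, maximum and minimum per-rank counts coincide for this grid, unlike for the coarser grids further down.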
>>
>> PW_GRID| Information for grid number            2
>> PW_GRID| Number of the reference grid           1
>> PW_GRID| Grid distributed over                  48 processors
>> PW_GRID| Real space group dimensions            48     1
>> PW_GRID| the grid is blocked:                   NO
>> PW_GRID| Cutoff [a.u.]                          50.0
>> PW_GRID| spherical cutoff:                      NO
>> PW_GRID|   Bounds   1     -27      26           Points:    54
>> PW_GRID|   Bounds   2     -27      26           Points:    54
>> PW_GRID|   Bounds   3     -27      26           Points:    54
>> PW_GRID| Volume element (a.u.^3)  0.2818E-01    Volume (a.u.^3)  4437.6722
>> PW_GRID| Grid span                              FULLSPACE
>> PW_GRID|   Distribution               Average         Max         Min
>> PW_GRID|   G-Vectors                   3280.5        3402        3186
>> PW_GRID|   G-Rays                        60.8          63          59
>> PW_GRID|   Real Space Points           3280.5        5832        2916
>>
>> PW_GRID| Information for grid number            3
>> PW_GRID| Number of the reference grid           1
>> PW_GRID| Grid distributed over                  48 processors
>> PW_GRID| Real space group dimensions            6     8
>> PW_GRID| the grid is blocked:                   NO
>> PW_GRID| Cutoff [a.u.]                          16.7
>> PW_GRID| spherical cutoff:                      NO
>> PW_GRID|   Bounds   1     -16      15           Points:    32
>> PW_GRID|   Bounds   2     -16      15           Points:    32
>> PW_GRID|   Bounds   3     -16      15           Points:    32
>> PW_GRID| Volume element (a.u.^3)  0.1354        Volume (a.u.^3)  4437.6722
>> PW_GRID| Grid span                              FULLSPACE
>> PW_GRID|   Distribution               Average         Max         Min
>> PW_GRID|   G-Vectors                    682.7         704         640
>> PW_GRID|   G-Rays                        21.3          22          20
>> PW_GRID|   Real Space Points            682.7         768         640
>>
>> PW_GRID| Information for grid number            4
>> PW_GRID| Number of the reference grid           1
>> PW_GRID| Grid distributed over                  48 processors
>> PW_GRID| Real space group dimensions            6     8
>> PW_GRID| the grid is blocked:                   NO
>> PW_GRID| Cutoff [a.u.]                          5.6
>> PW_GRID| spherical cutoff:                      NO
>> PW_GRID|   Bounds   1      -9       8           Points:    18
>> PW_GRID|   Bounds   2      -9       8           Points:    18
>> PW_GRID|   Bounds   3      -9       8           Points:    18
>> PW_GRID| Volume element (a.u.^3)  0.7609        Volume (a.u.^3)  4437.6722
>> PW_GRID| Grid span                              FULLSPACE
>> PW_GRID|   Distribution               Average         Max         Min
>> PW_GRID|   G-Vectors                    121.5         144         108
>> PW_GRID|   G-Rays                         6.8           8           6
>> PW_GRID|   Real Space Points            121.5         162         108
>>
>> POISSON| Solver                                 PERIODIC
>> POISSON| Periodicity                            XYZ
>>
>> RS_GRID| Information for grid number            1
>> RS_GRID|   Bounds   1     -48      47           Points:    96
>> RS_GRID|   Bounds   2     -48      47           Points:    96
>> RS_GRID|   Bounds   3     -48      47           Points:    96
>> RS_GRID| Real space distribution over           6 groups
>> RS_GRID| Real space distribution along direction  2
>> RS_GRID| Border size                            26
>> RS_GRID| Real space distribution over           8 groups
>> RS_GRID| Real space distribution along direction  3
>> RS_GRID| Border size                            26
>> RS_GRID|   Distribution               Average         Max         Min
>> RS_GRID|   Planes                        68.0          68          68
>> RS_GRID|   Distribution               Average         Max         Min
>> RS_GRID|   Planes                        64.0          64          64
>>
>> RS_GRID| Information for grid number            2
>> RS_GRID|   Bounds   1     -27      26           Points:    54
>> RS_GRID|   Bounds   2     -27      26           Points:    54
>> RS_GRID|   Bounds   3     -27      26           Points:    54
>> RS_GRID| Real space fully replicated
>> RS_GRID| Group size                             1
>>
>> RS_GRID| Information for grid number            3
>> RS_GRID|   Bounds   1     -16      15           Points:    32
>> RS_GRID|   Bounds   2     -16      15           Points:    32
>> RS_GRID|   Bounds   3     -16      15           Points:    32
>> RS_GRID| Real space fully replicated
>> RS_GRID| Group size                             1
>>
>> RS_GRID| Information for grid number            4
>> RS_GRID|   Bounds   1      -9       8           Points:    18
>> RS_GRID|   Bounds   2      -9       8           Points:    18
>> RS_GRID|   Bounds   3      -9       8           Points:    18
>> RS_GRID| Real space fully replicated
>> RS_GRID| Group size                             1
>>
>> MD_PAR| Molecular dynamics protocol (MD input parameters)
>> MD_PAR| Ensemble type                  NVT
>> MD_PAR| Number of time steps           10000
>> MD_PAR| Time step [fs]                 0.500000
>> MD_PAR| Temperature [K]                300.000000
>> MD_PAR| Temperature tolerance [K]      0.000000
>> MD_PAR| Print MD information every     10 step(s)
>> MD_PAR| File type      Print frequency [steps]   File names
>> MD_PAR| Coordinates    10                        SiC_AIMD-pos-1.xyz
>> MD_PAR| Velocities     10                        SiC_AIMD-vel-1.xyz
>> MD_PAR| Energies       10                        SiC_AIMD-1.ener
>> MD_PAR| Dump           20                        SiC_AIMD-1.restart
>>
>> ROT| Rotational analysis information
>> ROT| Principal axes and moments of inertia [a.u.]
>> ROT|                                1                   2                   3
>> ROT| Eigenvalues    9.86893119935E+07   1.19427476747E+08   1.19427476747E+08
>> ROT|      x            0.577350269190     -0.408248290464      0.707106781187
>> ROT|      y            0.577350269190     -0.408248290464     -0.707106781187
>> ROT|      z            0.577350269190      0.816496580928      0.000000000000
>> ROT| Number of rotovibrational vectors    6
>>
>> DOF| Calculation of degrees of freedom
>> DOF| Number of atoms                            64
>> DOF| Number of intramolecular constraints       0
>> DOF| Number of intermolecular constraints       0
>> DOF| Invariants (translations + rotations)      3
>> DOF| Degrees of freedom                         189
>>
>> DOF| Restraints information
>> DOF| Number of intramolecular restraints        0
>> DOF| Number of intermolecular restraints        0
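
The degrees-of-freedom count in the DOF block is consistent with the usual bookkeeping of three Cartesian coordinates per atom minus constraints and invariants. The snippet below is only an illustration of that arithmetic with the logged numbers; the function name is hypothetical and this is not CP2K's own routine.

```python
def degrees_of_freedom(n_atoms, n_constraints, n_invariants):
    """Three Cartesian coordinates per atom, minus constraints and invariants."""
    return 3 * n_atoms - n_constraints - n_invariants

# Numbers copied from the DOF block above.
dof = degrees_of_freedom(n_atoms=64, n_constraints=0 + 0, n_invariants=3)
print(dof)
```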
>>
>> THERMOSTAT| Thermostat information for PARTICLES
>> THERMOSTAT| Type of thermostat                      Nose-Hoover-Chains
>> THERMOSTAT| Nose-Hoover-Chain length                3
>> THERMOSTAT| Nose-Hoover-Chain time constant [fs]    1000.000000
>> THERMOSTAT| Order of Yoshida integrator             3
>> THERMOSTAT| Number of multiple time steps           2
>> THERMOSTAT| Initial potential energy                0.000000000000E+00
>> THERMOSTAT| Initial kinetic energy                  0.475022301493E-03
>> THERMOSTAT| End of thermostat information for PARTICLES
>>
>> MD_VEL| Velocities initialization
>> MD_VEL| Initial temperature [K]                     300.000000
>> MD_VEL| COM velocity      0.0000000000   -0.0000000000   -0.0000000000
>>
>> Number of electrons:                                  256
>> Number of occupied orbitals:                          128
>> Number of molecular orbitals:                         128
>>
>> Number of orbital functions:                          832
>> Number of independent orbital functions:              832
>>
>> Extrapolation method: initial_guess
>>
>>
>>
>>
>> -------------------------------------------------------------------------------
>> -                                                                             -
>> -                              DBCSR STATISTICS                               -
>> -                                                                             -
>> -------------------------------------------------------------------------------
>> COUNTER                                  TOTAL       BLAS        SMM        ACC
>> flops    13 x    32 x    13      7086601666560       0.0%     100.0%       0.0%
>> flops    13 x    13 x    32      9891694059520       0.0%     100.0%       0.0%
>> flops inhomo. stacks                         0       0.0%       0.0%       0.0%
>> flops total                      16.978296E+12       0.0%     100.0%       0.0%
>> flops max/rank                  732.153860E+09       0.0%     100.0%       0.0%
>> matmuls inhomo. stacks                       0       0.0%       0.0%       0.0%
>> matmuls total                       1569738880       0.0%     100.0%       0.0%
>> number of processed stacks            28782912       0.0%     100.0%       0.0%
>> average stack size                                    0.0       54.5        0.0
>> marketing flops                  26.595494E+12
>> -------------------------------------------------------------------------------
>> # multiplications                           149911
>> max memory usage/rank                       153.088000E+06
>> # max total images/rank                     3
>> # max 3D layers                             1
>> # MPI messages exchanged                    143914560
>> MPI messages size (bytes):
>>  total size                                 3.855411E+12
>>  min size                                   0.000000E+00
>>  max size                                   137.904000E+03
>>  average size                               26.789580E+03
>> MPI breakdown and total messages size (bytes):
>>             size <=      128          81866560                  0
>>       128 < size <=     8192                 0                  0
>>      8192 < size <=    32768          21587184       383158124544
>>     32768 < size <=   131072          36941696      2980859518208
>>    131072 < size <=  4194304           3519120       485300724480
>>   4194304 < size <= 16777216                 0                  0
>>  16777216 < size                             0                  0
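
A few of the DBCSR totals can be cross-checked against each other, which is a quick way to confirm that the wrapped log was read correctly. The sketch below uses only numbers printed in the statistics block; the variable names are illustrative.

```python
# Per-kernel flop counts from the COUNTER table.
flops_13x32x13 = 7086601666560
flops_13x13x32 = 9891694059520
flops_total = flops_13x32x13 + flops_13x13x32   # log: 16.978296E+12

# Message counts per size bucket from the MPI breakdown.
bucket_counts = [81866560, 0, 21587184, 36941696, 3519120, 0, 0]
n_messages = sum(bucket_counts)                 # log: 143914560

# Average message size from the (rounded) printed total size.
total_size_bytes = 3.855411e12
average_size = total_size_bytes / n_messages    # log: 26.789580E+03

print(f"flops total    : {flops_total:.6E}")
print(f"messages       : {n_messages}")
print(f"average size   : {average_size:.1f} bytes")
```

The SMM column being 100.0% and the ACC column 0.0% shows that, in this run, all small matrix multiplications were executed on the CPU.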
>>
>> -------------------------------------------------------------------------------
>>
>> *** WARNING in dbcsr_mm.F:294 :: Using a non-square number of MPI ranks ***
>> *** might lead to poor performance. Used ranks: 48 Suggested: 49 100    ***
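
The warning concerns rank counts that are not perfect squares. As a rough aid for choosing a rank count, the sketch below finds the perfect squares on either side of a given number. It does not reproduce how DBCSR builds its own suggestion list (the log also proposes 100); the function name is hypothetical.

```python
import math

def neighbouring_squares(n_ranks):
    """Return the largest perfect square <= n_ranks and the smallest >= n_ranks."""
    root = math.isqrt(n_ranks)
    if root * root == n_ranks:
        return n_ranks, n_ranks
    return root * root, (root + 1) ** 2

lower, upper = neighbouring_squares(48)
print(f"48 ranks lies between the square counts {lower} and {upper}")
```

For the 48 ranks used here, the next square above is 49, which matches the first value the warning suggests.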
>>
>>
>> -------------------------------------------------------------------------------
>> -                                                                             -
>> -                      DBCSR MESSAGE PASSING PERFORMANCE                      -
>> -                                                                             -
>> -------------------------------------------------------------------------------
>> ROUTINE             CALLS      AVE VOLUME [Bytes]
>> MP_Bcast                3                     12.
>> MP_Allreduce       869441                      8.
>> MP_Alltoall       3098138                  32851.
>> MP_ISend          7195728                  12717.
>> MP_IRecv          7195728                  11224.
>> -------------------------------------------------------------------------------
>>
>>
>> -------------------------------------------------------------------------------
>> -                                                                             -
>> -                               GRID STATISTICS                               -
>> -                                                                             -
>> -------------------------------------------------------------------------------
>> LP    KERNEL               BACKEND              COUNT     PERCENT
>> 2     collocate ortho      REF             9708713949      36.60%
>> 4     integrate ortho      REF              529879041       2.00%
>> 4     collocate ortho      REF              221635148       0.84%
>> 2     integrate ortho      REF             8736976861      32.94%
>> 0     collocate general    REF               30723072       0.12%
>> 1     integrate general    REF               30723072       0.12%
>> 5     integrate ortho      REF               22183061       0.08%
>> 3     integrate ortho      REF             3942635281      14.86%
>> 3     collocate ortho      REF             3301325147      12.45%
>> -------------------------------------------------------------------------------
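
The PERCENT column of the grid statistics is each kernel's share of the summed COUNT column. The following sketch recomputes it from the counts in the table, assuming nothing beyond the printed values.

```python
# (LP, kernel, count, percent as printed) copied from the GRID STATISTICS table.
rows = [
    (2, "collocate ortho",   9708713949, 36.60),
    (4, "integrate ortho",    529879041,  2.00),
    (4, "collocate ortho",    221635148,  0.84),
    (2, "integrate ortho",   8736976861, 32.94),
    (0, "collocate general",   30723072,  0.12),
    (1, "integrate general",   30723072,  0.12),
    (5, "integrate ortho",     22183061,  0.08),
    (3, "integrate ortho",   3942635281, 14.86),
    (3, "collocate ortho",   3301325147, 12.45),
]

total_count = sum(count for _, _, count, _ in rows)
recomputed = [100.0 * count / total_count for _, _, count, _ in rows]

for (lp, kernel, count, printed), pct in zip(rows, recomputed):
    print(f"{lp}  {kernel:<18} {count:>11}  {pct:6.2f}%  (log: {printed:.2f}%)")
```

Every row reports the REF backend, so all collocate and integrate work in this run went through the reference CPU kernels.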
>>
>> MEMORY| Estimated peak process memory [MiB]            146
>>
>>
>> -------------------------------------------------------------------------------
>> ----                             MULTIGRID INFO                            ----
>> -------------------------------------------------------------------------------
>> count for grid        1:      110066116     cutoff [a.u.]     150.00
>> count for grid        2:      519820015     cutoff [a.u.]      50.00
>> count for grid        3:      459986613     cutoff [a.u.]      16.67
>> count for grid        4:      235051958     cutoff [a.u.]       5.56
>> total gridlevel count  :     1324924702
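
The four cutoffs in the multigrid summary are each one third of the previous one. The factor of 3 in the sketch below is inferred from the printed values, not read from the input file, so treat it as an assumption.

```python
top_cutoff = 150.0   # cutoff of grid 1, from the log
factor = 3.0         # assumed: inferred from 150 -> 50 -> 16.67 -> 5.56
n_grids = 4

cutoffs = [top_cutoff / factor ** level for level in range(n_grids)]
print([f"{c:.2f}" for c in cutoffs])

# The per-grid counts should add up to the printed total gridlevel count.
grid_counts = [110066116, 519820015, 459986613, 235051958]
grid_total = sum(grid_counts)
print(grid_total)
```

Grids 2 and 3 together hold about three quarters of the counted items, while the finest grid (cutoff 150) holds under a tenth.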
>>
>>
>> -------------------------------------------------------------------------------
>> -                                                                             -
>> -                         MESSAGE PASSING PERFORMANCE                         -
>> -                                                                             -
>> -------------------------------------------------------------------------------
>>
>> ROUTINE             CALLS      AVE VOLUME [Bytes]
>> MP_Group                4
>> MP_Bcast           203792                   2218.
>> MP_Allreduce      1459647                    265.
>> MP_Sync                 4
>> MP_Alltoall       1818671                 396307.
>> MP_ISendRecv     28177722                  18032.
>> MP_Wait          42247738
>> MP_ISend         12750952                  57626.
>> MP_IRecv         12750952                  57626.
>> -------------------------------------------------------------------------------
>>
>>
>>
>> -------------------------------------------------------------------------------
>> -                                                                             -
>> -                                T I M I N G                                  -
>> -                                                                             -
>> -------------------------------------------------------------------------------
>> SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
>>                                MAXIMUM        AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
>> CP2K                                 1  1.0     0.01     0.01 66822.69 66823.04
>> qs_mol_dyn_low                       1  2.0     0.34     0.37 66822.51 66822.86
>> velocity_verlet                  10000  3.0     1.48     5.04 66810.62 66811.08
>> qs_forces                        10001  4.0     0.98     1.02 66806.91 66807.26
>> qs_energies                      10001  5.0     0.88     1.24 59685.56 59686.71
>> scf_env_do_scf                   10001  6.0     0.94     1.73 54615.83 54617.31
>> scf_env_do_scf_inner_loop        89920  7.0     4.83    26.14 54614.78 54616.21
>> rebuild_ks_matrix                99921  8.7     0.40     0.46 25783.42 25795.09
>> qs_ks_build_kohn_sham_matrix     99921  9.7    13.65    14.24 25783.02 25794.65
>> qs_rho_update_rho                99921  8.1     0.53     0.65 25411.34 25412.68
>> calculate_rho_elec               99921  9.1    10.26    10.68 25410.81 25412.19
>> sum_up_and_integrate             99921 10.7    10.04    11.19 24320.21 24334.14
>> integrate_v_rspace               99921 11.7     3.82     4.21 24309.99 24324.85
>> qs_ks_update_qs_env              89920  8.0     0.78     0.91 22462.31 22473.54
>> grid_collocate_task_list         99921 10.1 18451.53 18769.98 18451.53 18769.98
>> grid_integrate_task_list         99921 12.7 16303.94 16394.84 16303.94 16394.84
>> rs_pw_transfer                  819370 12.3    15.23    17.78 11655.48 12071.19
>> qs_scf_new_mos                   89920  8.0     1.71     1.94  8270.35  8321.12
>> eigensolver                      89920  9.0     5.28     7.69  7862.09  7870.32
>> density_rs2pw                    99921 10.1     6.01     6.82  6836.50  7045.41
>> cp_fm_diag_elpa                  89920 10.0     0.64     0.79  6757.80  6804.53
>> cp_fm_diag_elpa_base             89920 11.0  6676.81  6729.03  6756.91  6803.67
>> mp_waitany                     ******* 14.1  5758.84  6457.62  5758.84  6457.62
>> potential_pw2rs                  99921 12.7     6.04     6.56  5839.37  5848.24
>> rs_pw_transfer_RS2PW_150        109922 11.9  1068.20  1206.58  5210.54  5627.18
>> rs_pw_transfer_PW2RS_150        109922 14.3  1943.71  2063.73  4455.92  4497.89
>> build_core_hamiltonian_matrix_   10001  5.0     0.39     0.44  2865.88  3438.38
>> qs_ks_update_qs_env_forces       10001  5.0     0.05     0.06  3365.19  3366.37
>> init_scf_run                     10001  6.0     0.61     0.93  3252.05  3253.43
>> scf_env_initial_rho_setup        10001  7.0     0.24     1.03  3175.29  3176.49
>> wfi_extrapolate                  10001  8.0     0.91     1.00  3104.21  3104.23
>> pw_transfer                    1288972 11.8    67.54    70.98  2676.70  2707.44
>> fft_wrap_pw1pw2                1089130 12.8    10.61    11.18  2555.45  2585.55
>> mp_alltoall_d11v               1529045 12.0  2279.64  2399.42  2279.64  2399.42
>> fft_wrap_pw1pw2_150             489604 13.2   220.23   228.66  2227.19  2283.38
>> rs_gather_matrices               99921 12.7    10.55    14.72  2150.73  2276.05
>> build_core_ppnl_forces           10001  6.0  1724.02  2032.14  1724.02  2032.14
>> fft3d_ps                       1089130 14.8   824.46   858.66  1971.84  1994.23
>> mp_sum_d                        869728 10.8  1050.61  1821.39  1050.61  1821.39
>> qs_energies_init_hamiltonians    10001  6.0     0.17     0.19  1767.07  1767.08
>> mp_waitall_1                   ******* 14.6  1405.52  1749.18  1405.52  1749.18
>> calculate_ecore_overlap          20002  6.0     0.24     0.35   885.01  1685.36
>> -------------------------------------------------------------------------------
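
The call counts in the timing report give a rough cost per MD step, which helps in sizing a shorter benchmark run. The sketch below uses only figures from the table; the linear extrapolation to fewer steps is an assumption that ignores start-up cost.

```python
# Figures copied from the timing report above.
total_time_s = 66823.04   # CP2K, TOTAL TIME MAXIMUM
md_steps = 10000          # calls to velocity_verlet
scf_runs = 10001          # calls to scf_env_do_scf
scf_iterations = 89920    # calls to scf_env_do_scf_inner_loop

seconds_per_step = total_time_s / md_steps
iterations_per_scf = scf_iterations / scf_runs

def estimated_runtime(n_steps):
    """Assumed linear scaling of wall time with the number of MD steps."""
    return n_steps * seconds_per_step

print(f"{seconds_per_step:.2f} s per MD step")
print(f"{iterations_per_scf:.1f} SCF iterations per SCF run")
print(f"~{estimated_runtime(100):.0f} s for a 100-step test")
```

At this rate a 100-step run would take on the order of ten minutes instead of more than 18 hours, and would still produce a timing report with the same structure.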
>>
>> The number of warnings for this run is : 1
>>
>> On Friday, February 5, 2021 at 5:43:48 PM UTC+8 Alfio Lazzaro wrote:
>>
>>> Well, what I need is the top (let's say up to "SCF WAVEFUNCTION
>>> OPTIMIZATION") and the bottom of the logs (starting at "DBCSR STATISTICS").
>>>
>>> On Friday, February 5, 2021 at 9:24:34 AM UTC+1 singlebook wrote:
>>>
>>>> Hello, Alfio,
>>>>
>>>> Yes, there are 12 MPI ranks, and each rank has only one thread.
>>>> The output file is too large to upload, so I have only put the header
>>>> information for the CPU version here; the files for the GPU run have not
>>>> been saved for the moment. Whenever the workstation is idle, I will do
>>>> more tests.
>>>>
>>>>
>>>> SCF WAVEFUNCTION OPTIMIZATION
>>>>
>>>>  Step  Update method          Time   Convergence      Total energy     Change
>>>>  ------------------------------------------------------------------------------
>>>>     1  NoMix/Diag.  0.40E+00   0.3    3.80220882   -317.7175159821  -3.18E+02
>>>>     2  Broy./Diag.  0.40E+00   0.6    0.43368094   -291.0370906460   2.67E+01
>>>>     3  Broy./Diag.  0.40E+00   0.6    0.23506554   -308.2043627628  -1.72E+01
>>>>     4  Broy./Diag.  0.40E+00   0.6    0.26390650   -309.7756477106  -1.57E+00
>>>>     5  Broy./Diag.  0.40E+00   0.6    0.00311711   -310.0196552337  -2.44E-01
>>>>     6  Broy./Diag.  0.40E+00   0.6    0.01762115   -309.8687051316   1.51E-01
>>>>     7  Broy./Diag.  0.40E+00   0.6    0.00055086   -309.8505587170   1.81E-02
>>>>     8  Broy./Diag.  0.40E+00   0.6    0.00030811   -309.8516271774  -1.07E-03
>>>>     9  Broy./Diag.  0.40E+00   0.6    0.00001506   -309.8519055144  -2.78E-04
>>>>    10  Broy./Diag.  0.40E+00   0.6    0.00000129   -309.8519255844  -2.01E-05
>>>>    11  Broy./Diag.  0.40E+00   0.6    0.00000032   -309.8519300365  -4.45E-06
>>>>    12  Broy./Diag.  0.40E+00   0.6    0.00000002   -309.8519304271  -3.91E-07
>>>>
>>>>  *** SCF run converged in 12 steps ***
>>>>
>>>> Best wishes,
>>>>
>>>> Wei
>>>>
>>>>