[CP2K-user] CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle
Alfio Lazzaro
alfio.... at gmail.com
Fri Feb 5 12:50:58 UTC 2021
OK, thanks for the timers.
I assume you sent me the CPU timers.
As suspected, you are massively dominated by the non-GPU part; I cannot even
see any COSMA contribution.
These are the main parts where the time goes (times in seconds):
fft_wrap_pw1pw2_150 228.660
fft3d_ps 858.660
rs_pw_transfer_RS2PW_150 1206.580
mp_waitall_1 1749.180
mp_sum_d 1821.390
build_core_ppnl_forces 2032.140
rs_pw_transfer_PW2RS_150 2063.730
mp_alltoall_d11v 2399.420
mp_waitany 6457.620
cp_fm_diag_elpa_base 6729.030
grid_integrate_task_list 16394.840
grid_collocate_task_list 18769.980
CP2K_Total 66823.040
More than half of the total time (66823 s) is spent in the grid_* functions.
BTW, for this kind of testing I suggest using fewer MD steps (see the input sketch below)...
I suspect you are hitting the CPU and GPU performance problem reported for
CP2K 8.1 (see https://github.com/cp2k/cp2k/issues/1323 ).
I suggest trying CP2K 7.1...
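For reference, a minimal sketch of what "fewer steps" means in the input, assuming the usual &MOTION/&MD layout and the MD settings reported in your log (NVT, 0.5 fs time step, 300 K); the STEPS value of 20 is only an illustrative choice for a short benchmark run:

&MOTION
  &MD
    ENSEMBLE NVT          ! as in your run
    STEPS 20              ! reduced from 10000, just for benchmarking
    TIMESTEP 0.5          ! [fs], as in your run
    TEMPERATURE 300.0     ! [K], as in your run
  &END MD
&END MOTION

A few tens of steps are usually enough for the timing report to show the same hot spots without spending many hours per test.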
Alfio
On Friday, 5 February 2021 at 10:54:47 UTC+1, singlebook wrote:
> DBCSR| CPU Multiplication driver                                          XSMM
> DBCSR| Multrec recursion limit                                              512
> DBCSR| Multiplication stack size                                           1000
> DBCSR| Maximum elements for images                                    UNLIMITED
> DBCSR| Multiplicative factor virtual images                                   1
> DBCSR| Use multiplication densification                                       T
> DBCSR| Multiplication size stacks                                             3
> DBCSR| Use memory pool for CPU allocation                                     F
> DBCSR| Number of 3D layers                                               SINGLE
> DBCSR| Use MPI memory allocation                                              F
> DBCSR| Use RMA algorithm                                                      F
> DBCSR| Use Communication thread                                               T
> DBCSR| Communication thread load                                             87
> DBCSR| MPI: My node id                                                         0
> DBCSR| MPI: Number of nodes                                                   48
> DBCSR| OMP: Current number of threads                                          1
> DBCSR| OMP: Max number of threads                                              1
> DBCSR| Split modifier for TAS multiplication algorithm                  1.0E+00
>
>
>  **** **** ******  **   PROGRAM STARTED AT             2021-02-04 09:18:01.088
>  ***** ** *** *** **    PROGRAM STARTED ON                                 k172
>  **    ****   ******    PROGRAM STARTED BY                              chenwei
>  ***** **    ** ** **   PROGRAM PROCESS ID                                52126
>   **** **  *******  **  PROGRAM STARTED IN  /ncsfs02/chenwei/Machine Learning/CP2K/SiC
>
> CP2K| version string:                                          CP2K version 8.1
> CP2K| source code revision number:                                  git:0b61f2f
> CP2K| cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack xsmm plumed2 spglib libvori libbqb
> CP2K| is freely available from                            https://www.cp2k.org/
> CP2K| Program compiled at                          Thu Feb 4 08:49:28 CST 2021
> CP2K| Program compiled on                                                  k172
> CP2K| Program compiled for                                                local
> CP2K| Data directory path                       /home/chenwei/src/cp2k-8.1/data
> CP2K| Input file name                                                   SiC.inp
>
> GLOBAL| Force Environment number                                              1
> GLOBAL| Basis set file name                                           BASIS_SET
> GLOBAL| Potential file name                                      GTH_POTENTIALS
> GLOBAL| MM Potential file name                                     MM_POTENTIAL
> GLOBAL| Coordinate file name                                      __STD_INPUT__
> GLOBAL| Method name                                                        CP2K
> GLOBAL| Project name                                                   SiC_AIMD
> GLOBAL| Preferred FFT library                                             FFTW3
> GLOBAL| Preferred diagonalization lib.                                     ELPA
> GLOBAL| Run type                                                             MD
> GLOBAL| All-to-all communication in single precision                          F
> GLOBAL| FFTs using library dependent lengths                                  F
> GLOBAL| Global print level                                                  LOW
> GLOBAL| MPI I/O enabled                                                       T
> GLOBAL| Total number of message passing processes                            48
> GLOBAL| Number of threads for this process                                    1
> GLOBAL| This output is from process                                           0
> GLOBAL| CPU model name             Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
> GLOBAL| CPUID                                                              1002
>
> MEMORY| system memory details [Kb]
> MEMORY|                  rank 0          min          max      average
> MEMORY| MemTotal      131748504    131748504    131748504    131748504
> MEMORY| MemFree        67523260     67523260     67523260     67523260
> MEMORY| Buffers            4712         4712         4712         4712
> MEMORY| Cached         56159648     56159648     56159648     56159648
> MEMORY| Slab            2740508      2740508      2740508      2740508
> MEMORY| SReclaimable    2447544      2447544      2447544      2447544
> MEMORY| MemLikelyFree 126135164    126135164    126135164    126135164
>
>
> GENERATE| Preliminary Number of Bonds generated:                              0
> GENERATE| Achieved consistency in connectivity generation.
>
>
> *******************************************************************************
> **                        [Quickstep ASCII-art banner]                       **
> **                          ... make the atoms dance                         **
> **            Copyright (C) by CP2K developers group (2000 - 2020)           **
> **                      J. Chem. Phys. 152, 194103 (2020)                    **
> *******************************************************************************
>
>
> TOTAL NUMBERS AND MAXIMUM NUMBERS
>
> Total number of            - Atomic kinds:                                    2
>                            - Atoms:                                          64
>                            - Shell sets:                                    128
>                            - Shells:                                        320
>                            - Primitive Cartesian functions:                 320
>                            - Cartesian basis functions:                     896
>                            - Spherical basis functions:                     832
>
> Maximum angular momentum of- Orbital basis functions:                         2
>                            - Local part of the GTH pseudopotential:          2
>                            - Non-local part of the GTH pseudopotential:      2
>
>
> SCF PARAMETERS         Density guess:                                   ATOMIC
>                        --------------------------------------------------------
>                        max_scf:                                            300
>                        max_scf_history:                                      0
>                        max_diis:                                             4
>                        --------------------------------------------------------
>                        eps_scf:                                       1.00E-07
>                        eps_scf_history:                               0.00E+00
>                        eps_diis:                                      1.00E-01
>                        eps_eigval:                                    1.00E-05
>                        --------------------------------------------------------
>                        level_shift [a.u.]:                                0.00
>                        --------------------------------------------------------
>                        Mixing method:                           BROYDEN_MIXING
>                                               charge density mixing in g-space
>                        --------------------------------------------------------
>                        No outer SCF
>
> PW_GRID| Information for grid number                                          1
> PW_GRID| Grid distributed over                                    48 processors
> PW_GRID| Real space group dimensions                                      48  1
> PW_GRID| the grid is blocked:                                                NO
> PW_GRID| Cutoff [a.u.]                                                    150.0
> PW_GRID| spherical cutoff:                                                   NO
> PW_GRID| Bounds 1            -48      47                Points:              96
> PW_GRID| Bounds 2            -48      47                Points:              96
> PW_GRID| Bounds 3            -48      47                Points:              96
> PW_GRID| Volume element (a.u.^3)  0.5016E-02     Volume (a.u.^3)      4437.6722
> PW_GRID| Grid span                                                    FULLSPACE
> PW_GRID| Distribution                 Average         Max         Min
> PW_GRID| G-Vectors                    18432.0       18432       18432
> PW_GRID| G-Rays                         192.0         192         192
> PW_GRID| Real Space Points            18432.0       18432       18432
>
> PW_GRID| Information for grid number                                          2
> PW_GRID| Number of the reference grid                                         1
> PW_GRID| Grid distributed over                                    48 processors
> PW_GRID| Real space group dimensions                                      48  1
> PW_GRID| the grid is blocked:                                                NO
> PW_GRID| Cutoff [a.u.]                                                     50.0
> PW_GRID| spherical cutoff:                                                   NO
> PW_GRID| Bounds 1            -27      26                Points:              54
> PW_GRID| Bounds 2            -27      26                Points:              54
> PW_GRID| Bounds 3            -27      26                Points:              54
> PW_GRID| Volume element (a.u.^3)  0.2818E-01     Volume (a.u.^3)      4437.6722
> PW_GRID| Grid span                                                    FULLSPACE
> PW_GRID| Distribution                 Average         Max         Min
> PW_GRID| G-Vectors                     3280.5        3402        3186
> PW_GRID| G-Rays                          60.8          63          59
> PW_GRID| Real Space Points             3280.5        5832        2916
>
> PW_GRID| Information for grid number                                          3
> PW_GRID| Number of the reference grid                                         1
> PW_GRID| Grid distributed over                                    48 processors
> PW_GRID| Real space group dimensions                                       6  8
> PW_GRID| the grid is blocked:                                                NO
> PW_GRID| Cutoff [a.u.]                                                     16.7
> PW_GRID| spherical cutoff:                                                   NO
> PW_GRID| Bounds 1            -16      15                Points:              32
> PW_GRID| Bounds 2            -16      15                Points:              32
> PW_GRID| Bounds 3            -16      15                Points:              32
> PW_GRID| Volume element (a.u.^3)      0.1354     Volume (a.u.^3)      4437.6722
> PW_GRID| Grid span                                                    FULLSPACE
> PW_GRID| Distribution                 Average         Max         Min
> PW_GRID| G-Vectors                      682.7         704         640
> PW_GRID| G-Rays                          21.3          22          20
> PW_GRID| Real Space Points              682.7         768         640
>
> PW_GRID| Information for grid number                                          4
> PW_GRID| Number of the reference grid                                         1
> PW_GRID| Grid distributed over                                    48 processors
> PW_GRID| Real space group dimensions                                       6  8
> PW_GRID| the grid is blocked:                                                NO
> PW_GRID| Cutoff [a.u.]                                                      5.6
> PW_GRID| spherical cutoff:                                                   NO
> PW_GRID| Bounds 1             -9       8                Points:              18
> PW_GRID| Bounds 2             -9       8                Points:              18
> PW_GRID| Bounds 3             -9       8                Points:              18
> PW_GRID| Volume element (a.u.^3)      0.7609     Volume (a.u.^3)      4437.6722
> PW_GRID| Grid span                                                    FULLSPACE
> PW_GRID| Distribution                 Average         Max         Min
> PW_GRID| G-Vectors                      121.5         144         108
> PW_GRID| G-Rays                           6.8           8           6
> PW_GRID| Real Space Points              121.5         162         108
>
> POISSON| Solver                                                        PERIODIC
> POISSON| Periodicity                                                        XYZ
>
> RS_GRID| Information for grid number                                          1
> RS_GRID| Bounds 1            -48      47                Points:              96
> RS_GRID| Bounds 2            -48      47                Points:              96
> RS_GRID| Bounds 3            -48      47                Points:              96
> RS_GRID| Real space distribution over                                  6 groups
> RS_GRID| Real space distribution along direction                              2
> RS_GRID| Border size                                                         26
> RS_GRID| Real space distribution over                                  8 groups
> RS_GRID| Real space distribution along direction                              3
> RS_GRID| Border size                                                         26
> RS_GRID| Distribution                 Average         Max         Min
> RS_GRID| Planes                          68.0          68          68
> RS_GRID| Distribution                 Average         Max         Min
> RS_GRID| Planes                          64.0          64          64
>
> RS_GRID| Information for grid number                                          2
> RS_GRID| Bounds 1            -27      26                Points:              54
> RS_GRID| Bounds 2            -27      26                Points:              54
> RS_GRID| Bounds 3            -27      26                Points:              54
> RS_GRID| Real space fully replicated
> RS_GRID| Group size                                                           1
>
> RS_GRID| Information for grid number                                          3
> RS_GRID| Bounds 1            -16      15                Points:              32
> RS_GRID| Bounds 2            -16      15                Points:              32
> RS_GRID| Bounds 3            -16      15                Points:              32
> RS_GRID| Real space fully replicated
> RS_GRID| Group size                                                           1
>
> RS_GRID| Information for grid number                                          4
> RS_GRID| Bounds 1             -9       8                Points:              18
> RS_GRID| Bounds 2             -9       8                Points:              18
> RS_GRID| Bounds 3             -9       8                Points:              18
> RS_GRID| Real space fully replicated
> RS_GRID| Group size                                                           1
>
> MD_PAR| Molecular dynamics protocol (MD input parameters)
> MD_PAR| Ensemble type                                                       NVT
> MD_PAR| Number of time steps                                              10000
> MD_PAR| Time step [fs]                                                 0.500000
> MD_PAR| Temperature [K]                                              300.000000
> MD_PAR| Temperature tolerance [K]                                      0.000000
> MD_PAR| Print MD information every                                   10 step(s)
> MD_PAR| File type     Print frequency [steps]                        File names
> MD_PAR| Coordinates            10                            SiC_AIMD-pos-1.xyz
> MD_PAR| Velocities             10                            SiC_AIMD-vel-1.xyz
> MD_PAR| Energies               10                               SiC_AIMD-1.ener
> MD_PAR| Dump                   20                            SiC_AIMD-1.restart
>
> ROT| Rotational analysis information
> ROT| Principal axes and moments of inertia [a.u.]
> ROT|                                1                  2                  3
> ROT| Eigenvalues     9.86893119935E+07  1.19427476747E+08  1.19427476747E+08
> ROT|      x             0.577350269190    -0.408248290464     0.707106781187
> ROT|      y             0.577350269190    -0.408248290464    -0.707106781187
> ROT|      z             0.577350269190     0.816496580928     0.000000000000
> ROT| Number of rotovibrational vectors                                        6
>
> DOF| Calculation of degrees of freedom
> DOF| Number of atoms                                                         64
> DOF| Number of intramolecular constraints                                     0
> DOF| Number of intermolecular constraints                                     0
> DOF| Invariants (translations + rotations)                                    3
> DOF| Degrees of freedom                                                     189
>
> DOF| Restraints information
> DOF| Number of intramolecular restraints                                      0
> DOF| Number of intermolecular restraints                                      0
>
> THERMOSTAT| Thermostat information for PARTICLES
> THERMOSTAT| Type of thermostat                               Nose-Hoover-Chains
> THERMOSTAT| Nose-Hoover-Chain length                                          3
> THERMOSTAT| Nose-Hoover-Chain time constant [fs]                    1000.000000
> THERMOSTAT| Order of Yoshida integrator                                       3
> THERMOSTAT| Number of multiple time steps                                     2
> THERMOSTAT| Initial potential energy                         0.000000000000E+00
> THERMOSTAT| Initial kinetic energy                           0.475022301493E-03
> THERMOSTAT| End of thermostat information for PARTICLES
>
> MD_VEL| Velocities initialization
> MD_VEL| Initial temperature [K]                                      300.000000
> MD_VEL| COM velocity            0.0000000000   -0.0000000000   -0.0000000000
>
> Number of electrons:                                                        256
> Number of occupied orbitals:                                                128
> Number of molecular orbitals:                                               128
>
> Number of orbital functions:                                                832
> Number of independent orbital functions:                                    832
>
> Extrapolation method: initial_guess
>
>
>
>
> -------------------------------------------------------------------------------
> -                                                                             -
> -                                DBCSR STATISTICS                             -
> -                                                                             -
> -------------------------------------------------------------------------------
> COUNTER                                    TOTAL       BLAS       SMM       ACC
> flops    13 x    32 x    13       7086601666560       0.0%    100.0%      0.0%
> flops    13 x    13 x    32       9891694059520       0.0%    100.0%      0.0%
> flops inhomo. stacks                           0       0.0%      0.0%      0.0%
> flops total                       16.978296E+12       0.0%    100.0%      0.0%
> flops max/rank                   732.153860E+09       0.0%    100.0%      0.0%
> matmuls inhomo. stacks                         0       0.0%      0.0%      0.0%
> matmuls total                         1569738880       0.0%    100.0%      0.0%
> number of processed stacks              28782912       0.0%    100.0%      0.0%
> average stack size                                     0.0      54.5       0.0
> marketing flops                   26.595494E+12
> -------------------------------------------------------------------------------
> # multiplications                             149911
> max memory usage/rank                 153.088000E+06
> # max total images/rank                            3
> # max 3D layers                                    1
> # MPI messages exchanged                   143914560
> MPI messages size (bytes):
>  total size                             3.855411E+12
>  min size                               0.000000E+00
>  max size                             137.904000E+03
>  average size                          26.789580E+03
> MPI breakdown and total messages size (bytes):
>             size <=      128           81866560                        0
>       128 < size <=     8192                  0                        0
>      8192 < size <=    32768           21587184             383158124544
>     32768 < size <=   131072           36941696            2980859518208
>    131072 < size <=  4194304            3519120             485300724480
>   4194304 < size <= 16777216                  0                        0
>  16777216 < size                              0                        0
> -------------------------------------------------------------------------------
>
> *** WARNING in dbcsr_mm.F:294 :: Using a non-square number of MPI ranks   ***
> *** might lead to poor performance. Used ranks: 48 Suggested: 49 100      ***
>
>
> -------------------------------------------------------------------------------
> -                                                                             -
> -                      DBCSR MESSAGE PASSING PERFORMANCE                      -
> -                                                                             -
> -------------------------------------------------------------------------------
> ROUTINE             CALLS      AVE VOLUME [Bytes]
> MP_Bcast                3                     12.
> MP_Allreduce       869441                      8.
> MP_Alltoall       3098138                  32851.
> MP_ISend          7195728                  12717.
> MP_IRecv          7195728                  11224.
> -------------------------------------------------------------------------------
>
>
> -------------------------------------------------------------------------------
> -                                                                             -
> -                                GRID STATISTICS                              -
> -                                                                             -
> -------------------------------------------------------------------------------
> LP    KERNEL             BACKEND                              COUNT     PERCENT
> 2     collocate ortho    REF                             9708713949      36.60%
> 4     integrate ortho    REF                              529879041       2.00%
> 4     collocate ortho    REF                              221635148       0.84%
> 2     integrate ortho    REF                             8736976861      32.94%
> 0     collocate general  REF                               30723072       0.12%
> 1     integrate general  REF                               30723072       0.12%
> 5     integrate ortho    REF                               22183061       0.08%
> 3     integrate ortho    REF                             3942635281      14.86%
> 3     collocate ortho    REF                             3301325147      12.45%
> -------------------------------------------------------------------------------
>
> MEMORY| Estimated peak process memory [MiB]                                 146
>
> -------------------------------------------------------------------------------
> ----                             MULTIGRID INFO                            ----
> -------------------------------------------------------------------------------
> count for grid        1:          110066116          cutoff [a.u.]       150.00
> count for grid        2:          519820015          cutoff [a.u.]        50.00
> count for grid        3:          459986613          cutoff [a.u.]        16.67
> count for grid        4:          235051958          cutoff [a.u.]         5.56
> total gridlevel count  :         1324924702
>
>
> -------------------------------------------------------------------------------
> -                                                                             -
> -                         MESSAGE PASSING PERFORMANCE                         -
> -                                                                             -
> -------------------------------------------------------------------------------
>
> ROUTINE             CALLS      AVE VOLUME [Bytes]
> MP_Group                4
> MP_Bcast           203792                   2218.
> MP_Allreduce      1459647                    265.
> MP_Sync                 4
> MP_Alltoall       1818671                 396307.
> MP_ISendRecv     28177722                  18032.
> MP_Wait          42247738
> MP_ISend         12750952                  57626.
> MP_IRecv         12750952                  57626.
> -------------------------------------------------------------------------------
>
>
>
> -------------------------------------------------------------------------------
> -                                                                             -
> -                                T I M I N G                                  -
> -                                                                             -
> -------------------------------------------------------------------------------
> SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
>                                 MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
> CP2K                                 1  1.0     0.01     0.01 66822.69 66823.04
> qs_mol_dyn_low                       1  2.0     0.34     0.37 66822.51 66822.86
> velocity_verlet                  10000  3.0     1.48     5.04 66810.62 66811.08
> qs_forces                        10001  4.0     0.98     1.02 66806.91 66807.26
> qs_energies                      10001  5.0     0.88     1.24 59685.56 59686.71
> scf_env_do_scf                   10001  6.0     0.94     1.73 54615.83 54617.31
> scf_env_do_scf_inner_loop        89920  7.0     4.83    26.14 54614.78 54616.21
> rebuild_ks_matrix                99921  8.7     0.40     0.46 25783.42 25795.09
> qs_ks_build_kohn_sham_matrix     99921  9.7    13.65    14.24 25783.02 25794.65
> qs_rho_update_rho                99921  8.1     0.53     0.65 25411.34 25412.68
> calculate_rho_elec               99921  9.1    10.26    10.68 25410.81 25412.19
> sum_up_and_integrate             99921 10.7    10.04    11.19 24320.21 24334.14
> integrate_v_rspace               99921 11.7     3.82     4.21 24309.99 24324.85
> qs_ks_update_qs_env              89920  8.0     0.78     0.91 22462.31 22473.54
> grid_collocate_task_list         99921 10.1 18451.53 18769.98 18451.53 18769.98
> grid_integrate_task_list         99921 12.7 16303.94 16394.84 16303.94 16394.84
> rs_pw_transfer                  819370 12.3    15.23    17.78 11655.48 12071.19
> qs_scf_new_mos                   89920  8.0     1.71     1.94  8270.35  8321.12
> eigensolver                      89920  9.0     5.28     7.69  7862.09  7870.32
> density_rs2pw                    99921 10.1     6.01     6.82  6836.50  7045.41
> cp_fm_diag_elpa                  89920 10.0     0.64     0.79  6757.80  6804.53
> cp_fm_diag_elpa_base             89920 11.0  6676.81  6729.03  6756.91  6803.67
> mp_waitany                     ******* 14.1  5758.84  6457.62  5758.84  6457.62
> potential_pw2rs                  99921 12.7     6.04     6.56  5839.37  5848.24
> rs_pw_transfer_RS2PW_150        109922 11.9  1068.20  1206.58  5210.54  5627.18
> rs_pw_transfer_PW2RS_150        109922 14.3  1943.71  2063.73  4455.92  4497.89
> build_core_hamiltonian_matrix_   10001  5.0     0.39     0.44  2865.88  3438.38
> qs_ks_update_qs_env_forces       10001  5.0     0.05     0.06  3365.19  3366.37
> init_scf_run                     10001  6.0     0.61     0.93  3252.05  3253.43
> scf_env_initial_rho_setup        10001  7.0     0.24     1.03  3175.29  3176.49
> wfi_extrapolate                  10001  8.0     0.91     1.00  3104.21  3104.23
> pw_transfer                    1288972 11.8    67.54    70.98  2676.70  2707.44
> fft_wrap_pw1pw2                1089130 12.8    10.61    11.18  2555.45  2585.55
> mp_alltoall_d11v               1529045 12.0  2279.64  2399.42  2279.64  2399.42
> fft_wrap_pw1pw2_150             489604 13.2   220.23   228.66  2227.19  2283.38
> rs_gather_matrices               99921 12.7    10.55    14.72  2150.73  2276.05
> build_core_ppnl_forces           10001  6.0  1724.02  2032.14  1724.02  2032.14
> fft3d_ps                       1089130 14.8   824.46   858.66  1971.84  1994.23
> mp_sum_d                        869728 10.8  1050.61  1821.39  1050.61  1821.39
> qs_energies_init_hamiltonians    10001  6.0     0.17     0.19  1767.07  1767.08
> mp_waitall_1                   ******* 14.6  1405.52  1749.18  1405.52  1749.18
> calculate_ecore_overlap          20002  6.0     0.24     0.35   885.01  1685.36
> -------------------------------------------------------------------------------
>
> The number of warnings for this run is : 1
>
> On Friday, February 5, 2021 at 5:43:48 PM UTC+8 Alfio Lazzaro wrote:
>
>> Well, what I need is the top (let's say up to "SCF WAVEFUNCTION
>> OPTIMIZATION") and the bottom of the logs (starting at "DBCSR STATISTICS").
>>
>> On Friday, 5 February 2021 at 09:24:34 UTC+1, singlebook wrote:
>>
>>> Hello Alfio,
>>>
>>> Yes, there are 12 MPI ranks; each rank has only one thread.
>>> The output file is too large to upload, so I have only put the header
>>> information for the CPU version here; the files for the GPU runs have not
>>> been saved for the moment.
>>> Whenever the workstation is idle, I will run more tests.
>>>
>>> SCF WAVEFUNCTION OPTIMIZATION
>>>
>>>   Step     Update method      Time    Convergence         Total energy    Change
>>>   ------------------------------------------------------------------------------
>>>      1 NoMix/Diag.  0.40E+00    0.3     3.80220882      -317.7175159821 -3.18E+02
>>>      2 Broy./Diag.  0.40E+00    0.6     0.43368094      -291.0370906460  2.67E+01
>>>      3 Broy./Diag.  0.40E+00    0.6     0.23506554      -308.2043627628 -1.72E+01
>>>      4 Broy./Diag.  0.40E+00    0.6     0.26390650      -309.7756477106 -1.57E+00
>>>      5 Broy./Diag.  0.40E+00    0.6     0.00311711      -310.0196552337 -2.44E-01
>>>      6 Broy./Diag.  0.40E+00    0.6     0.01762115      -309.8687051316  1.51E-01
>>>      7 Broy./Diag.  0.40E+00    0.6     0.00055086      -309.8505587170  1.81E-02
>>>      8 Broy./Diag.  0.40E+00    0.6     0.00030811      -309.8516271774 -1.07E-03
>>>      9 Broy./Diag.  0.40E+00    0.6     0.00001506      -309.8519055144 -2.78E-04
>>>     10 Broy./Diag.  0.40E+00    0.6     0.00000129      -309.8519255844 -2.01E-05
>>>     11 Broy./Diag.  0.40E+00    0.6     0.00000032      -309.8519300365 -4.45E-06
>>>     12 Broy./Diag.  0.40E+00    0.6     0.00000002      -309.8519304271 -3.91E-07
>>>
>>>   *** SCF run converged in 12 steps ***
>>>
>>> Best wishes,
>>>
>>> Wei
>>>
>>>