<div>Thank you very much for your reply. <br></div><div><br></div><div>Best wishes,</div><div><br></div><div>Wei<br></div><br><div class="gmail_quote"><div dir="auto" class="gmail_attr">On Friday, February 5, 2021 at 8:50:59 PM UTC+8 Alfio Lazzaro wrote:<br/></div><blockquote class="gmail_quote" style="margin: 0 0 0 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">OK, thanks for the timers.<div>I assume you sent me the CPU timers.</div><div>As suspected, the run is massively dominated by the non-GPU part. I cannot even see any COSMA stuff. </div><div>These are the main parts where the time goes:</div><div><br></div><div><div>fft_wrap_pw1pw2_150              228.660</div><div>fft3d_ps                         858.660</div><div>rs_pw_transfer_RS2PW_150        1206.580</div><div>mp_waitall_1                    1749.180</div><div>mp_sum_d                        1821.390</div><div>build_core_ppnl_forces          2032.140</div><div>rs_pw_transfer_PW2RS_150        2063.730</div><div>mp_alltoall_d11v                2399.420</div><div>mp_waitany                      6457.620</div><div>cp_fm_diag_elpa_base            6729.030</div><div>grid_integrate_task_list       16394.840</div><div>grid_collocate_task_list       18769.980</div><div>CP2K_Total                     66823.040</div></div><div><br></div><div>More than half of the total time (66823.040) is in the grid_* functions. 
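<div>As a quick check of that fraction, here is a minimal Python sketch that uses only the numbers listed above. The dictionary and variable names are illustrative and are not part of CP2K.</div>
<pre>
```python
# Times in seconds, copied from the timer list above.
# The names `timers`, `total`, `grid_time` and `grid_share` are illustrative only.
timers = {
    "grid_collocate_task_list": 18769.980,
    "grid_integrate_task_list": 16394.840,
    "cp_fm_diag_elpa_base": 6729.030,
    "mp_waitany": 6457.620,
}
total = 66823.040  # CP2K_Total

# Sum every entry whose name starts with "grid_" and compare it with the total.
grid_time = sum(t for name, t in timers.items() if name.startswith("grid_"))
grid_share = grid_time / total

print(f"grid_* time: {grid_time:.2f} s ({grid_share:.1%} of total)")

# List the largest contributions, biggest first.
for name, t in sorted(timers.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {t:10.2f} s  {t / total:6.1%}")
```
</pre>
<div>With these numbers the two grid_* routines add up to 35164.82 s, i.e. about 52.6% of the total.</div>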
BTW, for this kind of testing, I suggest using fewer steps...</div><div>I suspect you are hitting the CPU and GPU performance problem reported for CP2K 8.1 (see <a href="https://github.com/cp2k/cp2k/issues/1323" target="_blank" rel="nofollow" data-saferedirecturl="https://www.google.com/url?hl=en&q=https://github.com/cp2k/cp2k/issues/1323&source=gmail&ust=1612656499150000&usg=AFQjCNH6FbQMf6YMyI-FtnX4poRLkPMN7Q">https://github.com/cp2k/cp2k/issues/1323</a> ).</div><div>I suggest trying CP2K 7.1...</div><div><br></div><div>Alfio</div><div><br></div><div><br></div><div><br></div><div><br><br></div><div class="gmail_quote"><div dir="auto" class="gmail_attr">On Friday, February 5, 2021 at 10:54:47 UTC+1 singlebook wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> DBCSR| CPU Multiplication driver                                           XSMM<br> DBCSR| Multrec recursion limit                                              512<br> DBCSR| Multiplication stack size                                           1000<br> DBCSR| Maximum elements for images                                    UNLIMITED<br> DBCSR| Multiplicative factor virtual images                                   1<br> DBCSR| Use multiplication densification                                       T<br> DBCSR| Multiplication size stacks                                             3<br> DBCSR| Use memory pool for CPU allocation                                     F<br> DBCSR| Number of 3D layers                                               SINGLE<br> DBCSR| Use MPI memory allocation                                              F<br> DBCSR| Use RMA algorithm                                                      F<br> DBCSR| Use Communication thread                                               T<br> DBCSR| Communication thread load                                             87<br> DBCSR| MPI: My node id         
                                               0<br> DBCSR| MPI: Number of nodes                                                  48<br> DBCSR| OMP: Current number of threads                                         1<br> DBCSR| OMP: Max number of threads                                             1<br> DBCSR| Split modifier for TAS multiplication algorithm                  1.0E+00<br><br><br>  **** **** ******  **  PROGRAM STARTED AT               2021-02-04 09:18:01.088<br> ***** ** ***  *** **   PROGRAM STARTED ON                                  k172<br> **    ****   ******    PROGRAM STARTED BY                               chenwei<br> ***** **    ** ** **   PROGRAM PROCESS ID                                 52126<br>  **** **  *******  **  PROGRAM STARTED IN /ncsfs02/chenwei/Machine Learning/CP2<br>                                           K/SiC<br><br> CP2K| version string:                                          CP2K version 8.1<br> CP2K| source code revision number:                                  git:0b61f2f<br> CP2K| cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack xsmm plume<br> CP2K|            d2 spglib libvori libbqb<br> CP2K| is freely available from                            <a href="https://www.cp2k.org/" rel="nofollow" target="_blank" data-saferedirecturl="https://www.google.com/url?hl=en&q=https://www.cp2k.org/&source=gmail&ust=1612656499151000&usg=AFQjCNGmwn6nsVkoQjWTi5wXilaxcN6cnw">https://www.cp2k.org/</a><br> CP2K| Program compiled at                          Thu Feb  4 08:49:28 CST 2021<br> CP2K| Program compiled on                                                  k172<br> CP2K| Program compiled for                                                local<br> CP2K| Data directory path                       /home/chenwei/src/cp2k-8.1/data<br> CP2K| Input file name                                                   SiC.inp<br><br> GLOBAL| Force Environment number                                              1<br> GLOBAL| Basis 
set file name                                           BASIS_SET<br> GLOBAL| Potential file name                                      GTH_POTENTIALS<br> GLOBAL| MM Potential file name                                     MM_POTENTIAL<br> GLOBAL| Coordinate file name                                      __STD_INPUT__<br> GLOBAL| Method name                                                        CP2K<br> GLOBAL| Project name                                                   SiC_AIMD<br> GLOBAL| Preferred FFT library                                             FFTW3<br> GLOBAL| Preferred diagonalization lib.                                     ELPA<br> GLOBAL| Run type                                                             MD<br> GLOBAL| All-to-all communication in single precision                          F<br> GLOBAL| FFTs using library dependent lengths                                  F<br> GLOBAL| Global print level                                                  LOW<br> GLOBAL| MPI I/O enabled                                                       T<br> GLOBAL| Total number of message passing processes                            48<br> GLOBAL| Number of threads for this process                                    1<br> GLOBAL| This output is from process                                           0<br> GLOBAL| CPU model name                Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz<br> GLOBAL| CPUID                                                              1002<br><br> MEMORY| system memory details [Kb]<br> MEMORY|                        rank 0           min           max       average<br> MEMORY| MemTotal            131748504     131748504     131748504     131748504<br> MEMORY| MemFree              67523260      67523260      67523260      67523260<br> MEMORY| Buffers                  4712          4712          4712          4712<br> MEMORY| Cached               56159648      56159648      56159648      56159648<br> MEMORY| Slab                  2740508  
     2740508       2740508       2740508<br> MEMORY| SReclaimable          2447544       2447544       2447544       2447544<br> MEMORY| MemLikelyFree       126135164     126135164     126135164     126135164<br><br><br> GENERATE|  Preliminary Number of Bonds generated:                             0<br> GENERATE|  Achieved consistency in connectivity generation.<br><br> *******************************************************************************<br> *******************************************************************************<br> **                                                                           **<br> **     #####                         ##              ##                      **<br> **    ##   ##            ##          ##              ##                      **<br> **   ##     ##                       ##            ######                    **<br> **   ##     ##  ##   ##  ##   #####  ##  ##   ####   ##    #####    #####    **<br> **   ##     ##  ##   ##  ##  ##      ## ##   ##      ##   ##   ##  ##   ##   **<br> **   ##  ## ##  ##   ##  ##  ##      ####     ###    ##   ######   ######    **<br> **    ##  ###   ##   ##  ##  ##      ## ##      ##   ##   ##       ##        **<br> **     #######   #####   ##   #####  ##  ##  ####    ##    #####   ##        **<br> **           ##                                                    ##        **<br> **                                                                           **<br> **                                                ... make the atoms dance   **<br> **                                                                           **<br> **            Copyright (C) by CP2K developers group (2000 - 2020)           **<br> **                      J. Chem. Phys. 
152, 194103 (2020)                    **<br> **                                                                           **<br> *******************************************************************************<br><br><br> TOTAL NUMBERS AND MAXIMUM NUMBERS<br><br>  Total number of            - Atomic kinds:                                   2<br>                             - Atoms:                                         64<br>                             - Shell sets:                                   128<br>                             - Shells:                                       320<br>                             - Primitive Cartesian functions:                320<br>                             - Cartesian basis functions:                    896<br>                             - Spherical basis functions:                    832<br><br>  Maximum angular momentum of- Orbital basis functions:                        2<br>                             - Local part of the GTH pseudopotential:          2<br>                             - Non-local part of the GTH pseudopotential:      2<br><br><br> SCF PARAMETERS         Density guess:                                    ATOMIC<br>                        --------------------------------------------------------<br>                        max_scf:                                             300<br>                        max_scf_history:                                       0<br>                        max_diis:                                              4<br>                        --------------------------------------------------------<br>                        eps_scf:                                        1.00E-07<br>                        eps_scf_history:                                0.00E+00<br>                        eps_diis:                                       1.00E-01<br>                        eps_eigval:                                     1.00E-05<br>                        
--------------------------------------------------------<br>                        level_shift [a.u.]:                                 0.00<br>                        --------------------------------------------------------<br>                        Mixing method:                            BROYDEN_MIXING<br>                                                charge density mixing in g-space<br>                        --------------------------------------------------------<br>                        No outer SCF<br><br> PW_GRID| Information for grid number                                          1<br> PW_GRID| Grid distributed over                                    48 processors<br> PW_GRID| Real space group dimensions                                    48    1<br> PW_GRID| the grid is blocked:                                                NO<br> PW_GRID| Cutoff [a.u.]                                                    150.0<br> PW_GRID| spherical cutoff:                                                   NO<br> PW_GRID|   Bounds   1            -48      47                Points:          96<br> PW_GRID|   Bounds   2            -48      47                Points:          96<br> PW_GRID|   Bounds   3            -48      47                Points:          96<br> PW_GRID| Volume element (a.u.^3)  0.5016E-02     Volume (a.u.^3)      4437.6722<br> PW_GRID| Grid span                                                    FULLSPACE<br> PW_GRID|   Distribution                         Average         Max         Min<br> PW_GRID|   G-Vectors                            18432.0       18432       18432<br> PW_GRID|   G-Rays                                 192.0         192         192<br> PW_GRID|   Real Space Points                    18432.0       18432       18432<br><br> PW_GRID| Information for grid number                                          2<br> PW_GRID| Number of the reference grid                                         1<br> PW_GRID| Grid distributed over             
                       48 processors<br> PW_GRID| Real space group dimensions                                    48    1<br> PW_GRID| the grid is blocked:                                                NO<br> PW_GRID| Cutoff [a.u.]                                                     50.0<br> PW_GRID| spherical cutoff:                                                   NO<br> PW_GRID|   Bounds   1            -27      26                Points:          54<br> PW_GRID|   Bounds   2            -27      26                Points:          54<br> PW_GRID|   Bounds   3            -27      26                Points:          54<br> PW_GRID| Volume element (a.u.^3)  0.2818E-01     Volume (a.u.^3)      4437.6722<br> PW_GRID| Grid span                                                    FULLSPACE<br> PW_GRID|   Distribution                         Average         Max         Min<br> PW_GRID|   G-Vectors                             3280.5        3402        3186<br> PW_GRID|   G-Rays                                  60.8          63          59<br> PW_GRID|   Real Space Points                     3280.5        5832        2916<br><br> PW_GRID| Information for grid number                                          3<br> PW_GRID| Number of the reference grid                                         1<br> PW_GRID| Grid distributed over                                    48 processors<br> PW_GRID| Real space group dimensions                                     6    8<br> PW_GRID| the grid is blocked:                                                NO<br> PW_GRID| Cutoff [a.u.]                                                     
16.7<br> PW_GRID| spherical cutoff:                                                   NO<br> PW_GRID|   Bounds   1            -16      15                Points:          32<br> PW_GRID|   Bounds   2            -16      15                Points:          32<br> PW_GRID|   Bounds   3            -16      15                Points:          32<br> PW_GRID| Volume element (a.u.^3)  0.1354         Volume (a.u.^3)      4437.6722<br> PW_GRID| Grid span                                                    FULLSPACE<br> PW_GRID|   Distribution                         Average         Max         Min<br> PW_GRID|   G-Vectors                              682.7         704         640<br> PW_GRID|   G-Rays                                  21.3          22          20<br> PW_GRID|   Real Space Points                      682.7         768         640<br><br> PW_GRID| Information for grid number                                          4<br> PW_GRID| Number of the reference grid                                         1<br> PW_GRID| Grid distributed over                                    48 processors<br> PW_GRID| Real space group dimensions                                     6    8<br> PW_GRID| the grid is blocked:                                                NO<br> PW_GRID| Cutoff [a.u.]                                                      
5.6<br> PW_GRID| spherical cutoff:                                                   NO<br> PW_GRID|   Bounds   1             -9       8                Points:          18<br> PW_GRID|   Bounds   2             -9       8                Points:          18<br> PW_GRID|   Bounds   3             -9       8                Points:          18<br> PW_GRID| Volume element (a.u.^3)  0.7609         Volume (a.u.^3)      4437.6722<br> PW_GRID| Grid span                                                    FULLSPACE<br> PW_GRID|   Distribution                         Average         Max         Min<br> PW_GRID|   G-Vectors                              121.5         144         108<br> PW_GRID|   G-Rays                                   6.8           8           6<br> PW_GRID|   Real Space Points                      121.5         162         108<br><br> POISSON| Solver                                                        PERIODIC<br> POISSON| Periodicity                                                        XYZ<br><br> RS_GRID| Information for grid number                                          1<br> RS_GRID|   Bounds   1            -48      47                Points:          96<br> RS_GRID|   Bounds   2            -48      47                Points:          96<br> RS_GRID|   Bounds   3            -48      47                Points:          96<br> RS_GRID| Real space distribution over                                  6 groups<br> RS_GRID| Real space distribution along direction                              2<br> RS_GRID| Border size                                                         26<br> RS_GRID| Real space distribution over                                  8 groups<br> RS_GRID| Real space distribution along direction                              3<br> RS_GRID| Border size                                                         26<br> RS_GRID|   Distribution                         Average         Max         Min<br> RS_GRID|   Planes                                  
68.0          68          68<br> RS_GRID|   Distribution                         Average         Max         Min<br> RS_GRID|   Planes                                  64.0          64          64<br><br> RS_GRID| Information for grid number                                          2<br> RS_GRID|   Bounds   1            -27      26                Points:          54<br> RS_GRID|   Bounds   2            -27      26                Points:          54<br> RS_GRID|   Bounds   3            -27      26                Points:          54<br> RS_GRID| Real space fully replicated<br> RS_GRID| Group size                                                           1<br><br> RS_GRID| Information for grid number                                          3<br> RS_GRID|   Bounds   1            -16      15                Points:          32<br> RS_GRID|   Bounds   2            -16      15                Points:          32<br> RS_GRID|   Bounds   3            -16      15                Points:          32<br> RS_GRID| Real space fully replicated<br> RS_GRID| Group size                                                           1<br><br> RS_GRID| Information for grid number                                          4<br> RS_GRID|   Bounds   1             -9       8                Points:          18<br> RS_GRID|   Bounds   2             -9       8                Points:          18<br> RS_GRID|   Bounds   3             -9       8                Points:          18<br> RS_GRID| Real space fully replicated<br> RS_GRID| Group size                                                           1<br><br> MD_PAR| Molecular dynamics protocol (MD input parameters)<br> MD_PAR| Ensemble type                                                       NVT<br> MD_PAR| Number of time steps                                              10000<br> MD_PAR| Time step [fs]                                                 0.500000<br> MD_PAR| Temperature [K]                                              300.000000<br> 
MD_PAR| Temperature tolerance [K]                                      0.000000<br> MD_PAR| Print MD information every                                   10 step(s)<br> MD_PAR| File type   Print frequency [steps]                          File names<br> MD_PAR| Coordinates         10                               SiC_AIMD-pos-1.xyz<br> MD_PAR| Velocities          10                               SiC_AIMD-vel-1.xyz<br> MD_PAR| Energies            10                                  SiC_AIMD-1.ener<br> MD_PAR| Dump                20                               SiC_AIMD-1.restart<br><br> ROT| Rotational analysis information<br> ROT| Principal axes and moments of inertia [a.u.]<br> ROT|                           1                   2                   3<br> ROT| Eigenvalues      9.86893119935E+07   1.19427476747E+08   1.19427476747E+08<br> ROT|      x              0.577350269190     -0.408248290464      0.707106781187<br> ROT|      y              0.577350269190     -0.408248290464     -0.707106781187<br> ROT|      z              0.577350269190      0.816496580928      0.000000000000<br> ROT| Number of rotovibrational vectors                                        6<br><br> DOF| Calculation of degrees of freedom<br> DOF| Number of atoms                                                         64<br> DOF| Number of intramolecular constraints                                     0<br> DOF| Number of intermolecular constraints                                     0<br> DOF| Invariants (translations + rotations)                                    3<br> DOF| Degrees of freedom                                                     189<br><br> DOF| Restraints information<br> DOF| Number of intramolecular restraints                                      0<br> DOF| Number of intermolecular restraints                                      0<br><br> THERMOSTAT| Thermostat information for PARTICLES<br> THERMOSTAT| Type of thermostat                               Nose-Hoover-Chains<br> 
THERMOSTAT| Nose-Hoover-Chain length                                          3<br> THERMOSTAT| Nose-Hoover-Chain time constant [fs]                    1000.000000<br> THERMOSTAT| Order of Yoshida integrator                                       3<br> THERMOSTAT| Number of multiple time steps                                     2<br> THERMOSTAT| Initial potential energy                         0.000000000000E+00<br> THERMOSTAT| Initial kinetic energy                           0.475022301493E-03<br> THERMOSTAT| End of thermostat information for PARTICLES<br><br> MD_VEL| Velocities initialization<br> MD_VEL| Initial temperature [K]                                      300.000000<br> MD_VEL| COM velocity             0.0000000000    -0.0000000000    -0.0000000000<br><br> Number of electrons:                                                        256<br> Number of occupied orbitals:                                                128<br> Number of molecular orbitals:                                               128<br><br> Number of orbital functions:                                                832<br> Number of independent orbital functions:                                    832<br><br><div> Extrapolation method: initial_guess</div><div><br></div><div><br></div><br> -------------------------------------------------------------------------------<br> -                                                                             -<br> -                                DBCSR STATISTICS                             -<br> -                                                                             -<br> -------------------------------------------------------------------------------<br> COUNTER                                    TOTAL       BLAS       SMM       ACC<br> flops    13 x    32 x    13        7086601666560       0.0%    100.0%      0.0%<br> flops    13 x    13 x    32        9891694059520       0.0%    100.0%      0.0%<br> flops inhomo. 
stacks                           0       0.0%      0.0%      0.0%<br> flops total                        16.978296E+12       0.0%    100.0%      0.0%<br> flops max/rank                    732.153860E+09       0.0%    100.0%      0.0%<br> matmuls inhomo. stacks                         0       0.0%      0.0%      0.0%<br> matmuls total                         1569738880       0.0%    100.0%      0.0%<br> number of processed stacks              28782912       0.0%    100.0%      0.0%<br> average stack size                                     0.0      54.5       0.0<br> marketing flops                    26.595494E+12<br> -------------------------------------------------------------------------------<br> # multiplications                         149911<br> max memory usage/rank             153.088000E+06<br> # max total images/rank                        3<br> # max 3D layers                                1<br> # MPI messages exchanged               143914560<br> MPI messages size (bytes):<br>  total size                         3.855411E+12<br>  min size                           0.000000E+00<br>  max size                         137.904000E+03<br>  average size                      26.789580E+03<br> MPI breakdown and total messages size (bytes):<br>             size <=      128            81866560                        0<br>       128 < size <=     8192                   0                        0<br>      8192 < size <=    32768            21587184             383158124544<br>     32768 < size <=   131072            36941696            2980859518208<br>    131072 < size <=  4194304             3519120             485300724480<br>   4194304 < size <= 16777216                   0                        0<br>  16777216 < size                               0                        0<br> -------------------------------------------------------------------------------<br><br> *** WARNING in dbcsr_mm.F:294 :: Using a non-square number of MPI ranks ***<br> *** might lead 
to poor performance. Used ranks: 48 Suggested: 49 100    ***<br><br> -------------------------------------------------------------------------------<br> -                                                                             -<br> -                      DBCSR MESSAGE PASSING PERFORMANCE                      -<br> -                                                                             -<br> -------------------------------------------------------------------------------<br> ROUTINE             CALLS      AVE VOLUME [Bytes]<br> MP_Bcast                3                     12.<br> MP_Allreduce       869441                      8.<br> MP_Alltoall       3098138                  32851.<br> MP_ISend          7195728                  12717.<br> MP_IRecv          7195728                  11224.<br> -------------------------------------------------------------------------------<br><br> -------------------------------------------------------------------------------<br> -                                                                             -<br> -                                GRID STATISTICS                              -<br> -                                                                             -<br> -------------------------------------------------------------------------------<br> LP    KERNEL             BACKEND                              COUNT     PERCENT<br> 2     collocate ortho    REF                             9708713949      36.60%<br> 4     integrate ortho    REF                              529879041       2.00%<br> 4     collocate ortho    REF                              221635148       0.84%<br> 2     integrate ortho    REF                             8736976861      32.94%<br> 0     collocate general  REF                               30723072       0.12%<br> 
1     integrate general  REF                               30723072       0.12%<br> 5     integrate ortho    REF                               22183061       0.08%<br> 3     integrate ortho    REF                             3942635281      14.86%<br> 3     collocate ortho    REF                             3301325147      12.45%<br> -------------------------------------------------------------------------------<br><br> MEMORY| Estimated peak process memory [MiB]                                 146<br><br> -------------------------------------------------------------------------------<br> ----                             MULTIGRID INFO                            ----<br> -------------------------------------------------------------------------------<br> count for grid        1:      110066116          cutoff [a.u.]          150.00<br> count for grid        2:      519820015          cutoff [a.u.]           50.00<br> count for grid        3:      459986613          cutoff [a.u.]           16.67<br> count for grid        4:      235051958          cutoff [a.u.]            
5.56<br> total gridlevel count  :     1324924702<br><br> -------------------------------------------------------------------------------<br> -                                                                             -<br> -                         MESSAGE PASSING PERFORMANCE                         -<br> -                                                                             -<br> -------------------------------------------------------------------------------<br><br> ROUTINE             CALLS      AVE VOLUME [Bytes]<br> MP_Group                4<br> MP_Bcast           203792                   2218.<br> MP_Allreduce      1459647                    265.<br> MP_Sync                 4<br> MP_Alltoall       1818671                 396307.<br> MP_ISendRecv     28177722                  18032.<br> MP_Wait          42247738<br> MP_ISend         12750952                  57626.<br> MP_IRecv         12750952                  57626.<br> -------------------------------------------------------------------------------<br><br><br> -------------------------------------------------------------------------------<br> -                                                                             -<br> -                                T I M I N G                                  -<br> -                                                                             -<br> -------------------------------------------------------------------------------<br> SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME<br>                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM<br> CP2K                                 1  1.0     0.01     0.01 66822.69 66823.04<br> qs_mol_dyn_low                       1  2.0     0.34     0.37 66822.51 66822.86<br> velocity_verlet                  10000  3.0     1.48     5.04 66810.62 66811.08<br> qs_forces                        10001  4.0     0.98     1.02 66806.91 66807.26<br> qs_energies                      
10001  5.0     0.88     1.24 59685.56 59686.71<br> scf_env_do_scf                   10001  6.0     0.94     1.73 54615.83 54617.31<br> scf_env_do_scf_inner_loop        89920  7.0     4.83    26.14 54614.78 54616.21<br> rebuild_ks_matrix                99921  8.7     0.40     0.46 25783.42 25795.09<br> qs_ks_build_kohn_sham_matrix     99921  9.7    13.65    14.24 25783.02 25794.65<br> qs_rho_update_rho                99921  8.1     0.53     0.65 25411.34 25412.68<br> calculate_rho_elec               99921  9.1    10.26    10.68 25410.81 25412.19<br> sum_up_and_integrate             99921 10.7    10.04    11.19 24320.21 24334.14<br> integrate_v_rspace               99921 11.7     3.82     4.21 24309.99 24324.85<br> qs_ks_update_qs_env              89920  8.0     0.78     0.91 22462.31 22473.54<br> grid_collocate_task_list         99921 10.1 18451.53 18769.98 18451.53 18769.98<br> grid_integrate_task_list         99921 12.7 16303.94 16394.84 16303.94 16394.84<br> rs_pw_transfer                  819370 12.3    15.23    17.78 11655.48 12071.19<br> qs_scf_new_mos                   89920  8.0     1.71     1.94  8270.35  8321.12<br> eigensolver                      89920  9.0     5.28     7.69  7862.09  7870.32<br> density_rs2pw                    99921 10.1     6.01     6.82  6836.50  7045.41<br> cp_fm_diag_elpa                  89920 10.0     0.64     0.79  6757.80  6804.53<br> cp_fm_diag_elpa_base             89920 11.0  6676.81  6729.03  6756.91  6803.67<br> mp_waitany                     ******* 14.1  5758.84  6457.62  5758.84  6457.62<br> potential_pw2rs                  99921 12.7     6.04     6.56  5839.37  5848.24<br> rs_pw_transfer_RS2PW_150        109922 11.9  1068.20  1206.58  5210.54  5627.18<br> rs_pw_transfer_PW2RS_150        109922 14.3  1943.71  2063.73  4455.92  4497.89<br> build_core_hamiltonian_matrix_   10001  5.0     0.39     0.44  2865.88  3438.38<br> qs_ks_update_qs_env_forces       10001  5.0     0.05     0.06  3365.19  3366.37<br> init_scf_run     
                10001  6.0     0.61     0.93  3252.05  3253.43<br> scf_env_initial_rho_setup        10001  7.0     0.24     1.03  3175.29  3176.49<br> wfi_extrapolate                  10001  8.0     0.91     1.00  3104.21  3104.23<br> pw_transfer                    1288972 11.8    67.54    70.98  2676.70  2707.44<br> fft_wrap_pw1pw2                1089130 12.8    10.61    11.18  2555.45  2585.55<br> mp_alltoall_d11v               1529045 12.0  2279.64  2399.42  2279.64  2399.42<br> fft_wrap_pw1pw2_150             489604 13.2   220.23   228.66  2227.19  2283.38<br> rs_gather_matrices               99921 12.7    10.55    14.72  2150.73  2276.05<br> build_core_ppnl_forces           10001  6.0  1724.02  2032.14  1724.02  2032.14<br> fft3d_ps                       1089130 14.8   824.46   858.66  1971.84  1994.23<br> mp_sum_d                        869728 10.8  1050.61  1821.39  1050.61  1821.39<br> qs_energies_init_hamiltonians    10001  6.0     0.17     0.19  1767.07  1767.08<br> mp_waitall_1                   ******* 14.6  1405.52  1749.18  1405.52  1749.18<br> calculate_ecore_overlap          20002  6.0     0.24     0.35   885.01  1685.36<br> -------------------------------------------------------------------------------<br><br> The number of warnings for this run is : 1<br><br><div class="gmail_quote"><div dir="auto" class="gmail_attr">On Friday, February 5, 2021 at 5:43:48 PM UTC+8 Alfio Lazzaro wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Well, what I need is the top (let's say up to "SCF WAVEFUNCTION OPTIMIZATION") and the bottom of the logs (starting at "DBCSR STATISTICS").<br><br><div class="gmail_quote"><div dir="auto" class="gmail_attr">On Friday, February 5, 2021 at 09:24:34 UTC+1 singlebook wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Hello,  
Alfio,</div><div><br></div><div>Yes, there are 12 MPI ranks, and each rank has only one thread.</div><div>The output file is too large to upload, so I have included only the header information for the CPU version here; the output files for the GPU runs have not been saved yet. I will run more tests whenever the workstation is idle.</div><div><br></div><div><b>DBCSR| CPU Multiplication driver                                           XSMM<br> DBCSR| Multrec recursion limit                                              512<br> DBCSR| Multiplication stack size                                           1000<br> DBCSR| Maximum elements for images                                    UNLIMITED<br> DBCSR| Multiplicative factor virtual images                                   1<br> DBCSR| Use multiplication densification                                       T<br> DBCSR| Multiplication size stacks                                             3<br> DBCSR| Use memory pool for CPU allocation                                     F<br> DBCSR| Number of 3D layers                                               SINGLE<br> DBCSR| Use MPI memory allocation                                              F<br> DBCSR| Use RMA algorithm                                                      F<br> DBCSR| Use Communication thread                                               T<br> DBCSR| Communication thread load                                             87<br> DBCSR| MPI: My node id                                                        0<br> DBCSR| MPI: Number of nodes                                                  48<br> DBCSR| OMP: Current number of threads                                         1<br> DBCSR| OMP: Max number of threads                                             1<br> DBCSR| Split modifier for TAS multiplication algorithm                  1.0E+00<br><br><br>  **** **** ******  **  PROGRAM STARTED AT               2021-02-04 09:18:01.088<br> ***** ** ***  *** **   PROGRAM STARTED ON                    
              k172<br> **    ****   ******    PROGRAM STARTED BY                               chenwei<br> ***** **    ** ** **   PROGRAM PROCESS ID                                 52126<br>  **** **  *******  **  PROGRAM STARTED IN /ncsfs02/chenwei/Machine Learning/CP2<br>                                           K/SiC<br><br> CP2K| version string:                                          CP2K version 8.1<br> CP2K| source code revision number:                                  git:0b61f2f<br> CP2K| cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack xsmm plume<br> CP2K|            d2 spglib libvori libbqb<br> CP2K| is freely available from                            <a href="https://www.cp2k.org/" rel="nofollow" target="_blank" data-saferedirecturl="https://www.google.com/url?hl=en&q=https://www.cp2k.org/&source=gmail&ust=1612656499155000&usg=AFQjCNHapApoKSQL96XU9nnR1GUPfkDEwQ">https://www.cp2k.org/</a><br> CP2K| Program compiled at                          Thu Feb  4 08:49:28 CST 2021<br> CP2K| Program compiled on                                                  k172<br> CP2K| Program compiled for                                                local<br> CP2K| Data directory path                       /home/chenwei/src/cp2k-8.1/data<br> CP2K| Input file name                                                   SiC.inp<br><br> GLOBAL| Force Environment number                                              1<br> GLOBAL| Basis set file name                                           BASIS_SET<br> GLOBAL| Potential file name                                      GTH_POTENTIALS<br> GLOBAL| MM Potential file name                                     MM_POTENTIAL<br> GLOBAL| Coordinate file name                                      __STD_INPUT__<br> GLOBAL| Method name                                                        CP2K<br> GLOBAL| Project name                                                   SiC_AIMD<br> GLOBAL| Preferred FFT library                          
                   FFTW3<br> GLOBAL| Preferred diagonalization lib.                                     ELPA<br> GLOBAL| Run type                                                             MD<br> GLOBAL| All-to-all communication in single precision                          F<br> GLOBAL| FFTs using library dependent lengths                                  F<br> GLOBAL| Global print level                                                  LOW<br> GLOBAL| MPI I/O enabled                                                       T<br> GLOBAL| Total number of message passing processes                            48<br> GLOBAL| Number of threads for this process                                    1<br> GLOBAL| This output is from process                                           0<br> GLOBAL| CPU model name                Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz<br> GLOBAL| CPUID                                                              1002<br><br> MEMORY| system memory details [Kb]<br> MEMORY|                        rank 0           min           max       average<br> MEMORY| MemTotal            131748504     131748504     131748504     131748504<br> MEMORY| MemFree              67523260      67523260      67523260      67523260<br> MEMORY| Buffers                  4712          4712          4712          4712<br> MEMORY| Cached               56159648      56159648      56159648      56159648<br> MEMORY| Slab                  2740508       2740508       2740508       2740508<br> MEMORY| SReclaimable          2447544       2447544       2447544       2447544<br> MEMORY| MemLikelyFree       126135164     126135164     126135164     126135164<br><br><br> GENERATE|  Preliminary Number of Bonds generated:                             0<br> GENERATE|  Achieved consistency in connectivity generation.</b><br></div><div><br></div><div><b> SCF WAVEFUNCTION OPTIMIZATION<br><br>  Step     Update method      Time    Convergence         Total energy    Change<br>  
------------------------------------------------------------------------------<br>     1 NoMix/Diag. 0.40E+00    0.3     3.80220882      -317.7175159821 -3.18E+02<br>     2 Broy./Diag. 0.40E+00    0.6     0.43368094      -291.0370906460  2.67E+01<br>     3 Broy./Diag. 0.40E+00    0.6     0.23506554      -308.2043627628 -1.72E+01<br>     4 Broy./Diag. 0.40E+00    0.6     0.26390650      -309.7756477106 -1.57E+00<br>     5 Broy./Diag. 0.40E+00    0.6     0.00311711      -310.0196552337 -2.44E-01<br>     6 Broy./Diag. 0.40E+00    0.6     0.01762115      -309.8687051316  1.51E-01<br>     7 Broy./Diag. 0.40E+00    0.6     0.00055086      -309.8505587170  1.81E-02<br>     8 Broy./Diag. 0.40E+00    0.6     0.00030811      -309.8516271774 -1.07E-03<br>     9 Broy./Diag. 0.40E+00    0.6     0.00001506      -309.8519055144 -2.78E-04<br>    10 Broy./Diag. 0.40E+00    0.6     0.00000129      -309.8519255844 -2.01E-05<br>    11 Broy./Diag. 0.40E+00    0.6     0.00000032      -309.8519300365 -4.45E-06<br>    12 Broy./Diag. 0.40E+00    0.6     0.00000002      -309.8519304271 -3.91E-07<br><br>  *** SCF run converged in    12 steps ***<br></b><br></div><div><br></div><div>Best wishes,</div><div><br></div><div>Wei<br></div><div class="gmail_quote"><br></div></blockquote></div></blockquote></div></blockquote></div></blockquote></div>