<div>Hello, Alfio,</div><div><br></div><div>Yes, there are 12 MPI ranks, and each rank has only one thread.</div><div>The output file is too large to upload, so I have only put the header information for the CPU version here; the output files for the GPU runs have not been saved for the moment. Whenever the workstation is idle, I will do more tests.</div><div><br></div><div><b>DBCSR| CPU Multiplication driver                                           XSMM<br> DBCSR| Multrec recursion limit                                              512<br> DBCSR| Multiplication stack size                                           1000<br> DBCSR| Maximum elements for images                                    UNLIMITED<br> DBCSR| Multiplicative factor virtual images                                   1<br> DBCSR| Use multiplication densification                                       T<br> DBCSR| Multiplication size stacks                                             3<br> DBCSR| Use memory pool for CPU allocation                                     F<br> DBCSR| Number of 3D layers                                               SINGLE<br> DBCSR| Use MPI memory allocation                                              F<br> DBCSR| Use RMA algorithm                                                      F<br> DBCSR| Use Communication thread                                               T<br> DBCSR| Communication thread load                                             87<br> DBCSR| MPI: My node id                                                        0<br> DBCSR| MPI: Number of nodes                                                  48<br> DBCSR| OMP: Current number of threads                                         1<br> DBCSR| OMP: Max number of threads                                             1<br> DBCSR| Split modifier for TAS multiplication algorithm                  1.0E+00<br><br><br>  **** **** ******  **  PROGRAM STARTED AT               2021-02-04 09:18:01.088<br> ***** ** ***  *** **   PROGRAM STARTED ON       
                           k172<br> **    ****   ******    PROGRAM STARTED BY                               chenwei<br> ***** **    ** ** **   PROGRAM PROCESS ID                                 52126<br>  **** **  *******  **  PROGRAM STARTED IN /ncsfs02/chenwei/Machine Learning/CP2<br>                                           K/SiC<br><br> CP2K| version string:                                          CP2K version 8.1<br> CP2K| source code revision number:                                  git:0b61f2f<br> CP2K| cp2kflags: omp libint fftw3 libxc elpa parallel mpi3 scalapack xsmm plume<br> CP2K|            d2 spglib libvori libbqb<br> CP2K| is freely available from                            https://www.cp2k.org/<br> CP2K| Program compiled at                          Thu Feb  4 08:49:28 CST 2021<br> CP2K| Program compiled on                                                  k172<br> CP2K| Program compiled for                                                local<br> CP2K| Data directory path                       /home/chenwei/src/cp2k-8.1/data<br> CP2K| Input file name                                                   SiC.inp<br><br> GLOBAL| Force Environment number                                              1<br> GLOBAL| Basis set file name                                           BASIS_SET<br> GLOBAL| Potential file name                                      GTH_POTENTIALS<br> GLOBAL| MM Potential file name                                     MM_POTENTIAL<br> GLOBAL| Coordinate file name                                      __STD_INPUT__<br> GLOBAL| Method name                                                        CP2K<br> GLOBAL| Project name                                                   SiC_AIMD<br> GLOBAL| Preferred FFT library                                             FFTW3<br> GLOBAL| Preferred diagonalization lib.                                     
ELPA<br> GLOBAL| Run type                                                             MD<br> GLOBAL| All-to-all communication in single precision                          F<br> GLOBAL| FFTs using library dependent lengths                                  F<br> GLOBAL| Global print level                                                  LOW<br> GLOBAL| MPI I/O enabled                                                       T<br> GLOBAL| Total number of message passing processes                            48<br> GLOBAL| Number of threads for this process                                    1<br> GLOBAL| This output is from process                                           0<br> GLOBAL| CPU model name                Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz<br> GLOBAL| CPUID                                                              1002<br><br> MEMORY| system memory details [Kb]<br> MEMORY|                        rank 0           min           max       average<br> MEMORY| MemTotal            131748504     131748504     131748504     131748504<br> MEMORY| MemFree              67523260      67523260      67523260      67523260<br> MEMORY| Buffers                  4712          4712          4712          4712<br> MEMORY| Cached               56159648      56159648      56159648      56159648<br> MEMORY| Slab                  2740508       2740508       2740508       2740508<br> MEMORY| SReclaimable          2447544       2447544       2447544       2447544<br> MEMORY| MemLikelyFree       126135164     126135164     126135164     126135164<br><br><br> GENERATE|  Preliminary Number of Bonds generated:                             0<br> GENERATE|  Achieved consistency in connectivity generation.</b><br></div><div><br></div><div><b> SCF WAVEFUNCTION OPTIMIZATION<br><br>  Step     Update method      Time    Convergence         Total energy    Change<br>  ------------------------------------------------------------------------------<br>     1 NoMix/Diag. 
0.40E+00    0.3     3.80220882      -317.7175159821 -3.18E+02<br>     2 Broy./Diag. 0.40E+00    0.6     0.43368094      -291.0370906460  2.67E+01<br>     3 Broy./Diag. 0.40E+00    0.6     0.23506554      -308.2043627628 -1.72E+01<br>     4 Broy./Diag. 0.40E+00    0.6     0.26390650      -309.7756477106 -1.57E+00<br>     5 Broy./Diag. 0.40E+00    0.6     0.00311711      -310.0196552337 -2.44E-01<br>     6 Broy./Diag. 0.40E+00    0.6     0.01762115      -309.8687051316  1.51E-01<br>     7 Broy./Diag. 0.40E+00    0.6     0.00055086      -309.8505587170  1.81E-02<br>     8 Broy./Diag. 0.40E+00    0.6     0.00030811      -309.8516271774 -1.07E-03<br>     9 Broy./Diag. 0.40E+00    0.6     0.00001506      -309.8519055144 -2.78E-04<br>    10 Broy./Diag. 0.40E+00    0.6     0.00000129      -309.8519255844 -2.01E-05<br>    11 Broy./Diag. 0.40E+00    0.6     0.00000032      -309.8519300365 -4.45E-06<br>    12 Broy./Diag. 0.40E+00    0.6     0.00000002      -309.8519304271 -3.91E-07<br><br>  *** SCF run converged in    12 steps ***<br></b><br></div><div><br></div><div>Best wishes,</div><div><br></div><div>Wei<br></div><div class="gmail_quote"><br></div>