Dear CP2K community,

I am trying to run a DFT-MD simulation of a system containing a metal slab efficiently in parallel. I observe a strong loss of efficiency when going from 32 to 128 CPUs (a full node on my system) and no speedup at all when going from 128 to 256 CPUs (i.e., 2 nodes). On 2 nodes the timings are dominated by cp_fm_redistribute_end (the full timing reports are below). I tried adding OMP parallelization on top of the MPI parallelization with a few OMP threads, but that made things worse. I also looked online for benchmarks of the diagonalization-based SCF but could not find any, so I do not know what to expect. I therefore have two questions:

1) Is this behavior expected (given that I am using diagonalization and not OT), or might it be an issue with our compilation?
2) Is there an obvious fix for the issue (e.g., use ELPA, or something else)? A sketch of the input change I have in mind is at the very end of this message.

Thank you for your help and best regards,
Katharina

 -------------------------------------------------------------------------------
 -                      T I M I N G   (256 CPUs, 2 nodes)                      -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.043    0.204 2676.141 2676.163
 qs_mol_dyn_low                       1  2.0    0.006    0.008 2675.630 2675.692
 qs_forces                           21  4.0    0.007    0.009 2674.790 2674.812
 qs_energies                         21  5.0    0.003    0.004 2648.558 2648.605
 scf_env_do_scf                      21  6.0    0.003    0.004 2599.728 2602.333
 scf_env_do_scf_inner_loop          875  6.9    0.043    1.347 2599.678 2602.284
 velocity_verlet                     20  3.0    0.007    0.009 2418.820 2418.843
 qs_scf_new_mos                     875  7.9    0.018    0.022 2160.453 2162.360
 eigensolver                        875  8.9    0.096    0.267 2095.414 2096.092
 cp_fm_syevd                        895 10.0    0.015    0.025 1791.707 1795.858
 cp_fm_redistribute_end             895 11.0 1037.764 1785.744 1039.147 1786.479
 cp_fm_syevd_base                   895 10.9  739.888 1759.452  739.888 1759.452
 cp_fm_triangular_multiply         2625  9.9  301.128  307.525  301.128  307.525
 rebuild_ks_matrix                  896  8.8    0.006    0.008  248.047  248.107
 qs_ks_build_kohn_sham_matrix       896  9.8    0.177    0.233  248.041  248.102
 qs_ks_update_qs_env                875  7.9    0.023    0.029  241.842  241.902
 sum_up_and_integrate               896 10.8    0.207    0.444  177.563  177.730
 integrate_v_rspace                 896 11.8    0.034    0.051  177.354  177.596
 qs_rho_update_rho_low              896  7.9    0.009    0.013  176.786  177.420
 calculate_rho_elec                 896  8.9    0.162    0.220  176.778  177.411
 rs_pw_transfer                    7252 12.3    0.146    0.182  131.353  136.114
 density_rs2pw                      896  9.9    0.041    0.052   89.497   93.288
 potential_pw2rs                    896 12.8    0.058    0.070   82.157   82.656
 grid_collocate_task_list           896  9.9   74.814   79.240   74.814   79.240
 grid_integrate_task_list           896 12.8   75.706   78.564   75.706   78.564
 pw_transfer                      11627 11.8    1.017    1.215   71.773   73.106
 fft_wrap_pw1pw2                   9835 12.8    0.090    0.116   70.237   71.712
 mp_sum_d                          9316 10.4   29.648   70.131   29.648   70.131
 fft3d_ps                          9835 14.8    5.802    8.028   65.284   66.546
 qs_vxc_create                      896 10.8    0.023    0.040   57.095   57.943
 xc_vxc_pw_create                   896 11.8    0.447    0.561   57.071   57.917
 mp_alltoall_z22v                  9835 16.8   53.387   57.283   53.387   57.283
 mp_waitany                      129318 14.3   42.683   56.101   42.683   56.101
 mp_alltoall_d11v                 13495 12.2   50.528   55.324   50.528   55.324
 fft_wrap_pw1pw2_150               4459 13.1    1.760    2.222   53.418   54.942
 mp_waitall_1                   1080708 14.5   42.790   54.894   42.790   54.894
 -------------------------------------------------------------------------------

 -------------------------------------------------------------------------------
 -                      T I M I N G   (128 CPUs, 1 node)                       -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.030    0.032 3072.794 3072.814
 qs_mol_dyn_low                       1  2.0    0.006    0.007 3072.442 3072.528
 qs_forces                           21  4.0    0.006    0.008 3071.900 3071.921
 qs_energies                         21  5.0    0.003    0.004 3024.241 3024.317
 scf_env_do_scf                      21  6.0    0.004    0.006 2969.550 2971.818
 scf_env_do_scf_inner_loop          875  6.9    0.047    0.532 2969.499 2971.766
 velocity_verlet                     20  3.0    0.006    0.008 2794.534 2794.562
 qs_scf_new_mos                     875  7.9    0.023    0.028 2271.468 2273.627
 eigensolver                        875  8.9    0.095    0.206 2185.491 2186.709
 cp_fm_syevd                        895 10.0    0.020    0.031 1767.954 1770.227
 cp_fm_redistribute_end             895 11.0  286.821 1759.007  288.074 1759.686
 cp_fm_syevd_base                   895 10.9 1465.425 1740.700 1465.425 1740.700
 cp_fm_triangular_multiply         2625  9.9  410.654  416.288  410.654  416.288
 rebuild_ks_matrix                  896  8.8    0.008    0.010  361.723  362.634
 qs_ks_build_kohn_sham_matrix       896  9.8    0.188    0.243  361.716  362.627
 qs_ks_update_qs_env                875  7.9    0.024    0.035  345.631  346.530
 qs_rho_update_rho_low              896  7.9    0.011    0.020  319.796  320.976
 calculate_rho_elec                 896  8.9    0.281    0.443  319.785  320.964
 sum_up_and_integrate               896 10.8    0.543    0.936  261.772  262.746
 integrate_v_rspace                 896 11.8    0.037    0.048  261.227  262.364
 rs_pw_transfer                    7252 12.3    0.140    0.161  208.866  216.256
 density_rs2pw                      896  9.9    0.045    0.055  163.812  170.509
 grid_integrate_task_list           896 12.8  148.592  155.403  148.592  155.403
 grid_collocate_task_list           896  9.9  140.797  144.515  140.797  144.515
 mp_waitany                       89824 14.3  124.591  134.833  124.591  134.833
 rs_pw_transfer_RS2PW_150           938 11.9   17.719   20.395  113.331  121.388
 potential_pw2rs                    896 12.8    0.068    0.077   89.497   90.026
 qs_vxc_create                      896 10.8    0.030    0.051   79.793   81.837
 xc_vxc_pw_create                   896 11.8    0.725    0.982   79.764   81.813
 pw_transfer                      11627 11.8    0.899    1.097   68.900   73.972
 fft_wrap_pw1pw2                   9835 12.8    0.103    0.124   66.963   72.074
 fft_wrap_pw1pw2_150               4459 13.1    4.627    5.429   55.447   62.287
 mp_alltoall_d11v                 13495 12.2   51.004   61.546   51.004   61.546
 -------------------------------------------------------------------------------

 -------------------------------------------------------------------------------
 -                    T I M I N G   (32 CPUs, 1/4 of a node)                   -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.019    0.027 9152.866 9152.875
 qs_mol_dyn_low                       1  2.0    0.004    0.004 9152.558 9152.567
 qs_forces                           21  4.0    0.005    0.006 9152.148 9152.157
 qs_energies                         21  5.0    0.003    0.003 9047.928 9047.996
 scf_env_do_scf                      21  6.0    0.003    0.006 8925.472 8925.721
 scf_env_do_scf_inner_loop          875  6.9    0.058    0.394 8925.423 8925.670
 velocity_verlet                     20  3.0    0.003    0.004 8295.142 8295.162
 qs_scf_new_mos                     875  7.9    0.029    0.036 7036.433 7041.080
 eigensolver                        875  8.9    0.091    0.135 6743.683 6746.404
 cp_fm_syevd                        895 10.0    0.030    0.042 5143.201 5145.910
 cp_fm_syevd_base                   895 10.9 5142.498 5145.756 5142.498 5145.756
 cp_fm_triangular_multiply         2625  9.9 1559.272 1568.419 1559.272 1568.419
 rebuild_ks_matrix                  896  8.8    0.007    0.008 1019.477 1020.322
 qs_ks_build_kohn_sham_matrix       896  9.8    0.173    0.238 1019.470 1020.316
 qs_ks_update_qs_env                875  7.9    0.011    0.017  991.115  991.962
 sum_up_and_integrate               896 10.8    2.655    2.858  738.316  739.290
 integrate_v_rspace                 896 11.8    0.044    0.051  735.658  736.910
 qs_rho_update_rho_low              896  7.9    0.009    0.011  723.261  723.903
 calculate_rho_elec                 896  8.9    0.999    1.064  723.252  723.894
 grid_integrate_task_list           896 12.8  537.672  550.615  537.672  550.615
 grid_collocate_task_list           896  9.9  485.059  489.036  485.059  489.036
 pw_transfer                      11627 11.8    0.912    1.071  286.851  294.441
 fft_wrap_pw1pw2                   9835 12.8    0.129    0.145  275.251  282.867
 fft_wrap_pw1pw2_150               4459 13.1   27.356   28.582  256.292  266.093
 density_rs2pw                      896  9.9    0.055    0.061  218.157  223.442
 rs_pw_transfer                    7252 12.3    0.168    0.187  208.620  214.139
 fft3d_ps                          9835 14.8  127.773  133.888  194.643  207.229
 calculate_dm_sparse                895  8.9    0.181    0.222  186.857  191.068
 cp_dbcsr_plus_fm_fm_t_native       916  9.9    0.063    0.073  186.139  190.766
 qs_vxc_create                      896 10.8    0.029    0.036  182.589  186.938
 xc_vxc_pw_create                   896 11.8    3.250    4.661  182.560  186.906
 -------------------------------------------------------------------------------
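
P.S. To make question 2 concrete, this is the kind of input change I would try first, assuming our binary is actually built and linked against ELPA. The project name is just a placeholder, and I am going by the PREFERRED_DIAG_LIBRARY keyword documented for the &GLOBAL section, so please correct me if this is not the right way to request it:

&GLOBAL
  PROJECT  slab_md              ! placeholder, not my actual project name
  RUN_TYPE MD
  ! Ask for ELPA instead of the default ScaLAPACK eigensolver; as far as I
  ! understand, this only takes effect if CP2K was compiled with -D__ELPA
  ! and linked against the ELPA library.
  PREFERRED_DIAG_LIBRARY ELPA
&END GLOBAL

If ELPA is not the recommended fix here, I am of course happy to test whatever else is suggested and report the timings back.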

