[CP2K-user] [CP2K:18691] parallelization when using diag

Jürg Hutter hutter at chem.uzh.ch
Fri Apr 21 15:38:48 UTC 2023


Hi

it seems your calculation is dominated by diagonalization. From the timing data we see that

  cores   cp_fm_syevd total time [s]
     32                         5146
    128                         1770
    256                         1796

shows a speedup of only 2.9 from 32 to 128 cores (5146 s / 1770 s) and gets slightly
slower from 128 to 256 cores. This behavior is well known for ScaLAPACK routines. There
is also no gain from using OpenMP in most ScaLAPACK routines.

You should try the ELPA library, which was developed specifically for such cases.
See the examples on how to install and activate ELPA in CP2K.
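
As a minimal sketch (assuming a binary compiled with ELPA support, i.e. -D__ELPA;
keyword name as in current CP2K versions), activation in the input file looks like:

  &GLOBAL
    ! use ELPA instead of the default ScaLAPACK eigensolver
    PREFERRED_DIAG_LIBRARY ELPA
  &END GLOBAL

If you build CP2K with the toolchain, something like
install_cp2k_toolchain.sh --with-elpa=install should take care of the installation;
check the toolchain documentation for the exact flag in your version.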

regards

JH

________________________________________
From: 'k.doblh... at lic.leidenuniv.nl' via cp2k <cp2k at googlegroups.com>
Sent: Friday, April 21, 2023 2:38 PM
To: cp2k
Subject: [CP2K:18691] parallelization when using diag

Dear CP2K community,
I am trying to run a DFT-MD simulation of a system containing a metal slab efficiently in parallel. I observe a strong loss of efficiency when going from 32 to 128 CPUs (a full node on my system) and no speedup at all when going from 128 to 256 CPUs (i.e., 2 nodes). On 2 nodes the timings are dominated by cp_fm_redistribute_end (I will post all timings below). I tried OpenMP parallelization on top of the MPI parallelization with a few OMP threads (a sketch of the launch I used follows the questions below), but that made things worse. I also searched for online benchmarks for diagonalization but found none, so I do not know what to expect. I therefore have two questions:
1) Is this behavior expected (given that I am using diagonalization rather than OT), or might it be an issue with our compilation?
2) Is there an obvious fix for the issue (e.g., using ELPA, or something else)?
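
For reference, a sketch of the hybrid launch I tried (the scheduler command is
specific to our SLURM cluster and the cp2k.psmp binary; file names and numbers are
illustrative):

  # 32 MPI ranks x 4 OpenMP threads = 128 cores on one node
  export OMP_NUM_THREADS=4
  srun --ntasks=32 --cpus-per-task=4 cp2k.psmp -i md.inp -o md.out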
Thank you for your help and best regards,
Katharina

 -------------------------------------------------------------------------------
 -                       T I M I N G   256 CPUs (2 nodes)                      -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.043    0.204 2676.141 2676.163
 qs_mol_dyn_low                       1  2.0    0.006    0.008 2675.630 2675.692
 qs_forces                           21  4.0    0.007    0.009 2674.790 2674.812
 qs_energies                         21  5.0    0.003    0.004 2648.558 2648.605
 scf_env_do_scf                      21  6.0    0.003    0.004 2599.728 2602.333
 scf_env_do_scf_inner_loop          875  6.9    0.043    1.347 2599.678 2602.284
 velocity_verlet                     20  3.0    0.007    0.009 2418.820 2418.843
 qs_scf_new_mos                     875  7.9    0.018    0.022 2160.453 2162.360
 eigensolver                        875  8.9    0.096    0.267 2095.414 2096.092
 cp_fm_syevd                        895 10.0    0.015    0.025 1791.707 1795.858
 cp_fm_redistribute_end             895 11.0 1037.764 1785.744 1039.147 1786.479
 cp_fm_syevd_base                   895 10.9  739.888 1759.452  739.888 1759.452
 cp_fm_triangular_multiply         2625  9.9  301.128  307.525  301.128  307.525
 rebuild_ks_matrix                  896  8.8    0.006    0.008  248.047  248.107
 qs_ks_build_kohn_sham_matrix       896  9.8    0.177    0.233  248.041  248.102
 qs_ks_update_qs_env                875  7.9    0.023    0.029  241.842  241.902
 sum_up_and_integrate               896 10.8    0.207    0.444  177.563  177.730
 integrate_v_rspace                 896 11.8    0.034    0.051  177.354  177.596
 qs_rho_update_rho_low              896  7.9    0.009    0.013  176.786  177.420
 calculate_rho_elec                 896  8.9    0.162    0.220  176.778  177.411
 rs_pw_transfer                    7252 12.3    0.146    0.182  131.353  136.114
 density_rs2pw                      896  9.9    0.041    0.052   89.497   93.288
 potential_pw2rs                    896 12.8    0.058    0.070   82.157   82.656
 grid_collocate_task_list           896  9.9   74.814   79.240   74.814   79.240
 grid_integrate_task_list           896 12.8   75.706   78.564   75.706   78.564
 pw_transfer                      11627 11.8    1.017    1.215   71.773   73.106
 fft_wrap_pw1pw2                   9835 12.8    0.090    0.116   70.237   71.712
 mp_sum_d                          9316 10.4   29.648   70.131   29.648   70.131
 fft3d_ps                          9835 14.8    5.802    8.028   65.284   66.546
 qs_vxc_create                      896 10.8    0.023    0.040   57.095   57.943
 xc_vxc_pw_create                   896 11.8    0.447    0.561   57.071   57.917
 mp_alltoall_z22v                  9835 16.8   53.387   57.283   53.387   57.283
 mp_waitany                      129318 14.3   42.683   56.101   42.683   56.101
 mp_alltoall_d11v                 13495 12.2   50.528   55.324   50.528   55.324
 fft_wrap_pw1pw2_150               4459 13.1    1.760    2.222   53.418   54.942
 mp_waitall_1                   1080708 14.5   42.790   54.894   42.790   54.894
 -------------------------------------------------------------------------------

 -------------------------------------------------------------------------------
 -                       T I M I N G   128 CPUs (1 node)                       -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.030    0.032 3072.794 3072.814
 qs_mol_dyn_low                       1  2.0    0.006    0.007 3072.442 3072.528
 qs_forces                           21  4.0    0.006    0.008 3071.900 3071.921
 qs_energies                         21  5.0    0.003    0.004 3024.241 3024.317
 scf_env_do_scf                      21  6.0    0.004    0.006 2969.550 2971.818
 scf_env_do_scf_inner_loop          875  6.9    0.047    0.532 2969.499 2971.766
 velocity_verlet                     20  3.0    0.006    0.008 2794.534 2794.562
 qs_scf_new_mos                     875  7.9    0.023    0.028 2271.468 2273.627
 eigensolver                        875  8.9    0.095    0.206 2185.491 2186.709
 cp_fm_syevd                        895 10.0    0.020    0.031 1767.954 1770.227
 cp_fm_redistribute_end             895 11.0  286.821 1759.007  288.074 1759.686
 cp_fm_syevd_base                   895 10.9 1465.425 1740.700 1465.425 1740.700
 cp_fm_triangular_multiply         2625  9.9  410.654  416.288  410.654  416.288
 rebuild_ks_matrix                  896  8.8    0.008    0.010  361.723  362.634
 qs_ks_build_kohn_sham_matrix       896  9.8    0.188    0.243  361.716  362.627
 qs_ks_update_qs_env                875  7.9    0.024    0.035  345.631  346.530
 qs_rho_update_rho_low              896  7.9    0.011    0.020  319.796  320.976
 calculate_rho_elec                 896  8.9    0.281    0.443  319.785  320.964
 sum_up_and_integrate               896 10.8    0.543    0.936  261.772  262.746
 integrate_v_rspace                 896 11.8    0.037    0.048  261.227  262.364
 rs_pw_transfer                    7252 12.3    0.140    0.161  208.866  216.256
 density_rs2pw                      896  9.9    0.045    0.055  163.812  170.509
 grid_integrate_task_list           896 12.8  148.592  155.403  148.592  155.403
 grid_collocate_task_list           896  9.9  140.797  144.515  140.797  144.515
 mp_waitany                       89824 14.3  124.591  134.833  124.591  134.833
 rs_pw_transfer_RS2PW_150           938 11.9   17.719   20.395  113.331  121.388
 potential_pw2rs                    896 12.8    0.068    0.077   89.497   90.026
 qs_vxc_create                      896 10.8    0.030    0.051   79.793   81.837
 xc_vxc_pw_create                   896 11.8    0.725    0.982   79.764   81.813
 pw_transfer                      11627 11.8    0.899    1.097   68.900   73.972
 fft_wrap_pw1pw2                   9835 12.8    0.103    0.124   66.963   72.074
 fft_wrap_pw1pw2_150               4459 13.1    4.627    5.429   55.447   62.287
 mp_alltoall_d11v                 13495 12.2   51.004   61.546   51.004   61.546
 -------------------------------------------------------------------------------

 -------------------------------------------------------------------------------
 -                      T I M I N G   32 CPUs (1/4 node)                       -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.019    0.027 9152.866 9152.875
 qs_mol_dyn_low                       1  2.0    0.004    0.004 9152.558 9152.567
 qs_forces                           21  4.0    0.005    0.006 9152.148 9152.157
 qs_energies                         21  5.0    0.003    0.003 9047.928 9047.996
 scf_env_do_scf                      21  6.0    0.003    0.006 8925.472 8925.721
 scf_env_do_scf_inner_loop          875  6.9    0.058    0.394 8925.423 8925.670
 velocity_verlet                     20  3.0    0.003    0.004 8295.142 8295.162
 qs_scf_new_mos                     875  7.9    0.029    0.036 7036.433 7041.080
 eigensolver                        875  8.9    0.091    0.135 6743.683 6746.404
 cp_fm_syevd                        895 10.0    0.030    0.042 5143.201 5145.910
 cp_fm_syevd_base                   895 10.9 5142.498 5145.756 5142.498 5145.756
 cp_fm_triangular_multiply         2625  9.9 1559.272 1568.419 1559.272 1568.419
 rebuild_ks_matrix                  896  8.8    0.007    0.008 1019.477 1020.322
 qs_ks_build_kohn_sham_matrix       896  9.8    0.173    0.238 1019.470 1020.316
 qs_ks_update_qs_env                875  7.9    0.011    0.017  991.115  991.962
 sum_up_and_integrate               896 10.8    2.655    2.858  738.316  739.290
 integrate_v_rspace                 896 11.8    0.044    0.051  735.658  736.910
 qs_rho_update_rho_low              896  7.9    0.009    0.011  723.261  723.903
 calculate_rho_elec                 896  8.9    0.999    1.064  723.252  723.894
 grid_integrate_task_list           896 12.8  537.672  550.615  537.672  550.615
 grid_collocate_task_list           896  9.9  485.059  489.036  485.059  489.036
 pw_transfer                      11627 11.8    0.912    1.071  286.851  294.441
 fft_wrap_pw1pw2                   9835 12.8    0.129    0.145  275.251  282.867
 fft_wrap_pw1pw2_150               4459 13.1   27.356   28.582  256.292  266.093
 density_rs2pw                      896  9.9    0.055    0.061  218.157  223.442
 rs_pw_transfer                    7252 12.3    0.168    0.187  208.620  214.139
 fft3d_ps                          9835 14.8  127.773  133.888  194.643  207.229
 calculate_dm_sparse                895  8.9    0.181    0.222  186.857  191.068
 cp_dbcsr_plus_fm_fm_t_native       916  9.9    0.063    0.073  186.139  190.766
 qs_vxc_create                      896 10.8    0.029    0.036  182.589  186.938
 xc_vxc_pw_create                   896 11.8    3.250    4.661  182.560  186.906
 -------------------------------------------------------------------------------

