[CP2K-user] [CP2K:18691] parallelization when using diag

'k.doblh...@lic.leidenuniv.nl' via cp2k cp2k at googlegroups.com
Fri Apr 21 12:38:31 UTC 2023


Dear CP2K community,
I am trying to run a DFT-MD simulation of a system containing a metal slab
efficiently in parallel. I observed a strong loss of efficiency when going
from 32 cpu to 128 cpu (a full node on my system) and almost no speedup when
going from 128 cpu to 256 cpu (i.e., 2 nodes). With 2 nodes, the timings
seem to be dominated by cp_fm_redistribute_end (all timings are posted
below). I tried using OMP parallelization with a few OMP threads on top of
the MPI parallelization, but that made things worse. I also looked online
for benchmark tests using diagonalization but could not find any, so I do
not know what to expect. I therefore have two questions:
1) Is this behavior expected (given that I am using diagonalization rather
than OT), or might this be an issue with our compilation?
2) Is there an obvious fix for the issue (e.g., using ELPA, or something
else)?
Thank you for your help and best regards,
Katharina
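To put the scaling described above into numbers, here is a minimal Python sketch that computes speedup and parallel efficiency from the wall times in the timing reports below (the MAXIMUM TOTAL TIME of the top-level CP2K row in each report). The dictionary and helper names are illustrative only and are not part of CP2K.

```python
# Wall times in seconds: TOTAL TIME (MAXIMUM) of the top-level "CP2K" row
# in each timing report of this post, keyed by the number of cpus used.
wall_time = {32: 9152.875, 128: 3072.814, 256: 2676.163}


def speedup(t_ref: float, t_new: float) -> float:
    """Speedup of a run taking t_new seconds relative to one taking t_ref."""
    return t_ref / t_new


def efficiency(n_ref: int, n_new: int) -> float:
    """Parallel efficiency: achieved speedup divided by the ideal n_new / n_ref."""
    return speedup(wall_time[n_ref], wall_time[n_new]) / (n_new / n_ref)


for n_ref, n_new in [(32, 128), (128, 256), (32, 256)]:
    s = speedup(wall_time[n_ref], wall_time[n_new])
    e = efficiency(n_ref, n_new)
    print(f"{n_ref:>3} -> {n_new:>3} cpu: speedup {s:.2f}x "
          f"(ideal {n_new // n_ref}x), efficiency {e:.0%}")
```

With these values the sketch reports a speedup of about 2.98x for 4x the cpus (roughly 74% efficiency) and about 1.15x when doubling from 128 to 256 cpus (roughly 57% efficiency).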

 -                                T I M I N G  256 cpu (2 nodes)               -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.043    0.204 2676.141 2676.163
 qs_mol_dyn_low                       1  2.0    0.006    0.008 2675.630 2675.692
 qs_forces                           21  4.0    0.007    0.009 2674.790 2674.812
 qs_energies                         21  5.0    0.003    0.004 2648.558 2648.605
 scf_env_do_scf                      21  6.0    0.003    0.004 2599.728 2602.333
 scf_env_do_scf_inner_loop          875  6.9    0.043    1.347 2599.678 2602.284
 velocity_verlet                     20  3.0    0.007    0.009 2418.820 2418.843
 qs_scf_new_mos                     875  7.9    0.018    0.022 2160.453 2162.360
 eigensolver                        875  8.9    0.096    0.267 2095.414 2096.092
 cp_fm_syevd                        895 10.0    0.015    0.025 1791.707 1795.858
 cp_fm_redistribute_end             895 11.0 1037.764 1785.744 1039.147 1786.479
 cp_fm_syevd_base                   895 10.9  739.888 1759.452  739.888 1759.452
 cp_fm_triangular_multiply         2625  9.9  301.128  307.525  301.128  307.525
 rebuild_ks_matrix                  896  8.8    0.006    0.008  248.047  248.107
 qs_ks_build_kohn_sham_matrix       896  9.8    0.177    0.233  248.041  248.102
 qs_ks_update_qs_env                875  7.9    0.023    0.029  241.842  241.902
 sum_up_and_integrate               896 10.8    0.207    0.444  177.563  177.730
 integrate_v_rspace                 896 11.8    0.034    0.051  177.354  177.596
 qs_rho_update_rho_low              896  7.9    0.009    0.013  176.786  177.420
 calculate_rho_elec                 896  8.9    0.162    0.220  176.778  177.411
 rs_pw_transfer                    7252 12.3    0.146    0.182  131.353  136.114
 density_rs2pw                      896  9.9    0.041    0.052   89.497   93.288
 potential_pw2rs                    896 12.8    0.058    0.070   82.157   82.656
 grid_collocate_task_list           896  9.9   74.814   79.240   74.814   79.240
 grid_integrate_task_list           896 12.8   75.706   78.564   75.706   78.564
 pw_transfer                      11627 11.8    1.017    1.215   71.773   73.106
 fft_wrap_pw1pw2                   9835 12.8    0.090    0.116   70.237   71.712
 mp_sum_d                          9316 10.4   29.648   70.131   29.648   70.131
 fft3d_ps                          9835 14.8    5.802    8.028   65.284   66.546
 qs_vxc_create                      896 10.8    0.023    0.040   57.095   57.943
 xc_vxc_pw_create                   896 11.8    0.447    0.561   57.071   57.917
 mp_alltoall_z22v                  9835 16.8   53.387   57.283   53.387   57.283
 mp_waitany                      129318 14.3   42.683   56.101   42.683   56.101
 mp_alltoall_d11v                 13495 12.2   50.528   55.324   50.528   55.324
 fft_wrap_pw1pw2_150               4459 13.1    1.760    2.222   53.418   54.942
 mp_waitall_1                   1080708 14.5   42.790   54.894   42.790   54.894
 -------------------------------------------------------------------------------

 -                                T I M I N G  128 cpu (1 node)                -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.030    0.032 3072.794 3072.814
 qs_mol_dyn_low                       1  2.0    0.006    0.007 3072.442 3072.528
 qs_forces                           21  4.0    0.006    0.008 3071.900 3071.921
 qs_energies                         21  5.0    0.003    0.004 3024.241 3024.317
 scf_env_do_scf                      21  6.0    0.004    0.006 2969.550 2971.818
 scf_env_do_scf_inner_loop          875  6.9    0.047    0.532 2969.499 2971.766
 velocity_verlet                     20  3.0    0.006    0.008 2794.534 2794.562
 qs_scf_new_mos                     875  7.9    0.023    0.028 2271.468 2273.627
 eigensolver                        875  8.9    0.095    0.206 2185.491 2186.709
 cp_fm_syevd                        895 10.0    0.020    0.031 1767.954 1770.227
 cp_fm_redistribute_end             895 11.0  286.821 1759.007  288.074 1759.686
 cp_fm_syevd_base                   895 10.9 1465.425 1740.700 1465.425 1740.700
 cp_fm_triangular_multiply         2625  9.9  410.654  416.288  410.654  416.288
 rebuild_ks_matrix                  896  8.8    0.008    0.010  361.723  362.634
 qs_ks_build_kohn_sham_matrix       896  9.8    0.188    0.243  361.716  362.627
 qs_ks_update_qs_env                875  7.9    0.024    0.035  345.631  346.530
 qs_rho_update_rho_low              896  7.9    0.011    0.020  319.796  320.976
 calculate_rho_elec                 896  8.9    0.281    0.443  319.785  320.964
 sum_up_and_integrate               896 10.8    0.543    0.936  261.772  262.746
 integrate_v_rspace                 896 11.8    0.037    0.048  261.227  262.364
 rs_pw_transfer                    7252 12.3    0.140    0.161  208.866  216.256
 density_rs2pw                      896  9.9    0.045    0.055  163.812  170.509
 grid_integrate_task_list           896 12.8  148.592  155.403  148.592  155.403
 grid_collocate_task_list           896  9.9  140.797  144.515  140.797  144.515
 mp_waitany                       89824 14.3  124.591  134.833  124.591  134.833
 rs_pw_transfer_RS2PW_150           938 11.9   17.719   20.395  113.331  121.388
 potential_pw2rs                    896 12.8    0.068    0.077   89.497   90.026
 qs_vxc_create                      896 10.8    0.030    0.051   79.793   81.837
 xc_vxc_pw_create                   896 11.8    0.725    0.982   79.764   81.813
 pw_transfer                      11627 11.8    0.899    1.097   68.900   73.972
 fft_wrap_pw1pw2                   9835 12.8    0.103    0.124   66.963   72.074
 fft_wrap_pw1pw2_150               4459 13.1    4.627    5.429   55.447   62.287
 mp_alltoall_d11v                 13495 12.2   51.004   61.546   51.004   61.546
 -------------------------------------------------------------------------------
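Comparing the cp_fm_redistribute_end and cp_fm_syevd_base rows of the 256-cpu and 128-cpu reports above, the short Python sketch below (values copied by hand from the tables; the helper name is hypothetical) shows how the average self time of these two routines is split between them in each run:

```python
# SELF TIME (AVERAGE, MAXIMUM) in seconds, copied from the 256-cpu and
# 128-cpu timing reports above.
self_time = {
    256: {"cp_fm_redistribute_end": (1037.764, 1785.744),
          "cp_fm_syevd_base": (739.888, 1759.452)},
    128: {"cp_fm_redistribute_end": (286.821, 1759.007),
          "cp_fm_syevd_base": (1465.425, 1740.700)},
}


def redistribute_share(ncpu: int) -> float:
    """Fraction of the combined average self time of the two routines
    that is spent in cp_fm_redistribute_end."""
    redist_avg = self_time[ncpu]["cp_fm_redistribute_end"][0]
    syevd_avg = self_time[ncpu]["cp_fm_syevd_base"][0]
    return redist_avg / (redist_avg + syevd_avg)


for ncpu in (128, 256):
    combined_avg = sum(avg for avg, _ in self_time[ncpu].values())
    print(f"{ncpu} cpu: combined average {combined_avg:.0f} s, "
          f"{redistribute_share(ncpu):.0%} of it in cp_fm_redistribute_end")
```

With these numbers the combined average is about 1752 s at 128 cpus and about 1778 s at 256 cpus, while the share spent in cp_fm_redistribute_end grows from about 16% to about 58%: the time moves from cp_fm_syevd_base into cp_fm_redistribute_end rather than shrinking.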

 -                                T I M I N G  32 cpu (1/4 of a node)          -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.019    0.027 9152.866 9152.875
 qs_mol_dyn_low                       1  2.0    0.004    0.004 9152.558 9152.567
 qs_forces                           21  4.0    0.005    0.006 9152.148 9152.157
 qs_energies                         21  5.0    0.003    0.003 9047.928 9047.996
 scf_env_do_scf                      21  6.0    0.003    0.006 8925.472 8925.721
 scf_env_do_scf_inner_loop          875  6.9    0.058    0.394 8925.423 8925.670
 velocity_verlet                     20  3.0    0.003    0.004 8295.142 8295.162
 qs_scf_new_mos                     875  7.9    0.029    0.036 7036.433 7041.080
 eigensolver                        875  8.9    0.091    0.135 6743.683 6746.404
 cp_fm_syevd                        895 10.0    0.030    0.042 5143.201 5145.910
 cp_fm_syevd_base                   895 10.9 5142.498 5145.756 5142.498 5145.756
 cp_fm_triangular_multiply         2625  9.9 1559.272 1568.419 1559.272 1568.419
 rebuild_ks_matrix                  896  8.8    0.007    0.008 1019.477 1020.322
 qs_ks_build_kohn_sham_matrix       896  9.8    0.173    0.238 1019.470 1020.316
 qs_ks_update_qs_env                875  7.9    0.011    0.017  991.115  991.962
 sum_up_and_integrate               896 10.8    2.655    2.858  738.316  739.290
 integrate_v_rspace                 896 11.8    0.044    0.051  735.658  736.910
 qs_rho_update_rho_low              896  7.9    0.009    0.011  723.261  723.903
 calculate_rho_elec                 896  8.9    0.999    1.064  723.252  723.894
 grid_integrate_task_list           896 12.8  537.672  550.615  537.672  550.615
 grid_collocate_task_list           896  9.9  485.059  489.036  485.059  489.036
 pw_transfer                      11627 11.8    0.912    1.071  286.851  294.441
 fft_wrap_pw1pw2                   9835 12.8    0.129    0.145  275.251  282.867
 fft_wrap_pw1pw2_150               4459 13.1   27.356   28.582  256.292  266.093
 density_rs2pw                      896  9.9    0.055    0.061  218.157  223.442
 rs_pw_transfer                    7252 12.3    0.168    0.187  208.620  214.139
 fft3d_ps                          9835 14.8  127.773  133.888  194.643  207.229
 calculate_dm_sparse                895  8.9    0.181    0.222  186.857  191.068
 cp_dbcsr_plus_fm_fm_t_native       916  9.9    0.063    0.073  186.139  190.766
 qs_vxc_create                      896 10.8    0.029    0.036  182.589  186.938
 xc_vxc_pw_create                   896 11.8    3.250    4.661  182.560  186.906
 -------------------------------------------------------------------------------
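Putting the three reports side by side, the following Python sketch (again with hand-copied values and illustrative helper names) compares how the dense eigensolver and the real-space grid routines scale from 128 to 256 cpus, and what share of the whole run cp_fm_syevd takes at each cpu count:

```python
# TOTAL TIME (MAXIMUM) in seconds for selected routines, copied from the
# three timing reports in this post; inner keys are the cpu counts.
total_time = {
    "CP2K":                     {32: 9152.875, 128: 3072.814, 256: 2676.163},
    "cp_fm_syevd":              {32: 5145.910, 128: 1770.227, 256: 1795.858},
    "grid_collocate_task_list": {32: 489.036, 128: 144.515, 256: 79.240},
    "grid_integrate_task_list": {32: 550.615, 128: 155.403, 256: 78.564},
}


def routine_speedup(name: str, n_ref: int, n_new: int) -> float:
    """How much faster a routine ran with n_new cpus than with n_ref cpus."""
    return total_time[name][n_ref] / total_time[name][n_new]


def fraction_of_run(name: str, ncpu: int) -> float:
    """Share of the whole run's wall time spent inside a routine."""
    return total_time[name][ncpu] / total_time["CP2K"][ncpu]


for name in total_time:
    if name == "CP2K":
        continue  # the top-level row is the reference, not a routine to compare
    print(f"{name:<26} 128->256 speedup {routine_speedup(name, 128, 256):.2f}x")

for ncpu in (32, 128, 256):
    print(f"{ncpu:>3} cpu: {fraction_of_run('cp_fm_syevd', ncpu):.0%} "
          "of wall time in cp_fm_syevd")
```

With these values cp_fm_syevd is slightly slower at 256 than at 128 cpus (about 0.99x), while grid_collocate_task_list and grid_integrate_task_list speed up by about 1.82x and 1.98x. As a result, the share of the wall time spent in cp_fm_syevd rises from about 56% (32 cpu) and 58% (128 cpu) to about 67% (256 cpu).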
