[CP2K-user] [CP2K:18693] parallelization when using diag

'Doblhoff-Dier, K. (Katharina)' via cp2k cp2k at googlegroups.com
Fri Apr 21 21:19:56 UTC 2023


Dear Marcella, Dear Jürg,
Thank you for your replies! So once cp_fm_redistribute_end becomes dominant, there is nothing more one can do? Is that how I should read your answers?
What exactly does "cp_fm_redistribute_end" measure? (Sorry if this is a basic question; I had already tried to figure it out, but I could not work out exactly what it covers.)
Thank you and best regards,
Katharina

________________________________
From: cp2k at googlegroups.com <cp2k at googlegroups.com> on behalf of Marcella Iannuzzi <marci.akira at gmail.com>
Sent: Friday, April 21, 2023 8:16 PM
To: cp2k <cp2k at googlegroups.com>
Subject: Re: [CP2K:18693] parallelization when using diag



Hi Katharina,

Indeed, ELPA could help a bit, but not significantly, because with the present implementation
cp_fm_redistribute_end quickly becomes dominant as the number of nodes increases.

Regards
Marcella

On Friday, April 21, 2023 at 5:38:55 PM UTC+2 Jürg Hutter wrote:
Hi

it seems your calculation is dominated by diagonalization. From the timing data we see that

  cp_fm_syevd (total time):   32 cores -> 5146 s
                             128 cores -> 1770 s
                             256 cores -> 1796 s

shows a speedup from 32 to 128 of 2.9 and gets slightly slower from 128 to 256.
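For reference, the speedup follows directly from the cp_fm_syevd total times quoted above:

  speedup(32 -> 128 cores)  = 5146 s / 1770 s ~ 2.9   (ideal would be 128/32 = 4.0)
  speedup(128 -> 256 cores) = 1770 s / 1796 s ~ 0.99  (i.e., slightly slower)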
This is well known for ScaLAPACK routines. There is also no gain in using OpenMP for most
ScaLAPACK routines.
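As an illustration, a pure-MPI launch (one rank per core, a single OpenMP thread) can look like
the sketch below; the launcher, rank count and binary name are assumptions that depend on your
machine and installation:

  # Use a single OpenMP thread per MPI rank, since ScaLAPACK gains little from threading.
  export OMP_NUM_THREADS=1
  # 128 MPI ranks on one node; adjust the launcher (mpirun/srun) and rank count to your system.
  mpirun -np 128 cp2k.psmp -i md.inp -o md.out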

You should try the ELPA library, which was specifically developed for such cases.
See the examples on how to install and activate ELPA in CP2K.
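As a rough sketch (the exact keyword names and values depend on your CP2K version, so please check
the input reference manual), a binary compiled with ELPA support (-D__ELPA) can be asked to use
ELPA for full-matrix diagonalization via the GLOBAL section of the input:

  &GLOBAL
    ! Use ELPA instead of ScaLAPACK for full-matrix diagonalization.
    ! Requires CP2K to be built with -D__ELPA and linked against the ELPA library,
    ! e.g. via the CP2K toolchain (tools/toolchain/install_cp2k_toolchain.sh).
    PREFERRED_DIAG_LIBRARY ELPA
  &END GLOBAL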

regards

JH

________________________________________
From: 'k.doblh... at lic.leidenuniv.nl' via cp2k <cp... at googlegroups.com>
Sent: Friday, April 21, 2023 2:38 PM
To: cp2k
Subject: [CP2K:18691] parallelization when using diag

Dear CP2K community,
I am trying to run a DFT-MD simulation of a system containing a metal slab efficiently in parallel. I observed a strong loss of efficiency when going from 32 CPUs to 128 CPUs (a full node on my system) and no speedup at all when going from 128 CPUs to 256 CPUs (i.e., two nodes). When going to two nodes, the timings seem to be dominated by cp_fm_redistribute_end (I will post all timings below). I tried using OpenMP parallelization on top of the MPI parallelization with a few OpenMP threads, but that made things worse. I also looked for benchmark tests for diagonalization online, but could find none, so I do not know what to expect. Therefore, I have two questions:
1) Is this behavior expected (given that I am using diagonalization and not OT), or might this be an issue with our compilation?
2) Is there an obvious fix for the issue (e.g., use ELPA, or whatever else)?
Thank you for your help and best regards,
Katharina

- T I M I N G : 256 CPUs (2 nodes), times in seconds -
-------------------------------------------------------------------------------
SUBROUTINE  CALLS(MAX)  ASD  SELF_TIME(AVG)  SELF_TIME(MAX)  TOTAL_TIME(AVG)  TOTAL_TIME(MAX)
CP2K 1 1.0 0.043 0.204 2676.141 2676.163
qs_mol_dyn_low 1 2.0 0.006 0.008 2675.630 2675.692
qs_forces 21 4.0 0.007 0.009 2674.790 2674.812
qs_energies 21 5.0 0.003 0.004 2648.558 2648.605
scf_env_do_scf 21 6.0 0.003 0.004 2599.728 2602.333
scf_env_do_scf_inner_loop 875 6.9 0.043 1.347 2599.678 2602.284
velocity_verlet 20 3.0 0.007 0.009 2418.820 2418.843
qs_scf_new_mos 875 7.9 0.018 0.022 2160.453 2162.360
eigensolver 875 8.9 0.096 0.267 2095.414 2096.092
cp_fm_syevd 895 10.0 0.015 0.025 1791.707 1795.858
cp_fm_redistribute_end 895 11.0 1037.764 1785.744 1039.147 1786.479
cp_fm_syevd_base 895 10.9 739.888 1759.452 739.888 1759.452
cp_fm_triangular_multiply 2625 9.9 301.128 307.525 301.128 307.525
rebuild_ks_matrix 896 8.8 0.006 0.008 248.047 248.107
qs_ks_build_kohn_sham_matrix 896 9.8 0.177 0.233 248.041 248.102
qs_ks_update_qs_env 875 7.9 0.023 0.029 241.842 241.902
sum_up_and_integrate 896 10.8 0.207 0.444 177.563 177.730
integrate_v_rspace 896 11.8 0.034 0.051 177.354 177.596
qs_rho_update_rho_low 896 7.9 0.009 0.013 176.786 177.420
calculate_rho_elec 896 8.9 0.162 0.220 176.778 177.411
rs_pw_transfer 7252 12.3 0.146 0.182 131.353 136.114
density_rs2pw 896 9.9 0.041 0.052 89.497 93.288
potential_pw2rs 896 12.8 0.058 0.070 82.157 82.656
grid_collocate_task_list 896 9.9 74.814 79.240 74.814 79.240
grid_integrate_task_list 896 12.8 75.706 78.564 75.706 78.564
pw_transfer 11627 11.8 1.017 1.215 71.773 73.106
fft_wrap_pw1pw2 9835 12.8 0.090 0.116 70.237 71.712
mp_sum_d 9316 10.4 29.648 70.131 29.648 70.131
fft3d_ps 9835 14.8 5.802 8.028 65.284 66.546
qs_vxc_create 896 10.8 0.023 0.040 57.095 57.943
xc_vxc_pw_create 896 11.8 0.447 0.561 57.071 57.917
mp_alltoall_z22v 9835 16.8 53.387 57.283 53.387 57.283
mp_waitany 129318 14.3 42.683 56.101 42.683 56.101
mp_alltoall_d11v 13495 12.2 50.528 55.324 50.528 55.324
fft_wrap_pw1pw2_150 4459 13.1 1.760 2.222 53.418 54.942
mp_waitall_1 1080708 14.5 42.790 54.894 42.790 54.894
-------------------------------------------------------------------------------

- T I M I N G : 128 CPUs (1 node), times in seconds -
-------------------------------------------------------------------------------
SUBROUTINE  CALLS(MAX)  ASD  SELF_TIME(AVG)  SELF_TIME(MAX)  TOTAL_TIME(AVG)  TOTAL_TIME(MAX)
CP2K 1 1.0 0.030 0.032 3072.794 3072.814
qs_mol_dyn_low 1 2.0 0.006 0.007 3072.442 3072.528
qs_forces 21 4.0 0.006 0.008 3071.900 3071.921
qs_energies 21 5.0 0.003 0.004 3024.241 3024.317
scf_env_do_scf 21 6.0 0.004 0.006 2969.550 2971.818
scf_env_do_scf_inner_loop 875 6.9 0.047 0.532 2969.499 2971.766
velocity_verlet 20 3.0 0.006 0.008 2794.534 2794.562
qs_scf_new_mos 875 7.9 0.023 0.028 2271.468 2273.627
eigensolver 875 8.9 0.095 0.206 2185.491 2186.709
cp_fm_syevd 895 10.0 0.020 0.031 1767.954 1770.227
cp_fm_redistribute_end 895 11.0 286.821 1759.007 288.074 1759.686
cp_fm_syevd_base 895 10.9 1465.425 1740.700 1465.425 1740.700
cp_fm_triangular_multiply 2625 9.9 410.654 416.288 410.654 416.288
rebuild_ks_matrix 896 8.8 0.008 0.010 361.723 362.634
qs_ks_build_kohn_sham_matrix 896 9.8 0.188 0.243 361.716 362.627
qs_ks_update_qs_env 875 7.9 0.024 0.035 345.631 346.530
qs_rho_update_rho_low 896 7.9 0.011 0.020 319.796 320.976
calculate_rho_elec 896 8.9 0.281 0.443 319.785 320.964
sum_up_and_integrate 896 10.8 0.543 0.936 261.772 262.746
integrate_v_rspace 896 11.8 0.037 0.048 261.227 262.364
rs_pw_transfer 7252 12.3 0.140 0.161 208.866 216.256
density_rs2pw 896 9.9 0.045 0.055 163.812 170.509
grid_integrate_task_list 896 12.8 148.592 155.403 148.592 155.403
grid_collocate_task_list 896 9.9 140.797 144.515 140.797 144.515
mp_waitany 89824 14.3 124.591 134.833 124.591 134.833
rs_pw_transfer_RS2PW_150 938 11.9 17.719 20.395 113.331 121.388
potential_pw2rs 896 12.8 0.068 0.077 89.497 90.026
qs_vxc_create 896 10.8 0.030 0.051 79.793 81.837
xc_vxc_pw_create 896 11.8 0.725 0.982 79.764 81.813
pw_transfer 11627 11.8 0.899 1.097 68.900 73.972
fft_wrap_pw1pw2 9835 12.8 0.103 0.124 66.963 72.074
fft_wrap_pw1pw2_150 4459 13.1 4.627 5.429 55.447 62.287
mp_alltoall_d11v 13495 12.2 51.004 61.546 51.004 61.546
-------------------------------------------------------------------------------

- T I M I N G : 32 CPUs (1/4 of a node), times in seconds -
-------------------------------------------------------------------------------
SUBROUTINE  CALLS(MAX)  ASD  SELF_TIME(AVG)  SELF_TIME(MAX)  TOTAL_TIME(AVG)  TOTAL_TIME(MAX)
CP2K 1 1.0 0.019 0.027 9152.866 9152.875
qs_mol_dyn_low 1 2.0 0.004 0.004 9152.558 9152.567
qs_forces 21 4.0 0.005 0.006 9152.148 9152.157
qs_energies 21 5.0 0.003 0.003 9047.928 9047.996
scf_env_do_scf 21 6.0 0.003 0.006 8925.472 8925.721
scf_env_do_scf_inner_loop 875 6.9 0.058 0.394 8925.423 8925.670
velocity_verlet 20 3.0 0.003 0.004 8295.142 8295.162
qs_scf_new_mos 875 7.9 0.029 0.036 7036.433 7041.080
eigensolver 875 8.9 0.091 0.135 6743.683 6746.404
cp_fm_syevd 895 10.0 0.030 0.042 5143.201 5145.910
cp_fm_syevd_base 895 10.9 5142.498 5145.756 5142.498 5145.756
cp_fm_triangular_multiply 2625 9.9 1559.272 1568.419 1559.272 1568.419
rebuild_ks_matrix 896 8.8 0.007 0.008 1019.477 1020.322
qs_ks_build_kohn_sham_matrix 896 9.8 0.173 0.238 1019.470 1020.316
qs_ks_update_qs_env 875 7.9 0.011 0.017 991.115 991.962
sum_up_and_integrate 896 10.8 2.655 2.858 738.316 739.290
integrate_v_rspace 896 11.8 0.044 0.051 735.658 736.910
qs_rho_update_rho_low 896 7.9 0.009 0.011 723.261 723.903
calculate_rho_elec 896 8.9 0.999 1.064 723.252 723.894
grid_integrate_task_list 896 12.8 537.672 550.615 537.672 550.615
grid_collocate_task_list 896 9.9 485.059 489.036 485.059 489.036
pw_transfer 11627 11.8 0.912 1.071 286.851 294.441
fft_wrap_pw1pw2 9835 12.8 0.129 0.145 275.251 282.867
fft_wrap_pw1pw2_150 4459 13.1 27.356 28.582 256.292 266.093
density_rs2pw 896 9.9 0.055 0.061 218.157 223.442
rs_pw_transfer 7252 12.3 0.168 0.187 208.620 214.139
fft3d_ps 9835 14.8 127.773 133.888 194.643 207.229
calculate_dm_sparse 895 8.9 0.181 0.222 186.857 191.068
cp_dbcsr_plus_fm_fm_t_native 916 9.9 0.063 0.073 186.139 190.766
qs_vxc_create 896 10.8 0.029 0.036 182.589 186.938
xc_vxc_pw_create 896 11.8 3.250 4.661 182.560 186.906
-------------------------------------------------------------------------------

