<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div>Dear Marcella, Dear Jürg,</div>
<div>Thank you for your replies! So once cp_fm_redistribute_end becomes dominant, there is nothing more one can do? Is that how I should read your answer?</div>
<div>And what exactly is "cp_fm_redistribute_end"? (Sorry if this is a stupid question; I had already tried to figure it out, but I ended up not quite sure what it measures.)<br>
</div>
<div>Thank you and best regards,</div>
<div>Katharina</div>
<div></div>
<br>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size: 11pt;" color="#000000" data-ogsc=""><b>From:</b> cp2k@googlegroups.com <cp2k@googlegroups.com> on behalf of Marcella Iannuzzi <marci.akira@gmail.com><br>
<b>Sent:</b> Friday, April 21, 2023 8:16 PM<br>
<b>To:</b> cp2k <cp2k@googlegroups.com><br>
<b>Subject:</b> Re: [CP2K:18693] parallelization when using diag</font>
<div> </div>
</div>
<div>
<div><br>
</div>
<div><br>
</div>
Hi Katharina,
<div><br>
</div>
<div>Indeed, ELPA could help a bit, but not significantly: with the present implementation, cp_fm_redistribute_end quickly becomes dominant as the number of nodes increases.</div>
<div><br>
</div>
<div>Regards</div>
<div>Marcella<br>
<br>
</div>
<div class="x_gmail_quote">
<div dir="auto" class="x_gmail_attr">On Friday, April 21, 2023 at 5:38:55 PM UTC+2 Jürg Hutter wrote:<br>
</div>
<blockquote class="x_gmail_quote" style="margin:0 0 0 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
Hi <br>
<br>
It seems your calculation is dominated by diagonalization. From the timing data we see that cp_fm_syevd (total time in seconds) scales as follows:
<br>
<br>
cp_fm_syevd &nbsp; 32 cores: 5146 s <br>
cp_fm_syevd &nbsp; 128 cores: 1770 s <br>
cp_fm_syevd &nbsp; 256 cores: 1796 s <br>
<br>
That is a speedup of only about 2.9 from 32 to 128 cores (5146 / 1770, against an ideal factor of 4), and a slight slowdown from 128 to 256 cores. <br>
This behavior is well known for ScaLAPACK routines. There is also no gain in using OpenMP for most ScaLAPACK routines. <br>
<br>
You should try the ELPA library, which was developed specifically for such cases. <br>
See the examples on how to install and activate ELPA in CP2K. <br>
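<br>
For reference, a minimal sketch of how ELPA is typically activated, assuming the CP2K binary has been built with ELPA support (the exact keyword should be checked against the manual of your CP2K version): <br>
<br>
&GLOBAL <br>
&nbsp;&nbsp;PREFERRED_DIAG_LIBRARY ELPA <br>
&END GLOBAL <br>
<br>
Without ELPA linked in at build time this setting has no effect, so the library must be enabled when compiling CP2K (e.g. via the toolchain). <br>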
<br>
regards <br>
<br>
JH <br>
<br>
________________________________________ <br>
From: 'k.doblh...@lic.leidenuniv.nl' via cp2k <cp...@googlegroups.com> <br>
Sent: Friday, April 21, 2023 2:38 PM <br>
To: cp2k <br>
Subject: [CP2K:18691] parallelization when using diag <br>
<br>
Dear CP2K community, <br>
I am trying to run a DFT-MD simulation of a system containing a metal slab efficiently in parallel. I observed a strong loss of efficiency when going from 32 cores to 128 cores (a full node on my system) and no speedup at all when going from 128 cores to 256 cores (i.e., 2 nodes). On 2 nodes, the timings appear to be dominated by cp_fm_redistribute_end (all timings are posted below). I tried adding OpenMP parallelization on top of the MPI parallelization with a few OpenMP threads, but that made things worse. I also looked for benchmark tests for diag online, but could find none, so I do not know what to expect. I therefore have two questions:
<br>
1) Is this behavior expected (given that I am using diagonalization and not OT), or might this be an issue with our compilation?
<br>
2) Is there an obvious fix for the issue (e.g., using ELPA, or something else)? <br>
Thank you for your help and best regards, <br>
Katharina <br>
<br>
- T I M I N G 256 cpu (2 nodes) - <br>
- - <br>
------------------------------------------------------------------------------- <br>
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME <br>
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM <br>
CP2K 1 1.0 0.043 0.204 2676.141 2676.163 <br>
qs_mol_dyn_low 1 2.0 0.006 0.008 2675.630 2675.692 <br>
qs_forces 21 4.0 0.007 0.009 2674.790 2674.812 <br>
qs_energies 21 5.0 0.003 0.004 2648.558 2648.605 <br>
scf_env_do_scf 21 6.0 0.003 0.004 2599.728 2602.333 <br>
scf_env_do_scf_inner_loop 875 6.9 0.043 1.347 2599.678 2602.284 <br>
velocity_verlet 20 3.0 0.007 0.009 2418.820 2418.843 <br>
qs_scf_new_mos 875 7.9 0.018 0.022 2160.453 2162.360 <br>
eigensolver 875 8.9 0.096 0.267 2095.414 2096.092 <br>
cp_fm_syevd 895 10.0 0.015 0.025 1791.707 1795.858 <br>
cp_fm_redistribute_end 895 11.0 1037.764 1785.744 1039.147 1786.479 <br>
cp_fm_syevd_base 895 10.9 739.888 1759.452 739.888 1759.452 <br>
cp_fm_triangular_multiply 2625 9.9 301.128 307.525 301.128 307.525 <br>
rebuild_ks_matrix 896 8.8 0.006 0.008 248.047 248.107 <br>
qs_ks_build_kohn_sham_matrix 896 9.8 0.177 0.233 248.041 248.102 <br>
qs_ks_update_qs_env 875 7.9 0.023 0.029 241.842 241.902 <br>
sum_up_and_integrate 896 10.8 0.207 0.444 177.563 177.730 <br>
integrate_v_rspace 896 11.8 0.034 0.051 177.354 177.596 <br>
qs_rho_update_rho_low 896 7.9 0.009 0.013 176.786 177.420 <br>
calculate_rho_elec 896 8.9 0.162 0.220 176.778 177.411 <br>
rs_pw_transfer 7252 12.3 0.146 0.182 131.353 136.114 <br>
density_rs2pw 896 9.9 0.041 0.052 89.497 93.288 <br>
potential_pw2rs 896 12.8 0.058 0.070 82.157 82.656 <br>
grid_collocate_task_list 896 9.9 74.814 79.240 74.814 79.240 <br>
grid_integrate_task_list 896 12.8 75.706 78.564 75.706 78.564 <br>
pw_transfer 11627 11.8 1.017 1.215 71.773 73.106 <br>
fft_wrap_pw1pw2 9835 12.8 0.090 0.116 70.237 71.712 <br>
mp_sum_d 9316 10.4 29.648 70.131 29.648 70.131 <br>
fft3d_ps 9835 14.8 5.802 8.028 65.284 66.546 <br>
qs_vxc_create 896 10.8 0.023 0.040 57.095 57.943 <br>
xc_vxc_pw_create 896 11.8 0.447 0.561 57.071 57.917 <br>
mp_alltoall_z22v 9835 16.8 53.387 57.283 53.387 57.283 <br>
mp_waitany 129318 14.3 42.683 56.101 42.683 56.101 <br>
mp_alltoall_d11v 13495 12.2 50.528 55.324 50.528 55.324 <br>
fft_wrap_pw1pw2_150 4459 13.1 1.760 2.222 53.418 54.942 <br>
mp_waitall_1 1080708 14.5 42.790 54.894 42.790 54.894 <br>
------------------------------------------------------------------------------- <br>
<br>
- T I M I N G 128cpu (1 node) - <br>
- - <br>
------------------------------------------------------------------------------- <br>
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME <br>
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM <br>
CP2K 1 1.0 0.030 0.032 3072.794 3072.814 <br>
qs_mol_dyn_low 1 2.0 0.006 0.007 3072.442 3072.528 <br>
qs_forces 21 4.0 0.006 0.008 3071.900 3071.921 <br>
qs_energies 21 5.0 0.003 0.004 3024.241 3024.317 <br>
scf_env_do_scf 21 6.0 0.004 0.006 2969.550 2971.818 <br>
scf_env_do_scf_inner_loop 875 6.9 0.047 0.532 2969.499 2971.766 <br>
velocity_verlet 20 3.0 0.006 0.008 2794.534 2794.562 <br>
qs_scf_new_mos 875 7.9 0.023 0.028 2271.468 2273.627 <br>
eigensolver 875 8.9 0.095 0.206 2185.491 2186.709 <br>
cp_fm_syevd 895 10.0 0.020 0.031 1767.954 1770.227 <br>
cp_fm_redistribute_end 895 11.0 286.821 1759.007 288.074 1759.686 <br>
cp_fm_syevd_base 895 10.9 1465.425 1740.700 1465.425 1740.700 <br>
cp_fm_triangular_multiply 2625 9.9 410.654 416.288 410.654 416.288 <br>
rebuild_ks_matrix 896 8.8 0.008 0.010 361.723 362.634 <br>
qs_ks_build_kohn_sham_matrix 896 9.8 0.188 0.243 361.716 362.627 <br>
qs_ks_update_qs_env 875 7.9 0.024 0.035 345.631 346.530 <br>
qs_rho_update_rho_low 896 7.9 0.011 0.020 319.796 320.976 <br>
calculate_rho_elec 896 8.9 0.281 0.443 319.785 320.964 <br>
sum_up_and_integrate 896 10.8 0.543 0.936 261.772 262.746 <br>
integrate_v_rspace 896 11.8 0.037 0.048 261.227 262.364 <br>
rs_pw_transfer 7252 12.3 0.140 0.161 208.866 216.256 <br>
density_rs2pw 896 9.9 0.045 0.055 163.812 170.509 <br>
grid_integrate_task_list 896 12.8 148.592 155.403 148.592 155.403 <br>
grid_collocate_task_list 896 9.9 140.797 144.515 140.797 144.515 <br>
mp_waitany 89824 14.3 124.591 134.833 124.591 134.833 <br>
rs_pw_transfer_RS2PW_150 938 11.9 17.719 20.395 113.331 121.388 <br>
potential_pw2rs 896 12.8 0.068 0.077 89.497 90.026 <br>
qs_vxc_create 896 10.8 0.030 0.051 79.793 81.837 <br>
xc_vxc_pw_create 896 11.8 0.725 0.982 79.764 81.813 <br>
pw_transfer 11627 11.8 0.899 1.097 68.900 73.972 <br>
fft_wrap_pw1pw2 9835 12.8 0.103 0.124 66.963 72.074 <br>
fft_wrap_pw1pw2_150 4459 13.1 4.627 5.429 55.447 62.287 <br>
mp_alltoall_d11v 13495 12.2 51.004 61.546 51.004 61.546 <br>
------------------------------------------------------------------------------- <br>
<br>
- T I M I N G 32cpu (1/4 of a node) - <br>
- - <br>
------------------------------------------------------------------------------- <br>
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME <br>
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM <br>
CP2K 1 1.0 0.019 0.027 9152.866 9152.875 <br>
qs_mol_dyn_low 1 2.0 0.004 0.004 9152.558 9152.567 <br>
qs_forces 21 4.0 0.005 0.006 9152.148 9152.157 <br>
qs_energies 21 5.0 0.003 0.003 9047.928 9047.996 <br>
scf_env_do_scf 21 6.0 0.003 0.006 8925.472 8925.721 <br>
scf_env_do_scf_inner_loop 875 6.9 0.058 0.394 8925.423 8925.670 <br>
velocity_verlet 20 3.0 0.003 0.004 8295.142 8295.162 <br>
qs_scf_new_mos 875 7.9 0.029 0.036 7036.433 7041.080 <br>
eigensolver 875 8.9 0.091 0.135 6743.683 6746.404 <br>
cp_fm_syevd 895 10.0 0.030 0.042 5143.201 5145.910 <br>
cp_fm_syevd_base 895 10.9 5142.498 5145.756 5142.498 5145.756 <br>
cp_fm_triangular_multiply 2625 9.9 1559.272 1568.419 1559.272 1568.419 <br>
rebuild_ks_matrix 896 8.8 0.007 0.008 1019.477 1020.322 <br>
qs_ks_build_kohn_sham_matrix 896 9.8 0.173 0.238 1019.470 1020.316 <br>
qs_ks_update_qs_env 875 7.9 0.011 0.017 991.115 991.962 <br>
sum_up_and_integrate 896 10.8 2.655 2.858 738.316 739.290 <br>
integrate_v_rspace 896 11.8 0.044 0.051 735.658 736.910 <br>
qs_rho_update_rho_low 896 7.9 0.009 0.011 723.261 723.903 <br>
calculate_rho_elec 896 8.9 0.999 1.064 723.252 723.894 <br>
grid_integrate_task_list 896 12.8 537.672 550.615 537.672 550.615 <br>
grid_collocate_task_list 896 9.9 485.059 489.036 485.059 489.036 <br>
pw_transfer 11627 11.8 0.912 1.071 286.851 294.441 <br>
fft_wrap_pw1pw2 9835 12.8 0.129 0.145 275.251 282.867 <br>
fft_wrap_pw1pw2_150 4459 13.1 27.356 28.582 256.292 266.093 <br>
density_rs2pw 896 9.9 0.055 0.061 218.157 223.442 <br>
rs_pw_transfer 7252 12.3 0.168 0.187 208.620 214.139 <br>
fft3d_ps 9835 14.8 127.773 133.888 194.643 207.229 <br>
calculate_dm_sparse 895 8.9 0.181 0.222 186.857 191.068 <br>
cp_dbcsr_plus_fm_fm_t_native 916 9.9 0.063 0.073 186.139 190.766 <br>
qs_vxc_create 896 10.8 0.029 0.036 182.589 186.938 <br>
xc_vxc_pw_create 896 11.8 3.250 4.661 182.560 186.906 <br>
------------------------------------------------------------------------------- <br>
<br>
<br>
</blockquote>
</div>
</div>
</body>
</html>