[CP2K-user] [CP2K:18691] parallelization when using diag
'k.doblh...@lic.leidenuniv.nl' via cp2k
cp2k at googlegroups.com
Fri Apr 21 12:38:31 UTC 2023
Dear CP2K community,
I am trying to run a DFT-MD simulation of a system containing a metal slab
efficiently in parallel. I observe a strong loss of efficiency when going
from 32 to 128 CPUs (a full node on my system) and no speedup at all when
going from 128 to 256 CPUs (i.e., 2 nodes). On 2 nodes the timings appear
to be dominated by cp_fm_redistribute_end (the full timing reports are
posted below). I tried adding OMP parallelization on top of the MPI
parallelization with a few OMP threads, but that made things worse. I also
looked for online benchmark tests of the diagonalization but could not find
any, so I do not know what to expect. I therefore have two questions:
1) Is this behavior expected (given that I am using diagonalization and not
OT), or might it be an issue with our compilation?
2) Is there an obvious fix for the issue (e.g., using ELPA, or something
else)? See the input sketch below for what I mean by the ELPA route.
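By "use ELPA" I mean something like the GLOBAL-section snippet below. This is
only a minimal sketch: it assumes our cp2k binary was actually linked against
ELPA, and the PROJECT/RUN_TYPE lines are just placeholders for my real input.

  &GLOBAL
    PROJECT  metal_slab_md          ! placeholder project name
    RUN_TYPE MD
    ! Switch the dense eigensolver from ScaLAPACK (SL) to ELPA; this should
    ! mainly affect the cp_fm_syevd / cp_fm_redistribute_end entries in the
    ! timing reports below.
    PREFERRED_DIAG_LIBRARY ELPA
  &END GLOBAL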
Thank you for your help and best regards,
Katharina
T I M I N G   --   256 cpu (2 nodes)
-------------------------------------------------------------------------------
SUBROUTINE                            CALLS   ASD    SELF TIME         TOTAL TIME
                                             AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
-------------------------------------------------------------------------------
CP2K                                1   1.0    0.043    0.204 2676.141 2676.163
qs_mol_dyn_low                      1   2.0    0.006    0.008 2675.630 2675.692
qs_forces                          21   4.0    0.007    0.009 2674.790 2674.812
qs_energies                        21   5.0    0.003    0.004 2648.558 2648.605
scf_env_do_scf                     21   6.0    0.003    0.004 2599.728 2602.333
scf_env_do_scf_inner_loop         875   6.9    0.043    1.347 2599.678 2602.284
velocity_verlet                    20   3.0    0.007    0.009 2418.820 2418.843
qs_scf_new_mos                    875   7.9    0.018    0.022 2160.453 2162.360
eigensolver                       875   8.9    0.096    0.267 2095.414 2096.092
cp_fm_syevd                       895  10.0    0.015    0.025 1791.707 1795.858
cp_fm_redistribute_end            895  11.0 1037.764 1785.744 1039.147 1786.479
cp_fm_syevd_base                  895  10.9  739.888 1759.452  739.888 1759.452
cp_fm_triangular_multiply        2625   9.9  301.128  307.525  301.128  307.525
rebuild_ks_matrix                 896   8.8    0.006    0.008  248.047  248.107
qs_ks_build_kohn_sham_matrix      896   9.8    0.177    0.233  248.041  248.102
qs_ks_update_qs_env               875   7.9    0.023    0.029  241.842  241.902
sum_up_and_integrate              896  10.8    0.207    0.444  177.563  177.730
integrate_v_rspace                896  11.8    0.034    0.051  177.354  177.596
qs_rho_update_rho_low             896   7.9    0.009    0.013  176.786  177.420
calculate_rho_elec                896   8.9    0.162    0.220  176.778  177.411
rs_pw_transfer                   7252  12.3    0.146    0.182  131.353  136.114
density_rs2pw                     896   9.9    0.041    0.052   89.497   93.288
potential_pw2rs                   896  12.8    0.058    0.070   82.157   82.656
grid_collocate_task_list          896   9.9   74.814   79.240   74.814   79.240
grid_integrate_task_list          896  12.8   75.706   78.564   75.706   78.564
pw_transfer                     11627  11.8    1.017    1.215   71.773   73.106
fft_wrap_pw1pw2                  9835  12.8    0.090    0.116   70.237   71.712
mp_sum_d                         9316  10.4   29.648   70.131   29.648   70.131
fft3d_ps                         9835  14.8    5.802    8.028   65.284   66.546
qs_vxc_create                     896  10.8    0.023    0.040   57.095   57.943
xc_vxc_pw_create                  896  11.8    0.447    0.561   57.071   57.917
mp_alltoall_z22v                 9835  16.8   53.387   57.283   53.387   57.283
mp_waitany                     129318  14.3   42.683   56.101   42.683   56.101
mp_alltoall_d11v                13495  12.2   50.528   55.324   50.528   55.324
fft_wrap_pw1pw2_150              4459  13.1    1.760    2.222   53.418   54.942
mp_waitall_1                  1080708  14.5   42.790   54.894   42.790   54.894
-------------------------------------------------------------------------------
T I M I N G   --   128 cpu (1 node)
-------------------------------------------------------------------------------
SUBROUTINE                            CALLS   ASD    SELF TIME         TOTAL TIME
                                             AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
-------------------------------------------------------------------------------
CP2K                                1   1.0    0.030    0.032 3072.794 3072.814
qs_mol_dyn_low                      1   2.0    0.006    0.007 3072.442 3072.528
qs_forces                          21   4.0    0.006    0.008 3071.900 3071.921
qs_energies                        21   5.0    0.003    0.004 3024.241 3024.317
scf_env_do_scf                     21   6.0    0.004    0.006 2969.550 2971.818
scf_env_do_scf_inner_loop         875   6.9    0.047    0.532 2969.499 2971.766
velocity_verlet                    20   3.0    0.006    0.008 2794.534 2794.562
qs_scf_new_mos                    875   7.9    0.023    0.028 2271.468 2273.627
eigensolver                       875   8.9    0.095    0.206 2185.491 2186.709
cp_fm_syevd                       895  10.0    0.020    0.031 1767.954 1770.227
cp_fm_redistribute_end            895  11.0  286.821 1759.007  288.074 1759.686
cp_fm_syevd_base                  895  10.9 1465.425 1740.700 1465.425 1740.700
cp_fm_triangular_multiply        2625   9.9  410.654  416.288  410.654  416.288
rebuild_ks_matrix                 896   8.8    0.008    0.010  361.723  362.634
qs_ks_build_kohn_sham_matrix      896   9.8    0.188    0.243  361.716  362.627
qs_ks_update_qs_env               875   7.9    0.024    0.035  345.631  346.530
qs_rho_update_rho_low             896   7.9    0.011    0.020  319.796  320.976
calculate_rho_elec                896   8.9    0.281    0.443  319.785  320.964
sum_up_and_integrate              896  10.8    0.543    0.936  261.772  262.746
integrate_v_rspace                896  11.8    0.037    0.048  261.227  262.364
rs_pw_transfer                   7252  12.3    0.140    0.161  208.866  216.256
density_rs2pw                     896   9.9    0.045    0.055  163.812  170.509
grid_integrate_task_list          896  12.8  148.592  155.403  148.592  155.403
grid_collocate_task_list          896   9.9  140.797  144.515  140.797  144.515
mp_waitany                      89824  14.3  124.591  134.833  124.591  134.833
rs_pw_transfer_RS2PW_150          938  11.9   17.719   20.395  113.331  121.388
potential_pw2rs                   896  12.8    0.068    0.077   89.497   90.026
qs_vxc_create                     896  10.8    0.030    0.051   79.793   81.837
xc_vxc_pw_create                  896  11.8    0.725    0.982   79.764   81.813
pw_transfer                     11627  11.8    0.899    1.097   68.900   73.972
fft_wrap_pw1pw2                  9835  12.8    0.103    0.124   66.963   72.074
fft_wrap_pw1pw2_150              4459  13.1    4.627    5.429   55.447   62.287
mp_alltoall_d11v                13495  12.2   51.004   61.546   51.004   61.546
-------------------------------------------------------------------------------
T I M I N G   --   32 cpu (1/4 of a node)
-------------------------------------------------------------------------------
SUBROUTINE                            CALLS   ASD    SELF TIME         TOTAL TIME
                                             AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
-------------------------------------------------------------------------------
CP2K                                1   1.0    0.019    0.027 9152.866 9152.875
qs_mol_dyn_low                      1   2.0    0.004    0.004 9152.558 9152.567
qs_forces                          21   4.0    0.005    0.006 9152.148 9152.157
qs_energies                        21   5.0    0.003    0.003 9047.928 9047.996
scf_env_do_scf                     21   6.0    0.003    0.006 8925.472 8925.721
scf_env_do_scf_inner_loop         875   6.9    0.058    0.394 8925.423 8925.670
velocity_verlet                    20   3.0    0.003    0.004 8295.142 8295.162
qs_scf_new_mos                    875   7.9    0.029    0.036 7036.433 7041.080
eigensolver                       875   8.9    0.091    0.135 6743.683 6746.404
cp_fm_syevd                       895  10.0    0.030    0.042 5143.201 5145.910
cp_fm_syevd_base                  895  10.9 5142.498 5145.756 5142.498 5145.756
cp_fm_triangular_multiply        2625   9.9 1559.272 1568.419 1559.272 1568.419
rebuild_ks_matrix                 896   8.8    0.007    0.008 1019.477 1020.322
qs_ks_build_kohn_sham_matrix      896   9.8    0.173    0.238 1019.470 1020.316
qs_ks_update_qs_env               875   7.9    0.011    0.017  991.115  991.962
sum_up_and_integrate              896  10.8    2.655    2.858  738.316  739.290
integrate_v_rspace                896  11.8    0.044    0.051  735.658  736.910
qs_rho_update_rho_low             896   7.9    0.009    0.011  723.261  723.903
calculate_rho_elec                896   8.9    0.999    1.064  723.252  723.894
grid_integrate_task_list          896  12.8  537.672  550.615  537.672  550.615
grid_collocate_task_list          896   9.9  485.059  489.036  485.059  489.036
pw_transfer                     11627  11.8    0.912    1.071  286.851  294.441
fft_wrap_pw1pw2                  9835  12.8    0.129    0.145  275.251  282.867
fft_wrap_pw1pw2_150              4459  13.1   27.356   28.582  256.292  266.093
density_rs2pw                     896   9.9    0.055    0.061  218.157  223.442
rs_pw_transfer                   7252  12.3    0.168    0.187  208.620  214.139
fft3d_ps                         9835  14.8  127.773  133.888  194.643  207.229
calculate_dm_sparse               895   8.9    0.181    0.222  186.857  191.068
cp_dbcsr_plus_fm_fm_t_native      916   9.9    0.063    0.073  186.139  190.766
qs_vxc_create                     896  10.8    0.029    0.036  182.589  186.938
xc_vxc_pw_create                  896  11.8    3.250    4.661  182.560  186.906
-------------------------------------------------------------------------------