[CP2K-user] {Spam?} [CP2K:12029] Performance of CP2K for MPT vs IntelMPI

Krack Matthias (PSI) matthi... at psi.ch
Fri Aug 2 21:31:52 UTC 2019


Hi Chris

You can try to launch the jobs with I_MPI_DEBUG=5 set to get more insight into how the processes are distributed and which interface is used. By the way, MKL already includes an FFTW3 interface, so there is no need for a separate installation of that library.
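For example, a minimal sketch of such a launch (reusing your 4-node rank counts; adjust as needed):

export I_MPI_DEBUG=5
mpirun -n 144 -ppn 36 /lustre/home/d167/s1887443/scc/cp2k/exe/broadwell-o2-libs/cp2k.psmp -i H2O-64.inp -o out.txt

At startup, the debug output then lists the pinning map of each rank and the fabric that Intel MPI selected.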
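To pick up the FFTW3 interface from MKL instead of a standalone FFTW3, roughly the following arch-file lines should suffice (a sketch, assuming MKLROOT is set by your MKL module; the exact link line depends on compiler and threading choice):

DFLAGS  += -D__FFTW3
FCFLAGS += -I$(MKLROOT)/include/fftw
LIBS    += -L$(MKLROOT)/lib/intel64 -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl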

Matthias

From: cp... at googlegroups.com <cp... at googlegroups.com> On behalf of Christmas Mwk
Sent: Friday, 2 August 2019 22:40
To: cp2k <cp... at googlegroups.com>
Subject: {Spam?} [CP2K:12029] Performance of CP2K for MPT vs IntelMPI

Hi all,

Recently I was trying to run the H2O-64 benchmark on CIRRUS. I compiled the CP2K versions "popt" and "psmp" with GCC 8.2, FFTW3, Libint, libxsmm and MKL 2019.3. For MPI I compared MPT 2.18 and IntelMPI 18.0.5.274 (both available as modules on CIRRUS, so no problems with InfiniBand). To my surprise, while on a single node IntelMPI performed better (59.5 s) than what is shown in cirrus-h2o-64<https://www.cp2k.org/performance:cirrus-h2o-64>, the runtime did not scale when I ran across more than one node, e.g. 4 nodes (144 cores). In the case of MPT, however, the results were similar to what is published on the web. Looking at the output timings, I saw that with IntelMPI on 4 nodes a significant portion of the time is spent in mp_wait_any and mp_waitall_1 (around 21 s out of 60 s), while with MPT only around 6 s is spent in these routines, with an overall runtime of around 28 s.

Initially I suspected that IntelMPI might require some manual process pinning, so I tried various options such as setting I_MPI_PIN_DOMAIN to omp or core and I_MPI_PIN_ORDER to compact. While there was some improvement in performance, the overheads in these MPI routines stayed the same. I also tried IntelMPI 2017 but obtained the same performance. Similar results are obtained for both "popt" and "psmp" with OMP_NUM_THREADS set to 1. I assume that if this were a load-imbalance issue, MPT and IntelMPI would have been affected comparably, but I am still not sure.
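For reference, the pinning variants boil down to these settings (a sketch; the full launch lines are at the end of this mail):

# one pinning domain per physical core
export I_MPI_PIN_DOMAIN=core

# or: domain sized by OMP_NUM_THREADS, neighbouring ranks placed close together
export I_MPI_PIN_DOMAIN=omp
export I_MPI_PIN_ORDER=compact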

Is there anything I am missing here, or is this performance behaviour expected with IntelMPI? If the performance should be comparable, could you please suggest how to launch the executable with mpirun and IntelMPI?

Thank you in advance; any help would be much appreciated. I attach the arch files (the popt files are similar) and the runtime results for the 4-node runs with IntelMPI (compact and core) and MPT, as well as the single-node result for IntelMPI. Examples of how the executable is launched are given below.

Best,
Chris

MPT
export OMP_NUM_THREADS=1
/lustre/sw/cp2k/4.1.17462/cp2k/cp2k/exe/mpt/placement 1

mpiexec_mpt -n 144 -ppn 36 dplace -p place.txt /lustre/home/d167/s1887443/scc/cp2k/exe/broadwell-o2-libs-mpt/cp2k.psmp H2O-64.inp

Compact
export OMP_NUM_THREADS=1
mpirun -n 144 -ppn 36 -env I_MPI_PIN_DOMAIN omp -env I_MPI_PIN_ORDER compact -print-rank-map /lustre/home/d167/s1887443/scc/cp2k/exe/broadwell-o2-libs/cp2k.psmp -i H2O-64.inp -o out.txt

Core

export OMP_NUM_THREADS=1
export I_MPI_PIN_DOMAIN=core

mpirun -n 144 -ppn 36 -genv I_MPI_PIN_DOMAIN=core -genv OMP_NUM_THREADS=1 -print-rank-map /lustre/home/d167/s1887443/scc/cp2k/exe/broadwell-o2-libs/cp2k.psmp -i H2O-64.inp -o out.txt