[CP2K-user] Performance of CP2K for MPT vs IntelMPI
Hans Pabst
hf.... at gmail.com
Wed Aug 7 07:18:53 UTC 2019
Hello Chris,
Matthias is right - show this to the admins, since it can also be related
to, e.g., the setup of the job scheduler. If you want to run an additional
experiment, you can set I_MPI_HYDRA_BOOTSTRAP=slurm. I guess you are using
Slurm?
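For example, a minimal sketch of that experiment, assuming a Slurm batch job
and reusing the binary path and process counts from your mail (adjust as
needed):

export I_MPI_HYDRA_BOOTSTRAP=slurm   # let Intel MPI's Hydra launcher bootstrap through Slurm
export OMP_NUM_THREADS=1
mpirun -n 144 -ppn 36 \
  /lustre/home/d167/s1887443/scc/cp2k/exe/broadwell-o2-libs/cp2k.psmp \
  -i H2O-64.inp -o out.txt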
For CP2K with Intel bits (MPI, MKL, IFORT, but also GFortran), I am
maintaining a recipe for CP2K 7.x and a step-by-step guide for CP2K 6.1
<https://xconfigure.readthedocs.io/cp2k/#step-by-step-guide> (still the
latest release). The performance section
<https://xconfigure.readthedocs.io/cp2k/#performance> also gives some hints
for tuning Intel MPI (I_MPI_COLL_INTRANODE=pt2pt, I_MPI_ADJUST_REDUCE=1,
I_MPI_ADJUST_BCAST=1). This applies equally to InfiniBand and Omni-Path,
and running CP2K <https://xconfigure.readthedocs.io/cp2k/#running-cp2k>
with the Intel MPI/OMP hybrid has its own section (I_MPI_PIN_DOMAIN=auto,
I_MPI_PIN_ORDER=bunch, OMP_PLACES=threads, OMP_PROC_BIND=SPREAD,
OMP_NUM_THREADS).
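Put together, a hybrid MPI/OMP launch for your 4-node case could look like
the sketch below. The split of 18 ranks x 2 threads per 36-core node is only
an illustrative choice, and the binary path is copied from your mail; treat
it as a starting point rather than a tuned recipe:

# collective-tuning hints from the performance section
export I_MPI_COLL_INTRANODE=pt2pt
export I_MPI_ADJUST_REDUCE=1
export I_MPI_ADJUST_BCAST=1
# pinning/placement for the MPI/OMP hybrid
export I_MPI_PIN_DOMAIN=auto
export I_MPI_PIN_ORDER=bunch
export OMP_PLACES=threads
export OMP_PROC_BIND=SPREAD
export OMP_NUM_THREADS=2
mpirun -n 72 -ppn 18 \
  /lustre/home/d167/s1887443/scc/cp2k/exe/broadwell-o2-libs/cp2k.psmp \
  -i H2O-64.inp -o out.txt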
Hans
On Friday, August 2, 2019 at 22:40:24 UTC+2, Christmas Mwk wrote:
>
> Hi all,
>
> Recently I was trying to run the H2O-64 benchmark on CIRRUS. I compiled the
> CP2K 6.1 "popt" and "psmp" versions with GCC 8.2, FFTW3, Libint, libxsmm
> and MKL 2019.3. For MPI I used MPT 2.18 and IntelMPI 18.0.5.274 for
> comparison (both available as modules on CIRRUS, so no problems with
> InfiniBand). To my surprise, while IntelMPI had better single-node
> performance (59.5 s) than what is shown in cirrus-h2o-64
> <https://www.cp2k.org/performance:cirrus-h2o-64>, the runtime did not scale
> when I ran across more than one node, e.g. 4 nodes (144 cores). In the case
> of MPT, however, the results were similar to what is published on the web.
> Looking at the output timings, I saw that with IntelMPI across 4 nodes a
> significant portion of the time is spent in mp_wait_any and mp_waitall_1
> (around 21 s out of 60 s), while for MPT only around 6 s is spent in these
> routines, with an overall runtime of around 28 s.
>
> Initially I suspected that using IntelMPI might require some manual process
> pinning, so I tried various options such as setting I_MPI_PIN_DOMAIN to
> compact, core, etc. While there was some improvement in performance, the
> overhead in these MPI routines stayed the same. I also tried IntelMPI 2017,
> but obtained the same performance. Additionally, similar results were
> obtained for both "popt" and "psmp" with OMP threads set to 1. I assume
> that if there were a load-imbalance issue, the performance of both MPT and
> IntelMPI would have been comparably affected, but I am still not sure.
>
> Is there anything that I am missing here, or is this performance behaviour
> expected with IntelMPI? If the performance should be similar or comparable,
> could you please suggest how I should launch the executable with mpirun and
> IntelMPI?
>
> Thank you in advance. Any help would be much appreciated. I attach the arch
> files (the popt files are similar) and the runtime results for the 4-node
> runs with IntelMPI (compact and core) and MPT, as well as the single-node
> result for IntelMPI. Examples of how the executable is launched are
> provided below.
>
> Best,
> Chris
>
> MPT
> export OMP_NUM_THREADS=1
> /lustre/sw/cp2k/4.1.17462/cp2k/cp2k/exe/mpt/placement 1
>
> mpiexec_mpt -n 144 -ppn 36 dplace -p place.txt
> /lustre/home/d167/s1887443/scc/cp2k/exe/broadwell-o2-libs-mpt/cp2k.psmp
> H2O-64.inp
>
> Compact
> export OMP_NUM_THREADS=1
>
>
> mpirun -n 144 -ppn 36 -env I_MPI_PIN_DOMAIN omp -env I_MPI_PIN_ORDER
> compact -print-rank-map
> /lustre/home/d167/s1887443/scc/cp2k/exe/broadwell-o2-libs/cp2k.psmp -i
> H2O-64.inp -o out.txt
>
> Core
>
> export OMP_NUM_THREADS=1
> export I_MPI_PIN_DOMAIN=core
>
> mpirun -n 144 -ppn 36 -genv I_MPI_PIN_DOMAIN=core -genv OMP_NUM_THREADS=1
> -print-rank-map
> /lustre/home/d167/s1887443/scc/cp2k/exe/broadwell-o2-libs/cp2k.psmp -i
> H2O-64.inp -o out.txt
>