terrible performance across infiniband
Andreas Glöss
andreas... at gmail.com
Tue Mar 22 08:21:25 UTC 2016
Hi Ron,
There are several things in your ARCH file that don't fit together, or at
least make no sense to me.
1) -I$(MKLROOT)/include is set, but MKL is not used in your case.
2) The reference (netlib) LAPACK and ScaLAPACK, and OpenBLAS, will never give
you peak performance; better to use MKL if it is available.
3) I am not sure, but I don't think CP2K has been tested with ELPA-2015-11-10 yet.
Please provide a snippet of the TIMINGS section (roughly the first 30 lines);
maybe we can locate the problem from there.
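If it helps, here is a minimal sketch for pulling that snippet out of an output file. It assumes the timing report begins at a banner line containing "T I M I N G"; that marker, the function name timing_snippet and the default of 30 lines are my own illustrative choices, so adjust them to whatever your output actually shows.

```python
from itertools import islice


def timing_snippet(lines, marker="T I M I N G", count=30):
    """Return the first line containing `marker` plus up to `count` lines after it.

    `marker` is an assumed banner text; change it if your output differs.
    """
    it = iter(lines)
    for line in it:
        if marker in line:
            # Keep the banner line itself, then take the following lines.
            following = [l.rstrip("\n") for l in islice(it, count)]
            return [line.rstrip("\n")] + following
    return []  # marker not found


# Usage (hypothetical file name):
#   with open("cp2k.out") as fh:
#       print("\n".join(timing_snippet(fh)))
```

Only the first matching banner is used, so if the file holds several runs you would need to split it first.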
Btw., even though PSMP should run most efficiently on an MPI+OMP machine,
we usually find that pure POPT (no OMP) runs faster. Could you try this
as well: 2 nodes, each running 16 MPI tasks?
To do this, please remove '-fopenmp' and '-lomp', and compile and link the
non-threaded versions of FFTW3 and ELPA.
Best regards,
Andreas