terrible performance across infiniband

Andreas Glöss andreas... at gmail.com
Tue Mar 22 08:21:25 UTC 2016

Hi Ron,

There are several things in your ARCH file that don't fit together, or at 
least make no sense to me:
1) -I$(MKLROOT)/include is set, but MKL is not used in your case.
2) Reference (netlib) LAPACK, ScaLAPACK and OpenBLAS will never give you peak 
performance; better to use MKL if it is available.
3) Not sure, but I believe CP2K with ELPA-2015-11-10 has not been tested yet.

Please provide a snippet of the TIMINGS section (the first ~30 lines) - maybe 
we can locate the problem from there.
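
In case it helps, here is a minimal sketch for pulling that snippet out of the 
output file. It assumes the timing block starts with a banner containing 
"T I M I N G" and that your grep supports -A; the function name and the file 
name cp2k.out are only placeholders.

```shell
# Print the timing banner plus the next N lines (default 30).
# Assumption: the timing block is introduced by a line containing "T I M I N G".
show_timing() {
    out_file="$1"        # path to the CP2K output file (placeholder name)
    n_lines="${2:-30}"   # number of lines to show after the banner
    grep -A "$n_lines" 'T I M I N G' "$out_file"
}
```

Calling it as `show_timing cp2k.out 30` should print the banner and the 30 
lines that follow it.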

Btw., even though PSMP should run most efficiently on an MPI+OMP machine, 
we usually find that the pure POPT version (no OMP) runs faster. Could you try 
this as well - 2 nodes, each running 16 MPI tasks?
To do this, please remove '-fopenmp' and '-lomp', and compile and link against 
the non-threaded versions of FFTW3 and ELPA.
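
As a rough sketch of the launch itself: 2 nodes with 16 tasks each means 32 MPI 
ranks in total. The helper below only prints the command so you can check it 
before running. It assumes Open MPI's mpirun (the --map-by ppr:N:node option); 
the binary name cp2k.popt and the input/output file names are placeholders.

```shell
# Build the launch line for a pure-MPI (POPT) run: nodes x tasks-per-node ranks.
# Assumptions: Open MPI's mpirun; cp2k.popt and the file names are placeholders.
popt_launch_cmd() {
    nodes="$1"; per_node="$2"; inp="$3"; out="$4"
    total=$((nodes * per_node))   # total number of MPI ranks
    echo "mpirun -np $total --map-by ppr:${per_node}:node cp2k.popt -i $inp -o $out"
}

# 2 nodes, 16 MPI tasks per node
popt_launch_cmd 2 16 input.inp output.out
```

Other MPI implementations use a different per-node option (Intel MPI, for 
example, has -ppn), so adjust the printed line to whatever your cluster uses.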

Best regards,

