<div dir="ltr">Hi Ron,<br><br>
There are several things in your ARCH file that don't fit together, or at least make no sense to me:<br>
1) -I$(MKLROOT)/include: MKL is not used in your case.<br>
2) Reference (netlib) LAPACK, ScaLAPACK and OpenBLAS will never give you peak performance; better to use MKL if it is available.<br>
3) I'm not sure, but I don't think CP2K + ELPA-2015-11-10 has been tested yet.<br><br>
Please provide a snippet of the TIMINGS section (roughly the first 30 lines) - maybe we can locate the problem from there.<br><br>
Btw., even though PSMP should run most efficiently on an MPI+OMP machine, we usually find that the pure POPT (no OMP) runs faster. Could you try this as well: 2 nodes, each running 16 MPI tasks? To do this, please remove '-fopenmp' and '-lomp', and compile and link against the non-threaded versions of FFTW3 and ELPA.<br><br>
Best regards,<br>
Andreas </div>