No speedup using Intel MKL libraries?

Faraz H fa... at
Tue Nov 7 14:40:27 UTC 2017

Thanks, Alfio, for looking deeper into it! I ran more tests and you are 
indeed correct, i.e., the MKL functions are running serially (on one CPU), 
but the rest of the code is using all CPUs of the machine. In the local.ssmp 
makefile I saw that libmkl_sequential.a is linked, so I changed it to 
libmkl_intel_thread.a and libiomp5.a. Now the benchmark H2O-128 runs in 5 
minutes on a 28-CPU machine, compared to 7 minutes for the non-MKL-linked 
executable.

Is it a bug that the toolchain links the sequential MKL when creating the 
local.ssmp makefiles? In what situation would someone want the sequential 
MKL libraries linked instead of the parallel ones for ssmp?
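A minimal sketch of the single-thread test Alfio suggests in the quoted reply below, written in Python. It builds one run per thread count, setting `OMP_NUM_THREADS` (OpenMP), `MKL_NUM_THREADS` (Intel MKL) and `OPENBLAS_NUM_THREADS` (OpenBLAS) to the same value, and can optionally time each run. The executable name `cp2k.ssmp`, the input file `H2O-128.inp` and the output file naming are placeholders assumed for illustration, not taken from the thread; adjust them to your own build.

```python
import os
import subprocess
import time

# Placeholder names (assumptions): point these at your own build and benchmark input.
CP2K_EXE = "cp2k.ssmp"
INPUT_FILE = "H2O-128.inp"


def sweep_plans(thread_counts, exe=CP2K_EXE, inp=INPUT_FILE):
    """Return (threads, env, argv) triples, one per thread count, without running anything."""
    plans = []
    for n in thread_counts:
        env = dict(os.environ)
        env["OMP_NUM_THREADS"] = str(n)       # OpenMP threads used by CP2K itself
        env["MKL_NUM_THREADS"] = str(n)       # threaded MKL; no effect if libmkl_sequential is linked
        env["OPENBLAS_NUM_THREADS"] = str(n)  # same limit for the OpenBLAS build, for a fair comparison
        argv = [exe, "-i", inp, "-o", f"out_{n}threads.log"]
        plans.append((n, env, argv))
    return plans


def run_sweep(thread_counts):
    """Actually run each configuration and return {threads: wall-clock seconds}."""
    timings = {}
    for n, env, argv in sweep_plans(thread_counts):
        start = time.perf_counter()
        subprocess.run(argv, env=env, check=True)
        timings[n] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Dry run: only print what would be executed.
    for n, env, argv in sweep_plans([1, 2, 4, 28]):
        print(n, env["OMP_NUM_THREADS"], " ".join(argv))
```

If the sequential-MKL build and the OpenBLAS build take about the same time at one thread but diverge as the thread count grows, that is consistent with the BLAS/LAPACK calls running serially in the MKL build.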

On Monday, November 6, 2017 at 7:11:53 AM UTC-5, Alfio Lazzaro wrote:
> Dear Faraz,
> OK, here is a comparison of the two runs for the functions where I see the 
> highest timing discrepancy (times in seconds; second column with MKL, third 
> column without MKL):
> routine                         w/ MKL (s)  w/o MKL (s)
> dbcsr_make_untransposed_blocks       4.139        1.591
> cp_fm_gemm                           5.691        1.087
> setup_rec_index_2d                   6.330        1.741
> cp_fm_cholesky_decompose            11.539        1.703
> cp_fm_cholesky_invert               26.048        3.031
> Personally, I don't understand the differences in the 1st and 3rd 
> lines; they are likely fluctuations.
> The other lines are MKL-related (DGEMM and Cholesky decomposition). My 
> suspicion is that you are using sequential MKL, while OpenBLAS is somehow 
> using threads. One way to test this is to run with a single thread (or 
> fewer threads in general): the difference should become smaller. I would 
> also suggest using the PSMP version.
> Alfio
> On Thursday, November 2, 2017 at 15:33:13 UTC+1, Faraz H wrote:
>> Thanks, I am attaching the output of two runs: one with the gcc 4.9 
>> executable and the other with the MKL libraries and gcc 4.9. Interestingly, 
>> the results are not always consistent when I run the model multiple times: 
>> sometimes the MKL run is faster by ~30 seconds overall, sometimes slower. 
>> So perhaps something is going on with my system. I'm curious what you see.
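A small Python sketch that turns the per-function timing table quoted above into slowdown ratios and extra seconds, to show where the MKL build loses time. It only redoes arithmetic on the five numbers in the thread; the grouping of `cp_fm_gemm`, `cp_fm_cholesky_decompose` and `cp_fm_cholesky_invert` as "MKL-related" follows Alfio's reading (DGEMM and Cholesky) and is not independently verified.

```python
# Per-function timings (seconds) copied from the quoted table:
# (routine, with MKL, without MKL)
TIMINGS = [
    ("dbcsr_make_untransposed_blocks", 4.139, 1.591),
    ("cp_fm_gemm", 5.691, 1.087),
    ("setup_rec_index_2d", 6.330, 1.741),
    ("cp_fm_cholesky_decompose", 11.539, 1.703),
    ("cp_fm_cholesky_invert", 26.048, 3.031),
]

# Routines the reply attributes to MKL (DGEMM and Cholesky decomposition).
MKL_RELATED = {"cp_fm_gemm", "cp_fm_cholesky_decompose", "cp_fm_cholesky_invert"}


def slowdowns(rows):
    """Return (routine, MKL/non-MKL time ratio, extra seconds), largest extra time first."""
    result = [(name, mkl / ref, mkl - ref) for name, mkl, ref in rows]
    return sorted(result, key=lambda r: r[2], reverse=True)


ranked = slowdowns(TIMINGS)
total_extra = sum(extra for _, _, extra in ranked)
mkl_share = sum(extra for name, _, extra in ranked if name in MKL_RELATED) / total_extra

for name, ratio, extra in ranked:
    print(f"{name:32s} {ratio:4.1f}x slower  (+{extra:.3f} s)")
print(f"MKL-related share of the extra time: {mkl_share:.0%}")
```

With these five rows, the three MKL-backed routines account for about 37.5 s of the roughly 44.6 s of extra time, which is why a serial BLAS/LAPACK layer is the first suspect.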

More information about the CP2K-user mailing list