No speedup using Intel MKL libraries?
alfio.... at gmail.com
Tue Nov 7 15:10:08 UTC 2017
Well, the sequential version of MKL is the safer choice, since CP2K may call
MKL functions from within OpenMP threads. I must say that this was an old
solution adopted for safety; nowadays MKL is able to
dynamically change the number of threads (see
), so it should be fine to use the threaded version.
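
A minimal sketch of how one might check the threaded-MKL behaviour by scanning thread counts. OMP_NUM_THREADS, MKL_NUM_THREADS and MKL_DYNAMIC are the standard OpenMP/MKL environment variables; the helper names, the input file name H2O-128.inp and the log file names are assumptions made for illustration only, not something taken from this thread.

```python
import os
import subprocess


def thread_env(omp_threads, mkl_threads=None, mkl_dynamic=True, base=None):
    """Return a copy of the environment with OpenMP/MKL thread settings applied."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(omp_threads)
    if mkl_threads is not None:
        # Only pin the MKL thread count when explicitly requested.
        env["MKL_NUM_THREADS"] = str(mkl_threads)
    # MKL_DYNAMIC=TRUE lets MKL lower its own thread count at run time.
    env["MKL_DYNAMIC"] = "TRUE" if mkl_dynamic else "FALSE"
    return env


def scan_commands(thread_counts, exe="cp2k.ssmp", inp="H2O-128.inp"):
    """Build one (threads, command) pair per thread count (file names are assumed)."""
    return [(n, [exe, "-i", inp, "-o", f"out_{n}.log"]) for n in thread_counts]


def run_scan(thread_counts):
    """Run the benchmark once per thread count; requires cp2k.ssmp on PATH."""
    for n, cmd in scan_commands(thread_counts):
        subprocess.run(cmd, env=thread_env(n), check=True)
```

Calling run_scan([1, 2, 4, 28]) and comparing the timing reports in the resulting logs would show whether the MKL-related routines scale with the thread count.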
On Tuesday, 7 November 2017 at 15:40:27 UTC+1, Faraz H wrote:
> Thanks Alfio for looking deeper into it! I ran more tests and you are
> indeed correct, i.e., the MKL functions are running serially (on one CPU), but
> the rest of the code is using all CPUs of the machine. In the local.ssmp
> makefile I saw that libmkl_sequential.a is linked, so I changed it to
> libmkl_intel_thread.a and libiomp5.a. Now the H2O-128 benchmark runs in 5
> minutes on a 28-CPU machine, compared to 7 minutes for the non-MKL-linked
> executable.
> I wonder if it is a bug that the toolchain links the sequential MKL when
> creating the local.ssmp makefiles? In what situation would someone want the
> sequential MKL libraries linked instead of the parallel ones for ssmp?
> On Monday, November 6, 2017 at 7:11:53 AM UTC-5, Alfio Lazzaro wrote:
>> Dear Faraz,
>> OK, this is the comparison of the two runs for functions where I see the
>> highest timing discrepancy (time in seconds, second column w/ MKL, third
>> column w/o MKL)
>>
>> function                          w/ MKL   w/o MKL
>> dbcsr_make_untransposed_blocks     4.139     1.591
>> cp_fm_gemm                         5.691     1.087
>> setup_rec_index_2d                 6.330     1.741
>> cp_fm_cholesky_decompose          11.539     1.703
>> cp_fm_cholesky_invert             26.048     3.031
>>
>> Personally, I don't understand the differences in the 1st and 3rd
>> lines; they were likely fluctuations.
>> The other lines are MKL-related (DGEMM and
>> Cholesky decomposition). My suspicion is that you are using MKL in
>> sequential mode, while OpenBLAS is somehow using threads. A way to test this
>> is to run with a single thread (or fewer threads in general): the difference
>> should become smaller. I would also suggest using the PSMP version.
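
To make the discrepancy in the quoted timings easier to read, the ratios can be worked out as below. This is only an illustrative sketch: the numbers are copied from the table in the quoted message, while the variable and function names are made up for the example.

```python
# Timings in seconds copied from the quoted table: (with MKL, without MKL).
timings = {
    "dbcsr_make_untransposed_blocks": (4.139, 1.591),
    "cp_fm_gemm": (5.691, 1.087),
    "setup_rec_index_2d": (6.330, 1.741),
    "cp_fm_cholesky_decompose": (11.539, 1.703),
    "cp_fm_cholesky_invert": (26.048, 3.031),
}


def slowdown(table):
    """Return time_with_mkl / time_without_mkl for each routine."""
    return {name: with_mkl / without for name, (with_mkl, without) in table.items()}


ratios = slowdown(timings)

# Print the routines from the largest to the smallest slowdown factor.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:32s} {ratio:5.2f}x")
```

The three MKL-related routines (cp_fm_gemm and the two Cholesky routines) come out with the largest factors, which is consistent with the suspicion that only the MKL calls were running on a single thread.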
>> On Thursday, 2 November 2017 at 15:33:13 UTC+1, Faraz H wrote:
>>> Thanks, I am attaching the output of two runs: one with the gcc4.9
>>> executable and the other with the MKL libraries and gcc4.9. Interestingly,
>>> the results are not always consistent when I run the model multiple times.
>>> Sometimes the MKL one is faster by ~30 seconds overall, sometimes slower.
>>> So perhaps something is going on with my system. Curious what you see.