MPI_wait problem in cp2k 4.1 with openmpi_2.0.0

Alfio Lazzaro alfio.... at
Wed Mar 29 11:44:36 UTC 2017

I'm sorry, my request was to run CP2K 2.1 with OpenMPI 2.0.0. This test 
would tell us whether there is a problem in CP2K 4.1 itself, or whether it 
is something related to the OpenMPI version. 
In any case, let's first analyze your logs. 
First of all, the energy result:

ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.):           
ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.):           

That is quite a large difference... Looking at the SCF steps, I see that 
4.1 does

*** SCF run converged in    13 steps ***

while the 2.1 does

*** SCF run converged in     7 steps ***

That's a large difference! The 2.1 run performs a completely different set 
of calculations (far fewer than 4.1). For instance, the numbers of MP_Wait 
calls are:

cp2k-2.1: 27263
cp2k-4.1: 198040
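As a quick sanity check you can pull these counts out of the two logs 
programmatically. A minimal sketch (the helper name and the exact column 
layout of the CP2K message-passing summary line are assumptions; adjust the 
regex to what your log actually prints):

```python
import re

def mp_wait_calls(log_text):
    """Return the MP_Wait call count from a CP2K message-passing
    summary line such as 'MP_Wait   198040   ...', or None if the
    line is not present. (Line layout is an assumption.)"""
    m = re.search(r"MP_Wait\s+(\d+)", log_text)
    return int(m.group(1)) if m else None

# Counts reported in the two logs above:
old = mp_wait_calls("MP_Wait    27263    0.3")
new = mp_wait_calls("MP_Wait   198040    1.1")
print(new / old)  # ~7x more wait calls in 4.1
```

The roughly sevenfold increase in call count already explains most of the 
extra time spent in MPI_Wait, independently of the MPI library used.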

So I would definitely conclude that the problem is not in OpenMPI. At this 
point I will stop comparing with cp2k-2.1. Assuming that you are 
interested in using the newer CP2K, I suggest first checking your 
installation by running the regression tests:
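For reference, the invocation looks roughly like this, assuming a standard 
CP2K 4.1 source checkout (the driver script lives under tools/regtesting; 
the exact flags may differ in your version, so check its README first):

```shell
# From a standard CP2K 4.1 source tree (paths and flags are
# assumptions; see tools/regtesting/README for your version).
cd cp2k/tools/regtesting
./do_regtest -arch Linux-x86-64-gfortran -version popt
```

If the regression tests pass, the installation itself is fine and we can 
focus on the input and the run configuration.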

A couple of other suggestions:
1) I know that the x86-64 arch file is pretty complicated, but it is the 
right arch file for your tests. For a simplified version, you can look at 
the arch files used in our regression-test dashboard, for 
example: (the arch file is 
at the top)
2) you get the best performance when using a *square* number of 
MPI ranks. In your case, I see that you are requesting 24 ranks; I think 
the best would be to use 4 ranks and 6 threads each (for which you have to 
use the PSMP version, example at
3) you can improve the overall performance by using the libxsmm library. 
Take a look at the INSTALL file on how to use this library with CP2K
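The sizing rule in suggestion 2 can be sketched as a small helper (a 
hypothetical function, not part of CP2K): pick the largest square number of 
ranks that divides the available cores, and give the remaining factor to 
OpenMP threads:

```python
from math import isqrt

def square_rank_layout(total_cores):
    """Largest square number of MPI ranks dividing total_cores;
    the remaining factor becomes OpenMP threads per rank.
    A sketch of the sizing rule above, not a CP2K utility."""
    best = 1
    for r in range(1, total_cores + 1):
        s = isqrt(r)
        if s * s == r and total_cores % r == 0:
            best = r
    return best, total_cores // best

ranks, threads = square_rank_layout(24)
print(ranks, threads)  # 4 ranks x 6 threads on 24 cores
```

With a PSMP binary this maps to a launch along the lines of 
`OMP_NUM_THREADS=6 mpirun -np 4 cp2k.psmp input.inp` (the exact launch 
syntax depends on your MPI installation).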


Il giorno mercoledì 29 marzo 2017 10:36:43 UTC+2, jim wang ha scritto:
> I did what you said and checked the x86-64 arch files, but they are really 
> complicated, with so many command lines. I think they are not useful for my 
> compilation needs. As for the libraries, I am sure that the libraries 
> linked to cp2k-4.1 and cp2k-2.1 are nearly the same, except for the FFT 
> libs (Intel FFT for cp2k-4.1 and FFTW3 for cp2k-2.1). My tests showed that 
> cp2k-4.1 ran equally slowly with the Intel FFT libs and the FFTW3 lib.
> The two output files are attached to my reply. I hope we can find out the 
> ultimate reason for the bad performance of cp2k-4.1. 
> On Wednesday, March 29, 2017 at 4:14:08 PM UTC+8, Alfio Lazzaro wrote:
>> OK, I replied to another email related to your problem, where I said 
>> that Intel Xeon is an x86-64 architecture; IA64 is the Intel Itanium. 
>> Therefore, please use the x86-64 arch file as a template. Anyway, this is 
>> not really related to your problem with OpenMPI (I hope!)...
>> Concerning your last email, yes, please attach the CP2K logs.
>> Then, have you tried to compile CP2K 4.1 with the same CP2K 2.1 libraries 
>> (or vice versa)?
>> Alfio 
>> On Monday, March 27, 2017 at 11:38:52 AM UTC+2, jim wang wrote:
>>> Hi, everybody!
>>> I am using cp2k 4.1 for testing on our new cluster. But strangely, 
>>> the results showed that the cp2k 4.1 version is 3 to 4 times slower than 
>>> the cp2k 2.1 version built on the same cluster. After examining the output 
>>> files generated by both binaries running the same job, I found that the 
>>> MPI_wait function may be the key problem.
>>> Here is the result of time consumed by MPI_wait function:
>>> 1. cp2k 4.1: MPI_wait time:1131(s) , Total run time: 1779(s)
>>> 2. cp2k 2.1: MPI_wait time:68(s), Total run time: 616(s)
>>> How can I determine whether the problem is with our cluster or 
>>> with the compilation?
>>> Hope you guys can give me some hints on the version comparison.
>>> THANKS!!!

More information about the CP2K-user mailing list