[CP2K-user] CP2K 7.1-Cuda Bandgap and HF energies different from previous versions

Leopold Talirz leopol... at gmail.com
Mon May 4 16:12:06 UTC 2020


Dear Fabian,

thanks a lot for checking and for pinning down the issue.

Since this is a rather serious issue, my first instinct was to check on the 
performance page of cp2k to see whether CUDA + OMP was ever used in 
benchmark studies.
https://www.cp2k.org/performance

Unfortunately, this is not clear to me from the page, a problem I now 
remember having run into before:
For some systems it says explicitly "no GPU", but for others that can 
have a GPU (like the Cray XC40) it says nothing, so it is not clear whether 
the GPU was used or not.
May I suggest that the maintainer of this page make this information 
explicit?

And if it turns out that there are currently no tests including the CUDA 
version on the list, perhaps it would make sense to include some?

Best wishes from Bern,
Leopold




On Monday, 4 May 2020 17:35:08 UTC+2, Fabian Ducry wrote:
>
> Dear Andres,
>
> I can confirm and reproduce the issue. It appears when combining 
> CUDA + OMP in hybrid calculations. In that case the energy becomes a 
> function of the number of OMP threads per rank. For your input I got 
> (cp2k 8.0, revision 3e7b916, run on Piz Daint)
>
>                        Exchange-correlation    Hartree-Fock Exchange   Total energy
>                        energy                  energy
>   no cuda              -433.84964308969535     -127.87395928499694     -1976.39722899739672
>   OMP_NUM_THREADS = 1  -433.84964308969302     -127.87395928499325     -1976.39722899739013
>   OMP_NUM_THREADS = 3  -435.33426106395467     -125.97109874333140     -1975.95046919253809
>   OMP_NUM_THREADS = 6  -435.96513615032325     -125.24809389970088     -1975.87080541858177
>
> Without OMP parallelization the energies agree with the calculation 
> without CUDA acceleration. Increasing OMP_NUM_THREADS beyond 1 increases 
> the Hartree-Fock Exchange energy.
> Apparently you have to disable OMP to obtain correct results. This is 
> obviously not very satisfying and I hope this gets fixed. I see that you 
> used 1 MPI rank with 12 OMP threads per node. Try increasing the number 
> of MPI ranks per node instead. To do so you have to set 
> export CRAY_CUDA_MPS=1 in the submission script.
>
> I hope this helps.
>
> Best,
> Fabian
>
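
For anyone who wants to run the same sanity check on their own build, here is 
a minimal sketch of the comparison Fabian describes: take the run without 
CUDA as the reference and see how far each CUDA run deviates from it. The 
total energies are copied from his table; the function names and the 1e-6 
tolerance are my own assumptions and do not come from CP2K.

```python
# Total energies copied from the table in the quoted message
# (CP2K reports energies in Hartree).
# Keys: the "no cuda" reference, then the CUDA runs by OMP_NUM_THREADS.
total_energy = {
    "no cuda": -1976.39722899739672,
    1: -1976.39722899739013,
    3: -1975.95046919253809,
    6: -1975.87080541858177,
}

# Assumed threshold for "same result"; not a value from the thread.
TOLERANCE = 1e-6


def deviations(energies, reference_key="no cuda"):
    """Return energy minus reference for every run except the reference."""
    ref = energies[reference_key]
    return {k: e - ref for k, e in energies.items() if k != reference_key}


def suspicious_runs(energies, tol=TOLERANCE):
    """List the runs whose energy differs from the reference by more than tol."""
    return [k for k, d in deviations(energies).items() if abs(d) > tol]


if __name__ == "__main__":
    for threads, delta in deviations(total_energy).items():
        print(f"OMP_NUM_THREADS={threads}: deviation {delta:+.8f}")
    print("thread counts with deviating energy:", suspicious_runs(total_energy))
```

With a correct build, all deviations should stay below the tolerance 
regardless of the thread count.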