[CP2K-user] [CP2K:20885] Re: CP2K on LUMI

Daniele Passerone dpasserone at gmail.com
Fri Nov 8 13:41:57 UTC 2024


Great, I will reach out to you privately, Alfio. Thank you.

On Friday, November 8, 2024 at 2:37:01 PM UTC+1 Alfio Lazzaro wrote:

> Ciao Daniele,
> The output
>
> DBCSR| ACC: GPU backend is enabled                                      T (D)
>
> is from DBCSR. Yes, it was added in 2024.2 (only the print). Clearly, it 
> was "T" (==TRUE) also in 2024.1; only the print is new (with a way to 
> disable it). So, nothing changed on the functional side.
> Still, the new DBCSR in 2024.2 provides AMD kernels, with quite a large 
> boost in performance on LUMI (and this is what I suggested to Emanuele).
>
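To make the point about the print concrete, here is a minimal Python sketch that looks for the DBCSR line in a CP2K output. The pattern is based only on the line quoted in this thread, so the exact spacing is an assumption and whitespace is matched loosely. The function name `dbcsr_gpu_backend` is hypothetical, not part of CP2K or DBCSR.

```python
import re
from typing import Optional

# Pattern derived from the line quoted in this thread; real output spacing
# is an assumption, so any run of whitespace (including a wrap) is accepted.
_GPU_LINE = re.compile(r"DBCSR\|\s*ACC: GPU backend is enabled\s+([TF])\b")


def dbcsr_gpu_backend(output_text: str) -> Optional[bool]:
    """Return True/False if the DBCSR GPU line is present, else None."""
    match = _GPU_LINE.search(output_text)
    if match is None:
        # No line found. CP2K 2024.1 did not print it at all, so a missing
        # line says nothing about whether the GPU backend was in use.
        return None
    return match.group(1) == "T"
```

A `None` result should be read as "not reported", not as "GPU disabled", which is the distinction that matters when comparing 2024.1 and 2024.2 outputs.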
> But then the error you see is on FFT, which I'm unfamiliar with...
>
> Please reach out to me privately for details on LUMI. The best option is 
> to open a ticket on the LUMI system and ask whether there is support for 
> CP2K (this is really a matter of application support). There are multiple 
> channels:
> 1) LUMI coffee breaks (once per month), see 
> https://www.lumi-supercomputer.eu/events/usercoffeebreaks/ for past 
> events
> 2) LUMI porting application project, see 
> https://www.lumi-supercomputer.eu/open-call-for-porting-optimizing-gpu-2024/ 
> for the past call
> 3) LUMI hackathons (at least once per year)
>
> Alfio
>
>
> On Thursday, November 7, 2024 at 3:19:57 PM UTC+1 Daniele Passerone wrote:
>
>> Dear forum, 
>>
>> The LUMI supercomputer was recently upgraded to the LUMI/24.03 
>> software environment. 
>> With version 23.09 we could run on the GPU partition (8 GPUs per 
>> node), following this prescription:
>>
>>
>>    - *When running on LUMI-G, run using 8 MPI ranks per compute node, 
>>    where each rank has access to 1 GPU in the same NUMA zone. This also means 
>>    that you have to set OMP_NUM_THREADS=6-7 to utilize all CPU cores. Please 
>>    note that using all 64 cores will not work as the first core in each CCD 
>>    is reserved for the operating system, so that only 56 cores are available.*
>>
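The arithmetic behind that prescription can be checked with a short Python sketch. The inputs (64 cores, 56 usable, 8 ranks per node) come from the quoted text. The CCD count is inferred from "one reserved core per CCD" rather than stated, and the function name `lumi_g_layout` is hypothetical.

```python
def lumi_g_layout(total_cores=64, usable_cores=56, ranks_per_node=8):
    """Return (ccds, cores_per_ccd, max_threads_per_rank) for one node."""
    # One core per CCD is reserved for the operating system, so the number
    # of reserved cores equals the number of CCDs (inferred, not stated).
    ccds = total_cores - usable_cores
    cores_per_ccd = total_cores // ccds
    # One MPI rank per GPU; the usable cores are shared evenly among ranks.
    max_threads_per_rank = usable_cores // ranks_per_node
    return ccds, cores_per_ccd, max_threads_per_rank


print(lumi_g_layout())
```

With the quoted numbers this gives 8 CCDs of 8 cores each and at most 7 threads per rank, which matches the upper end of the OMP_NUM_THREADS=6-7 range.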
>> The version we used with the old 23.09 environment was 
>>
>> CP2K/2024.1-cpeGNU-23.09-GPU
>>
>>
>> (easybuild)
>>
>>
>> Which is described on the LUMI website as 
>>
>>
>> *"CP2K 2024.1 release compiled with AMD GPU support enabled for CP2K 
>> itself and several of the libraries (SpFFT, SpLA). Cray Programming 
>> Environment 23.09 used together with the unsupported rocm/5.6.1 module 
>> installed by the LUMI Support Team."*
>>
>>
>> With the new environment, we are advised to compile accordingly, using 
>> EasyBuild. 
>>
>>
>> https://lumi-supercomputer.github.io/LUMI-EasyBuild-docs/c/CP2K/
>>
>>
>> The code compiled (2024.2), but the DFT SCF steps then fail with an 
>> error like this:
>>
>>
>>
>>
>> *******************************************************************************
>> *   ___                                                                       *
>> *  /   \                                                                      *
>> * [ABORT]                                                                     *
>> *  \___/      G vector not found                                              *
>> *    |                                                                        *
>> *  O/|                                                                        *
>> * /| |                                                                        *
>> * / \         pw/pw_grids.F:1848                                              *
>> *******************************************************************************
>>
>> or during an initial part of the run 
>>
>>
>>  *** WARNING in atoms_input.F:123 :: Overwriting coordinates. Active    ***
>>  *** coordinates read from &COORD section. Active coordinates READ from ***
>>  *** &COORD section                                                     ***
>>
>>
>> in which the job quits without any error message. 
>>
>>
>> Questions:
>>
>>
>> 1) Can somebody help me understand why those jobs fail, and how to 
>> properly compile CP2K on LUMI?
>>
>>
>> LUMI support (Emanuele Vitali) discovered that the newest CP2K 
>> version (CP2K 2024.2 with the 24.03 environment) has a line in the output:
>>
>>
>>  DBCSR| ACC: GPU backend is enabled                                      T (D)
>>
>> that is not present in the CP2K 2024.1 compiled with 23.09. 
>>
>> So his hypothesis was that the CP2K 2024.1 that was working well was *NOT 
>> using GPU support, and that the problems in 2024.2 with 24.03 come from 
>> trying to use GPU support.*
>>
>> In my opinion (and also Marcella Iannuzzi's) this makes no sense, since 
>> we are sure that scaling and performance (1 rank per GPU) were good 
>> with the old version.
>>
>> 2) Is it true that the line "GPU backend is enabled" was added in 2024.2?
>>
>>
>> Thank you for any help, 
>>
>> Daniele
>>
>>
>>  
>>
>>
>>
