[CP2K-user] [CP2K:17635] cellopt calculation on EIGER aborted

'Miriam Jasmin Pougin' via cp2k cp2k at googlegroups.com
Thu Sep 8 13:24:07 UTC 2022


Hello Matthias,

Thanks a lot for your fast reply and explanations. As you suggested, I
switched to the CPU implementation, and that solved the memory problem on
Daint. It is working fine now. Thank you again for your help.

Best regards,
Miriam

On Thursday, 8 September 2022 at 13:28:12 UTC+2, Matthias Krack wrote:

> Hello
>
> There is a limit (48*1024) hard-coded in the GPU grid routines of CP2K
> because of the limited GPU memory available. Using more nodes does not help
> here, because this does not increase the shared memory available per GPU. A
> workaround is to use the CPU implementation of grid_integrate instead of
> the GPU implementation by explicitly selecting the grid BACKEND
> <https://manual.cp2k.org/cp2k-2022_1-branch/CP2K_INPUT/GLOBAL/GRID.html#BACKEND>
> CPU (the default is AUTO, which selects GPU automatically on Piz Daint).
> Alternatively, you can try changing the code to increase that limit, e.g.
> to 51*1024, at the risk, however, of triggering other problems.
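
For reference, a minimal sketch of that BACKEND switch in the input file. It assumes the section layout implied by the manual link above (GRID nested inside GLOBAL); all other GLOBAL keywords are omitted here and would stay as they are in the real input:

```
&GLOBAL
  &GRID
    ! The default is AUTO, which picks the GPU backend on Piz Daint.
    BACKEND CPU
  &END GRID
&END GLOBAL
```
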
>
> I don’t know what causes the error on Eiger.
>
> HTH
>
> Matthias
>
> From: "cp... at googlegroups.com" <cp... at googlegroups.com>
> Reply to: "cp... at googlegroups.com" <cp... at googlegroups.com>
> Date: Thursday, 8 September 2022 at 11:37
> To: "cp... at googlegroups.com" <cp... at googlegroups.com>
> Subject: [CP2K:17629] cellopt calculation on EIGER aborted
>
> Hello all,
>
> I am trying to run a cell optimization for a metal-organic framework using
> the SCAN functional and the rVV10 vdW functional. Because I had problems
> with SCF convergence, I increased the cutoff and used the NN50_SMOOTH
> method for calculating the XC derivatives and the NN50 density smoothing
> for the XC calculations, as suggested in another conversation here.
> The single-point calculation converged with these settings, but when I
> tried to run the cell optimization on Piz Daint (32 nodes, 64 GB RAM per
> node), I got an out-of-memory error:
> "ERROR: Not enough shared memory in grid_gpu_integrate.
> cab_len: 4704, alpha_len: 1512, cxyz_len: 364, total smem_per_block: 
> 51.406250 kb"
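
The numbers in this message can be checked against the 48*1024 limit mentioned in the reply. The sketch below assumes each of the three lengths counts 8-byte (double-precision) values; that is an assumption, not something stated in the thread, but it reproduces the reported total exactly. The function name `smem_bytes` is made up for illustration:

```python
# Shared-memory arithmetic for the grid_gpu_integrate error quoted above.
# Assumption: every element is an 8-byte double-precision value.
BYTES_PER_ELEMENT = 8


def smem_bytes(cab_len, alpha_len, cxyz_len):
    """Total shared memory per block, in bytes, for the three buffers."""
    return (cab_len + alpha_len + cxyz_len) * BYTES_PER_ELEMENT


# Lengths copied from the error message.
required = smem_bytes(cab_len=4704, alpha_len=1512, cxyz_len=364)
print(required, required / 1024)  # 52640 51.40625

# Compare against the current limit and two candidate limits.
for limit_kib in (48, 51, 52):
    limit = limit_kib * 1024
    print(limit_kib, limit, required <= limit)
```

Under that assumption this job needs 52640 bytes per block, so a limit of 51*1024 = 52224 bytes would still be slightly too small for this particular input, while 52*1024 = 53248 bytes would cover it. Whether the GPU can actually provide that much shared memory per block is a separate question.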
>
>
> So I tried running the calculation on Alps (Eiger) instead (256 GB RAM per
> node). Now, as soon as the SCF calculation starts, the CP2K output file
> shows an error that I don't understand:
> "libfabric:187819:1662628695:cxi:core:cxip_ux_onload_cb():2259<warn> 
> nid001534: RXC (0x2300:32:0): PtlTE 105LE resources not recovered during 
> flow control. FI_CXI_RX_MATCH_MODE=[hybrid|software] is required.
>
> Program received signal SIGABRT: Process abort signal."
>
> Does anyone have an idea what went wrong?
> I am using CP2K 9.1. I have attached my input file and the output file
> with the complete error message.
> Thank you!
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "cp2k" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to cp2k+uns... at googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/cp2k/c0f4eecc-78a1-407c-a18d-20d35785d392n%40googlegroups.com 
> <https://groups.google.com/d/msgid/cp2k/c0f4eecc-78a1-407c-a18d-20d35785d392n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
>
