[CP2K-user] [CP2K:19485] Memory Leak on CP2k 9.1

Quentin Pessemesse q.pessemesse at gmail.com
Thu Nov 9 14:33:05 UTC 2023


Dear Matthias, 
Thank you very much for your help. We were able to solve the memory leak 
issue by using the MPI that is compiled into the image, with some 
modifications to the command you provided to account for the specifics of 
the cluster.
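
For anyone who hits the same issue later, the sketch below shows the kind of 
launch line this amounts to; the image name, process count, bind path, and 
input file are placeholders, not the actual command used on our cluster:

  # Sketch only: use the MPI shipped inside the container instead of the
  # host MPI (image name, core count, and input file are placeholders).
  apptainer exec --bind $PWD cp2k-2023.2_mpich_generic_psmp.sif \
      mpiexec -n 32 cp2k.psmp -i aimd.inp -o aimd.out
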
Best,
Quentin

On Friday, 6 October 2023 at 15:16:09 UTC+2, Krack Matthias wrote:

> Dear Quentin
>
>  
>
> These containers are built differently, following the usual CP2K toolchain 
> installation process. There is no /opt/cp2k-toolchain/ folder; instead, the 
> toolchain is installed under /opt/cp2k/tools/toolchain/install/. There is, 
> however, no need to source that setup file, because the entrypoint.sh 
> <https://github.com/mkrack/cp2k/blob/master/tools/docker/production/Dockerfile.2023.2_mpich_generic_psmp#L83> 
> script already takes care of it. You should be able to run the container as 
> described in the README.md 
> <https://github.com/mkrack/cp2k/blob/master/tools/docker/production/README.md>.
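>
> As an illustration only (the image file name and input file below are 
> placeholders, not taken from the README.md), a run without sourcing any 
> setup file could look like this:
>
>   # The entrypoint prepares the environment, so no "source .../setup" is
>   # needed; image name and input file are placeholders.
>   apptainer run cp2k-2023.2_mpich_generic_psmp.sif cp2k.psmp -i H2O-64.inp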
>
>  
>
> Best
>
>  
>
> Matthias
>
>  
>
> From: cp... at googlegroups.com <cp... at googlegroups.com> on behalf of 
> Quentin Pessemesse <q.pess... at gmail.com>
> Date: Friday, 6 October 2023 at 14:59
> To: cp2k <cp... at googlegroups.com>
> Subject: Re: [CP2K:19312] Memory Leak on CP2k 9.1
>
> Dear Matthias, 
>
> Thank you kindly for your advice; I will try these different versions as 
> soon as possible.
>
> I've built the Docker image for an OpenMPI version of CP2K on the cluster. 
> With version 2023.1, I used to set the environment variables with 
> "source /opt/cp2k-toolchain/install/setup", but this no longer works. Is 
> the problem on the image's end or on the cluster's end?
>
> Best,
>
> Quentin
>
>  
>
>  
>
> On Friday, 6 October 2023 at 11:27:26 UTC+2, Krack Matthias wrote:
>
> Hi Quentin
>
>  
>
> There are some more CP2K 2023.2 docker containers for production 
> <https://github.com/mkrack/cp2k/tree/master/tools/docker/production> 
> available (built with MPICH or OpenMPI) which can also be pulled with 
> apptainer (see the README.md 
> <https://github.com/mkrack/cp2k/blob/master/tools/docker/production/README.md> 
> for details). Maybe you will have more luck with one of these.
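>
> A sketch of pulling one of these images with apptainer is shown here; the 
> registry path and tag are placeholders, so check the README.md for the 
> actual image names:
>
>   # Pull a Docker image and convert it to a local SIF file
>   # (repository and tag below are placeholders, not verified names).
>   apptainer pull cp2k-2023.2_mpich.sif docker://mkrack/cp2k:2023.2_mpich_generic_psmp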
>
>  
>
> Best
>
>  
>
> Matthias
>
>  
>
> From: cp... at googlegroups.com <cp... at googlegroups.com> on behalf of 
> Quentin Pessemesse <q.pess... at gmail.com>
> Date: Friday, 6 October 2023 at 10:47
> To: cp2k <cp... at googlegroups.com>
> Subject: Re: [CP2K:19310] Memory Leak on CP2k 9.1
>
> Dear all, 
>
> The cluster staff has moved to using Docker with a CP2K image, namely CP2K 
> 2023.1 (https://hub.docker.com/r/cp2k/cp2k/tags). The program experiences 
> serious memory leaks (out-of-memory crash after less than 24 hours of AIMD 
> for a system of fewer than 100 atoms on a node with 256 GB of RAM). The 
> cluster cannot use Intel MPI versions older than Intel MPI 20. Is there a 
> more recent version of CP2K that is stable and does not suffer from this 
> type of large memory leak?
>
> We've tried to compile our own versions of CP2K with multiple versions of 
> OpenMPI, to no avail. The only stable CP2K version we have is CP2K 6.1, 
> which is used with Intel MPI 18, but it is on a legacy container where no 
> new software can be installed.
>
> Has anyone managed to use this Docker image successfully, and if so, which 
> MPI package/version did you use? If necessary, we can downgrade to 
> CP2K 9.1.
>
> Best,
>
> Quentin
>
>  
>
> On Wednesday, 5 October 2022 at 13:19:26 UTC+2, Krack Matthias (PSI) wrote:
>
> Hi Quentin
>
>  
>
> It seems that you are using OpenMPI, which is known to have memory leaks in 
> some versions. Check this issue 
> <https://github.com/cp2k/cp2k/issues/1830#issuecomment-1012561166> and this 
> discussion <https://groups.google.com/g/cp2k/c/BJ9c21ey0Ls/m/2UDxnhBRAQAJ> 
> here on the forum for further information.
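>
> A quick way to see which OpenMPI version a run actually uses is sketched 
> below (assuming the OpenMPI tools are on the PATH of the build and run 
> environment), so it can be compared against the versions discussed in the 
> linked issue:
>
>   # Print the OpenMPI version used to launch the job.
>   mpirun --version
>   # More detail on the OpenMPI build (components, version numbers).
>   ompi_info | head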
>
>  
>
> HTH
>
>  
>
> Matthias 
>
>  
>
> *From: *"cp... at googlegroups.com" <cp... at googlegroups.com> on behalf of 
> Quentin Pessemesse <q.pess... at gmail.com>
> *Reply to: *"cp... at googlegroups.com" <cp... at googlegroups.com>
> *Date: *Wednesday, 5 October 2022 at 12:39
> *To: *"cp... at googlegroups.com" <cp... at googlegroups.com>
> *Subject: *[CP2K:17807] Memory Leak on CP2k 9.1
>
>  
>
> Dear all, 
>
> Our group is encountering a memory leak issue that makes running DFT-MD 
> impossible with large systems (~100 atoms) on one of the clusters we have 
> access to, even though the same calculations run correctly on other 
> machines.
>
> The cluster support sent me the following valgrind output and asked me to 
> seek suggestions on how to proceed. Does anyone have input on how to deal 
> with such memory leaks?
>
> Best,
>
> Quentin P.
>
>  
>
> ==62== Invalid write of size 4
> ==62==    at 0x1EA9887: grid_ref_create_task_list (in /ccc/products2/cp2k-9.1/Rhel_8__x86_64/gcc--8.3.0__openmpi--4.0.1/plumed/bin/cp2k.psmp)
> ==62==    by 0x1E7A772: grid_create_task_list (in /ccc/products2/cp2k-9.1/Rhel_8__x86_64/gcc--8.3.0__openmpi--4.0.1/plumed/bin/cp2k.psmp)
> ==62==    by 0x1E790B3: __grid_api_MOD_grid_create_task_list (grid_api.F:938)
> ==62==    by 0x104AA67: __task_list_methods_MOD_generate_qs_task_list (task_list_methods.F:623)
> ==62==    by 0xF58353: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:187)
> ==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
> ==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)
> ==62==    by 0xCE087E: __qs_force_MOD_qs_forces (qs_force.F:212)
> ==62==    by 0xCE4349: __qs_force_MOD_qs_calc_energy_force (qs_force.F:117)
> ==62==    by 0x9AE2C0: __force_env_methods_MOD_force_env_calc_energy_force (force_env_methods.F:271)
> ==62==    by 0x50CD0C: __md_run_MOD_qs_mol_dyn_low (md_run.F:372)
> ==62==    by 0x50DCF2: __md_run_MOD_qs_mol_dyn (md_run.F:153)
> ==62==  Address 0x26d18670 is 16 bytes before a block of size 10 free'd
> ==62==    at 0x4C35FAC: free (vg_replace_malloc.c:538)
> ==62==    by 0x2B73E68: __offload_api_MOD_offload_timeset (offload_api.F:137)
> ==62==    by 0x2B60EDA: __timings_MOD_timeset_handler (timings.F:278)
> ==62==    by 0x2BE2C6D: __message_passing_MOD_mp_waitany (message_passing.F:4597)
> ==62==    by 0x2963EA5: __realspace_grid_types_MOD_rs_pw_transfer_distributed (realspace_grid_types.F:1439)
> ==62==    by 0x2966559: __realspace_grid_types_MOD_rs_pw_transfer (realspace_grid_types.F:711)
> ==62==    by 0xC9310B: __qs_collocate_density_MOD_calculate_rho_core (qs_collocate_density.F:966)
> ==62==    by 0xF57698: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:109)
> ==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
> ==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)
> ==62==    by 0xCE087E: __qs_force_MOD_qs_forces (qs_force.F:212)
> ==62==    by 0xCE4349: __qs_force_MOD_qs_calc_energy_force (qs_force.F:117)
> ==62==  Block was alloc'd at
> ==62==    at 0x4C34DFF: malloc (vg_replace_malloc.c:307)
> ==62==    by 0x2F21116: _gfortrani_xmallocarray (memory.c:66)
> ==62==    by 0x2F1C271: _gfortran_string_trim (string_intrinsics_inc.c:167)
> ==62==    by 0x2B73E1C: __offload_api_MOD_offload_timeset (offload_api.F:137)
> ==62==    by 0x2B60EDA: __timings_MOD_timeset_handler (timings.F:278)
> ==62==    by 0x2BE2C6D: __message_passing_MOD_mp_waitany (message_passing.F:4597)
> ==62==    by 0x2963EA5: __realspace_grid_types_MOD_rs_pw_transfer_distributed (realspace_grid_types.F:1439)
> ==62==    by 0x2966559: __realspace_grid_types_MOD_rs_pw_transfer (realspace_grid_types.F:711)
> ==62==    by 0xC9310B: __qs_collocate_density_MOD_calculate_rho_core (qs_collocate_density.F:966)
> ==62==    by 0xF57698: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:109)
> ==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
> ==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)
>


