[CP2K-user] [CP2K:19312] Memory Leak on CP2k 9.1
Quentin Pessemesse
q.pessemesse at gmail.com
Fri Oct 6 12:59:02 UTC 2023
Dear Matthias,
Thank you kindly for your advice; I will try these different versions as
soon as possible.
I've built the Docker image for an OpenMPI version of CP2K on the cluster.
With version 2023.1, I used to source the environment variables with
"source /opt/cp2k-toolchain/install/setup", but this no longer works. Is
the problem on the image's end or on the cluster's end?
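For reference, here is a minimal sketch of what I plan to check inside the
container, assuming the production images put the cp2k binaries on PATH so
that sourcing the toolchain setup is no longer needed (the image file name
and paths below are only placeholders, not verified against the image):

  # Does the old toolchain setup file exist in the image at all?
  apptainer exec cp2k-2023.2.sif ls /opt/cp2k-toolchain/install/setup

  # If the production image already has cp2k on PATH, no sourcing should be needed
  apptainer exec cp2k-2023.2.sif which cp2k.psmp
  apptainer exec cp2k-2023.2.sif cp2k.psmp --version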
Best,
Quentin
On Friday, 6 October 2023 at 11:27:26 UTC+2, Krack Matthias wrote:
> Hi Quentin
>
>
>
> There are some more CP2K 2023.2 docker containers for production
> <https://github.com/mkrack/cp2k/tree/master/tools/docker/production>
> available (built with MPICH or OpenMPI) which can also be pulled with
> apptainer (see the README.md
> <https://github.com/mkrack/cp2k/blob/master/tools/docker/production/README.md>
> for details). Maybe you will have more luck with one of these.
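>
> A minimal sketch of the pull-and-run workflow, assuming apptainer is
> available on the cluster (the registry path and image tag below are only
> illustrative; the exact image names are listed in the README.md above):
>
>   # Pull a production image and convert it to a SIF file
>   apptainer pull cp2k-2023.2.sif docker://mkrack/cp2k:2023.2_openmpi_generic_psmp
>
>   # Run an input file with the containerized binary, binding the current directory
>   apptainer exec --bind $PWD cp2k-2023.2.sif cp2k.psmp -i input.inp -o output.out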
>
>
>
> Best
>
>
>
> Matthias
>
>
>
> From: cp... at googlegroups.com <cp... at googlegroups.com> on behalf of
> Quentin Pessemesse <q.pess... at gmail.com>
> Date: Friday, 6 October 2023 at 10:47
> To: cp2k <cp... at googlegroups.com>
> Subject: Re: [CP2K:19310] Memory Leak on CP2k 9.1
>
> Dear all,
>
> The cluster staff has moved to using a Docker container with a CP2K image,
> CP2K 2023.1 (https://hub.docker.com/r/cp2k/cp2k/tags). The program
> experiences serious memory leaks (out-of-memory crash after less than 24
> hours of AIMD on a system of fewer than 100 atoms with 256 GB of RAM). The
> cluster cannot use Intel MPI versions older than Intel MPI 20. Is there a
> more recent version of CP2K that is stable and does not suffer from this
> type of large memory leak?
>
> We've tried to compile our own versions of CP2K with multiple versions of
> OpenMPI, to no avail. The only stable CP2K version we have is CP2K 6.1,
> which is used with Intel MPI 18, but it lives on a legacy container where
> no new software can be installed.
>
> Has anyone managed to use this Docker image successfully, and if so, which
> MPI package and version did you use? If necessary, we can downgrade to
> CP2K 9.1.
>
> Best,
>
> Quentin
>
>
>
> On Wednesday, 5 October 2022 at 13:19:26 UTC+2, Krack Matthias (PSI) wrote:
>
> Hi Quentin
>
>
>
> It seems that you are using OpenMPI, which is known to have memory leaks in
> some versions. Check this issue
> <https://github.com/cp2k/cp2k/issues/1830#issuecomment-1012561166> and this
> discussion <https://groups.google.com/g/cp2k/c/BJ9c21ey0Ls/m/2UDxnhBRAQAJ>
> on this forum for further information.
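>
> As a quick check (just a sketch, using the binary path taken from the
> valgrind output below), you can verify which Open MPI version is actually
> in use:
>
>   # Open MPI version provided by the environment
>   mpirun --version
>   ompi_info | head -n 5
>
>   # MPI library the cp2k executable is linked against
>   ldd /ccc/products2/cp2k-9.1/Rhel_8__x86_64/gcc--8.3.0__openmpi--4.0.1/plumed/bin/cp2k.psmp | grep -i mpi
>
> The build path suggests OpenMPI 4.0.1, which you can cross-check against
> the versions discussed in the issue linked above.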
>
>
>
> HTH
>
>
>
> Matthias
>
>
>
> From: "cp... at googlegroups.com" <cp... at googlegroups.com> on behalf of
> Quentin Pessemesse <q.pess... at gmail.com>
> Reply to: "cp... at googlegroups.com" <cp... at googlegroups.com>
> Date: Wednesday, 5 October 2022 at 12:39
> To: "cp... at googlegroups.com" <cp... at googlegroups.com>
> Subject: [CP2K:17807] Memory Leak on CP2k 9.1
>
>
>
> Dear all,
>
> Our group is encountering a memory leak issue that makes running DFT-MD
> impossible with large systems (~100 atoms) on one of the clusters we have
> access to, even though the same calculations run correctly on other
> machines.
>
> The cluster support team sent me the following valgrind output and asked me
> to look for suggestions on how to proceed. Does anyone have input on how to
> deal with such memory leaks?
>
> Best,
>
> Quentin P.
>
>
>
> ==62== Invalid write of size 4
> ==62==    at 0x1EA9887: grid_ref_create_task_list (in /ccc/products2/cp2k-9.1/Rhel_8__x86_64/gcc--8.3.0__openmpi--4.0.1/plumed/bin/cp2k.psmp)
> ==62==    by 0x1E7A772: grid_create_task_list (in /ccc/products2/cp2k-9.1/Rhel_8__x86_64/gcc--8.3.0__openmpi--4.0.1/plumed/bin/cp2k.psmp)
> ==62==    by 0x1E790B3: __grid_api_MOD_grid_create_task_list (grid_api.F:938)
> ==62==    by 0x104AA67: __task_list_methods_MOD_generate_qs_task_list (task_list_methods.F:623)
> ==62==    by 0xF58353: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:187)
> ==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
> ==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)
> ==62==    by 0xCE087E: __qs_force_MOD_qs_forces (qs_force.F:212)
> ==62==    by 0xCE4349: __qs_force_MOD_qs_calc_energy_force (qs_force.F:117)
> ==62==    by 0x9AE2C0: __force_env_methods_MOD_force_env_calc_energy_force (force_env_methods.F:271)
> ==62==    by 0x50CD0C: __md_run_MOD_qs_mol_dyn_low (md_run.F:372)
> ==62==    by 0x50DCF2: __md_run_MOD_qs_mol_dyn (md_run.F:153)
> ==62==  Address 0x26d18670 is 16 bytes before a block of size 10 free'd
> ==62==    at 0x4C35FAC: free (vg_replace_malloc.c:538)
> ==62==    by 0x2B73E68: __offload_api_MOD_offload_timeset (offload_api.F:137)
> ==62==    by 0x2B60EDA: __timings_MOD_timeset_handler (timings.F:278)
> ==62==    by 0x2BE2C6D: __message_passing_MOD_mp_waitany (message_passing.F:4597)
> ==62==    by 0x2963EA5: __realspace_grid_types_MOD_rs_pw_transfer_distributed (realspace_grid_types.F:1439)
> ==62==    by 0x2966559: __realspace_grid_types_MOD_rs_pw_transfer (realspace_grid_types.F:711)
> ==62==    by 0xC9310B: __qs_collocate_density_MOD_calculate_rho_core (qs_collocate_density.F:966)
> ==62==    by 0xF57698: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:109)
> ==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
> ==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)
> ==62==    by 0xCE087E: __qs_force_MOD_qs_forces (qs_force.F:212)
> ==62==    by 0xCE4349: __qs_force_MOD_qs_calc_energy_force (qs_force.F:117)
> ==62==  Block was alloc'd at
> ==62==    at 0x4C34DFF: malloc (vg_replace_malloc.c:307)
> ==62==    by 0x2F21116: _gfortrani_xmallocarray (memory.c:66)
> ==62==    by 0x2F1C271: _gfortran_string_trim (string_intrinsics_inc.c:167)
> ==62==    by 0x2B73E1C: __offload_api_MOD_offload_timeset (offload_api.F:137)
> ==62==    by 0x2B60EDA: __timings_MOD_timeset_handler (timings.F:278)
> ==62==    by 0x2BE2C6D: __message_passing_MOD_mp_waitany (message_passing.F:4597)
> ==62==    by 0x2963EA5: __realspace_grid_types_MOD_rs_pw_transfer_distributed (realspace_grid_types.F:1439)
> ==62==    by 0x2966559: __realspace_grid_types_MOD_rs_pw_transfer (realspace_grid_types.F:711)
> ==62==    by 0xC9310B: __qs_collocate_density_MOD_calculate_rho_core (qs_collocate_density.F:966)
> ==62==    by 0xF57698: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:109)
> ==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
> ==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)
>