[CP2K-user] [CP2K:19486] Memory Leak on CP2k 9.1
Quentin Pessemesse
q.pessemesse at gmail.com
Thu Nov 9 15:18:53 UTC 2023
Dear Matthias,
Do you know if there is an easy way to use PLUMED modules beyond the
default ones with these Docker images? Some modules I need are
deactivated by default. I have no experience in editing Docker images,
and this may rather be a question for the PLUMED forum...
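For reference, PLUMED's optional modules are normally switched on when
PLUMED itself is configured, so a rebuilt image (or a rebuilt PLUMED plus a
relinked CP2K inside a derived image) would be needed. A hedged sketch of
the relevant configure step follows; the install prefix is a placeholder,
and whether the CP2K binary in the image picks it up depends on how PLUMED
was linked when the image was built.

    # Hedged sketch: enable optional PLUMED modules at configure time.
    # "--enable-modules=all" switches on every optional module; a list of
    # specific module names can be given instead. The prefix is a placeholder.
    ./configure --prefix=/opt/plumed --enable-modules=all
    make -j 4 && make install
    # Note: if CP2K in the image is linked statically against PLUMED, CP2K
    # itself has to be relinked/rebuilt to pick up the new modules.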
Thank you again,
Q.
On 09/11/2023 15:57, Krack Matthias wrote:
>
> Dear Quentin
>
> I am glad to read that you could solve the problems by using the
> provided container images.
>
> For future reference, I would like to mention that the README and the
> Dockerfiles for the CP2K production containers have recently been moved
> to a separate GitHub repository called cp2k-containers
> <https://github.com/cp2k/cp2k-containers>. Therefore, the
> links in my previous mail are no longer valid. The CP2K production
> containers can be downloaded from Docker Hub here
> <https://hub.docker.com/r/cp2k/cp2k/tags/>.
>
> Best
>
> Matthias
>
> *From: *cp2k at googlegroups.com <cp2k at googlegroups.com> on behalf of
> Quentin Pessemesse <q.pessemesse at gmail.com>
> *Date: *Thursday, 9 November 2023 at 15:33
> *To: *cp2k <cp2k at googlegroups.com>
> *Subject: *Re: [CP2K:19485] Memory Leak on CP2k 9.1
>
> Dear Matthias,
>
> Thank you very much for your help; we were able to solve the memory
> leak issue by using the MPI that is compiled into the image, with some
> modifications to the command you provided to account for the specifics
> of our cluster.
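>
> For reference, a hedged sketch of that kind of launch, using the MPI
> runtime shipped inside the container rather than the host MPI (image
> file name, rank count and input file are placeholders; the exact
> command depends on the cluster's batch system):
>
>     # Hedged sketch: run CP2K with the container's own MPI via Apptainer.
>     # If cp2k/mpiexec are not on PATH without the entrypoint, use their
>     # full paths inside the image instead.
>     apptainer exec -B $PWD cp2k-2023.2.sif \
>         mpiexec -n 32 cp2k -i md.inp -o md.out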
>
> Best,
>
> Quentin
>
> On Friday, 6 October 2023 at 15:16:09 UTC+2, Krack Matthias wrote:
>
> Dear Quentin
>
> These containers are built differently, following the usual CP2K
> toolchain installation process. There is no /opt/cp2k-toolchain/
> folder; the toolchain is installed under
> /opt/cp2k/tools/toolchain/install/ instead. There is no need,
> however, to source that setup file, because the entrypoint.sh
> <https://github.com/mkrack/cp2k/blob/master/tools/docker/production/Dockerfile.2023.2_mpich_generic_psmp#L83>
> script already takes care of that. You should be able to run the
> container as described in the README.md
> <https://github.com/mkrack/cp2k/blob/master/tools/docker/production/README.md>.
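>
> As a hedged illustration of the pattern described there (the exact
> image tag and options should be taken from the README itself):
>
>     # Pull the production image and convert it to a SIF file.
>     apptainer pull docker://cp2k/cp2k:2023.2_mpich_generic_psmp
>     # Run CP2K through the entrypoint, which sources the toolchain setup;
>     # the input file name is a placeholder.
>     apptainer run -B $PWD cp2k_2023.2_mpich_generic_psmp.sif \
>         cp2k -i H2O-32.inp -o H2O-32.out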
>
> Best
>
> Matthias
>
> *From: *cp... at googlegroups.com <cp... at googlegroups.com> on behalf
> of Quentin Pessemesse <q.pess... at gmail.com>
> *Date: *Friday, 6 October 2023 at 14:59
> *To: *cp2k <cp... at googlegroups.com>
> *Subject: *Re: [CP2K:19312] Memory Leak on CP2k 9.1
>
> Dear Matthias,
>
> Thank you kindly for your advice; I will try these different
> versions as soon as possible.
>
> I have built the Docker image for an OpenMPI version of CP2K on the
> cluster. With version 2023.1, I used to source the environment
> variables using "source /opt/cp2k-toolchain/install/setup". This
> does not work anymore. Is it a problem on the image's end or on the
> cluster's end?
>
> Best,
>
> Quentin
>
> On Friday, 6 October 2023 at 11:27:26 UTC+2, Krack Matthias wrote:
>
> Hi Quentin
>
> There are some more CP2K 2023.2 Docker containers for production
> <https://github.com/mkrack/cp2k/tree/master/tools/docker/production>
> available (built with MPICH or OpenMPI), which can also be pulled
> with Apptainer (see the README.md
> <https://github.com/mkrack/cp2k/blob/master/tools/docker/production/README.md>
> for details). Maybe you will have more luck with one of these.
>
> Best
>
> Matthias
>
> *From: *cp... at googlegroups.com <cp... at googlegroups.com> on
> behalf of Quentin Pessemesse <q.pess... at gmail.com>
> *Date: *Friday, 6 October 2023 at 10:47
> *To: *cp2k <cp... at googlegroups.com>
> *Subject: *Re: [CP2K:19310] Memory Leak on CP2k 9.1
>
> Dear all,
>
> The cluster staff has moved to using a Docker image of CP2K, with
> CP2K 2023.1 (https://hub.docker.com/r/cp2k/cp2k/tags). The program
> experiences serious memory leaks (out-of-memory crash after less
> than 24 hours of AIMD for a system of fewer than 100 atoms on a
> node with 256 GB of memory). The cluster cannot use Intel MPI
> versions older than Intel MPI 20. Is there a more recent version of
> CP2K that is stable and does not suffer from this type of large
> memory leak?
>
> We have tried to compile our own versions of CP2K with multiple
> versions of OpenMPI, to no avail. The only stable CP2K version we
> have is CP2K 6.1, which is used with Intel MPI 18, but it sits on a
> legacy container where no new software can be installed.
>
> Has anyone managed to use this Docker image successfully, and if
> so, which MPI package/version did you use? If necessary, we can
> downgrade to CP2K 9.1.
>
> Best,
>
> Quentin
>
> On Wednesday, 5 October 2022 at 13:19:26 UTC+2, Krack Matthias
> (PSI) wrote:
>
> Hi Quentin
>
> It seems that you are using OpenMPI, which is known to have memory
> leaks in some versions. Check this issue
> <https://github.com/cp2k/cp2k/issues/1830#issuecomment-1012561166>
> and this discussion
> <https://groups.google.com/g/cp2k/c/BJ9c21ey0Ls/m/2UDxnhBRAQAJ>
> on this forum for further information.
>
> HTH
>
> Matthias
>
> *From: *"cp... at googlegroups.com" <cp... at googlegroups.com>
> on behalf of Quentin Pessemesse <q.pess... at gmail.com>
> *Reply to: *"cp... at googlegroups.com" <cp... at googlegroups.com>
> *Date: *Wednesday, 5 October 2022 at 12:39
> *To: *"cp... at googlegroups.com" <cp... at googlegroups.com>
> *Subject: *[CP2K:17807] Memory Leak on CP2k 9.1
>
> Dear all,
>
> Our group is encountering a memory leak issue that makes
> running DFT-MD impossible with large systems (~100 atoms)
> on one of the clusters we have access to, even though the
> same calculations run correctly on other machines.
>
> The cluster support sent me the following valgrind output and
> asked me to find suggestions on how to proceed. Does anyone have
> input on how to deal with such memory leaks?
>
> Best,
>
> Quentin P.
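>
> For context, a per-rank trace like the one below is typically
> obtained by wrapping the CP2K binary in valgrind under the MPI
> launcher; a hedged sketch (launcher, rank count, valgrind options
> and input file are assumptions, only the binary path is taken from
> the trace):
>
>     mpirun -np 4 \
>         valgrind --leak-check=full --track-origins=yes \
>         /ccc/products2/cp2k-9.1/Rhel_8__x86_64/gcc--8.3.0__openmpi--4.0.1/plumed/bin/cp2k.psmp \
>         -i system.inp -o system.out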
>
> ==62== Invalid write of size 4
> ==62==    at 0x1EA9887: grid_ref_create_task_list (in /ccc/products2/cp2k-9.1/Rhel_8__x86_64/gcc--8.3.0__openmpi--4.0.1/plumed/bin/cp2k.psmp)
> ==62==    by 0x1E7A772: grid_create_task_list (in /ccc/products2/cp2k-9.1/Rhel_8__x86_64/gcc--8.3.0__openmpi--4.0.1/plumed/bin/cp2k.psmp)
> ==62==    by 0x1E790B3: __grid_api_MOD_grid_create_task_list (grid_api.F:938)
> ==62==    by 0x104AA67: __task_list_methods_MOD_generate_qs_task_list (task_list_methods.F:623)
> ==62==    by 0xF58353: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:187)
> ==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
> ==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)
> ==62==    by 0xCE087E: __qs_force_MOD_qs_forces (qs_force.F:212)
> ==62==    by 0xCE4349: __qs_force_MOD_qs_calc_energy_force (qs_force.F:117)
> ==62==    by 0x9AE2C0: __force_env_methods_MOD_force_env_calc_energy_force (force_env_methods.F:271)
> ==62==    by 0x50CD0C: __md_run_MOD_qs_mol_dyn_low (md_run.F:372)
> ==62==    by 0x50DCF2: __md_run_MOD_qs_mol_dyn (md_run.F:153)
> ==62==  Address 0x26d18670 is 16 bytes before a block of size 10 free'd
> ==62==    at 0x4C35FAC: free (vg_replace_malloc.c:538)
> ==62==    by 0x2B73E68: __offload_api_MOD_offload_timeset (offload_api.F:137)
> ==62==    by 0x2B60EDA: __timings_MOD_timeset_handler (timings.F:278)
> ==62==    by 0x2BE2C6D: __message_passing_MOD_mp_waitany (message_passing.F:4597)
> ==62==    by 0x2963EA5: __realspace_grid_types_MOD_rs_pw_transfer_distributed (realspace_grid_types.F:1439)
> ==62==    by 0x2966559: __realspace_grid_types_MOD_rs_pw_transfer (realspace_grid_types.F:711)
> ==62==    by 0xC9310B: __qs_collocate_density_MOD_calculate_rho_core (qs_collocate_density.F:966)
> ==62==    by 0xF57698: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:109)
> ==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
> ==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)
> ==62==    by 0xCE087E: __qs_force_MOD_qs_forces (qs_force.F:212)
> ==62==    by 0xCE4349: __qs_force_MOD_qs_calc_energy_force (qs_force.F:117)
> ==62==  Block was alloc'd at
> ==62==    at 0x4C34DFF: malloc (vg_replace_malloc.c:307)
> ==62==    by 0x2F21116: _gfortrani_xmallocarray (memory.c:66)
> ==62==    by 0x2F1C271: _gfortran_string_trim (string_intrinsics_inc.c:167)
> ==62==    by 0x2B73E1C: __offload_api_MOD_offload_timeset (offload_api.F:137)
> ==62==    by 0x2B60EDA: __timings_MOD_timeset_handler (timings.F:278)
> ==62==    by 0x2BE2C6D: __message_passing_MOD_mp_waitany (message_passing.F:4597)
> ==62==    by 0x2963EA5: __realspace_grid_types_MOD_rs_pw_transfer_distributed (realspace_grid_types.F:1439)
> ==62==    by 0x2966559: __realspace_grid_types_MOD_rs_pw_transfer (realspace_grid_types.F:711)
> ==62==    by 0xC9310B: __qs_collocate_density_MOD_calculate_rho_core (qs_collocate_density.F:966)
> ==62==    by 0xF57698: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:109)
> ==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
> ==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)