[CP2K-user] [CP2K:19486] Memory Leak on CP2k 9.1

Quentin Pessemesse q.pessemesse at gmail.com
Thu Nov 9 15:18:53 UTC 2023


Dear Matthias,

Do you know if there is an easy way to use PLUMED modules beyond the
default ones with these Docker images? Some modules I need are
deactivated by default. I have no experience with editing Docker images,
and this may rather be a question for the PLUMED forum...
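
The only route I could imagine so far is to rebuild the image myself with
a modified PLUMED configuration (PLUMED's --enable-modules configure
option), roughly along the lines of the untested sketch below; the file
and Dockerfile names are my guesses, so please correct me if this is the
wrong approach:

    # get the sources the production Dockerfiles are built around
    git clone https://github.com/cp2k/cp2k-containers.git
    git clone --recursive https://github.com/cp2k/cp2k.git
    # locate the toolchain script that builds PLUMED and extend its
    # configure line with the modules I need, e.g. --enable-modules=all
    find cp2k/tools/toolchain -name "install_plumed.sh"
    # rebuild the image from the matching production Dockerfile;
    # DOCKERFILE = whichever file in cp2k-containers fits the CP2K version
    # and MPI flavour (if the Dockerfile clones CP2K during the build, the
    # toolchain tweak would have to go into the Dockerfile itself instead)
    docker build -f "$DOCKERFILE" -t cp2k_custom_plumed .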

Thank you again,

Q.

On 09/11/2023 15:57, Krack Matthias wrote:
>
> Dear Quentin
>
> I am glad to read that you could solve the problems by using the 
> provided container images.
>
> For future reference, I would like to mention that the README and the
> Dockerfiles for the CP2K production containers have recently been moved
> to a separate GitHub repository called cp2k-containers
> <https://github.com/cp2k/cp2k-containers>. Therefore, the links in my
> previous mail are no longer valid. The CP2K production containers can be
> downloaded from Docker Hub
> <https://hub.docker.com/r/cp2k/cp2k/tags/>.
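>
> For example, pulling the MPICH-based production build comes down to
> something like this (the exact tag names are listed on the Docker Hub
> page linked above):
>
>     docker pull cp2k/cp2k:2023.2_mpich_generic_psmp
>     # quick smoke test, assuming the entrypoint sets up the environment
>     # and passes the command through to the cp2k.psmp binary
>     docker run -it --rm cp2k/cp2k:2023.2_mpich_generic_psmp cp2k.psmp --version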
>
> Best
>
> Matthias
>
> *From: *cp2k at googlegroups.com <cp2k at googlegroups.com> on behalf of 
> Quentin Pessemesse <q.pessemesse at gmail.com>
> *Date: *Thursday, 9 November 2023 at 15:33
> *To: *cp2k <cp2k at googlegroups.com>
> *Subject: *Re: [CP2K:19485] Memory Leak on CP2k 9.1
>
> Dear Matthias,
>
> Thank you very much for your help. We were able to solve the memory
> leak issue by using the MPI that is compiled into the image, with some
> modifications to the command you provided to account for the
> specificities of the cluster.
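>
> For the record, the final launch line looked roughly like the one below;
> the image name, bind paths, and core count are specific to our cluster,
> so take it only as an illustration:
>
>     apptainer run -B $PWD cp2k-2023.2.sif mpiexec -n 32 cp2k.psmp -i md.inp -o md.out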
>
> Best,
>
> Quentin
>
> On Friday, 6 October 2023 at 15:16:09 UTC+2, Krack Matthias wrote:
>
>     Dear Quentin
>
>     These containers are built differently and follow the usual CP2K
>     toolchain installation process. There is no /opt/cp2k-toolchain/
>     folder, but there is the folder /opt/cp2k/tools/toolchain/install/.
>     There is no need, however, to source that setup file, because the
>     entrypoint.sh
>     <https://github.com/mkrack/cp2k/blob/master/tools/docker/production/Dockerfile.2023.2_mpich_generic_psmp#L83>
>     script already takes care of that. You should be able to run the
>     container as described in the README.md
>     <https://github.com/mkrack/cp2k/blob/master/tools/docker/production/README.md>.
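>
>     If you bypass the entrypoint (e.g. with a plain docker exec or
>     apptainer exec shell), the manual equivalent would be to source the
>     setup file from its new location:
>
>         source /opt/cp2k/tools/toolchain/install/setup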
>
>     Best
>
>     Matthias
>
>     *From: *cp... at googlegroups.com <cp... at googlegroups.com> on behalf
>     of Quentin Pessemesse <q.pess... at gmail.com>
>     *Date: *Friday, 6 October 2023 at 14:59
>     *To: *cp2k <cp... at googlegroups.com>
>     *Subject: *Re: [CP2K:19312] Memory Leak on CP2k 9.1
>
>     Dear Matthias,
>
>     Thank you kindly for your advice; I will try these different
>     versions as soon as possible.
>
>     I've built the Docker image for an OpenMPI version of CP2K on the
>     cluster. With version 2023.1, I used to source the environment
>     variables using "source /opt/cp2k-toolchain/install/setup", but this
>     no longer works. Is the problem on the image's end or on the
>     cluster's end?
>
>     Best,
>
>     Quentin
>
>     On Friday, 6 October 2023 at 11:27:26 UTC+2, Krack Matthias wrote:
>
>         Hi Quentin
>
>         There are some more CP2K 2023.2 Docker containers for
>         production
>         <https://github.com/mkrack/cp2k/tree/master/tools/docker/production>
>         available (built with MPICH or OpenMPI), which can also be
>         pulled with Apptainer (see the README.md
>         <https://github.com/mkrack/cp2k/blob/master/tools/docker/production/README.md>
>         for details). Maybe you have more luck with one of these.
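>
>         With Apptainer, pulling one of them boils down to something like
>         this (the tag name has to match one of the tags listed on Docker
>         Hub):
>
>             apptainer pull docker://cp2k/cp2k:2023.2_openmpi_generic_psmp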
>
>         Best
>
>         Matthias
>
>         *From: *cp... at googlegroups.com <cp... at googlegroups.com> on
>         behalf of Quentin Pessemesse <q.pess... at gmail.com>
>         *Date: *Friday, 6 October 2023 at 10:47
>         *To: *cp2k <cp... at googlegroups.com>
>         *Subject: *Re: [CP2K:19310] Memory Leak on CP2k 9.1
>
>         Dear all,
>
>         The cluster staff has moved to using Docker with a CP2K 2023.1
>         image (https://hub.docker.com/r/cp2k/cp2k/tags). The program
>         shows serious memory leaks (an out-of-memory crash after less
>         than 24 hours of AIMD for a system with fewer than 100 atoms on
>         a node with 256 GB of memory). The cluster cannot use Intel MPI
>         versions older than Intel MPI 20. Is there a more recent version
>         of CP2K which is stable and does not show this type of large
>         memory leak?
>
>         We've tried to compile our own versions of CP2K with multiple
>         versions of OpenMPI, to no avail. The only stable CP2K version
>         we have is CP2K 6.1, used with Intel MPI 18, but it lives on a
>         legacy container where no new software can be installed.
>
>         Has anyone managed to use this Docker image successfully, and
>         if so, which MPI package/version did you use? If necessary, we
>         can downgrade to CP2K 9.1.
>
>         Best,
>
>         Quentin
>
>         On Wednesday, 5 October 2022 at 13:19:26 UTC+2, Krack Matthias
>         (PSI) wrote:
>
>             Hi Quentin
>
>             It seems that you are using OpenMPI, which is known to have
>             memory leaks in some versions. Check this issue
>             <https://github.com/cp2k/cp2k/issues/1830#issuecomment-1012561166>
>             and this discussion
>             <https://groups.google.com/g/cp2k/c/BJ9c21ey0Ls/m/2UDxnhBRAQAJ>
>             here on this forum for further information.
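>
>             To double-check which OpenMPI version a given cp2k.psmp
>             binary actually uses, something along these lines usually
>             works (assuming the binary is dynamically linked and mpirun
>             is the launcher used for the job):
>
>                 mpirun --version
>                 ldd /ccc/products2/cp2k-9.1/Rhel_8__x86_64/gcc--8.3.0__openmpi--4.0.1/plumed/bin/cp2k.psmp | grep -i mpi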
>
>             HTH
>
>             Matthias
>
>             *From: *"cp... at googlegroups.com" <cp... at googlegroups.com>
>             on behalf of Quentin Pessemesse <q.pess... at gmail.com>
>             *Reply to: *"cp... at googlegroups.com" <cp... at googlegroups.com>
>             *Date: *Wednesday, 5 October 2022 at 12:39
>             *To: *"cp... at googlegroups.com" <cp... at googlegroups.com>
>             *Subject: *[CP2K:17807] Memory Leak on CP2k 9.1
>
>             Dear all,
>
>             Our group is encountering a memory leak issue that makes
>             running DFT-MD impossible with large systems (~100 atoms)
>             on one of the clusters we have access to, even though the
>             same calculations run correctly on other machines.
>
>             The cluster support sent me the following valgrind output
>             and asked me to find suggestions on how to proceed. Does
>             anyone have input on how to deal with such memory leaks?
>
>             Best,
>
>             Quentin P.
>
>             ==62== Invalid write of size 4
>             ==62==    at 0x1EA9887: grid_ref_create_task_list (in /ccc/products2/cp2k-9.1/Rhel_8__x86_64/gcc--8.3.0__openmpi--4.0.1/plumed/bin/cp2k.psmp)
>             ==62==    by 0x1E7A772: grid_create_task_list (in /ccc/products2/cp2k-9.1/Rhel_8__x86_64/gcc--8.3.0__openmpi--4.0.1/plumed/bin/cp2k.psmp)
>             ==62==    by 0x1E790B3: __grid_api_MOD_grid_create_task_list (grid_api.F:938)
>             ==62==    by 0x104AA67: __task_list_methods_MOD_generate_qs_task_list (task_list_methods.F:623)
>             ==62==    by 0xF58353: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:187)
>             ==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
>             ==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)
>             ==62==    by 0xCE087E: __qs_force_MOD_qs_forces (qs_force.F:212)
>             ==62==    by 0xCE4349: __qs_force_MOD_qs_calc_energy_force (qs_force.F:117)
>             ==62==    by 0x9AE2C0: __force_env_methods_MOD_force_env_calc_energy_force (force_env_methods.F:271)
>             ==62==    by 0x50CD0C: __md_run_MOD_qs_mol_dyn_low (md_run.F:372)
>             ==62==    by 0x50DCF2: __md_run_MOD_qs_mol_dyn (md_run.F:153)
>             ==62==  Address 0x26d18670 is 16 bytes before a block of size 10 free'd
>             ==62==    at 0x4C35FAC: free (vg_replace_malloc.c:538)
>             ==62==    by 0x2B73E68: __offload_api_MOD_offload_timeset (offload_api.F:137)
>             ==62==    by 0x2B60EDA: __timings_MOD_timeset_handler (timings.F:278)
>             ==62==    by 0x2BE2C6D: __message_passing_MOD_mp_waitany (message_passing.F:4597)
>             ==62==    by 0x2963EA5: __realspace_grid_types_MOD_rs_pw_transfer_distributed (realspace_grid_types.F:1439)
>             ==62==    by 0x2966559: __realspace_grid_types_MOD_rs_pw_transfer (realspace_grid_types.F:711)
>             ==62==    by 0xC9310B: __qs_collocate_density_MOD_calculate_rho_core (qs_collocate_density.F:966)
>             ==62==    by 0xF57698: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:109)
>             ==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
>             ==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)
>             ==62==    by 0xCE087E: __qs_force_MOD_qs_forces (qs_force.F:212)
>             ==62==    by 0xCE4349: __qs_force_MOD_qs_calc_energy_force (qs_force.F:117)
>             ==62==  Block was alloc'd at
>             ==62==    at 0x4C34DFF: malloc (vg_replace_malloc.c:307)
>             ==62==    by 0x2F21116: _gfortrani_xmallocarray (memory.c:66)
>             ==62==    by 0x2F1C271: _gfortran_string_trim (string_intrinsics_inc.c:167)
>             ==62==    by 0x2B73E1C: __offload_api_MOD_offload_timeset (offload_api.F:137)
>             ==62==    by 0x2B60EDA: __timings_MOD_timeset_handler (timings.F:278)
>             ==62==    by 0x2BE2C6D: __message_passing_MOD_mp_waitany (message_passing.F:4597)
>             ==62==    by 0x2963EA5: __realspace_grid_types_MOD_rs_pw_transfer_distributed (realspace_grid_types.F:1439)
>             ==62==    by 0x2966559: __realspace_grid_types_MOD_rs_pw_transfer (realspace_grid_types.F:711)
>             ==62==    by 0xC9310B: __qs_collocate_density_MOD_calculate_rho_core (qs_collocate_density.F:966)
>             ==62==    by 0xF57698: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:109)
>             ==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
>             ==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)
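>
>             I do not have the exact command line the cluster support
>             used, but a trace like the one above typically comes from
>             launching the MPI job under valgrind, e.g. something along
>             these lines:
>
>                 mpirun -np 2 valgrind --track-origins=yes cp2k.psmp -i input.inp -o output.out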
>
