[CP2K-user] [CP2K:17749] Possible Memory Leak

Krack Matthias (PSI) matthias.krack at psi.ch
Fri Sep 23 16:00:10 UTC 2022


Dear Matthew

The memory growth for v2022.1 does not look too bad, and it may well be small enough to survive longer runs.
I remember issues with memory leaks caused by the MPI implementation. OpenMPI in particular showed such problems in the past, which is why I used only MPICH for years: leak checking in CP2K was impossible with OpenMPI. Have a look at this issue <https://github.com/cp2k/cp2k/issues/1830> from January this year, for instance.
The presence of memory leaks usually does not imply that the results are wrong.

Best regards

Matthias

From: "cp2k at googlegroups.com" <cp2k at googlegroups.com> on behalf of Matthew Emerson <mrson at uiowa.edu>
Reply to: "cp2k at googlegroups.com" <cp2k at googlegroups.com>
Date: Friday, 23 September 2022 at 16:55
To: "cp2k at googlegroups.com" <cp2k at googlegroups.com>
Subject: Re: [CP2K:17749] Possible Memory Leak

Dear Dr. Matthias,

Sorry for the late response, I wanted to test these things carefully. Below is what I have found.

All versions of CP2K that I have tested still appear to have a memory leak. The issue is less severe in version 2022.1, but as you can see in the graph "CP2K-2022.1.png", the program's memory usage keeps growing with time, and our runs are long. For version 8.1, shown in the graph "CP2K-8.1.png", the problem is much more severe (on the same hardware, 56 MPI ranks).

My primary concern is with the correctness of the calculations. We can of course restart the program, but how do I know that a code that looks like it is leaking is not also corrupting data? If you continue the run you started, I predict its memory usage will keep growing. Thank you for your time.


Sincerely,
Matthew S. Emerson
On Wednesday, September 21, 2022 at 9:23:41 AM UTC-5 Matthias Krack wrote:
Hi Matthew

I used this arch file <https://github.com/cp2k/cp2k/blob/master/arch/Linux-gnu-x86_64.psmp> to build the current CP2K release, version 2022.1, on our local cluster, basically by running
· source arch/Linux-gnu-x86_64.psmp
in the main cp2k folder and then running make, as proposed once the cp2k toolchain has been built successfully. The same procedure is used for the continuous regression testing (see the first two entries in the CP2K dashboard <https://dashboard.cp2k.org/index.html>; just click on the “OK” link to see the details).

HTH

Matthias

From: "cp... at googlegroups.com" <cp... at googlegroups.com> on behalf of Matthew Emerson <mr... at uiowa.edu>
Reply to: "cp... at googlegroups.com" <cp... at googlegroups.com>
Date: Wednesday, 21 September 2022 at 15:38
To: "cp... at googlegroups.com" <cp... at googlegroups.com>
Subject: Re: [CP2K:17727] Possible Memory Leak

Dear Dr. Matthias,

Do you have an ARCH file for this build that you could point me to? I would like to build using the same settings to test at ORNL.

Sincerely,
Matthew S. Emerson
On Wednesday, September 21, 2022 at 4:00:21 AM UTC-5 Matthias Krack wrote:
Hello Matthew

I have run your case on our local compute cluster with CP2K v2022.1 (gnu 11.2.0, OpenMPI 4.1.3) using 144 CPU cores. I observe only a small increase in memory usage after the usual initial growth during the first MD steps (see attached plot).
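
For readers who want to record a comparable memory trace themselves, here is a minimal sketch. The thread does not say how the plotted memory usage was collected, so this is an assumption: it reads the resident set size (the VmRSS field) from /proc/<pid>/status on Linux. The helper names parse_vmrss_kib and read_rss_kib are hypothetical.

```python
import re
from pathlib import Path

# Matches the "VmRSS:" line of /proc/<pid>/status, e.g. "VmRSS:\t  123456 kB".
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vmrss_kib(status_text):
    """Return the resident set size in kB from the text of a status file.

    Returns None when no VmRSS line is present (e.g. for kernel threads).
    """
    match = _VMRSS.search(status_text)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of process `pid` (Linux only)."""
    return parse_vmrss_kib(Path(f"/proc/{pid}/status").read_text())
```

Calling read_rss_kib at regular intervals for the PID of each MPI rank on a node, and summing the values, gives one memory sample per interval that can be plotted against the MD step.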

Best regards

Matthias


From: "cp... at googlegroups.com" <cp... at googlegroups.com> on behalf of Matthew Emerson <mr... at uiowa.edu>
Reply to: "cp... at googlegroups.com" <cp... at googlegroups.com>
Date: Tuesday, 20 September 2022 at 23:21
To: "cp... at googlegroups.com" <cp... at googlegroups.com>
Subject: [CP2K:17721] Possible Memory Leak

Dear CP2K Developers/Community,

I have attached an input file which I believe shows an example of a possible memory leak in CP2K. It is a typical NVT DFT simulation of molten MgCl2 with PBE-D3 dispersion corrections.

I have tried well-tested CP2K builds on our local cluster (v6.1, v8.1, v2022.1), our university supercomputer (v6.1, v8.1), and even the system-wide installations of CP2K on Cori at Oak Ridge National Lab (v8.1 and v9.1). In every case memory usage grows linearly with time until either the node locks up or dies from insufficient memory, or the job hits the maximum walltime (ORNL). I've done enough testing that I can almost tell how many MD steps a job will survive on a given machine with X amount of RAM and Y MPI ranks.
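
The prediction described above can be sketched as a straight-line fit of memory usage against MD step, extrapolated to the available RAM. This is only an illustration under the stated assumption of linear growth; the function names are hypothetical, and the numbers in the usage note are made up, not measurements from these runs.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) samples of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def step_at_memory_limit(steps, mem_gib, limit_gib):
    """Estimate the MD step at which memory usage reaches limit_gib.

    Returns None when the fitted slope is not positive, i.e. the
    samples show no growth to extrapolate.
    """
    slope, intercept = fit_line(steps, mem_gib)
    if slope <= 0:
        return None
    return (limit_gib - intercept) / slope
```

For example, with made-up samples of 10, 12, 14 and 16 GiB at steps 0, 100, 200 and 300, step_at_memory_limit([0, 100, 200, 300], [10, 12, 14, 16], 64) extrapolates to roughly step 2700 on a 64 GiB node. Samples from the first MD steps are best left out of the fit, since some initial growth is normal.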

I normally wouldn’t email about things like this, but I’ve tried multiple combinations (with unit testing) of GCC, OpenMPI, OpenBLAS/MKL, etc., and nothing seems to work. I am hoping this is simply an input file issue or my own error.



Any help will be much appreciated.

Matthew S. Emerson
Margulis Research Group
Department of Chemistry
The University of Iowa
--
You received this message because you are subscribed to the Google Groups "cp2k" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cp2k+uns... at googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cp2k/0c20a8f2-3658-4429-a2e9-7f4aa0edb321n%40googlegroups.com<https://groups.google.com/d/msgid/cp2k/0c20a8f2-3658-4429-a2e9-7f4aa0edb321n%40googlegroups.com?utm_medium=email&utm_source=footer>.