[CP2K-user] [CP2K:19445] Deallocate memory used by Hamiltonian-related subroutines

Cindy Pham cindypham196 at gmail.com
Mon Oct 30 17:52:47 UTC 2023


Hi Prof. Hutter,

The memory used by CP2K (as read by TRACE from the /proc/self/statm file) keeps
increasing even when I run a standard OT job (the input file is attached).
TRACE helped us establish that the memory increase happens within the
qs_rho_update_rho_low() subroutine (the increase is shown in the attached
plot). A closer look at the TRACE printout of the subroutines called by
qs_rho_update_rho_low() produces confusing data. TRACE reports that allocated
memory can increase not only in complex subroutines, but even in simple
subroutines such as pw_zero(). How is it possible that the memory reported by
TRACE increases across a call to pw_zero()? Is it because allocated memory
cannot be measured precisely in Unix (via /proc/self/statm)?

If so, it still does not explain the SYSTEMATIC memory increase in more
complex subroutines called by qs_rho_update_rho_low(). I have noticed two
arrays, grids_c and npts_local, within the grid_collocate_task_list
subroutine that are not explicitly deallocated. Do you know if this is by
design? I know that valgrind reports that the memory is properly deallocated
(somehow) at the END of the job, but the minor increases DURING a long run
eventually lead to an "out of memory" problem.
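
For reference, a minimal sketch of how a /proc/self/statm reading like the one
discussed above can be taken. The file lists sizes in pages, with the total
program size first and the resident set size second; which of the two TRACE
prints is not stated here, so the sketch returns both. The function names
(parse_statm, current_memory, demo_first_touch) are hypothetical and not part
of CP2K. The demo illustrates one possible, unconfirmed reason why a routine
that only writes zeros could raise the reading: on Linux, the pages of a fresh
allocation typically become resident only when they are first written.

```python
import mmap
import os


def parse_statm(text, page_size):
    """Convert the first two fields of a statm line (given in pages) to MiB."""
    fields = text.split()
    size_pages, resident_pages = int(fields[0]), int(fields[1])
    to_mib = page_size / (1024 * 1024)
    return {
        "virtual_mib": size_pages * to_mib,
        "resident_mib": resident_pages * to_mib,
    }


def current_memory():
    """Read this process's memory usage from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        return parse_statm(f.read(), os.sysconf("SC_PAGE_SIZE"))


def demo_first_touch(n_bytes=64 * 1024 * 1024):
    """Return resident MiB before and after first writing to a new buffer."""
    buf = mmap.mmap(-1, n_bytes)  # anonymous mapping, not yet touched
    before = current_memory()["resident_mib"]
    page = os.sysconf("SC_PAGE_SIZE")
    for offset in range(0, n_bytes, page):
        buf[offset] = 0  # writing a zero still touches the page
    after = current_memory()["resident_mib"]
    buf.close()
    return before, after
```

Calling demo_first_touch() on a Linux machine and comparing the two returned
values shows whether zeroing alone changes the resident reading there. If it
does, an increase across pw_zero() would not by itself indicate a leak.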

Best regards,
Cindy.

On Wed, Oct 4, 2023 at 8:56 PM Cindy Pham <cindypham196 at gmail.com> wrote:

> Hi Prof. Hutter,
>
> Thank you for your suggestion!
>
> Best regards,
> Cindy.
>
>
>
> On Wed, Oct 4, 2023 at 4:15 AM Jürg Hutter <hutter at chem.uzh.ch> wrote:
>
>> Hi
>>
>> without knowing the details of your program it is impossible to point to
>> an easy solution. As there are no memory leaks in CP2K, you must be missing
>> some routines that clean up at the end of the SCF loop.
>> I would suggest you compile the code with memory leak detection in order
>> to find the problematic structure and then write a routine to deallocate
>> them.
>> See the sdbg arch files for the gfortran options needed for leak
>> detection.
>>
>> regards
>> JH
>>
>> ________________________________________
>> From: cp2k at googlegroups.com <cp2k at googlegroups.com> on behalf of Cindy
>> Pham <cindypham196 at gmail.com>
>> Sent: Tuesday, October 3, 2023 10:16 PM
>> To: cp2k at googlegroups.com
>> Subject: [CP2K:19284] Deallocate memory used by Hamiltonian-related
>> subroutines
>>
>> Hi CP2K forum,
>>
>> I am running a lengthy SCF calculation (over 10k iterations) and noticed
>> a gradual increase in the allocated memory (I used the TRACE keyword to
>> print the currently allocated memory). It appears that the step-by-step
>> increase in memory allocation happens when the Kohn-Sham Hamiltonian is
>> re-calculated, specifically within the qs_ks_did_change function (in
>> qs_ks_types.F).
>>
>> The SCF routine is my own code that relies on the CP2K built-in
>> Hamiltonian subroutines. It functions properly; its only problem is the
>> ever-increasing memory consumption.
>>
>> Since I do not need to keep previous Hamiltonians (for any kind of DIIS
>> extrapolation), is there any way to deallocate all memory used by the
>> Hamiltonian-related subroutines (at least once in a while, say, after 1000
>> SCF iterations)?
>>
>> Alternatively, are there any input keywords that can ensure that the
>> Hamiltonian structures are reset once in a while?
>>
>> Thank you in advance for your time and your suggestions.
>>
>> Best regards,
>> Cindy Pham.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "cp2k" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to cp2k+unsubscribe at googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/cp2k/CAN4Jpm3f3BUuJ%2B3EDEuqpp0TbBEkaHfOjQm%2Bn9%3D6vYquYQNQvQ%40mail.gmail.com
>> .
>>
>

-------------- next part --------------
 000000:000001>>                    8      1 pw_zero       start Hostmem: 1160 MB GPUmem: 0 MB
 000000:000001<<                    8      1 pw_zero       0.028 Hostmem: 1266 MB GPUmem: 0 MB
 000000:000001>>                    8      2 pw_zero       start Hostmem: 1415 MB GPUmem: 0 MB
 000000:000001<<                    8      2 pw_zero       0.028 Hostmem: 1521 MB GPUmem: 0 MB
 000000:000001>>                            11      3 pw_zero       start Hostmem: 1975 MB GPUmem: 0 MB
 000000:000001<<                            11      3 pw_zero       0.016 Hostmem: 2028 MB GPUmem: 0 MB
 000000:000001>>                            11      4 pw_zero       start Hostmem: 2029 MB GPUmem: 0 MB
 000000:000001<<                            11      4 pw_zero       0.015 Hostmem: 2081 MB GPUmem: 0 MB
 000000:000001>>                            11      5 pw_zero       start Hostmem: 2081 MB GPUmem: 0 MB
 000000:000001<<                            11      5 pw_zero       0.015 Hostmem: 2133 MB GPUmem: 0 MB
 000000:000001>>                            11      6 pw_zero       start Hostmem: 2191 MB GPUmem: 0 MB
 000000:000001<<                            11      6 pw_zero       0.016 Hostmem: 2245 MB GPUmem: 0 MB
 000000:000001>>                            11      7 pw_zero       start Hostmem: 2245 MB GPUmem: 0 MB
 000000:000001<<                            11      7 pw_zero       0.015 Hostmem: 2297 MB GPUmem: 0 MB
 000000:000001>>                            11      8 pw_zero       start Hostmem: 2297 MB GPUmem: 0 MB
 000000:000001<<                            11      8 pw_zero       0.015 Hostmem: 2349 MB GPUmem: 0 MB
 000000:000001>>                       9    183 pw_zero       start Hostmem: 3898 MB GPUmem: 0 MB
 000000:000001<<                       9    183 pw_zero       0.028 Hostmem: 4004 MB GPUmem: 0 MB
 000000:000001>>                       9    184 pw_zero       start Hostmem: 4217 MB GPUmem: 0 MB
 000000:000001<<                       9    184 pw_zero       0.028 Hostmem: 4323 MB GPUmem: 0 MB
 000000:000001>>                    8    185 pw_zero       start Hostmem: 4488 MB GPUmem: 0 MB
 000000:000001<<                    8    185 pw_zero       0.015 Hostmem: 4541 MB GPUmem: 0 MB
 000000:000001>>                    8    186 pw_zero       start Hostmem: 4541 MB GPUmem: 0 MB
 000000:000001<<                    8    186 pw_zero       0.014 Hostmem: 4593 MB GPUmem: 0 MB
 000000:000001>>                    8    187 pw_zero       start Hostmem: 4593 MB GPUmem: 0 MB
 000000:000001<<                    8    187 pw_zero       0.015 Hostmem: 4645 MB GPUmem: 0 MB
 000000:000001>>                    8    188 pw_zero       start Hostmem: 4704 MB GPUmem: 0 MB
 000000:000001<<                    8    188 pw_zero       0.015 Hostmem: 4757 MB GPUmem: 0 MB
 000000:000001>>                    8    189 pw_zero       start Hostmem: 4757 MB GPUmem: 0 MB
 000000:000001<<                    8    189 pw_zero       0.014 Hostmem: 4809 MB GPUmem: 0 MB
 000000:000001>>                    8    190 pw_zero       start Hostmem: 4809 MB GPUmem: 0 MB
 000000:000001<<                    8    190 pw_zero       0.014 Hostmem: 4861 MB GPUmem: 0 MB
 000000:000001>>                    8    193 pw_zero       start Hostmem: 8439 MB GPUmem: 0 MB
 000000:000001<<                    8    193 pw_zero       0.015 Hostmem: 8491 MB GPUmem: 0 MB
 000000:000001>>                    8    194 pw_zero       start Hostmem: 8491 MB GPUmem: 0 MB
 000000:000001<<                    8    194 pw_zero       0.015 Hostmem: 8543 MB GPUmem: 0 MB
 000000:000001>>                       9    195 pw_zero       start Hostmem: 8543 MB GPUmem: 0 MB
 000000:000001<<                       9    195 pw_zero       0.015 Hostmem: 8595 MB GPUmem: 0 MB
 000000:000001>>                       9    196 pw_zero       start Hostmem: 8595 MB GPUmem: 0 MB
 000000:000001<<                       9    196 pw_zero       0.015 Hostmem: 8647 MB GPUmem: 0 MB
 000000:000001>>                       9    197 pw_zero       start Hostmem: 8647 MB GPUmem: 0 MB
 000000:000001<<                       9    197 pw_zero       0.015 Hostmem: 8699 MB GPUmem: 0 MB
 000000:000001>>                       9    198 pw_zero       start Hostmem: 8699 MB GPUmem: 0 MB
 000000:000001<<                       9    198 pw_zero       0.015 Hostmem: 8751 MB GPUmem: 0 MB
 000000:000001>>                       9    199 pw_zero       start Hostmem: 8755 MB GPUmem: 0 MB
 000000:000001<<                       9    199 pw_zero       0.015 Hostmem: 8808 MB GPUmem: 0 MB
 000000:000001>>                       9    200 pw_zero       start Hostmem: 8809 MB GPUmem: 0 MB
 000000:000001<<                       9    200 pw_zero       0.015 Hostmem: 8861 MB GPUmem: 0 MB
 000000:000001>>                       9    201 pw_zero       start Hostmem: 8861 MB GPUmem: 0 MB
 000000:000001<<                       9    201 pw_zero       0.015 Hostmem: 8913 MB GPUmem: 0 MB
 000000:000001>>                       9    202 pw_zero       start Hostmem: 8971 MB GPUmem: 0 MB
 000000:000001<<                       9    202 pw_zero       0.015 Hostmem: 9024 MB GPUmem: 0 MB
 000000:000001>>                       9    203 pw_zero       start Hostmem: 9025 MB GPUmem: 0 MB
 000000:000001<<                       9    203 pw_zero       0.015 Hostmem: 9077 MB GPUmem: 0 MB
 000000:000001>>                       9    204 pw_zero       start Hostmem: 9077 MB GPUmem: 0 MB
 000000:000001<<                       9    204 pw_zero       0.015 Hostmem: 9129 MB GPUmem: 0 MB
 000000:000001>>                         10    205 pw_zero       start Hostmem: 9291 MB GPUmem: 0 MB
 000000:000001<<                         10    205 pw_zero       0.016 Hostmem: 9345 MB GPUmem: 0 MB
 000000:000001>>                         10    206 pw_zero       start Hostmem: 9345 MB GPUmem: 0 MB
 000000:000001<<                         10    206 pw_zero       0.016 Hostmem: 9399 MB GPUmem: 0 MB
 000000:000001>>                         10    207 pw_zero       start Hostmem: 9399 MB GPUmem: 0 MB
 000000:000001<<                         10    207 pw_zero       0.016 Hostmem: 9453 MB GPUmem: 0 MB
 000000:000001>>                         10    208 pw_zero       start Hostmem: 9453 MB GPUmem: 0 MB
 000000:000001<<                         10    208 pw_zero       0.016 Hostmem: 9507 MB GPUmem: 0 MB
 000000:000001>>                         10    209 pw_zero       start Hostmem: 9507 MB GPUmem: 0 MB
 000000:000001<<                         10    209 pw_zero       0.016 Hostmem: 9561 MB GPUmem: 0 MB
 000000:000001>>                         10    210 pw_zero       start Hostmem: 9561 MB GPUmem: 0 MB
 000000:000001<<                         10    210 pw_zero       0.016 Hostmem: 9615 MB GPUmem: 0 MB
 000000:000001>>                         10    211 pw_zero       start Hostmem: 9615 MB GPUmem: 0 MB
 000000:000001<<                         10    211 pw_zero       0.016 Hostmem: 9669 MB GPUmem: 0 MB
 000000:000001>>                         10    212 pw_zero       start Hostmem: 9669 MB GPUmem: 0 MB
 000000:000001<<                         10    212 pw_zero       0.016 Hostmem: 9723 MB GPUmem: 0 MB
 000000:000001>>                         10    213 pw_zero       start Hostmem: 10209 MB GPUmem: 0 MB
 000000:000001<<                         10    213 pw_zero       0.016 Hostmem: 10264 MB GPUmem: 0 MB
 000000:000001>>                         10    214 pw_zero       start Hostmem: 10264 MB GPUmem: 0 MB
 000000:000001<<                         10    214 pw_zero       0.016 Hostmem: 10317 MB GPUmem: 0 MB
 000000:000001>>                         10    215 pw_zero       start Hostmem: 10317 MB GPUmem: 0 MB
 000000:000001<<                         10    215 pw_zero       0.016 Hostmem: 10371 MB GPUmem: 0 MB
 000000:000001>>                         10    216 pw_zero       start Hostmem: 10371 MB GPUmem: 0 MB
 000000:000001<<                         10    216 pw_zero       0.016 Hostmem: 10425 MB GPUmem: 0 MB
 000000:000001>>                            11    245 pw_zero       start Hostmem: 10561 MB GPUmem: 0 MB
 000000:000001<<                            11    245 pw_zero       0.028 Hostmem: 10667 MB GPUmem: 0 MB
 000000:000001>>                         10    661 pw_zero       start Hostmem: 10672 MB GPUmem: 0 MB
 000000:000001<<                         10    661 pw_zero       0.009 Hostmem: 10673 MB GPUmem: 0 MB
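
The trace lines above can be condensed into a per-call memory change with a
short script. This is only a sketch: it assumes that the two integers after
the ">>" or "<<" marker are the call-stack depth and the per-routine call
counter, and that the "Hostmem" value is in MB as printed. The names TRACE_RE
and memory_deltas are hypothetical helpers, not part of CP2K.

```python
import re

# One trace line: rank marker, direction, depth, call number, routine name,
# "start" or elapsed time, then the host memory reading in MB.
TRACE_RE = re.compile(
    r"^\s*\d+:\d+(?P<dir>>>|<<)\s+(?P<depth>\d+)\s+(?P<call>\d+)\s+"
    r"(?P<name>\S+)\s+(?P<info>\S+)\s+Hostmem:\s+(?P<mem>\d+)\s+MB"
)


def memory_deltas(lines):
    """Return (routine, call number, MB gained from entry to exit) per call."""
    entry = {}
    deltas = []
    for line in lines:
        m = TRACE_RE.match(line)
        if not m:
            continue  # skip anything that is not a trace line
        key = (m["name"], int(m["call"]))
        mem = int(m["mem"])
        if m["dir"] == ">>":
            entry[key] = mem  # remember the reading at routine entry
        elif key in entry:
            deltas.append((key[0], key[1], mem - entry.pop(key)))
    return deltas


# Four lines copied from the trace above.
SAMPLE = """\
 000000:000001>>                    8      1 pw_zero       start Hostmem: 1160 MB GPUmem: 0 MB
 000000:000001<<                    8      1 pw_zero       0.028 Hostmem: 1266 MB GPUmem: 0 MB
 000000:000001>>                         10    661 pw_zero       start Hostmem: 10672 MB GPUmem: 0 MB
 000000:000001<<                         10    661 pw_zero       0.009 Hostmem: 10673 MB GPUmem: 0 MB
"""

if __name__ == "__main__":
    for name, call, delta in memory_deltas(SAMPLE.splitlines()):
        print(f"{name} call {call}: {delta:+d} MB")
```

For the four sample lines this reports +106 MB for call 1 and +1 MB for call
661. Running it over the full trace separates the calls that raise the reading
from those that leave it essentially unchanged.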
-------------- next part --------------
A non-text attachment was scrubbed...
Name: He_crystal_OT.inp
Type: application/octet-stream
Size: 2084 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20231030/aefe443c/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trace_He_crystal_OT_cell888.pdf
Type: application/pdf
Size: 13305 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20231030/aefe443c/attachment-0001.pdf>

