[CP2K-user] [CP2K:14269] Re: Hybrid functional calculation problem
fa...@gmail.com
fabia... at gmail.com
Mon Nov 23 17:18:18 UTC 2020
Your graph nicely shows that CP2K runs out of memory. As Matt wrote, you
have to decrease MAX_MEMORY to leave enough memory for the rest of the
program. Here are some details on memory consumption with HF:
https://groups.google.com/g/cp2k/c/DZDVTIORyVY/m/OGjJDJuqBwAJ
Of course you can recalculate some of the ERIs in each SCF cycle, but that
slows down the minimization by a lot, and I'd advise against it. Try to
use screening, set a proper value for MAX_MEMORY, and use all the resources
you have to store the ERIs.
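For orientation, the relevant input blocks look roughly like this (just a
sketch; the threshold and the memory value are placeholders you have to
adapt to your system and machine):

&XC
  &HF
    &SCREENING
      EPS_SCHWARZ 1.0E-6    ! screening threshold for the ERIs
    &END SCREENING
    &MEMORY
      MAX_MEMORY 2600       ! MiB per MPI process reserved for in-core ERIs
    &END MEMORY
  &END HF
&END XC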
Fabian
On Sunday, 22 November 2020 at 23:08:17 UTC+1 Lucas Lodeiro wrote:
> Hi Fabian and Matt,
>
> Regarding access to the memory: I have run calculations without problems
> for months, using 90% of the node RAM. But to check, I set
> ulimit -s unlimited. There is a change: before using ulimit, the
> calculation crashed while RAM usage was still low (~15%); after using
> ulimit, the calculation still crashes, but RAM usage now rises steadily
> up to the limit before the crash. I attach an image.
>
> About SCREEN_ON_INITIAL_P, I will use it on the little cluster. I like
> the idea of running two calculations as stepping stones.
>
> I know that the number of ERIs calculated on the fly should be 0, and
> that if it is non-zero I need more RAM to store them so they are not
> recalculated at each SCF step. But on the little cluster I am already
> using all processor and RAM resources. By the way, the calculation runs
> without problems when the ERIs are calculated on the fly at each SCF
> step; it is just very slow.
>
> About what Matt commented: on the little cluster I have a single node with
> 250 GB RAM. There I use MAX_MEMORY = 2600, i.e. a total of 166.4 GB for
> the ERIs (the output reports 143 GB), and the rest for the whole program.
> On the big cluster we have access to many nodes with 44 procs
> and 192 GB RAM, and to 9 nodes with 44 procs and 768 GB RAM. In the first
> case I use 5 nodes (220 procs) with all their memory (960 GB), setting
> MAX_MEMORY = 4000 (4.0 GB * 220 procs = 880 GB RAM for the ERIs). In the
> second case I use 5 nodes (220 procs) with all their memory (3840 GB),
> setting MAX_MEMORY = 15000 (15.0 GB * 220 procs = 3300 GB RAM for the ERIs).
> In both cases the calculation crashes... I do not know if I am being
> naive, but 3.3 TB of RAM seems, at the very least, enough to store a good
> share of the ERIs...
>
> Using the data reported in the output of the little cluster:
> HFX_MEM_INFO| Number of sph. ERI's calculated: 4879985997918
> HFX_MEM_INFO| Number of sph. ERI's stored in-core: 116452577779
> HFX_MEM_INFO| Number of sph. ERI's stored on disk: 0
> HFX_MEM_INFO| Number of sph. ERI's calculated on the fly: 4763533420139
>
> The stored ERIs are about 1/42 of the total and use 166.4 GB (143 GB
> reported)... So if I want to store all of them, I need 166.4 GB * 42 =
> ~7.0 TB... Is that correct?
> I could get 7.0 TB of RAM using the 9 nodes with 768 GB RAM each. But I am
> not convinced that the amount of RAM is the problem, because on the little
> cluster the job runs while recalculating almost all ERIs at each SCF step...
>
> I am a little surprised that the calculation runs on the little cluster,
> but not on the big one.
> Can you think of any other related cause?
>
> Regards - Lucas
>
>
>
> On Sun, 22 Nov 2020 at 13:55, Matt W (<mat... at gmail.com>) wrote:
>
>> Your input has
>>
>> &MEMORY
>> MAX_MEMORY 4000
>> EPS_STORAGE_SCALING 0.1
>> &END MEMORY
>>
>> This means that each MPI task (which can span multiple cores) should be
>> able to allocate 4 GiB of memory _exclusively_ for the two-electron
>> integrals. If less than that is available, the run will crash because the
>> memory allocation cannot succeed. I guess your main cluster has less
>> memory per node than the smaller one. You need to leave space for the
>> operating system and the rest of the cp2k run besides the two-electron
>> integrals.
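>>
>> For example (numbers purely illustrative): on a 192 GB node running 44 MPI
>> tasks, MAX_MEMORY 4000 would reserve 44 x 4 GiB = 176 GiB for the
>> integrals alone, leaving less than 16 GB for the OS and everything else
>> cp2k needs.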
>>
>> There is another thread from earlier this year where Juerg answers HFX
>> memory questions in more detail.
>>
>> Matt
>>
>> On Sunday, November 22, 2020 at 4:42:47 PM UTC fa... at gmail.com wrote:
>>
>>> Can cp2k access all the memory on the cluster? On Linux you can use
>>> ulimit -s unlimited
>>> to remove the stack size limit, which can otherwise make a process crash
>>> well below the physical memory of the node.
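>>> Note that the limit applies per process: in a batch job it typically has
>>> to be set inside the job script itself, before the mpirun/srun line,
>>> because a limit raised on the login node does not propagate to the
>>> compute nodes.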
>>>
>>> I usually use SCREEN_ON_INITIAL_P. I found that for large systems it is
>>> faster to run two energy minimizations with the keyword enabled (such
>>> that the second restarts from a converged PBE0 wfn) than to run a single
>>> minimization without SCREEN_ON_INITIAL_P. But that probably depends on
>>> the system you simulate.
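>>>
>>> The second run then looks roughly like this (just a sketch; the restart
>>> file name is an example):
>>>
>>> &DFT
>>>   WFN_RESTART_FILE_NAME run1-RESTART.wfn  ! converged wfn from the first run
>>>   &SCF
>>>     SCF_GUESS RESTART
>>>   &END SCF
>>>   &XC
>>>     &HF
>>>       &SCREENING
>>>         SCREEN_ON_INITIAL_P T  ! safe now: the initial P is already converged
>>>       &END SCREENING
>>>     &END HF
>>>   &END XC
>>> &END DFT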
>>>
>>> You should converge the cutoff with respect to the properties you are
>>> interested in: run a test system with increasing cutoff and look at,
>>> e.g., the energy, the PDOS, etc.
>>>
>>> Number of sph. ERI's calculated on the fly: 4763533420139
>>> This number should always be 0. If it is larger, increase the memory
>>> cp2k has available.
>>>
>>> Fabian
>>> On Sunday, 22 November 2020 at 17:24:13 UTC+1 Lucas Lodeiro wrote:
>>>
>>>> Dear Fabian,
>>>>
>>>> Thanks for your advice. I forgot to clarify the execution time... my
>>>> mistake.
>>>> The calculation runs for 5 to 7 minutes and stops... The walltime was
>>>> set to 72 h, so I do not believe that is the problem. Now I am running
>>>> the same input on a smaller cluster (different from the problematic one)
>>>> with 64 procs and 250 GB RAM, and the calculation works fine (very slow,
>>>> 9 h per SCF step, but it runs... the total RAM assigned for the ERIs is
>>>> not sufficient, yet the problem does not appear)...
>>>> It is not practical to use this little cluster, so I need to fix the
>>>> problem on the big one, where I can use more RAM and more processors
>>>> (more than 220 is possible). But as the program does not show what is
>>>> happening, I cannot tell the cluster admin anything that would help
>>>> recompile or fix the problem. :(
>>>>
>>>> This is the output on the little cluster:
>>>>
>>>> Step Update method Time Convergence Total energy Change
>>>> ------------------------------------------------------------------------------
>>>>
>>>> HFX_MEM_INFO| Est. max. program size before HFX [MiB]: 1371
>>>>
>>>> *** WARNING in hfx_energy_potential.F:605 :: The Kohn Sham matrix is not  ***
>>>> *** 100% occupied. This may result in incorrect Hartree-Fock results. Try ***
>>>> *** to decrease EPS_PGF_ORB and EPS_FILTER_MATRIX in the QS section. For  ***
>>>> *** more information see FAQ: https://www.cp2k.org/faq:hfx_eps_warning    ***
>>>>
>>>> HFX_MEM_INFO| Number of cart. primitive ERI's calculated: 27043173676632
>>>> HFX_MEM_INFO| Number of sph. ERI's calculated: 4879985997918
>>>> HFX_MEM_INFO| Number of sph. ERI's stored in-core: 116452577779
>>>> HFX_MEM_INFO| Number of sph. ERI's stored on disk: 0
>>>> HFX_MEM_INFO| Number of sph. ERI's calculated on the fly: 4763533420139
>>>> HFX_MEM_INFO| Total memory consumption ERI's RAM [MiB]: 143042
>>>> HFX_MEM_INFO| Whereof max-vals [MiB]: 1380
>>>> HFX_MEM_INFO| Total compression factor ERI's RAM: 6.21
>>>> HFX_MEM_INFO| Total memory consumption ERI's disk [MiB]: 0
>>>> HFX_MEM_INFO| Total compression factor ERI's disk: 0.00
>>>> HFX_MEM_INFO| Size of density/Fock matrix [MiB]: 266
>>>> HFX_MEM_INFO| Size of buffers [MiB]: 98
>>>> HFX_MEM_INFO| Number of periodic image cells considered: 5
>>>> HFX_MEM_INFO| Est. max. program size after HFX [MiB]: 3834
>>>>
>>>> 1 NoMix/Diag. 0.40E+00 ****** 5.46488333 -20625.2826573514 -2.06E+04
>>>>
>>>> About SCREEN_ON_INITIAL_P, I read that to use it you need a very good
>>>> guess (better than the converged GGA one), for example the last step or
>>>> frame of a GEO_OPT or MD... Is it really useful when the guess is the
>>>> GGA wavefunction?
>>>> About CUTOFF_RADIUS, I read that 6 or 7 is a good compromise; as my cell
>>>> is approximately twice that, I used the minimum image convention to
>>>> arrive at the value 8.62, which is near the recommended one (6 or 7). If
>>>> I reduce it, does the computational cost decrease considerably?
>>>>
>>>> Regards - Lucas
>>>>
>>>> On Sun, 22 Nov 2020 at 12:53, fa... at gmail.com (<
>>>> fa... at gmail.com>) wrote:
>>>>
>>>>> Dear Lucas,
>>>>>
>>>>> cp2k computes the four-center integrals during (or prior to) the first
>>>>> SCF cycle. I assume the job ran out of time during this task. For a
>>>>> system with more than 1000 atoms this takes a lot of time; with only
>>>>> 220 CPUs it could in fact take several hours.
>>>>>
>>>>> To speed up the calculation you should use SCREEN_ON_INITIAL_P T and
>>>>> restart from a well converged PBE wfn. Other than that, there is little
>>>>> you can do besides giving the job more time and/or CPUs. (Of course,
>>>>> reducing CUTOFF_RADIUS from 8.62 would help too, but it could
>>>>> negatively affect the result.)
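>>>>>
>>>>> For reference, that keyword lives in the truncated-Coulomb block of the
>>>>> HF section (a sketch with your current value):
>>>>>
>>>>> &INTERACTION_POTENTIAL
>>>>>   POTENTIAL_TYPE TRUNCATED
>>>>>   CUTOFF_RADIUS 8.62  ! smaller radius = fewer pairs to integrate, but check accuracy
>>>>> &END INTERACTION_POTENTIAL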
>>>>>
>>>>> Cheers,
>>>>> Fabian
>>>>>
>>>>> On Sunday, 22 November 2020 at 01:21:05 UTC+1 Lucas Lodeiro wrote:
>>>>>
>>>>>> Hi all,
>>>>>> I need to perform a hybrid functional calculation with CP2K 7.1 on a
>>>>>> big system (more than 1000 atoms). I studied the manual, the tutorials
>>>>>> and some videos by CP2K developers to improve my input. But the
>>>>>> program aborts the calculation while the HF part is running... I
>>>>>> watched the memory usage on the fly, and there is no peak that would
>>>>>> explain the failure (I used 4000 MB with 220 processors).
>>>>>> The output gives no explanation... Suspecting the memory, I tried a
>>>>>> large-memory node of our cluster, using 15000 MB with 220 processors,
>>>>>> but the program exits at the same point without any message; the
>>>>>> process is simply killed.
>>>>>> The output shows a warning:
>>>>>> The output shows a warning:
>>>>>>
>>>>>> *** WARNING in hfx_energy_potential.F:591 :: The Kohn Sham matrix is not  ***
>>>>>> *** 100% occupied. This may result in incorrect Hartree-Fock results. Try ***
>>>>>> *** to decrease EPS_PGF_ORB and EPS_FILTER_MATRIX in the QS section. For  ***
>>>>>> *** more information see FAQ: https://www.cp2k.org/faq:hfx_eps_warning    ***
>>>>>>
>>>>>> but I read that this is not a serious issue and that the calculation
>>>>>> should continue rather than crash.
>>>>>> I also decreased EPS_PGF_ORB, but the warning and the problem persist.
>>>>>>
>>>>>> I do not know if the problem could be located in other parts of my
>>>>>> input... For example, I use PBE0-TC-LR (with PBC along XY) and ADMM.
>>>>>> In the ADMM options I use ADMM_PURIFICATION_METHOD = NONE, because I
>>>>>> read that ADMM1 is the only variant usable for smearing calculations.
>>>>>>
>>>>>> I ran this system with PBE (to generate the first guess for PBE0),
>>>>>> and there is no problem in that case.
>>>>>> Moreover, I tried other CP2K versions (7.0, 6.1 and 5.1) compiled on
>>>>>> the cluster with libint_max_am=6; the calculation crashes as well, but
>>>>>> shows this message:
>>>>>>
>>>>>>
>>>>>> *******************************************************************************
>>>>>> *   ___                                                                       *
>>>>>> *  /   \                                                                      *
>>>>>> * [ABORT]  CP2K and libint were compiled with different LIBINT_MAX_AM.        *
>>>>>> *  \___/                                                                      *
>>>>>> *    |                                                                        *
>>>>>> *  O/|                                                                        *
>>>>>> * /| |                                                                        *
>>>>>> * / \                                             hfx_libint_wrapper.F:134    *
>>>>>> *******************************************************************************
>>>>>>
>>>>>>
>>>>>> ===== Routine Calling Stack =====
>>>>>>
>>>>>> 2 hfx_create
>>>>>> 1 CP2K
>>>>>>
>>>>>> It seems this problem is not present in version 7.1, as the program
>>>>>> does not show it, and the compilation information does not show the
>>>>>> LIBINT_MAX_AM value...
>>>>>>
>>>>>> If somebody could give me some advice, I would appreciate it. :)
>>>>>> I attach the input file, and the output file for 7.1 version.
>>>>>>
>>>>>> Regards - Lucas Lodeiro
>>>>>>