[CP2K-user] needs advice to speed up hybrid/ADMM method
Sun Geng
gengs... at gmail.com
Sun Sep 13 23:51:35 UTC 2020
Hi,
Thank you very much for your help.
I will try out the OT section, and the on-the-fly ERI numbers are clearer
to me now.
I have set PRINT_LEVEL to HIGH, so the full output is a little lengthy,
but I believe you are referring to this section:
MEMORY| system memory details [Kb]
MEMORY|                     rank 0          min          max      average
MEMORY| MemTotal         132183208    132183208    132183208    132183208
MEMORY| MemFree          128323088    128323088    128323088    128323088
MEMORY| Buffers              27608        27608        27608        27608
MEMORY| Cached              687440       687440       687440       687440
MEMORY| Slab                284408       284408       284408       284408
MEMORY| SReclaimable        154740       154740       154740       154740
MEMORY| MemLikelyFree    129192876    129192876    129192876    129192876
So I guess what I can use is 128323088 kB / 24 (I have 24 cores in the
node) / 1000 ≈ 5346 MB per MPI process, but since I need to reserve some
memory for the system and for other parts of CP2K, I should give
MAX_MEMORY a value somewhat smaller than that.
Please let me know if I am incorrect.
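If that estimate is right, I would then set, for example (4500 MB is just
my guess after leaving some headroom; the exact value is untested):

&HF
  &MEMORY
    MAX_MEMORY 4500
  &END MEMORY
&END HF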
Thanks again,
Best Wishes,
Geng
On Sunday, September 13, 2020 at 3:58:15 PM UTC-7, n... at berkeley.edu wrote:
> (1) OT can give you the band gap if you include this section:
>
> &DFT
>   &PRINT
>     &MO_CUBES
>       WRITE_CUBE False
>       NHOMO 1
>       NLUMO 1
>     &END MO_CUBES
>   &END PRINT
> &END DFT
>
> It will print a "HOMO-LUMO" line with the band gap. Now, because OT only
> works on the occupied levels, it will return a *slightly* different band
> gap than diagonalization. I've tested it a few times and found the
> difference was only about 0.01 eV for a moderately gapped material, so I
> think it should be fine for most applications. If you need the levels to
> be very accurate, you can always converge with OT and then re-evaluate
> with diagonalization (see the example below), but if you're using ADMM
> you're already willing to sacrifice a tiny bit of accuracy anyway.
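>
> For example, a second run that re-evaluates the levels could look roughly
> like this (a sketch reusing the keywords from your input, not a tested
> snippet):
>
> ! run 2: restart from the OT-converged wavefunction and diagonalize
> &SCF
>   SCF_GUESS RESTART
>   MAX_SCF 50
>   ADDED_MOS 100
>   &DIAGONALIZATION
>     ALGORITHM STANDARD
>   &END DIAGONALIZATION
> &END SCF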
>
> (2) For max memory, you can look at the beginning of your cp2k output
> file, find the line that says "MemFree", and divide it by the number of
> message-passing processes (also listed near the top of the output file)
> to see how much memory should be available for each MPI process. Can we
> see the full output file?
>
> (3) "Finally, I would assume the line (printing the Number of sph. ERI's
> calculated on the fly is not zero) only works for the first SCF iteration?
> ". To be clear, the output says
>
> Number of sph. ERI's stored in-core: 16901607068
> and
> Number of sph. ERI's calculated on the fly: 91978901962
>
> This means that during the first step 16901607068 ERIs were stored and
> are re-used in each subsequent SCF step, but 91978901962 ERIs could not
> be stored and will be re-calculated on SCF steps 2, 3, 4, ...
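>
> As a rough estimate from these numbers: the ~1.7e10 stored ERIs consumed
> 13711 MiB, so storing all ~1.1e11 ERIs would need about 6.4 times that,
> i.e. on the order of 88,000 MiB (~86 GiB) summed over the node, assuming
> a similar compression factor.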
>
> On Sunday, September 13, 2020 at 2:11:33 PM UTC-7 ge... at gmail.com
> wrote:
>
>> Hi,
>> Thank you for your prompt reply.
>> For the basis set, I am using the small (DZVP) primary basis sets and a
>> small auxiliary basis set too. I plan to increase them if the accuracy
>> is not sufficient.
>> Indeed, my target system has a large band gap (~3 eV). However, since I
>> would like to study the band gap of the material, can I use the OT
>> method? It seems that OT only prints the energies of the occupied
>> orbitals. Please correct me if I am wrong.
>>
>> Finally, I would assume the line reporting that the number of sph.
>> ERI's calculated on the fly is non-zero only applies to the first SCF
>> iteration? Also, how can I get a good estimate for MAX_MEMORY, and does
>> MAX_MEMORY depend on the choice of OMP_NUM_THREADS?
>>
>> Thank you very much.
>> Best Regards,
>> Geng
>>
>>
>>
>> On Sunday, September 13, 2020 at 12:15:40 PM UTC-7, n... at berkeley.edu wrote:
>>
>>> You pointed out a key issue at the end of your post: "ERI's calculated
>>> on the fly" should ideally be zero. The reason is that the four-center
>>> electron-repulsion integrals (ERIs) are geometric objects and only need
>>> to be evaluated in the first SCF step, provided you can store the
>>> results in memory. If you have enough memory for this, the first SCF
>>> step will be long, but the subsequent SCF steps will be only slightly
>>> more expensive than a GGA calculation.
>>>
>>> Other than that, there are a few things I might note:
>>> (1) If your system has a band gap, you should use the OT method instead
>>> of standard matrix diagonalization; it scales quite well and has very
>>> nice convergence behavior (see the sketch after this list).
>>> (2) You say you have 120 GB of memory available for your calculation,
>>> but only 13 GB are consumed by your HFX module. Even with the rest of
>>> the cp2k program taking some memory, you should have a lot more memory
>>> left over for storing the ERIs. Double-check that MAX_MEMORY is set to
>>> a reasonable value; it is the maximum amount of memory *each* MPI task
>>> may use.
>>> (3) The last thing that could be an issue is your auxiliary basis set:
>>> which one are you using for this calculation? ADMM is so beneficial
>>> because you can use a smaller auxiliary basis for the HF part of the
>>> calculation, but maybe you are using a large auxiliary basis set? (A
>>> per-element example follows below.)
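>>>
>>> As a sketch of point (1), an OT section could look like this (the
>>> minimizer/preconditioner choices are just common starting points, not a
>>> tested recommendation; OT also requires removing ADDED_MOS and &SMEAR):
>>>
>>> &SCF
>>>   SCF_GUESS RESTART
>>>   EPS_SCF 1.0E-7
>>>   &OT
>>>     MINIMIZER DIIS
>>>     PRECONDITIONER FULL_ALL
>>>   &END OT
>>>   &OUTER_SCF
>>>     MAX_SCF 20
>>>     EPS_SCF 1.0E-7
>>>   &END OUTER_SCF
>>> &END SCF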
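>>>
>>> And for point (3), the fitting basis is chosen per element in &KIND;
>>> for example (cFIT3 and the oxygen basis/potential pairing here are only
>>> illustrative, pick what matches your elements):
>>>
>>> &KIND O
>>>   BASIS_SET DZVP-MOLOPT-SR-GTH
>>>   BASIS_SET AUX_FIT cFIT3
>>>   POTENTIAL GTH-PBE-q6
>>> &END KIND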
>>>
>>> In general, an ADMM calculation should be much faster than the same
>>> calculation in VASP with a primary basis set, so long as you don't make
>>> the supercell too big.
>>> On Sunday, September 13, 2020 at 11:42:55 AM UTC-7 ge... at gmail.com
>>> wrote:
>>>
>>>> Dear CP2K users,
>>>>
>>>> I would like to benchmark a small periodic system (11 Å x 11 Å x 11 Å)
>>>> with the HSE06 functional against results obtained from VASP.
>>>> Here is the DFT section of my input:
>>>>
>>>> &DFT
>>>>   BASIS_SET_FILE_NAME BASIS_MOLOPT_UCL
>>>>   BASIS_SET_FILE_NAME BASIS_MOLOPT
>>>>   BASIS_SET_FILE_NAME BASIS_ADMM_MOLOPT
>>>>   BASIS_SET_FILE_NAME BASIS_ADMM
>>>>   POTENTIAL_FILE_NAME GTH_POTENTIALS
>>>>   WFN_RESTART_FILE_NAME cp2k-RESTART.wfn
>>>>   &MGRID
>>>>     CUTOFF 320
>>>>     COMMENSURATE
>>>>   &END MGRID
>>>>   &QS
>>>>     EXTRAPOLATION PS
>>>>     EXTRAPOLATION_ORDER 3
>>>>     EPS_DEFAULT 1.0E-11
>>>>     EPS_PGF_ORB 1.0E-14
>>>>     MAP_CONSISTENT T
>>>>   &END QS
>>>>   &SCF
>>>>     SCF_GUESS RESTART
>>>>     EPS_SCF 1.0E-7
>>>>     MAX_SCF 300
>>>>     ADDED_MOS 100
>>>>     &DIAGONALIZATION
>>>>       ALGORITHM STANDARD
>>>>     &END DIAGONALIZATION
>>>>     &SMEAR ON
>>>>       METHOD FERMI_DIRAC
>>>>       ELECTRONIC_TEMPERATURE [K] 300
>>>>     &END SMEAR
>>>>     &MIXING
>>>>       METHOD BROYDEN_MIXING
>>>>       ALPHA 0.2
>>>>       BETA 1.5
>>>>       NBROYDEN 8
>>>>     &END MIXING
>>>>   &END SCF
>>>>   !&XC
>>>>   !  &XC_FUNCTIONAL PBE
>>>>   !  &END XC_FUNCTIONAL
>>>>   !&END XC
>>>>   &XC
>>>>     &XC_FUNCTIONAL
>>>>       &PBE
>>>>         SCALE_X 0.0
>>>>         SCALE_C 1.0
>>>>       &END PBE
>>>>       &XWPBE
>>>>         SCALE_X -0.25
>>>>         SCALE_X0 1.0
>>>>         OMEGA 0.11
>>>>       &END XWPBE
>>>>     &END XC_FUNCTIONAL
>>>>     &HF
>>>>       &SCREENING
>>>>         EPS_SCHWARZ 1.0E-6
>>>>         SCREEN_ON_INITIAL_P T
>>>>       &END SCREENING
>>>>       &INTERACTION_POTENTIAL
>>>>         POTENTIAL_TYPE SHORTRANGE
>>>>         OMEGA 0.11
>>>>       &END INTERACTION_POTENTIAL
>>>>       &MEMORY
>>>>         MAX_MEMORY 4000
>>>>         EPS_STORAGE_SCALING 0.1
>>>>       &END MEMORY
>>>>       FRACTION 0.25
>>>>     &END HF
>>>>   &END XC
>>>>   &AUXILIARY_DENSITY_MATRIX_METHOD
>>>>     METHOD BASIS_PROJECTION
>>>>     ADMM_PURIFICATION_METHOD NONE
>>>>   &END AUXILIARY_DENSITY_MATRIX_METHOD
>>>>   &PRINT
>>>>     &PDOS
>>>>       FILENAME pdos
>>>>       # print all projected DOS available:
>>>>       NLUMO -1
>>>>       # split the density by quantum number:
>>>>       COMPONENTS
>>>>     &END PDOS
>>>>   &END PRINT
>>>> &END DFT
>>>>
>>>> The calculation restarted from a converged PBE wavefunction.
>>>> However, I found that the calculation is quite slow (VASP needs 240
>>>> seconds per SCF step, but CP2K needs almost 2400 seconds; both were
>>>> carried out on a computing node with 24 cores and 120 GB of memory in
>>>> total). I understand it is not easy to compare different codes because
>>>> of their very different setups, but I would expect the ADMM method in
>>>> CP2K to be much faster.
>>>>
>>>> Below is the output.
>>>>
>>>> SCF WAVEFUNCTION OPTIMIZATION
>>>>
>>>>   Step  Update method   Time      Convergence   Total energy      Change
>>>>  ------------------------------------------------------------------------------
>>>>
>>>>  HFX_MEM_INFO| Est. max. program size before HFX [MiB]:                    792
>>>>
>>>>  *** WARNING in hfx_types.F:1287 :: Periodic Hartree Fock calculation      ***
>>>>  *** requested with use of a truncated or shortrange potential. The cutoff ***
>>>>  *** radius is larger than half the minimal cell dimension. This may lead  ***
>>>>  *** to unphysical total energies. Reduce the cutoff radius in order to    ***
>>>>  *** avoid possible problems.                                              ***
>>>>
>>>>  HFX_MEM_INFO| Number of cart. primitive ERI's calculated:      11992558561508
>>>>  HFX_MEM_INFO| Number of sph. ERI's calculated:                   157558545566
>>>>  HFX_MEM_INFO| Number of sph. ERI's stored in-core:                16901607068
>>>>  HFX_MEM_INFO| Number of sph. ERI's stored on disk:                          0
>>>>  HFX_MEM_INFO| Number of sph. ERI's calculated on the fly:         91978901962
>>>>  HFX_MEM_INFO| Total memory consumption ERI's RAM [MiB]:                 13711
>>>>  HFX_MEM_INFO| Whereof max-vals [MiB]:                                     454
>>>>  HFX_MEM_INFO| Total compression factor ERI's RAM:                        9.41
>>>>  HFX_MEM_INFO| Total memory consumption ERI's disk [MiB]:                    0
>>>>  HFX_MEM_INFO| Total compression factor ERI's disk:                       0.00
>>>>  HFX_MEM_INFO| Size of density/Fock matrix [MiB]:                           24
>>>>  HFX_MEM_INFO| Size of buffers [MiB]:                                       90
>>>>  HFX_MEM_INFO| Number of periodic image cells considered:                  123
>>>>  HFX_MEM_INFO| Est. max. program size after HFX [MiB]:                    3582
>>>>
>>>>     1 NoMix/Diag.  0.20E+00  6553.7  0.12989389  -3154.6382899197  -3.15E+03
>>>>
>>>>  [the same hfx_types.F:1287 WARNING is printed again]
>>>>
>>>>     2 Broy./Diag.  0.20E+00  2486.1  0.00624233  -3159.6346919624  -5.00E+00
>>>>
>>>>  [the same hfx_types.F:1287 WARNING is printed again, truncated here]
>>>>
>>>> Is there anything wrong with my input that slows down the calculation?
>>>> In particular, the number of "ERI's calculated on the fly" is not zero,
>>>> which seems to be a bad sign according to a slide from
>>>> https://mattatlincoln.github.io/talks/GhentWorkshop/?print-pdf#/
>>>>
>>>> Thank you very much in advance
>>>> Best Regards,
>>>> Geng
>>>>
>>>>