[CP2K-user] needs advice to speed up hybrid/ADMM method

Sun Geng gengs... at gmail.com
Sun Sep 13 23:51:35 UTC 2020

Thank you very much for your help. 
I will try out the section with OT, and I now understand the on-the-fly 
ERI numbers better.

I have set PRINT_LEVEL to HIGH, so the full output is rather lengthy, 
but I believe you are referring to this section:

 MEMORY| system memory details [Kb]
 MEMORY|                        rank 0           min           max      
 MEMORY| MemTotal            132183208     132183208     132183208    
 MEMORY| MemFree             128323088     128323088     128323088    
 MEMORY| Buffers                 27608         27608         27608        
 MEMORY| Cached                 687440        687440        687440        
 MEMORY| Slab                   284408        284408        284408        
 MEMORY| SReclaimable           154740        154740        154740        
 MEMORY| MemLikelyFree       129192876     129192876     129192876    

So I guess what I can use is 128323088 / 24 (I have 24 cores in the 
node) / 1000 = 5346 MB per MPI process. 
Since I need to reserve some memory for the system and for other parts of CP2K, 
I should set MAX_MEMORY to a value smaller than that.
Please let me know if I am wrong.
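A minimal Python sketch of this arithmetic, under stated assumptions: the MEMORY| values are taken to be KiB (as /proc/meminfo reports them), and MAX_MEMORY is taken to be in MiB per MPI rank (the CP2K manual gives MiB; "per MPI task" is confirmed later in this thread). Dividing by 1024 instead of 1000 gives about 5221 MiB per rank rather than 5346. `max_memory_per_rank_mib` and its `reserve_fraction` (a safety margin for the OS and the non-HFX parts of CP2K) are hypothetical helpers, not CP2K settings. The loop also illustrates the OMP_NUM_THREADS question: on the same 24 cores, fewer MPI ranks with more threads each leave more memory per rank.

```python
def max_memory_per_rank_mib(mem_free_kib: int, n_ranks: int,
                            reserve_fraction: float = 0.2) -> int:
    """Rough per-rank MAX_MEMORY value in MiB (hypothetical helper).

    mem_free_kib: "MemFree" from the MEMORY| block, assumed to be KiB.
    n_ranks: number of MPI processes sharing the node's memory.
    reserve_fraction: hypothetical share held back for the OS and the
        non-HFX parts of CP2K; 0.2 is an arbitrary example, not a default.
    """
    per_rank_mib = mem_free_kib / 1024 / n_ranks
    return int(per_rank_mib * (1.0 - reserve_fraction))


MEM_FREE_KIB = 128323088  # MemFree from the output above
CORES = 24

# Without any reserve this is the upper bound per rank.
print(max_memory_per_rank_mib(MEM_FREE_KIB, CORES, reserve_fraction=0.0))  # -> 5221

# Same node, different MPI x OpenMP splits: fewer ranks -> more memory each.
for threads in (1, 2, 4, 6):
    ranks = CORES // threads
    budget = max_memory_per_rank_mib(MEM_FREE_KIB, ranks)
    print(f"{ranks:2d} ranks x {threads} threads: MAX_MEMORY ~ {budget} MiB")
```

The total memory on the node does not change; only how it is divided among MPI ranks does.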

Thanks again,

Best Wishes,


On Sunday, September 13, 2020 at 3:58:15 PM UTC-7 <n... at berkeley.edu> wrote:

> (1) OT can report the band gap if you include this section:
> &DFT
>     &PRINT
>         &MO_CUBES
>             WRITE_CUBE False
>             NHOMO 1
>             NLUMO 1
>         &END
>     &END
> &END
> It will print a "HOMO-LUMO" line with the band gap. Because OT only 
> works on the occupied levels, it will return a *slightly* different band 
> gap than diagonalization. I've tested it a few times and found only a 
> 0.01 eV difference for a moderately gapped material, so I think 
> it should be fine for most applications. If you need the levels to be very 
> accurate, you can always converge with OT and then re-evaluate with 
> diagonalization; but if you're using ADMM, you're already willing to 
> sacrifice a tiny bit of accuracy.
> (2) For MAX_MEMORY, look at the beginning of your CP2K output file, 
> find the line that says "MemFree", and divide it by the number of 
> message-passing processes (also listed near the top of the output file) to 
> see how much memory each MPI process should have available. Can we see the 
> full output file?
> (3) "Finally, I would assume the line (printing the Number of sph. ERI's 
> calculated on the fly is not zero) only works for the first SCF iteration? 
> ". To be clear, the output says 
>  Number of sph. ERI's stored in-core: 16901607068
> and 
>  Number of sph. ERI's calculated on the fly: 91978901962
> This means that during the first step 16901607068 ERIs were stored and 
> are re-used in every SCF step, but 91978901962 ERIs could not be stored and 
> will be re-calculated in SCF steps 2, 3, 4, ...
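A toy Python model of this explanation (not CP2K's actual HFX code): in-core integrals are evaluated once in the first step, while on-the-fly integrals are evaluated again in every step. The two counts are the ones quoted above; the SCF step counts in the loop are arbitrary examples, and `eri_evaluations` is a hypothetical helper.

```python
def eri_evaluations(n_in_core: int, n_on_the_fly: int, n_scf_steps: int) -> int:
    """Total ERI evaluations over an SCF run in a simple store-or-recompute model.

    In-core integrals are computed once (step 1) and reused afterwards;
    on-the-fly integrals are recomputed in every step.
    """
    return n_in_core + n_on_the_fly * n_scf_steps


IN_CORE = 16901607068     # "Number of sph. ERI's stored in-core"
ON_THE_FLY = 91978901962  # "Number of sph. ERI's calculated on the fly"

stored_fraction = IN_CORE / (IN_CORE + ON_THE_FLY)
print(f"fraction kept in-core: {stored_fraction:.1%}")

for steps in (1, 10, 20):
    actual = eri_evaluations(IN_CORE, ON_THE_FLY, steps)
    # If everything fit in memory, nothing would be recomputed after step 1.
    ideal = eri_evaluations(IN_CORE + ON_THE_FLY, 0, steps)
    print(f"{steps:2d} SCF steps: {actual / ideal:.1f}x the all-in-core ERI work")
```

With these counts only about 15% of the integrals are kept in-core, so most of the integral work is repeated in every SCF step after the first.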
> On Sunday, September 13, 2020 at 2:11:33 PM UTC-7 ge... at gmail.com 
> wrote:
>> Hi,
>> Thank you for your prompt reply.
>> For the basis sets, I am using the small (DZVP) primary basis set and a 
>> small auxiliary basis set. I plan to increase them if the accuracy 
>> is not sufficient.
>> Indeed, my target system has a large band gap (~3 eV). However, I would 
>> like to study the band gap of the material; can I use the OT method? It seems 
>> that the OT method only prints the energies of occupied orbitals. Please 
>> correct me if I am wrong.
>> Finally, I would assume the line (printing the Number of sph. ERI's 
>> calculated on the fly is not zero) only works for the first SCF iteration? 
>> How can I get a good estimate for MAX_MEMORY? 
>> And does MAX_MEMORY depend on the choice of OMP_NUM_THREADS?
>> Thank you very much.
>> Best Regards,
>> Geng
>> On Sunday, September 13, 2020 at 12:15:40 PM UTC-7 <n... at berkeley.edu> wrote:
>>> You pointed out a key issue at the end of your post: "ERI's calculated 
>>> on the fly" should ideally be zero. The reason is that the 4-center 
>>> electron-repulsion integrals (ERIs) depend only on the geometry and basis 
>>> set, not on the density, so they only need to be evaluated in the first 
>>> SCF step, provided you can store them in memory. If you have enough memory 
>>> for this, the first SCF step will be long, but the subsequent SCF steps 
>>> will be only slightly more expensive than a GGA calculation. 
>>> Other than that, there are three things I might note:
>>> (1) If your system has a band gap, you should use the OT method instead 
>>> of standard matrix diagonalization; it scales quite well and has very nice 
>>> convergence behavior.
>>> (2) You say you have 120 GB of memory available for your calculation, but 
>>> only 13 GB are consumed by your HFX module. Even with the rest of the CP2K 
>>> program taking some memory, you should have a lot more memory left over for 
>>> storing the ERIs. Double-check that MAX_MEMORY is a reasonable value; it is 
>>> the maximum amount of memory for *each* MPI task to use.
>>> (3) The last thing that could be an issue is your auxiliary basis set: 
>>> which one are you using for this calculation? ADMM is so beneficial because 
>>> you can use a smaller auxiliary basis for the HF part of the calculation, 
>>> but maybe you are using a large auxiliary basis set? 
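Point (2) can be put into rough numbers using the HFX_MEM_INFO lines from the original output. The sketch below is a linear extrapolation that assumes every integral costs the same compressed memory as those already stored; the thread does not say whether these figures are per rank or summed over ranks, so treat the result only as an order of magnitude. `in_core_memory_mib` is a hypothetical helper.

```python
IN_CORE = 16901607068      # "Number of sph. ERI's stored in-core"
ON_THE_FLY = 91978901962   # "Number of sph. ERI's calculated on the fly"
RAM_USED_MIB = 13711       # "Total memory consumption ERI's RAM [MiB]"


def in_core_memory_mib(n_in_core: int, n_on_the_fly: int,
                       ram_used_mib: float) -> float:
    """Linear extrapolation: memory to store all ERIs at the current cost per ERI."""
    return ram_used_mib * (n_in_core + n_on_the_fly) / n_in_core


bytes_per_eri = RAM_USED_MIB * 1024**2 / IN_CORE
needed_mib = in_core_memory_mib(IN_CORE, ON_THE_FLY, RAM_USED_MIB)

print(f"~{bytes_per_eri:.2f} bytes per stored ERI after compression")
print(f"~{needed_mib / 1024:.0f} GiB to keep every ERI in-core")
```

The per-integral cost comes out near 0.85 bytes, consistent with 8-byte values divided by the reported compression factor of 9.41, and the extrapolated total is about 86 GiB, a large share of the node's roughly 120 GB.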
>>> In general, an ADMM calculation should be much faster than the same 
>>> calculation in VASP using a primary basis set, as long as you don't make 
>>> the supercell too big.
>>> On Sunday, September 13, 2020 at 11:42:55 AM UTC-7 ge... at gmail.com 
>>> wrote:
>>>> Dear CP2K users, 
>>>> I would like to benchmark a small periodic system (11 Å x 11 Å x 11 Å) 
>>>> using the HSE06 functional against results obtained from VASP.
>>>> Here is my input for DFT section:
>>>>    &DFT
>>>>       &MGRID
>>>>          CUTOFF 320
>>>>          COMMENSURATE
>>>>       &END MGRID
>>>>       &QS
>>>>          EXTRAPOLATION PS
>>>>          EPS_DEFAULT  1.0E-11
>>>>          EPS_PGF_ORB  1.0E-14
>>>>          MAP_CONSISTENT T
>>>>       &END QS
>>>>       &SCF
>>>>          SCF_GUESS RESTART
>>>>          EPS_SCF 1.0E-7
>>>>          MAX_SCF 300
>>>>          ADDED_MOS 100
>>>>          &DIAGONALIZATION
>>>>             ALGORITHM STANDARD
>>>>          &END DIAGONALIZATION
>>>>          &SMEAR  ON
>>>>             METHOD FERMI_DIRAC
>>>>             ELECTRONIC_TEMPERATURE [K] 300
>>>>          &END SMEAR
>>>>          &MIXING
>>>>             METHOD BROYDEN_MIXING
>>>>             ALPHA 0.2
>>>>             BETA 1.5
>>>>             NBROYDEN 8
>>>>          &END MIXING
>>>>       &END SCF
>>>>       !&XC
>>>>       !   &XC_FUNCTIONAL PBE
>>>>       !   &END XC_FUNCTIONAL
>>>>       !&END XC
>>>>       &XC
>>>>         &XC_FUNCTIONAL
>>>>           &PBE
>>>>             SCALE_X 0.0
>>>>             SCALE_C 1.0
>>>>           &END PBE
>>>>           &XWPBE
>>>>             SCALE_X -0.25
>>>>             SCALE_X0 1.0
>>>>             OMEGA 0.11
>>>>           &END XWPBE
>>>>         &END XC_FUNCTIONAL
>>>>         &HF
>>>>           &SCREENING
>>>>             EPS_SCHWARZ 1.0E-6
>>>>             SCREEN_ON_INITIAL_P T
>>>>           &END SCREENING
>>>>           &INTERACTION_POTENTIAL
>>>>             POTENTIAL_TYPE SHORTRANGE
>>>>             OMEGA 0.11
>>>>           &END INTERACTION_POTENTIAL
>>>>           &MEMORY
>>>>             MAX_MEMORY  4000
>>>>             EPS_STORAGE_SCALING 0.1
>>>>           &END MEMORY
>>>>           FRACTION 0.25
>>>>         &END HF
>>>>       &END XC
>>>>       &PRINT
>>>>          &PDOS
>>>>             FILENAME pdos
>>>>             # print all projected DOS available:
>>>>             NLUMO -1
>>>>             # split the density by quantum number:
>>>>             COMPONENTS
>>>>          &END
>>>>       &END PRINT
>>>>    &END DFT
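For reference, the exchange-correlation settings above should combine into the standard HSE06 form. This assumes that in CP2K's XWPBE section SCALE_X0 scales full-range PBE exchange and SCALE_X its short-range part, that the HF term uses a short-range Coulomb potential with the same omega = 0.11 (bohr^-1), and that PBE exchange splits as E_x^PBE = E_x^PBE,SR + E_x^PBE,LR.

```latex
\begin{aligned}
E_{xc} &= 1.0\,E_x^{\mathrm{PBE}} - 0.25\,E_x^{\mathrm{PBE,SR}}(\omega)
          + 0.25\,E_x^{\mathrm{HF,SR}}(\omega) + E_c^{\mathrm{PBE}} \\
       &= 0.25\,E_x^{\mathrm{HF,SR}}(\omega) + 0.75\,E_x^{\mathrm{PBE,SR}}(\omega)
          + E_x^{\mathrm{PBE,LR}}(\omega) + E_c^{\mathrm{PBE}},
          \qquad \omega = 0.11
\end{aligned}
```

The first line maps term by term onto the input: SCALE_X0 1.0 and SCALE_X -0.25 in &XWPBE, FRACTION 0.25 in &HF, and SCALE_C 1.0 in &PBE (whose SCALE_X 0.0 removes its own exchange).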
>>>> The calculation restarted from a converged PBE wavefunction.
>>>> However, I found that the calculation is quite slow: VASP needs 240 
>>>> seconds per SCF step, but CP2K needs almost 2400 seconds. Both 
>>>> were run on a compute node with 24 cores and 120 GB of memory in 
>>>> total. I understand it is not easy to compare different codes 
>>>> because of the very different setups, but I would expect the ADMM method 
>>>> in CP2K to be much faster.
>>>> Below is the output.
>>>>   Step     Update method      Time    Convergence         Total energy    Change
>>>> ------------------------------------------------------------------------------
>>>>   HFX_MEM_INFO| Est. max. program size before HFX [MiB]:                     792
>>>>  *** WARNING in hfx_types.F:1287 :: Periodic Hartree Fock calculation       ***
>>>>  *** requested with use of a truncated or shortrange potential. The cutoff ***
>>>>  *** radius is larger than half the minimal cell dimension. This may lead  ***
>>>>  *** to unphysical total energies. Reduce the cutoff radius in order to    ***
>>>>  *** avoid possible problems.                                               ***
>>>>   HFX_MEM_INFO| Number of cart. primitive ERI's calculated:      11992558561508
>>>>   HFX_MEM_INFO| Number of sph. ERI's calculated:                   157558545566
>>>>   HFX_MEM_INFO| Number of sph. ERI's stored in-core:                16901607068
>>>>   HFX_MEM_INFO| Number of sph. ERI's stored on disk:                          0
>>>>   HFX_MEM_INFO| Number of sph. ERI's calculated on the fly:         91978901962
>>>>   HFX_MEM_INFO| Total memory consumption ERI's RAM [MiB]:                 13711
>>>>   HFX_MEM_INFO| Whereof max-vals [MiB]:                                     454
>>>>   HFX_MEM_INFO| Total compression factor ERI's RAM:                        9.41
>>>>   HFX_MEM_INFO| Total memory consumption ERI's disk [MiB]:                    0
>>>>   HFX_MEM_INFO| Total compression factor ERI's disk:                       0.00
>>>>   HFX_MEM_INFO| Size of density/Fock matrix [MiB]:                           24
>>>>   HFX_MEM_INFO| Size of buffers [MiB]:                                       90
>>>>   HFX_MEM_INFO| Number of periodic image cells considered:                  123
>>>>   HFX_MEM_INFO| Est. max. program size after HFX  [MiB]:                   3582
>>>>      1 NoMix/Diag. 0.20E+00 6553.7     0.12989389     -3154.6382899197 -3.15E+03
>>>>  *** WARNING in hfx_types.F:1287 :: Periodic Hartree Fock calculation       ***
>>>>  *** requested with use of a truncated or shortrange potential. The cutoff ***
>>>>  *** radius is larger than half the minimal cell dimension. This may lead  ***
>>>>  *** to unphysical total energies. Reduce the cutoff radius in order to    ***
>>>>  *** avoid possible problems.                                               ***
>>>>      2 Broy./Diag. 0.20E+00 2486.1     0.00624233     -3159.6346919624 -5.00E+00
>>>>  *** WARNING in hfx_types.F:1287 :: Periodic Hartree Fock calculation       ***
>>>>  *** requested with use of a truncated or shortrange potential. The cutoff ***
>>>> Is there anything wrong with my input that slows down the calculation?
>>>> In particular, "ERI's calculated on the fly" is not zero, which 
>>>> seems bad according to a slide from 
>>>> https://mattatlincoln.github.io/talks/GhentWorkshop/?print-pdf#/
>>>> Thank you very much in advance
>>>> Best Regards,
>>>> Geng
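Pulling these numbers out of a long PRINT_LEVEL HIGH output can be scripted. A minimal Python sketch, assuming each HFX_MEM_INFO entry sits on a single line that ends in its numeric value (unlike the wrapped copy in the original post); `parse_hfx_mem_info` is a hypothetical helper, not part of CP2K.

```python
import re

# One HFX_MEM_INFO entry: a label, a colon, then a single numeric value.
HFX_LINE = re.compile(r"HFX_MEM_INFO\|\s*(.+?):\s+([-+]?\d+(?:\.\d+)?)\s*$")


def parse_hfx_mem_info(text: str) -> dict:
    """Return {label: value} for every HFX_MEM_INFO line in a CP2K output.

    If a label appears more than once (e.g. printed again later in a run),
    the last value wins.
    """
    info = {}
    for line in text.splitlines():
        match = HFX_LINE.search(line)
        if match:
            info[match.group(1).strip()] = float(match.group(2))
    return info


# Usage, with two lines copied from the output above:
sample = """\
  HFX_MEM_INFO| Number of sph. ERI's stored in-core:                16901607068
  HFX_MEM_INFO| Number of sph. ERI's calculated on the fly:         91978901962
"""
info = parse_hfx_mem_info(sample)
stored = info["Number of sph. ERI's stored in-core"]
on_the_fly = info["Number of sph. ERI's calculated on the fly"]
print(f"stored in-core: {stored / (stored + on_the_fly):.1%}")
```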