[CP2K-user] [CP2K:13874] Re: needs advice to speed up hybrid/ADMM method

Sun Geng gengs... at gmail.com
Tue Sep 15 03:11:58 UTC 2020


Dear Nicholas,

Thanks for your reply.
I finally found that my earlier problem was caused by a bad
OMP_NUM_THREADS setting.
I was using
export OMP_NUM_THREADS=4
mpirun -np 6 cp2k.psmp

Now I am using
export OMP_NUM_THREADS=1
mpirun -np 24 cp2k.psmp

The latter gives a reasonable SCF time after the first SCF step
(I am attaching the standard output below).
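Both layouts fill all 24 cores (6 × 4 and 24 × 1), so oversubscription alone does not explain the difference. One plausible explanation, following Nicholas's point further down that MAX_MEMORY is the limit for each MPI task, is that the total memory available for storing ERIs grows with the number of ranks. The sketch below only does that arithmetic, assuming MAX_MEMORY 4000 (MiB) from the input quoted below and ignoring HFX buffers; `hfx_budget_mib` and `fills_node` are hypothetical helpers, not part of CP2K.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CP2K): total HFX memory budget on one node,
# assuming MAX_MEMORY [MiB] in &HF/&MEMORY applies to each MPI process.
hfx_budget_mib() {
  local ranks=$1 max_memory_mib=$2
  echo $(( ranks * max_memory_mib ))
}

# Hypothetical helper: succeed if MPI ranks x OpenMP threads equals the core count.
fills_node() {
  local ranks=$1 threads=$2 cores=$3
  [ $(( ranks * threads )) -eq "$cores" ]
}

max_memory=4000   # MAX_MEMORY from the &HF/&MEMORY section of the input
cores=24

for layout in "6 4" "24 1"; do
  set -- $layout   # $1 = MPI ranks, $2 = OpenMP threads per rank
  if fills_node "$1" "$2" "$cores"; then status="uses all $cores cores"
  else status="does not match $cores cores"; fi
  echo "$1 MPI ranks x $2 OpenMP threads: $status, HFX budget $(hfx_budget_mib "$1" "$max_memory") MiB"
done
```

With 24 ranks the budget (96000 MiB) stays below the roughly 125000 MiB reported as MemFree, while 6 ranks allow only 24000 MiB. This is an interpretation consistent with the timings, not something the thread confirms.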

Best Regards,
Geng


On Mon, Sep 14, 2020 at 2:02 PM, Nicholas Winner <nwi... at berkeley.edu> wrote:
>
> I'm a little uncertain now. If you have 24 processes on one node with 120 GB, the memory should be enough. Are you sure your number of MPI tasks is not smaller than 24?
>
> On Sunday, September 13, 2020 at 4:51:36 PM UTC-7 ge... at gmail.com wrote:
>>
>> Hi,
>> Thank you very much for your help.
>> I will try out the section with OT, and I now understand the on-the-fly ERI numbers better.
>>
>> I have set PRINT_LEVEL to HIGH, so the full output is a little lengthy,
>> but I believe you are referring to this section:
>>
>>  MEMORY| system memory details [Kb]
>>  MEMORY|                        rank 0           min           max       average
>>  MEMORY| MemTotal            132183208     132183208     132183208     132183208
>>  MEMORY| MemFree             128323088     128323088     128323088     128323088
>>  MEMORY| Buffers                 27608         27608         27608         27608
>>  MEMORY| Cached                 687440        687440        687440        687440
>>  MEMORY| Slab                   284408        284408        284408        284408
>>  MEMORY| SReclaimable           154740        154740        154740        154740
>>  MEMORY| MemLikelyFree       129192876     129192876     129192876     129192876
>>
>> So I guess what I can use is 128323088 / 24 (I have 24 cores in the node) / 1000 ≈ 5346 MB per MPI process,
>> but since I need to reserve some memory for the system and other parts of CP2K, I should set MAX_MEMORY to a number smaller than that.
>> Please let me know if I am incorrect.
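The estimate above can be scripted. A minimal sketch, assuming the MEMORY| values are in KiB (they mirror /proc/meminfo) and that MAX_MEMORY is given in MiB, so the divisor is 1024 rather than 1000; the 20 % reserve for the operating system and the rest of CP2K is an arbitrary illustrative choice, and `max_memory_per_rank` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a per-rank MAX_MEMORY [MiB] from free memory.
#   $1 = MemFree in KiB (e.g. the "MEMORY| MemFree" value in the CP2K output)
#   $2 = number of MPI ranks on the node
#   $3 = percentage kept in reserve for the OS and the rest of CP2K
max_memory_per_rank() {
  local memfree_kib=$1 ranks=$2 reserve_pct=$3
  local per_rank_mib=$(( memfree_kib / ranks / 1024 ))   # KiB -> MiB per rank
  echo $(( per_rank_mib * (100 - reserve_pct) / 100 ))
}

max_memory_per_rank 128323088 24 0    # upper bound per rank
max_memory_per_rank 128323088 24 20   # with an illustrative 20 % reserve
```

For this node the two calls give 5221 and 4176 MiB, so the MAX_MEMORY 4000 already in the input sits just below the reserved value.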
>>
>> Thanks again,
>>
>> Best Wishes,
>>
>> Geng
>>
>>
>>
>> On Sunday, September 13, 2020 at 3:58:15 PM UTC-7 n... at berkeley.edu wrote:
>>>
>>> (1) OT can get the band gap if you include the following section:
>>> &DFT
>>>     &PRINT
>>>         &MO_CUBES
>>>             WRITE_CUBE False
>>>             NHOMO 1
>>>             NLUMO 1
>>>         &END
>>>     &END
>>> &END
>>>
>>> It will print a "HOMO-LUMO" line with the band gap. Because OT only works on the occupied levels, it will return a *slightly* different band gap from diagonalization. I've tested it a few times and found only a 0.01 eV difference for a moderately gapped material, so I think it should be fine for most applications. If you need the levels to be very accurate, you can always converge with OT and then re-evaluate with diagonalization; but if you're using ADMM, you're already willing to sacrifice a tiny bit of accuracy.
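To pull that value out of a long output file, a filter like the one below may help. It only assumes that the relevant line contains both "HOMO" and "LUMO" and ends with the gap value; the exact wording can differ between CP2K versions, so check it against your own output. `homo_lumo_gaps` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last field of every line mentioning HOMO and LUMO,
# assuming that field is the gap value (verify against your CP2K version's output).
homo_lumo_gaps() {
  local outfile=$1
  awk '/HOMO/ && /LUMO/ { print $NF }' "$outfile"
}

# Usage (file name is illustrative); the last value is from the latest printout:
#   homo_lumo_gaps cp2k.out | tail -n 1
```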
>>>
>>> (2) For MAX_MEMORY, look near the beginning of your CP2K output file for the line "MemFree", divide that value by the number of message-passing processes (also listed near the top of the output file), and see how much memory each MPI process should have available. Can we see the full output file?
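That lookup can be automated. A minimal sketch, assuming the MEMORY| table format quoted above in this thread (rank-0 value in the third field) and a line containing "Total number of message passing processes" with the count as its last field; the wording of that second line is an assumption, so verify it in your own output. `per_rank_memfree_mib` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: free memory per MPI process [MiB], read from a CP2K output file.
per_rank_memfree_mib() {
  local outfile=$1
  awk '
    $1 == "MEMORY|" && $2 == "MemFree" && !memfree { memfree = $3 }   # rank-0 value, KiB
    /Total number of message passing processes/ && !nproc { nproc = $NF }
    END {
      if (memfree && nproc) printf "%d\n", memfree / nproc / 1024     # KiB -> MiB, truncated
      else { print "could not find MemFree or process count" > "/dev/stderr"; exit 1 }
    }' "$outfile"
}

# Usage (file name is illustrative):
#   per_rank_memfree_mib cp2k_optimization.out
```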
>>>
>>> (3) "Finally, I would assume the line (printing the Number of sph. ERI's calculated on the fly is not zero) only works for the first SCF iteration?" To be clear, the output says:
>>>
>>>  Number of sph. ERI's stored in-core: 16901607068
>>> and
>>>  Number of sph. ERI's calculated on the fly: 91978901962
>>>
>>> This means that during the first step 16901607068 ERIs were computed and stored, and they are reused in every subsequent SCF step, but 91978901962 ERIs could not be stored and will be recalculated in SCF steps 2, 3, 4, ...
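A rough way to turn these counts into a memory target is to scale the RAM already used for the stored ERIs up to all ERIs that were either stored or recomputed. The sketch assumes the compression factor stays the same for the extra integrals and that the reported 13711 MiB ("Total memory consumption ERI's RAM") is the total over all ranks; both are assumptions, so treat the result as an order-of-magnitude estimate. `estimate_full_storage_mib` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extrapolate HFX_MEM_INFO numbers to full in-core storage.
#   $1 = ERIs stored in-core, $2 = ERIs calculated on the fly,
#   $3 = RAM used for the stored ERIs [MiB], $4 = number of MPI ranks
estimate_full_storage_mib() {
  awk -v stored="$1" -v fly="$2" -v ram="$3" -v ranks="$4" 'BEGIN {
    total = ram * (stored + fly) / stored          # same compression assumed
    printf "recomputed each step: %.1f %%\n", 100 * fly / (stored + fly)
    printf "full storage: %.0f MiB total, %.0f MiB per rank\n", total, total / ranks
  }'
}

estimate_full_storage_mib 16901607068 91978901962 13711 24
```

Under these assumptions about 84.5 % of those ERIs are recomputed every step, and storing them all would need roughly 88000 MiB in total, about 3700 MiB per rank with 24 ranks before HFX buffers. That fits under 24 × 4000 MiB but not under 6 × 4000 MiB, which is consistent with, though not proof of, the fix reported at the top of the thread.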
>>>
>>> On Sunday, September 13, 2020 at 2:11:33 PM UTC-7 ge... at gmail.com wrote:
>>>>
>>>> Hi,
>>>> Thank you for your prompt reply.
>>>> For the basis sets, I am using the small (DZVP) primary basis set and a small auxiliary basis set too. I plan to increase them if the accuracy is not sufficient.
>>>> Indeed, my target system has a large band gap (~3 eV). However, I would like to study the band gap of the material; can I use the OT method? It seems that OT only prints the energies of the occupied orbitals. Please correct me if I am wrong.
>>>>
>>>> Finally, I would assume the line (printing the Number of sph. ERI's calculated on the fly is not zero) only works for the first SCF iteration?
>>>> How can I get a good estimate for MAX_MEMORY?
>>>> And will MAX_MEMORY depend on the choice of OMP_NUM_THREADS?
>>>>
>>>> Thank you very much.
>>>> Best Regards,
>>>> Geng
>>>>
>>>>
>>>>
>>>> On Sunday, September 13, 2020 at 12:15:40 PM UTC-7 n... at berkeley.edu wrote:
>>>>>
>>>>> You pointed out a key issue at the end of your post: "ERI's calculated on the fly" should ideally be zero. The reason is that the 4-center electron-repulsion integrals (ERIs) are geometric objects (they depend on the atomic positions and the basis set, not on the density), so they only need to be evaluated in the first SCF step, provided their results can be stored in memory. If you have enough memory for this, the first SCF step will be long, but the subsequent SCF steps will be only slightly more expensive than a GGA calculation.
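A quick way to check this after a run is to read the "calculated on the fly" count from the HFX_MEM_INFO block. A minimal sketch based on the HFX_MEM_INFO lines quoted at the bottom of this thread; `check_eri_in_core` is a hypothetical helper, and if the block is printed several times the last occurrence is used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether all ERIs fit in core, based on the
# "Number of sph. ERI's calculated on the fly" line of HFX_MEM_INFO.
# Returns 0 if the count is zero, 1 if nonzero, 2 if the line is missing.
check_eri_in_core() {
  local outfile=$1 n
  n=$(awk -F: '/HFX_MEM_INFO/ && /calculated on the fly/ { gsub(/[ \t]/, "", $2); v = $2 } END { print v }' "$outfile")
  if [ -z "$n" ]; then
    echo "no HFX_MEM_INFO on-the-fly line found" >&2
    return 2
  elif [ "$n" -eq 0 ]; then
    echo "all ERIs stored; later SCF steps reuse them"
  else
    echo "$n ERIs recomputed every SCF step; consider raising MAX_MEMORY"
    return 1
  fi
}
```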
>>>>>
>>>>> Other than that, there are three things I might note:
>>>>> (1) If your system has a band-gap, you should use the OT method instead of standard matrix diagonalization, it scales quite well and has very nice convergence behavior.
>>>>> (2) You say you have 120 GB of memory available for your calculation, but only 13 GB are consumed by your HFX module. Even with the rest of the CP2K program taking some memory, you should have a lot more memory left over for storing the ERIs. Double-check that MAX_MEMORY is a reasonable value; it is the maximum amount of memory each MPI task may use.
>>>>> (3) The last thing that could be an issue is your auxiliary basis set: which ones are you using for this calculation? ADMM is so beneficial because you can use a smaller auxiliary basis for the HF part of the calculation, but maybe you are using a large auxiliary basis set?
>>>>>
>>>>> In general, an ADMM calculation should be much faster than the same calculation in VASP using a primary basis set, as long as you don't make the supercell too big.
>>>>> On Sunday, September 13, 2020 at 11:42:55 AM UTC-7 ge... at gmail.com wrote:
>>>>>>
>>>>>> Dear CP2K users,
>>>>>>
>>>>>> I would like to benchmark a small periodic system (11 Å × 11 Å × 11 Å) using the HSE06 functional against results obtained from VASP.
>>>>>> Here is my input for the DFT section:
>>>>>>
>>>>>>    &DFT
>>>>>>       BASIS_SET_FILE_NAME BASIS_MOLOPT_UCL
>>>>>>       BASIS_SET_FILE_NAME BASIS_MOLOPT
>>>>>>       BASIS_SET_FILE_NAME BASIS_ADMM_MOLOPT
>>>>>>       BASIS_SET_FILE_NAME BASIS_ADMM
>>>>>>       POTENTIAL_FILE_NAME GTH_POTENTIALS
>>>>>>       WFN_RESTART_FILE_NAME cp2k-RESTART.wfn
>>>>>>       &MGRID
>>>>>>          CUTOFF 320
>>>>>>          COMMENSURATE
>>>>>>       &END MGRID
>>>>>>       &QS
>>>>>>          EXTRAPOLATION PS
>>>>>>          EXTRAPOLATION_ORDER 3
>>>>>>          EPS_DEFAULT  1.0E-11
>>>>>>          EPS_PGF_ORB  1.0E-14
>>>>>>          MAP_CONSISTENT T
>>>>>>       &END QS
>>>>>>       &SCF
>>>>>>          SCF_GUESS RESTART
>>>>>>          EPS_SCF 1.0E-7
>>>>>>          MAX_SCF 300
>>>>>>          ADDED_MOS 100
>>>>>>          &DIAGONALIZATION
>>>>>>             ALGORITHM STANDARD
>>>>>>          &END DIAGONALIZATION
>>>>>>          &SMEAR  ON
>>>>>>             METHOD FERMI_DIRAC
>>>>>>             ELECTRONIC_TEMPERATURE [K] 300
>>>>>>          &END SMEAR
>>>>>>          &MIXING
>>>>>>             METHOD BROYDEN_MIXING
>>>>>>             ALPHA 0.2
>>>>>>             BETA 1.5
>>>>>>             NBROYDEN 8
>>>>>>          &END MIXING
>>>>>>       &END SCF
>>>>>>       !&XC
>>>>>>       !   &XC_FUNCTIONAL PBE
>>>>>>       !   &END XC_FUNCTIONAL
>>>>>>       !&END XC
>>>>>>       &XC
>>>>>>         &XC_FUNCTIONAL
>>>>>>           &PBE
>>>>>>             SCALE_X 0.0
>>>>>>             SCALE_C 1.0
>>>>>>           &END PBE
>>>>>>           &XWPBE
>>>>>>             SCALE_X -0.25
>>>>>>             SCALE_X0 1.0
>>>>>>             OMEGA 0.11
>>>>>>           &END XWPBE
>>>>>>         &END XC_FUNCTIONAL
>>>>>>         &HF
>>>>>>           &SCREENING
>>>>>>             EPS_SCHWARZ 1.0E-6
>>>>>>             SCREEN_ON_INITIAL_P T
>>>>>>           &END SCREENING
>>>>>>           &INTERACTION_POTENTIAL
>>>>>>             POTENTIAL_TYPE SHORTRANGE
>>>>>>             OMEGA 0.11
>>>>>>           &END INTERACTION_POTENTIAL
>>>>>>           &MEMORY
>>>>>>             MAX_MEMORY  4000
>>>>>>             EPS_STORAGE_SCALING 0.1
>>>>>>           &END MEMORY
>>>>>>           FRACTION 0.25
>>>>>>         &END HF
>>>>>>       &END XC
>>>>>>       &AUXILIARY_DENSITY_MATRIX_METHOD
>>>>>>           METHOD BASIS_PROJECTION
>>>>>>           ADMM_PURIFICATION_METHOD NONE
>>>>>>       &END AUXILIARY_DENSITY_MATRIX_METHOD
>>>>>>       &PRINT
>>>>>>          &PDOS
>>>>>>             FILENAME pdos
>>>>>>             # print all projected DOS available:
>>>>>>             NLUMO -1
>>>>>>             # split the density by quantum number:
>>>>>>             COMPONENTS
>>>>>>          &END
>>>>>>       &END PRINT
>>>>>>    &END DFT
>>>>>>
>>>>>> The calculation restarted from a converged PBE wavefunction.
>>>>>> However, I found that the calculation is quite slow (VASP needs 240 seconds for an SCF step, but CP2K needs almost 2400 seconds; both runs use a computing node with 24 cores and 120 GB of memory in total). I understand it is not easy to compare different software packages because of very different setups, but I would expect the ADMM method in CP2K to be much faster.
>>>>>>
>>>>>> Below is the output.
>>>>>>
>>>>>>  SCF WAVEFUNCTION OPTIMIZATION
>>>>>>
>>>>>>   Step     Update method      Time    Convergence         Total energy    Change
>>>>>>   ------------------------------------------------------------------------------
>>>>>>
>>>>>>   HFX_MEM_INFO| Est. max. program size before HFX [MiB]:                     792
>>>>>>
>>>>>>  *** WARNING in hfx_types.F:1287 :: Periodic Hartree Fock calculation      ***
>>>>>>  *** requested with use of a truncated or shortrange potential. The cutoff ***
>>>>>>  *** radius is larger than half the minimal cell dimension. This may lead  ***
>>>>>>  *** to unphysical total energies. Reduce the cutoff radius in order to    ***
>>>>>>  *** avoid possible problems.                                              ***
>>>>>>
>>>>>>   HFX_MEM_INFO| Number of cart. primitive ERI's calculated:       11992558561508
>>>>>>   HFX_MEM_INFO| Number of sph. ERI's calculated:                    157558545566
>>>>>>   HFX_MEM_INFO| Number of sph. ERI's stored in-core:                 16901607068
>>>>>>   HFX_MEM_INFO| Number of sph. ERI's stored on disk:                           0
>>>>>>   HFX_MEM_INFO| Number of sph. ERI's calculated on the fly:          91978901962
>>>>>>   HFX_MEM_INFO| Total memory consumption ERI's RAM [MiB]:                  13711
>>>>>>   HFX_MEM_INFO| Whereof max-vals [MiB]:                                      454
>>>>>>   HFX_MEM_INFO| Total compression factor ERI's RAM:                         9.41
>>>>>>   HFX_MEM_INFO| Total memory consumption ERI's disk [MiB]:                     0
>>>>>>   HFX_MEM_INFO| Total compression factor ERI's disk:                        0.00
>>>>>>   HFX_MEM_INFO| Size of density/Fock matrix [MiB]:                            24
>>>>>>   HFX_MEM_INFO| Size of buffers [MiB]:                                        90
>>>>>>   HFX_MEM_INFO| Number of periodic image cells considered:                   123
>>>>>>   HFX_MEM_INFO| Est. max. program size after HFX  [MiB]:                    3582
>>>>>>
>>>>>>      1 NoMix/Diag. 0.20E+00 6553.7     0.12989389     -3154.6382899197 -3.15E+03
>>>>>>
>>>>>>  *** WARNING in hfx_types.F:1287 :: Periodic Hartree Fock calculation      ***
>>>>>>  *** requested with use of a truncated or shortrange potential. The cutoff ***
>>>>>>  *** radius is larger than half the minimal cell dimension. This may lead  ***
>>>>>>  *** to unphysical total energies. Reduce the cutoff radius in order to    ***
>>>>>>  *** avoid possible problems.                                              ***
>>>>>>
>>>>>>      2 Broy./Diag. 0.20E+00 2486.1     0.00624233     -3159.6346919624 -5.00E+00
>>>>>>
>>>>>>  *** WARNING in hfx_types.F:1287 :: Periodic Hartree Fock calculation      ***
>>>>>>  *** requested with use of a truncated or shortrange potential. The cutoff ***
>>>>>>
>>>>>>
>>>>>> Is there anything wrong with my input that slows down the calculation?
>>>>>> In particular, the "ERI's calculated on the fly" count is not zero, which does not seem good according to a slide from "https://mattatlincoln.github.io/talks/GhentWorkshop/?print-pdf#/"
>>>>>>
>>>>>> Thank you very much in advance
>>>>>> Best Regards,
>>>>>> Geng
>>>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cp2k_optimization.out
Type: application/octet-stream
Size: 12280 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20200914/4187f5e1/attachment.obj>

