[CP2K-user] [CP2K:22240] Re: Poor performance of hybrid functional calculation

Lorenzo Lagasco lagascolorenzo at gmail.com
Tue May 19 12:49:15 UTC 2026


Unfortunately, the calculation is still running and the statistics have not
been printed yet.
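Until the final statistics appear, the per-step SCF timings are already being written to the output file as the run progresses. Below is a minimal sketch. It assumes the usual CP2K OT step-line layout: step number, `OT`, minimizer name, step size, time for that step in seconds, and so on. It also assumes the output name SLAB+DYE.out from the submission script. The function name `scf_step_times` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "step time[s]" for every OT SCF step line.
# Assumed line layout (field 2 is "OT", field 5 is the time of the step):
#   "     1 OT DIIS     0.15E+00  652.3     0.00023456     -12345.678 -1.23E+04"
scf_step_times() {
  local out_file=$1
  # Keep only lines whose first field is a step number and second is "OT".
  awk '$2 == "OT" && $1 ~ /^[0-9]+$/ { print $1, $5 }' "$out_file"
}

# Usage while the job is still running:
#   scf_step_times SLAB+DYE.out | tail -n 5
```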

Best
Lorenzo Lagasco


On Tue, May 19, 2026 at 13:46, Johann Pototschnig <pototschnig.johann at gmail.com> wrote:

> If the computation gets slower with more OMP threads, most of the time
> the threads are all running on the same CPU core.
> You can add "--bind-to none" to mpirun, or use explicit binding.
> Since you have 52 CPUs per node, you can run 13 MPI processes with 4 threads each.
> Depending on the method, this can be faster than 52 MPI processes.
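The 13 x 4 layout with explicit binding can be sketched as follows. This is a hedged example, not a tested job script. It assumes Open MPI's `mpirun` (the module name mentions openmpi4.1.6) and 52 cores per node. It also assumes that the module provides an OpenMP-enabled `cp2k.psmp` binary, since a pure-MPI build would not use OMP_NUM_THREADS. The helper `hybrid_mpirun_cmd` is hypothetical. It only prints the command, so the layout can be checked before submitting.

```shell
#!/bin/bash
# Hypothetical helper: build an Open MPI command line for a hybrid
# MPI/OpenMP run where each rank is pinned to its own block of cores.
#   $1 = cores per node, $2 = OpenMP threads per MPI rank,
#   $3 = CP2K input file, $4 = CP2K output file
hybrid_mpirun_cmd() {
  local cores=$1 threads=$2 inp=$3 out=$4
  if (( threads < 1 )) || (( cores % threads != 0 )); then
    echo "error: $threads threads do not evenly divide $cores cores" >&2
    return 1
  fi
  local ranks=$(( cores / threads ))
  # ppr:<ranks>:node places <ranks> processes on every node;
  # PE=<threads> reserves <threads> cores per process for its threads;
  # OMP_PLACES/OMP_PROC_BIND keep each thread on its own reserved core.
  echo "mpirun --map-by ppr:${ranks}:node:PE=${threads} --bind-to core --report-bindings" \
       "-x OMP_NUM_THREADS=${threads} -x OMP_PLACES=cores -x OMP_PROC_BIND=close" \
       "cp2k.psmp -i ${inp} -o ${out}"
}

# Usage: hybrid_mpirun_cmd 52 4 SLAB+DYE.inp SLAB+DYE.out
```

The matching #SBATCH lines depend on how the cluster's Open MPI counts SLURM slots. One option is `--ntasks-per-node=13` with `--cpus-per-task=4`; another is keeping 52 tasks per node. The `--report-bindings` output printed at startup is the quickest way to confirm that each rank really received 4 distinct cores.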
>
> The end of the output file contains statistics (showing how much time
> each routine/library used).
> Can you either provide the output or this information? It should show
> which library/routine is costly.
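Once the run finishes, the statistics mentioned above can be extracted from the output without opening the whole file. Below is a sketch. It assumes that the timing table starts at a banner containing the spaced-out text `T I M I N G`, as in CP2K output files. It prints from that banner to the end of the file, so the closing lines of the run are included too. The name `print_timing_report` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the CP2K timing report, i.e. everything from the
# first line containing the banner text "T I M I N G" to the end of the file.
print_timing_report() {
  local out_file=$1
  if ! grep -q 'T I M I N G' "$out_file"; then
    echo "no timing report yet in $out_file (run still in progress?)" >&2
    return 1
  fi
  sed -n '/T I M I N G/,$p' "$out_file"
}

# Usage: print_timing_report SLAB+DYE.out | head -n 40
```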
>
>
> best
>
> On Tuesday, May 19, 2026 at 11:46:57 AM UTC+2 Lorenzo Lagasco wrote:
>
>> Good morning everyone,
>> I'm running an HSE06 calculation on a large, 395-atom system to compute
>> the PDOS and the inter-electronic coupling using the Kondov diabatization
>> scheme, and I'm trying to speed up the calculation. The SCF convergence
>> criteria are not particularly strict (as you can see from the input
>> copied below).
>> &GLOBAL
>>   PROJECT CuAlO2-HSE06
>>   RUN_TYPE ENERGY
>>   PRINT_LEVEL LOW
>> &END GLOBAL
>> &FORCE_EVAL
>>
>>   METHOD QS
>>   &DFT
>>     WFN_RESTART_FILE_NAME /auto/tms7/llagasco/DELAFOSSITI-proj/INTERFACE/DOS-interface-HSE06/1.PBE/CuAlO2-PBE-RESTART.wfn
>>     LSD
>>     MULTIPLICITY 2
>>     BASIS_SET_FILE_NAME ./BASIS_MOLOPT
>>     BASIS_SET_FILE_NAME ./BASIS_ADMM
>>     BASIS_SET_FILE_NAME ./BASIS_ADMM_MOLOPT
>>     POTENTIAL_FILE_NAME ./POTENTIAL
>>     &MGRID
>>       NGRIDS 5
>>       CUTOFF 500
>>       REL_CUTOFF 50
>>     &END MGRID
>>     &AUXILIARY_DENSITY_MATRIX_METHOD
>>       METHOD BASIS_PROJECTION
>>       ADMM_PURIFICATION_METHOD NONE
>>     &END
>>     &QS
>>       EPS_DEFAULT          1.0E-10
>>       EPS_PGF_ORB          1.0E-6
>>       EXTRAPOLATION PS
>>       EXTRAPOLATION_ORDER 3
>>      #&DDAPC_RESTRAINT
>>      #  STRENGTH   1.0
>>      #  TARGET    -1.0
>>      #  ATOMS       43
>>      #  TYPE_OF_DENSITY SPIN
>>      #  FUNCTIONAL_FORM RESTRAINT
>>      #&END
>>     &END QS
>>     &SCF
>>       EPS_SCF     1.0E-6
>>      #SCF_GUESS ATOMIC
>>       SCF_GUESS RESTART
>>       MAX_SCF 50
>>       &OUTER_SCF
>>         EPS_SCF 1.0E-6
>>         MAX_SCF 50
>>       &END
>>      &OT ON
>>         MINIMIZER DIIS
>>         PRECONDITIONER FULL_SINGLE_INVERSE
>>         ENERGY_GAP 0.25
>>         LINESEARCH 2PNT
>>       &END OT
>>     &END SCF
>>     &XC
>>       &XC_FUNCTIONAL
>>        &PBE
>>          SCALE_X 0.88
>>          SCALE_C 1.0
>>        &END PBE
>>        &PBE_HOLE_T_C_LR
>>            SCALE_X 0.12       ! + 12% of truncated PBE0 functional - that includes exact hfx
>>            CUTOFF_RADIUS 4.0  ! that has interaction truncated at 4.0 A from the atomic core
>>         &END
>>       &END XC_FUNCTIONAL
>>       &HF
>>         &SCREENING
>>           EPS_SCHWARZ 1.0E-6
>>           SCREEN_ON_INITIAL_P TRUE
>>         &END
>>         &INTERACTION_POTENTIAL
>>           POTENTIAL_TYPE TRUNCATED
>>           CUTOFF_RADIUS 4.0
>>           T_C_G_DATA /soft_rocky8/prod/cp2k/12may_2025/gnu14.2.0-openmpi4.1.6/cp2k/data/t_c_g.dat
>>         &END
>>         &MEMORY
>>           MAX_MEMORY 2000
>>           EPS_STORAGE_SCALING 0.1
>>         &END
>>         FRACTION 0.12
>>       &END HF
>>     &END XC
>>       &DENSITY_FITTING
>>       &END
>>     &PRINT
>>       &PDOS
>>         NLUMO 100
>>         COMPONENTS .TRUE.
>>       &END PDOS
>>     &END PRINT
>>     &END DFT
>>   &SUBSYS
>>     &CELL
>>     A     1.7166299819900004E+01    0.0000000000000000E+00    0.0000000000000000E+00
>>     B     0.0000000000000000E+00    1.7678800582899999E+01    0.0000000000000000E+00
>>     C     0.0000000000000000E+00    0.0000000000000000E+00    3.3554000854500003E+01
>>     PERIODIC  XYZ
>>     MULTIPLE_UNIT_CELL  1 1 1
>>     &END CELL
>>  &TOPOLOGY
>>     COORD_FILE_NAME final-geom-opt.xyz
>>     COORD_FILE_FORMAT XYZ
>>   &END TOPOLOGY
>>
>>     &KIND Cu
>>       ELEMENT Cu
>>       BASIS_SET DZVP-MOLOPT-SR-GTH
>>       BASIS_SET AUX_FIT cFIT10
>>       POTENTIAL GTH-PBE-q11
>>        &BS  T
>>          &ALPHA
>>            NEL  2 2
>>            L  2 0
>>            N  3 4
>>          &END ALPHA
>>          &BETA
>>            NEL  -2 -2
>>            L  2 0
>>            N  3 4
>>          &END BETA
>>        &END BS
>>     &END KIND
>>
>>     &KIND Al
>>       ELEMENT Al
>>       BASIS_SET DZVP-MOLOPT-SR-GTH
>>       BASIS_SET AUX_FIT cFIT3
>>       POTENTIAL GTH-PBE-q3
>>     &END KIND
>>
>>     &KIND O_D
>>       ELEMENT O
>>       BASIS_SET DZVP-MOLOPT-SR-GTH
>>       BASIS_SET AUX_FIT cFIT3
>>       POTENTIAL GTH-PBE-q6
>>     &END KIND
>>    &KIND O_COOH
>>       ELEMENT O
>>       BASIS_SET DZVP-MOLOPT-SR-GTH
>>       BASIS_SET AUX_FIT cFIT3
>>       POTENTIAL GTH-PBE-q6
>>     &END KIND
>>
>>      &KIND H
>>        ELEMENT H
>>        BASIS_SET DZVP-MOLOPT-SR-GTH
>>        BASIS_SET AUX_FIT cFIT3
>>        POTENTIAL GTH-PBE-q1
>>      &END KIND
>>
>>      &KIND H_COOH
>>        ELEMENT H
>>        BASIS_SET DZVP-MOLOPT-SR-GTH
>>        BASIS_SET AUX_FIT cFIT3
>>        POTENTIAL GTH-PBE-q1
>>      &END KIND
>>
>>     &KIND O
>>       ELEMENT O
>>       BASIS_SET DZVP-MOLOPT-SR-GTH
>>       BASIS_SET AUX_FIT cFIT3
>>       POTENTIAL GTH-PBE-q6
>>     &END KIND
>>
>>     &KIND N
>>        ELEMENT N
>>        BASIS_SET DZVP-MOLOPT-SR-GTH
>>        BASIS_SET AUX_FIT cFIT3
>>        POTENTIAL GTH-PBE-q5
>>     &END KIND
>>
>>     &KIND C
>>        ELEMENT C
>>        BASIS_SET DZVP-MOLOPT-SR-GTH
>>        BASIS_SET AUX_FIT cFIT3
>>        POTENTIAL GTH-PBE-q4
>>     &END KIND
>>
>>     &KIND C_COOH
>>        ELEMENT C
>>        BASIS_SET DZVP-MOLOPT-SR-GTH
>>        BASIS_SET AUX_FIT cFIT3
>>        POTENTIAL GTH-PBE-q4
>>     &END KIND
>>
>>   &END SUBSYS
>>
>> Moreover, I am running the calculation on 4 nodes with 52 processors
>> each. The most surprising thing is that increasing the number of OMP
>> threads beyond 1 actually makes the calculation slower. It is already quite
>> slow: a single SCF step takes about 11 minutes. This is my SLURM submission
>> script:
>>
>> #!/bin/bash
>> #SBATCH --job-name=HSE06
>> #SBATCH --nodes=4
>> #SBATCH --ntasks-per-node=52
>> #SBATCH --partition=taras2-6230r
>>
>> module purge
>> module load cp2k/may2025-gnu14.2.0-openmpi4.1.6-psm211.2.230
>>
>> # OpenMP settings
>> export OMP_NUM_THREADS=1
>>
>> # Run
>> mpirun cp2k.popt -i SLAB+DYE.inp -o SLAB+DYE.out
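The thread count also interacts with one input keyword. According to the CP2K input reference, MAX_MEMORY in the HF `&MEMORY` section is a per-MPI-process limit in MiB, shared among that process's OpenMP threads. Integrals that do not fit in it are recomputed on the fly in every SCF step. Below is a hedged back-of-the-envelope check of what MAX_MEMORY 2000 in the input above implies per node. The helper names are hypothetical, and the nodes' physical RAM is not given in this thread.

```shell
#!/bin/bash
# Hypothetical helper: total HFX memory budget per node in MiB, assuming
# MAX_MEMORY applies to each MPI rank separately.
#   $1 = MAX_MEMORY [MiB], $2 = MPI ranks per node
hfx_mib_per_node() {
  echo $(( $1 * $2 ))
}

# Hypothetical helper: largest per-rank MAX_MEMORY that keeps the same
# per-node budget after switching to a different number of ranks per node.
#   $1 = per-node budget [MiB], $2 = MPI ranks per node
max_memory_per_rank() {
  echo $(( $1 / $2 ))
}

# Usage:
#   hfx_mib_per_node 2000 52        # current 52 ranks/node layout
#   max_memory_per_rank 104000 13   # same budget with 13 ranks x 4 threads
```

With 52 ranks per node, the current setting already reserves about 104000 MiB per node for HFX. With 13 ranks per node, the same budget would allow MAX_MEMORY of up to 8000 per rank, if the hardware has that memory available.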
>>
>> I was wondering whether it is possible to speed up the calculation. I
>> hope this email is clear.
>>
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "cp2k" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to cp2k+unsubscribe at googlegroups.com.
> To view this discussion visit
> https://groups.google.com/d/msgid/cp2k/1be59005-2f09-41c5-a952-d71f5f2526c9n%40googlegroups.com
> <https://groups.google.com/d/msgid/cp2k/1be59005-2f09-41c5-a952-d71f5f2526c9n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
