[CP2K-user] [CP2K:22240] Re: Poor performance of hybrid functional calculation

Johann Pototschnig pototschnig.johann at gmail.com
Tue May 19 11:44:32 UTC 2026
Previous message (by thread): [CP2K-user] [CP2K:22238] Poor performance of hybrid functional calculation
Next message (by thread): [CP2K-user] [CP2K:22240] Re: Poor performance of hybrid functional calculation
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
If the computation gets slower with more OMP threads, the threads are most 
likely all running on the same CPU core. 
You can add "--bind-to none" to mpirun, or specify an explicit binding. 
Since you have 52 cores per node, you can run 13 MPI processes with 4 
threads each on every node. 
Depending on the method, this can be faster than 52 MPI processes. 
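
As a rough sketch of the layout described above, the helper below works out 
the number of MPI ranks from the node count, cores per node, and threads per 
rank, and prints a launch line without running anything. The function name 
hybrid_launch_line is made up for illustration, and cp2k.psmp (the 
OpenMP-enabled build) is an assumption: the script quoted below calls 
cp2k.popt, so check which binaries your module provides.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) a hybrid MPI/OpenMP launch line.
# Assumptions: Open MPI's mpirun, and a cp2k.psmp binary in PATH.
hybrid_launch_line() {
    local nodes=$1 cores_per_node=$2 threads=$3
    # Refuse layouts that would leave cores idle or oversubscribe them.
    if (( cores_per_node % threads != 0 )); then
        echo "error: $threads threads do not divide $cores_per_node cores" >&2
        return 1
    fi
    local ranks_per_node=$(( cores_per_node / threads ))
    local total_ranks=$(( nodes * ranks_per_node ))
    echo "OMP_NUM_THREADS=$threads mpirun -np $total_ranks --bind-to none cp2k.psmp -i SLAB+DYE.inp -o SLAB+DYE.out"
}

# 4 nodes x 52 cores with 4 threads per rank gives 13 ranks per node.
hybrid_launch_line 4 52 4
```

Under SLURM the matching request would be --ntasks-per-node=13 together 
with --cpus-per-task=4, so that each rank actually has 4 cores available.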

At the end of the output file, timing statistics are provided (showing how 
much time each routine or library used). 
Can you provide either the output or this information? It should show 
which library or routine is costly. 
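
One possible way to pull out just that part, assuming the report starts at 
the banner line containing "T I M I N G" as CP2K normally prints it, is a 
small awk filter. The function name show_timing is only illustrative.

```shell
#!/bin/bash
# Hypothetical helper: print everything from the timing banner to the end
# of a CP2K output file. Assumes the banner contains "T I M I N G".
show_timing() {
    awk '/T I M I N G/ { found = 1 } found' "$1"
}
```

For the job quoted below this would be called as "show_timing SLAB+DYE.out"; 
it prints nothing if the run has not finished and no report was written yet.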


best

On Tuesday, May 19, 2026 at 11:46:57 AM UTC+2 Lorenzo Lagasco wrote:

> Good morning everyone,
> I'm running an HSE06 calculation on a large 395-atom system to compute 
> the PDOS and the inter-electronic coupling using the Kondov diabatization 
> scheme, and I'm trying to speed up the calculation. The SCF convergence 
> criteria are not particularly strict (as you can check from the input 
> copied here). 
> &GLOBAL
>   PROJECT CuAlO2-HSE06
>   RUN_TYPE ENERGY
>   PRINT_LEVEL LOW
> &END GLOBAL
> &FORCE_EVAL
>
>   METHOD QS
>   &DFT
>     WFN_RESTART_FILE_NAME /auto/tms7/llagasco/DELAFOSSITI-proj/INTERFACE/DOS-interface-HSE06/1.PBE/CuAlO2-PBE-RESTART.wfn
>     LSD
>     MULTIPLICITY 2
>     BASIS_SET_FILE_NAME ./BASIS_MOLOPT
>     BASIS_SET_FILE_NAME ./BASIS_ADMM
>     BASIS_SET_FILE_NAME ./BASIS_ADMM_MOLOPT
>     POTENTIAL_FILE_NAME ./POTENTIAL
>     &MGRID
>       NGRIDS 5
>       CUTOFF 500
>       REL_CUTOFF 50
>     &END MGRID
>     &AUXILIARY_DENSITY_MATRIX_METHOD
>       METHOD BASIS_PROJECTION
>       ADMM_PURIFICATION_METHOD NONE
>     &END
>     &QS
>       EPS_DEFAULT          1.0E-10
>       EPS_PGF_ORB          1.0E-6
>       EXTRAPOLATION PS
>       EXTRAPOLATION_ORDER 3
>      #&DDAPC_RESTRAINT
>      #  STRENGTH   1.0
>      #  TARGET    -1.0
>      #  ATOMS       43
>      #  TYPE_OF_DENSITY SPIN
>      #  FUNCTIONAL_FORM RESTRAINT
>      #&END
>     &END QS
>     &SCF
>       EPS_SCF     1.0E-6
>      #SCF_GUESS ATOMIC
>       SCF_GUESS RESTART
>       MAX_SCF 50
>       &OUTER_SCF
>         EPS_SCF 1.0E-6
>         MAX_SCF 50
>       &END
>      &OT ON
>         MINIMIZER DIIS
>         PRECONDITIONER FULL_SINGLE_INVERSE
>         ENERGY_GAP 0.25
>         LINESEARCH 2PNT
>       &END OT
>     &END SCF
>     &XC
>       &XC_FUNCTIONAL
>        &PBE
>          SCALE_X 0.88
>          SCALE_C 1.0
>        &END PBE
>        &PBE_HOLE_T_C_LR
>            SCALE_X 0.12       ! + 25% of truncated PBE0 functional - that includes exact hfx
>            CUTOFF_RADIUS 4.0  ! that has interaction truncated at 4.0 A from the atomic core
>         &END
>       &END XC_FUNCTIONAL
>       &HF
>         &SCREENING
>           EPS_SCHWARZ 1.0E-6
>           SCREEN_ON_INITIAL_P TRUE
>         &END
>         &INTERACTION_POTENTIAL
>           POTENTIAL_TYPE TRUNCATED
>           CUTOFF_RADIUS 4.0
>           T_C_G_DATA /soft_rocky8/prod/cp2k/12may_2025/gnu14.2.0-openmpi4.1.6/cp2k/data/t_c_g.dat
>         &END
>         &MEMORY
>           MAX_MEMORY 2000
>           EPS_STORAGE_SCALING 0.1
>         &END
>         FRACTION 0.12
>       &END HF
>       &END XC
>       &DENSITY_FITTING
>       &END
>     &PRINT
>       &PDOS
>         NLUMO 100
>         COMPONENTS .TRUE.
>       &END PDOS
>     &END PRINT
>     &END DFT
>   &SUBSYS
>     &CELL
>     A     1.7166299819900004E+01    0.0000000000000000E+00    0.0000000000000000E+00
>     B     0.0000000000000000E+00    1.7678800582899999E+01    0.0000000000000000E+00
>     C     0.0000000000000000E+00    0.0000000000000000E+00    3.3554000854500003E+01
>     PERIODIC  XYZ
>     MULTIPLE_UNIT_CELL  1 1 1
>     &END CELL
>  &TOPOLOGY
>     COORD_FILE_NAME final-geom-opt.xyz
>     COORD_FILE_FORMAT XYZ
>   &END TOPOLOGY
>
>     &KIND Cu
>       ELEMENT Cu
>       BASIS_SET DZVP-MOLOPT-SR-GTH
>       BASIS_SET AUX_FIT cFIT10
>       POTENTIAL GTH-PBE-q11
>        &BS  T
>          &ALPHA
>            NEL  2 2
>            L  2 0
>            N  3 4
>          &END ALPHA
>          &BETA
>            NEL  -2 -2
>            L  2 0
>            N  3 4
>          &END BETA
>        &END BS
>     &END KIND
>
>     &KIND Al
>       ELEMENT Al
>       BASIS_SET DZVP-MOLOPT-SR-GTH
>       BASIS_SET AUX_FIT cFIT3
>       POTENTIAL GTH-PBE-q3
>     &END KIND
>
>     &KIND O_D
>       ELEMENT O
>       BASIS_SET DZVP-MOLOPT-SR-GTH
>       BASIS_SET AUX_FIT cFIT3
>       POTENTIAL GTH-PBE-q6
>     &END KIND
>    &KIND O_COOH
>       ELEMENT O
>       BASIS_SET DZVP-MOLOPT-SR-GTH
>       BASIS_SET AUX_FIT cFIT3
>       POTENTIAL GTH-PBE-q6
>     &END KIND
>
>      &KIND H
>        ELEMENT H
>        BASIS_SET DZVP-MOLOPT-SR-GTH
>        BASIS_SET AUX_FIT cFIT3
>        POTENTIAL GTH-PBE-q1
>      &END KIND
>
>      &KIND H_COOH
>        ELEMENT H
>        BASIS_SET DZVP-MOLOPT-SR-GTH
>        BASIS_SET AUX_FIT cFIT3
>        POTENTIAL GTH-PBE-q1
>      &END KIND
>
>     &KIND O
>       ELEMENT O
>       BASIS_SET DZVP-MOLOPT-SR-GTH
>       BASIS_SET AUX_FIT cFIT3
>       POTENTIAL GTH-PBE-q6
>     &END KIND
>
>     &KIND N
>        ELEMENT N
>        BASIS_SET DZVP-MOLOPT-SR-GTH
>        BASIS_SET AUX_FIT cFIT3
>        POTENTIAL GTH-PBE-q5
>     &END KIND
>
>     &KIND C
>        ELEMENT C
>        BASIS_SET DZVP-MOLOPT-SR-GTH
>        BASIS_SET AUX_FIT cFIT3
>        POTENTIAL GTH-PBE-q4
>     &END KIND
>
>     &KIND C_COOH
>        ELEMENT C
>        BASIS_SET DZVP-MOLOPT-SR-GTH
>        BASIS_SET AUX_FIT cFIT3
>        POTENTIAL GTH-PBE-q4
>     &END KIND
>
>   &END SUBSYS
>
> Moreover, I am running the calculation on 4 nodes with 52 processors 
> each. The most surprising thing is that increasing the number of OMP 
> threads beyond 1 actually makes the calculation slower. It is already quite 
> slow: a single SCF step takes about 11 minutes. This is my SLURM submission 
> script:
>
> #!/bin/bash
> #SBATCH --job-name=HSE06
> #SBATCH --nodes=4
> #SBATCH --ntasks-per-node=52        
> #SBATCH --partition=taras2-6230r
>
> module purge
> module load cp2k/may2025-gnu14.2.0-openmpi4.1.6-psm211.2.230
>
> # OpenMP settings
> export OMP_NUM_THREADS=1
>
> # Run
> mpirun cp2k.popt -i SLAB+DYE.inp -o SLAB+DYE.out
>
> I was wondering whether it is possible to speed up the calculation. I 
> hope this email is clear.
