[CP2K-user] [CP2K:22240] Re: Poor performance of hybrid functional calculation
Johann Pototschnig
pototschnig.johann at gmail.com
Tue May 19 11:44:32 UTC 2026
If the calculation gets slower with more OMP threads, the threads are most
likely all running on the same CPU core.
You can add "--bind-to none" to the mpirun command, or specify an explicit binding.
Since you have 52 CPUs per node, you can run 13 MPI processes per node with 4 threads each.
Depending on the method, this can be faster than 52 MPI processes.
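
For illustration, the 13 x 4 layout could look like the submission script below. This is only a sketch adapted from your script, under several assumptions: that the module also provides the hybrid MPI+OpenMP binary under the usual name cp2k.psmp (cp2k.popt is the MPI-only flavor in the usual CP2K naming), that each node really exposes 52 physical cores to SLURM, and that the "--map-by ppr:13:node:PE=4" mapping is accepted by your Open MPI build. The "--report-bindings" output should be checked to confirm that every rank gets its own 4 cores.

```shell
#!/bin/bash
#SBATCH --job-name=HSE06
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=13      # 13 MPI ranks per node ...
#SBATCH --cpus-per-task=4         # ... with 4 OpenMP threads each = 52 cores
#SBATCH --partition=taras2-6230r

module purge
module load cp2k/may2025-gnu14.2.0-openmpi4.1.6-psm211.2.230

# OpenMP settings: one thread per reserved core, kept close to the parent rank
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PLACES=cores
export OMP_PROC_BIND=close

# Explicit binding: 13 ranks per node, 4 processing elements (cores) per rank.
# Assumption: cp2k.psmp is the hybrid binary name in this installation.
# The simpler alternative is to replace the mapping options with "--bind-to none".
mpirun --map-by ppr:13:node:PE=4 --bind-to core --report-bindings \
    cp2k.psmp -i SLAB+DYE.inp -o SLAB+DYE.out
```

Other splits of the 52 cores (for example 26 ranks with 2 threads) follow the same pattern; which one is fastest has to be measured.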
At the end of the output file, timing statistics are provided (showing which
routine/library used how much time).
Can you provide either the output file or this information? It should show
which library/routine is costly.
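
If the full output is too large to post, the timing part alone can be cut out with something like the helper below. It is a small sketch that assumes the report starts at the banner line containing "T I M I N G", as in the CP2K outputs I have seen; show_timing is just a name chosen here, not a CP2K tool.

```shell
# Print everything from the timing banner to the end of a CP2K output file.
# Assumption: the timing report begins at the line containing "T I M I N G".
show_timing() {
    awk '/T I M I N G/ { found = 1 } found' "$1"
}
```

It would be called as: show_timing SLAB+DYE.out > timing.txt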
best
On Tuesday, May 19, 2026 at 11:46:57 AM UTC+2 Lorenzo Lagasco wrote:
> Good morning everyone,
> I'm running an HSE06 calculation on a large 395-atom system to compute the
> PDOS and the inter-electronic coupling using the Kondov diabatization
> scheme, and I'm trying to speed up the calculation. The convergence
> criteria for the SCF are not particularly strict (as you can check
> from the input copied here).
> &GLOBAL
> PROJECT CuAlO2-HSE06
> RUN_TYPE ENERGY
> PRINT_LEVEL LOW
> &END GLOBAL
> &FORCE_EVAL
>
> METHOD QS
> &DFT
> WFN_RESTART_FILE_NAME /auto/tms7/llagasco/DELAFOSSITI-proj/INTERFACE/DOS-interface-HSE06/1.PBE/CuAlO2-PBE-RESTART.wfn
> LSD
> MULTIPLICITY 2
> BASIS_SET_FILE_NAME ./BASIS_MOLOPT
> BASIS_SET_FILE_NAME ./BASIS_ADMM
> BASIS_SET_FILE_NAME ./BASIS_ADMM_MOLOPT
> POTENTIAL_FILE_NAME ./POTENTIAL
> &MGRID
> NGRIDS 5
> CUTOFF 500
> REL_CUTOFF 50
> &END MGRID
> &AUXILIARY_DENSITY_MATRIX_METHOD
> METHOD BASIS_PROJECTION
> ADMM_PURIFICATION_METHOD NONE
> &END
> &QS
> EPS_DEFAULT 1.0E-10
> EPS_PGF_ORB 1.0E-6
> EXTRAPOLATION PS
> EXTRAPOLATION_ORDER 3
> #&DDAPC_RESTRAINT
> # STRENGTH 1.0
> # TARGET -1.0
> # ATOMS 43
> # TYPE_OF_DENSITY SPIN
> # FUNCTIONAL_FORM RESTRAINT
> #&END
> &END QS
> &SCF
> EPS_SCF 1.0E-6
> #SCF_GUESS ATOMIC
> SCF_GUESS RESTART
> MAX_SCF 50
> &OUTER_SCF
> EPS_SCF 1.0E-6
> MAX_SCF 50
> &END
> &OT ON
> MINIMIZER DIIS
> PRECONDITIONER FULL_SINGLE_INVERSE
> ENERGY_GAP 0.25
> LINESEARCH 2PNT
> &END OT
> &END SCF
> &XC
> &XC_FUNCTIONAL
> &PBE
> SCALE_X 0.88
> SCALE_C 1.0
> &END PBE
> &PBE_HOLE_T_C_LR
> SCALE_X 0.12 ! + 25% of truncated PBE0 functional - that includes exact hfx
> CUTOFF_RADIUS 4.0 ! that has interaction truncated at 4.0 A from the atomic core
> &END
> &END XC_FUNCTIONAL
> &HF
> &SCREENING
> EPS_SCHWARZ 1.0E-6
> SCREEN_ON_INITIAL_P TRUE
> &END
> &INTERACTION_POTENTIAL
> POTENTIAL_TYPE TRUNCATED
> CUTOFF_RADIUS 4.0
> T_C_G_DATA /soft_rocky8/prod/cp2k/12may_2025/gnu14.2.0-openmpi4.1.6/cp2k/data/t_c_g.dat
> &END
> &MEMORY
> MAX_MEMORY 2000
> EPS_STORAGE_SCALING 0.1
> &END
> &END
> FRACTION 0.12
> &END
> &END XC
> &DENSITY_FITTING
> &END
> &PRINT
> &PDOS
> NLUMO 100
> COMPONENTS .TRUE.
> &END PDOS
> &END PRINT
> &END DFT
> &SUBSYS
> &CELL
> A 1.7166299819900004E+01 0.0000000000000000E+00 0.0000000000000000E+00
> B 0.0000000000000000E+00 1.7678800582899999E+01 0.0000000000000000E+00
> C 0.0000000000000000E+00 0.0000000000000000E+00 3.3554000854500003E+01
> PERIODIC XYZ
> MULTIPLE_UNIT_CELL 1 1 1
> &END CELL
> &TOPOLOGY
> COORD_FILE_NAME final-geom-opt.xyz
> COORD_FILE_FORMAT XYZ
> &END TOPOLOGY
>
> &KIND Cu
> ELEMENT Cu
> BASIS_SET DZVP-MOLOPT-SR-GTH
> BASIS_SET AUX_FIT cFIT10
> POTENTIAL GTH-PBE-q11
> &BS T
> &ALPHA
> NEL 2 2
> L 2 0
> N 3 4
> &END ALPHA
> &BETA
> NEL -2 -2
> L 2 0
> N 3 4
> &END BETA
> &END BS
> &END KIND
>
> &KIND Al
> ELEMENT Al
> BASIS_SET DZVP-MOLOPT-SR-GTH
> BASIS_SET AUX_FIT cFIT3
> POTENTIAL GTH-PBE-q3
> &END KIND
>
> &KIND O_D
> ELEMENT O
> BASIS_SET DZVP-MOLOPT-SR-GTH
> BASIS_SET AUX_FIT cFIT3
> POTENTIAL GTH-PBE-q6
> &END KIND
> &KIND O_COOH
> ELEMENT O
> BASIS_SET DZVP-MOLOPT-SR-GTH
> BASIS_SET AUX_FIT cFIT3
> POTENTIAL GTH-PBE-q6
> &END KIND
>
> &KIND H
> ELEMENT H
> BASIS_SET DZVP-MOLOPT-SR-GTH
> BASIS_SET AUX_FIT cFIT3
> POTENTIAL GTH-PBE-q1
> &END KIND
>
> &KIND H_COOH
> ELEMENT H
> BASIS_SET DZVP-MOLOPT-SR-GTH
> BASIS_SET AUX_FIT cFIT3
> POTENTIAL GTH-PBE-q1
> &END KIND
>
> &KIND O
> ELEMENT O
> BASIS_SET DZVP-MOLOPT-SR-GTH
> BASIS_SET AUX_FIT cFIT3
> POTENTIAL GTH-PBE-q6
> &END KIND
>
> &KIND N
> ELEMENT N
> BASIS_SET DZVP-MOLOPT-SR-GTH
> BASIS_SET AUX_FIT cFIT3
> POTENTIAL GTH-PBE-q5
> &END KIND
>
> &KIND C
> ELEMENT C
> BASIS_SET DZVP-MOLOPT-SR-GTH
> BASIS_SET AUX_FIT cFIT3
> POTENTIAL GTH-PBE-q4
> &END KIND
>
> &KIND C_COOH
> ELEMENT C
> BASIS_SET DZVP-MOLOPT-SR-GTH
> BASIS_SET AUX_FIT cFIT3
> POTENTIAL GTH-PBE-q4
> &END KIND
>
> &END SUBSYS
>
> Moreover, I am running the calculation on 4 nodes with 52 processors
> each. The most surprising thing is that increasing the number of OMP
> threads beyond 1 actually makes the calculation slower. It is already quite
> slow: a single SCF step takes about 11 minutes. This is my SLURM submission
> script:
>
> #!/bin/bash
> #SBATCH --job-name=HSE06
> #SBATCH --nodes=4
> #SBATCH --ntasks-per-node=52
> #SBATCH --partition=taras2-6230r
>
> module purge
> module load cp2k/may2025-gnu14.2.0-openmpi4.1.6-psm211.2.230
>
> # OpenMP settings
> export OMP_NUM_THREADS=1
>
> # Run
> mpirun cp2k.popt -i SLAB+DYE.inp -o SLAB+DYE.out
>
> I was wondering if it is possible to increase the calculation speed. I
> hope this email is clear.