How to use a GPU to accelerate calculations
Ole Schütt
o... at schuett.name
Fri Nov 20 12:58:30 UTC 2015
Hi Jianfeng,
try replacing -D__DBCSR_CUDA with -D__DBCSR_ACC. That might just do the
trick.
-Ole
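Ole's suggestion is a one-token change to the DFLAGS line quoted below. Here is a minimal Python sketch of that edit. The helper `replace_define` is hypothetical and not part of CP2K. It swaps only the exact token `-D__DBCSR_CUDA` for `-D__DBCSR_ACC` and leaves every other define, including `-D__ACC`, as it was.

```python
def replace_define(dflags, old="-D__DBCSR_CUDA", new="-D__DBCSR_ACC"):
    """Hypothetical helper: swap one whole -D token, keeping all other flags."""
    result = []
    for token in dflags.split():
        if token == old:
            token = new
        if token not in result:  # do not list the same define twice
            result.append(token)
    return " ".join(result)


# DFLAGS as quoted in the original question.
dflags = (
    "-D__INTEL -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK "
    "-D__FFTW3 -D__FFTMKL -D__LIBINT -D__LIBXC2 -D__ACC -D__CUDAPW "
    "-D__DBCSR_CUDA -D__LIBINT_MAX_AM=6 -D__LIBDERIV_MAX_AM1=5"
)
new_dflags = replace_define(dflags)
print("DFLAGS =", new_dflags)
```

These are preprocessor definitions, so the change only takes effect after CP2K is recompiled.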
On Friday, 20 November 2015 13:04:25 UTC+1, jjf... at yahoo.com.cn wrote:
>
> Dear all,
> I have compiled my CP2K with:
> DFLAGS = -D__INTEL -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK \
>          -D__FFTW3 -D__FFTMKL -D__LIBINT -D__LIBXC2 -D__ACC -D__CUDAPW \
>          -D__DBCSR_CUDA -D__LIBINT_MAX_AM=6 -D__LIBDERIV_MAX_AM1=5
>
> I modified generate.py as follows:
> triples = combinations(23) # blocked H2O (benchmark)
> triples += combinations(6) # idem min basis
> triples += combinations(14,16,29) # RPA water
> triples += combinations(5, 32, 13, 24, 26)
> triples += combinations(9, 32, 22)
> triples += combinations(32)
> triples += combinations(64)
> triples += combinations(78)
> triples += combinations(16,29,55)
> triples += combinations(13,32,13)
> triples += combinations(26,32,13)
> triples += combinations(13,32,26)
> triples += combinations(26,32,26)
> triples += combinations(13,32,9)
> triples += combinations(9,32,13)
> triples += combinations(26,32,9)
> triples += combinations(9,32,26)
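The body of `combinations` is not quoted in the post. The sketch below assumes it returns every (m, n, k) triple whose entries come from its arguments, and models that with `itertools.product`; this stand-in is hypothetical. Under that assumption, the sketch rebuilds the list and checks whether a few block shapes taken from the DBCSR statistics below appear in it.

```python
from itertools import product


def combinations(*sizes):
    """Hypothetical stand-in: every (m, n, k) triple drawn from `sizes`."""
    return list(product(sizes, repeat=3))


# The triples from the modified generate.py, as quoted in the post.
triples = combinations(23)  # blocked H2O (benchmark)
triples += combinations(6)  # idem min basis
triples += combinations(14, 16, 29)  # RPA water
triples += combinations(5, 32, 13, 24, 26)
triples += combinations(9, 32, 22)
triples += combinations(32)
triples += combinations(64)
triples += combinations(78)
triples += combinations(16, 29, 55)
triples += combinations(13, 32, 13)
triples += combinations(26, 32, 13)
triples += combinations(13, 32, 26)
triples += combinations(26, 32, 26)
triples += combinations(13, 32, 9)
triples += combinations(9, 32, 13)
triples += combinations(26, 32, 9)
triples += combinations(9, 32, 26)

# Several calls repeat sizes, so deduplicate before counting.
listed = set(triples)
print(len(triples), "triples,", len(listed), "unique")

# A few block shapes that appear in the DBCSR statistics of the post.
for shape in [(9, 21, 9), (13, 32, 13), (26, 32, 26), (256, 256, 256)]:
    print(shape, "listed" if shape in listed else "not listed")
```

This only shows which shapes the script would list. It says nothing about how DBCSR handles unlisted shapes. Listed shapes such as 13 x 32 x 13 also show 0 in the ACC column of the statistics.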
>
> However, I see no acceleration from my GPU (K20m). The DBCSR
> STATISTICS output was as follows:
>
> COUNTER                                  CPU   ACC   ACC%
> number of processed stacks            388804     0    0.0
> matmuls inhomo. stacks              12328621     0    0.0
> matmuls total                      242005257     0    0.0
> flops 9 x 21 x 9                  1073834496     0    0.0
> flops 224 x 224 x 224             1753350144     0    0.0
> flops 224 x 224 x 245             1917726720     0    0.0
> flops 224 x 245 x 224             1917726720     0    0.0
> flops 245 x 224 x 224             1917726720     0    0.0
> flops 224 x 245 x 245             2097513600     0    0.0
> flops 245 x 224 x 245             2097513600     0    0.0
> flops 245 x 245 x 224             2097513600     0    0.0
> flops 245 x 245 x 245             2294155500     0    0.0
> flops 9 x 9 x 213                 2556480528     0    0.0
> flops 26 x 21 x 9                 3102188544     0    0.0
> flops 9 x 21 x 26                 3102188544     0    0.0
> flops 224 x 224 x 256             4007657472     0    0.0
> flops 224 x 256 x 224             4007657472     0    0.0
> flops 256 x 224 x 224             4007657472     0    0.0
> flops 224 x 245 x 256             4383375360     0    0.0
> flops 224 x 256 x 245             4383375360     0    0.0
> flops 245 x 224 x 256             4383375360     0    0.0
> flops 245 x 256 x 224             4383375360     0    0.0
> flops 256 x 224 x 245             4383375360     0    0.0
> flops 256 x 245 x 224             4383375360     0    0.0
> flops 9 x 21 x 13                 4737435066     0    0.0
> flops 13 x 21 x 9                 4737435066     0    0.0
> flops 245 x 245 x 256             4794316800     0    0.0
> flops 245 x 256 x 245             4794316800     0    0.0
> flops 256 x 245 x 245             4794316800     0    0.0
> flops 26 x 9 x 213                7222105800     0    0.0
> flops 9 x 26 x 213                7247226168     0    0.0
> flops 26 x 21 x 26                8961878016     0    0.0
> flops 256 x 256 x 224             9160359936     0    0.0
> flops 224 x 256 x 256             9160359936     0    0.0
> flops 256 x 224 x 256             9160359936     0    0.0
> flops 9 x 9 x 256                 9217732608     0    0.0
> flops 929 x 224 x 224             9882062848     0    0.0
> flops 256 x 256 x 245            10019143680     0    0.0
> flops 245 x 256 x 256            10019143680     0    0.0
> flops 256 x 245 x 256            10019143680     0    0.0
> flops 929 x 224 x 245            10808506240     0    0.0
> flops 929 x 245 x 224            10808506240     0    0.0
> flops 9 x 13 x 213               11030981598     0    0.0
> flops 13 x 9 x 213               11065522104     0    0.0
> flops 929 x 245 x 245            11821803700     0    0.0
> flops 224 x 224 x 929            11933057024     0    0.0
> flops 224 x 245 x 929            13051781120     0    0.0
> flops 245 x 224 x 929            13051781120     0    0.0
> flops 26 x 21 x 13               13997099844     0    0.0
> flops 13 x 21 x 26               13997099844     0    0.0
> flops 245 x 245 x 929            14275385600     0    0.0
> flops 13 x 21 x 13               14415243024     0    0.0
> flops 256 x 256 x 256            20937965568     0    0.0
> flops 26 x 26 x 213              21335565888     0    0.0
> flops 929 x 224 x 256            22587572224     0    0.0
> flops 929 x 256 x 224            22587572224     0    0.0
> flops 929 x 245 x 256            24705157120     0    0.0
> flops 929 x 256 x 245            24705157120     0    0.0
> flops 26 x 9 x 256               26040268800     0    0.0
> flops 9 x 26 x 256               26130843648     0    0.0
> flops 224 x 256 x 929            27275558912     0    0.0
> flops 256 x 224 x 929            27275558912     0    0.0
> flops 924 x 224 x 224            29486628864     0    0.0
> flops 245 x 256 x 929            29832642560     0    0.0
> flops 256 x 245 x 929            29832642560     0    0.0
> flops 924 x 224 x 245            32251000320     0    0.0
> flops 924 x 245 x 224            32251000320     0    0.0
> flops 13 x 26 x 213              32611122180     0    0.0
> flops 26 x 13 x 213              32674620888     0    0.0
> flops 13 x 13 x 213              33962737536     0    0.0
> flops 924 x 245 x 245            35274531600     0    0.0
> flops 224 x 224 x 924            35606495232     0    0.0
> flops 224 x 245 x 924            38944604160     0    0.0
> flops 245 x 224 x 924            38944604160     0    0.0
> flops 9 x 13 x 256               39773680128     0    0.0
> flops 13 x 9 x 256               39898220544     0    0.0
> flops 245 x 245 x 924            42595660800     0    0.0
> flops 929 x 224 x 929            47943653632     0    0.0
> flops 9 x 32 x 9                 49089576960     0    0.0
> flops 929 x 256 x 256            51628736512     0    0.0
> flops 929 x 245 x 929            52438371160     0    0.0
> flops 256 x 256 x 929            62344134656     0    0.0
> flops 924 x 256 x 224            67398008832     0    0.0
> flops 924 x 224 x 256            67398008832     0    0.0
> flops 924 x 256 x 245            73716572160     0    0.0
> flops 924 x 245 x 256            73716572160     0    0.0
> flops 26 x 26 x 256              76928237568     0    0.0
> flops 224 x 256 x 924            81386274816     0    0.0
> flops 256 x 224 x 924            81386274816     0    0.0
> flops 245 x 256 x 924            89016238080     0    0.0
> flops 256 x 245 x 924            89016238080     0    0.0
> flops 929 x 256 x 929           109585494016     0    0.0
> flops 13 x 26 x 256             117583764480     0    0.0
> flops 26 x 13 x 256             117812717568     0    0.0
> flops 13 x 13 x 256             122457194496     0    0.0
> flops 26 x 32 x 9               141814333440     0    0.0
> flops 9 x 32 x 26               141814333440     0    0.0
> flops 924 x 224 x 929           143056843776     0    0.0
> flops 929 x 224 x 924           143056843776     0    0.0
> flops 924 x 256 x 256           154052591616     0    0.0
> flops 924 x 245 x 929           156468422880     0    0.0
> flops 929 x 245 x 924           156468422880     0    0.0
> flops 256 x 256 x 924           186025771008     0    0.0
> flops 9 x 32 x 13               216568460160     0    0.0
> flops 13 x 32 x 9               216568460160     0    0.0
> flops 924 x 256 x 929           326987071488     0    0.0
> flops 929 x 256 x 924           326987071488     0    0.0
> flops 26 x 32 x 26              409685852160     0    0.0
> flops 924 x 224 x 924           426860679168     0    0.0
> flops 924 x 245 x 924           466878867840     0    0.0
> flops 26 x 32 x 13              639867421440     0    0.0
> flops 13 x 32 x 26              639867421440     0    0.0
> flops 13 x 32 x 13              658982538240     0    0.0
> flops 924 x 256 x 924           975681552384     0    0.0
> flops total                    9705794368534     0    0.0
> marketing flops               10521420653440
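Every entry in the ACC column of these statistics is 0 (ACC% 0.0), so DBCSR processed all stacks on the CPU. The Python sketch below makes that check mechanical for a handful of rows copied from the table. It assumes each row has been joined onto one line (counter, CPU, ACC, ACC%). It also recovers the number of block products per shape, assuming each m x n x k product is counted as 2*m*n*k flops. The rows used here divide evenly by that factor.

```python
import re

# Rows copied from the statistics above, one counter per line.
STATS = """\
number of processed stacks 388804 0 0.0
matmuls total 242005257 0 0.0
flops 9 x 21 x 9 1073834496 0 0.0
flops 256 x 256 x 256 20937965568 0 0.0
flops 13 x 32 x 13 658982538240 0 0.0
flops total 9705794368534 0 0.0
"""

ROW = re.compile(r"^(?P<name>.+?)\s+(?P<cpu>\d+)\s+(?P<acc>\d+)\s+(?P<pct>[\d.]+)$")
SHAPE = re.compile(r"^flops (\d+) x (\d+) x (\d+)$")


def parse(text):
    """Map each counter name to (cpu, acc, acc_percent)."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows[match["name"]] = (int(match["cpu"]), int(match["acc"]), float(match["pct"]))
    return rows


rows = parse(STATS)
acc_total = sum(acc for _, acc, _ in rows.values())
print("work offloaded to ACC:", acc_total)

# Assumption: each m x n x k block product is counted as 2*m*n*k flops.
block_products = {}
for name, (cpu, acc, pct) in rows.items():
    shape = SHAPE.match(name)
    if shape:
        m, n, k = map(int, shape.groups())
        block_products[(m, n, k)] = cpu // (2 * m * n * k)
        print(f"{m} x {n} x {k}: {block_products[(m, n, k)]} block products on the CPU")
```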
>
> My input file:
> &GLOBAL
>   PRINT_LEVEL LOW
>   PROJECT_NAME 1
>   RUN_TYPE ENERGY
> &END GLOBAL
> &MOTION
>   &GEO_OPT
>     OPTIMIZER BFGS
>     STEP_START_VAL 1
>   &END GEO_OPT
> &END MOTION
> &FORCE_EVAL
>   METHOD QS
>   STRESS_TENSOR ANALYTICAL
>   &DFT
>     &SCF
>       MAX_SCF 50
>       EPS_SCF 1.0E-7
>       &OT T
>         MINIMIZER DIIS
>         ENERGY_GAP 0.002
>         ALGORITHM IRAC
>         PRECONDITIONER FULL_ALL
>       &END OT
>       &OUTER_SCF
>         EPS_SCF 1.0E-7
>         MAX_SCF 40
>         STEP_SIZE 0.1
>         EXTRAPOLATION_ORDER 4
>       &END OUTER_SCF
>     &END SCF
>     &QS
>       METHOD GPW
>     &END QS
>     &MGRID
>       CUTOFF 400
>     &END MGRID
>     &XC
>       &XC_FUNCTIONAL NO_SHORTCUT
>         &PBE T
>         &END PBE
>       &END XC_FUNCTIONAL
>     &END XC
>     &POISSON
>       periodic xyz
>       poisson_solver periodic
>     &END POISSON
>   &END DFT
>   &SUBSYS
>     &CELL
>       A 16.707 0.00 0.00
>       B 0.00 15.580 0.00
>       C 0.00 0.00 30.
>       PERIODIC XYZ
>       MULTIPLE_UNIT_CELL 1 1 1
>     &END CELL
>     &COORD
>     &END COORD
>
> Can anyone help?
>
>
> Jianfeng Jia
More information about the CP2K-user mailing list