[CP2K:7187] Re: How to use a GPU to accelerate calculations
贾建峰 (Jianfeng Jia)
jjf_... at 163.com
Sat Nov 21 09:19:58 UTC 2015
Hi Ole,
That did the trick. It works fine now!
Thanks.
Jianfeng
At 2015-11-20 20:58:30, "Ole Schütt" <o... at schuett.name> wrote:
Hi Jianfeng,
try replacing -D__DBCSR_CUDA with -D__DBCSR_ACC. That might just do the trick.
-Ole
On Friday, 20 November 2015 13:04:25 UTC+1, jjf... at yahoo.com.cn wrote:
Dear all,
I have compiled my CP2K with:
DFLAGS = -D__INTEL -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3 -D__FFTMKL -D__LIBINT -D__LIBXC2 -D__ACC -D__CUDAPW -D__DBCSR_CUDA -D__LIBINT_MAX_AM=6 -D__LIBDERIV_MAX_AM1=5
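The change Ole suggests in his reply can be applied to this DFLAGS line mechanically. The sketch below uses a hypothetical helper (`use_dbcsr_acc` is not part of CP2K) that swaps the macro in a whitespace-separated flag string; it only edits text and does not check whether the resulting flag set is valid for a particular CP2K version.

```python
def use_dbcsr_acc(dflags: str) -> str:
    """Hypothetical helper: replace -D__DBCSR_CUDA with -D__DBCSR_ACC (per Ole's reply)."""
    flags = dflags.split()
    # Swap only the one macro; every other flag keeps its position.
    flags = ["-D__DBCSR_ACC" if f == "-D__DBCSR_CUDA" else f for f in flags]
    return " ".join(flags)


dflags = ("-D__INTEL -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3 "
          "-D__FFTMKL -D__LIBINT -D__LIBXC2 -D__ACC -D__CUDAPW -D__DBCSR_CUDA "
          "-D__LIBINT_MAX_AM=6 -D__LIBDERIV_MAX_AM1=5")
print("DFLAGS = " + use_dbcsr_acc(dflags))
```

The printed line can be pasted back into the arch file; since DFLAGS are preprocessor macros, CP2K has to be recompiled for the change to take effect.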
I have modified generate.py as follows:
triples = combinations(23) # blocked H2O (benchmark)
triples += combinations(6) # idem min basis
triples += combinations(14,16,29) # RPA water
triples += combinations(5, 32, 13, 24, 26)
triples += combinations(9, 32, 22)
triples += combinations(32)
triples += combinations(64)
triples += combinations(78)
triples += combinations(16,29,55)
triples += combinations(13,32,13)
triples += combinations(26,32,13)
triples += combinations(13,32,26)
triples += combinations(26,32,26)
triples += combinations(13,32,9)
triples += combinations(9,32,13)
triples += combinations(26,32,9)
triples += combinations(9,32,26)
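To check which block sizes the modified generate.py covers, the triples list can be reproduced outside the script. This sketch assumes `combinations(*sizes)` returns every (m, n, k) triple whose entries are drawn from its arguments (a Cartesian product, as with `itertools.product`); the real definition in generate.py may differ, and `has_kernel` is a hypothetical helper.

```python
from itertools import product


def combinations(*sizes):
    # Assumption: every (m, n, k) triple drawn from `sizes`; duplicates are
    # possible when a size is repeated, e.g. combinations(13, 32, 13).
    return list(product(sizes, repeat=3))


triples = combinations(23)               # blocked H2O (benchmark)
triples += combinations(6)               # idem min basis
triples += combinations(14, 16, 29)      # RPA water
triples += combinations(5, 32, 13, 24, 26)
triples += combinations(9, 32, 22)
triples += combinations(32)
triples += combinations(64)
triples += combinations(78)
triples += combinations(16, 29, 55)
triples += combinations(13, 32, 13)
triples += combinations(26, 32, 13)
triples += combinations(13, 32, 26)
triples += combinations(26, 32, 26)
triples += combinations(13, 32, 9)
triples += combinations(9, 32, 13)
triples += combinations(26, 32, 9)
triples += combinations(9, 32, 26)

kernel_set = set(triples)  # drop duplicate triples


def has_kernel(m, n, k):
    """Hypothetical helper: is (m, n, k) among the generated kernel sizes?"""
    return (m, n, k) in kernel_set


for mnk in [(13, 32, 13), (9, 32, 9), (9, 21, 9), (224, 224, 224)]:
    print(mnk, has_kernel(*mnk))
```

Under that assumption, sizes such as 13 x 32 x 13 and 9 x 32 x 9 from the statistics below are in the list, while 9 x 21 x 9 and the large 224/245/256/924/929 blocks are not.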
However, I see no acceleration from my GPU (K20m). The DBCSR STATISTICS output was as follows:
COUNTER CPU ACC ACC%
number of processed stacks 388804 0 0.0
matmuls inhomo. stacks 12328621 0 0.0
matmuls total 242005257 0 0.0
flops 9 x 21 x 9 1073834496 0 0.0
flops 224 x 224 x 224 1753350144 0 0.0
flops 224 x 224 x 245 1917726720 0 0.0
flops 224 x 245 x 224 1917726720 0 0.0
flops 245 x 224 x 224 1917726720 0 0.0
flops 224 x 245 x 245 2097513600 0 0.0
flops 245 x 224 x 245 2097513600 0 0.0
flops 245 x 245 x 224 2097513600 0 0.0
flops 245 x 245 x 245 2294155500 0 0.0
flops 9 x 9 x 213 2556480528 0 0.0
flops 26 x 21 x 9 3102188544 0 0.0
flops 9 x 21 x 26 3102188544 0 0.0
flops 224 x 224 x 256 4007657472 0 0.0
flops 224 x 256 x 224 4007657472 0 0.0
flops 256 x 224 x 224 4007657472 0 0.0
flops 224 x 245 x 256 4383375360 0 0.0
flops 224 x 256 x 245 4383375360 0 0.0
flops 245 x 224 x 256 4383375360 0 0.0
flops 245 x 256 x 224 4383375360 0 0.0
flops 256 x 224 x 245 4383375360 0 0.0
flops 256 x 245 x 224 4383375360 0 0.0
flops 9 x 21 x 13 4737435066 0 0.0
flops 13 x 21 x 9 4737435066 0 0.0
flops 245 x 245 x 256 4794316800 0 0.0
flops 245 x 256 x 245 4794316800 0 0.0
flops 256 x 245 x 245 4794316800 0 0.0
flops 26 x 9 x 213 7222105800 0 0.0
flops 9 x 26 x 213 7247226168 0 0.0
flops 26 x 21 x 26 8961878016 0 0.0
flops 256 x 256 x 224 9160359936 0 0.0
flops 224 x 256 x 256 9160359936 0 0.0
flops 256 x 224 x 256 9160359936 0 0.0
flops 9 x 9 x 256 9217732608 0 0.0
flops 929 x 224 x 224 9882062848 0 0.0
flops 256 x 256 x 245 10019143680 0 0.0
flops 245 x 256 x 256 10019143680 0 0.0
flops 256 x 245 x 256 10019143680 0 0.0
flops 929 x 224 x 245 10808506240 0 0.0
flops 929 x 245 x 224 10808506240 0 0.0
flops 9 x 13 x 213 11030981598 0 0.0
flops 13 x 9 x 213 11065522104 0 0.0
flops 929 x 245 x 245 11821803700 0 0.0
flops 224 x 224 x 929 11933057024 0 0.0
flops 224 x 245 x 929 13051781120 0 0.0
flops 245 x 224 x 929 13051781120 0 0.0
flops 26 x 21 x 13 13997099844 0 0.0
flops 13 x 21 x 26 13997099844 0 0.0
flops 245 x 245 x 929 14275385600 0 0.0
flops 13 x 21 x 13 14415243024 0 0.0
flops 256 x 256 x 256 20937965568 0 0.0
flops 26 x 26 x 213 21335565888 0 0.0
flops 929 x 224 x 256 22587572224 0 0.0
flops 929 x 256 x 224 22587572224 0 0.0
flops 929 x 245 x 256 24705157120 0 0.0
flops 929 x 256 x 245 24705157120 0 0.0
flops 26 x 9 x 256 26040268800 0 0.0
flops 9 x 26 x 256 26130843648 0 0.0
flops 224 x 256 x 929 27275558912 0 0.0
flops 256 x 224 x 929 27275558912 0 0.0
flops 924 x 224 x 224 29486628864 0 0.0
flops 245 x 256 x 929 29832642560 0 0.0
flops 256 x 245 x 929 29832642560 0 0.0
flops 924 x 224 x 245 32251000320 0 0.0
flops 924 x 245 x 224 32251000320 0 0.0
flops 13 x 26 x 213 32611122180 0 0.0
flops 26 x 13 x 213 32674620888 0 0.0
flops 13 x 13 x 213 33962737536 0 0.0
flops 924 x 245 x 245 35274531600 0 0.0
flops 224 x 224 x 924 35606495232 0 0.0
flops 224 x 245 x 924 38944604160 0 0.0
flops 245 x 224 x 924 38944604160 0 0.0
flops 9 x 13 x 256 39773680128 0 0.0
flops 13 x 9 x 256 39898220544 0 0.0
flops 245 x 245 x 924 42595660800 0 0.0
flops 929 x 224 x 929 47943653632 0 0.0
flops 9 x 32 x 9 49089576960 0 0.0
flops 929 x 256 x 256 51628736512 0 0.0
flops 929 x 245 x 929 52438371160 0 0.0
flops 256 x 256 x 929 62344134656 0 0.0
flops 924 x 256 x 224 67398008832 0 0.0
flops 924 x 224 x 256 67398008832 0 0.0
flops 924 x 256 x 245 73716572160 0 0.0
flops 924 x 245 x 256 73716572160 0 0.0
flops 26 x 26 x 256 76928237568 0 0.0
flops 224 x 256 x 924 81386274816 0 0.0
flops 256 x 224 x 924 81386274816 0 0.0
flops 245 x 256 x 924 89016238080 0 0.0
flops 256 x 245 x 924 89016238080 0 0.0
flops 929 x 256 x 929 109585494016 0 0.0
flops 13 x 26 x 256 117583764480 0 0.0
flops 26 x 13 x 256 117812717568 0 0.0
flops 13 x 13 x 256 122457194496 0 0.0
flops 26 x 32 x 9 141814333440 0 0.0
flops 9 x 32 x 26 141814333440 0 0.0
flops 924 x 224 x 929 143056843776 0 0.0
flops 929 x 224 x 924 143056843776 0 0.0
flops 924 x 256 x 256 154052591616 0 0.0
flops 924 x 245 x 929 156468422880 0 0.0
flops 929 x 245 x 924 156468422880 0 0.0
flops 256 x 256 x 924 186025771008 0 0.0
flops 9 x 32 x 13 216568460160 0 0.0
flops 13 x 32 x 9 216568460160 0 0.0
flops 924 x 256 x 929 326987071488 0 0.0
flops 929 x 256 x 924 326987071488 0 0.0
flops 26 x 32 x 26 409685852160 0 0.0
flops 924 x 224 x 924 426860679168 0 0.0
flops 924 x 245 x 924 466878867840 0 0.0
flops 26 x 32 x 13 639867421440 0 0.0
flops 13 x 32 x 26 639867421440 0 0.0
flops 13 x 32 x 13 658982538240 0 0.0
flops 924 x 256 x 924 975681552384 0 0.0
flops total 9705794368534 0 0.0
marketing flops 10521420653440
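The ACC column is what shows whether DBCSR offloaded anything: every row above reports 0 accelerator flops. A short parser makes this check, and the implied number of block multiplications, easy to repeat on a new output. It is a sketch that assumes the whitespace-separated row layout shown above and assumes each m x n x k block product is counted as 2*m*n*k flops; the function names are hypothetical.

```python
import re

# One "flops m x n x k  CPU  ACC  ACC%" row of the DBCSR STATISTICS table.
ROW = re.compile(r"^flops\s+(\d+)\s+x\s+(\d+)\s+x\s+(\d+)\s+(\d+)\s+(\d+)\s+[\d.]+$")


def parse_flops_rows(text):
    """Return (m, n, k, cpu_flops, acc_flops) tuples; 'flops total' is skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(tuple(int(g) for g in match.groups()))
    return rows


def block_multiplications(m, n, k, flops):
    """Assumption: each m x n x k block product is counted as 2*m*n*k flops."""
    return flops // (2 * m * n * k)


def acc_share(rows):
    """Percentage of all counted flops that ran on the accelerator."""
    cpu = sum(r[3] for r in rows)
    acc = sum(r[4] for r in rows)
    return 100.0 * acc / (cpu + acc) if cpu + acc else 0.0


sample = """\
flops 9 x 21 x 9 1073834496 0 0.0
flops 224 x 224 x 224 1753350144 0 0.0
flops 13 x 32 x 13 658982538240 0 0.0
flops total 9705794368534 0 0.0
"""

rows = parse_flops_rows(sample)
for m, n, k, cpu, acc in rows:
    count = block_multiplications(m, n, k, cpu + acc)
    print(f"{m} x {n} x {k}: {count} block products, ACC flops = {acc}")
print(f"ACC share: {acc_share(rows):.1f}%")
```

On the sample rows it reports 315648 block products for 9 x 21 x 9, 78 for 224 x 224 x 224, and an ACC share of 0.0%.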
My input file:
&GLOBAL
  PRINT_LEVEL LOW
  PROJECT_NAME 1
  RUN_TYPE ENERGY
&END GLOBAL
&MOTION
  &GEO_OPT
    OPTIMIZER BFGS
    STEP_START_VAL 1
  &END GEO_OPT
&END MOTION
&FORCE_EVAL
  METHOD QS
  STRESS_TENSOR ANALYTICAL
  &DFT
    &SCF
      MAX_SCF 50
      EPS_SCF 1.0E-7
      &OT T
        MINIMIZER DIIS
        ENERGY_GAP 0.002
        ALGORITHM IRAC
        PRECONDITIONER FULL_ALL
      &END OT
      &OUTER_SCF
        EPS_SCF 1.0E-7
        MAX_SCF 40
        STEP_SIZE 0.1
        EXTRAPOLATION_ORDER 4
      &END OUTER_SCF
    &END SCF
    &QS
      METHOD GPW
    &END QS
    &MGRID
      CUTOFF 400
    &END MGRID
    &XC
      &XC_FUNCTIONAL NO_SHORTCUT
        &PBE T
        &END PBE
      &END XC_FUNCTIONAL
    &END XC
    &POISSON
      periodic xyz
      poisson_solver periodic
    &END POISSON
  &END DFT
  &SUBSYS
    &CELL
      A 16.707 0.00 0.00
      B 0.00 15.580 0.00
      C 0.00 0.00 30.
      PERIODIC XYZ
      MULTIPLE_UNIT_CELL 1 1 1
    &END CELL
    &COORD
    &END COORD
Can anyone help?
Jianfeng Jia
More information about the CP2K-user mailing list