[CP2K:4984] FFT, CUDA, and Parallelization
Ole Schütt
o... at schuett.name
Tue Feb 25 11:36:34 UTC 2014
Hi,
> I am curious what the current CUDA support is (i.e. for the
> development trunk and future development).
I added an FAQ for this: http://cp2k.org/faq:cuda_support
> However the poor cards are quickly overwhelmed for memory as DBCSR
> loads (Intel MPI is being used and
> each process takes a chunk),
This may well be the case. The CUDA part of DBCSR was developed for large
machines like "Piz Daint" at CSCS.ch, which has a GPU/CPU ratio of 1:1.
As a consequence, we implemented, for example, a double buffering scheme,
which trades memory for lower network latency.
If you have a different GPU/CPU ratio (for example, several MPI processes
sharing one card), the GPU memory will quickly become the limiting resource.
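
To make the memory argument concrete, here is a back-of-the-envelope
sketch in Python. It assumes that every MPI process sharing a card
allocates its own device buffers and that double buffering means two
copies of each buffer. The function names, the 1024 MiB buffer size,
and the 6144 MiB card size are hypothetical illustrations, not values
taken from DBCSR.

```python
# Rough estimate only. Assumption: each MPI rank allocates its own
# device buffers, and double buffering keeps two copies per rank.

def gpu_memory_per_card(ranks_per_gpu, buffer_mib, n_buffers=2):
    """Device memory (MiB) claimed on one card by all ranks sharing it."""
    return ranks_per_gpu * n_buffers * buffer_mib


def max_ranks_per_card(card_mib, buffer_mib, n_buffers=2):
    """Largest number of ranks whose buffers still fit on one card."""
    return card_mib // (n_buffers * buffer_mib)


if __name__ == "__main__":
    card_mib = 6144     # hypothetical card size
    buffer_mib = 1024   # hypothetical per-rank buffer size
    for ranks in (1, 4, 8):
        need = gpu_memory_per_card(ranks, buffer_mib)
        status = "fits" if need <= card_mib else "exceeds the card"
        print(f"{ranks} rank(s) per GPU: {need} MiB, {status}")
    print("max ranks per card:", max_ranks_per_card(card_mib, buffer_mib))
```

With a 1:1 ratio the doubled buffer fits easily under these numbers,
while eight ranks on one card would ask for several times the
available memory.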
> but the Global/CUDA/Memory flag was removed after revision 13531.
Right, the memory consumption now simply depends on the problem size.
> Strangely, running the executable compiled without the -D__DBCSR_CUDA
> flag results in the same error.
This is indeed strange. Did you run "make distclean" before recompiling?
> On a somewhat related note, the computer in question is not in a
> cluster. Is OpenMP thus the optimal method for running in parallel
> (versus MPI)?
I added an FAQ for this: http://cp2k.org/faq:mpi_vs_openmp
-Ole
--
Ole Schütt
ole.s... at mat.ethz.ch
Nanoscale Simulations
www.nanosim.mat.ethz.ch
ETH Zürich
HIT G 31.3
Wolfgang-Pauli-Strasse 27
CH-8093 Zürich
Phone +41 44 633 81 52
Fax +41 44 633 14 59