[CP2K:4984] FFT, CUDA, and Parallelization

Ole Schütt o... at schuett.name
Tue Feb 25 11:36:34 UTC 2014


> I am curious what the current CUDA support is (i.e. for the
> development trunk and future development).

I added an FAQ for this: http://cp2k.org/faq:cuda_support

> However the poor cards are quickly overwhelmed for memory as DBCSR 
> loads (Intel MPI is being used and
> each process takes a chunk),

This may well be. The CUDA part of DBCSR was developed for large
machines like "Piz Daint" at CSCS.ch, which has a GPU/CPU ratio of 1:1.
As a consequence, we implemented, e.g., a double-buffering scheme, which
trades memory for lower network latency.
If you have a different GPU/CPU ratio, e.g. several MPI processes sharing
one GPU, the GPU memory will quickly become the limiting resource.
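A back-of-the-envelope model shows why memory runs out when the ratio is not 1:1. This is a minimal, hypothetical sketch, not DBCSR's actual allocation logic. It assumes that every MPI rank sharing a GPU allocates its own device buffers and that double buffering simply doubles that per-rank allocation. The function names and sizes are illustrative only.

```python
GIB = 2**30  # bytes per GiB


def per_rank_bytes(buffer_bytes: int, buffers_per_rank: int = 2) -> int:
    """Device memory one rank needs (hypothetical model).

    With double buffering (buffers_per_rank=2), one buffer is worked on
    while the other receives the next chunk of data, so the allocation doubles.
    """
    return buffer_bytes * buffers_per_rank


def max_ranks_per_gpu(gpu_bytes: int, buffer_bytes: int, buffers_per_rank: int = 2) -> int:
    """How many ranks can share one GPU before its memory is exhausted."""
    # Each rank sharing the GPU brings its own buffers, so demand grows
    # linearly with the number of ranks per GPU.
    return gpu_bytes // per_rank_bytes(buffer_bytes, buffers_per_rank)


if __name__ == "__main__":
    gpu = 6 * GIB            # illustrative card size
    buf = 3 * GIB // 2       # illustrative 1.5 GiB buffer per rank
    print(max_ranks_per_gpu(gpu, buf))                      # with double buffering
    print(max_ranks_per_gpu(gpu, buf, buffers_per_rank=1))  # single buffer
```

Under these assumptions, a 1:1 layout uses well under the card's capacity. Once several ranks share the GPU, the doubled per-rank footprint becomes the limit after only a few processes.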

> but the Global/CUDA/Memory flag was removed after revision 13531.

Right, the memory consumption now simply depends on the problem size.

> Strangely, running the executable compiled without the -D__DBCSR_CUDA 
> flag results in the same error.

This is indeed strange. Did you run "make distclean" before recompiling?

> On a somewhat related note, the computer in question is not in a
> cluster. Is OpenMP thus the optimal method for running in parallel
> (versus MPI)?

I added an FAQ for this: http://cp2k.org/faq:mpi_vs_openmp


Ole Schütt
ole.s... at mat.ethz.ch

Nanoscale Simulations

ETH Zürich
HIT G 31.3
Wolfgang-Pauli-Strasse 27
CH-8093 Zürich

Phone +41 44 633 81 52
Fax   +41 44 633 14 59

More information about the CP2K-user mailing list