cuda_tools in CP2K

Wei wei.a... at
Thu Sep 8 16:03:42 UTC 2011

Dear Urban,

Thanks a lot for the reply!

>the GPU/Cuda support is still in early development and as such has quite
>a few limitations, performance bottlenecks and possibly bugs--I have not
>thoroughly tested it.

Oh, I thought it was quite well developed, especially the DBCSR
version, since I saw there is already a lot of CUDA code in src/.

> Regarding the out-of-memory: I believe the (__CUDAPW & __FFTCU &
> __FFTSGL) options are currently incompatible with the __DBCSR_CUDA
> option (this is due to different approaches to memory allocation on the
> card).  You will probably have to choose one or the other.

These options are the DFLAGS in the ARCH file, right?

DFLAGS   = -D__INTEL -D__FFTSG  -D__parallel -D__SCALAPACK -D__BLACS -

It seems that I didn't use these options. When you mentioned
__FFTSGL, did you mean -D__FFTSG?

If I delete "-D__FFTSG", there is no FFT library. Compilation is
fine, but the run fails with the error " *** FFTSG not
functional.... *** ".
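
As a sketch of what "one or the other" could look like in the ARCH
file: this assumes each option is passed with -D like the existing
flags, and that __FFTSGL is a separate option from __FFTSG (both are
assumptions, not confirmed here). Other flags and any CUDA library
settings are left out.

# Variant A (assumed): DBCSR on the GPU only
DFLAGS   = -D__FFTSG -D__parallel -D__DBCSR_CUDA

# Variant B (assumed): plane-wave/FFT parts on the GPU only
DFLAGS   = -D__FFTSG -D__parallel -D__CUDAPW -D__FFTCU -D__FFTSGL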

> Regarding parallelism (__DBCSR_CUDA) on a node (i.e., computer):
> Support for 1 process/multiple threads will be forthcoming and then
> supporting multiple GPUs in a box, each controlled by one MPI
> process--these two developments should solve your problem.

Yes, it would be great to have such a version. When will it be
available? In several weeks, or months?

Also, do you plan to make it parallel over multiple nodes, like the
normal "popt" case? Our calculations usually take 96 MPI processes or
more (at least 64), so I don't know whether parallelism within one
node would help.


