[CP2K:3489] Re: cuda_tools in CP2K
urban.b... at gmail.com
Fri Sep 9 23:49:36 CEST 2011
On Thu, 2011-09-08 at 09:03 -0700, Wei wrote:
> > Regarding the out-of-memory error: I believe the (__CUDAPW & __FFTCU &
> > __FFTSGL) options are currently incompatible with the __DBCSR_CUDA
> > option (this is due to different approaches to memory allocation on the
> > card). You will probably have to choose one or the other.
> By these options you mean the DFLAGS in the ARCH file, right?
> DFLAGS = -D__INTEL -D__FFTSG -D__parallel -D__SCALAPACK -D__BLACS -
> It seems that I didn't use these options. When you mentioned
> __FFTSGL, did you mean -D__FFTSG?
> If I delete "-D__FFTSG", there is no FFT library. It compiles
> fine, but it won't run because of the error " ***
> FFTSG not functional.... *** ".
No, FFTSG should not be turned off; it has nothing to do with CUDA.
Just do not use CUDAPW and FFTCU (and do not use FFTSGL, which means to
use SinGLe precision); see section 2h of the INSTALL file. CUDAPW and
FFTCU reserve a big block of memory on the GPU for the entire run,
leaving little room for other uses. However, this size is settable via
the input file, so you might be able to get all three options to work.
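The compatibility rule above can be expressed as a small check on an arch file's DFLAGS string. This is a hypothetical Python helper: `parse_dflags`, `conflicting_flags` and the `INCOMPATIBLE_WITH_DBCSR_CUDA` set are not part of CP2K, and they only encode what this thread says about the 2011-era flag names, not the full option list in the INSTALL file.

```python
# Hypothetical helper (not part of CP2K): flags this thread describes as
# incompatible with -D__DBCSR_CUDA, because they allocate GPU memory differently.
INCOMPATIBLE_WITH_DBCSR_CUDA = {"__CUDAPW", "__FFTCU", "__FFTSGL"}


def parse_dflags(dflags: str) -> set[str]:
    """Return the macro names defined with -D in a DFLAGS string."""
    macros = set()
    for token in dflags.split():
        if token.startswith("-D") and len(token) > 2:
            # Strip an optional "=value" suffix, e.g. -D__FOO=1 -> __FOO.
            macros.add(token[2:].split("=", 1)[0])
    return macros


def conflicting_flags(dflags: str) -> list[str]:
    """Return the sorted macros that clash with __DBCSR_CUDA (empty if none).

    Matching is exact, so __FFTSG (unrelated to CUDA, per this thread)
    is never confused with __FFTSGL (single precision).
    """
    macros = parse_dflags(dflags)
    if "__DBCSR_CUDA" not in macros:
        return []
    return sorted(macros & INCOMPATIBLE_WITH_DBCSR_CUDA)


if __name__ == "__main__":
    print(conflicting_flags("-D__INTEL -D__FFTSG -D__parallel -D__DBCSR_CUDA"))
    print(conflicting_flags("-D__FFTSG -D__DBCSR_CUDA -D__CUDAPW -D__FFTCU"))
```

Run as a script, this prints `[]` for the first string (keeping -D__FFTSG is fine) and `['__CUDAPW', '__FFTCU']` for the second.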
> > Regarding parallelism (__DBCSR_CUDA) on a node (i.e., computer):
> > Support for one process with multiple threads will be forthcoming,
> > followed by support for multiple GPUs in a box, each controlled by one
> > MPI process. These two developments should solve your problem.
> Yes, I think it would be great to have this kind of version. When will
> it be available: in several weeks or months?
> Also, do you plan to make it parallel over multiple nodes,
> like the normal "popt" case? Our calculations usually use 96
> MPI processes or more (at least 64 processes). I don't know if
> parallelism within one node can help.
I believe this should already work, though I have not had the opportunity
to test it with multiple computers.
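For the multi-node case with one MPI process per GPU, a common generic scheme is to compute each rank's node-local index and take it modulo the number of GPUs on that node. The Python sketch below only illustrates that arithmetic. It assumes ranks are placed on nodes in contiguous blocks of `ranks_per_node`, and the function names, parameters and the 2-ranks/2-GPUs-per-node layout are hypothetical; this is not how CP2K itself selects devices.

```python
def node_for_rank(rank: int, ranks_per_node: int) -> int:
    """Node index of an MPI rank, assuming contiguous block placement."""
    if ranks_per_node <= 0:
        raise ValueError("ranks_per_node must be positive")
    return rank // ranks_per_node


def gpu_for_rank(rank: int, ranks_per_node: int, gpus_per_node: int) -> int:
    """Device index a rank would use under a local-rank-modulo scheme.

    With ranks_per_node == gpus_per_node, every process drives its own GPU;
    with more ranks than GPUs, several processes share one device.
    """
    if ranks_per_node <= 0 or gpus_per_node <= 0:
        raise ValueError("ranks_per_node and gpus_per_node must be positive")
    local_rank = rank % ranks_per_node  # position of this rank on its node
    return local_rank % gpus_per_node


if __name__ == "__main__":
    # Hypothetical layout: 96 ranks, 2 ranks and 2 GPUs per node -> 48 nodes.
    for r in (0, 1, 2, 95):
        print(r, node_for_rank(r, 2), gpu_for_rank(r, 2, 2))
```

Under this assumed layout, rank 95 lands on node 47 and uses GPU 1, so each of the 96 processes controls exactly one device.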