[CP2K:3489] Re: cuda_tools in CP2K

Urban Borštnik urban.b... at gmail.com
Fri Sep 9 21:49:36 UTC 2011

Previous message (by thread): cuda_tools in CP2K
Next message (by thread): CI BAND calculation
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, 2011-09-08 at 09:03 -0700, Wei wrote:
> [...]
> > Regarding the out-of-memory: I believe the (__CUDAPW & __FFTCU &
> > __FFTSGL) options are currently incompatible with the __DBCSR_CUDA
> > option (this is due to different approaches to memory allocation on the
> > card).  You will probably have to choose one or the other.
> 
> These options mean the DFLAGS in the ARCH file, right?

Yes.

> DFLAGS   = -D__INTEL -D__FFTSG  -D__parallel -D__SCALAPACK -D__BLACS -D__DBCSR_CUDA
> 
> It seems that I didn't use these options. When you mentioned
> __FFTSGL, did you mean -D__FFTSG?
> 
> If I delete "-D__FFTSG", there is no FFT library. It compiles fine,
> but it won't run because of the error " *** FFTSG not
> functional.... *** ".

No, FFTSG should not be turned off; it has nothing to do with CUDA.
Just do not use CUDAPW and FFTCU (and do not use FFTSGL, where SGL
stands for SinGLe-precision FFTs); see section 2h of the INSTALL file.
CUDAPW and FFTCU reserve a large block of GPU memory for the entire
run, leaving little room for anything else. However, the size of this
block can be set in the input file, so you might be able to get all
three options to work together.
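The advice above can be written down as a small DFLAGS sanity check. This is a hypothetical Python helper (check_dflags is not part of CP2K); it encodes only what this thread states: keep __FFTSG, and avoid combining __DBCSR_CUDA with __CUDAPW, __FFTCU or __FFTSGL unless the GPU memory reservation is reduced in the input. It assumes FFTSG is the only FFT library in use, as in the ARCH file quoted above, and knows nothing about other FFT backends.

```python
# Hypothetical helper, not part of CP2K: checks an ARCH-file DFLAGS string
# against the advice in this thread.
def check_dflags(dflags: str) -> list[str]:
    """Return a list of problems found in a DFLAGS string (empty if none)."""
    # Collect macro names from "-D<name>" tokens, e.g. "-D__FFTSG" -> "__FFTSG".
    flags = {tok[2:] for tok in dflags.split() if tok.startswith("-D")}
    problems = []
    # FFTSG provides the FFTs here; it is unrelated to CUDA and must stay on.
    if "__FFTSG" not in flags:
        problems.append("no FFT library: keep -D__FFTSG")
    # Options named in this thread as currently incompatible with __DBCSR_CUDA.
    gpu_fft = flags & {"__CUDAPW", "__FFTCU", "__FFTSGL"}
    if "__DBCSR_CUDA" in flags and gpu_fft:
        problems.append(
            "__DBCSR_CUDA with " + ", ".join(sorted(gpu_fft))
            + ": may run out of GPU memory unless the reserved block is reduced"
        )
    return problems


wei_flags = "-D__INTEL -D__FFTSG  -D__parallel -D__SCALAPACK -D__BLACS -D__DBCSR_CUDA"
print(check_dflags(wei_flags))  # []
```

The flags from the quoted ARCH file pass: FFTSG is present and none of the GPU FFT options are combined with __DBCSR_CUDA.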

> > Regarding parallelism (__DBCSR_CUDA) on a node (i.e., computer):
> >
> > Support for one process with multiple threads is forthcoming, followed
> > by support for multiple GPUs in a box, each controlled by one MPI
> > process; these two developments should solve your problem.
> 
> Yes, I think it would be great to have such a version. When will it
> be available: in several weeks or months?
> 
> Also, do you plan to make it parallel over multiple nodes, like the
> normal "popt" case? Our calculations usually use 96 MPI processes or
> more (at least 64), so I don't know whether parallelism within one
> node can help.

I believe this should already work, though I have not had the
opportunity to test it across multiple computers.
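As an illustration of the planned layout (one MPI process per GPU, with threads using the remaining cores, repeated over several nodes), here is a hypothetical Python sketch; hybrid_layout, gpu_for_rank and the node sizes are assumptions for illustration, not CP2K settings. For example, 8 hypothetical 12-core nodes with 2 GPUs each give the same 96 cores as a 96-process pure-MPI run, but as 16 MPI processes with 6 threads each.

```python
# Hypothetical sketch: one MPI process per GPU, threads fill the other cores.
def hybrid_layout(nodes: int, cores_per_node: int, gpus_per_node: int) -> tuple[int, int]:
    """Return (total MPI processes, threads per process)."""
    if gpus_per_node < 1 or cores_per_node < gpus_per_node:
        raise ValueError("need at least one GPU and one core per GPU on each node")
    mpi_processes = nodes * gpus_per_node  # each process owns exactly one GPU
    threads_per_process = cores_per_node // gpus_per_node
    return mpi_processes, threads_per_process


def gpu_for_rank(rank: int, gpus_per_node: int) -> int:
    """GPU index for a rank, assuming ranks are placed node by node."""
    return rank % gpus_per_node


print(hybrid_layout(8, 12, 2))  # (16, 6)
```

Fewer, threaded MPI processes keep each GPU owned by a single process, which avoids several processes competing for the same card's memory.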

Best regards,
Urban.

More information about the CP2K-user mailing list