cuda_tools in CP2K

Wei wei.a... at googlemail.com
Tue Sep 6 21:59:27 UTC 2011

Previous message (by thread): NMR of spin-spin coupling
Next message (by thread): [CP2K:3487] cuda_tools in CP2K
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Dear all,

I am interested in the cuda_tools in CP2K. I have compiled the recent
CP2K (version 2.2.320) with CUDA 4.0, Intel compiler 12, the Intel MKL
bundled with the compiler, and Intel MPI (the arch file is a modification
of Linux-x86-64-dbcsr-cuda.popt; see it at the end).

If I run with "./cp2k.popt test.inp", it works for systems of about 100
atoms (Sb, Te) or fewer, but it fails with "CUDA Error: out of memory"
when the system exceeds 120 atoms. Is this normal, given that each GPU
has 6 GB of device memory?
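For a rough sense of scale, the sketch below estimates the memory of one dense double-precision matrix for a given atom count. The number of basis functions per atom is a hypothetical parameter (it depends on the basis set chosen in test.inp), and CP2K's DBCSR matrices are block-sparse, so this is only an upper bound for a single matrix, not a prediction of what CP2K actually allocates on the GPU.

```python
def dense_matrix_bytes(n_atoms: int, basis_per_atom: int, bytes_per_element: int = 8) -> int:
    """Bytes for one dense n x n matrix, where n = n_atoms * basis_per_atom.

    basis_per_atom is a hypothetical input; bytes_per_element=8 assumes double precision.
    """
    n = n_atoms * basis_per_atom
    return n * n * bytes_per_element


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Illustrative only: 40 basis functions per atom is an assumed value.
    for atoms in (100, 120, 200):
        size = dense_matrix_bytes(atoms, 40)
        print(f"{atoms} atoms: {to_gib(size):.2f} GiB per dense matrix")
```

With the assumed 40 basis functions per atom, one dense matrix for 120 atoms comes to roughly 0.17 GiB, so a single matrix alone would not reach 6 GB; the real device footprint depends on the basis set, how many matrices are involved, and any work buffers the code allocates on the card.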

So I wonder how I can run it in parallel. At the moment I cannot run it
with "mpirun -np 2 ./cp2k.popt test.inp", because it fails immediately
with the out-of-memory error:

CUDA Error: out of memory
 ASSERTION FAILED:         1.EQ.        0

  stack:
  error in dev_mem_alloc_i at line    35 with error type  -1
  message: Could not allocate GPU device memory
    6 error in dev_mem_alloc_i at line    35
    5 called from dev_mem_alloc_any
    4 called from init_card_c
    3 called from dbcsr_multrec_init
    2 called from dbcsr_mult_m_e_e
    1 called from dbcsr_multiply_anytype
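One thing worth checking when several MPI ranks run on one node: by default the CUDA runtime uses device 0 unless the program selects another device (for example with cudaSetDevice), so ranks can end up sharing one GPU's memory. Whether this CP2K version selects a device per rank is not stated in this post, so treat that as an assumption. The sketch takes a hypothetical per-rank device assignment and reports the even share of device memory each rank would get.

```python
from collections import Counter
from typing import Dict, Sequence


def per_rank_share(device_of_rank: Sequence[int], device_bytes: int) -> Dict[int, int]:
    """Return, for each rank, an even share of the memory of the GPU it uses.

    device_of_rank[r] is the GPU index rank r uses on this node.
    Splitting memory evenly is a simplifying assumption; real usage per rank differs.
    """
    ranks_per_device = Counter(device_of_rank)
    return {
        rank: device_bytes // ranks_per_device[dev]
        for rank, dev in enumerate(device_of_rank)
    }


if __name__ == "__main__":
    six_gb = 6 * 2**30  # approximate memory per GPU, as quoted in the post
    # Two ranks that both fall back to device 0 (no explicit device selection):
    print(per_rank_share([0, 0], six_gb))
    # Two ranks spread over the node's two GPUs:
    print(per_rank_share([0, 1], six_gb))
```

If both ranks land on device 0, each effectively has about half of that GPU, which is one possible reason a run that fits with one process fails with two; this is a hypothesis, not something the error trace above proves.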


Where can I get more information about cuda_tools? Can this "popt"
version use resources across nodes, as in the usual (CPU-only) case?
Each of our nodes has two GPUs (NVIDIA Quadro 6000, Fermi) and two
6-core CPUs; how can I get the best performance out of them? For
example, should I distribute the job over several nodes, with several
MPI processes per node controlling the two GPUs on each node? If so, how?
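A common pattern for nodes with several GPUs (not necessarily what CP2K 2.2 does internally) is to give each MPI rank a node-local index and pick a GPU by taking that index modulo the number of GPUs per node. The sketch assumes block placement, i.e. ranks 0 to ranks_per_node-1 on the first node, the next block on the second node, and so on; the function place_rank and its parameters are hypothetical.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    node: int        # which node the rank runs on
    local_rank: int  # index of the rank within its node
    gpu: int         # GPU index the rank would select on that node


def place_rank(rank: int, ranks_per_node: int, gpus_per_node: int) -> Placement:
    """Map a global MPI rank to (node, local rank, GPU), assuming block placement."""
    if ranks_per_node <= 0 or gpus_per_node <= 0:
        raise ValueError("ranks_per_node and gpus_per_node must be positive")
    node, local_rank = divmod(rank, ranks_per_node)
    # Round-robin over the node's GPUs by local rank.
    return Placement(node, local_rank, local_rank % gpus_per_node)


if __name__ == "__main__":
    # Hypothetical layout: 2 nodes, 2 ranks per node (one per GPU), 2 GPUs per node.
    for r in range(4):
        print(r, place_rank(r, ranks_per_node=2, gpus_per_node=2))
```

With two GPUs per node, two ranks per node give each rank a GPU to itself, while more ranks per node make several ranks share each GPU and its memory; which layout is faster for this workload is not answered here.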

Thanks a lot in advance!


NVCC     = nvcc
NVFLAGS  = $(DFLAGS) -g -arch sm_20

CC       = mpiicc
CPP      =
FC       = mpiifort
LD       = $(FC)
AR       = ar -r
CPPFLAGS =
DFLAGS   = -D__INTEL -D__FFTSG -D__parallel -D__SCALAPACK -D__BLACS -D__DBCSR_CUDA
INTEL_INC= /opt/intel/Compiler/12.0/4.191/rwthlnk/mkl/include
MKLPATH  = /opt/intel/Compiler/12.0/4.191/rwthlnk/mkl/lib/intel64
FCFLAGS  = $(DFLAGS) -I$(INTEL_INC) -O3 -msse2 -heap-arrays 64 -funroll-loops -fpp -free
LDFLAGS  = $(FCFLAGS)
CUDAPATH = /usr/local_rwth/sw/cuda/4.0.17/lib64
LIBS     = $(CUDAPATH)/libcudart.so \
           $(CUDAPATH)/libcufft.so \
           $(CUDAPATH)/libcublas.so \
           $(MKLPATH)/libmkl_scalapack_lp64.a \
           $(MKLPATH)/libmkl_solver_lp64.a \
           -Wl,--start-group \
           $(MKLPATH)/libmkl_intel_lp64.a \
           $(MKLPATH)/libmkl_sequential.a \
           $(MKLPATH)/libmkl_core.a \
           $(MKLPATH)/libmkl_blacs_intelmpi_lp64.a \
           -Wl,--end-group -lpthread

OBJECTS_ARCHITECTURE = machine_intel.o


Best regards,

Wei

---------------------------------------------------------
Wei ZHANG
PhD student
Institute for Theoretical Solid State Physics
RWTH Aachen University, Germany


More information about the CP2K-user mailing list