[CP2K:4984] FFT, CUDA, and Parallelization

Iain Bethune ibet... at epcc.ed.ac.uk
Tue Feb 25 09:01:14 UTC 2014


Hi Tyler,

I can’t comment much on the CUDA aspects of your question, as I’ve not been closely involved with that. However, even if your machine is not in a cluster and is just a single multi-core box, you will probably find that MPI gives you better parallel performance than OpenMP. I have certainly observed this on, e.g., a 16-core Intel Ivy Bridge system. Some parts of the code are not fully OpenMP’ed, so OpenMP mostly helps once you have already reached the limit of MPI scaling and want to push performance and scaling further, likely only at 1000s of cores, depending on the type and size of calculation you are running. It might make sense to use OpenMP in combination with CUDA, as I think only one GPU can be used per MPI rank, so you can make use of the additional CPU cores via OpenMP. Obviously, using MPI requires installing a suitable MPI library, but Open MPI and MPICH are commonly available on various Linux distros.
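
As a rough illustration of the layout described above (pure MPI on a CPU-only box, or one MPI rank per GPU with the remaining cores used as OpenMP threads), here is a minimal Python sketch that works out the rank/thread split and prints a launch line. The helper names (hybrid_layout, launch_command), the input file name input.inp, and the choice of the cp2k.psmp executable are assumptions for illustration only; adjust them to whatever your build and MPI launcher actually provide.

```python
import shlex


def hybrid_layout(n_cores, n_gpus=0):
    """Return (mpi_ranks, omp_threads_per_rank) for a single multi-core box.

    Assumption taken from the advice above: without GPUs, use pure MPI
    (one rank per core); with GPUs, use one MPI rank per GPU and give the
    leftover cores to each rank as OpenMP threads.
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if n_gpus <= 0:
        return n_cores, 1
    ranks = min(n_gpus, n_cores)
    threads = n_cores // ranks  # integer split; any remainder cores stay idle
    return ranks, threads


def launch_command(n_cores, n_gpus=0, exe="cp2k.psmp", inp="input.inp"):
    """Build a shell launch line (hypothetical executable and input names)."""
    ranks, threads = hybrid_layout(n_cores, n_gpus)
    cmd = ["mpirun", "-np", str(ranks), exe, "-i", inp]
    return f"OMP_NUM_THREADS={threads} " + " ".join(shlex.quote(c) for c in cmd)


if __name__ == "__main__":
    print(launch_command(16))            # CPU-only: pure MPI
    print(launch_command(16, n_gpus=2))  # two GPUs: 2 ranks x 8 threads
```

Whether the hybrid split actually beats pure MPI on a given machine is something to benchmark rather than assume; the sketch only encodes the rule of thumb from this message.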

Cheers

- Iain

--

Iain Bethune
Project Manager, EPCC

Email: ibet... at epcc.ed.ac.uk
Twitter: @IainBethune
Web: http://www2.epcc.ed.ac.uk/~ibethune
Tel/Fax: +44 (0)131 650 5201/6555
Mob: +44 (0)7598317015
Addr: 2404 JCMB, The King's Buildings, Mayfield Road, Edinburgh, EH9 3JZ

On 25 Feb 2014, at 05:09, Tyler Gubb <tag... at gmail.com> wrote:

> Hello,
> 
> I am curious what the current CUDA support is (i.e. for the development trunk and future development). I have a Linux box with two Quadro cards and would like to run simulations with FFT and possibly matrices on the cards. Attempts to compile MPI parallel + CUDA support have gone surprisingly swimmingly. However, the poor cards are quickly overwhelmed for memory as DBCSR loads (Intel MPI is being used and each process takes a chunk), but the Global/CUDA/Memory flag was removed after revision 13531. Strangely, running the executable compiled without the -D__DBCSR_CUDA flag results in the same error.
> 
> On a somewhat related note, the computer in question is not in a cluster.  Is OpenMP thus the optimal method for running in parallel (versus MPI)?  If so, then how would CUDA be utilized (assuming it can be)?
> 
> Lastly, I'd like to give a sincere thank you to the developers for their hard work and dedication to the CP2K project.
> 
> Cheers,
> T. Gubb
