[CP2K-user] cp2k on 10 GbE

Anton Kudelin archm... at gmail.com
Thu Nov 29 18:27:33 UTC 2018


Another approach came to mind, thanks to Tiziano's link [1]. Using the MVAPICH 
<http://mvapich.cse.ohio-state.edu/downloads> and rdma-core packages, you 
can set up so-called Soft-RoCE 
<https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-configuring_soft-_roce>, 
which will give you additional MPI performance. Jobs must be run with the 
environment variable "MV2_USE_RoCE=1" set. See section 5.2.7 of the MVAPICH 
user guide for details.
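
As a rough sketch of how the variable can be passed, the Python snippet below 
builds an mpirun_rsh command line with MV2_USE_RoCE=1 placed as a NAME=VALUE 
pair before the executable, which is the form mpirun_rsh accepts. The helper 
build_roce_launch is hypothetical, and the process count, the hostfile name 
and the CP2K input/output file names are placeholders that do not come from 
this thread; adjust them to your own installation.

```python
import shlex


def build_roce_launch(n_procs, hostfile, program_args):
    """Return an mpirun_rsh argument list with RoCE enabled for MVAPICH2.

    Hypothetical helper: it only assembles the command, it does not run it.
    mpirun_rsh takes NAME=VALUE pairs between its own options and the
    executable, so MV2_USE_RoCE=1 is inserted at that position.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return [
        "mpirun_rsh",
        "-np", str(n_procs),
        "-hostfile", hostfile,
        "MV2_USE_RoCE=1",
        *program_args,
    ]


# Placeholder values for illustration only (not taken from the thread).
cmd = build_roce_launch(
    16, "hosts.txt", ["cp2k.psmp", "-i", "job.inp", "-o", "job.out"]
)
print(shlex.join(cmd))
```

The printed line can be pasted into a job script, or the list can be handed 
to subprocess.run once the hostfile and paths are real.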

On Thursday, 29 November 2018 13:20:43 UTC+3, Peter Kraus wrote:
>
> Dear Anton,
>
> thanks for the suggestion. MPICH 3.3 seems quicker than OpenMPI 3.1, as on 
> 16 MPI instances with 8 OpenMP threads each (128 cores total), it takes 
> ~130 s per wavefunction optimisation step, while OpenMPI takes ~200 s. 
> However, with OpenMPI running with 8x8 parallelisation (64 cores, fits into 
> one of my hyper-threaded nodes), I get ~7 s per step, so the MPI penalty is 
> still ridiculous. This is for a V2O5 bulk system with 168 atoms, PBE and DZ 
> basis set.
>
> Best,
> Peter
>
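To put the timings quoted above side by side, here is a small Python sketch 
that computes the ratios. The three numbers are the approximate per-step times 
from the message (~130 s, ~200 s, ~7 s); the core-second comparison is my own 
framing and treats the 64 hyper-threads of the single node as 64 cores, which 
is an assumption.

```python
# Approximate seconds per wavefunction optimisation step, as quoted above.
T_MPICH_MULTI = 130.0    # MPICH 3.3, 16 ranks x 8 threads, 128 cores
T_OPENMPI_MULTI = 200.0  # OpenMPI 3.1, 16 ranks x 8 threads, 128 cores
T_OPENMPI_SINGLE = 7.0   # OpenMPI 3.1, 8 ranks x 8 threads, 64 cores, one node


def ratio(slow, fast):
    """How many times longer the slow run takes than the fast run."""
    return slow / fast


def core_seconds(cores, seconds):
    """Resources consumed per step: cores multiplied by wall time."""
    return cores * seconds


mpich_penalty = ratio(T_MPICH_MULTI, T_OPENMPI_SINGLE)
openmpi_penalty = ratio(T_OPENMPI_MULTI, T_OPENMPI_SINGLE)
mpich_gain = ratio(T_OPENMPI_MULTI, T_MPICH_MULTI)
cost_ratio = core_seconds(128, T_MPICH_MULTI) / core_seconds(64, T_OPENMPI_SINGLE)

print(f"MPICH multi-node vs single node:   {mpich_penalty:.1f}x slower")
print(f"OpenMPI multi-node vs single node: {openmpi_penalty:.1f}x slower")
print(f"MPICH vs OpenMPI (multi-node):     {mpich_gain:.2f}x faster")
print(f"Core-seconds per step, MPICH multi-node vs single node: {cost_ratio:.1f}x")
```

So MPICH is about 1.5 times faster than OpenMPI across nodes, but the 
multi-node run is still roughly 19 times slower per step than the single-node 
run while using twice the cores.
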
> On Wednesday, 28 November 2018 13:17:27 UTC+1, Anton Kudelin wrote:
>>
>> Try to employ MPICH or its derivatives (MVAPICH) configured with 
>> --with-device=ch3:nemesis
>>
>> On Wednesday, 28 November 2018 14:35:04 UTC+3, Peter Kraus wrote:
>>>
>>> Dear Mike,
>>>
>>> I have tried running CP2K on our cluster, whose nodes are connected via 
>>> 10 GbE, and all I see is a very significant slowdown. This was with 
>>> gcc-8.2.0 and openmpi-3.1.1, and with OpenBLAS/fftw/scalapack compiled 
>>> using those two, with OpenMP enabled where possible. I've resorted to 
>>> submitting "SMP"-like jobs (selecting the smp parallel environment, but 
>>> parallelising using both MPI and OpenMP).
>>>
>>> If you figure out how to squeeze extra performance from the 10GbE, 
>>> please let me know.
>>>
>>> Best,
>>> Peter
>>>
>>> On Monday, 12 November 2018 18:01:48 UTC+1, Mike Ruggiero wrote:
>>>>
>>>> Hello cp2k community - I have recently set up a small computing cluster, 
>>>> with 20-24 core server nodes linked via 10 GbE connections. While scaling 
>>>> on single nodes is as it should be (i.e., nearly linear), I get very 
>>>> little to no speed-up when running simulations across multiple nodes. 
>>>> After digging around, it seems that this is relatively well known for 
>>>> cp2k, but I'm curious whether anyone has had any success using cp2k over 
>>>> 10 GbE connections. Any advice would be greatly appreciated!
>>>>
>>>> Best,
>>>> Michael Ruggiero  
>>>>
>>>