Issue with running CP2K on multi-GPU node

Ole Schütt o... at schuett.name
Fri Nov 14 10:42:57 CET 2014


Hi Abhishek,

> It looks like acc_set_active_device() is not invoked anywhere other
> than cp2k_runs.F.

Well, actually it's called a second time from here:

https://github.com/cp2k/cp2k/blob/master/cp2k/src/f77_interface.F#L302

... don't ask ;-)
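
To illustrate why a hard-coded index in only one place may not stick, here is a minimal Python sketch, not CP2K code. It assumes that the second call site still passes the round-robin expression MOD(mepos, ndevices) quoted further down in this thread, and that it runs after the patched call in cp2k_runs.F. Both points are assumptions and have not been checked against the source; the function names are made up for the illustration.

```python
def round_robin_device(rank, ndevices):
    """Device index from the original expression MOD(mepos, ndevices)."""
    return rank % ndevices


def simulate_active_devices(nranks, ndevices, hardcoded_device):
    """Return {rank: active device} after both (hypothetical) call sites ran."""
    active = {}
    for rank in range(nranks):
        # Call site 1 (cp2k_runs.F), patched to a fixed device index.
        active[rank] = hardcoded_device
        # Call site 2 (f77_interface.F), ASSUMED to still use round-robin
        # and ASSUMED to run later, so it overwrites the choice above.
        active[rank] = round_robin_device(rank, ndevices)
    return active


if __name__ == "__main__":
    # 4 MPI ranks on a node with 4 GPUs, device 1 hard-coded at call site 1.
    for rank, device in simulate_active_devices(4, 4, 1).items():
        print(f"rank {rank} -> device {device}")
```

Under these assumptions every rank ends up on its round-robin device again, which would match the behaviour seen with mpirun, so both call sites would need the same change.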

-Ole


On Thursday, November 13, 2014 5:53:51 PM UTC+1, Abhishek Bagusetty wrote:
>
> Hi Vedran,
>
> I tried hard-coding the device index in acc_set_active_device(#). When I
> run on 1 node with all cores and a particular device ID #, the other
> device indices somehow still get used. This happens when using MPI.
>
> The goal is to use 1 node, all cores, and one particular device ID # with
> MPI (popt). I changed line 199 of cp2k_runs.F and even initialized
> device_ID=# in acc/include/acc.h, but the other device IDs still get
> used. A serial run on a particular GPU worked fine with this
> configuration. Somehow, running under mpirun changes the behaviour.
>
> It looks like acc_set_active_device() is not invoked anywhere other
> than cp2k_runs.F. Do you have an idea what could be going on?
>
> Thanks,
> Abhishek
> On Wednesday, November 12, 2014 3:46:26 AM UTC-5, Vedran Miletić wrote:
>>
>> Hello Abhishek,
>>
>> can you try changing line 199 in cp2k/src/start/cp2k_runs.F from:
>>
>>   CALL acc_set_active_device(MOD(para_env%mepos, acc_get_ndevices()))
>>
>> to
>>
>>   CALL acc_set_active_device(1)
>>
>> and see if this works? You can use 2 or 3 if you prefer.
>>
>> Regards,
>> Vedran
>>
>> On Tuesday, November 11, 2014 8:06:42 PM UTC+1, Abhishek Bagusetty wrote:
>>>
>>> Hi Developers,
>>>
>>> The cluster we use has 4 GPUs per node. The GPU IDs are apparently
>>> tagged as 0,1,2,3, and when GPU ID 0 is being used by some other
>>> application, CP2K reports in the output: *CUDA Error: all
>>> CUDA-capable devices are busy or unavailable*. It looks like the
>>> device index defaults to 0 for the CUDA APIs.
>>>
>>> Is there a way to specify a particular GPU ID, so that memory management
>>> and/or kernel computations are performed on that particular device?
>>>
>>> Thanks,
>>> Abhishek 
>>>
>>>
>>> -----------------------------------------------------------------------------------------------------------
>>> Abhishek Bagusetty
>>> PhD Student, Computational Modeling & Simulation
>>> Center for Simulation and Modeling
>>> Department of Chemical & Petroleum Engineering
>>> University of Pittsburgh
>>> Pittsburgh, PA - 15261
>>> Office : 920 Benedum Hall
>>>
>>> -----------------------------------------------------------------------------------------------------------
>>>
>>>