[CP2K:3713] hwloc support in cp2k-trunk

Axel akoh... at gmail.com
Fri Jan 27 00:38:04 UTC 2012



On Thursday, January 26, 2012 3:50:00 AM UTC-5, Christiane Pousa Ribeiro wrote:

>> yes, this kind of behavior is what i would have expected.
>> this should also help with the internal threading in OpenMPI.
>>
>
> The main goal is to keep MPI processes from allocating and accessing
> memory on remote NUMA nodes.
>

i know. ;)
 

> But if you also want to pin threads, you can try the Linear strategy,
> which pins both processes and threads.
>

unlike binding MPI tasks to "NUMA units", pinning the threads as well
didn't make a significant difference in performance.


>> please have a look at the attached file. you'll see that there
>> are some entries that don't look right. in particular, the node
>> names are all the same as that of MPI rank 0.
>>
>
> I made some changes to fix this. Could you try the latest version of CP2K?
>

yes. updated, compiled and tested. it gives the output that i expect now.
 

>
>> yes. our MPI installation is configured by default to have a 1:1 core to
>> MPI rank mapping (since there is practically nobody yet using MPI+OpenMP)
>> with memory affinity for giving people the best MPI-only performance.
>>
>>
> Ok. So, for threads, even with this installation you cannot specify their
> cores?
>

it is not just a matter of "want". the majority of users that i am working
with don't care (well, they do care if things run faster, but they don't
care so much if it looks/sounds/is complicated). by hiding most of
the complexity in a script and having it not allow unreasonable choices,
i don't get the maximal flexibility, but all i need to do is to tell people:
just use this wrapper and it'll work. if every application were hardware
topology aware and adjusted itself as needed, that would be even better, and
that is why i am trying to compile cp2k this way.
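
as a rough illustration of the "don't allow unreasonable choices" idea,
here is a minimal python sketch. it is not the actual wrapper (that one is
OpenMPI specific and is in the attachment); the function name
plan_binding and its parameters are made up for this example, and it
assumes the cores of a node are simply numbered 0..N-1.

```python
def plan_binding(cores_per_node, ranks_per_node, threads_per_rank):
    """Return one list of core indices per MPI rank on a node.

    Hypothetical helper: rejects layouts that would oversubscribe the
    node and hands each rank a contiguous block of cores.
    """
    if ranks_per_node < 1 or threads_per_rank < 1:
        raise ValueError("ranks and threads per rank must be at least 1")
    if ranks_per_node * threads_per_rank > cores_per_node:
        raise ValueError(
            f"{ranks_per_node} ranks x {threads_per_rank} threads "
            f"do not fit on {cores_per_node} cores"
        )
    # Rank r gets cores [r * threads_per_rank, (r + 1) * threads_per_rank).
    return [
        list(range(r * threads_per_rank, (r + 1) * threads_per_rank))
        for r in range(ranks_per_node)
    ]


# Example: 8 cores per node, 2 MPI ranks with 4 threads each.
for rank, cores in enumerate(plan_binding(8, 2, 4)):
    print(f"rank {rank}: cores {cores}")
```

asking for 4 ranks with 4 threads each on the same 8-core node raises
a ValueError instead of silently oversubscribing the cores.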

>> at the end of the attached file i include a copy of the wrapper script,
>> that is OpenMPI specific (since that is the only MPI library installed).
>>
>
> thanks for the script. 
>  
>
>>
>> overall, it looks to me like the default settings are giving a desirable
>> processor and memory affinity (which is great) that is consistent with
>> the best settings i could get using my wrapper script, but the diagnostics
>> seem to be off and may be confusing people, particularly technical
>> support in computing centers, that are often too literal and assume
>> that any software is always giving 100% correct information. ;-)
>>
>
> Now, it should work :) Let me know if you find new bugs.
>

thanks a lot. much appreciated. will let you know,
if i run across any additional problems.
 

> Considering your machine, the core numbering problem comes from the fact
> that I was using the numbers that the OS gives to the cores. Now, I'm using
> the logical ones. BTW, is your machine Intel?
>
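
the difference between the two numberings can be sketched in a few lines
of python. this is only an illustration: the table of cores below is made
up (a dual-socket, quad-core node where the OS interleaves core numbers
across sockets), and the function logical_numbering is a hypothetical
helper that numbers cores socket by socket, which is a simplification of
what a topology library does.

```python
# Made-up OS view: OS core index -> (socket id, core id within socket).
os_cores = {
    0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1),
    4: (0, 2), 5: (1, 2), 6: (0, 3), 7: (1, 3),
}


def logical_numbering(os_cores):
    """Return {os_index: logical_index}, numbering cores socket by socket."""
    # Sort OS indices by their (socket, core) position in the topology.
    ordered = sorted(os_cores, key=lambda os_idx: os_cores[os_idx])
    return {os_idx: logical for logical, os_idx in enumerate(ordered)}


mapping = logical_numbering(os_cores)
for os_idx in sorted(mapping):
    print(f"OS core {os_idx} -> logical core {mapping[os_idx]}")
```

with this made-up table, OS cores 0, 2, 4, 6 (socket 0) become logical
cores 0-3 and OS cores 1, 3, 5, 7 (socket 1) become logical cores 4-7, so
neighboring logical numbers share a socket.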

we have both, Intel and AMD (which forces me to use compiler settings
that are compatible with a common subset of both). overall the AMD ones
benefit the most from using processor and memory affinity, but i was
surprised how much impact it has on the X5677 Intel CPUs (quad-core
Westmere-EP at 3.5GHz). just proves that there is always something new
to learn...
 
thanks again,
    axel.

 
>
>>
>> cheers,
>>      axel.
>>
>>
> cheers,
>
> Christiane Pousa Ribeiro
>  
>