[CP2K:3713] hwloc support in cp2k-trunk
Axel
akoh... at gmail.com
Fri Jan 27 00:38:04 UTC 2012
On Thursday, January 26, 2012 3:50:00 AM UTC-5, Christiane Pousa Ribeiro
wrote:
>
>
>> yes, this kind of behavior is what i would have expected.
>> this should also help with the internal threading in OpenMPI.
>>
>
> The main goal is to avoid memory allocations and accesses by different
> MPI processes on remote NUMA nodes.
>
i know. ;)
> But if you also want to pin threads, you can try the Linear strategy,
> which will pin processes and threads.
>
unlike with binding MPI tasks to "NUMA units",
i didn't see a significant difference in performance.
>> please have a look at the attached file. you'll see that there
>> are some entries that don't look right. in particular, the node
>> names are all those of MPI rank 0.
>>
>
> I did some changes to fix this. Could you try the latest version of CP2K?
>
yes. updated, compiled and tested. it gives the output that i expect now.
>
>> yes. our MPI installation is configured by default to have a 1:1 core to
>> MPI rank mapping (since there is practically nobody yet using MPI+OpenMP)
>> with memory affinity for giving people the best MPI-only performance.
>>
>>
> Ok. So, for threads, even with this installation you can not specify their
> cores?
>
it is not only a matter of "want". the majority of users that i am working
with don't care (well, they do care if things run faster, but they don't
care so much if it looks/sounds/is complicated). by hiding most of
the complexity in a script and having it not allow unreasonable choices,
i don't get the maximal flexibility, but all i need to do is to tell people:
just use this wrapper and it'll work. if every application were hardware
topology aware and adjusted itself as needed, that would be even better, and
that is why i am trying to compile cp2k this way.
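
to illustrate the wrapper idea, here is a minimal sketch. it is not the
wrapper script from the attachment. the function names and the round-robin
placement of ranks over NUMA nodes are assumptions made for this example.
it only relies on numactl's --cpunodebind/--membind options and on the
OMPI_COMM_WORLD_LOCAL_RANK variable that OpenMPI sets for each rank.

```python
def numactl_command(local_rank, num_numa_nodes, app_argv):
    """Build a numactl command line that binds one MPI rank to one NUMA node.

    Ranks are spread round-robin over the NUMA nodes; this policy is an
    assumption of the sketch, not something taken from the real wrapper.
    """
    if num_numa_nodes < 1:
        raise ValueError("need at least one NUMA node")
    node = local_rank % num_numa_nodes
    # Bind both the CPUs and the memory allocations to the same node.
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}"] + list(app_argv)


def command_from_env(environ, num_numa_nodes, app_argv):
    """Read the node-local rank that OpenMPI exports and build the command."""
    # Fall back to rank 0 when not started through OpenMPI's mpirun.
    local_rank = int(environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    return numactl_command(local_rank, num_numa_nodes, app_argv)
```

a real wrapper would pass the returned list to os.execvp() so that the
application replaces the wrapper process, and would be started as the
program that mpirun launches for every rank.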
>> at the end of the attached file i include a copy of the wrapper script,
>> that is OpenMPI specific (since that is the only MPI library installed).
>>
>
> thanks for the script.
>
>
>>
>> overall, it looks to me like the default settings are giving a desirable
>> processor and memory affinity (which is great) that is consistent with
>> the best settings i could get using my wrapper script, but the diagnostics
>> seem to be off and may be confusing people, particularly technical
>> support in computing centers, who are often too literal and assume
>> that any software is always giving 100% correct information. ;-)
>>
>
> Now, it should work :) Let me know if you find new bugs.
>
thanks a lot. much appreciated. will let you know,
if i run across any additional problems.
> Considering your machine, the core numbering problem comes from the fact
> that I was using the numbers that the OS gives to the cores. Now, I'm using
> the logical ones. BTW, is your machine Intel?
>
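
the difference between OS and logical core numbers can be sketched as
follows. this is only an illustration and not the code in CP2K or hwloc.
the function name, the tuple layout and the two-socket example machine are
made up. it assumes that logical numbers follow the topology order (socket
first, then core), while the OS is free to number the cores differently.

```python
def logical_indices(pus):
    """Map OS core numbers to topology-ordered logical numbers.

    `pus` is a list of (os_index, socket, core) tuples. The logical number
    of a core is its position after sorting by socket, then by core.
    """
    ordered = sorted(pus, key=lambda pu: (pu[1], pu[2], pu[0]))
    return {os_index: logical
            for logical, (os_index, _socket, _core) in enumerate(ordered)}


# Made-up machine: 2 sockets with 4 cores each, where the OS alternates
# between the sockets when numbering the cores.
example = [(os_index, os_index % 2, os_index // 2) for os_index in range(8)]
mapping = logical_indices(example)
```

on this made-up machine the OS numbers 0, 2, 4, 6 all sit on the first
socket, so they become logical 0 to 3, and a diagnostic that prints OS
numbers will look scattered even though the binding is compact.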
We have both, Intel and AMD (which is forcing me to use compiler settings
that are compatible with a common subset of both). overall the AMD ones
benefit the most from using processor and memory affinity, but i was
surprised how much impact it has on the X5677 Intel CPUs (quad-core
Westmere-EP at 3.5GHz). just proves that there is always something new
to learn...
thanks again,
axel.
>
>>
>> cheers,
>> axel.
>>
>>
> cheers,
>
> Christiane Pousa Ribeiro
>
>