[CP2K:29] latest modification to improve 2d particle distribution
Teodoro Laino
teodor... at gmail.com
Wed May 2 04:51:10 UTC 2007
Here is more information (from Joost) about the latest update to the
2d particle distribution:
>
> Hi,
>
> as discussed previously, I've implemented a new distribution scheme for
> distribution_2d. It is now particle based by default and uses global
> optimization to find the best distribution. Surprisingly, no method
> (including KG) seems to depend on the distribution_2d being molecule
> based. So there is an input option to retain a molecular distribution,
> but it is basically not used. Let me know if there are troubles I did
> not notice (e.g. very few properties are regtested, so these could be
> affected). This also means that one should now be able to generate
> molecules or a topology in QS runs without a performance penalty.
>
> The distribution_2d is now obtained by minimizing a cost model, using
> Monte Carlo based annealing to find the global minimum. The cost model
> basically needs to assign a cost to a given atom_pair ab, while the MC
> samples different local_rows/local_cols to minimize the maximum load
> on all CPUs.
> The good thing is that the MC seems to work very well in real life,
> i.e. something very good is found in a fraction of a second in most
> cases. In separate tests, this is even true in difficult cases (e.g.
> many CPUs and few atoms, or cost models that include sparsity).
> However, the current implementation is quadratic in the number of
> atoms and CPUs, so I've added a skip_optimization option to work
> around problems you might experience if you're running with e.g.
> 10000s of atoms or 10000s of CPUs. The limiting factor of the current
> approach is the cost model. In particular, the current scheme of
> assigning to the pair ab a cost equal to the number of matrix elements
> of the block ab is not optimal. It neglects the sparsity of the
> overlap matrix and the fact that the cost depends much more on the
> composition of the basis than on the number of basis functions.
> Nevertheless, in the best case (32 H2O on 32 CPUs with
> TZV2P-MOLOPT-GTH) a speedup per SCF step of about 50% is observed,
> demonstrating the potential for load balancing. Normally, the speedup
> is far smaller, as the scalability of QS is rarely dominated by the
> load balance of the atomic blocks.
>
> Cheers,
>
> Joost
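
To make the idea above concrete, here is a rough Python sketch of the
kind of scheme Joost describes: each atom is assigned to a process row
and a process column, the block ab costs (basis functions on a) x
(basis functions on b), and a Monte Carlo annealing loop tries to lower
the maximum load over all CPUs. This is not the CP2K implementation.
The function names (pair_cost, max_load, anneal), the move set, the
cooling schedule and all parameters are assumptions made for
illustration only.

```python
import math
import random


def pair_cost(n_a, n_b):
    # Cost model from the post: number of matrix elements in block ab.
    return n_a * n_b


def max_load(nbasis, row_of, col_of, nprow, npcol):
    # Block (a, b) lives on the CPU at grid position (row_of[a], col_of[b]).
    load = [[0] * npcol for _ in range(nprow)]
    for a, n_a in enumerate(nbasis):
        for b, n_b in enumerate(nbasis):
            load[row_of[a]][col_of[b]] += pair_cost(n_a, n_b)
    return max(max(row) for row in load)


def anneal(nbasis, nprow, npcol, steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    # Hypothetical annealing loop: move one atom to a random process row
    # or column and accept the move with the Metropolis criterion.
    rng = random.Random(seed)
    natom = len(nbasis)
    row_of = [a % nprow for a in range(natom)]  # round-robin start
    col_of = [a % npcol for a in range(natom)]
    current = max_load(nbasis, row_of, col_of, nprow, npcol)
    best, best_rows, best_cols = current, list(row_of), list(col_of)
    scale = current  # temperatures are relative to the initial cost
    for step in range(steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        temp = scale * t_start * (t_end / t_start) ** (step / steps)
        atom = rng.randrange(natom)
        if rng.random() < 0.5:
            target, nproc = row_of, nprow
        else:
            target, nproc = col_of, npcol
        old = target[atom]
        target[atom] = rng.randrange(nproc)
        trial = max_load(nbasis, row_of, col_of, nprow, npcol)
        delta = trial - current
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = trial
            if current < best:
                best, best_rows, best_cols = current, list(row_of), list(col_of)
        else:
            target[atom] = old  # reject: undo the move
    return best, best_rows, best_cols


if __name__ == "__main__":
    # Made-up basis set sizes per atom, distributed over a 2 x 2 CPU grid.
    sizes = [40, 40, 13, 13, 13, 5, 5, 5, 5, 5, 5, 5]
    cost, rows, cols = anneal(sizes, nprow=2, npcol=2)
    print("max load:", cost)
    print("rows:", rows)
    print("cols:", cols)
```

Note that with this pure product cost model the load on CPU (i, j)
factorizes into (sum of sizes in row i) x (sum of sizes in column j).
The explicit double loop over atom pairs is kept here only so that a
different pair_cost (e.g. one that accounts for sparsity) could be
dropped in.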