[CP2K:29] latest modification to improve 2d particle distribution

Teodoro Laino teodor... at gmail.com
Wed May 2 06:51:10 CEST 2007

Here is more information (from Joost) about the latest update to the
2d particle distribution:

> Hi,
> as discussed previously, I've implemented a new distribution scheme for
> distribution_2d. It is now particle based by default and uses global
> optimization to find the best distribution. Surprisingly, no method
> (including KG) seems to depend on the distribution_2d being molecule
> based. So there is an input option to retain a molecular distribution,
> but it is basically not used. Let me know if there are problems I did
> not notice (e.g. very few properties are regtested, so these could be
> affected). This also means that one should now be able to generate
> molecules or a topology in QS runs without a performance penalty.
> The distribution_2d is now obtained by minimizing a cost model, using
> Monte Carlo based annealing to find the global minimum. The cost model
> basically needs to assign a cost to a given atom_pair ab, while the MC
> samples different local_rows/local_cols to minimize the maximum load
> over all CPUs.
> The good thing is that the MC seems to work very well in real life,
> i.e. something very good is found in a fraction of a second in most
> cases. In separate tests, this is even true in difficult cases (e.g.
> many CPUs and few atoms, or with cost models that include sparsity).
> However, the current implementation is quadratic in the number of atoms
> and CPUs, so I've added a skip_optimization option to work around
> problems you might experience if you're running with e.g. 10000s of
> atoms or 10000s of CPUs. The limiting factor of the current approach is
> the cost model. In particular, the current scheme of assigning to the
> pair ab a cost equal to the number of matrix elements of the block ab
> is not optimal. It neglects the sparsity of the overlap matrix and the
> fact that the cost depends much more on the composition of the basis
> than on the number of basis functions.
> Nevertheless, in the best case (32 H2O on 32 CPUs with
> TZV2P-MOLOPT-GTH) a speedup per SCF step of about 50% is observed,
> demonstrating the potential for load balancing. Normally, the speedup
> is far smaller, as the scalability of QS is rarely dominated by the
> load balance of the atomic blocks.
> Cheers,
> Joost
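
To make the idea in Joost's message concrete, here is a minimal Python
sketch of annealing a 2d distribution against a cost model. It is not the
CP2K implementation: the function names, the round-robin starting point,
the geometric cooling schedule and the single-atom move are all
assumptions made for illustration. The only parts taken from the message
are the cost of a pair ab (number of matrix elements of block ab) and the
objective (the maximum load over all CPUs).

```python
import math
import random


def block_cost(nbasis, a, b):
    """Cost of atom pair (a, b): number of matrix elements in block ab."""
    return nbasis[a] * nbasis[b]


def max_load(nbasis, row_of, col_of, nprow, npcol):
    """Largest summed block cost over the processes of an nprow x npcol grid."""
    load = [[0] * npcol for _ in range(nprow)]
    for a in range(len(nbasis)):
        for b in range(len(nbasis)):
            # Block ab lives on the process at (row of a, column of b).
            load[row_of[a]][col_of[b]] += block_cost(nbasis, a, b)
    return max(max(row) for row in load)


def anneal_distribution(nbasis, nprow, npcol, steps=2000,
                        t_start=1.0, t_end=1e-3, seed=0):
    """Hypothetical annealer: returns (best_max_load, row_of, col_of)."""
    rng = random.Random(seed)
    natom = len(nbasis)
    # Assumed starting point: round-robin assignment of atoms.
    row_of = [a % nprow for a in range(natom)]
    col_of = [a % npcol for a in range(natom)]
    current = max_load(nbasis, row_of, col_of, nprow, npcol)
    best = (current, row_of[:], col_of[:])
    scale = current  # normalise cost differences against the initial cost
    for step in range(steps):
        # Assumed geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Trial move: give one atom a new process row or process column.
        if rng.random() < 0.5:
            target, nproc = row_of, nprow
        else:
            target, nproc = col_of, npcol
        atom = rng.randrange(natom)
        old = target[atom]
        target[atom] = rng.randrange(nproc)
        trial = max_load(nbasis, row_of, col_of, nprow, npcol)
        delta = (trial - current) / scale
        # Metropolis criterion on the change in maximum load.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = trial
            if current < best[0]:
                best = (current, row_of[:], col_of[:])
        else:
            target[atom] = old  # reject: undo the move
    return best


if __name__ == "__main__":
    # Made-up system: alternating large and small atomic basis sets.
    nbasis = [40, 5, 40, 5, 40, 5, 40, 5]
    cost, rows, cols = anneal_distribution(nbasis, nprow=2, npcol=2)
    print("max load:", cost, "rows:", rows, "cols:", cols)
```

Recomputing the full load after every move is quadratic in the number of
atoms, which is fine for a sketch but would not scale to large systems. A
cost model that accounts for sparsity could be tried by replacing
block_cost alone.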
