cp2k speedup on multicore machines

Axel akoh... at gmail.com
Wed Jan 30 12:31:19 UTC 2008



On Jan 30, 6:15 am, cavallo <lcav... at unisa.it> wrote:
> Thanks Axel,
>
> Clearly something was wrong with my machine. The problem was related to
> something in mpich2. I rebuilt it from scratch and things are much
> better now.

> Using the inputs H2O-64.inp and H2O-256.inp from cp2k/tests/QS/benchmarks/
> as they are, I got the following execution times and speedups:
>
>             H2O-64.inp          H2O-256.inp
> cores      secs   speedup       secs   speedup
> 1 core     2347      1.00      27518      1.00
> 2 cores    1286      1.83      16526      1.66
> 4 cores     863      2.72      16311      1.68
>
> However, besides the MD steps, a lot of time is spent in the first 50
> SCF steps of the wavefunction optimization.
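
The speedup column in the table above is t(1 core) / t(n cores); dividing it by the number of cores gives the parallel efficiency, which makes the poor 4-core scaling of H2O-256.inp easier to see. A small Python sketch that recomputes both from the quoted wall-clock times (the function name `scaling` is just for illustration):

```python
# Wall-clock times in seconds, copied from the table above.
times = {
    "H2O-64.inp": {1: 2347, 2: 1286, 4: 863},
    "H2O-256.inp": {1: 27518, 2: 16526, 4: 16311},
}

def scaling(t):
    """Return {cores: (speedup, efficiency)} relative to the 1-core run."""
    t1 = t[1]
    return {n: (t1 / tn, t1 / tn / n) for n, tn in sorted(t.items())}

for name, t in times.items():
    for n, (s, e) in scaling(t).items():
        print(f"{name:12s} cores={n}  speedup={s:4.2f}  efficiency={e:5.1%}")
```

At 4 cores this gives roughly 68% efficiency for H2O-64.inp but only about 42% for H2O-256.inp. Recomputed speedups can differ from the table in the last digit because of rounding.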

this is just the way quickstep works. the very first initial guess is
not very good, but for each subsequent MD step the extrapolator uses
the wavefunctions from previous steps to provide a much better initial
scf guess. the main trick for efficient MD is to tailor the extrapolator
and the SCF convergence settings for speed, while still choosing them
so that you conserve energy.
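
To illustrate why a good extrapolated guess saves SCF iterations, here is a toy Python sketch. It is not cp2k's extrapolator (which works on wavefunctions or density matrices with more elaborate schemes); it assumes a scalar "converged quantity" that drifts smoothly along a hypothetical trajectory, predicts it with plain polynomial extrapolation from the last k steps, and uses a linearly convergent fixed-point loop (`toy_scf`, a made-up stand-in) to count iterations from different starting guesses.

```python
import math

def extrapolate(history):
    """Predict the next value from k equally spaced previous values
    (oldest first) with a degree-(k-1) polynomial fit.
    The weight for the value j steps back is (-1)**(j+1) * C(k, j)."""
    k = len(history)
    return sum((-1) ** (j + 1) * math.comb(k, j) * history[-j]
               for j in range(1, k + 1))

def toy_scf(target, guess, mix=0.5, eps=1e-8, max_iter=1000):
    """Toy stand-in for an SCF loop: the fixed-point iteration
    x <- (1 - mix) * x + mix * target. Returns the iteration count."""
    x = guess
    for it in range(1, max_iter + 1):
        x_new = (1 - mix) * x + mix * target
        if abs(x_new - x) < eps:
            return it
        x = x_new
    return max_iter

# Hypothetical trajectory: the converged quantity drifts smoothly with MD step t.
def converged(t):
    return math.sin(0.1 * t)

t = 10
history = [converged(s) for s in (t - 3, t - 2, t - 1)]
print("zero guess:      ", toy_scf(converged(t), 0.0))
print("previous step:   ", toy_scf(converged(t), history[-1]))
print("extrapolated (3):", toy_scf(converged(t), extrapolate(history)))
```

Because the toy loop halves its error each iteration, every factor of two gained in the starting guess saves one iteration. In a real MD run, higher-order extrapolation can also amplify noise from incompletely converged previous steps, which is one reason the extrapolator and convergence settings interact with energy conservation.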

> Since none of my execution times is close to yours, I wonder which
> test you ran. Can you post the input/recipe if it differs from the one
> in the cp2k test dir?

i was using the 32 water example from the same directory where
you took the 64 and 256 inputs from. i will run those if i get into
the office later.

> Besides speedup, I am also interested in absolute execution times,
> of course, and your tests are sort of a final target for me, since
> you tested on a machine extremely similar to the one I am testing now.
>
> Final question: which is better, AMD or Intel? Any experience with
> this?

there is no clear winner; it is the whole package that matters.
at high clock rates and core counts, memory bandwidth and latency
become more important than the cpu type and clock speed.
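
A deliberately crude, roofline-style model shows why bandwidth can matter more than cpu speed. This is an illustrative assumption, not a measurement of any real machine: the compute part of a step divides across cores, while the time spent moving data through a memory bus shared by all cores does not shrink, and the two are assumed to overlap perfectly. Once the memory term dominates, extra cores stop helping, which is one plausible reading of the flat 2-to-4-core numbers for H2O-256.inp.

```python
def step_time(cores, compute_1core, memory_time):
    """Crude bound: compute work splits evenly across cores, but memory
    traffic goes through a shared bus whose bandwidth does not grow with
    the core count. Assumes compute and memory transfers overlap perfectly."""
    return max(compute_1core / cores, memory_time)

def speedup(cores, compute_1core, memory_time):
    """Speedup over one core under the model above."""
    return (step_time(1, compute_1core, memory_time)
            / step_time(cores, compute_1core, memory_time))

# Made-up numbers in arbitrary units, not measurements.
for label, mem in [("compute-bound", 0.1), ("bandwidth-bound", 0.6)]:
    print(label, [round(speedup(n, 1.0, mem), 2) for n in (1, 2, 4)])
```

With these invented numbers, the compute-bound case scales as 1, 2, 4, while the bandwidth-bound case stalls at about 1.67 from 2 cores on.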

how much a lack of memory bandwidth affects you depends on your
specific job; large quickstep jobs are affected the most. right now,
you can get a pretty good deal on 45nm intel quad-core cpus.
amd cpus are by design less affected by memory bandwidth
restrictions, since each cpu has its own integrated memory controller,
but they _require_ working NUMA control (cpu and memory affinity) for
good performance. this becomes even more evident on four-way
(four-socket) machines.
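
As a rough sketch of what cpu and memory affinity means in practice, the Python below maps MPI ranks to cores on a hypothetical four-way node and pins a process to a core. The function names and the "compact"/"scatter" policy labels are invented for illustration, and the contiguous core numbering is an assumption; real numbering varies. On Linux, pinning before allocating matters because under the default first-touch policy a page is placed on the NUMA node of the cpu that first touches it. From the shell, `numactl --cpunodebind=N --membind=N` or `taskset -c` do similar jobs for a whole process.

```python
import os

def core_for_rank(rank, sockets, cores_per_socket, policy="scatter"):
    """Map an MPI rank to a core id on a hypothetical multi-socket node.

    Assumes socket s owns cores s*cores_per_socket .. (s+1)*cores_per_socket-1;
    real core numbering varies, so check it (e.g. with `numactl --hardware`).
    """
    if not 0 <= rank < sockets * cores_per_socket:
        raise ValueError("more ranks than cores")
    if policy == "compact":
        # fill socket 0 first, then socket 1, and so on
        return rank
    if policy == "scatter":
        # round-robin over sockets so every socket's memory controller is used
        return (rank % sockets) * cores_per_socket + rank // sockets
    raise ValueError(f"unknown policy: {policy}")

def pin_to_core(core):
    """Pin the calling process to one core (Linux only). Under the default
    first-touch policy, pages it touches afterwards are normally placed on
    that core's NUMA node. Returns False where affinity is unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, {core})
    return True

if __name__ == "__main__":
    # hypothetical four-way node with four cores per socket, eight ranks
    print("compact:", [core_for_rank(r, 4, 4, "compact") for r in range(8)])
    print("scatter:", [core_for_rank(r, 4, 4, "scatter") for r in range(8)])
```

For a bandwidth-limited job, spreading ranks over sockets ("scatter") puts every memory controller to work, while "compact" keeps communicating ranks close together; which one wins depends on the job.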

cheers,
    axel.

> Thanks again,
> Luigi


More information about the CP2K-user mailing list