cp2k speedup on multicore machines

Axel akoh... at gmail.com
Wed Jan 30 22:47:07 UTC 2008


for the sake of completeness, here is my data in the same style:

machine: 2x Intel(R) Xeon(R) CPU 5150  @ 2.66GHz (woodcrest)
cache size: 4096 KB

fedora core 6, kernel 2.6.22.14-72.fc6 #1 SMP
intel fortran 9.1.040 Build 20061101
intel mkl-9.0, fftw3, OpenMPI-1.2.1
optimization: -O2 -unroll -tpp6 -pc64

serial runs using: numactl --physcpubind=3 cp2k.sopt
single runs using: numactl --physcpubind=3 mpirun -np 1 cp2k.popt
dual-a runs using: numactl --physcpubind=0,1 bash ; mpirun -np 2 cp2k.popt
dual-b runs using: numactl --physcpubind=0,2 bash ; mpirun -np 2 cp2k.popt
quad   runs using: mpirun -np 4 cp2k.popt

OpenMPI with mpi_paffinity_alone = 1 in ~/.openmpi/mca-params.conf

# 32-water benchmark input: total wall time (s), scaling relative to single:
serial   985.73    1.03
single  1010.39    1.00
dual-a   550.87    1.83
dual-b   624.90    1.62
quad     362.83    2.78

# 64-water benchmark input: total wall time (s), scaling relative to single:
serial  2527.28    1.04
single  2616.59    1.00
dual-a  1402.91    1.86
quad     940.68    2.78

so scaling is quite similar.
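
for reference, the scaling column in the tables above appears to be the single-process wall time divided by the wall time of each run. a minimal python sketch that recomputes it for the 32-water numbers; the function names and the extra efficiency column (speedup divided by the number of MPI processes) are illustrative additions, not part of the benchmark output:

```python
def speedup(reference_time, run_time):
    """Speedup of a run relative to the reference (single-process) run."""
    return reference_time / run_time


def efficiency(reference_time, run_time, nprocs):
    """Parallel efficiency: speedup divided by the number of MPI processes."""
    return speedup(reference_time, run_time) / nprocs


# 32-water wall times copied from the table above:
# (label, total wall time, number of MPI processes)
runs_32 = [
    ("single", 1010.39, 1),
    ("dual-a", 550.87, 2),
    ("dual-b", 624.90, 2),
    ("quad", 362.83, 4),
]

reference = runs_32[0][1]  # the "single" run is the 1.00 baseline
for label, wall, nprocs in runs_32:
    print(f"{label:8s} {wall:8.2f} {speedup(reference, wall):6.2f} "
          f"{efficiency(reference, wall, nprocs):6.2f}")
```

the efficiency column makes the drop from two to four processes easier to see than the raw speedup does.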

shawn, it would be great if you could check this on
your machine, and in particular use numactl to run
a -np 4 job so that each process sits on a different
dual-core die. i'm very curious to see whether you
already saturate the memory bandwidth within a single
node, in which case 4-way/node would be faster than
8-way/node.
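
to pick the cpu numbers for --physcpubind you first need to know which logical cpus belong to which package. a minimal python sketch, assuming the x86 linux /proc/cpuinfo layout with "processor" and "physical id" fields; the function name cpus_by_package is only illustrative, and it groups by socket only, so it does not distinguish separate dies inside one package:

```python
from collections import defaultdict


def cpus_by_package(cpuinfo_text):
    """Group logical CPU numbers by their 'physical id' (socket) field.

    Expects text in the format of /proc/cpuinfo on x86 Linux (assumption).
    Returns a dict mapping package id -> list of logical CPU numbers.
    """
    packages = defaultdict(list)
    cpu = None  # logical CPU number of the entry currently being read
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between CPU entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "physical id" and cpu is not None:
            packages[int(value)].append(cpu)
    return dict(packages)
```

calling cpus_by_package(open("/proc/cpuinfo").read()) gives one list of cpu numbers per socket; taking cpus from different lists spreads the processes across sockets.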

cheers,
   axel.

On Jan 30, 10:23 am, cavallo <lcav... at unisa.it> wrote:
> Dear all,
>
> these are the final benchmarks on the following machine:
> HP ProLiant DL140 with dual-core Intel Xeon CPU 5160 @ 3.00GHz.
> kernel 2.6.21-1.3194.fc7
> gcc version 4.1.2 20070925 (Red Hat 4.1.2-27)
> ifort Build 20070613 Package ID: l_fc_c_10.0.025
> intel mkl 10.0.1.014
> fftw-3.1.2
> mpich2-1.0.6p1
>
> These are with inputs from cp2k/tests/QS/benchmarks, no changes
>            h2o-32.inp       h2o-64.inp       h2o-256.inp
>            secs  speedup    secs  speedup     secs  speedup
> 1 core      908     1.00    2347     1.00    27518     1.00
> 2 cores     511     1.78    1286     1.83    16526     1.66
> 4 cores     329     2.76     863     2.72    16311     1.68
>
> These are the results for the 64w test with Teo's speedup tips (see
> above)
> 1 core     1229  1.00
> 2 cores     671  1.83
> 4 cores     663  1.85
>
> Over the next few days I'll try to run the same tests with OpenMPI.
> Ciao,
> Luigi

