cp2k speedup on multicore machines
Axel
akoh... at gmail.com
Wed Jan 30 22:47:07 UTC 2008
for the sake of completeness, here is my data in the same style:
machine: 2x Intel(R) Xeon(R) CPU 5150 @ 2.66GHz (woodcrest)
cache size: 4096 KB
fedora core 6, kernel 2.6.22.14-72.fc6 #1 SMP
intel fortran 9.1.040 Build 20061101
intel mkl-9.0, fftw3, OpenMPI-1.2.1
optimization: -O2 -unroll -tpp6 -pc64
serial runs using: numactl --physcpubind=3 cp2k.sopt
single runs using: numactl --physcpubind=3 mpirun -np 1 cp2k.popt
dual-a runs using: numactl --physcpubind=0,1 bash ; mpirun -np 2 cp2k.popt
dual-b runs using: numactl --physcpubind=0,2 bash ; mpirun -np 2 cp2k.popt
quad runs using: mpirun -np 4 cp2k.popt
OpenMPI with mpi_paffinity_alone = 1 in ~/.openmpi/mca-params.conf
# 32-water benchmark input: total wall time, scaling
serial    985.73   1.03
single   1010.39   1.00
dual-a    550.87   1.83
dual-b    624.90   1.62
quad      362.83   2.78
# 64-water benchmark input: total wall time, scaling
serial   2527.28   1.04
single   2616.59   1.00
dual-a   1402.91   1.86
quad      940.68   2.78
so scaling is quite similar.
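
The scaling column is the "single" (one MPI process) wall time divided by each run's wall time. A minimal sketch of that arithmetic for the 32-water numbers follows. The per-process efficiency column (scaling divided by process count) is an addition for illustration and not part of the tables above, and the names `RUNS`, `scaling` and `efficiency` are made up here.

```python
# Wall times copied from the 32-water table, paired with the number of
# processes used for each run (serial and single both use one).
RUNS = {
    "serial": (985.73, 1),
    "single": (1010.39, 1),
    "dual-a": (550.87, 2),
    "dual-b": (624.90, 2),
    "quad": (362.83, 4),
}


def scaling(reference, wall_time):
    """Speedup relative to the reference run (the 'single' MPI run here)."""
    return reference / wall_time


def efficiency(reference, wall_time, nproc):
    """Speedup per process; 1.0 would be perfect scaling (illustrative extra)."""
    return scaling(reference, wall_time) / nproc


reference = RUNS["single"][0]
for name, (wall, nproc) in RUNS.items():
    s = scaling(reference, wall)
    e = efficiency(reference, wall, nproc)
    print(f"{name:7s} {wall:9.2f} {s:6.2f} {e:6.2f}")
```

With these numbers the efficiency is about 0.92 for dual-a and about 0.70 for quad, which is the drop that the memory bandwidth question below is about.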
shawn, it would be great if you could check this on
your machine, and in particular use numactl to run
a -np 4 job so that each process is on a different
dual-core die. i'm very curious to see whether a single
node already saturates the memory bandwidth, in which
case 4-way/node would be faster than 8-way/node.
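
One possible way to pick "one core per dual-core die" is sketched below, on the assumption that the two cores of a die are the ones sharing an L2 cache and that the kernel exposes this under /sys/devices/system/cpu/cpuN/cache/index2/shared_cpu_list (older kernels may not, and index2 is only usually the L2 cache). The helper names and the example core numbering are hypothetical, and how this interacts with OpenMPI's own affinity setting is not covered.

```python
from collections import defaultdict
from pathlib import Path


def read_l2_groups(sysfs="/sys/devices/system/cpu"):
    """Map cpu id -> text listing the cpus that share its (assumed) L2 cache."""
    groups = {}
    for cpu_dir in Path(sysfs).glob("cpu[0-9]*"):
        shared = cpu_dir / "cache" / "index2" / "shared_cpu_list"
        if shared.exists():
            groups[int(cpu_dir.name[3:])] = shared.read_text().strip()
    return groups


def one_core_per_group(groups):
    """Pick the lowest-numbered cpu out of every cache-sharing group."""
    by_group = defaultdict(list)
    for cpu, group in groups.items():
        by_group[group].append(cpu)
    return sorted(min(cpus) for cpus in by_group.values())


def numactl_command(cores, binary="cp2k.popt"):
    """Build a command line in the same form as the 'single' run above."""
    core_list = ",".join(str(c) for c in cores)
    return f"numactl --physcpubind={core_list} mpirun -np {len(cores)} {binary}"


# Hypothetical 8-core layout; the real numbering on a given machine may differ,
# so on the actual node use read_l2_groups() instead of this dictionary.
example = {0: "0-1", 1: "0-1", 2: "2-3", 3: "2-3",
           4: "4-5", 5: "4-5", 6: "6-7", 7: "6-7"}
print(numactl_command(one_core_per_group(example)))
```

It is worth checking the selected core list by hand before trusting the timings, since core numbering is not guaranteed to follow the die layout.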
cheers,
axel.
On Jan 30, 10:23 am, cavallo <lcav... at unisa.it> wrote:
> Dear all,
>
> these are the final benchmarks on the following machine.
> HP proliant dl140 with dual-core Intel Xeon CPU 5160 @ 3.00GHz .
> kernel 2.6.21-1.3194.fc7
> gcc version 4.1.2 20070925 (Red Hat 4.1.2-27)
> ifort Build 20070613 Package ID: l_fc_c_10.0.025
> intel mkl 10.0.1.014
> fftw-3.1.2
> mpich2-1.0.6p1
>
> These are with inputs from cp2k/tests/QS/benchmarks, no changes
>             h2o-32.inp     h2o-64.inp     h2o-256.inp
>             secs           secs           secs
> 1 cores      908  1.00     2347  1.00     27518  1.00
> 2 cores      511  1.78     1286  1.83     16526  1.66
> 4 cores      329  2.76      863  2.72     16311  1.68
>
> These are the results for the 64w test with Teo speedup tips (see
> above)
> 1 cores 1229 1.00
> 1 cores 671 1.83
> 1 cores 663 1.85
>
> During the next days I'll try to run the same tests with OpenMPI.
> Ciao,
> Luigi