> First of all setting OMP_NUM_THREADS=2 and running 1 MPI process per
> node makes performance worse (lines 2, 5 and 6) compared to 2 MPI

a proper hybrid OpenMP/MPI compile is tricky. you have
to compile all of cp2k with OpenMP support _and_ have a
multi-threaded FFT and BLAS/LAPACK. even then the gain
is probably largest once you have scaled out the MPI parallelism.

> processes per node (we have 1 dual core CPU per node).
> Is it normal? CPU time still doubles.

yep. OpenMP comes with a lot of overhead and i have the
feeling that with intel binaries the threads keep spinning
instead of sleeping until they are used.
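
if you want to test the spinning hypothesis, a minimal sketch of the
environment settings would be something like the following. this assumes
an OpenMP runtime that honors OMP_WAIT_POLICY (OpenMP 3.0 and later) and,
for intel binaries, the KMP_BLOCKTIME variable of the intel runtime.
i have not verified that this changes the CPU time reported for cp2k,
so treat it as an experiment, not a fix.

```shell
# assumption: one MPI process per node, two OpenMP threads per process
export OMP_NUM_THREADS=2

# ask idle threads to sleep instead of busy-waiting
# (standard OpenMP variable; older runtimes may ignore it)
export OMP_WAIT_POLICY=passive

# intel runtime only: time in ms a thread spins before sleeping
# after a parallel region; 0 means go to sleep right away
export KMP_BLOCKTIME=0

# then launch cp2k with your usual MPI launcher and input
```

sleeping threads take longer to wake up, so wall time may get slightly
worse even if the accounted CPU time goes down.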

> And is it a normal scaling behavior? ( 4.89 on 8 procs is not very
> good. I know the system is small but anyway).

please see previous discussions on scaling and dual-core performance.

what kind of interconnect do you have on your cluster?
cp2k is _very_ demanding in terms of both memory bandwidth
and communication bandwidth/latency.

> Thanks.

