Why is my cp2k.popt running much slower than cp2k.sopt?

Axel akoh... at gmail.com
Sat Jul 19 20:03:08 UTC 2008



On Jul 19, 3:34 pm, hawk2012 <hawk2... at gmail.com> wrote:
> Dear All:
>
> With help from this discussion group I successfully compiled both
> serial and parallel executables of cp2k with the g95 compiler and
> mpich1.2.6.
>
> However, with the same input file, running cp2k.popt on 4 CPUs takes
> much longer than running cp2k.sopt on 1 CPU.
> The attached file log.sopt is the output of cp2k.sopt with 1 CPU,
> while log.popt-4CPUs is the output of cp2k.popt with 4 CPUs.
> The job does appear to run in parallel on 4 CPUs: log.popt-4CPUs
> shows 4 process numbers, the total number of message passing
> processes is 4, and these are decomposed as 2x2 (number of processor
> rows 2, number of processor cols 2). When I ran 'top', I also saw
> four cp2k.popt processes actually running.
>
> It is so weird. Is this due to the special input file I used or
> something else?
> Could anyone take a look at these two output files and tell me what is
> the possible reason?


you are using MKL version 10.0 or later, right?

have a look at the summary of CPU time and ELAPSED time.
in your "serial" calculation, the CPU time is almost 4 times
your elapsed time. this usually happens when MKL is used
in multi-threaded mode (you are running on a quad-core node or
a two-way dual-core node, right?). since version 10, MKL multi-threads
by default across all available cpus. now if you switch to MPI,
MKL does not know that, and thus with -np 4 you are _still_ running
with 4 threads per MPI task, i.e. 16 threads altogether. that clogs
up your memory bus and drives up your run time.
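
the arithmetic behind this can be sketched in a few lines of Python.
this is only an illustration: the function names, the 1.5 threshold
and the sample timings are assumptions made up for the example, not
values taken from the attached logs.

```python
def total_threads(mpi_ranks, threads_per_rank):
    """Total compute threads started across all MPI ranks."""
    return mpi_ranks * threads_per_rank


def threads_per_core(mpi_ranks, threads_per_rank, cores):
    """Threads competing for each core; above 1.0 means oversubscription."""
    return total_threads(mpi_ranks, threads_per_rank) / cores


def looks_multithreaded(cpu_seconds, elapsed_seconds, threshold=1.5):
    """Heuristic: CPU time well above elapsed time suggests several threads.

    The threshold of 1.5 is an arbitrary assumption for this sketch.
    """
    return cpu_seconds / elapsed_seconds > threshold


cores = 4  # quad-core node, or a two-way dual-core node

# -np 4 with MKL still using 4 threads per MPI task
print(total_threads(4, 4))            # 16 threads altogether
print(threads_per_core(4, 4, cores))  # 4.0 threads per core

# -np 4 with one thread per MPI task
print(threads_per_core(4, 1, cores))  # 1.0 thread per core

# hypothetical timings: CPU time about 4 times the elapsed time
print(looks_multithreaded(400.0, 100.0))  # True
```

with one thread per MPI task the four tasks map onto the four cores
one-to-one, which is the situation a -np 4 run is meant to have.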

add to that the facts that a serial executable is a bit faster due to
the lack of parallel overhead and that the SMP performance of MPICH-1
is suboptimal, and your experience is completely understandable.

please read the MKL documentation and either set OMP_NUM_THREADS=1
in your environment or link with the sequential mkl libraries
explicitly.
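
as a minimal sketch of the first option, the Python snippet below
starts a child process with OMP_NUM_THREADS=1 in its environment. the
helper names are made up for this example, and the child is a small
Python probe standing in for the real job, since the exact launch
command depends on your installation. whether an MPI launcher forwards
the variable to every task depends on the launcher, so that is worth
checking separately.

```python
import os
import subprocess
import sys


def single_threaded_env(base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS set to 1."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = "1"
    return env


def run_single_threaded(command):
    """Run a command (given as a list) with OMP_NUM_THREADS=1."""
    return subprocess.run(
        command,
        env=single_threaded_env(),
        capture_output=True,
        text=True,
        check=True,
    )


# stand-in for the real job: a child that reports the value it sees
probe = [sys.executable, "-c",
         "import os; print(os.environ['OMP_NUM_THREADS'])"]
result = run_single_threaded(probe)
print(result.stdout.strip())  # prints 1
```

setting the variable in the shell before starting the job has the same
effect; the point is only that it must be present in the environment
of every process that loads MKL.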

this has been discussed in this group before. please check the
archives.

cheers,
   axel.



