Why is my cp2k.popt running much slower than cp2k.sopt?

hawk2012 hawk2... at gmail.com
Sun Jul 20 03:50:04 UTC 2008


No, I did not link the cp2k.popt executable against the Intel MKL library.
The libraries I used are listed in my Linux-x86-64-g95.popt file:
CC       = cc
CPP      =
FC       = /home/mpich.g95/bin/mpif90
LD       = /home/mpich.g95/bin/mpif90
AR       = ar -r
DFLAGS   = -D__G95 -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3 -D__LIBINT
CPPFLAGS =
FCFLAGS  = $(DFLAGS) -ffree-form -O2 -ffast-math -march=opteron -cpp -g
LDFLAGS  = $(FCFLAGS)
LIBS     = /home/scalapack/scalapack-1.8.0/libscalapack.a \
           /home/BLACS/LIB/blacsF77init_MPI-LINUX-0.a \
           /home/BLACS/LIB/blacs_MPI-LINUX-0.a \
           /home/BLACS/LIB/blacsCinit_MPI-LINUX-0.a \
           /home/lapack-3.1.1/lapack_LINUX.a \
           /home/GotoBLAS/libgoto.a \
           /home/fftw/lib/libfftw3.a \
           /home/libint/lib/libderiv.a \
           /home/libint/lib/libint.a \
           /usr/lib64/libstdc++.so.6 -lpthread

OBJECTS_ARCHITECTURE = machine_g95.o

On Jul 19, 3:03 pm, Axel <akoh... at gmail.com> wrote:
> On Jul 19, 3:34 pm, hawk2012 <hawk2... at gmail.com> wrote:
>
>
>
> > Dear All:
>
> > With the help from this discussion group I successfully compiled both
> > serial and parallel executables of cp2k with g95 compiler and
> > mpich1.2.6.
>
> > However, with the same input file, cp2k.popt with 4 CPUs took much
> > longer to run than cp2k.sopt with 1 CPU.
> > The attached file log.sopt is the output of cp2k.sopt with 1 CPU,
> > while log.popt-4CPUs is the output of cp2k.popt with 4 CPUs.
> > Judging from log.popt-4CPUs, the job really was running in parallel
> > on 4 CPUs: 4 process numbers are shown, and the total number of
> > message passing processes is also 4, decomposed as 2x2 (number of
> > processor rows 2, number of processor cols 2). When I ran 'top', I
> > saw four cp2k.popt processes actually running.
>
> > This is very strange. Is it due to the particular input file I used
> > or to something else?
> > Could anyone take a look at these two output files and tell me what
> > the likely reason is?
>
> you are using MKL version 10.0 or later, right?
>
> have a look at the summary of CPU time and ELAPSED time.
> in your "serial" calculation, the CPU time is almost 4 times
> your elapsed time. this usually happens when MKL is used
> in multi-threaded mode (you are running on a quad-core node or
> a two-way dual-core node, right?). since version 10, MKL multi-threads
> by default across all available cpus. now if you switch to MPI,
> MKL does not know that, and thus with -np 4 you are _still_ running
> with 4 threads per MPI task, i.e. 16 threads altogether. that clogs
> up your memory bus and slows down your computation.
>
> add to that the fact that a serial executable is a bit faster due
> to the lack of parallel overhead, and that the SMP performance of
> MPICH-1 is suboptimal, and your experience is completely understandable.
>
> please read the MKL documentation and either set OMP_NUM_THREADS=1
> in your environment or link with the sequential mkl libraries
> explicitly.
>
> this has been discussed in this group before. please check the
> archives.
>
> cheers,
>    axel.
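
The thread arithmetic in the quoted reply, and the suggested OMP_NUM_THREADS=1 setting, can be sketched as a small shell fragment. This is only an illustration: the core count and rank count are assumptions taken from the discussion (one 4-core node, -np 4), and the commented-out launch line uses a placeholder input file name that does not come from the original post.

```shell
#!/bin/sh
# Assumptions (not read from the logs): one node with 4 cores, 4 MPI ranks.
CORES=4
NP=4

# Situation described in the reply: each MPI rank's threaded library
# spawns one thread per core, so the node is oversubscribed.
DEFAULT_TOTAL=$((NP * CORES))

# Suggested setting: restrict every MPI rank to a single thread.
export OMP_NUM_THREADS=1
PINNED_TOTAL=$((NP * OMP_NUM_THREADS))

echo "threads without the setting: $DEFAULT_TOTAL"
echo "threads with OMP_NUM_THREADS=1: $PINNED_TOTAL"

# Hypothetical launch line; input.inp is a placeholder name.
# mpirun -np "$NP" ./cp2k.popt input.inp > log.popt-4CPUs
```

Whether the BLAS library linked in the arch file above was built with threading is not stated in this thread. If it was built serially, the setting should simply have no effect.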

