Why my cp2k.popt is running much slower than cp2k.sopt?

Axel akoh... at gmail.com
Mon Jul 21 20:11:03 CEST 2008



On Jul 19, 11:50 pm, hawk2012 <hawk2... at gmail.com> wrote:
> No, I did not use Intel MKL library to link the executable cp2k.popt.

but you use a threaded GOTO and the timings that you present
only make sense for multi-threaded use (cpu time higher than wall
time).
so please check how GOTO controls threading or link with a
non-threaded BLAS. in MKL you have to set OMP_NUM_THREADS=1 on all nodes.
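
a rough sketch of pinning the thread count before launching, assuming a
bash-like shell and assuming your GotoBLAS build honours GOTO_NUM_THREADS
(check its documentation; this is not verified for your build). the mpirun
line and the input file name are only placeholders, and with MPICH-1 the
variables may need to go into the shell startup file on every node:

```shell
# Assumption: the threaded BLAS reads these variables at startup.
# OMP_NUM_THREADS is the one named above for MKL; GOTO_NUM_THREADS is
# assumed to be the GotoBLAS equivalent.
export OMP_NUM_THREADS=1
export GOTO_NUM_THREADS=1

# Number of MPI ranks, as in the 4-CPU run discussed in this thread.
NP=4

# Total threads = MPI ranks x BLAS threads per rank.
# With 4 threads per rank this would be 16 on a 4-core node.
TOTAL_THREADS=$((NP * OMP_NUM_THREADS))
echo "ranks=$NP threads_per_rank=$OMP_NUM_THREADS total_threads=$TOTAL_THREADS"

# Placeholder launch line (input.inp is hypothetical):
# mpirun -np "$NP" ./cp2k.popt input.inp
```

if the cpu time reported at the end of the run is then close to the
elapsed time, the BLAS is no longer multi-threading behind your back.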

cheers,
   axel.

> The libraries I used can be shown in my Linux-x86-64-g95.popt file:
> CC       = cc
> CPP      =
> FC       = /home/mpich.g95/bin/mpif90
> LD       = /home/mpich.g95/bin/mpif90
> AR       = ar -r
> DFLAGS   = -D__G95 -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3 -D__LIBINT
> CPPFLAGS =
> FCFLAGS  = $(DFLAGS) -ffree-form -O2 -ffast-math -march=opteron -cpp -g
> LDFLAGS  = $(FCFLAGS)
> LIBS     = /home/scalapack/scalapack-1.8.0/libscalapack.a \
>            /home/BLACS/LIB/blacsF77init_MPI-LINUX-0.a \
>            /home/BLACS/LIB/blacs_MPI-LINUX-0.a \
>            /home/BLACS/LIB/blacsCinit_MPI-LINUX-0.a \
>            /home/lapack-3.1.1/lapack_LINUX.a \
>            /home/GotoBLAS/libgoto.a \
>            /home/fftw/lib/libfftw3.a \
>            /home/libint/lib/libderiv.a \
>            /home/libint/lib/libint.a \
>            /usr/lib64/libstdc++.so.6 -lpthread
>
> OBJECTS_ARCHITECTURE = machine_g95.o
>
> On Jul 19, 3:03 pm, Axel <akoh... at gmail.com> wrote:
>
> > On Jul 19, 3:34 pm, hawk2012 <hawk2... at gmail.com> wrote:
>
> > > Dear All:
>
> > > With the help from this discussion group I successfully compiled both
> > > serial and parallel executables of cp2k with g95 compiler and
> > > mpich1.2.6.
>
> > > However, with the same input file I found that it took much longer
> > > to run cp2k.popt with 4 CPUs than to run cp2k.sopt with 1
> > > CPU.
> > > Attached file log.sopt is the output file for cp2k.sopt with 1 CPU
> > > while log.popt-4CPUs is the output file for cp2k.popt with 4 CPUs.
> > > It looks like the job was really running in parallel with 4 CPUs from
> > > the output file log.popt-4CPUs because 4 process numbers were shown
> > > and Total number of message passing processes is also 4 which was
> > > decomposed as 2x2 with Number of processor rows 2 and Number of
> > > processor cols 2. When I typed command 'top', I really saw four
> > > cp2k.popt processes were actually running.
>
> > > It is so weird. Is this due to the special input file I used or
> > > something else?
> > > Could anyone take a look at these two output files and tell me what is
> > > the possible reason?
>
> > you are using MKL version 10.0 or later, right?
>
> > have a look at the summary of CPU time and ELAPSED time.
> > in your "serial" calculation, the CPU time is almost 4 times
> > of your elapsed time. this usually happens, when MKL is used
> > in multi-threaded mode (you are running on a quad-core node or
> > a two-way dual core node. right?). since version 10 MKL multi-threads
> > by default across all available cpus. now if you switch to MPI,
> > MKL does not know that and thus with -np 4 you are _still_ running
> > with 4 threads per MPI task, i.e. 16 threads altogether. that clogs
> > up your memory bus and brings down your performance.
>
> > add to that, that a serial executable is a bit faster due to lack
> > of parallel overhead and the fact that SMP performance of MPICH-1
> > is suboptimal and your experience is completely understandable.
>
> > please read the MKL documentation and either set OMP_NUM_THREADS=1
> > in your environment or link with the sequential mkl libraries
> > explicitly.
>
> > this has been discussed in this group before. please check the
> > archives.
>
> > cheers,
> >    axel.

