Why my cp2k.popt is running much slower than cp2k.sopt?

Axel akoh... at gmail.com
Mon Jul 21 20:41:13 UTC 2008


On Jul 21, 2:11 pm, Axel <akoh... at gmail.com> wrote:
> On Jul 19, 11:50 pm, hawk2012 <hawk2... at gmail.com> wrote:
>
> > No, I did not use Intel MKL library to link the executable cp2k.popt.
>
> but you use a threaded GOTO and the timings that you present
> only make sense for multi-threaded use (cpu time higher than wall
> time).
> so please check how GOTO controls threading or link with a
> non-threaded BLAS. in MKL you have to set OMP_NUM_THREADS=1 on all
> nodes.


to follow up on my own response: out of curiosity i looked up the
goto blas FAQ, and indeed it has the same (stupid IMNSHO) default
behavior of threading across all available local CPUs.

so people on multi-core machines or altixen, beware of threaded
BLASes and set OMP_NUM_THREADS=1 by default in your environment.
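
a minimal sketch of what that could look like in a bourne-type shell.
the helper total_threads is hypothetical (it is not part of cp2k, MPI
or any BLAS) and is only there to show the arithmetic; the input file
name in the commented mpirun line is a placeholder.

```shell
#!/bin/sh
# keep threaded BLAS libraries (GotoBLAS, MKL) at one thread per process.
# put this line in the shell startup file used on every node, so that
# MPI tasks started there see it as well.
export OMP_NUM_THREADS=1

# hypothetical helper: total number of compute threads for a given
# number of MPI tasks (tasks times BLAS threads per task).
total_threads() {
    ntasks=$1
    echo $(( ntasks * ${OMP_NUM_THREADS:-1} ))
}

# 4 MPI tasks with 1 BLAS thread each
echo "threads in use: $(total_threads 4)"

# then start the parallel job as usual, e.g. (input name is a placeholder):
#   mpirun -np 4 cp2k.popt input.inp > log.popt-4CPUs
```

with OMP_NUM_THREADS=4 the same helper gives 16 threads for 4 MPI tasks,
which is the oversubscription described in the quoted message below.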

cheers,
   axel.

>
> cheers,
>    axel.
>
> > The libraries I used can be shown in my Linux-x86-64-g95.popt file:
> > CC       = cc
> > CPP      =
> > FC       = /home/mpich.g95/bin/mpif90
> > LD       = /home/mpich.g95/bin/mpif90
> > AR       = ar -r
> > DFLAGS   = -D__G95 -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3 -D__LIBINT
> > CPPFLAGS =
> > FCFLAGS  = $(DFLAGS) -ffree-form -O2 -ffast-math -march=opteron -cpp -g
> > LDFLAGS  = $(FCFLAGS)
> > LIBS     = /home/scalapack/scalapack-1.8.0/libscalapack.a \
> >            /home/BLACS/LIB/blacsF77init_MPI-LINUX-0.a \
> >            /home/BLACS/LIB/blacs_MPI-LINUX-0.a \
> >            /home/BLACS/LIB/blacsCinit_MPI-LINUX-0.a \
> >            /home/lapack-3.1.1/lapack_LINUX.a \
> >            /home/GotoBLAS/libgoto.a \
> >            /home/fftw/lib/libfftw3.a \
> >            /home/libint/lib/libderiv.a \
> >            /home/libint/lib/libint.a \
> >            /usr/lib64/libstdc++.so.6 -lpthread
>
> > OBJECTS_ARCHITECTURE = machine_g95.o
>
> > On Jul 19, 3:03 pm, Axel <akoh... at gmail.com> wrote:
>
> > > On Jul 19, 3:34 pm, hawk2012 <hawk2... at gmail.com> wrote:
>
> > > > Dear All:
>
> > > > With the help from this discussion group I successfully compiled both
> > > > serial and parallel executables of cp2k with g95 compiler and
> > > > mpich1.2.6.
>
> > > > However, with the same input file I found that it took much longer
> > > > to run cp2k.popt with 4 CPUs than to run cp2k.sopt with 1 CPU.
> > > > The attached file log.sopt is the output of cp2k.sopt with 1 CPU,
> > > > while log.popt-4CPUs is the output of cp2k.popt with 4 CPUs.
> > > > According to log.popt-4CPUs, the job really was running in parallel
> > > > on 4 CPUs: 4 process numbers are shown, and the total number of
> > > > message passing processes is also 4, decomposed as 2x2 (number of
> > > > processor rows 2, number of processor cols 2). When I typed the
> > > > command 'top', I saw four cp2k.popt processes actually running.
>
> > > > It is so weird. Is this due to the special input file I used or
> > > > something else?
> > > > Could anyone take a look at these two output files and tell me what
> > > > the possible reason is?
>
> > > you are using MKL version 10.0 or later, right?
>
> > > have a look at the summary of CPU time and ELAPSED time.
> > > in your "serial" calculation, the CPU time is almost 4 times
> > > of your elapsed time. this usually happens, when MKL is used
> > > in multi-threaded mode (you are running on a quad-core node or
> > > a two-way dual core node. right?). since version 10 MKL multi-threads
> > > by default across all available cpus. now if you switch to MPI,
> > > MKL does not know that and thus with -np 4 you are _still_ running
> > > with 4 threads per MPI task, i.e. 16 threads altogether. that clogs
> > > up your memory bus and slows down your computation.
>
> > > add to that the fact that a serial executable is a bit faster due
> > > to the lack of parallel overhead, and that the SMP performance of
> > > MPICH-1 is suboptimal, and your experience is completely understandable.
>
> > > please read the MKL documentation and either set OMP_NUM_THREADS=1
> > > in your environment or link with the sequential mkl libraries
> > > explicitly.
>
> > > this has been discussed in this group before. please check the
> > > archives.
>
> > > cheers,
> > >    axel.
