MPI_wait problem in cp2k 4.1 with openmpi_2.0.0

jim wang jimw... at gmail.com
Tue Mar 28 11:25:56 UTC 2017


*Thanks for your reply!*

Here are two arch files for cp2k-2.1 and cp2k-4.1:

*(1) cp2k-2.1: With openMPI-1.6.5, compiled with 4 cores, popt version*
CC       = mpicc
CPP      =
FC       = mpif90
LD       = mpif90
AR       = ar -r
DFLAGS   = -D__INTEL -D__FFTSG -D__FFTW3 -D__parallel -D__BLACS -D__SCALAPACK -D__MKL
CPPFLAGS =

MKLROOT  = /public/software/compiler/intel/composer_xe_2015.2.164/mkl
INTEL_INC= /public/software/compiler/intel/composer_xe_2015.2.164/mkl/include
FFTW3_INC= /public/home/wj/Codes/fftw-3.3.4/include/
FCFLAGS  = $(DFLAGS) -I$(INTEL_INC) -I$(FFTW3_INC) -O2 -msse2 -heap-arrays 64 \
           -funroll-loops -fpp -free
FCFLAGS2 = $(DFLAGS) -I$(INTEL_INC) -I$(FFTW3_INC) -O1 -msse2 -heap-arrays 64 \
           -fpp -free
LDFLAGS  = $(FCFLAGS) -I$(INTEL_INC) -I$(FFTW3_INC)
LIBS     = /public/home/wj/Codes/fftw-3.3.4/lib/libfftw3.a \
           /public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_scalapack_lp64.a \
           -Wl,--start-group \
           /public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_intel_lp64.a \
           /public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_sequential.a \
           /public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_core.a \
           /public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_blacs_openmpi_lp64.a \
           -Wl,--end-group -lpthread

OBJECTS_ARCHITECTURE = machine_intel.o

graphcon.o: graphcon.F
        $(FC) -c $(FCFLAGS2) $<


*(2) cp2k-4.1: With openMPI-2.0.0, compiled with 4 cores, popt version*

CC       = icc
#CPP      = /lib/cpp
FC       = mpif90 -FR
FC_fixed = mpif90 -FI
LD       = mpif90
AR       = /usr/bin/ar -r

FFTW_INC=${MKLROOT}/include/fftw
INTEL_INC=${MKLROOT}/include
DFLAGS   = -D__INTEL -D__FFTW3 -D__MKL -D__parallel -D__BLACS -D__SCALAPACK
CPPFLAGS = -C $(DFLAGS) -P -traditional -I${FFTW_INC} -I${INTEL_INC}
FCFLAGS  = -O2 -pc64 -unroll -heap-arrays 64 -xHost -fpp -free \
           -I${FFTW_INC} -I${INTEL_INC}

LDFLAGS  = $(FCFLAGS) -L$(HOME)/lib -L${MKLROOT}/lib/intel64
LDFLAGS_C  = $(FCFLAGS) -L$(HOME)/lib -L${MKLROOT}/lib/intel64 -nofor_main
#If you want to use BLACS and SCALAPACK use the libraries below
LIBS     = /public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_scalapack_lp64.a \
           -Wl,--start-group \
           /public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_intel_lp64.a \
           /public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_sequential.a \
           /public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_core.a \
           /public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_blacs_openmpi_lp64.a \
           -Wl,--end-group -lpthread \
           ${MKLROOT}/interfaces/fftw3xf/libfftw3xf_intel.a
OBJECTS_ARCHITECTURE = machine_intel.o
graphcon.o: graphcon.F
        $(FC) -c $(FCFLAGS2) $<
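
In case it helps to reproduce the two builds: both binaries were built with
the standard CP2K make invocation using 4 processes, as noted above. A
minimal sketch, assuming each arch file is saved as arch/Linux-x86-64-intel.popt
in its source tree (the arch file name is just a placeholder of mine):

    # run from the directory holding CP2K's Makefile in these versions
    make -j 4 ARCH=Linux-x86-64-intel VERSION=popt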

Both jobs, with cp2k-2.1 and with cp2k-4.1, were carried out using 24 
processors (MPI ranks) on one node, without threading or OpenMP. cp2k-2.1 
was run with openMPI-1.6.5 and cp2k-4.1 with openMPI-2.0.0, matching the 
respective compilation environments.
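
For reference, the launch looked roughly like this (the input and output
file names are placeholders; each binary was run with the mpirun of the
Open MPI version it was built against):

    # 24 MPI ranks on a single node, no OpenMP threads (popt binaries)
    mpirun -np 24 ./cp2k.popt -i geo_opt.inp -o geo_opt.out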

The job is a geo_opt task for an amorphous solid system consisting of 216 
atoms. Are the logs you mentioned just the output files generated by cp2k?
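
If so, the MPI_wait figures quoted below were taken from the end-of-run
reports in those output files. A rough sketch of how I pulled them out,
assuming the routine names and the MESSAGE PASSING PERFORMANCE header appear
as in our outputs (they may differ between 2.1 and 4.1):

    # mp_wait* entries in CP2K's final TIMING report
    grep -i 'mp_wait' geo_opt.out
    # per-routine call counts, if the block is printed in this version
    grep -A 30 'MESSAGE PASSING PERFORMANCE' geo_opt.out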

On Tuesday, 28 March 2017 at 16:42:20 UTC+8, Alfio Lazzaro wrote:
>
> Hello,
> unfortunately, it is not easy to answer this question without knowing more 
> details...
> First of all, which input are you running? Could you attach it? How many 
> nodes, MPI ranks, and threads are you using, and which CP2K version (PSMP 
> or POPT)?
> I also assume that you are compiling the two CP2K versions with the same 
> setup, i.e. compile options and library versions...
> Could you attach the two logs? 
>
> The problem is that we should first understand where the MPI_wait calls 
> are used. Indeed, it can be that CP2K 4.1 is using more MPI_wait in other 
> places.
>
> Alfio
>
> On Monday, 27 March 2017 at 11:38:52 UTC+2, jim wang wrote:
>>
>> Hi, everybody!
>>
>> I am using cp2k 4.1 for testing on our new cluster. But strangely, the 
>> results showed that the cp2k 4.1 version is 3 to 4 times slower than the 
>> cp2k 2.1 version built on the same cluster. After examining the output 
>> files generated by both binaries running the same job, I found out that 
>> the MPI_wait function may be the key problem.
>>
>> Here is the time consumed by the MPI_wait function:
>> 1. cp2k 4.1: MPI_wait time: 1131 s, total run time: 1779 s
>> 2. cp2k 2.1: MPI_wait time: 68 s, total run time: 616 s
>>
>> How can I determine whether the problem lies with our cluster or with 
>> the compilation?
>> Hope you guys can give me some hints on the version comparison.
>>
>> THANKS!!!
>>
>