building hybrid MPI+OpenMP

Steve Schmerler elco... at
Mon Jul 8 17:09:46 CEST 2013


I'm trying to compile a hybrid MPI+OpenMP version with

* ifort 12.1
* mkl 10.3
* intel MPI 4.0.3  
* fftw 3.3.3 threaded
* scalapack from mkl

The arch file:

    CC       = cc
    CPP      = 
    FC       = mpiifort
    LD       = mpiifort
    AR       = ar -r
    DFLAGS   = -D__INTEL -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3
    FCFLAGS  = $(DFLAGS)  -O2 -free -heap-arrays 64 -funroll-loops -fpp -axAVX \
               -openmp -mt_mpi -I$(FFTW_INC)
    FCFLAGS2 = $(DFLAGS)  -O1 -free
    LIBS     = -lfftw3_threads -lfftw3 -liomp5 \
               -lmkl_blacs_intelmpi_lp64 -lmkl_scalapack_lp64 \
               -lmkl_intel_lp64 -lmkl_core -lmkl_sequential
    OBJECTS_ARCHITECTURE = machine_intel.o
    graphcon.o: graphcon.F
            $(FC) -c $(FCFLAGS2) $<
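
Two things worth double-checking in an arch file like this (a sketch, assuming CP2K's usual convention of passing LDFLAGS to the link step): the OpenMP and thread-safe MPI flags should also reach the linker, and the threaded FFTW library generally wants pthreads on the link line:

    # Sketch only: reuse the compile flags (-openmp, -mt_mpi) at link time
    # and add pthreads for fftw3_threads.
    LDFLAGS  = $(FCFLAGS)
    LIBS     = -lfftw3_threads -lfftw3 -liomp5 \
               -lmkl_blacs_intelmpi_lp64 -lmkl_scalapack_lp64 \
               -lmkl_intel_lp64 -lmkl_core -lmkl_sequential \
               -lpthread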

I see different errors, depending on which combination of MPI tasks and
threads is used:
* OMP_NUM_THREADS=1, mpirun -np 1 

    *** ERROR in cp_fm_syevd_base (MODULE cp_fm_diag) ***
    *** Matrix diagonalization failed ***
    *** Program stopped at line number 384 of MODULE cp_fm_diag ***
    ===== Routine Calling Stack ===== 
              10 cp_fm_syevd_base
               9 cp_fm_syevd
               8 cp_dbcsr_syevd
               7 subspace_eigenvalues_ks_dbcsr
               6 prepare_preconditioner
               5 init_scf_loop
               4 scf_env_do_scf
               3 qs_energies_scf
               2 qs_forces
               1 CP2K

* OMP_NUM_THREADS=1, mpirun -np 4

    MKL ERROR: Parameter 4 was incorrect on entry to DLASCL
    {    1,    1}:  On entry to 
    DSTEQR parameter number   -3 had an illegal value 
    MKL ERROR: Parameter 5 was incorrect on entry to DLASCL
    {    0,    0}:  On entry to 
    DSTEQR parameter number   -3 had an illegal value 

  I had this one before and the reason was that the input geometry + used basis
  caused NaNs which were apparently passed to a scalapack call. However, the
  input is OK and works with a pure-MPI build. Therefore, I guess that the OMP
  code calculates something wrong. Then in the case of 1 core, a serial lapack
  call fails, while in the parallel case, a scalapack call does.
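
To confirm whether NaNs are actually reaching the diagonalization, a standalone check along these lines (hypothetical debugging code, not part of CP2K) can be adapted and dropped in front of the cp_fm_syevd call:

    program nan_check
      use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, &
                                               ieee_quiet_nan
      implicit none
      real(kind=8) :: a(2,2)
      a = 1.0d0
      ! Inject a NaN here just to demonstrate the test.
      a(1,2) = ieee_value(a(1,2), ieee_quiet_nan)
      if (any(ieee_is_nan(a))) then
         write(*,*) 'NaN found in matrix before diagonalization'
      end if
    end program nan_check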

* OMP_NUM_THREADS=4, mpirun -np 1

    Output hangs at "Extrapolation method: initial_guess"; only one MPI task
    is running, but no threads.

I wanted to blame the MPI library, but Intel MPI says it supports
MPI_THREAD_FUNNELED. The same happens if I link only fftw3, not
fftw3_threads, so it's probably not FFTW either. So am I linking some
libraries incorrectly (in which case the problem is probably completely
trivial and I just don't see it)?
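
One way to rule the MPI library in or out is a minimal program that asks for the threading level directly, compiled with the same mpiifort -mt_mpi flags (a sketch, not CP2K code):

    program thread_check
      use mpi
      implicit none
      integer :: provided, ierr
      call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
      ! Intel MPI should report at least MPI_THREAD_FUNNELED when the
      ! thread-safe library (-mt_mpi) is actually linked.
      write(*,*) 'requested:', MPI_THREAD_FUNNELED, ' provided:', provided
      call MPI_Finalize(ierr)
    end program thread_check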

Thank you for your help!


Steve Schmerler
Institut für Theoretische Physik
TU Freiberg, Germany

More information about the CP2K-user mailing list