[CP2K-user] psmp runtime error with hybrid run combinations (mpi + openmp)

Shivarama Rao shivar... at gmail.com
Wed Apr 15 05:03:16 UTC 2020


Hi Alfio,

Thanks for looking into this. The following is the arch file I am using.

DFLAGS      = -D__F2008 -D__FFTW3 -D__LIBINT -D__LIBXC -D__MPI_VERSION=3\
              -D__LIBINT_MAX_AM=5 -D__LIBDERIV_MAX_AM1=4 -D__MAX_CONTR=4\
              -D__parallel -D__SCALAPACK

CPPFLAGS    = -fPIC
FCFLAGS     = $(DFLAGS) -O3 -ffree-form -fPIC\
              -fopenmp -mtune=native \
              -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC)
LDFLAGS     = $(FCFLAGS) -libverbs -fPIC

LIBS        = $(LIBSCALAPACK_LIB)/libscalapack.a\
              $(LIBLAPACK_LIB)/liblapack.a\
              $(LIBBLAS_LIB)/libblas.a\
              $(FFTW_LIB)/libfftw3.a\
              $(FFTW_LIB)/libfftw3_threads.a \
              $(LIBXC_LIB)/libxcf03.a\
              $(LIBXC_LIB)/libxc.a\
              $(LIBINT_LIB)/libderiv.a\
              $(LIBINT_LIB)/libint.a

Let me know if you find any issues with the above.
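
Regarding the sequential-BLAS suggestion quoted below, one rough way to check whether the static archives named in LIBS pull in a threading runtime is to scan their symbol tables. The sketch below is a heuristic only: the symbol prefixes are the usual GNU/LLVM OpenMP and pthread entry points, and it assumes the arch-file variables (LIBBLAS_LIB, LIBLAPACK_LIB, LIBSCALAPACK_LIB) are also exported as shell environment variables, which is an assumption and not something the arch file does by itself.

```shell
#!/bin/sh
# Heuristic sketch: classify an `nm` symbol listing read from stdin.
# A hit suggests the library references a threading runtime; no hit does
# not prove the library is thread-safe, only that these symbols are absent.
classify_symbols() {
    if grep -E -q 'GOMP_|__kmpc_|omp_get_num_threads|pthread_create'; then
        echo "threading symbols found"
    else
        echo "no threading symbols found"
    fi
}

# Assumption: the *_LIB variables from the arch file are set in the shell.
# Archives that cannot be found are silently skipped.
for lib in "${LIBBLAS_LIB:-.}/libblas.a" \
           "${LIBLAPACK_LIB:-.}/liblapack.a" \
           "${LIBSCALAPACK_LIB:-.}/libscalapack.a"; do
    [ -f "$lib" ] || continue
    printf '%s: ' "$lib"
    nm "$lib" 2>/dev/null | classify_symbols
done
```

If an archive reports threading symbols, that would be the first place to look when following up on the sequential-BLAS advice.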

Thanks,
Shivaram


On Tuesday, April 14, 2020 at 12:25:54 PM UTC+5:30, Alfio Lazzaro wrote:
>
> Hi,
> So, from what you are saying, the problem is the OpenMP parallelization (it 
> works without).
> However, the Cholesky decomposition is done by ScaLAPACK, so you should 
> check how you are linking to it. 
> In particular, CP2K suggests using a sequential BLAS.
>
> Could you share your arch file and how you are compiling CP2K (which 
> libraries)?
>
> Alfio
>
>
> On Monday, April 13, 2020 at 07:03:42 UTC+2, Shivarama Rao wrote:
>>
>> Hi,
>>
>> I am trying to debug an issue with CP2K 6.1. The popt run is fine, but 
>> there are errors with psmp runs.
>>
>>   Command line used:
>>
>>     mpirun -np 2 -x OMP_NUM_THREADS=4 ../../../exe/Linux-x86-64-aocc/cp2k.psmp ./H2O-32.inp
>>
>> The psmp executable runs fine with 1 MPI rank and 4 OpenMP threads, but it 
>> fails with 2 MPI ranks and 4 OpenMP threads. 
>>
>> The following error is generated with 2 MPI ranks and 4 OpenMP threads. 
>> The behavior is the same for all other input sets, such as H2O-64.inp, 
>> H2O-128.inp, H2O-256.inp, and H2O-1024.inp.
>>
>>
>>
>>  *******************************************************************************
>>  *   ___                                                                       *
>>  *  /   \                                                                      *
>>  * [ABORT]                                                                     *
>>  *  \___/         Cholesky decomposition failed. Matrix ill conditioned ?      *
>>  *    |                                                                        *
>>  *  O/|                                                                        *
>>  * /| |                                                                        *
>>  * / \                /home/amd/cp2k_aocc/cp2k-6.1/src/cp_dbcsr_cholesky.F:121 *
>>  *******************************************************************************
>>
>>
>>  ===== Routine Calling Stack =====
>>
>>            12 cp_dbcsr_cholesky_decompose
>>            11 qs_ot_get_derivative
>>            10 ot_mini
>>             9 ot_scf_mini
>>             8 qs_scf_loop_do_ot
>>             7 qs_scf_new_mos
>>             6 scf_env_do_scf_inner_loop
>>             5 scf_env_do_scf
>>             4 qs_energies
>>             3 qs_forces
>>             2 qs_mol_dyn_low
>>             1 CP2K
>>
>>  The following are the combinations where the executable works or does not work:
>>
>>  MPI processes   OpenMP threads   Result
>>  -------------   --------------   -------------
>>  1               1                works
>>  1               2                works
>>  1               4                works
>>  1               8                works
>>  1               16               works
>>  1               32               works
>>  1               64               works
>>  2               1                works
>>  2               2                works
>>  2               4                does not work
>>  2               8                does not work
>>  2               16               does not work
>>  2               32               does not work
>>  2               64               does not work
>>  4               1                works
>>  4               2                works
>>  4               4                does not work
>>  8               1                works
>>  8               2                works
>>  8               3                works
>>  8               4                does not work
>>  16              1                works
>>  16              2                works
>>  16              3                does not work
>>
>> What could be the reason for this behavior, and what would be the right 
>> way to debug this issue? I tried both Open MPI and MPICH, and both give 
>> similar results. The compiler is an in-house compiler.
>>
>> Thanks for your help,
>> Shivarama Rao
>>
>
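
For anyone wanting to reproduce the rank/thread grid from the original report, a small sweep can be generated as sketched below. The executable and input paths are copied from the command line quoted above; the particular rank and thread lists are an illustrative subset, not the exact set that was tested. The `-x` option is how Open MPI's mpirun exports an environment variable; other launchers use a different option.

```shell
#!/bin/sh
# Sketch: emit one mpirun command per (ranks, threads) pair.
# EXE and INP default to the paths from the quoted command line and can be
# overridden from the environment.
EXE=${EXE:-../../../exe/Linux-x86-64-aocc/cp2k.psmp}
INP=${INP:-./H2O-32.inp}

sweep_commands() {
    for ranks in 1 2 4 8 16; do
        for threads in 1 2 4; do
            echo "mpirun -np $ranks -x OMP_NUM_THREADS=$threads $EXE $INP"
        done
    done
}

# Dry run: only print the commands. Pipe the output to `sh` to execute them.
sweep_commands
```

Printing the commands first makes it easy to review the grid before spending time on the actual runs.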