[CP2K-user] psmp runtime error with hybrid run combinations (mpi + openmp)

Alfio Lazzaro alfio.... at gmail.com
Wed Apr 15 08:07:24 UTC 2020

Hmm, nothing looks wrong in the arch file...

Some suggestions:
1. Which version of the GCC compiler are you using (`gfortran --version`)?
2. Try to reduce the optimization from O3 to O2.
3. Which versions of ScaLAPACK/LAPACK/BLAS are you using? Could you use the 
CP2K toolchain to install these libraries?
4. CP2K 6.1 is now 2 years old. Is it possible for you to upgrade to 
7.1 (at least)?
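
Suggestions 1 and 2 can be tried without hand-editing the original arch file. What follows is a minimal shell sketch under stated assumptions, not a tested recipe: the helper name `lower_opt` is hypothetical, the arch file names are guessed from the `exe/Linux-x86-64-aocc/` path quoted in this thread, and the substitution only touches a literal `-O3`.

```shell
#!/usr/bin/env bash
# Sketch only: write a copy of a CP2K arch file with -O3 lowered to -O2.
# lower_opt is a hypothetical helper, not part of CP2K.
lower_opt() {
    # $1: source arch file, $2: destination arch file
    sed 's/-O3/-O2/g' "$1" > "$2"
}

# Example use (file names are assumptions, adjust to the real arch file):
#   gfortran --version    # or the version flag of the compiler actually in use
#   lower_opt arch/Linux-x86-64-aocc.psmp arch/Linux-x86-64-aocc-O2.psmp
#   make -j 8 ARCH=Linux-x86-64-aocc-O2 VERSION=psmp
```

Keeping the `-O2` build under a separate arch name leaves the original `-O3` executable in place, so the two can be compared on the same input.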

On Wednesday, 15 April 2020 07:03:16 UTC+2, Shivarama Rao wrote:
> Hi Alfio,
> Thanks for looking into this. The following is the arch file I am using.
> DFLAGS      = -D__F2008 -D__FFTW3 -D__LIBINT -D__LIBXC -D__MPI_VERSION=3\
>               -D__LIBINT_MAX_AM=5 -D__LIBDERIV_MAX_AM1=4 -D__MAX_CONTR=4\
>               -D__parallel -D__SCALAPACK
> FCFLAGS     = $(DFLAGS) -O3 -ffree-form -fPIC\
>               -fopenmp -mtune=native \
>               -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC)
> LDFLAGS     = $(FCFLAGS) -libverbs -fPIC
> LIBS        = $(LIBSCALAPACK_LIB)/libscalapack.a\
>               $(LIBLAPACK_LIB)/liblapack.a\
>               $(LIBBLAS_LIB)/libblas.a\
>               $(FFTW_LIB)/libfftw3.a\
>               $(FFTW_LIB)/libfftw3_threads.a \
>               $(LIBXC_LIB)/libxcf03.a\
>               $(LIBXC_LIB)/libxc.a\
>               $(LIBINT_LIB)/libderiv.a\
>               $(LIBINT_LIB)/libint.a
> Let me know if you find any issues with the above.
> Thanks,
> Shivaram
> On Tuesday, April 14, 2020 at 12:25:54 PM UTC+5:30, Alfio Lazzaro wrote:
>> Hi,
>> So, from what you are saying, the problem is the OpenMP parallelization 
>> (it works without it).
>> However, the Cholesky decomposition goes through ScaLAPACK, so you should 
>> check how you are linking to it. 
>> In particular, CP2K recommends using a sequential BLAS.
>> Could you share your arch file and how you are compiling CP2K (which 
>> libraries)?
>> Alfio
>> On Monday, 13 April 2020 07:03:42 UTC+2, Shivarama Rao wrote:
>>> Hi,
>>> I am trying to debug an issue with CP2K 6.1. The popt run is fine, but 
>>> the psmp runs produce errors.
>>> Command line used:
>>>     mpirun -np 2 -x OMP_NUM_THREADS=4 ../../../exe/Linux-x86-64-aocc/cp2k.psmp ./H2O-32.inp
>>> The psmp executable runs fine with 1 MPI rank and 4 OpenMP threads, but it 
>>> fails with 2 MPI ranks and 4 OpenMP threads. 
>>> The following error is generated with 2 MPI ranks and 4 OpenMP threads. 
>>> The behavior is the same for all other inputs, such as H2O-64.inp, 
>>> H2O-128.inp, H2O-256.inp, and H2O-1024.inp:
>>>  *******************************************************************************
>>>  *   ___                                                                       *
>>>  *  /                                                                          *
>>>  *[ABORT]                                                                      *
>>>  * ___/           Cholesky decomposition failed. Matrix ill conditioned ?      *
>>>  *  |                                                                          *
>>>  *O/|     /                                                                    *
>>>  *| |     /                                                                    *
>>>  *                    /home/amd/cp2k_aocc/cp2k-6.1/src/cp_dbcsr_cholesky.F:121 *
>>>  *******************************************************************************
>>>  ===== Routine Calling Stack =====
>>>            12 cp_dbcsr_cholesky_decompose
>>>            11 qs_ot_get_derivative
>>>            10 ot_mini
>>>             9 ot_scf_mini
>>>             8 qs_scf_loop_do_ot
>>>             7 qs_scf_new_mos
>>>             6 scf_env_do_scf_inner_loop
>>>             5 scf_env_do_scf
>>>             4 qs_energies
>>>             3 qs_forces
>>>             2 qs_mol_dyn_low
>>>             1 CP2K
>>> The following are the combinations where the executable works or fails:
>>>
>>> MPI processes | OpenMP threads | Result
>>> --------------+----------------+-------
>>>             1 |              1 | works
>>>             1 |              2 | works
>>>             1 |              4 | works
>>>             1 |              8 | works
>>>             1 |             16 | works
>>>             1 |             32 | works
>>>             1 |             64 | works
>>>             2 |              1 | works
>>>             2 |              2 | works
>>>             2 |              4 | fails
>>>             2 |              8 | fails
>>>             2 |             16 | fails
>>>             2 |             32 | fails
>>>             2 |             64 | fails
>>>             4 |              1 | works
>>>             4 |              2 | works
>>>             4 |              4 | fails
>>>             8 |              1 | works
>>>             8 |              2 | works
>>>             8 |              3 | works
>>>             8 |              4 | fails
>>>            16 |              1 | works
>>>            16 |              2 | works
>>>            16 |              3 | fails
>>>
>>> What might be the reason for this behavior, and what would be the 
>>> right way to debug this issue? I tried both Open MPI and MPICH, and 
>>> both give similar results. The compiler is an in-house compiler.
>>> Thanks for your help,
>>> Shivarama Rao
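
The rank/thread grid in the original report could be regenerated after each change (new optimization level, different libraries, newer CP2K) with a loop like the one below. This is only a sketch under assumptions: the helper names `classify_log` and `sweep`, the log file naming, and the `MPIRUN` override variable are hypothetical, and a run is counted as failed simply when the `[ABORT]` banner shown in the report appears in its output. The `-x` option is the Open MPI form used in the original command line; other launchers export environment variables differently.

```shell
#!/usr/bin/env bash
# Sketch only: rerun one CP2K input over a grid of MPI ranks x OpenMP threads
# and record whether each run hit the abort banner. Helper names are hypothetical.

# classify_log FILE: print "fails" if the CP2K [ABORT] banner is in FILE.
classify_log() {
    if grep -q '\[ABORT\]' "$1"; then
        echo "fails"
    else
        echo "works"
    fi
}

# sweep EXE INPUT: print one "ranks threads result" line per combination.
sweep() {
    local exe=$1 input=$2 ranks threads log
    for ranks in 1 2 4 8 16; do
        for threads in 1 2 3 4; do
            log="run_${ranks}x${threads}.log"
            # MPIRUN lets a different launcher be substituted (assumption).
            ${MPIRUN:-mpirun} -np "$ranks" -x OMP_NUM_THREADS="$threads" \
                "$exe" "$input" > "$log" 2>&1
            echo "$ranks $threads $(classify_log "$log")"
        done
    done
}

# Example use, with the paths from the original command line:
#   sweep ../../../exe/Linux-x86-64-aocc/cp2k.psmp ./H2O-32.inp
```

Keeping one log per combination makes it possible to compare the last SCF steps of a working and a failing run side by side.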