[CP2K-user] psmp runtime error with hybrid run combinations (mpi + openmp)

Alfio Lazzaro alfio.... at gmail.com
Wed Apr 15 08:07:24 UTC 2020


Hmm, nothing seems to be wrong in the arch file...

Some suggestions:
1. Which version of the GCC compiler are you using (`gfortran --version`)?
2. Try reducing the optimization level from -O3 to -O2.
3. Which versions of SCALAPACK/LAPACK/BLAS are you using? Could you use the 
CP2K toolchain to install these libraries?
4. CP2K 6.1 is now 2 years old. Is it possible for you to upgrade to at 
least 7.1?
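
For suggestions 1 and 2, a minimal shell sketch could look like the following. It assumes `gfortran` is the compiler on the PATH (with a different compiler the command will differ), and `lower_opt` is a hypothetical helper name used only for illustration, not part of CP2K.

```shell
#!/bin/sh
# Suggestion 1: report the Fortran compiler version, if gfortran is on PATH.
if command -v gfortran >/dev/null 2>&1; then
    gfortran --version | head -n 1
else
    echo "gfortran not found on PATH"
fi

# Suggestion 2: lower the optimization level from -O3 to -O2.
# lower_opt is a hypothetical helper; it reads arch-file text on stdin
# and writes the modified text to stdout.
lower_opt() {
    sed 's/-O3/-O2/g'
}

# Example on a line shaped like the FCFLAGS entry of the quoted arch file.
printf '%s\n' 'FCFLAGS     = $(DFLAGS) -O3 -ffree-form -fPIC' | lower_opt
```

To apply it to a real arch file, redirect the file into `lower_opt` and write the result to a new file, then rebuild CP2K with that arch file.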



On Wednesday, 15 April 2020 at 07:03:16 UTC+2, Shivarama Rao wrote:
>
> Hi Alfio,
>
> Thanks for looking into this. The following is the arch file I am using.
>
> DFLAGS      = -D__F2008 -D__FFTW3 -D__LIBINT -D__LIBXC -D__MPI_VERSION=3\
>               -D__LIBINT_MAX_AM=5 -D__LIBDERIV_MAX_AM1=4 -D__MAX_CONTR=4\
>               -D__parallel -D__SCALAPACK
>
> CPPFLAGS    = -fPIC
> FCFLAGS     = $(DFLAGS) -O3 -ffree-form -fPIC\
>               -fopenmp -mtune=native \
>               -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC)
> LDFLAGS     = $(FCFLAGS) -libverbs -fPIC
>
> LIBS        = $(LIBSCALAPACK_LIB)/libscalapack.a\
>               $(LIBLAPACK_LIB)/liblapack.a\
>               $(LIBBLAS_LIB)/libblas.a\
>               $(FFTW_LIB)/libfftw3.a\
>               $(FFTW_LIB)/libfftw3_threads.a \
>               $(LIBXC_LIB)/libxcf03.a\
>               $(LIBXC_LIB)/libxc.a\
>               $(LIBINT_LIB)/libderiv.a\
>               $(LIBINT_LIB)/libint.a
>
> Let me know if you find any issues with the above.
>
> Thanks,
> Shivaram
>
>
> On Tuesday, April 14, 2020 at 12:25:54 PM UTC+5:30, Alfio Lazzaro wrote:
>>
>> Hi,
>> So, from what you are saying, the problem is in the OpenMP parallelization 
>> (it works without it).
>> However, the Cholesky decomposition is performed by SCALAPACK, so you 
>> should check how you are linking to it. 
>> In particular, CP2K recommends using a sequential BLAS.
>>
>> Could you share your arch file and how you are compiling CP2K (which 
>> libraries)?
>>
>> Alfio
>>
>>
>> On Monday, 13 April 2020 at 07:03:42 UTC+2, Shivarama Rao wrote:
>>>
>>> Hi,
>>>
>>> I am trying to debug an issue with CP2K 6.1. The popt run is fine, but 
>>> the psmp runs fail with errors.
>>>
>>> Command line used:
>>>
>>>     mpirun -np 2 -x OMP_NUM_THREADS=4 ../../../exe/Linux-x86-64-aocc/cp2k.psmp ./H2O-32.inp
>>>
>>> The psmp executable runs fine with 1 MPI rank and 4 OpenMP threads, but it 
>>> fails with 2 MPI ranks and 4 OpenMP threads. 
>>>
>>> The following error is generated with 2 MPI ranks and 4 OpenMP threads. 
>>> The behavior is the same for the other input sets, such as H2O-64.inp, 
>>> H2O-128.inp, H2O-256.inp, and H2O-1024.inp.
>>>
>>>
>>>
>>>  *******************************************************************************
>>>  *   ___                                                                       *
>>>  *  /                                                                          *
>>>  *[ABORT]                                                                      *
>>>  * ___/           Cholesky decomposition failed. Matrix ill conditioned ?      *
>>>  *  |                                                                          *
>>>  *O/|     /                                                                    *
>>>  *| |     /                                                                    *
>>>  *                    /home/amd/cp2k_aocc/cp2k-6.1/src/cp_dbcsr_cholesky.F:121 *
>>>  *******************************************************************************
>>>
>>>
>>>  ===== Routine Calling Stack =====
>>>
>>>            12 cp_dbcsr_cholesky_decompose
>>>            11 qs_ot_get_derivative
>>>            10 ot_mini
>>>             9 ot_scf_mini
>>>             8 qs_scf_loop_do_ot
>>>             7 qs_scf_new_mos
>>>             6 scf_env_do_scf_inner_loop
>>>             5 scf_env_do_scf
>>>             4 qs_energies
>>>             3 qs_forces
>>>             2 qs_mol_dyn_low
>>>             1 CP2K
>>>
>>> The executable works or fails for the following combinations:
>>>
>>> MPI processes | OpenMP threads | Result
>>> --------------+----------------+-------
>>>             1 |              1 | works
>>>             1 |              2 | works
>>>             1 |              4 | works
>>>             1 |              8 | works
>>>             1 |             16 | works
>>>             1 |             32 | works
>>>             1 |             64 | works
>>>             2 |              1 | works
>>>             2 |              2 | works
>>>             2 |              4 | fails
>>>             2 |              8 | fails
>>>             2 |             16 | fails
>>>             2 |             32 | fails
>>>             2 |             64 | fails
>>>             4 |              1 | works
>>>             4 |              2 | works
>>>             4 |              4 | fails
>>>             8 |              1 | works
>>>             8 |              2 | works
>>>             8 |              3 | works
>>>             8 |              4 | fails
>>>            16 |              1 | works
>>>            16 |              2 | works
>>>            16 |              3 | fails
>>>
>>>
>>> What could be the reason for this behavior, and what would be the 
>>> right way to debug this issue? I tried with both Open MPI and MPICH, and 
>>> both give similar results. The compiler is an in-house compiler.
>>>
>>> Thanks for your help,
>>> Shivarama Rao
>>>
>>
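
To re-check the rank/thread combinations from the original report after each change (lower optimization, different libraries, newer CP2K), a sweep along these lines may help. This is only a sketch: the executable and input paths are taken from the command line in the report, `build_cmd` and `classify_log` are hypothetical helper names, and `-x` is the Open MPI way of exporting an environment variable, so other launchers will need a different option.

```shell
#!/bin/sh
# Paths taken from the command line in the original report; override as needed.
CP2K_EXE=${CP2K_EXE:-../../../exe/Linux-x86-64-aocc/cp2k.psmp}
CP2K_INP=${CP2K_INP:-./H2O-32.inp}

# Hypothetical helper: print the Open MPI command for one rank/thread pair.
build_cmd() {
    printf 'mpirun -np %s -x OMP_NUM_THREADS=%s %s %s\n' \
        "$1" "$2" "$CP2K_EXE" "$CP2K_INP"
}

# Hypothetical helper: read a CP2K log on stdin and report whether it
# contains the Cholesky abort seen in this thread. Other kinds of failure
# are not detected here.
classify_log() {
    if grep -q 'Cholesky decomposition failed'; then
        echo cholesky-abort
    else
        echo no-cholesky-abort
    fi
}

# Dry run by default: only print the commands. Set RUN=1 to execute them.
for ranks in 1 2 4; do
    for threads in 1 2 4; do
        cmd=$(build_cmd "$ranks" "$threads")
        if [ "${RUN:-0}" = 1 ]; then
            # $cmd is left unquoted on purpose so it splits into words;
            # this assumes the paths contain no spaces.
            result=$($cmd 2>&1 | classify_log)
            printf '%s ranks x %s threads: %s\n' "$ranks" "$threads" "$result"
        else
            echo "$cmd"
        fi
    done
done
```

Running it without `RUN=1` only prints the nine command lines, which is a cheap way to confirm the paths before starting the actual runs.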