[CP2K-user] psmp runtime error with hybrid run combinations (mpi + openmp)

Alfio Lazzaro alfio.... at gmail.com
Fri Apr 17 06:29:45 UTC 2020


OK, FLANG is not supported (see https://www.cp2k.org/dev:compiler_support ).
I'm surprised that you can even compile the code...

I have no idea how to debug it. I would start with a very small set-up 
(-O0 optimization, minimal libraries) and see if the problem goes away with 
that.
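
To check whether the problem goes away, the rank/thread combinations from the original report could be re-run automatically after each rebuild. Below is a minimal Python sketch, assuming the Open MPI style `mpirun -np N -x OMP_NUM_THREADS=T` invocation quoted further down. The helper names `build_command` and `sweep` are hypothetical, and treating a non-zero exit code or an `[ABORT]` banner in the output as a failure is an assumption, not something CP2K documents here.

```python
import subprocess


def build_command(ranks, threads, exe, inp):
    """Return the mpirun argument list for one rank/thread combination."""
    # "-x VAR=value" is the Open MPI syntax used in the original report;
    # other launchers (e.g. MPICH) export environment variables differently.
    return ["mpirun", "-np", str(ranks),
            "-x", f"OMP_NUM_THREADS={threads}", exe, inp]


def sweep(combos, exe, inp, runner=subprocess.run):
    """Run every (ranks, threads) pair and map it to "works" or "fails"."""
    results = {}
    for ranks, threads in combos:
        proc = runner(build_command(ranks, threads, exe, inp),
                      capture_output=True, text=True)
        # Assumption: a non-zero exit code or an [ABORT] banner means failure.
        failed = proc.returncode != 0 or "[ABORT]" in proc.stdout
        results[(ranks, threads)] = "fails" if failed else "works"
    return results
```

For example, `sweep([(1, 4), (2, 4)], exe, inp)` with the psmp executable and `./H2O-32.inp` would exercise one combination reported as working and one reported as failing. The `runner` parameter only exists so the sweep can be tried out without launching real jobs.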
I think you can add a set of prints of the values and see where the FLANG 
version of the code diverges from the GNU one (old-fashioned debugging), but 
I'm not sure you want to debug FLANG... why not use GNU?
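
For the print-based comparison, a small script can locate the first value at which two dumps disagree. The following Python sketch assumes that each build has written the values of interest (for example the entries of fm_matrix%local_data) as whitespace-separated reals to a text file. The function names and the tolerances are hypothetical choices, not part of CP2K.

```python
import math


def parse_values(text):
    """Parse whitespace-separated reals, accepting Fortran 'D' exponents."""
    # Fortran may print 1.0D+00, which Python's float() does not accept.
    return [float(tok.replace("D", "E").replace("d", "e"))
            for tok in text.split()]


def first_divergence(ref, other, rel_tol=1e-10, abs_tol=1e-14):
    """Return (index, ref_value, other_value) of the first mismatch.

    Returns None if both sequences agree within tolerance. If one sequence
    is a matching prefix of the other, the index of the first missing
    element is returned with both values set to None.
    """
    for i, (x, y) in enumerate(zip(ref, other)):
        # NaN never compares as close, so it is reported as a divergence.
        if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
            return i, x, y
    if len(ref) != len(other):
        return min(len(ref), len(other)), None, None
    return None
```

Reading the GNU dump and the FLANG dump with `parse_values(open(path).read())` and passing both lists to `first_divergence` gives the position where the two builds start to differ. The tolerances are deliberately tight and may need loosening, since different compilers legitimately differ in the last digits.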

Alfio



On Thursday, 16 April 2020 at 14:27:40 UTC+2, Shivarama Rao wrote:
>
> Hi,
>
> This run is with our internal compiler, which is flang-based. GNU is 
> working fine. 
>    
> We observe that the array fm_matrix%local_data (line 96 of 
> cp_dbcsr_cholesky.F) has different values when the number of threads is 
> changed. With GNU the values remain the same. Is there any advice on how to 
> find the root cause of the issue through tracing or debugging the code?
>
> Thanks,
> Shivaram
>
> On Wednesday, April 15, 2020 at 1:37:25 PM UTC+5:30, Alfio Lazzaro wrote:
>>
>> Uhm, nothing seems wrong in the arch file...
>>
>> Some suggestions:
>> 1. Which version of the GCC compiler are you using (`gfortran --version`)?
>> 2. Try to reduce the optimization from O3 to O2.
>> 3. Which version of SCALAPACK/LAPACK/BLAS are you using? Could you use 
>> the CP2K toolchain to install these libraries?
>> 4. CP2K 6.1 is now 2 years old; is it possible for you to upgrade to 
>> 7.1 (at least)?
>>
>>
>>
>> On Wednesday, 15 April 2020 at 07:03:16 UTC+2, Shivarama Rao wrote:
>>>
>>> Hi Alfio,
>>>
>>> Thanks for looking into this. The following is the arch file I am using.
>>>
>>> DFLAGS      = -D__F2008 -D__FFTW3 -D__LIBINT -D__LIBXC -D__MPI_VERSION=3\
>>>               -D__LIBINT_MAX_AM=5 -D__LIBDERIV_MAX_AM1=4 -D__MAX_CONTR=4\
>>>               -D__parallel -D__SCALAPACK
>>>
>>> CPPFLAGS    = -fPIC
>>> FCFLAGS     = $(DFLAGS) -O3 -ffree-form -fPIC\
>>>               -fopenmp -mtune=native \
>>>               -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC)
>>> LDFLAGS     = $(FCFLAGS) -libverbs -fPIC
>>>
>>> LIBS        = $(LIBSCALAPACK_LIB)/libscalapack.a\
>>>               $(LIBLAPACK_LIB)/liblapack.a\
>>>               $(LIBBLAS_LIB)/libblas.a\
>>>               $(FFTW_LIB)/libfftw3.a\
>>>               $(FFTW_LIB)/libfftw3_threads.a \
>>>               $(LIBXC_LIB)/libxcf03.a\
>>>               $(LIBXC_LIB)/libxc.a\
>>>               $(LIBINT_LIB)/libderiv.a\
>>>               $(LIBINT_LIB)/libint.a
>>>
>>> Let me know if you find any issues with the above.
>>>
>>> Thanks,
>>> Shivaram
>>>
>>>
>>> On Tuesday, April 14, 2020 at 12:25:54 PM UTC+5:30, Alfio Lazzaro wrote:
>>>>
>>>> Hi,
>>>> So, from what you are saying, the problem is the OpenMP parallelization 
>>>> (it works without).
>>>> However, Cholesky is SCALAPACK, so you should check how you are linking 
>>>> to it. 
>>>> In particular, CP2K suggests using sequential BLAS.
>>>>
>>>> Could you share your arch file and how you are compiling CP2K (which 
>>>> libraries)?
>>>>
>>>> Alfio
>>>>
>>>>
>>>> On Monday, 13 April 2020 at 07:03:42 UTC+2, Shivarama Rao wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to debug an issue with CP2K 6.1. The popt run is fine, but 
>>>>> there are errors with psmp runs.
>>>>>
>>>>>   Command line used:
>>>>>
>>>>>     mpirun -np 2 -x OMP_NUM_THREADS=4 ../../../exe/Linux-x86-64-aocc/cp2k.psmp ./H2O-32.inp
>>>>>
>>>>> The psmp executable runs fine with 1 MPI rank and 4 OpenMP threads, but 
>>>>> it fails with 2 MPI ranks and 4 OpenMP threads. 
>>>>>
>>>>> The following error is generated with 2 MPI ranks and 4 OpenMP threads. 
>>>>> The behavior is the same for all other input sets, such as H2O-64.inp, 
>>>>> H2O-128.inp, H2O-256.inp and H2O-1024.inp.
>>>>>
>>>>>
>>>>>
>>>>>  *******************************************************************************
>>>>>  *   ___                                                                       *
>>>>>  *  /                                                                          *
>>>>>  *[ABORT]                                                                      *
>>>>>  * ___/           Cholesky decomposition failed. Matrix ill conditioned ?      *
>>>>>  *  |                                                                          *
>>>>>  *O/|     /                                                                    *
>>>>>  *| |     /                                                                    *
>>>>>  *                    /home/amd/cp2k_aocc/cp2k-6.1/src/cp_dbcsr_cholesky.F:121 *
>>>>>  *******************************************************************************
>>>>>
>>>>>
>>>>>  ===== Routine Calling Stack =====
>>>>>
>>>>>            12 cp_dbcsr_cholesky_decompose
>>>>>            11 qs_ot_get_derivative
>>>>>            10 ot_mini
>>>>>             9 ot_scf_mini
>>>>>             8 qs_scf_loop_do_ot
>>>>>             7 qs_scf_new_mos
>>>>>             6 scf_env_do_scf_inner_loop
>>>>>             5 scf_env_do_scf
>>>>>             4 qs_energies
>>>>>             3 qs_forces
>>>>>             2 qs_mol_dyn_low
>>>>>             1 CP2K
>>>>>
>>>>>  The following are the combinations for which the executable works or 
>>>>>  fails:
>>>>>
>>>>>  MPI processes | OpenMP threads | Result
>>>>>  --------------+----------------+-------
>>>>>              1 |              1 | works
>>>>>              1 |              2 | works
>>>>>              1 |              4 | works
>>>>>              1 |              8 | works
>>>>>              1 |             16 | works
>>>>>              1 |             32 | works
>>>>>              1 |             64 | works
>>>>>              2 |              1 | works
>>>>>              2 |              2 | works
>>>>>              2 |              4 | fails
>>>>>              2 |              8 | fails
>>>>>              2 |             16 | fails
>>>>>              2 |             32 | fails
>>>>>              2 |             64 | fails
>>>>>              4 |              1 | works
>>>>>              4 |              2 | works
>>>>>              4 |              4 | fails
>>>>>              8 |              1 | works
>>>>>              8 |              2 | works
>>>>>              8 |              3 | works
>>>>>              8 |              4 | fails
>>>>>             16 |              1 | works
>>>>>             16 |              2 | works
>>>>>             16 |              3 | fails
>>>>>
>>>>>
>>>>> What could be the reason for this behavior, and what would be the 
>>>>> right way to debug this issue? I tried with both Open MPI and MPICH, and 
>>>>> both give similar results. The compiler is an in-house compiler.
>>>>>
>>>>> Thanks for your help,
>>>>> Shivarama Rao
>>>>>
>>>>