[CP2K-user] psmp runtime error with hybrid run combinations (mpi + openmp)
Shivarama Rao
shivar... at gmail.com
Thu Apr 16 12:27:40 UTC 2020
Hi,
This run uses our internal compiler, which is Flang-based. The GNU build
works fine.
We observe that the array fm_matrix%local_data (line 96 of
cp_dbcsr_cholesky.F) holds different values when the number of threads
changes. With GNU the values stay the same. Is there any information on how
to root-cause the issue by tracing or debugging a particular point in the code?
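One way to narrow this down, sketched here under assumptions: write the array to a text file from a run with each thread count, then compare the dumps offline. The dump format (whitespace-separated reals), the function names and the tolerances below are hypothetical and not part of CP2K. Differences at rounding level can appear when the thread count changes, because the order of floating-point summation can change, so only deviations above the tolerance are flagged.

```python
import math


def parse_dump(text):
    """Parse whitespace-separated reals, accepting Fortran 'D' exponents."""
    return [float(tok.replace("D", "E").replace("d", "e")) for tok in text.split()]


def compare_dumps(values_a, values_b, rel_tol=1e-10, abs_tol=1e-12):
    """Return (index of the first element outside tolerance or None,
    largest absolute difference) for two equal-length sequences."""
    if len(values_a) != len(values_b):
        raise ValueError("dumps have different lengths")
    first_bad = None
    max_diff = 0.0
    for i, (a, b) in enumerate(zip(values_a, values_b)):
        diff = abs(a - b)
        if diff > max_diff:
            max_diff = diff
        # Tolerances are assumptions; tune them to the expected rounding noise.
        if first_bad is None and not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            first_bad = i
    return first_bad, max_diff


def compare_files(path_a, path_b):
    """Compare two dump files, e.g. one written with 1 thread and one with 4."""
    with open(path_a) as fa, open(path_b) as fb:
        return compare_dumps(parse_dump(fa.read()), parse_dump(fb.read()))
```

Calling compare_files on the dumps from a 1-thread and a 4-thread run of the same MPI layout gives the first index at which the values diverge, which can then be traced back to the routine that filled that element.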
Thanks,
Shivaram
On Wednesday, April 15, 2020 at 1:37:25 PM UTC+5:30, Alfio Lazzaro wrote:
>
> Hmm, nothing seems wrong in the arch file...
>
> Some suggestions:
> 1. Which version of the GCC compiler are you using (`gfortran --version`)?
> 2. Try to reduce the optimization from O3 to O2.
> 3. Which version of SCALAPACK/LAPACK/BLAS are you using? Could you use the
> CP2K toolchain to install these libraries?
> 4. CP2K 6.1 is now 2 years old. Is it possible for you to upgrade to
> at least 7.1?
>
>
>
> On Wednesday, 15 April 2020 at 07:03:16 UTC+2, Shivarama Rao wrote:
>>
>> Hi Alfio,
>>
>> Thanks for looking into this. The following is the arch file I am using.
>>
>> DFLAGS = -D__F2008 -D__FFTW3 -D__LIBINT -D__LIBXC -D__MPI_VERSION=3\
>> -D__LIBINT_MAX_AM=5 -D__LIBDERIV_MAX_AM1=4 -D__MAX_CONTR=4\
>> -D__parallel -D__SCALAPACK
>>
>> CPPFLAGS = -fPIC
>> FCFLAGS = $(DFLAGS) -O3 -ffree-form -fPIC\
>> -fopenmp -mtune=native \
>> -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC)
>> LDFLAGS = $(FCFLAGS) -libverbs -fPIC
>>
>> LIBS = $(LIBSCALAPACK_LIB)/libscalapack.a\
>> $(LIBLAPACK_LIB)/liblapack.a\
>> $(LIBBLAS_LIB)/libblas.a\
>> $(FFTW_LIB)/libfftw3.a\
>> $(FFTW_LIB)/libfftw3_threads.a \
>> $(LIBXC_LIB)/libxcf03.a\
>> $(LIBXC_LIB)/libxc.a\
>> $(LIBINT_LIB)/libderiv.a\
>> $(LIBINT_LIB)/libint.a
>>
>> Let me know if you find any issues with the above.
>>
>> Thanks,
>> Shivaram
>>
>>
>> On Tuesday, April 14, 2020 at 12:25:54 PM UTC+5:30, Alfio Lazzaro wrote:
>>>
>>> Hi,
>>> So, from what you are saying, the problem is in the OpenMP parallelization
>>> (it works without it).
>>> However, the Cholesky decomposition is done by SCALAPACK, so you should
>>> check how you are linking to it.
>>> In particular, CP2K recommends using a sequential BLAS.
>>>
>>> Could you share your arch file and how you are compiling CP2K (which
>>> libraries)?
>>>
>>> Alfio
>>>
>>>
>>> On Monday, 13 April 2020 at 07:03:42 UTC+2, Shivarama Rao wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am trying to debug an issue with CP2K 6.1. The popt run is fine, but
>>>> the psmp runs produce errors.
>>>>
>>>> Command line used:
>>>>
>>>> mpirun -np 2 -x OMP_NUM_THREADS=4 ../../../exe/Linux-x86-64-aocc/cp2k.psmp ./H2O-32.inp
>>>>
>>>> The psmp executable runs fine with 1 MPI rank and 4 OpenMP threads, but it
>>>> fails with 2 MPI ranks and 4 OpenMP threads.
>>>>
>>>> The following error is generated with 2 MPI ranks and 4 OpenMP threads.
>>>> The behavior is the same for all other input sets, such as H2O-64.inp,
>>>> H2O-128.inp, H2O-256.inp, and H2O-1024.inp.
>>>>
>>>>
>>>>
>>>> *******************************************************************************
>>>> * [ABORT]
>>>> * Cholesky decomposition failed. Matrix ill conditioned ?
>>>> * /home/amd/cp2k_aocc/cp2k-6.1/src/cp_dbcsr_cholesky.F:121
>>>> *******************************************************************************
>>>>
>>>>
>>>> ===== Routine Calling Stack =====
>>>>
>>>> 12 cp_dbcsr_cholesky_decompose
>>>> 11 qs_ot_get_derivative
>>>> 10 ot_mini
>>>> 9 ot_scf_mini
>>>> 8 qs_scf_loop_do_ot
>>>> 7 qs_scf_new_mos
>>>> 6 scf_env_do_scf_inner_loop
>>>> 5 scf_env_do_scf
>>>> 4 qs_energies
>>>> 3 qs_forces
>>>> 2 qs_mol_dyn_low
>>>> 1 CP2K
>>>>
>>>> The following are the combinations for which the executable works or does
>>>> not work:
>>>>
>>>> MPI processes | OpenMP threads | Result
>>>> --------------+----------------+--------------
>>>>             1 |              1 | works
>>>>             1 |              2 | works
>>>>             1 |              4 | works
>>>>             1 |              8 | works
>>>>             1 |             16 | works
>>>>             1 |             32 | works
>>>>             1 |             64 | works
>>>>             2 |              1 | works
>>>>             2 |              2 | works
>>>>             2 |              4 | does not work
>>>>             2 |              8 | does not work
>>>>             2 |             16 | does not work
>>>>             2 |             32 | does not work
>>>>             2 |             64 | does not work
>>>>             4 |              1 | works
>>>>             4 |              2 | works
>>>>             4 |              4 | does not work
>>>>             8 |              1 | works
>>>>             8 |              2 | works
>>>>             8 |              3 | works
>>>>             8 |              4 | does not work
>>>>            16 |              1 | works
>>>>            16 |              2 | works
>>>>            16 |              3 | does not work
>>>>
>>>> What could be the reason for this behavior, and what would be the
>>>> right way to debug this issue? I tried both Open MPI and MPICH, and
>>>> both give similar results. The compiler is an in-house compiler.
>>>>
>>>> Thanks for your help,
>>>> Shivarama Rao
>>>>
>>>
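The works/does-not-work matrix quoted above could be regenerated after each change to the build with a small driver along the following lines. This is only a sketch under assumptions: the function names are hypothetical, `-x` is taken from the command line in the original post (it is the Open MPI option for exporting an environment variable, so another launcher would need a different flag), and a run is counted as failed if mpirun returns a non-zero exit status or the output contains the `[ABORT]` marker from the banner.

```python
import itertools
import subprocess


def build_command(ranks, threads, exe, inp):
    """Build the mpirun command line in the form used in the original post."""
    return ["mpirun", "-np", str(ranks), "-x", f"OMP_NUM_THREADS={threads}", exe, inp]


def sweep(rank_counts, thread_counts):
    """Return every (ranks, threads) combination to try, ranks varying slowest."""
    return list(itertools.product(rank_counts, thread_counts))


def run_case(ranks, threads, exe, inp):
    """Run one combination; True means no failure was detected.

    Assumption: a failed run shows up as a non-zero exit status or as
    an [ABORT] banner in the combined stdout/stderr.
    """
    result = subprocess.run(
        build_command(ranks, threads, exe, inp),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode == 0 and "[ABORT]" not in result.stdout


def run_sweep(rank_counts, thread_counts, exe, inp):
    """Run all combinations and print one line per case."""
    for ranks, threads in sweep(rank_counts, thread_counts):
        ok = run_case(ranks, threads, exe, inp)
        print(f"{ranks:>3} ranks x {threads:>3} threads: {'works' if ok else 'does not work'}")
```

For example, run_sweep([1, 2, 4], [1, 2, 4], "../../../exe/Linux-x86-64-aocc/cp2k.psmp", "./H2O-32.inp") would cover the smallest failing cases from the table.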