[CP2K-user] psmp runtime error with hybrid run combinations (mpi + openmp)
Alfio Lazzaro
alfio.... at gmail.com
Wed Apr 15 08:07:24 UTC 2020
Hmm, nothing looks wrong in the arch file...
Some suggestions:
1. Which version of the GCC compiler are you using (`gfortran --version`)?
2. Try reducing the optimization level from -O3 to -O2.
3. Which versions of SCALAPACK/LAPACK/BLAS are you using? Could you use the
CP2K toolchain to install these libraries?
4. CP2K 6.1 is now 2 years old. Is it possible for you to upgrade to
7.1 (at least)?
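
After each change suggested above, the rank/thread combinations from the original report have to be re-run. A minimal sketch of how that sweep could be automated follows. The function names `build_command` and `sweep` are hypothetical, the `-x` option is the Open MPI syntax used in the report (other launchers export environment variables differently), and treating a non-zero exit status or an `[ABORT]` line in the output as a failure is an assumption, not documented CP2K behavior.

```python
import subprocess


def build_command(ranks, threads, exe, inp):
    """Return the mpirun argument list for one ranks x threads combination."""
    # "-x VAR=value" exports the variable to every rank (Open MPI syntax).
    return [
        "mpirun", "-np", str(ranks),
        "-x", f"OMP_NUM_THREADS={threads}",
        exe, inp,
    ]


def sweep(combos, exe, inp, runner=subprocess.run):
    """Run every (ranks, threads) pair and label it 'works' or 'fails'."""
    results = {}
    for ranks, threads in combos:
        proc = runner(
            build_command(ranks, threads, exe, inp),
            capture_output=True,
            text=True,
        )
        # Assumption: a failed run exits non-zero or prints the [ABORT] banner.
        failed = proc.returncode != 0 or "[ABORT]" in proc.stdout
        results[(ranks, threads)] = "fails" if failed else "works"
    return results
```

For example, `sweep([(2, 2), (2, 4)], "../../../exe/Linux-x86-64-aocc/cp2k.psmp", "./H2O-32.inp")` would launch the two runs in turn and return a dictionary keyed by `(ranks, threads)`. The `runner` parameter exists only so the function can be exercised without an MPI installation.
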
On Wednesday, 15 April 2020 at 07:03:16 UTC+2, Shivarama Rao wrote:
>
> Hi Alfio,
>
> Thanks for looking into this. The following is the arch file I am using.
>
> DFLAGS = -D__F2008 -D__FFTW3 -D__LIBINT -D__LIBXC -D__MPI_VERSION=3\
> -D__LIBINT_MAX_AM=5 -D__LIBDERIV_MAX_AM1=4 -D__MAX_CONTR=4\
> -D__parallel -D__SCALAPACK
>
> CPPFLAGS = -fPIC
> FCFLAGS = $(DFLAGS) -O3 -ffree-form -fPIC\
> -fopenmp -mtune=native \
> -I$(FFTW_INC) -I$(LIBINT_INC) -I$(LIBXC_INC)
> LDFLAGS = $(FCFLAGS) -libverbs -fPIC
>
> LIBS = $(LIBSCALAPACK_LIB)/libscalapack.a\
> $(LIBLAPACK_LIB)/liblapack.a\
> $(LIBBLAS_LIB)/libblas.a\
> $(FFTW_LIB)/libfftw3.a\
> $(FFTW_LIB)/libfftw3_threads.a \
> $(LIBXC_LIB)/libxcf03.a\
> $(LIBXC_LIB)/libxc.a\
> $(LIBINT_LIB)/libderiv.a\
> $(LIBINT_LIB)/libint.a
>
> Let me know if you find any issues with the above.
>
> Thanks,
> Shivaram
>
>
> On Tuesday, April 14, 2020 at 12:25:54 PM UTC+5:30, Alfio Lazzaro wrote:
>>
>> Hi,
>> So, from what you are saying, the problem is in the OpenMP parallelization
>> (it works without it).
>> However, the Cholesky decomposition is done by SCALAPACK, so you should
>> check how you are linking to it.
>> In particular, CP2K recommends using a sequential BLAS.
>>
>> Could you share your arch file and how you are compiling CP2K (which
>> libraries)?
>>
>> Alfio
>>
>>
>> On Monday, 13 April 2020 at 07:03:42 UTC+2, Shivarama Rao wrote:
>>>
>>> Hi,
>>>
>>> I am trying to debug an issue with CP2K 6.1. The popt run is fine, but
>>> there are errors with psmp runs.
>>>
>>> Command line used:
>>>
>>> mpirun -np 2 -x OMP_NUM_THREADS=4 ../../../exe/Linux-x86-64-aocc/cp2k.psmp ./H2O-32.inp
>>>
>>> The psmp executable runs fine with 1 MPI rank and 4 OpenMP threads, but it
>>> fails with 2 MPI ranks and 4 OpenMP threads.
>>>
>>> The following error is generated with 2 MPI ranks and 4 OpenMP threads.
>>> The behavior is the same for all other input sets: H2O-64.inp, H2O-128.inp,
>>> H2O-256.inp, H2O-1024.inp.
>>>
>>>
>>>
>>> *******************************************************************************
>>> * [ABORT]
>>> * Cholesky decomposition failed. Matrix ill conditioned ?
>>> * /home/amd/cp2k_aocc/cp2k-6.1/src/cp_dbcsr_cholesky.F:121
>>> *******************************************************************************
>>>
>>>
>>> ===== Routine Calling Stack =====
>>>
>>> 12 cp_dbcsr_cholesky_decompose
>>> 11 qs_ot_get_derivative
>>> 10 ot_mini
>>> 9 ot_scf_mini
>>> 8 qs_scf_loop_do_ot
>>> 7 qs_scf_new_mos
>>> 6 scf_env_do_scf_inner_loop
>>> 5 scf_env_do_scf
>>> 4 qs_energies
>>> 3 qs_forces
>>> 2 qs_mol_dyn_low
>>> 1 CP2K
>>>
>>> The following are the combinations where the executable works or does not work:
>>>
>>>
>>>
>>> MPI processes   OpenMP threads   Result
>>> -------------   --------------   ------
>>>  1               1               works
>>>  1               2               works
>>>  1               4               works
>>>  1               8               works
>>>  1              16               works
>>>  1              32               works
>>>  1              64               works
>>>  2               1               works
>>>  2               2               works
>>>  2               4               fails
>>>  2               8               fails
>>>  2              16               fails
>>>  2              32               fails
>>>  2              64               fails
>>>  4               1               works
>>>  4               2               works
>>>  4               4               fails
>>>  8               1               works
>>>  8               2               works
>>>  8               3               works
>>>  8               4               fails
>>> 16               1               works
>>> 16               2               works
>>> 16               3               fails
>>>
>>>
>>>
>>> What might be the reason for this behavior, and what would be the
>>> right way to debug this issue? I tried with both Open MPI and MPICH, and
>>> both give similar results. The compiler is an in-house compiler.
>>>
>>> Thanks for your help,
>>> Shivarama Rao
>>>
>>
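
The combinations reported in the original message can be tabulated to look for a pattern. The sketch below copies the (ranks, threads, outcome) triples from that report and finds, for each MPI rank count, the smallest OpenMP thread count that failed. The names `RESULTS` and `first_failing_threads` are hypothetical, and the summary describes only the runs that were reported. It does not identify a cause.

```python
# (MPI ranks, OpenMP threads, worked) as listed in the original report.
RESULTS = [
    (1, 1, True), (1, 2, True), (1, 4, True), (1, 8, True),
    (1, 16, True), (1, 32, True), (1, 64, True),
    (2, 1, True), (2, 2, True), (2, 4, False), (2, 8, False),
    (2, 16, False), (2, 32, False), (2, 64, False),
    (4, 1, True), (4, 2, True), (4, 4, False),
    (8, 1, True), (8, 2, True), (8, 3, True), (8, 4, False),
    (16, 1, True), (16, 2, True), (16, 3, False),
]


def first_failing_threads(results):
    """Map each rank count to its smallest failing thread count.

    A value of None means no failure was reported for that rank count.
    """
    summary = {}
    for ranks, threads, worked in results:
        summary.setdefault(ranks, None)
        if not worked:
            current = summary[ranks]
            if current is None or threads < current:
                summary[ranks] = threads
    return summary


print(first_failing_threads(RESULTS))
```

On the reported data this prints `{1: None, 2: 4, 4: 4, 8: 4, 16: 3}`: no single-rank run failed, and every reported failure used at least 2 ranks together with at least 3 threads.
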