[CP2K-user] [CP2K:18656] Re: Install issues with IBM Power9 processors with Nvidia V100 GPU
Alfio Lazzaro
alfio.lazzaro at gmail.com
Wed Apr 12 08:03:56 UTC 2023
I'm sorry, I realize I was not clear in my previous message: the error
you see is not related to CP2K; it is a COSMA error. In short, you cannot
use the toolchain to install COSMA here. You have to install COSMA
yourself and investigate where the problem is. The installation
instructions for COSMA are at https://github.com/eth-cscs/COSMA.
I do the following:
Run the toolchain without COSMA.
Source the install/setup.
cosma_ver=2.6.5
wget https://github.com/eth-cscs/COSMA/releases/download/v${cosma_ver}/COSMA-v${cosma_ver}.tar.gz
tar xf COSMA-v${cosma_ver}.tar.gz && rm COSMA-v${cosma_ver}.tar.gz
cd COSMA-v${cosma_ver}
mkdir build && cd build
mkdir install
cmake -DCMAKE_INSTALL_PREFIX=${PWD}/install -DCOSMA_BLAS=CUDA \
  -DCOSMA_SCALAPACK=OPENBLAS -DCOSMA_WITH_TESTS=NO -DCOSMA_WITH_BENCHMARKS=NO \
  -DCMAKE_CXX_COMPILER=mpic++ -DCOSMA_WITH_APPS=NO -DCOSMA_WITH_PROFILING=NO \
  -DBUILD_SHARED_LIBS=NO ..
make && make install
You can reuse the ScaLAPACK installation from the toolchain.
Note that I'm building with CUDA here; my initial suggestion is to try a
CPU-only build first (i.e. -DCOSMA_BLAS=OPENBLAS).
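As a sketch of that CPU-first approach: the two configurations differ only
in the value passed to -DCOSMA_BLAS. The helper below just prints the cmake
arguments for a chosen backend so they can be inspected before running
cmake. The function name cosma_cmake_args is hypothetical (it is not part of
COSMA or the toolchain), and it only handles the two backends discussed
here; all flags are the ones from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cmake arguments used above for a chosen
# COSMA_BLAS backend (CUDA for the GPU build, OPENBLAS for the CPU-only test).
cosma_cmake_args() {
    local backend="$1"   # CUDA or OPENBLAS
    local prefix="$2"    # install prefix, e.g. ${PWD}/install
    case "$backend" in
        CUDA|OPENBLAS) ;;
        *) echo "unsupported backend in this sketch: $backend" >&2; return 1 ;;
    esac
    # One argument per line, so the list is easy to read or diff.
    printf '%s\n' \
        "-DCMAKE_INSTALL_PREFIX=${prefix}" \
        "-DCOSMA_BLAS=${backend}" \
        "-DCOSMA_SCALAPACK=OPENBLAS" \
        "-DCOSMA_WITH_TESTS=NO" \
        "-DCOSMA_WITH_BENCHMARKS=NO" \
        "-DCMAKE_CXX_COMPILER=mpic++" \
        "-DCOSMA_WITH_APPS=NO" \
        "-DCOSMA_WITH_PROFILING=NO" \
        "-DBUILD_SHARED_LIBS=NO"
}
```

From inside the build directory, something like
cmake $(cosma_cmake_args OPENBLAS "${PWD}/install") ..
would then configure the CPU-only build (this relies on shell word
splitting, so it assumes the prefix contains no spaces).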
Then you can run the toolchain again with
./install_cp2k_toolchain.sh --install-all --with-cmake=system \
  --with-openmpi=system --with-gcc=system --with-quip=no --with-libtorch=no \
  --with-plumed=no --with-cosma=<path to your COSMA installation> \
  --with-sirius=no --enable-cuda --gpu-ver=V100
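Before rerunning the toolchain, it may help to confirm that the path given
to --with-cosma really points at an installed tree. A minimal sketch,
assuming the conventional CMake install layout (an include directory plus
lib or lib64); the function name check_cosma_prefix is hypothetical, and
what the toolchain itself checks may differ:

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for the directory passed to --with-cosma.
# Assumes a conventional CMake install layout: include/ and lib/ or lib64/.
check_cosma_prefix() {
    local prefix="$1"
    if [ ! -d "${prefix}/include" ]; then
        echo "missing ${prefix}/include" >&2
        return 1
    fi
    if [ ! -d "${prefix}/lib" ] && [ ! -d "${prefix}/lib64" ]; then
        echo "missing ${prefix}/lib or ${prefix}/lib64" >&2
        return 1
    fi
    echo "looks like an install prefix: ${prefix}"
}
```

For the build above, the call would be check_cosma_prefix followed by the
path to the install directory created inside the COSMA build directory.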
On Tuesday, 11 April 2023 at 20:13:47 UTC+2, Nathan Keilbart wrote:
> Seems my last post didn't go through. To clarify: I had to disable SIRIUS
> because it seems to hard-code a dependency on COSMA, which enabled COSMA
> every time I ran the installation. It just seemed easier at that point
> to at least get a working binary.
>
> I have recompiled with the SIRIUS and COSMA library enabled. Here is the
> output when I run the input.
>
> error: GPU API call : unspecified launch failure
> terminate called after throwing an instance of 'std::runtime_error'
> what(): GPU ERROR
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> error: GPU API call : unspecified launch failure
> terminate called after throwing an instance of 'std::runtime_error'
> what(): GPU ERROR
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> #0 0x20002885b34f in ???
> #1 0x200028859c17 in ???
> #2 0x2000000504d7 in ???
> #0 0x20002885b34f in ???
> #1 0x200028859c17 in ???
> #2 0x2000000504d7 in ???
> #3 0x200028cafcb0 in __GI_raise
> at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> #3 0x200028cafcb0 in __GI_raise
> at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> #4 0x200028cb200b in __GI_abort
> at /usr/src/debug/glibc-2.17-c758a686/stdlib/abort.c:90
> #5 0x200011e3eda3 in ???
> #6 0x200011e3b5d3 in ???
> #7 0x200011e3b623 in ???
> #8 0x200011e3baa7 in ???
> #4 0x200028cb200b in __GI_abort
> at /usr/src/debug/glibc-2.17-c758a686/stdlib/abort.c:90
> #5 0x200011e3eda3 in ???
> #6 0x200011e3b5d3 in ???
> #7 0x200011e3b623 in ???
> #8 0x200011e3baa7 in ???
> #9 0x13a41fdb in check_runtime_status
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/libs/Tiled-MM/src/Tiled-MM/util.hpp:17
> #9 0x13a41fdb in check_runtime_status
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/libs/Tiled-MM/src/Tiled-MM/util.hpp:17
> #10 0x13a45c6f in
> _ZN3gpu4gemmIdEEvRNS_9mm_handleIT_EEPS2_S5_S5_iiiS2_S2_bb
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:480
> #10 0x13a45c6f in
> _ZN3gpu4gemmIdEEvRNS_9mm_handleIT_EEPS2_S5_S5_iiiS2_S2_bb
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:480
> #11 0x13a01ccf in
> _ZN5cosma14local_multiplyIdEEvPN3gpu9mm_handleIT_EEPS3_S6_S6_iiiS3_S3_bb
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/local_multiply.cpp:98
> #12 0x13a01dab in
> _ZN5cosma14local_multiplyIdEEvPNS_13cosma_contextIT_EEPS2_S5_S5_iiiS2_S2_b
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/local_multiply.cpp:168
> #11 0x13a01ccf in
> _ZN5cosma14local_multiplyIdEEvPN3gpu9mm_handleIT_EEPS3_S6_S6_iiiS3_S3_bb
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/local_multiply.cpp:98
> #12 0x13a01dab in
> _ZN5cosma14local_multiplyIdEEvPNS_13cosma_contextIT_EEPS2_S5_S5_iiiS2_S2_b
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/local_multiply.cpp:168
> #13 0x139e4cd7 in
> _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyEPNS_12communicatorES2_S2_
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:382
> #14 0x139e468b in
> _ZN5cosma8parallelIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyEPNS_12communicatorES2_S2_
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:868
> #15 0x139e4ef3 in
> _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyEPNS_12communicatorES2_S2_
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:409
> #16 0x139e5197 in
> _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RKNS_8StrategyEP19ompi_communicator_tS2_S2_
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:285
> #17 0x139e5393 in
> _ZN5cosma8multiplyIdEEvRNS_11CosmaMatrixIT_EES4_S4_RKNS_8StrategyEP19ompi_communicator_tS2_S2_
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:228
> #13 0x139e4cd7 in
> _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyEPNS_12communicatorES2_S2_
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:382
> #14 0x139e468b in
> _ZN5cosma8parallelIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyEPNS_12communicatorES2_S2_
> #15 0x139e4ef3 in
> _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyEPNS_12communicatorES2_S2_
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:409
> #16 0x139e5197 in
> _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RKNS_8StrategyEP19ompi_communicator_tS2_S2_
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:285
> #17 0x139e5393 in
> _ZN5cosma8multiplyIdEEvRNS_11CosmaMatrixIT_EES4_S4_RKNS_8StrategyEP19ompi_communicator_tS2_S2_
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:228
> #18 0x139b6613 in
> _ZN5cosma6pxgemmIdEEvcciiiT_PKS1_iiPKiS3_iiS5_S1_PS1_iiS5_
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/cosma_pxgemm.cpp:350
> #18 0x139b6613 in
> _ZN5cosma6pxgemmIdEEvcciiiT_PKS1_iiPKiS3_iiS5_S1_PS1_iiS5_
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/cosma_pxgemm.cpp:350
> #19 0x139aadd7 in cosma_pdgemm_
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/prefixed_pxgemm.cpp:51
> #19 0x139aadd7 in cosma_pdgemm_
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/prefixed_pxgemm.cpp:51
> #20 0x139ab62b in cosma_pdgemm
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/prefixed_pxgemm.cpp:225
> #20 0x139ab62b in cosma_pdgemm
> at
> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/prefixed_pxgemm.cpp:225
> #21 0x10a5e92f in cosma_pdgemm
> at
> /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/parallel_gemm_api.F:287
> #22 0x10a5e92f in __parallel_gemm_api_MOD_parallel_gemm_fm
> at
> /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/parallel_gemm_api.F:106
> #21 0x10a5e92f in cosma_pdgemm
> at
> /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/parallel_gemm_api.F:287
> #22 0x10a5e92f in __parallel_gemm_api_MOD_parallel_gemm_fm
> at
> /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/parallel_gemm_api.F:106
> #23 0x10cc23c7 in __qs_mo_methods_MOD_make_basis_sm
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_mo_methods.F:116
> #23 0x10cc23c7 in __qs_mo_methods_MOD_make_basis_sm
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_mo_methods.F:116
> #24 0x11a72b37 in __qs_initial_guess_MOD_calculate_first_density_matrix
> at
> /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_initial_guess.F:669
> #24 0x11a72b37 in __qs_initial_guess_MOD_calculate_first_density_matrix
> at
> /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_initial_guess.F:669
> #25 0x10db7b5b in scf_env_initial_rho_setup
> at
> /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf_initialization.F:1107
> #26 0x10db7b5b in init_scf_run
> at
> /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf_initialization.F:1003
> #27 0x10dbac9b in __qs_scf_initialization_MOD_qs_scf_env_initialize
> at
> /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf_initialization.F:181
> #25 0x10db7b5b in scf_env_initial_rho_setup
> at
> /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf_initialization.F:1107
> #26 0x10db7b5b in init_scf_run
> at
> /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf_initialization.F:1003
> #27 0x10dbac9b in __qs_scf_initialization_MOD_qs_scf_env_initialize
> at
> /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf_initialization.F:181
> #28 0x10daf233 in __qs_scf_MOD_scf
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf.F:232
> #28 0x10daf233 in __qs_scf_MOD_scf
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf.F:232
> #29 0x10b283c3 in __qs_energy_MOD_qs_energies
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_energy.F:111
> #29 0x10b283c3 in __qs_energy_MOD_qs_energies
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_energy.F:111
> #30 0x10b5fa43 in qs_forces
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_force.F:200
> #31 0x10b602ff in __qs_force_MOD_qs_calc_energy_force
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_force.F:110
> #30 0x10b5fa43 in qs_forces
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_force.F:200
> #31 0x10b602ff in __qs_force_MOD_qs_calc_energy_force
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_force.F:110
> #32 0x1079c84b in __force_env_methods_MOD_force_env_calc_energy_force
> at
> /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/force_env_methods.F:259
> #32 0x1079c84b in __force_env_methods_MOD_force_env_calc_energy_force
> at
> /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/force_env_methods.F:259
> #33 0x102f5323 in qs_mol_dyn_low
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/motion/md_run.F:371
> #34 0x102f648b in __md_run_MOD_qs_mol_dyn
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/motion/md_run.F:149
> #33 0x102f5323 in qs_mol_dyn_low
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/motion/md_run.F:371
> #34 0x102f648b in __md_run_MOD_qs_mol_dyn
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/motion/md_run.F:149
> #35 0x101e73d3 in cp2k_run
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k_runs.F:364
> #36 0x101e91af in __cp2k_runs_MOD_run_input
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k_runs.F:997
> #35 0x101e73d3 in cp2k_run
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k_runs.F:364
> #36 0x101e91af in __cp2k_runs_MOD_run_input
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k_runs.F:997
> #37 0x101e24f7 in cp2k
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k.F:379
> #38 0x101e3ca7 in main
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k.F:44
> #37 0x101e24f7 in cp2k
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k.F:379
> #38 0x101e3ca7 in main
> at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k.F:44
> ERROR: One or more process (first noticed rank 1) terminated with signal 6
> On Saturday, April 8, 2023 at 10:42:29 AM UTC-7 Alfio Lazzaro wrote:
>
>> I'm not sure what can be wrong...
>> I suggest compiling COSMA outside the toolchain in two steps: first CPU
>> only, and test it; then, if it works, move to the GPU compilation.
>> What's the error you get with COSMA?
>>
>> I'm surprised you get an error with SIRIUS: unless you specifically use
>> it, it should not give any error...
>>
>>
>>
>> On Saturday, 8 April 2023 at 01:26:22 UTC+2, Nathan Keilbart wrote:
>>
>>> Thanks Alfio. Sorry for my late reply. It seems something in my
>>> environment was keeping that from being detected correctly. My scripts now
>>> detect everything correctly and after finding certain libraries that
>>> wouldn't build I was finally able to get a working binary. One strange
>>> issue is that the -ldl flag was needed when compiling the parallel binary.
>>> Not sure if this is normally detected but for my system and inputs I was
>>> providing it didn't do it so I simply added it to the arch files.
>>>
>>> Initially, I was getting a CUDA memory issue when running my test system
>>> of 300 atoms on one node with four GPUs but I have since resubmitted the
>>> job several times and it appears to be working. I'm not sure if I was just
>>> getting a bad node or something.
>>>
>>> As I mentioned, I had to disable quite a few libraries. They install
>>> just fine according to the terminal but when I go to compile the binaries
>>> it causes them to misbehave and crash before even doing the initial SCF
>>> loop. Here are the flags I used.
>>>
>>> ./install_cp2k_toolchain.sh --install-all --with-cmake=system
>>> --with-openmpi=system --with-gcc=system --with-quip=no --with-libtorch=no
>>> --with-plumed=no --with-cosma=no --with-sirius=no --enable-cuda
>>> --gpu-ver=V100
>>>
>>> In your opinion, would I get any more of a speedup by debugging this
>>> issue? I'm primarily concerned with the COSMA and SIRIUS libraries. Once
>>> again, thank you for your help. I'm working on an Intel system and have a
>>> working binary but might have some questions as I'm seeing very poor
>>> scaling when I use multiple nodes.
>>> On Thursday, March 30, 2023 at 9:35:52 PM UTC-7 Alfio Lazzaro wrote:
>>>
>>>> There is still something wrong in your local_cuda.psmp file.
>>>> In your output above I cannot find the flag `-D__parallel`. I see only
>>>> the following:
>>>>
>>>> -D__OFFLOAD_CUDA -D__DBCSR_ACC -D__FFTW3 -D__LIBINT -D__LIBXC
>>>> -D__SCALAPACK -D__COSMA -D__ELPA -D__ELPA_NVIDIA_GPU -D__GSL -D__HDF5
>>>> -D__LIBVDWXC -D__SPGLIB -D__LIBVORI -D__SPFFT -D__OFFLOAD_GEMM -D__SPLA
>>>> -D__SIRIUS -D__CUDA
>>>>
>>>> So my guess is that the toolchain was not able to recognize MPI (no
>>>> idea why). Could you add -D__parallel on top of those flags?
>>>>
>>>> On Friday, 31 March 2023 at 00:08:29 UTC+2, Nathan Keilbart wrote:
>>>>
>>>>> Thanks, Alfio. I wasn't sure which file was controlling that. I updated
>>>>> the file to use those compilers and then did a make realclean. Afterwards,
>>>>> I am now getting this error:
>>>>>
>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:192:19:
>>>>>
>>>>> gcd_max = -1
>>>>> 1
>>>>> Error: Symbol 'gcd_max' at (1) has no IMPLICIT type
>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:193:18:
>>>>>
>>>>> DO ipe = 1, CEILING(SQRT(REAL(npe, dp)))
>>>>> 1
>>>>> Error: Symbol 'ipe' at (1) has no IMPLICIT type
>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:194:18:
>>>>>
>>>>> jpe = npe/ipe
>>>>> 1
>>>>> Error: Symbol 'jpe' at (1) has no IMPLICIT type
>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:185:29:
>>>>>
>>>>> my_blacs_grid_layout = BLACS_GRID_SQUARE
>>>>> 1
>>>>> Error: Symbol 'my_blacs_grid_layout' at (1) has no IMPLICIT type; did
>>>>> you mean 'blacs_grid_layout'?
>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:221:25:
>>>>>
>>>>> my_blacs_repeatable = .FALSE.
>>>>> 1
>>>>> Error: Symbol 'my_blacs_repeatable' at (1) has no IMPLICIT type; did
>>>>> you mean 'blacs_repeatable'?
>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:213:18:
>>>>>
>>>>> my_row_major = .TRUE.
>>>>> 1
>>>>> Error: Symbol 'my_row_major' at (1) has no IMPLICIT type; did you mean
>>>>> 'row_major'?
>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:174:11:
>>>>>
>>>>> npcol = 1
>>>>> 1
>>>>> Error: Symbol 'npcol' at (1) has no IMPLICIT type; did you mean
>>>>> 'ipcol'?
>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:175:9:
>>>>>
>>>>> npe = blacs_env%n_pid
>>>>> 1
>>>>> Error: Symbol 'npe' at (1) has no IMPLICIT type
>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:173:11:
>>>>>
>>>>> nprow = 1
>>>>> 1
>>>>> Error: Symbol 'nprow' at (1) has no IMPLICIT type; did you mean
>>>>> 'iprow'?
>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:188:22:
>>>>>
>>>>> SELECT CASE (my_blacs_grid_layout)
>>>>> 1
>>>>> Error: Argument of SELECT statement at (1) cannot be UNKNOWN
>>>>> make[3]: *** [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/Makefile:519:
>>>>> cp_blacs_env.o] Error 1
>>>>> make[2]: *** [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/Makefile:146:
>>>>> all] Error 2
>>>>>
>>>>> make[1]: *** [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/Makefile:128:
>>>>> psmp] Error 2
>>>>> make: *** [Makefile:123: all] Error 2
>>>>>
>>>>> On Thursday, March 30, 2023 at 12:22:43 AM UTC-7 Alfio Lazzaro wrote:
>>>>>
>>>>>> This is not related to the DBCSR compilation itself; you see a
>>>>>> problem in DBCSR simply because it is the first thing compiled in CP2K.
>>>>>> The error message is:
>>>>>>
>>>>>> /bin/sh: c: command not found
>>>>>>
>>>>>> and indeed you are using the command
>>>>>>
>>>>>> c -fno-omit-frame-pointer -fopenmp -g -mtune=native -O3
>>>>>> -funroll-loops ...
>>>>>>
>>>>>> for compiling, so there is something wrong in the compiler call.
>>>>>> I think the problem is that the local_cuda.psmp file has something
>>>>>> wrong in the definition of the compilers, namely the lines
>>>>>>
>>>>>> CC := mpicc
>>>>>> FC := mpif90
>>>>>> LD := mpif90
>>>>>> AR := ar -r
>>>>>>
>>>>>> Could you check whether they point to the right commands?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thursday, 30 March 2023 at 03:12:26 UTC+2, Nathan Keilbart wrote:
>>>>>>
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> I've been working on installing CP2K on a system with IBM Power9
>>>>>>> processors and Nvidia V100 GPUs. I'm using the toolchain with these options:
>>>>>>>
>>>>>>> ./install_cp2k_toolchain.sh -j --with-cmake=system
>>>>>>> --mpi-mode=openmpi --enable-cuda --gpu-ver=V100
>>>>>>>
>>>>>>> It installs all the dependencies without any errors, so I copy
>>>>>>> over the files to the arch folder and then source the setup file, followed by
>>>>>>>
>>>>>>> make -j ARCH=local_cuda VERSION=psmp
>>>>>>>
>>>>>>> The following are some of the last lines of output
>>>>>>>
>>>>>>> /usr/bin/env python3
>>>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/tools/build_utils/fypp/bin/fypp
>>>>>>> -n --line-marker-format=gfortran5
>>>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/src/tensors/dbcsr_tensor_test.F
>>>>>>> dbcsr_tensor_test.F90
>>>>>>> c -fno-omit-frame-pointer -fopenmp -g -mtune=native -O3
>>>>>>> -funroll-loops
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/openblas-0.3.21/include'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/fftw-3.3.10/include'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/libint-v2.6.0-cp2k-lmax-5/include'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/libxc-6.0.0/include'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/COSMA-2.6.2/include'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/elpa-2022.11.001/nvidia/include/elpa_openmp-2022.11.001/modules'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/elpa-2022.11.001/nvidia/include/elpa_openmp-2022.11.001/elpa'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/gsl-2.7/include'
>>>>>>> -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/hdf5-1.12.0/include
>>>>>>> -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/libvdwxc-0.4.0/include
>>>>>>> -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/spglib-1.16.2/include
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/SpFFT-1.0.6/include'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/SpLA-1.5.4/include/spla'
>>>>>>> -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/sirius-7.3.2/include/cuda
>>>>>>> -fbacktrace -ffree-form -fimplicit-none -std=f2008 -Werror=aliasing
>>>>>>> -Werror=ampersand -Werror=c-binding-type -Werror=intrinsic-shadow
>>>>>>> -Werror=intrinsics-std -Werror=line-truncation -Werror=tabs
>>>>>>> -Werror=target-lifetime -Werror=underflow -Werror=unused-but-set-variable
>>>>>>> -Werror=unused-variable -Werror=unused-dummy-argument -Werror=conversion
>>>>>>> -Werror=zerotrip -Wno-maybe-uninitialized -Wuninitialized
>>>>>>> -Wuse-without-only -D__OFFLOAD_CUDA -D__DBCSR_ACC -D__FFTW3 -D__LIBINT
>>>>>>> -D__LIBXC -D__SCALAPACK -D__COSMA -D__ELPA -D__ELPA_NVIDIA_GPU -D__GSL
>>>>>>> -D__HDF5 -D__LIBVDWXC -D__SPGLIB -D__LIBVORI -D__SPFFT -D__OFFLOAD_GEMM
>>>>>>> -D__SPLA -D__SIRIUS -D__CUDA -D__SHORT_FILE__="\"dbcsr_tensor_test.F\""
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/src/tensors/'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/src'
>>>>>>> dbcsr_tensor_test.F90
>>>>>>> /bin/sh: c: command not found
>>>>>>> make[4]:
>>>>>>> [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/build_dbcsr//Makefile:258:
>>>>>>> dbcsr_tensor_test.o] Error 127 (ignored)
>>>>>>> /usr/bin/env python3
>>>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/tools/build_utils/fypp/bin/fypp
>>>>>>> -n --line-marker-format=gfortran5
>>>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/src/tensors/dbcsr_tensor_api.F
>>>>>>> dbcsr_tensor_api.F90
>>>>>>> c -fno-omit-frame-pointer -fopenmp -g -mtune=native -O3
>>>>>>> -funroll-loops
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/openblas-0.3.21/include'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/fftw-3.3.10/include'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/libint-v2.6.0-cp2k-lmax-5/include'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/libxc-6.0.0/include'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/COSMA-2.6.2/include'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/elpa-2022.11.001/nvidia/include/elpa_openmp-2022.11.001/modules'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/elpa-2022.11.001/nvidia/include/elpa_openmp-2022.11.001/elpa'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/gsl-2.7/include'
>>>>>>> -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/hdf5-1.12.0/include
>>>>>>> -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/libvdwxc-0.4.0/include
>>>>>>> -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/spglib-1.16.2/include
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/SpFFT-1.0.6/include'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/SpLA-1.5.4/include/spla'
>>>>>>> -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/sirius-7.3.2/include/cuda
>>>>>>> -fbacktrace -ffree-form -fimplicit-none -std=f2008 -Werror=aliasing
>>>>>>> -Werror=ampersand -Werror=c-binding-type -Werror=intrinsic-shadow
>>>>>>> -Werror=intrinsics-std -Werror=line-truncation -Werror=tabs
>>>>>>> -Werror=target-lifetime -Werror=underflow -Werror=unused-but-set-variable
>>>>>>> -Werror=unused-variable -Werror=unused-dummy-argument -Werror=conversion
>>>>>>> -Werror=zerotrip -Wno-maybe-uninitialized -Wuninitialized
>>>>>>> -Wuse-without-only -D__OFFLOAD_CUDA -D__DBCSR_ACC -D__FFTW3 -D__LIBINT
>>>>>>> -D__LIBXC -D__SCALAPACK -D__COSMA -D__ELPA -D__ELPA_NVIDIA_GPU -D__GSL
>>>>>>> -D__HDF5 -D__LIBVDWXC -D__SPGLIB -D__LIBVORI -D__SPFFT -D__OFFLOAD_GEMM
>>>>>>> -D__SPLA -D__SIRIUS -D__CUDA -D__SHORT_FILE__="\"dbcsr_tensor_api.F\""
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/src/tensors/'
>>>>>>> -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/src'
>>>>>>> dbcsr_tensor_api.F90
>>>>>>> /bin/sh: c: command not found
>>>>>>> make[4]:
>>>>>>> [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/build_dbcsr//Makefile:258:
>>>>>>> dbcsr_tensor_api.o] Error 127 (ignored)
>>>>>>> Updating archive
>>>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/lib/local_cuda/psmp/exts/dbcsr/libdbcsr.a
>>>>>>> ar: creating
>>>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/lib/local_cuda/psmp/exts/dbcsr/libdbcsr.a
>>>>>>> ar: dbcsr_cuda_profiling.o: No such file or directory
>>>>>>> make[4]: ***
>>>>>>> [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/build_dbcsr//Makefile:330:
>>>>>>> /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/lib/local_cuda/psmp/exts/dbcsr/libdbcsr.a]
>>>>>>> Error 1
>>>>>>> make[3]: ***
>>>>>>> [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/build_dbcsr/Makefile:179:
>>>>>>> libdbcsr] Error 2
>>>>>>> make[2]: ***
>>>>>>> [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/Makefile.inc:38: dbcsr]
>>>>>>> Error 2
>>>>>>> make[1]: *** [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/Makefile:128:
>>>>>>> psmp] Error 2
>>>>>>> make: *** [Makefile:123: all] Error 2
>>>>>>>
>>>>>>> It seems that it is having issues with the DBCSR module. I initially
>>>>>>> had an issue with this because I seemed to have left off the --recursive
>>>>>>> option and after making sure my git clone had that it at least let me build
>>>>>>> most of the serial version. It at least gave me the cp2k.sopt binary and it
>>>>>>> seems to at least take inputs. I didn't have a chance to test it too much
>>>>>>> yet. When I got this binary I had done
>>>>>>>
>>>>>>> make -j ARCH=local_cuda VERSION="ssmp sdbg psmp pdbg"
>>>>>>>
>>>>>>> as suggested.
>>>>>>>
>>>>>>> Also, I've attempted to install with spack by using
>>>>>>>
>>>>>>> spack install
>>>>>>> cp2k@2023.1+cosma+cuda+elpa+libint+libxc+mpi+openmp+pexsi+plumed+sirius+spglib
>>>>>>> smm=blas cuda_arch=70
>>>>>>>
>>>>>>> These are some of the last lines of output
>>>>>>>
>>>>>>> >> 4028 collect2: error: ld returned 1 exit status
>>>>>>> >> 4029 collect2: error: ld returned 1 exit status
>>>>>>> >> 4030 make[3]: ***
>>>>>>> [/tmp/keilbart/spack-stage/spack-stage-cp2k-2023.1-24dhoyt24tbnn4d423glgoeqqquibmb6/spack-src/obj/linux-rhel7-power9le-gcc/psmp/
>>>>>>> all.dep:178:
>>>>>>> /tmp/keilbart/spack-stage/spack-stage-cp2k-2023.1-24dhoyt24tbnn4d423glgoeqqquibmb6/spack-src/exe/linux-rhel7-power9le-gcc/cp2k.p
>>>>>>> smp] Error 1
>>>>>>> 4031 make[3]: *** Waiting for unfinished jobs....
>>>>>>> >> 4032 make[3]: ***
>>>>>>> [/tmp/keilbart/spack-stage/spack-stage-cp2k-2023.1-24dhoyt24tbnn4d423glgoeqqquibmb6/spack-src/obj/linux-rhel7-power9le-gcc/psmp/
>>>>>>> all.dep:194:
>>>>>>> /tmp/keilbart/spack-stage/spack-stage-cp2k-2023.1-24dhoyt24tbnn4d423glgoeqqquibmb6/spack-src/exe/linux-rhel7-power9le-gcc/libcp2
>>>>>>> k_unittest.psmp] Error 1
>>>>>>> >> 4033 make[2]: ***
>>>>>>> [/tmp/keilbart/spack-stage/spack-stage-cp2k-2023.1-24dhoyt24tbnn4d423glgoeqqquibmb6/spack-src/Makefile:146:
>>>>>>> all] Error 2
>>>>>>> >> 4034 make[1]: ***
>>>>>>> [/tmp/keilbart/spack-stage/spack-stage-cp2k-2023.1-24dhoyt24tbnn4d423glgoeqqquibmb6/spack-src/Makefile:128:
>>>>>>> psmp] Error 2
>>>>>>> >> 4035 make: *** [Makefile:123: all] Error 2
>>>>>>>
>>>>>>> Finally, I also have some intel machines that I'm attempting to
>>>>>>> build on and having issues as well but we can start with the IBM machine as
>>>>>>> we're hoping to accelerate the simulations with the GPU.
>>>>>>>
>>>>>>> Please let me know what other information I can provide. Thank you.
>>>>>>>
>>>>>>> Nathan
>>>>>>>
>>>>>>