[CP2K-user] CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle
Alfio Lazzaro
alfio.... at gmail.com
Thu Feb 4 06:41:47 UTC 2021
Multi-GPU support is still not stable.
The error is raised inside COSMA.
Could you remove this library from your CP2K installation? I assume you
are using the toolchain, so just pass --with-cosma=no.
I also assume you are using the PSMP version of CP2K (the only way to use
multiple GPUs). Could you confirm? Note that at least one rank must be
attached to each GPU, e.g. 12 GPUs need at least 12 ranks (or a multiple
of 12).
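
The rank rule above can be sketched as a quick sanity check before launching a job. This is only an illustration: the function names are hypothetical, and the round-robin placement (rank modulo GPU count) is an assumption about how ranks end up on devices, not something confirmed in this thread.

```python
def ranks_ok(n_ranks: int, n_gpus: int) -> bool:
    """Return True if every GPU gets the same number (>= 1) of MPI ranks.

    Encodes the rule of thumb from this message: at least one rank per GPU,
    and the total rank count a multiple of the GPU count.
    """
    return n_gpus > 0 and n_ranks >= n_gpus and n_ranks % n_gpus == 0


def gpu_for_rank(rank: int, n_gpus: int) -> int:
    """Hypothetical round-robin placement: rank i is attached to GPU i mod n_gpus."""
    return rank % n_gpus


if __name__ == "__main__":
    # Try a few rank counts against the 12-GPU workstation from the report.
    for n_ranks in (6, 12, 18, 24):
        print(n_ranks, "ranks on 12 GPUs:", ranks_ok(n_ranks, 12))
```

With 12 GPUs, this check accepts 12 or 24 ranks and rejects 6 or 18.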
Alfio
On Thursday, 4 February 2021 at 02:20:50 UTC+1, singlebook wrote:
>
> Hello, All
>
> I just installed CP2K v8.1 on my workstation, which has 12 NVIDIA K80
> GPUs. It was built with GCC 6.5 and CUDA 10.0.
>
> I want to run AIMD for SiC, but whenever I use more than 6 GPUs, it
> always gives me this error:
>
> CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle
> error: GPU API call : invalid resource handle
> terminate called after throwing an instance of 'std::runtime_error'
>   what(): GPU ERROR
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> #0 0x7fc42ccc626f in ???
> #1 0x7fc42ccc61f7 in ???
> #2 0x7fc42ccc78e7 in ???
> #3 0x7fc43d68193c in _ZN9__gnu_cxx27__verbose_terminate_handlerEv at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/vterminate.cc:95
> #4 0x7fc43d67f905 in _ZN10__cxxabiv111__terminateEPFvvE at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_terminate.cc:47
> #5 0x7fc43d67f950 in _ZSt9terminatev at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_terminate.cc:57
> #6 0x7fc43d67fb68 in __cxa_throw at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_throw.cc:87
> #7 0x2b12c82 in check_runtime_status at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/util.hpp:17
> #8 0x2b12c82 in _ZNK3gpu13device_stream13enqueue_eventEv at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/device_stream.hpp:62
> #9 0x2b12c82 in _ZN3gpu11round_robinIdEEvRNS_12tiled_matrixIT_EES4_S4_RNS_13device_bufferIS2_EES7_S7_iiiS2_S2_RNS_9mm_handleIS2_EE at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:248
> #10 0x2b1351c in _ZN3gpu4gemmIdEEvRNS_9mm_handleIT_EEPS2_S5_S5_iiiS2_S2_b at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:341
> #11 0x2adfdee in _ZN5cosma14local_multiplyIdEEvPN3gpu9mm_handleIT_EEPS3_S6_S6_iiiS3_S3_ at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/local_multiply.cpp:86
> #12 0x2ac8fb3 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyERNS_12communicatorES2_S2_ at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/multiply.cpp:355
> #13 0x2ac9c26 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RKNS_8StrategyEiS2_S2_ at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/multiply.cpp:272
> #14 0x2a9fc5d in ???
> #15 0x250cd5c in __cp_fm_basic_linalg_MOD_cp_fm_gemm at /local/src/cp2k-8.1/src/fm/cp_fm_basic_linalg.F:446
> #16 0xcd8744 in __cp_gemm_interface_MOD_cp_gemm at /local/src/cp2k-8.1/src/cp_gemm_interface.F:138
> #17 0x10c794b in __qs_wf_history_methods_MOD_wfi_extrapolate at /local/src/cp2k-8.1/src/qs_wf_history_methods.F:912
> #18 0x17a5b53 in scf_env_initial_rho_setup at /local/src/cp2k-8.1/src/qs_scf_initialization.F:1122
> #19 0x17a5b53 in init_scf_run at /local/src/cp2k-8.1/src/qs_scf_initialization.F:1047
> #20 0x17a79b5 in __qs_scf_initialization_MOD_qs_scf_env_initialize at /local/src/cp2k-8.1/src/qs_scf_initialization.F:182
> #21 0xf1e341 in __qs_scf_MOD_scf at /local/src/cp2k-8.1/src/qs_scf.F:222
> #22 0xc0e966 in __qs_energy_MOD_qs_energies at /local/src/cp2k-8.1/src/qs_energy.F:88
> #23 0x1979f13 in qs_forces at /local/src/cp2k-8.1/src/qs_force.F:209
> #24 0x197dc87 in __qs_force_MOD_qs_calc_energy_force at /local/src/cp2k-8.1/src/qs_force.F:114
> #25 0x112bfe5 in __force_env_methods_MOD_force_env_calc_energy_force at /local/src/cp2k-8.1/src/force_env_methods.F:271
> #26 0x797c55 in __integrator_MOD_nvt at /local/src/cp2k-8.1/src/motion/integrator.F:1103
> #27 0x78ddca in __velocity_verlet_control_MOD_velocity_verlet at /local/src/cp2k-8.1/src/motion/velocity_verlet_control.F:77
> #28 0x6c1695 in qs_mol_dyn_low at /local/src/cp2k-8.1/src/motion/md_run.F:481
> #29 0x6c209a in __md_run_MOD_qs_mol_dyn at /local/src/cp2k-8.1/src/motion/md_run.F:153
> #30 0x5536ae in cp2k_run at /local/src/cp2k-8.1/src/start/cp2k_runs.F:378
> #31 0x556764 in __cp2k_runs_MOD_run_input at /local/src/cp2k-8.1/src/start/cp2k_runs.F:983
> #32 0x534a31 in cp2k at /local/src/cp2k-8.1/src/start/cp2k.F:337
> #33 0x4ec1cc in main at /local/src/cp2k-8.1/src/start/cp2k.F:44
>
> ===================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = PID 14969 RUNNING AT k172
> = EXIT CODE: 134
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
> The CPU version of CP2K runs without problems, and classical MD with the
> argon.inp exercise input also runs smoothly on 12 GPUs.
>
> Your response is highly appreciated.
>
> Best wishes,
>
> Wei
>