[CP2K-user] CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle

Alfio Lazzaro alfio.... at gmail.com
Thu Feb 4 06:41:47 UTC 2021


Multi-GPU support is still not stable.
The error is raised inside COSMA.
Could you remove this library from your CP2K installation? I assume you 
are using the toolchain, so just pass --with-cosma=no
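
A minimal sketch of that rebuild, not a definitive recipe. The toolchain directory is taken from the backtrace paths in the quoted message below; PREVIOUS_OPTS is a hypothetical placeholder for whatever options the original toolchain run used. The script only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch: compose the toolchain command that rebuilds without COSMA.

# Assumption: toolchain location as seen in the backtrace; adjust for your system.
TOOLCHAIN_DIR="${TOOLCHAIN_DIR:-/local/src/cp2k-8.1/tools/toolchain}"

# Hypothetical placeholder: the options used for the original toolchain run.
PREVIOUS_OPTS="${PREVIOUS_OPTS:-}"

toolchain_cmd() {
    # Print the command instead of executing it; the trailing space after
    # PREVIOUS_OPTS is only added when that variable is non-empty.
    printf 'cd %s && ./install_cp2k_toolchain.sh %s--with-cosma=no\n' \
        "$TOOLCHAIN_DIR" "${PREVIOUS_OPTS:+$PREVIOUS_OPTS }"
}

toolchain_cmd
```

Once the toolchain has been regenerated, CP2K itself would normally have to be recompiled so that the binary no longer links against COSMA.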

Also, I assume you are using the PSMP version of CP2K (the only way to use 
multiple GPUs). Could you confirm? Note that there must be a rank (or 
multiple ranks) attached to each GPU, e.g. for 12 GPUs you need at least 12 
ranks (or a multiple of 12).
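
The rank rule above can be checked before launching a job. This is only a sketch: check_ranks is a hypothetical helper, and the launch line in the final comment (mpiexec, the cp2k.psmp binary name, the input file name) is an assumption about the local setup.

```shell
#!/bin/sh
# check_ranks NGPUS NRANKS
# Succeeds only if NRANKS is at least NGPUS and a whole multiple of it,
# so that every GPU gets the same non-zero number of MPI ranks.
check_ranks() {
    ngpus=$1
    nranks=$2
    if [ "$nranks" -ge "$ngpus" ] && [ $((nranks % ngpus)) -eq 0 ]; then
        echo "ok: $((nranks / ngpus)) rank(s) per GPU"
    else
        echo "bad: $nranks ranks cannot be spread evenly over $ngpus GPUs"
        return 1
    fi
}

check_ranks 12 12   # prints "ok: 1 rank(s) per GPU"
check_ranks 12 24   # prints "ok: 2 rank(s) per GPU"

# Hypothetical launch with one rank per GPU (names are assumptions):
# mpiexec -n 12 cp2k.psmp -i SiC.inp
```

With 12 GPUs, a run with 6 ranks fails the check, since half of the GPUs would have no rank attached.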

Alfio

On Thursday, 4 February 2021 at 02:20:50 UTC+1, singlebook wrote:

>
> Hello, All
>
> I just installed CP2K v8.1 on my workstation, which has 12 NVIDIA K80 GPUs. 
> The compiler is GCC 6.5 and the CUDA version is 10.0.
>
> I want to run AIMD for SiC, but whenever I use more than 6 GPUs it always 
> gives me this error:
>
> CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle
> error: GPU API call : invalid resource handle
> terminate called after throwing an instance of 'std::runtime_error'
>   what():  GPU ERROR
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> #0  0x7fc42ccc626f in ???
> #1  0x7fc42ccc61f7 in ???
> #2  0x7fc42ccc78e7 in ???
> #3  0x7fc43d68193c in _ZN9__gnu_cxx27__verbose_terminate_handlerEv
>     at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/vterminate.cc:95
> #4  0x7fc43d67f905 in _ZN10__cxxabiv111__terminateEPFvvE
>     at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_terminate.cc:47
> #5  0x7fc43d67f950 in _ZSt9terminatev
>     at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_terminate.cc:57
> #6  0x7fc43d67fb68 in __cxa_throw
>     at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_throw.cc:87
> #7  0x2b12c82 in check_runtime_status
>     at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/util.hpp:17
> #8  0x2b12c82 in _ZNK3gpu13device_stream13enqueue_eventEv
>     at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/device_stream.hpp:62
> #9  0x2b12c82 in _ZN3gpu11round_robinIdEEvRNS_12tiled_matrixIT_EES4_S4_RNS_13device_bufferIS2_EES7_S7_iiiS2_S2_RNS_9mm_handleIS2_EE
>     at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:248
> #10  0x2b1351c in _ZN3gpu4gemmIdEEvRNS_9mm_handleIT_EEPS2_S5_S5_iiiS2_S2_b
>     at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:341
> #11  0x2adfdee in _ZN5cosma14local_multiplyIdEEvPN3gpu9mm_handleIT_EEPS3_S6_S6_iiiS3_S3_
>     at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/local_multiply.cpp:86
> #12  0x2ac8fb3 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyERNS_12communicatorES2_S2_
>     at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/multiply.cpp:355
> #13  0x2ac9c26 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RKNS_8StrategyEiS2_S2_
>     at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/multiply.cpp:272
> #14  0x2a9fc5d in ???
> #15  0x250cd5c in __cp_fm_basic_linalg_MOD_cp_fm_gemm
>     at /local/src/cp2k-8.1/src/fm/cp_fm_basic_linalg.F:446
> #16  0xcd8744 in __cp_gemm_interface_MOD_cp_gemm
>     at /local/src/cp2k-8.1/src/cp_gemm_interface.F:138
> #17  0x10c794b in __qs_wf_history_methods_MOD_wfi_extrapolate
>     at /local/src/cp2k-8.1/src/qs_wf_history_methods.F:912
> #18  0x17a5b53 in scf_env_initial_rho_setup
>     at /local/src/cp2k-8.1/src/qs_scf_initialization.F:1122
> #19  0x17a5b53 in init_scf_run
>     at /local/src/cp2k-8.1/src/qs_scf_initialization.F:1047
> #20  0x17a79b5 in __qs_scf_initialization_MOD_qs_scf_env_initialize
>     at /local/src/cp2k-8.1/src/qs_scf_initialization.F:182
> #21  0xf1e341 in __qs_scf_MOD_scf
>     at /local/src/cp2k-8.1/src/qs_scf.F:222
> #22  0xc0e966 in __qs_energy_MOD_qs_energies
>     at /local/src/cp2k-8.1/src/qs_energy.F:88
> #23  0x1979f13 in qs_forces
>     at /local/src/cp2k-8.1/src/qs_force.F:209
> #24  0x197dc87 in __qs_force_MOD_qs_calc_energy_force
>     at /local/src/cp2k-8.1/src/qs_force.F:114
> #25  0x112bfe5 in __force_env_methods_MOD_force_env_calc_energy_force
>     at /local/src/cp2k-8.1/src/force_env_methods.F:271
> #26  0x797c55 in __integrator_MOD_nvt
>     at /local/src/cp2k-8.1/src/motion/integrator.F:1103
> #27  0x78ddca in __velocity_verlet_control_MOD_velocity_verlet
>     at /local/src/cp2k-8.1/src/motion/velocity_verlet_control.F:77
> #28  0x6c1695 in qs_mol_dyn_low
>     at /local/src/cp2k-8.1/src/motion/md_run.F:481
> #29  0x6c209a in __md_run_MOD_qs_mol_dyn
>     at /local/src/cp2k-8.1/src/motion/md_run.F:153
> #30  0x5536ae in cp2k_run
>     at /local/src/cp2k-8.1/src/start/cp2k_runs.F:378
> #31  0x556764 in __cp2k_runs_MOD_run_input
>     at /local/src/cp2k-8.1/src/start/cp2k_runs.F:983
> #32  0x534a31 in cp2k
>     at /local/src/cp2k-8.1/src/start/cp2k.F:337
> #33  0x4ec1cc in main
>     at /local/src/cp2k-8.1/src/start/cp2k.F:44
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 14969 RUNNING AT k172
> =   EXIT CODE: 134
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
> The CPU version of CP2K runs without problems, and classical MD with 
> argon.inp from the exercise also ran smoothly on 12 GPUs.
>
> Your response is highly appreciated.
>
> Best wishes,
>
> Wei
>