[CP2K-user] CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle

singlebook chenw... at gmail.com
Thu Feb 4 01:20:50 UTC 2021


Hello all,

I just installed CP2K v8.1 on my workstation, which has 12 NVIDIA K80 GPUs.
The compiler is GCC 6.5 and the CUDA version is 10.0.

I want to run AIMD for SiC, but whenever I use more than 6 GPUs, I always
get the following error:

CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle
error: GPU API call : invalid resource handle
terminate called after throwing an instance of 'std::runtime_error'
  what():  GPU ERROR

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x7fc42ccc626f in ???
#1  0x7fc42ccc61f7 in ???
#2  0x7fc42ccc78e7 in ???
#3  0x7fc43d68193c in _ZN9__gnu_cxx27__verbose_terminate_handlerEv
    at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/vterminate.cc:95
#4  0x7fc43d67f905 in _ZN10__cxxabiv111__terminateEPFvvE
    at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_terminate.cc:47
#5  0x7fc43d67f950 in _ZSt9terminatev
    at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_terminate.cc:57
#6  0x7fc43d67fb68 in __cxa_throw
    at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_throw.cc:87
#7  0x2b12c82 in check_runtime_status
    at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/util.hpp:17
#8  0x2b12c82 in _ZNK3gpu13device_stream13enqueue_eventEv
    at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/device_stream.hpp:62
#9  0x2b12c82 in _ZN3gpu11round_robinIdEEvRNS_12tiled_matrixIT_EES4_S4_RNS_13device_bufferIS2_EES7_S7_iiiS2_S2_RNS_9mm_handleIS2_EE
    at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:248
#10  0x2b1351c in _ZN3gpu4gemmIdEEvRNS_9mm_handleIT_EEPS2_S5_S5_iiiS2_S2_b
    at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:341
#11  0x2adfdee in _ZN5cosma14local_multiplyIdEEvPN3gpu9mm_handleIT_EEPS3_S6_S6_iiiS3_S3_
    at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/local_multiply.cpp:86
#12  0x2ac8fb3 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyERNS_12communicatorES2_S2_
    at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/multiply.cpp:355
#13  0x2ac9c26 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RKNS_8StrategyEiS2_S2_
    at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/multiply.cpp:272
#14  0x2a9fc5d in ???
#15  0x250cd5c in __cp_fm_basic_linalg_MOD_cp_fm_gemm
    at /local/src/cp2k-8.1/src/fm/cp_fm_basic_linalg.F:446
#16  0xcd8744 in __cp_gemm_interface_MOD_cp_gemm
    at /local/src/cp2k-8.1/src/cp_gemm_interface.F:138
#17  0x10c794b in __qs_wf_history_methods_MOD_wfi_extrapolate
    at /local/src/cp2k-8.1/src/qs_wf_history_methods.F:912
#18  0x17a5b53 in scf_env_initial_rho_setup
    at /local/src/cp2k-8.1/src/qs_scf_initialization.F:1122
#19  0x17a5b53 in init_scf_run
    at /local/src/cp2k-8.1/src/qs_scf_initialization.F:1047
#20  0x17a79b5 in __qs_scf_initialization_MOD_qs_scf_env_initialize
    at /local/src/cp2k-8.1/src/qs_scf_initialization.F:182
#21  0xf1e341 in __qs_scf_MOD_scf
    at /local/src/cp2k-8.1/src/qs_scf.F:222
#22  0xc0e966 in __qs_energy_MOD_qs_energies
    at /local/src/cp2k-8.1/src/qs_energy.F:88
#23  0x1979f13 in qs_forces
    at /local/src/cp2k-8.1/src/qs_force.F:209
#24  0x197dc87 in __qs_force_MOD_qs_calc_energy_force
    at /local/src/cp2k-8.1/src/qs_force.F:114
#25  0x112bfe5 in __force_env_methods_MOD_force_env_calc_energy_force
    at /local/src/cp2k-8.1/src/force_env_methods.F:271
#26  0x797c55 in __integrator_MOD_nvt
    at /local/src/cp2k-8.1/src/motion/integrator.F:1103
#27  0x78ddca in __velocity_verlet_control_MOD_velocity_verlet
    at /local/src/cp2k-8.1/src/motion/velocity_verlet_control.F:77
#28  0x6c1695 in qs_mol_dyn_low
    at /local/src/cp2k-8.1/src/motion/md_run.F:481
#29  0x6c209a in __md_run_MOD_qs_mol_dyn
    at /local/src/cp2k-8.1/src/motion/md_run.F:153
#30  0x5536ae in cp2k_run
    at /local/src/cp2k-8.1/src/start/cp2k_runs.F:378
#31  0x556764 in __cp2k_runs_MOD_run_input
    at /local/src/cp2k-8.1/src/start/cp2k_runs.F:983
#32  0x534a31 in cp2k
    at /local/src/cp2k-8.1/src/start/cp2k.F:337
#33  0x4ec1cc in main
    at /local/src/cp2k-8.1/src/start/cp2k.F:44

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 14969 RUNNING AT k172
=   EXIT CODE: 134
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
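
In case it helps anyone reading a backtrace like the one above after a mail client has wrapped it into one run of text, here is a minimal Python sketch that splits it back into frames. It assumes every frame has the shape "#N  0xADDR in SYMBOL" optionally followed by "at FILE:LINE", which is what the trace above looks like. The names FRAME_RE, SAMPLE and parse_frames are my own and purely illustrative, and SAMPLE is only an abridged excerpt of the trace.

```python
import re

# One frame: "#<index>  <address> in <symbol>", optionally "at <file>:<line>".
# \s+ also spans line wraps; the symbol stops at '#' so "???#1" splits correctly.
FRAME_RE = re.compile(
    r"#(\d+)\s+(0x[0-9a-f]+)\s+in\s+([^\s#]+)(?:\s+at\s+([^\s:]+):(\d+))?"
)

# Abridged excerpt of the wrapped backtrace (frames 0, 1, 7 and 8 only).
SAMPLE = (
    "Backtrace for this error:#0  0x7fc42ccc626f in ???#1  0x7fc42ccc61f7 \n"
    "in ???#7  0x2b12c82 in \n"
    "check_runtime_status    at \n"
    "/local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/util.hpp:17#8  \n"
    "0x2b12c82 in _ZNK3gpu13device_stream13enqueue_eventEv    at \n"
    "/local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/device_stream.hpp:62"
)


def parse_frames(text):
    """Return one dict per frame found in a (possibly wrapped) backtrace."""
    frames = []
    for match in FRAME_RE.finditer(text):
        index, address, symbol, path, line = match.groups()
        frames.append({
            "index": int(index),
            "address": address,
            "symbol": symbol,
            "file": path,  # None when the frame has no source info
            "line": int(line) if line is not None else None,
        })
    return frames


frames = parse_frames(SAMPLE)
for frame in frames:
    if frame["file"]:
        where = f"{frame['file']}:{frame['line']}"
    else:
        where = "(no source info)"
    print(f"#{frame['index']:<3} {frame['symbol']}  {where}")
```

Read this way, the first frames with source information (#7 and #8) are in the Tiled-MM code bundled with COSMA 2.2.0, in check_runtime_status and device_stream::enqueue_event.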

The CPU version of CP2K works without problems, and classical MD with the
argon.inp input from the exercises also runs smoothly on 12 GPUs.

Your response is highly appreciated.

Best wishes,

Wei
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SiC.inp
Type: chemical/x-gamess-input
Size: 4862 bytes
Desc: not available
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20210203/45047201/attachment.inp>