[CP2K-user] CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle
singlebook
chenw... at gmail.com
Thu Feb 4 08:34:52 UTC 2021
Thanks for your reply! Yes, I am using PSMP. I will recompile cp2k without
cosma and give you feedback later.
On Thursday, February 4, 2021 at 2:41:48 PM UTC+8 Alfio Lazzaro wrote:
> The multi-gpu support is still not stable.
> The error message is inside COSMA.
> Could you remove this library from your installation of CP2K? I assume you
> are using the toolchain, so just use --with-cosma=no
>
> Then, I assume you are using the PSMP version of CP2K (the only way to use
> multiple GPUs). Could you confirm? Note that there must be at least one rank
> attached to each GPU, e.g. for 12 GPUs you need at least 12 ranks (or a
> multiple of 12).
>
> Alfio
>
> On Thursday, February 4, 2021 at 02:20:50 UTC+1 singlebook wrote:
>
>>
>> Hello, All
>>
>> I just installed CP2K v8.1 on my workstation. There are 12 NVIDIA K80 GPUs
>> in the workstation. The compiler is GCC 6.5 and the CUDA version is 10.0.
>>
>> I want to perform AIMD for SiC, but when I use more than 6 GPUs, it
>> always gives me this error:
>>
>> CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle
>> error: GPU API call : invalid resource handle
>> terminate called after throwing an instance of 'std::runtime_error'
>>   what(): GPU ERROR
>>
>> Program received signal SIGABRT: Process abort signal.
>>
>> Backtrace for this error:
>> #0 0x7fc42ccc626f in ???
>> #1 0x7fc42ccc61f7 in ???
>> #2 0x7fc42ccc78e7 in ???
>> #3 0x7fc43d68193c in _ZN9__gnu_cxx27__verbose_terminate_handlerEv
>>    at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/vterminate.cc:95
>> #4 0x7fc43d67f905 in _ZN10__cxxabiv111__terminateEPFvvE
>>    at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_terminate.cc:47
>> #5 0x7fc43d67f950 in _ZSt9terminatev
>>    at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_terminate.cc:57
>> #6 0x7fc43d67fb68 in __cxa_throw
>>    at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_throw.cc:87
>> #7 0x2b12c82 in check_runtime_status
>>    at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/util.hpp:17
>> #8 0x2b12c82 in _ZNK3gpu13device_stream13enqueue_eventEv
>>    at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/device_stream.hpp:62
>> #9 0x2b12c82 in _ZN3gpu11round_robinIdEEvRNS_12tiled_matrixIT_EES4_S4_RNS_13device_bufferIS2_EES7_S7_iiiS2_S2_RNS_9mm_handleIS2_EE
>>    at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:248
>> #10 0x2b1351c in _ZN3gpu4gemmIdEEvRNS_9mm_handleIT_EEPS2_S5_S5_iiiS2_S2_b
>>    at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:341
>> #11 0x2adfdee in _ZN5cosma14local_multiplyIdEEvPN3gpu9mm_handleIT_EEPS3_S6_S6_iiiS3_S3_
>>    at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/local_multiply.cpp:86
>> #12 0x2ac8fb3 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyERNS_12communicatorES2_S2_
>>    at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/multiply.cpp:355
>> #13 0x2ac9c26 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RKNS_8StrategyEiS2_S2_
>>    at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/multiply.cpp:272
>> #14 0x2a9fc5d in ???
>> #15 0x250cd5c in __cp_fm_basic_linalg_MOD_cp_fm_gemm
>>    at /local/src/cp2k-8.1/src/fm/cp_fm_basic_linalg.F:446
>> #16 0xcd8744 in __cp_gemm_interface_MOD_cp_gemm
>>    at /local/src/cp2k-8.1/src/cp_gemm_interface.F:138
>> #17 0x10c794b in __qs_wf_history_methods_MOD_wfi_extrapolate
>>    at /local/src/cp2k-8.1/src/qs_wf_history_methods.F:912
>> #18 0x17a5b53 in scf_env_initial_rho_setup
>>    at /local/src/cp2k-8.1/src/qs_scf_initialization.F:1122
>> #19 0x17a5b53 in init_scf_run
>>    at /local/src/cp2k-8.1/src/qs_scf_initialization.F:1047
>> #20 0x17a79b5 in __qs_scf_initialization_MOD_qs_scf_env_initialize
>>    at /local/src/cp2k-8.1/src/qs_scf_initialization.F:182
>> #21 0xf1e341 in __qs_scf_MOD_scf
>>    at /local/src/cp2k-8.1/src/qs_scf.F:222
>> #22 0xc0e966 in __qs_energy_MOD_qs_energies
>>    at /local/src/cp2k-8.1/src/qs_energy.F:88
>> #23 0x1979f13 in qs_forces
>>    at /local/src/cp2k-8.1/src/qs_force.F:209
>> #24 0x197dc87 in __qs_force_MOD_qs_calc_energy_force
>>    at /local/src/cp2k-8.1/src/qs_force.F:114
>> #25 0x112bfe5 in __force_env_methods_MOD_force_env_calc_energy_force
>>    at /local/src/cp2k-8.1/src/force_env_methods.F:271
>> #26 0x797c55 in __integrator_MOD_nvt
>>    at /local/src/cp2k-8.1/src/motion/integrator.F:1103
>> #27 0x78ddca in __velocity_verlet_control_MOD_velocity_verlet
>>    at /local/src/cp2k-8.1/src/motion/velocity_verlet_control.F:77
>> #28 0x6c1695 in qs_mol_dyn_low
>>    at /local/src/cp2k-8.1/src/motion/md_run.F:481
>> #29 0x6c209a in __md_run_MOD_qs_mol_dyn
>>    at /local/src/cp2k-8.1/src/motion/md_run.F:153
>> #30 0x5536ae in cp2k_run
>>    at /local/src/cp2k-8.1/src/start/cp2k_runs.F:378
>> #31 0x556764 in __cp2k_runs_MOD_run_input
>>    at /local/src/cp2k-8.1/src/start/cp2k_runs.F:983
>> #32 0x534a31 in cp2k
>>    at /local/src/cp2k-8.1/src/start/cp2k.F:337
>> #33 0x4ec1cc in main
>>    at /local/src/cp2k-8.1/src/start/cp2k.F:44
>>
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 14969 RUNNING AT k172
>> =   EXIT CODE: 134
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>> This typically refers to a problem with your application.
>> Please see the FAQ page for debugging suggestions
>>
>> There is no problem with the CPU version of CP2K, and classical MD for
>> argon.inp from the exercises also runs smoothly on all 12 GPUs.
>>
>> Your response is highly appreciated.
>>
>> Best wishes,
>>
>> Wei
>>
>
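
For illustration, the rank-per-GPU rule quoted above (at least one MPI rank
per GPU, ideally a multiple of the GPU count) can be sketched in a few lines
of Python. This is not CP2K code: the function names are hypothetical, and
the round-robin assignment (rank modulo number of GPUs) is an assumption
made for the example, not a statement about how CP2K or COSMA bind ranks to
devices.

```python
from typing import Dict, List


def is_valid_rank_count(n_ranks: int, n_gpus: int) -> bool:
    """Return True if there is at least one rank per GPU and the ranks
    divide evenly over the GPUs (hypothetical check, per the advice above)."""
    return n_gpus > 0 and n_ranks >= n_gpus and n_ranks % n_gpus == 0


def assign_ranks_to_gpus(n_ranks: int, n_gpus: int) -> Dict[int, List[int]]:
    """Map each MPI rank to a GPU index round-robin (rank % n_gpus).

    The round-robin scheme is an assumption for illustration only.
    Raises ValueError if some GPU would be left without a rank.
    """
    if n_gpus <= 0:
        raise ValueError("need at least one GPU")
    if n_ranks < n_gpus:
        raise ValueError(
            f"{n_ranks} ranks cannot cover {n_gpus} GPUs; "
            "use at least one rank per GPU"
        )
    mapping: Dict[int, List[int]] = {gpu: [] for gpu in range(n_gpus)}
    for rank in range(n_ranks):
        mapping[rank % n_gpus].append(rank)
    return mapping


if __name__ == "__main__":
    # 12 GPUs as in the workstation described in this thread.
    for ranks in (6, 12, 24):
        print(ranks, "ranks on 12 GPUs valid:", is_valid_rank_count(ranks, 12))
```

With 12 GPUs, 12 or 24 ranks pass the check, while 6 ranks would leave half
of the devices without a rank.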