[CP2K-user] CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle
Alfio Lazzaro
alfio.... at gmail.com
Fri Feb 5 07:08:21 UTC 2021
Hello!
I assume that by "12 cpus" you mean 12 MPI ranks. Could you confirm? And how
many OpenMP threads per rank?
First of all, keep in mind that multi-GPU support is still not well tested.
That said, more GPUs do not mean faster execution if the code cannot exploit
them...
I see some possible explanations for your results:
1. The GPU part of CP2K is DBCSR. Your benchmark likely doesn't use DBCSR
a lot, so there is no speed-up. From your CPU result, it seems that you are
bound by PDGEMMs, so COSMA is beneficial...
2. Multiple GPUs can share the same PCIe link, so data movement becomes the
bottleneck.
The best way to investigate would be for you to share the CP2K outputs. I can
take a look...
One more question: you said that it crashed for >6 GPUs, do you have a run
with 4 (or 6) GPUs with COSMA? If so, please share it.
One possibility is to use COSMA on the CPU only and keep the GPUs for DBCSR.
However, it is also possible that 6 GPUs with COSMA are enough to speed up
the execution...
For the rest, I suggest opening an issue on the COSMA page
(https://github.com/eth-cscs/COSMA/issues ) to understand why >6 GPUs are
not working (this is not strictly CP2K related).
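
On the ranks question: as noted in my earlier message (quoted below), each
GPU needs at least one MPI rank attached to it. Here is a minimal Python
sketch of that bookkeeping. It assumes a plain round-robin assignment (rank
modulo number of GPUs), and the function name is hypothetical; it only
illustrates the rule and is not CP2K's or DBCSR's actual device-selection code.

```python
from typing import Dict, List


def assign_ranks_to_gpus(n_ranks: int, n_gpus: int) -> Dict[int, List[int]]:
    """Map each GPU id to the MPI ranks attached to it (round-robin).

    Assumption for illustration only: rank r uses GPU r % n_gpus.
    """
    if n_gpus <= 0:
        raise ValueError("need at least one GPU")
    if n_ranks < n_gpus:
        # With fewer ranks than GPUs, some GPUs would have no rank attached.
        raise ValueError(
            f"{n_ranks} ranks cannot drive {n_gpus} GPUs: "
            "at least one rank per GPU is required"
        )
    mapping: Dict[int, List[int]] = {gpu: [] for gpu in range(n_gpus)}
    for rank in range(n_ranks):
        mapping[rank % n_gpus].append(rank)
    return mapping


if __name__ == "__main__":
    # The two GPU configurations mentioned in this thread.
    for ranks in (12, 48):
        layout = assign_ranks_to_gpus(ranks, 12)
        per_gpu = sorted({len(r) for r in layout.values()})
        print(f"{ranks} ranks on 12 GPUs -> ranks per GPU: {per_gpu}")
```

With 12 ranks every GPU gets exactly one rank; with 48 ranks every GPU is
shared by four ranks, which may add contention on the device.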
Alfio
On Friday, February 5, 2021 at 02:18:43 UTC+1, singlebook wrote:
>
> Hello!
>
> I removed COSMA from CP2K. Now it works with multiple GPUs, but the speed
> has not improved:
> 48 cpus without gpu : each SCF step takes 0.3 seconds (COSMA is available
> in the CPU version.)
> 48 cpus with 12 gpus: each SCF step takes 1.8 seconds (COSMA is not
> available.)
> 12 cpus with 12 gpus: each SCF step takes 1.2 seconds (COSMA is not
> available.)
>
> On Thursday, February 4, 2021 at 2:41:48 PM UTC+8 Alfio Lazzaro wrote:
>
>> The multi-gpu support is still not stable.
>> The error message is inside COSMA.
>> Could you remove this library from your installation of CP2K? I assume
>> you are using the toolchain, so just use --with-cosma=no
>>
>> Then, I assume you are using PSMP version of CP2K (the only way of using
>> the multiple GPUs). Could you confirm? Note that there must be a rank (or
>> multiple ranks) attached to each GPU, e.g. for 12 GPUs I need at least 12
>> ranks (or multiples).
>>
>> Alfio
>>
>> On Thursday, February 4, 2021 at 02:20:50 UTC+1, singlebook wrote:
>>
>>>
>>> Hello, All
>>>
>>> I just installed CP2K v8.1 on my workstation. There are 12 NVIDIA K80
>>> GPUs in the workstation. The compiler is GCC 6.5 and CUDA 10.0.
>>>
>>> I want to perform AIMD for SiC, but when I use more than 6 GPUs, it
>>> always gives me this error:
>>>
>>> CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle
>>> error: GPU API call : invalid resource handle
>>> terminate called after throwing an instance of 'std::runtime_error'
>>>   what(): GPU ERROR
>>>
>>> Program received signal SIGABRT: Process abort signal.
>>>
>>> Backtrace for this error:
>>> #0 0x7fc42ccc626f in ???
>>> #1 0x7fc42ccc61f7 in ???
>>> #2 0x7fc42ccc78e7 in ???
>>> #3 0x7fc43d68193c in _ZN9__gnu_cxx27__verbose_terminate_handlerEv
>>>     at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/vterminate.cc:95
>>> #4 0x7fc43d67f905 in _ZN10__cxxabiv111__terminateEPFvvE
>>>     at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_terminate.cc:47
>>> #5 0x7fc43d67f950 in _ZSt9terminatev
>>>     at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_terminate.cc:57
>>> #6 0x7fc43d67fb68 in __cxa_throw
>>>     at ../../../../gcc-6.5.0/libstdc++-v3/libsupc++/eh_throw.cc:87
>>> #7 0x2b12c82 in check_runtime_status
>>>     at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/util.hpp:17
>>> #8 0x2b12c82 in _ZNK3gpu13device_stream13enqueue_eventEv
>>>     at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/device_stream.hpp:62
>>> #9 0x2b12c82 in _ZN3gpu11round_robinIdEEvRNS_12tiled_matrixIT_EES4_S4_RNS_13device_bufferIS2_EES7_S7_iiiS2_S2_RNS_9mm_handleIS2_EE
>>>     at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:248
>>> #10 0x2b1351c in _ZN3gpu4gemmIdEEvRNS_9mm_handleIT_EEPS2_S5_S5_iiiS2_S2_b
>>>     at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:341
>>> #11 0x2adfdee in _ZN5cosma14local_multiplyIdEEvPN3gpu9mm_handleIT_EEPS3_S6_S6_iiiS3_S3_
>>>     at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/local_multiply.cpp:86
>>> #12 0x2ac8fb3 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyERNS_12communicatorES2_S2_
>>>     at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/multiply.cpp:355
>>> #13 0x2ac9c26 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RKNS_8StrategyEiS2_S2_
>>>     at /local/src/cp2k-8.1/tools/toolchain/build/cosma-2.2.0/src/cosma/multiply.cpp:272
>>> #14 0x2a9fc5d in ???
>>> #15 0x250cd5c in __cp_fm_basic_linalg_MOD_cp_fm_gemm
>>>     at /local/src/cp2k-8.1/src/fm/cp_fm_basic_linalg.F:446
>>> #16 0xcd8744 in __cp_gemm_interface_MOD_cp_gemm
>>>     at /local/src/cp2k-8.1/src/cp_gemm_interface.F:138
>>> #17 0x10c794b in __qs_wf_history_methods_MOD_wfi_extrapolate
>>>     at /local/src/cp2k-8.1/src/qs_wf_history_methods.F:912
>>> #18 0x17a5b53 in scf_env_initial_rho_setup
>>>     at /local/src/cp2k-8.1/src/qs_scf_initialization.F:1122
>>> #19 0x17a5b53 in init_scf_run
>>>     at /local/src/cp2k-8.1/src/qs_scf_initialization.F:1047
>>> #20 0x17a79b5 in __qs_scf_initialization_MOD_qs_scf_env_initialize
>>>     at /local/src/cp2k-8.1/src/qs_scf_initialization.F:182
>>> #21 0xf1e341 in __qs_scf_MOD_scf
>>>     at /local/src/cp2k-8.1/src/qs_scf.F:222
>>> #22 0xc0e966 in __qs_energy_MOD_qs_energies
>>>     at /local/src/cp2k-8.1/src/qs_energy.F:88
>>> #23 0x1979f13 in qs_forces
>>>     at /local/src/cp2k-8.1/src/qs_force.F:209
>>> #24 0x197dc87 in __qs_force_MOD_qs_calc_energy_force
>>>     at /local/src/cp2k-8.1/src/qs_force.F:114
>>> #25 0x112bfe5 in __force_env_methods_MOD_force_env_calc_energy_force
>>>     at /local/src/cp2k-8.1/src/force_env_methods.F:271
>>> #26 0x797c55 in __integrator_MOD_nvt
>>>     at /local/src/cp2k-8.1/src/motion/integrator.F:1103
>>> #27 0x78ddca in __velocity_verlet_control_MOD_velocity_verlet
>>>     at /local/src/cp2k-8.1/src/motion/velocity_verlet_control.F:77
>>> #28 0x6c1695 in qs_mol_dyn_low
>>>     at /local/src/cp2k-8.1/src/motion/md_run.F:481
>>> #29 0x6c209a in __md_run_MOD_qs_mol_dyn
>>>     at /local/src/cp2k-8.1/src/motion/md_run.F:153
>>> #30 0x5536ae in cp2k_run
>>>     at /local/src/cp2k-8.1/src/start/cp2k_runs.F:378
>>> #31 0x556764 in __cp2k_runs_MOD_run_input
>>>     at /local/src/cp2k-8.1/src/start/cp2k_runs.F:983
>>> #32 0x534a31 in cp2k
>>>     at /local/src/cp2k-8.1/src/start/cp2k.F:337
>>> #33 0x4ec1cc in main
>>>     at /local/src/cp2k-8.1/src/start/cp2k.F:44
>>>
>>> ===================================================================================
>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> =   PID 14969 RUNNING AT k172
>>> =   EXIT CODE: 134
>>> =   CLEANING UP REMAINING PROCESSES
>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>>> This typically refers to a problem with your application.
>>> Please see the FAQ page for debugging suggestions
>>>
>>> The CPU version of CP2K runs without problems, and classical MD for
>>> argon.inp from the exercise also runs smoothly with 12 GPUs.
>>>
>>> Your response is highly appreciated.
>>>
>>> Best wishes,
>>>
>>> Wei
>>>
>>