[CP2K-user] [CP2K:19974] Built cuda.ssmp version successfully but failed on regtest

Mike Chen mike.scchen at gmail.com
Tue Feb 27 07:39:03 UTC 2024


Hi,
According to the CUDA documentation I found, the CUDA version supported by
the driver and the installed (or loaded) CUDA runtime version can differ.
It should work as long as the CUDA runtime version <= the CUDA version
supported by the driver, since the driver is backward compatible.
Also, the errors I get with CUDA runtime 12.2 and 11.7 are identical.
I'm now trying different --gpu-ver values to see if that works.
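
The compatibility rule stated above can be written down as a small check. This is only a sketch of the simplified rule "runtime version <= driver-supported version"; it is not part of CP2K or the CUDA toolkit. The helper names parse_version and runtime_supported_by_driver are made up for this example, and the version strings are the ones mentioned in this thread.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'cuda/11.7.1' or 'CUDA Version: 12.3'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return int(match.group(1)), int(match.group(2))


def runtime_supported_by_driver(runtime: str, driver: str) -> bool:
    """True if the runtime version does not exceed the CUDA version the driver supports."""
    return parse_version(runtime) <= parse_version(driver)


if __name__ == "__main__":
    driver = "CUDA Version: 12.3"  # header line of the quoted nvidia-smi output
    for runtime in ("cuda/11.7.1", "12.2"):  # the two runtimes tried in this thread
        print(runtime, runtime_supported_by_driver(runtime, driver))
```

If both runtimes satisfy the rule and still give the same error, the version difference alone is unlikely to be the cause.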

Mike

On Mon, 26 Feb 2024 at 9:06 PM, Krack Matthias <matthias.krack at psi.ch> wrote:

> Hi
>
> This looks like an issue with the CUDA installation. You load the module
> cuda/11.7.1, whereas nvidia-smi returns “CUDA Version 12.2”. Maybe, that’s
> something to check.
>
> Best
>
> Matthias
>
> From: cp2k at googlegroups.com on behalf of Mike Chen <mike.scchen at gmail.com>
> Date: Sunday, 25 February 2024 at 14:03
> To: cp2k <cp2k at googlegroups.com>
> Subject: [CP2K:19967] Built cuda.ssmp version successfully but failed on regtest
>
> Hi all,
> I'm trying to build the CUDA ssmp version of CP2K 2024.1 without MPI
> support (not really necessary, since the machine has only one GPU card?).
> The build goes well, but the regtest fails with:
>
> [root at gpu01 cp2k-2024.1]# make ARCH=local_cuda VERSION=ssmp test
> (......)
> ========= Python (ssmp) =========
> /usr/bin/env python3 --version
> Python 3.6.8
> ----------------------- External Modules ---------------------------------
> DBCSR Version: 2.6.0 (2023-07-10)
> ---------------------------- Modules -------------------------------------
> Currently Loaded Modulefiles:
>  1) gcc-9.5.0/gcc   2) cuda/11.7.1   3) tools/cmake-3.28.1   4) tools/git-2.43.0
> *************************** Testing started ****************************
> ERROR: cuInit failed with error:  999
> /cluster/bld/cp2k-2024.1/src/offload/offload_library.c 57
> Program received signal SIGABRT: Process abort signal.
> Backtrace for this error:
> #0  0x7f8837d243ff in ???
> #1  0x7f8837d24387 in ???
> #2  0x7f8837d25a77 in ???
> #3  0x30fe53b in offload_init
>         at /cluster/bld/cp2k-2024.1/src/offload/offload_library.c:58
> #4  0xaf6e0b in __f77_interface_MOD_init_cp2k
>         at /cluster/bld/cp2k-2024.1/src/f77_interface.F:234
> #5  0x455f88 in cp2k
>         at /cluster/bld/cp2k-2024.1/src/start/cp2k.F:284
> #6  0x40ebdc in main
>         at /cluster/bld/cp2k-2024.1/src/start/cp2k.F:44
> Could not parse feature flags.
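
The abort above happens in cuInit, the first CUDA driver API call, before any CP2K GPU code runs. Calling cuInit outside CP2K can help tell a driver or installation problem from a build problem. The following is a minimal sketch, assuming the driver library is installed under the usual Linux name libcuda.so.1; the helper names try_cuinit and describe_cu_result are made up for this example, and the table lists only a few CUresult values from cuda.h.

```python
import ctypes
from typing import Optional

# A few CUresult values from cuda.h; 999 is the code reported in the log above.
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    100: "CUDA_ERROR_NO_DEVICE",
    803: "CUDA_ERROR_SYSTEM_DRIVER_MISMATCH",
    999: "CUDA_ERROR_UNKNOWN",
}


def describe_cu_result(code: int) -> str:
    """Return the symbolic name of a CUresult code, if it is in the short table."""
    return CU_RESULT_NAMES.get(code, "unlisted CUresult %d" % code)


def try_cuinit(library: str = "libcuda.so.1") -> Optional[int]:
    """Call cuInit(0) from the driver library; return None if it cannot be loaded."""
    try:
        libcuda = ctypes.CDLL(library)
    except OSError:
        return None
    libcuda.cuInit.argtypes = [ctypes.c_uint]  # CUresult cuInit(unsigned int Flags)
    libcuda.cuInit.restype = ctypes.c_int
    return libcuda.cuInit(0)


if __name__ == "__main__":
    result = try_cuinit()
    if result is None:
        print("libcuda.so.1 could not be loaded")
    else:
        print("cuInit returned", result, describe_cu_result(result))
```

If this standalone call also returns 999 (CUDA_ERROR_UNKNOWN), the problem would seem to lie with the driver setup on the machine rather than with the CP2K build.
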
>
> The machine has an RTX A6000 card, and CUDA toolkit 12.3.2 was used:
>
> +---------------------------------------------------------------------------------------+
> | NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
> |-----------------------------------------+----------------------+----------------------+
> | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
> |                                         |                      |               MIG M. |
> |=========================================+======================+======================|
> |   0  NVIDIA RTX A6000               Off | 00000000:3B:00.0 Off |                  Off |
> | 30%   36C    P0              67W / 300W |      2MiB / 49140MiB |      2%      Default |
> |                                         |                      |                  N/A |
> +-----------------------------------------+----------------------+----------------------+
>
> The OS is CentOS 7.9, and I used a source-built GCC 9.5.0 as a loadable
> module.
> The toolchain was built with:
>
> ./install_cp2k_toolchain.sh --mpi-mode=no --with-cmake=system -j 16 --enable-cuda=yes --gpu-ver=A100
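
On the --gpu-ver value: the card here is an RTX A6000, not an A100. The sketch below illustrates NVIDIA's binary compatibility rule for compiled GPU code (same major compute capability, device minor version not lower than the build target). It assumes that --gpu-ver=A100 makes the toolchain target compute capability 8.0, which is not verified here; the function name cubin_can_run is made up for this example.

```python
from typing import Tuple

# Compute capabilities as published by NVIDIA for these two cards.
COMPUTE_CAPABILITY = {
    "A100": (8, 0),
    "RTX A6000": (8, 6),
}


def cubin_can_run(built_for: Tuple[int, int], device: Tuple[int, int]) -> bool:
    """Binary compatibility: same major version, device minor >= build-target minor."""
    return built_for[0] == device[0] and device[1] >= built_for[1]


if __name__ == "__main__":
    target = COMPUTE_CAPABILITY["A100"]  # assumed target of --gpu-ver=A100
    device = COMPUTE_CAPABILITY["RTX A6000"]
    print("built for", target, "device", device, "runs:", cubin_can_run(target, device))
```

Under that assumption, code built for the A100 should still load on the RTX A6000, so the --gpu-ver value by itself would not obviously explain a failure as early as cuInit.
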
>
> and then in the CP2K source folder:
>
> make -j 16 ARCH=local_cuda VERSION=ssmp
>
> However, the CPU ssmp version (ARCH=local) builds and passes all the
> regtests successfully.
> Any suggestions on making the CUDA version work?
>
> Mike
>

-- 
You received this message because you are subscribed to the Google Groups "cp2k" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cp2k+unsubscribe at googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cp2k/CABdS-%3DcQ_KXs0rcPStdzVApKunwtjvat7_Vw5Sx2yg6oEM5Lxw%40mail.gmail.com.