I don't know what is wrong here, but I see in your log that ELPA is emitting a warning message. For this reason, I would suggest building without ELPA, i.e. adding `--with-elpa=no` to the toolchain installation options. Does it work on a single GPU, i.e. with a single MPI rank?<br /><br /><div class="gmail_quote"><div dir="auto" class="gmail_attr">On Friday, 20 January 2023 at 15:33:45 UTC+1 jerryt...@gmail.com wrote:<br/></div><blockquote class="gmail_quote" style="margin: 0 0 0 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div>Dear Forum,</div><div>I successfully compiled v2023.1 (gcc-10.3.0, cuda-11.2, and MKL lib) with the toolchain using:</div><div><br>"-j 8 --no-check-certificate --install-all --with-gcc=system --with-openmpi --with-mkl --with-sirius=no --with-spfft=no --with-cmake=system --enable-cuda --gpu-ver=P100 --with-pexsi --with-sirius=no --with-quip=no --with-hdf5=no --with-libvdwxc=no --with-spla=no --with-libtorch=no"</div><div><br></div><div>However, when I ran a test job, the job crashed and I got GPU oversubscription as shown below:</div><div>+-----------------------------------------------------------------------------+<br>| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |<br>|-------------------------------+----------------------+----------------------+<br>| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |<br>| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |<br>|                               |                      |               MIG M. |<br>|===============================+======================+======================|<br>|   0  Tesla V100-SXM2...  
Off  | 00000000:1A:00.0 Off |                    0 |<br>| N/A   42C    P0    80W / 300W |   2585MiB / 16384MiB |     53%      Default |<br>|                               |                      |                  N/A |<br>+-------------------------------+----------------------+----------------------+<br>|   1  Tesla V100-SXM2...  Off  | 00000000:1C:00.0 Off |                    0 |<br>| N/A   36C    P0    75W / 300W |   1664MiB / 16384MiB |     65%      Default |<br>|                               |                      |                  N/A |<br>+-------------------------------+----------------------+----------------------+<br>|   2  Tesla V100-SXM2...  Off  | 00000000:1D:00.0 Off |                    0 |<br>| N/A   36C    P0    73W / 300W |   1616MiB / 16384MiB |     54%      Default |<br>|                               |                      |                  N/A |<br>+-------------------------------+----------------------+----------------------+<br>|   3  Tesla V100-SXM2...  Off  | 00000000:1E:00.0 Off |                    0 |<br>| N/A   40C    P0    71W / 300W |   1614MiB / 16384MiB |     55%      Default |<br>|                               |                      |                  N/A |<br>+-------------------------------+----------------------+----------------------+<br>                                                                               <br>+-----------------------------------------------------------------------------+<br>| Processes:                                                                  |<br>|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |<br>|        ID   ID                                                   Usage      |<br>|=============================================================================|<br>|    0   N/A  N/A    159608      C   .../exe/local_cuda/cp2k.psmp     1655MiB |<br>|    0   N/A  N/A    159609      C   .../exe/local_cuda/cp2k.psmp      307MiB |<br>|    0   N/A  N/A    159610      C 
  .../exe/local_cuda/cp2k.psmp      307MiB |<br>|    0   N/A  N/A    159611      C   .../exe/local_cuda/cp2k.psmp      307MiB |<br>|    1   N/A  N/A    159609      C   .../exe/local_cuda/cp2k.psmp     1659MiB |<br>|    2   N/A  N/A    159610      C   .../exe/local_cuda/cp2k.psmp     1611MiB |<br>|    3   N/A  N/A    159611      C   .../exe/local_cuda/cp2k.psmp     1609MiB |<br>+-----------------------------------------------------------------------------+</div><div><br></div><div>However, using CP2K 2022.2, I ran the job successfully and did  not get this oversubscription.  <br></div><div><br></div><div>+-----------------------------------------------------------------------------+<br>| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |<br>|-------------------------------+----------------------+----------------------+<br>| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |<br>| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |<br>|                               |                      |               MIG M. |<br>|===============================+======================+======================|<br>|   0  Tesla V100-SXM2...  Off  | 00000000:1A:00.0 Off |                    0 |<br>| N/A   37C    P0    63W / 300W |   1598MiB / 16384MiB |     31%      Default |<br>|                               |                      |                  N/A |<br>+-------------------------------+----------------------+----------------------+<br>|   1  Tesla V100-SXM2...  Off  | 00000000:1C:00.0 Off |                    0 |<br>| N/A   33C    P0    63W / 300W |   1606MiB / 16384MiB |     29%      Default |<br>|                               |                      |                  N/A |<br>+-------------------------------+----------------------+----------------------+<br>|   2  Tesla V100-SXM2...  
Off  | 00000000:1D:00.0 Off |                    0 |<br>| N/A   34C    P0    63W / 300W |   1562MiB / 16384MiB |     27%      Default |<br>|                               |                      |                  N/A |<br>+-------------------------------+----------------------+----------------------+<br>|   3  Tesla V100-SXM2...  Off  | 00000000:1E:00.0 Off |                    0 |<br>| N/A   37C    P0    67W / 300W |   1560MiB / 16384MiB |     25%      Default |<br>|                               |                      |                  N/A |<br>+-------------------------------+----------------------+----------------------+<br>                                                                               <br>+-----------------------------------------------------------------------------+<br>| Processes:                                                                  |<br>|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |<br>|        ID   ID                                                   Usage      |<br>|=============================================================================|<br>|    0   N/A  N/A    163862      C   .../exe/local_cuda/cp2k.psmp     1599MiB |<br>|    1   N/A  N/A    163863      C   .../exe/local_cuda/cp2k.psmp     1603MiB |<br>|    2   N/A  N/A    163864      C   .../exe/local_cuda/cp2k.psmp     1557MiB |<br>|    3   N/A  N/A    163865      C   .../exe/local_cuda/cp2k.psmp     1555MiB |<br>+-----------------------------------------------------------------------------+<br></div><div><br></div><div>Additionally, the system output file shows the following CUDA runtime error:</div><div><br></div><div>CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle (/cluster/home/tanoury/CP2K/cp2k-2023.1_GPU/exts/dbcsr/src/acc/cuda_hip/acc_event.cpp::60)</div><div><br></div><div>I have also attached the error output.<br></div><div><br></div><div>Any help to solve this problem is greatly 
appreciated.</div><div><br></div><div>Thank you so much,</div><div>Jerry<br></div><div><br></div></blockquote></div>
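To make the two suggestions above concrete, here is a minimal dry-run sketch. It only prints the commands rather than running them. The toolchain options are copied from the quoted post with `--with-elpa=no` appended; the executable location, the input/output file names, and pinning the run to GPU 0 via `CUDA_VISIBLE_DEVICES` are my assumptions, so adjust them to your setup.

```shell
#!/bin/sh
# Dry run: builds and prints the two commands instead of executing them.

# Toolchain options exactly as quoted in the original post.
ORIG_OPTS="-j 8 --no-check-certificate --install-all --with-gcc=system --with-openmpi --with-mkl --with-sirius=no --with-spfft=no --with-cmake=system --enable-cuda --gpu-ver=P100 --with-pexsi --with-sirius=no --with-quip=no --with-hdf5=no --with-libvdwxc=no --with-spla=no --with-libtorch=no"

# Suggested change: build without ELPA.
NEW_OPTS="${ORIG_OPTS} --with-elpa=no"

# Assumptions (not from the thread): executable location and file names.
CP2K_EXE="${CP2K_EXE:-./exe/local_cuda/cp2k.psmp}"
INPUT="${INPUT:-test.inp}"
OUTPUT="${OUTPUT:-test.out}"

# 1) Reinstall the toolchain (run from tools/toolchain in the CP2K source tree).
TOOLCHAIN_CMD="./install_cp2k_toolchain.sh ${NEW_OPTS}"

# 2) Single MPI rank, restricted to one GPU (device 0 is an assumption).
SINGLE_GPU_CMD="CUDA_VISIBLE_DEVICES=0 mpirun -np 1 ${CP2K_EXE} -i ${INPUT} -o ${OUTPUT}"

echo "${TOOLCHAIN_CMD}"
echo "${SINGLE_GPU_CMD}"
```

If the single-rank run works, that would point towards the multi-rank GPU assignment rather than the build itself.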
