I'm sorry, I don't know how to answer your question. However, I've opened a ticket on the CP2K GitHub repository; maybe someone more expert can reply to you (see https://github.com/cp2k/cp2k/issues/2530).

In my experience, ELPA is useful when you run with many MPI ranks. You can check your outputs for timing entries like `cp_fm_syevd` (ScaLAPACK) or `cp_fm_diag_elpa` (ELPA), e.g.:

cp_fm_syevd                         36 10.6    0.001    0.001   13.586   13.587

In this particular case, the diagonalization takes 13.6 seconds (last column). So you can check how much time you spend in the diagonalizer and compare it to the total run time (a quick grep for these timers is sketched below).

You can also always switch the diagonalizer (ELPA vs. ScaLAPACK) in your input file (see https://manual.cp2k.org/trunk/CP2K_INPUT/GLOBAL.html#PREFERRED_DIAG_LIBRARY; a minimal input sketch is also shown below). If you go that route, I would suggest building ELPA without GPU support, so that you can still use ELPA on the CPU (assuming it is beneficial in your case), by hacking the toolchain installation script:

https://github.com/cp2k/cp2k/blob/master/tools/toolchain/scripts/stage5/install_elpa.sh
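Regarding the timing check above: assuming the run wrote its output to a file called `output.out` (a placeholder name), you can pull out the two timer lines with something like

grep -E 'cp_fm_syevd|cp_fm_diag_elpa' output.out

The last column of the TIMING report is the (maximum) total time spent in that routine, which you can compare with the overall run time (the `CP2K` entry at the top of the same table).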
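And regarding switching the diagonalizer in the input file, a minimal sketch of the &GLOBAL section (PROJECT and RUN_TYPE are placeholders here; the accepted values of the keyword are listed on the manual page linked above):

&GLOBAL
  PROJECT  my_system                ! placeholder project name
  RUN_TYPE MD                       ! placeholder run type
  PREFERRED_DIAG_LIBRARY ELPA       ! request ELPA; the ScaLAPACK value is given on the manual page
&END GLOBAL

Running the same job once with each setting and comparing the `cp_fm_syevd`/`cp_fm_diag_elpa` timers is a direct way to see whether ELPA pays off for your system size.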
Hope it helps.

Alfio

On Wednesday, January 25, 2023 at 12:37:49 UTC+1 jerryt...@gmail.com wrote:

Hi Alfio,
Yes, ELPA was the problem. I removed it from my build and CP2K worked as expected. Where does ELPA help the most? The majority of my AIMD jobs are 1000 atoms or fewer. Will ELPA provide a performance advantage over ScaLAPACK for systems of that size?

Thank you,
Jerry

On Monday, January 23, 2023 at 4:09:41 AM UTC-5 Alfio Lazzaro wrote:

I have no clue what's wrong here; however, I see in your log that ELPA is printing a warning message. For this reason, I would suggest avoiding ELPA, i.e. adding `--with-elpa=no` during the toolchain installation. Does it work on a single GPU, i.e. with a single MPI rank?
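A minimal sanity check along those lines, as a sketch (the launcher, thread count, binary path, and input/output file names are placeholders for your actual setup):

export OMP_NUM_THREADS=4          # threads per rank, as in your normal runs
export CUDA_VISIBLE_DEVICES=0     # expose only one GPU to the job
mpirun -np 1 ./exe/local_cuda/cp2k.psmp -i test.inp -o test.out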
On Friday, January 20, 2023 at 15:33:45 UTC+1 jerryt...@gmail.com wrote:

Dear Forum,
I successfully compiled v2023.1 (gcc-10.3.0, cuda-11.2, and the MKL library) with the toolchain using:

"-j 8 --no-check-certificate --install-all --with-gcc=system --with-openmpi --with-mkl --with-sirius=no --with-spfft=no --with-cmake=system --enable-cuda --gpu-ver=P100 --with-pexsi --with-sirius=no --with-quip=no --with-hdf5=no --with-libvdwxc=no --with-spla=no --with-libtorch=no"

However, when I ran a test job, it crashed and I got GPU oversubscription, as shown below:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:1A:00.0 Off |                    0 |
| N/A   42C    P0    80W / 300W |   2585MiB / 16384MiB |     53%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:1C:00.0 Off |                    0 |
| N/A   36C    P0    75W / 300W |   1664MiB / 16384MiB |     65%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:1D:00.0 Off |                    0 |
| N/A   36C    P0    73W / 300W |   1616MiB / 16384MiB |     54%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:1E:00.0 Off |                    0 |
| N/A   40C    P0    71W / 300W |   1614MiB / 16384MiB |     55%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    159608      C   .../exe/local_cuda/cp2k.psmp     1655MiB |
|    0   N/A  N/A    159609      C   .../exe/local_cuda/cp2k.psmp      307MiB |
|    0   N/A  N/A    159610      C   .../exe/local_cuda/cp2k.psmp      307MiB |
|    0   N/A  N/A    159611      C   .../exe/local_cuda/cp2k.psmp      307MiB |
|    1   N/A  N/A    159609      C   .../exe/local_cuda/cp2k.psmp     1659MiB |
|    2   N/A  N/A    159610      C   .../exe/local_cuda/cp2k.psmp     1611MiB |
|    3   N/A  N/A    159611      C   .../exe/local_cuda/cp2k.psmp     1609MiB |
+-----------------------------------------------------------------------------+

However, using CP2K 2022.2, the same job ran successfully and did not show this oversubscription:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:1A:00.0 Off |                    0 |
| N/A   37C    P0    63W / 300W |   1598MiB / 16384MiB |     31%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:1C:00.0 Off |                    0 |
| N/A   33C    P0    63W / 300W |   1606MiB / 16384MiB |     29%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:1D:00.0 Off |                    0 |
| N/A   34C    P0    63W / 300W |   1562MiB / 16384MiB |     27%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:1E:00.0 Off |                    0 |
| N/A   37C    P0    67W / 300W |   1560MiB / 16384MiB |     25%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    163862      C   .../exe/local_cuda/cp2k.psmp     1599MiB |
|    1   N/A  N/A    163863      C   .../exe/local_cuda/cp2k.psmp     1603MiB |
|    2   N/A  N/A    163864      C   .../exe/local_cuda/cp2k.psmp     1557MiB |
|    3   N/A  N/A    163865      C   .../exe/local_cuda/cp2k.psmp     1555MiB |
+-----------------------------------------------------------------------------+

Additionally, the system output file shows the following CUDA runtime error:

CUDA RUNTIME API error: EventRecord failed with error cudaErrorInvalidResourceHandle (/cluster/home/tanoury/CP2K/cp2k-2023.1_GPU/exts/dbcsr/src/acc/cuda_hip/acc_event.cpp::60)

I have also attached the error output.

Any help to solve this problem is greatly appreciated.

Thank you so much,
Jerry

