<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="en-CH" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">Hi Salvatore<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">You can add the keyword
<a href="https://manual.cp2k.org/cp2k-9_1-branch/CP2K_INPUT/GLOBAL.html#TRACE">TRACE</a> (or
<a href="https://manual.cp2k.org/cp2k-9_1-branch/CP2K_INPUT/GLOBAL.html#list_TRACE_MASTER">
TRACE_MASTER</a> to trace only the MPI root process) to the &amp;GLOBAL section of the CP2K input to get more detailed output.<o:p></o:p></span></p>
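<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">As a minimal sketch, the &amp;GLOBAL section could look like the fragment below. The project name, run type and print level are placeholders and not taken from your input; only TRACE and TRACE_MASTER are the keywords in question. I am assuming the usual CP2K syntax for logical keywords (keyword followed by T or F), so please check the linked manual pages for the defaults in your version.<o:p></o:p></span></p>
<pre>
&amp;GLOBAL
  PROJECT      freeze_debug   ! placeholder name, use your own
  RUN_TYPE     ENERGY         ! placeholder, keep the run type of your job
  PRINT_LEVEL  MEDIUM
  TRACE        T              ! write a trace of the routines entered and left
  TRACE_MASTER T              ! restrict the trace to the MPI root process
&amp;END GLOBAL
</pre>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">The trace can become large, so a short test case is preferable. The last routine entered before the output stops should indicate roughly where the run hangs.<o:p></o:p></span></p>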
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">Does the run freeze for any kind of CP2K input? How did you compile CP2K? Could you run the regression tests successfully? It is difficult to make any suggestions without further information.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">Best<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">Matthias<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="margin-left:36.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">"cp2k@googlegroups.com" &lt;cp2k@googlegroups.com&gt; on behalf of Salvatore Labonia &lt;salvatore.labonia@gmail.com&gt;<br>
<b>Reply to: </b>"cp2k@googlegroups.com" &lt;cp2k@googlegroups.com&gt;<br>
<b>Date: </b>Wednesday, 3 August 2022 at 12:35<br>
<b>To: </b>"cp2k@googlegroups.com" &lt;cp2k@googlegroups.com&gt;<br>
<b>Subject: </b>[CP2K:17431] CP2K freeze<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:36.0pt"><o:p> </o:p></p>
</div>
<p class="MsoNormal" style="margin-left:36.0pt">Hello, <o:p></o:p></p>
<div>
<p class="MsoNormal" style="margin-left:36.0pt">We are seeing CP2K freeze on our HPC cluster.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:36.0pt">We have 94 Dell servers in total. When running CP2K v9.1, compiled with the Intel compiler and linked against the Intel MPI library, the customer experiences freezes during runs.
<o:p></o:p></p>
<div>
<p style="margin-left:36.0pt">This happens regardless of the number or type of nodes involved.<o:p></o:p></p>
<p style="margin-left:36.0pt">The freeze happens randomly, not at the same iteration number, even when using the same run command and the same input dataset.<o:p></o:p></p>
<p style="margin-left:36.0pt">When the freeze occurs, the processes on the nodes appear to be running and using CPU, but if we attach to any of them (and to their forked children, of course), we see that they are all sitting in a wait system call for
data coming from (or going out to) a pipe.<o:p></o:p></p>
<p style="margin-left:36.0pt">The processes make no other system calls…<o:p></o:p></p>
<p style="margin-left:36.0pt">Slurm thinks that the job is still running.<o:p></o:p></p>
<p style="margin-left:36.0pt">Killing one of the stuck processes causes the other processes to die, and Slurm finally realizes that the job has crashed.<o:p></o:p></p>
<p style="margin-left:36.0pt">Is this behaviour usual in some circumstances (so that the customer has to do something to avoid it), or could it be caused by something else (CP2K compilation, MPI version, Intel compiler version)?<o:p></o:p></p>
<p style="margin-left:36.0pt">Is there any way to run CP2K/MPI in a debugging mode with more verbose output, in order to understand at which point/call the freeze happens?<o:p></o:p></p>
<p style="margin-left:36.0pt"> Regards<o:p></o:p></p>
<p style="margin-left:36.0pt">Salvatore<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal" style="margin-left:36.0pt">-- <br>
You received this message because you are subscribed to the Google Groups "cp2k" group.<br>
To unsubscribe from this group and stop receiving emails from it, send an email to
<a href="mailto:cp2k+unsubscribe@googlegroups.com">cp2k+unsubscribe@googlegroups.com</a>.<br>
To view this discussion on the web visit <a href="https://groups.google.com/d/msgid/cp2k/2bffd2de-1afd-4980-b3aa-6438990d81a9n%40googlegroups.com?utm_medium=email&utm_source=footer">
https://groups.google.com/d/msgid/cp2k/2bffd2de-1afd-4980-b3aa-6438990d81a9n%40googlegroups.com</a>.<br>
<br>
<o:p></o:p></p>
</div>
</body>
</html>