[CP2K-user] [CP2K:17431] CP2K freeze

Salvatore Labonia salvatore.labonia at gmail.com
Wed Aug 3 10:35:18 UTC 2022


Hello,
We are seeing CP2K runs freeze on our HPC cluster.
The cluster has 94 Dell servers in total. When running CP2K v9.1, compiled 
with the Intel compiler and linked against the Intel MPI library, our 
customer experiences jobs that freeze mid-run.

This happens regardless of the number or type of nodes involved.

The freeze occurs at random, not at the same iteration number, even with 
the same run command and the same input dataset.

When a freeze occurs, the processes on the nodes appear to be running and 
using CPU, but if we attach to any of them (and, of course, to their forked 
children), we see that they are all sitting in a wait system call, waiting 
for data coming from (or going out to) a pipe.

The processes make no other system calls.
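
For reference, the per-process check described above can be sketched as a 
small script run on each node. This is only a minimal sketch, assuming a 
Linux /proc filesystem; the name fragment "cp2k" is an assumption about 
what the binary is called here, and /proc/<pid>/wchan and 
/proc/<pid>/syscall may be unreadable without sufficient privileges, in 
which case the script reports None for them.

```python
import os


def parse_stat(stat_text):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left].strip())
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def read_proc(pid, name):
    """Return the stripped contents of /proc/<pid>/<name>, or None."""
    try:
        with open(f"/proc/{pid}/{name}") as fh:
            return fh.read().strip()
    except OSError:
        # Process exited, file missing, or permission denied.
        return None


def snapshot(name_fragment):
    """List (pid, comm, state, wchan, syscall) for matching processes."""
    rows = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        stat = read_proc(entry, "stat")
        if stat is None:
            continue
        pid, comm, state = parse_stat(stat)
        if name_fragment in comm:
            rows.append((pid, comm, state,
                         read_proc(entry, "wchan"),
                         read_proc(entry, "syscall")))
    return rows


if __name__ == "__main__":
    # "cp2k" is an assumed fragment of the executable name; adjust as needed.
    for row in snapshot("cp2k"):
        print(*row, sep="\t")
```

Running it a few times during a freeze and comparing the state, wchan and 
syscall columns would show whether every rank stays blocked in the same 
call.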

Slurm thinks the job is still running.

Killing one of the stuck processes causes the other processes to die, and 
only then does Slurm register that the job has crashed.

Is this behaviour expected in some circumstances (so that the customer 
needs to change something to avoid it), or could it have another cause 
(CP2K compilation, MPI version, Intel compiler version)?

Is there a way to run CP2K/MPI in a debugging mode with more verbose 
output, so that we can see at which point or call the freeze happens?

Regards

Salvatore
