Hi Theo,<br /><br />I think I accidentally sent my answer as a private message (reply to author), so I hope you received it there. Anyway, I will post it again.<br /><br />For your problem, these are the solutions that might currently work:<br /><br />1) Increase the number of MD steps between RESTART writes, because writing them takes quite a lot of memory. In my case I run 2000 steps and write a RESTART file every 500 steps:<br /><br /><div>-------------------------</div><div> &RESTART_HISTORY OFF<br /> &END RESTART_HISTORY<br /> &RESTART ON<br /> FILENAME = rst-wat-dftb-01.restart<br /> &EACH<br /> <b>MD 500</b><br /> &END EACH<br /> &END RESTART</div><div>-------------------------<br /></div><div><br /></div><div>2) In your Slurm script, increase the memory per CPU (<b>--mem-per-cpu</b>). I set it to 5000, up from 1200 previously (it is important to be aware of how much RAM your cluster has). <br /></div><div><br /></div><div>To give you an idea, I run on 1 node (md nodes: 188 dual socket, 24-core CPUs with 3 GHz base clock and 192 GB RAM) and request 12 MPI tasks (<b>--ntasks</b>), each of which runs 2 OpenMP threads (<b>--cpus-per-task</b>). I did this by adding the following line:<br />---------------------------------------<b><br /></b></div><div><b>export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK</b></div><div><div>---------------------------------------<b><br /></b></div><div><b><br /></b></div><div>So my job requires a total of 24 cores. This setup gave me 7 seconds per step (a DFTB calculation with 1536 atoms, so I think that is pretty OK).<br />One more thing: please use the following line to launch CP2K in your Slurm script:<br />------------------------------------------</div><div><b>mpirun -np 12 cp2k.psmp -i input.inp -o out.out</b></div><div><b>-----------------------------<br /></b></div></div><div><br /></div><div>For now I suspect this error is due to a memory problem. I tried to find out how to prevent the input from creating the wfn file and all the bak files, but I have not found it yet.<br /><br />I hope this helps. <br /><br /></div><div>Best regards,<br />M. Saleh<br /></div><div><br /></div><div class="gmail_quote"><div dir="auto" class="gmail_attr">On Wednesday, March 1, 2023 at 1:42:24 PM UTC+1 Théo Cavignac wrote:<br/></div><blockquote class="gmail_quote" style="margin: 0 0 0 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Hello,<br><br>I am trying to learn CP2K, and I am stuck with an error that I cannot<br>understand that looks like this:<br><br> Trying to move ...wfn to ....wfn.bak-1.<br> rename returned status:<span style="white-space:pre"> </span>-1<br><br>The same error is repeated more than once<br>Followed by a crash.<br><br> forrtl: error (76): Abort trap signal<br><br>I got the same error on a basic sample extracted from the CP2K<br>test suite, as well as my actual work case. 
I am running cp2k.popt on a<br>cluster with one or two nodes of 48 cores using Slurm.<br><br>The crash happens at the end of the m_mov routine of<br>src/base/machine_posix.f90.<br><br>According to the POSIX manual, rename (the C version at least) always<br>returns -1 on failure, but the binding in the same file labels the result code errno.<br>Sadly, even assuming that it somehow gets the global errno (I don't know<br>much about Fortran-C interop), -1 (or any unsigned equivalent of it) is<br>not a valid errno value.<br><br>I checked for permission problems and size limits, but I don't see<br>anything wrong there. However, I think there might be more than one<br>process trying to do the same m_mov concurrently, creating a race<br>condition where some of them try to move a file that does not exist.<br><br>So, I don't know what to do next to debug the issue.<br>What is the relevant info on the MPI context in the output?<br><div>Could it be some library mismatch with the MPI implementation?</div><div>What should I check to make sure I am not misusing CP2K?<br></div><br>Best regards,<br><br>Théo CAVIGNAC<br></blockquote></div>
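<div>The Slurm settings described in the reply above (12 MPI tasks, 2 OpenMP threads each, 5000 MB per CPU) can be collected into one batch script. This is only a minimal sketch, not the script actually used in the thread: the <b>--nodes=1</b> line, the fallback defaults, the dry-run branch and the variable names NTASKS, CPUS_PER_TASK, MEM_PER_CPU_MB, TOTAL_CORES and TOTAL_MEM_MB are assumptions added for illustration, and the memory estimate assumes that Slurm counts each requested core as one CPU.</div>

```shell
#!/bin/bash
#SBATCH --nodes=1            # assumption: the reply only says "1 node"
#SBATCH --ntasks=12          # 12 MPI ranks
#SBATCH --cpus-per-task=2    # 2 OpenMP threads per rank
#SBATCH --mem-per-cpu=5000   # in MB; raised from 1200 in the reply

# Fall back to the values above when the script is run outside Slurm
# (for example as a dry run on a login node). Variable names are illustrative.
NTASKS="${SLURM_NTASKS:-12}"
CPUS_PER_TASK="${SLURM_CPUS_PER_TASK:-2}"
MEM_PER_CPU_MB="${SLURM_MEM_PER_CPU:-5000}"

# One OpenMP thread per CPU allocated to each MPI rank.
export OMP_NUM_THREADS="$CPUS_PER_TASK"

# Sanity check of what the job asks for before launching.
TOTAL_CORES=$(( NTASKS * CPUS_PER_TASK ))
TOTAL_MEM_MB=$(( TOTAL_CORES * MEM_PER_CPU_MB ))
echo "cores requested:  $TOTAL_CORES"
echo "memory requested: $TOTAL_MEM_MB MB"

# Launch the hybrid MPI/OpenMP binary (psmp) only if it is available;
# otherwise just print the command that would be run.
if command -v mpirun >/dev/null 2>&1 && command -v cp2k.psmp >/dev/null 2>&1; then
    mpirun -np "$NTASKS" cp2k.psmp -i input.inp -o out.out
else
    echo "dry run: mpirun -np $NTASKS cp2k.psmp -i input.inp -o out.out"
fi
```

<div>With the values from the reply this requests 12 x 2 = 24 cores and 24 x 5000 MB = 120000 MB, which stays below the 192 GB of RAM per node mentioned there. Comparing that product against the RAM of your own nodes is a quick way to check a new <b>--mem-per-cpu</b> value before submitting.</div>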