[CP2K-user] [CP2K:18509] Re: "Problem moving file" when writing wfn files
captain mus
captainmozak at gmail.com
Thu Mar 2 21:44:33 UTC 2023
Hi Theo,
I think I accidentally sent my answer as a private message (reply to author), so I
hope you received it there. In any case, I will post it
again here.
For your problem, these are the workarounds that might help:
1) Increase the interval between RESTART writes, because writing them takes
quite a lot of memory. In my case I run for 2000 steps and print the RESTART file
every 500 steps:
-------------------------
&RESTART_HISTORY OFF
&END RESTART_HISTORY
&RESTART ON
FILENAME = rst-wat-dftb-01.restart
&EACH
MD 500
&END EACH
&END RESTART
-------------------------
2) In your Slurm script, increase the memory per CPU (--mem-per-cpu). I
set it to 5000, up from 1200 (it is important to be aware of how much RAM
your cluster has).
To give you an idea, I run on 1 node (md nodes: 188 Dual Socket, 24 core CPUs
with 3GHz base clock and 192GB RAM) and request 12 MPI tasks (--ntasks),
each of which runs 2 OpenMP threads (--cpus-per-task). I did this by adding
the following line:
---------------------------------------
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
---------------------------------------
So my job requires a total of 24 cores. This setup gave me 7
seconds per step (a DFTB calculation with 1536 atoms, so I think it's pretty OK).
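As a rough sanity check of the request described above, the numbers can be multiplied out before submitting. This is only a sketch: the task, thread and memory values come from this post, the variable names are made up for illustration, and the node size is an assumption (at least 24 cores, and 192 GB counted as 192000 MB to stay on the conservative side). Slurm reads --mem-per-cpu in megabytes by default.

```shell
#!/bin/sh
# Resource request as described in this post.
ntasks=12            # --ntasks: number of MPI ranks
cpus_per_task=2      # --cpus-per-task: OpenMP threads per rank
mem_per_cpu_mb=5000  # --mem-per-cpu, in MB (Slurm's default unit)

# Node size: ASSUMED lower bounds based on the node description in this post.
node_cores=24                # assumption: at least 24 cores per node
node_mem_mb=$((192 * 1000))  # assumption: 192 GB counted as 192000 MB

# Total cores and total memory the job asks for.
total_cores=$((ntasks * cpus_per_task))
total_mem_mb=$((total_cores * mem_per_cpu_mb))

echo "cores requested:  $total_cores of $node_cores"
echo "memory requested: $total_mem_mb MB of $node_mem_mb MB"

# The job only fits on one node if both limits are respected.
if [ "$total_cores" -le "$node_cores" ] && [ "$total_mem_mb" -le "$node_mem_mb" ]; then
    fits=yes
else
    fits=no
fi
echo "fits on one node: $fits"
```

If the check reports that the request does not fit, lowering --mem-per-cpu or the number of tasks is probably safer than letting the scheduler reject or kill the job.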
One more thing: please use the following line to launch CP2K in your
Slurm script:
------------------------------------------
mpirun -np 12 cp2k.psmp -i input.inp -o out.out
------------------------------------------
For now I suspect this error is due to a memory problem. I tried to find out how
to prevent the run from creating the wfn and all the bak files, but I have not
found it yet.
I hope this helps.
My best regards
M. Saleh
On Wednesday, March 1, 2023 at 1:42:24 PM UTC+1 Théo Cavignac wrote:
> Hello,
>
> I am trying to learn CP2K, and I am stuck with an error that I cannot
> understand that looks like this:
>
> Trying to move ...wfn to ....wfn.bak-1.
> rename returned status: -1
>
> The same error is repeated more than once
> Followed by a crash.
>
> forrtl: error (76): Abort trap signal
>
> I got the same error on a basic sample extracted from the CP2K
> test suite, as well as my actual work case. I am running cp2k.popt on a
> cluster with one or two nodes of 48 cores using Slurm.
>
> The crash happens at the end of the m_mov routine of
> src/base/machine_posix.f90.
>
> According to the POSIX manual, rename (the C version at least) always
> returns -1 on error, but the binding in the same file labels the result code errno.
> Sadly, even assuming that it somehow gets the global errno (I don't know
> much about Fortran-C interop), -1 (or any unsigned equivalent of it) is
> not a valid errno value.
>
> I checked for permission problems, or size limits, but I don't see
> anything wrong with that. However, I think there might be more than one
> process trying to do the same m_mov concurrently, creating a race
> condition where some of them try to move a file that does not exist.
>
> So, I don't know what to do next to debug the issue.
> What is the relevant info on the MPI context in the output?
> Could it be some library mismatch with the MPI implementation?
> What should I check to make sure I am not misusing CP2K?
>
> Best regards,
>
> Théo CAVIGNAC
>