[CP2K-user] [CP2K:18509] Re: "Problem moving file" when writing wfn files

captain mus captainmozak at gmail.com
Thu Mar 2 21:44:33 UTC 2023

Hi Theo 

I think I accidentally sent my answer as a private message (Reply to Author),
so I hope you received it there. Anyway, I will post it here as well.

For your problem, the following might work:

1) Increase the interval at which RESTART files are written, because (in my
experience) writing them takes quite a lot of memory. In my case I run 2000
steps and print a RESTART file every 500 steps:

    FILENAME = rst-wat-dftb-01.restart
    &EACH
      MD 500
    &END EACH

2) In your Slurm script, increase the memory per CPU (--mem-per-cpu). I set
it to 5000 MB, up from 1200 (it is important to be aware of how much RAM
your cluster nodes have).

To give you an idea: I run on 1 node (md nodes: 188 dual-socket nodes with
24-core CPUs, 3 GHz base clock and 192 GB RAM) and request 12 MPI tasks
(--ntasks), each running 2 OpenMP threads (--cpus-per-task). I set both
options in my Slurm script.

So my job requires a total of 24 cores (12 tasks x 2 threads). This setup
gives me about 7 seconds per step (a DFTB calculation with 1536 atoms, so I
think that is pretty OK).
One more thing: please use the following line to launch CP2K in your Slurm
script (cp2k.psmp is the MPI + OpenMP build, while cp2k.popt is MPI only):

    mpirun -np 12 cp2k.psmp -i input.inp -o out.out
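A minimal sketch of a Slurm batch script combining the settings from point 2, assuming one node and the 12 x 2 layout described above. Partition, account, time limit and module loading are cluster-specific and left out on purpose; the `SLURM_JOB_ID` check that only prints the command when run outside a job is a hypothetical convenience added for illustration, not part of any CP2K or Slurm convention. It also spells out the memory arithmetic: `--mem-per-cpu` is charged per allocated CPU, so 12 x 2 x 5000 MB = 120000 MB, which fits on a 192 GB node.

```shell
#!/bin/bash
#SBATCH --nodes=1
# 12 MPI ranks
#SBATCH --ntasks=12
# 2 OpenMP threads per rank, 24 cores in total
#SBATCH --cpus-per-task=2
# MB per allocated CPU (previously 1200)
#SBATCH --mem-per-cpu=5000

# Inside a job Slurm exports these variables; outside one, fall back to the
# values requested above.
ntasks=${SLURM_NTASKS:-12}
threads=${SLURM_CPUS_PER_TASK:-2}
mem_per_cpu=${SLURM_MEM_PER_CPU:-5000}

# Each MPI rank of cp2k.psmp runs this many OpenMP threads.
export OMP_NUM_THREADS=$threads

# --mem-per-cpu applies to every allocated CPU: 12 * 2 * 5000 = 120000 MB,
# which must fit into the RAM of one node.
total_mb=$(( ntasks * threads * mem_per_cpu ))
echo "Requesting ${total_mb} MB for $(( ntasks * threads )) CPUs"

if [ -n "${SLURM_JOB_ID:-}" ]; then
  mpirun -np "$ntasks" cp2k.psmp -i input.inp -o out.out
else
  # Hypothetical dry-run branch: not inside a Slurm job, so only print.
  echo "Would run: mpirun -np $ntasks cp2k.psmp -i input.inp -o out.out"
fi
```

Submit it with `sbatch` followed by the script name. At roughly 7 s per step, a 2000-step run needs about 2000 x 7 s = 14000 s (just under 4 hours), which is worth keeping in mind when choosing a time limit.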


For now I suspect this error is due to a memory problem. I tried to find out
how to stop the input from creating the wfn file and all the .bak copies,
but I have not found it yet.

I hope it will help. 

My best regards
M. Saleh

On Wednesday, March 1, 2023 at 1:42:24 PM UTC+1 Théo Cavignac wrote:

> Hello,
> I am trying to learn CP2K, and I am stuck with an error that I cannot
> understand, which looks like this:
>     Trying to move ...wfn to ....wfn.bak-1.
>     rename returned status: -1
> The same error is repeated several times, followed by a crash:
>     forrtl: error (76): Abort trap signal
> I got the same error with a basic sample taken from the CP2K test suite,
> as well as with my actual use case. I am running cp2k.popt with Slurm on
> a cluster, using one or two nodes of 48 cores.
> The crash happens at the end of the m_mov routine of
> src/base/machine_posix.f90.
> According to the POSIX manual, rename (the C version at least) returns 0
> on success and -1 on failure, with the cause stored in errno, but the
> binding in the same file labels the result code errno. Sadly, even
> assuming that it somehow gets the global errno (I don't know much about
> Fortran-C interop), -1 (or any unsigned equivalent of it) is not a valid
> errno value.
> I checked for permission problems and size limits, but I don't see
> anything wrong there. However, I think more than one process might be
> trying to do the same m_mov concurrently, creating a race condition in
> which some of them try to move a file that no longer exists.
> So I don't know what to do next to debug the issue.
> What is the relevant info on the MPI context in the output?
> Could it be some library mismatch with the MPI implementation?
> What should I check to make sure I am not misusing CP2K?
> Best regards,
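The race condition suspected in the quoted message can be illustrated outside CP2K. rename() is atomic: once one process has moved the file, the source name no longer exists, so every other process attempting the same move fails (in C, rename() then returns -1 with errno set to ENOENT). The sketch below uses `mv`, which reports failure only through a non-zero exit status. The file names are hypothetical stand-ins for the elided wfn paths, and the script only demonstrates the mechanism; it does not show that CP2K actually performs concurrent moves.

```shell
#!/bin/bash
# Sketch: several processes race to make the same backup of one file.
# File names are hypothetical stand-ins for the *.wfn / *.wfn.bak-1 pair.
workdir=$(mktemp -d)
src="$workdir/project-RESTART.wfn"
dst="$workdir/project-RESTART.wfn.bak-1"
touch "$src"

# Start four concurrent moves of the same source file.
for i in 1 2 3 4; do
  (
    if mv "$src" "$dst" 2>/dev/null; then
      echo "process $i: ok"
    else
      echo "process $i: failed"
    fi
  ) >> "$workdir/results" &
done
wait

# rename is atomic, so exactly one move succeeds; the others find no source.
successes=$(grep -c ': ok' "$workdir/results")
failures=$(grep -c ': failed' "$workdir/results")
echo "successes=$successes failures=$failures"
rm -rf "$workdir"
```

If several MPI ranks (or several independent copies of the program) executed the backup step on the same file, this is the pattern one would expect: one success and a series of identical "rename returned status: -1" style failures.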
