[CP2K-user] [CP2K:18509] Re: "Problem moving file" when writing wfn files

captain mus captainmozak at gmail.com
Thu Mar 2 21:44:33 UTC 2023


Hi Theo 

I think I accidentally sent my answer as a private message (reply to
author), so I hope you received it that way. Anyway, I will post it
again here.

For your problem, the workarounds that might help are:

1) Increase the interval between RESTART writes (that is, write them
less often), because writing them takes quite a lot of memory. In my
case I run 2000 steps and write a RESTART file every 500 steps:

-------------------------
  &RESTART_HISTORY OFF
  &END RESTART_HISTORY
  &RESTART ON
    FILENAME = rst-wat-dftb-01.restart
    &EACH
      MD 500
    &END EACH
  &END RESTART
-------------------------

2) In your Slurm script, increase the memory per CPU (--mem-per-cpu). I
raised it from 1200 to 5000 (it is important to be aware of how much RAM
your cluster has).

To give you an idea, I run on 1 node (md nodes: 188 Dual Socket, 24 core
CPUs with 3GHz base clock and 192GB RAM) and request 12 MPI tasks
(--ntasks), each running 2 OpenMP threads (--cpus-per-task). I did this
by adding the following line:
---------------------------------------
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
---------------------------------------
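As a rough sanity check of the request described above, the numbers can be multiplied out in a few lines of shell. This is only a sketch: the variable names are made up for illustration, and it assumes that --mem-per-cpu is given in megabytes (Slurm's default unit) and that the full nominal 192 GB of the node is available to jobs, which is usually not quite true.

```shell
#!/bin/sh
# Values taken from this message; the variable names are hypothetical.
ntasks=12                    # --ntasks (MPI ranks)
cpus_per_task=2              # --cpus-per-task (OpenMP threads per rank)
mem_per_cpu_mb=5000          # --mem-per-cpu, assumed to be in MB
node_ram_mb=$((192 * 1000))  # nominal node RAM; usable RAM is lower

total_cores=$((ntasks * cpus_per_task))
total_mem_mb=$((total_cores * mem_per_cpu_mb))

echo "cores requested:  $total_cores"
echo "memory requested: $total_mem_mb MB of $node_ram_mb MB"

# Warn if the job asks for more memory than one node can provide.
if [ "$total_mem_mb" -gt "$node_ram_mb" ]; then
    echo "request exceeds node RAM" >&2
fi
```

With these values the job asks for 24 cores and 120000 MB, which still leaves headroom on a 192 GB node.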

So my job requires a total of 24 cores. This setup gave me 7 seconds
per step (a DFTB calculation with 1536 atoms, so I think it's pretty OK).
One more thing: please use the following line to launch CP2K in your
Slurm script:
------------------------------------------
mpirun -np 12 cp2k.psmp -i input.inp -o out.out
------------------------------------------
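Putting the pieces from this message together, a job script could look roughly like the sketch below. The #SBATCH values and the mpirun line are the ones quoted in this message; the job name, the --nodes line, the `:-1` fallback and the SLURM_JOB_ID dry-run guard are illustrative assumptions. Module loads and partition names are site-specific and are left out.

```shell
#!/bin/bash
#SBATCH --job-name=cp2k-dftb      # hypothetical job name
#SBATCH --nodes=1
#SBATCH --ntasks=12               # MPI ranks
#SBATCH --cpus-per-task=2         # OpenMP threads per rank
#SBATCH --mem-per-cpu=5000        # memory per allocated CPU (MB assumed)

# One OpenMP thread per CPU that Slurm assigned to each task; fall back
# to 1 if SLURM_CPUS_PER_TASK is unset (e.g. when run outside a job).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

launch="mpirun -np 12 cp2k.psmp -i input.inp -o out.out"

if [ -n "${SLURM_JOB_ID:-}" ]; then
    # Inside a Slurm allocation: actually start CP2K.
    $launch
else
    # Dry run: not inside an allocation, so only show the command.
    echo "dry run: OMP_NUM_THREADS=$OMP_NUM_THREADS $launch"
fi
```

Submitted with sbatch, the script starts CP2K; run directly on a login node, it only prints the command it would use, which is a cheap way to check the settings before queueing.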

For now I suspect this error is due to a memory problem. I tried to
find out how to prevent the input from creating the wfn and all the bak
files, but I have not found it yet.

I hope it will help. 

My best regards
M. Saleh

On Wednesday, March 1, 2023 at 1:42:24 PM UTC+1 Théo Cavignac wrote:

> Hello,
>
> I am trying to learn CP2K, and I am stuck with an error that I cannot
> understand that looks like this:
>
>     Trying to move ...wfn to ....wfn.bak-1.
>     rename returned status: -1
>
> The same error is repeated more than once, followed by a crash:
>
>     forrtl: error (76): Abort trap signal
>
> I got the same error on a basic sample extracted from the CP2K
> test suite, as well as my actual work case. I am running cp2k.popt on a
> cluster with one or two nodes of 48 cores using Slurm.
>
> The crash happens at the end of the m_mov routine of
> src/base/machine_posix.f90.
>
> According to the POSIX manual, rename (the C version at least) returns
> -1 on any failure, but the binding in the same file labels the result
> code errno. Sadly, even assuming that it somehow gets the global errno
> (I don't know much about Fortran-C interop), -1 (or any unsigned
> equivalent of it) is not a valid errno value.
>
> I checked for permission problems, or size limits, but I don't see
> anything wrong with that. However, I think there might be more than one
> process trying to do the same m_mov concurrently, creating a race
> condition where some of them try to move a file that does not exist.
>
> So, I don't know what to do next to debug the issue.
> What is the relevant info on the MPI context in the output?
> Could it be some library mismatch with the MPI implementation?
> What should I check to make sure I am not misusing CP2K?
>
> Best regards,
>
> Théo CAVIGNAC
>
