[CP2K:8750] Re: CI-NEB calculation: crashes

Jörg Saßmannshausen j.sassma... at ucl.ac.uk
Tue Feb 28 12:09:17 UTC 2017


Hi Matt,

thanks for the feedback. 

I think that error message is a bit of a red herring. I have been running normal 
geometry and Hessian calculations for some time now, and my wavefunction file is 
always called WFN_restart.wfn in the input file. 
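
Matt's suggestion quoted below, that several MPI processes may be touching the 
same file, can be illustrated with a minimal Python sketch. This is not CP2K 
code: `rotate_backup` is a hypothetical helper, and the two calls run one after 
the other to stand in for two replicas acting on one shared file name. The 
second `os.rename` fails because its source has already been moved; the C-level 
rename(2) call returns -1 on any failure, and a missing source file is one such 
case.

```python
import os
import tempfile


def rotate_backup(directory, name="WFN_restart.wfn"):
    """Hypothetical helper: move <name>.bak-1 to <name>.bak-2.

    Returns True on success and False if the source file no longer exists.
    """
    src = os.path.join(directory, name + ".bak-1")
    dst = os.path.join(directory, name + ".bak-2")
    try:
        os.rename(src, dst)
        return True
    except FileNotFoundError:
        # Another writer has already moved the shared source file away.
        return False


with tempfile.TemporaryDirectory() as workdir:
    # One backup file that both simulated replicas want to rotate.
    open(os.path.join(workdir, "WFN_restart.wfn.bak-1"), "w").close()
    first = rotate_backup(workdir)   # replica A: the move succeeds
    second = rotate_backup(workdir)  # replica B: the source is gone
    print(first, second)  # prints: True False
```

If each replica used its own file-name prefix, each would rotate its own files 
and this particular collision could not happen.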

Originally I suspected a problem with the cluster, but since I could reproduce 
the problem with this calculation and not with a different one, I don't think 
the cluster is to blame. 

It is running now. All I did was remove the duplicated line

@ENDIF

in my input file. To be honest, I don't really know why I had it twice, and it 
does not make much sense to me that it caused no problems for an SM-type band 
calculation but does for a CI-NEB calculation. I would have thought that if 
there is a problem with the input file, the program would crash right at the 
beginning and not after the first step. 
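
A stray @ENDIF like this is easy to overlook by eye. As a hedged sketch (not a 
CP2K tool), the Python script below pairs up @IF and @ENDIF directives and 
reports any that are unmatched. It assumes each directive starts its own line, 
matches them case-insensitively to be lenient, and ignores other preprocessor 
directives such as @SET or @INCLUDE.

```python
import re
import sys

# Assumption: each @IF / @ENDIF directive starts its own line (leading
# whitespace allowed); matched case-insensitively, other directives ignored.
IF_RE = re.compile(r"^\s*@IF\b", re.IGNORECASE)
ENDIF_RE = re.compile(r"^\s*@ENDIF\b", re.IGNORECASE)


def check_if_balance(lines):
    """Return (line_number, message) pairs for unmatched @IF/@ENDIF lines."""
    problems = []
    open_ifs = []  # line numbers of @IF blocks that are still open
    for number, line in enumerate(lines, start=1):
        if IF_RE.match(line):
            open_ifs.append(number)
        elif ENDIF_RE.match(line):
            if open_ifs:
                open_ifs.pop()  # closes the innermost open @IF
            else:
                problems.append((number, "@ENDIF without matching @IF"))
    # Anything still open at the end of the file was never closed.
    for number in open_ifs:
        problems.append((number, "@IF without matching @ENDIF"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_if_balance.py input.inp
    with open(sys.argv[1]) as handle:
        issues = check_if_balance(handle)
    for number, message in issues:
        print(f"line {number}: {message}")
    sys.exit(1 if issues else 0)
```

Run on the input file quoted further down, it should flag the second @ENDIF.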

So for now I think we can close this thread: problem sorted. 

Thanks for the feedback though.

All the best from a sunny London

Jörg

On Tuesday 28 Feb 2017 03:17:50 Matt W wrote:
> Hi Jörg,
> 
> to me this error message
> 
> Trying to move ./WFN_restart.wfn.bak-1 to ./WFN_restart.wfn.bak-2.
>  rename returned status:           -1
> 
> looks suspicious. I would expect the wavefunction files to have some
> prefixes indicating which replica etc. Maybe several MPI processes are
> trying to get a file lock on the same file?
> 
> Have you changed the names of any restart files / output file names etc. in
> your input file?
> 
> Matt
> 
> On Monday, February 27, 2017 at 10:23:09 PM UTC, sassy wrote:
> > Dear all,
> > 
> > I am trying to do a CI-NEB calculation, but after the first step the
> > calculation crashed with this error message:
> >  NEB| Building initial set of coordinates. END
> >  
> >  *******************************************************************************
> >  BAND TYPE                     =                                          CI-NEB
> >  BAND TYPE OPTIMIZATION        =                                              SD
> >  STEP NUMBER                   =                                               0
> >  RMSD DISTANCE DEFINITION      =                                               T
> >  NUMBER OF NEB REPLICA         =                                               5
> >  DISTANCES REP =        9.750661        9.750661        9.750661        9.750661
> >  ENERGIES [au] =     -648.476382     -647.620195     -646.701277     -647.623017
> >                      -648.424927
> >  
> >  BAND TOTAL ENERGY [au]        =                            -3238.84579812863058
> >  *******************************************************************************
> >  
> >  Trying to move ./WFN_restart.wfn.bak-1 to ./WFN_restart.wfn.bak-2.
> >  rename returned status:           -1
> >  Problem moving file
> > 
> > -------------------------------------------------------
> > Primary job  terminated normally, but 1 process returned
> > a non-zero exit code.. Per user-direction, the job has been aborted.
> > -------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpirun detected that one or more processes exited with non-zero status, thus causing
> > the job to be terminated. The first process to do so was:
> > 
> >   Process name: [[44988,1],192]
> >   Exit code:    1
> > 
> > The SGE error file contains this:
> > 
> > cp2k-4.1-avx2.popt:3555 terminated with signal 6 at PC=2ad95d2c35f7 SP=7ffe9c0dbcf8.
> > (I have omitted the backtrace)
> > 
> > I am using 256 cores and this is the relevant part of my input file:
> > 
> > @SET BAND_TYPE NEB
> > &MOTION
> >   &PRINT
> >     &VELOCITIES OFF
> >     &END
> >   &END
> >   &BAND
> >     NPROC_REP 32
> > @IF ( ${BAND_TYPE} == NEB )
> >     BAND_TYPE CI-NEB
> >     K_SPRING 0.2
> >     ROTATE_FRAMES T
> >     &CI_NEB
> >       NSTEPS_IT  5
> >     &END
> > @ENDIF
> > @ENDIF
> >     NUMBER_OF_REPLICA 5
> >     &CONVERGENCE_CONTROL
> >       MAX_FORCE 0.001
> >       RMS_FORCE 0.0005
> >     &END
> >     &OPTIMIZE_BAND
> >       OPTIMIZE_END_POINTS F
> >       OPT_TYPE DIIS
> >       &DIIS
> >         MAX_STEPS 200
> >         N_DIIS 7
> >         NO_LS
> >         STEPSIZE 0.5
> >         MAX_STEPSIZE 1.0
> >       &END
> >     &END
> >     &REPLICA
> >       COORD_FILE_NAME files/start-A.xyz
> >     &END
> >     &REPLICA
> >       COORD_FILE_NAME files/final-C.xyz
> >     &END
> >     &PROGRAM_RUN_INFO
> >     &END
> >     &CONVERGENCE_INFO
> >     &END
> >   &END BAND
> > &END MOTION
> > 
> > 
> > Could anybody point me in the right direction here? I have been trying to
> > get these calculations done for some time now and I am still stuck. I have
> > checked the cluster with a different input file which I know works, so I am
> > fairly confident it is not a cluster problem. Does anybody have any ideas?
> > 
> > Please let me know if you need more information.
> > 
> > All the best from London
> > 
> > Jörg
> > 
> > 
> > 
> > email: j.sas... at ucl.ac.uk
> > web: http://sassy.formativ.net
> > 
> > Please avoid sending me Word or PowerPoint attachments.
> > See http://www.gnu.org/philosophy/no-word-attachments.html


-- 
*************************************************************
Dr. Jörg Saßmannshausen, MRSC
University College London
Department of Chemistry
20 Gordon Street
London
WC1H 0AJ 
