[CP2K:3935] Cray XE6 NAN Errors
DEC014
dcoss... at gmail.com
Thu Jul 26 12:55:10 UTC 2012
The job fails at step 5251 with the NaN error described before. I've
tried restarting from earlier restart points, and each time it fails at the
same step. Below is my input file, minus the majority of the atom
coordinates to save space (it's just a water box). I'm hoping there's an
easy fix, since getting the supercomputing center to recompile software is
nearly impossible.
Input File:
@SET CP2K_DATA /u/dec014/QS
&GLOBAL
PROJECT watbox
PRINT_LEVEL LOW
PREFERRED_FFT_LIBRARY FFTW
&TIMINGS
THRESHOLD 0.000001
&END
RUN_TYPE MD
&END GLOBAL
&MOTION
&MD
ENSEMBLE NPT_F
STEPS 20000
TIMESTEP 1
TEMPERATURE 298.15
&THERMOSTAT
REGION MASSIVE
&NOSE
LENGTH 3
YOSHIDA 3
TIMECON 50
MTS 2
&END NOSE
&END
&BAROSTAT
PRESSURE 1.0
TIMECON 50
&THERMOSTAT
&NOSE
LENGTH 3
YOSHIDA 3
TIMECON 50
MTS 2
&END NOSE
&END THERMOSTAT
&END BAROSTAT
&END MD
&PRINT
&TRAJECTORY
&EACH
MD 20
&END EACH
&END TRAJECTORY
&VELOCITIES
&EACH
MD 20
&END EACH
&END VELOCITIES
&CELL
&EACH
MD 1
&END EACH
&END CELL
&STRESS
&EACH
MD 1
&END EACH
&END STRESS
&RESTART
FILENAME rst-md
&EACH
MD 250
&END EACH
&END RESTART
&END PRINT
&END MOTION
&FORCE_EVAL
METHOD QS
STRESS_TENSOR ANALYTICAL
&DFT
BASIS_SET_FILE_NAME ${CP2K_DATA}/GTH_BASIS_SETS
POTENTIAL_FILE_NAME ${CP2K_DATA}/POTENTIAL
&MGRID
CUTOFF 400
&END MGRID
&QS
EPS_DEFAULT 1.0E-14
EXTRAPOLATION ASPC
&END QS
&SCF
SCF_GUESS ATOMIC
MAX_SCF 20
&OUTER_SCF
MAX_SCF 20
&END OUTER_SCF
&OT ON
MINIMIZER DIIS
&END OT
&PRINT
&RESTART OFF
&END RESTART
&END PRINT
&END SCF
&XC
&XC_FUNCTIONAL
&PBE
PARAMETRIZATION REVPBE
&END
&END XC_FUNCTIONAL
&VDW_POTENTIAL
POTENTIAL_TYPE PAIR_POTENTIAL
&PAIR_POTENTIAL
TYPE DFTD2
SCALING 1.0e0
&END PAIR_POTENTIAL
&END VDW_POTENTIAL
&END XC
&END DFT
&SUBSYS
&KIND H
BASIS_SET DZVP-GTH
POTENTIAL GTH-PBE-q1
&END KIND
&KIND O
BASIS_SET DZVP-GTH
POTENTIAL GTH-PBE-q6
&END KIND
&CELL
ABC 26.6800 27.7260 25.9569
&CELL_REF
ABC 26.6800 27.7260 25.9569
&END CELL_REF
&END CELL
&COORD
O 8.4042 7.96733 3.94052
H 8.74996 6.96949 3.98698
H 7.54856 7.98758 4.41086
{ .... more water box coordinates ....}
O 22.2647 1.92667 12.3791
H 22.876 1.27983 11.8524
H 22.4165 1.57844 13.3125
&END COORD
&END SUBSYS
&END FORCE_EVAL
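
To pin down where the NaN first appears (step 5251 in this run), the ENER
file can be scanned for the first step that contains a non-finite value.
The following is only a minimal sketch. It assumes the ENER file is
whitespace-separated with the step number in the first column and a header
line starting with '#'. The names is_bad and first_bad_step, and any file
name passed on the command line, are illustrative and not part of CP2K.

```python
import math
import sys


def is_bad(field):
    """True if a text field is NaN, infinite, or not a number at all."""
    try:
        return not math.isfinite(float(field))
    except ValueError:
        # Assumption: unparseable fields (e.g. Fortran overflow markers
        # such as "********") should also be flagged.
        return True


def first_bad_step(lines):
    """Return (step, bad_column_indices) for the first bad row, else None.

    Assumes column 0 holds an integer step number and that header or
    comment lines start with '#'.
    """
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the header
        bad = [i for i, f in enumerate(fields[1:], start=1) if is_bad(f)]
        if bad:
            return int(fields[0]), bad
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file name is an assumption): python find_nan.py watbox-1.ener
    with open(sys.argv[1]) as handle:
        print(first_bad_step(handle))
```

If the reported step is identical across restarts from different restart
files, that suggests the failure is reproducible for this trajectory rather
than a random hardware fault.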
On Wednesday, July 25, 2012 3:22:50 PM UTC-4, IBethune wrote:
>
> Hi,
>
> I would be very surprised if a machine upgrade could cause the software to
> start producing numerical nonsense, however it is always a good idea to
> recompile the code after a new hardware or software upgrade to ensure you
> are getting good performance. You should also update your code to a recent
> SVN version if possible, to pick up any relevant bug-fixes. This *may*
> also help with the numerical troubles.
>
> Beyond that it's hard to say without seeing an input file and more
> specific detail of the problem.
>
> Cheers
>
> - Iain
>
> --
>
> Iain Bethune
> Applications Consultant, EPCC
>
> Email: ibet... at epcc.ed.ac.uk
> Twitter: @IainBethune
> Tel/Fax: +44 (0)131 650 5201/6555
> Mob: +44 (0)7598317015
> Addr: 2404 JCMB, The King's Buildings, Mayfield Road, Edinburgh, EH9 3JZ
>
> On 25 Jul 2012, at 17:34, DEC014 wrote:
>
> > I am running DFT MD simulations on a Cray XE6 machine. They used to
> > run perfectly fine; however, the machines underwent some upgrades. Now,
> > periodically and seemingly at random, the simulations run into NaN or
> > MPI errors. In the OUT file, Barostat, Energy Drift, and Conserved
> > Quantity produce NaN, and a corresponding NaN shows up in the ENER
> > file. I'm re-running a job that completed before on the same system to
> > see if it's a job error or a system error. I'm guessing the upgrades
> > are the problem, but I'm curious whether any others are running into
> > similar situations.
> >
> > If the upgrades are the problem, what will solve it? Recompile the
> > software?
> >
>
>