[CP2K:3935] Cray XE6 NAN Errors

DEC014 dcoss... at gmail.com
Thu Jul 26 12:55:10 UTC 2012


The job fails at step 5251 with the NaN error described before.  I've 
tried restarting from earlier restart points, and each time it fails at the 
same step.  Below is my input file, minus the majority of the atom 
coordinates to save space (it's just a water box).  I'm hoping there's an 
easy fix, since getting the supercomputing center to recompile software is 
nearly impossible.
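
To check whether a restarted run really fails at the same step, the first 
NaN can be located in the energy file rather than read off the output by 
eye.  The following is only a minimal sketch: it assumes the energy file is 
whitespace-separated, has the step number in the first column, and marks 
header lines with '#'.  The function name first_nan_step is made up for 
illustration and is not part of CP2K.

```python
import math


def first_nan_step(lines):
    """Return the step number of the first data row holding a NaN.

    Assumed layout (not verified against any particular CP2K version):
    whitespace-separated columns, step number first, '#' for header lines.
    Returns None if no bad value is found.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = stripped.split()
        try:
            step = int(fields[0])
        except ValueError:
            continue  # first column is not a step number; ignore the row
        for field in fields[1:]:
            try:
                value = float(field)  # float() accepts "NaN" in any case
            except ValueError:
                return step  # unparsable field (e.g. "*****") counts as bad
            if math.isnan(value):
                return step
    return None
```

Any iterable of lines works, so an open file handle can be passed directly, 
e.g. first_nan_step(open("watbox-1.ener")); that file name is an assumption 
based on the PROJECT name in the input below.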

Input File:
@SET CP2K_DATA /u/dec014/QS

 &GLOBAL
  PROJECT watbox
  PRINT_LEVEL LOW
  PREFERRED_FFT_LIBRARY   FFTW
  &TIMINGS
     THRESHOLD 0.000001
  &END
  RUN_TYPE MD
 &END GLOBAL

 &MOTION
   &MD
     ENSEMBLE NPT_F
     STEPS 20000
     TIMESTEP 1
     TEMPERATURE 298.15
    &THERMOSTAT
      REGION MASSIVE
      &NOSE
        LENGTH 3
        YOSHIDA 3
        TIMECON 50
        MTS 2
      &END NOSE
    &END
    &BAROSTAT
      PRESSURE 1.0
      TIMECON 50
      &THERMOSTAT
        &NOSE
          LENGTH 3
          YOSHIDA 3
          TIMECON 50
          MTS 2
        &END NOSE
      &END THERMOSTAT
    &END BAROSTAT
   &END MD
   &PRINT
     &TRAJECTORY
       &EACH
        MD 20
       &END EACH
     &END TRAJECTORY
     &VELOCITIES
       &EACH
        MD 20
       &END EACH
     &END VELOCITIES
     &CELL
       &EACH
         MD 1
       &END EACH
     &END CELL
     &STRESS
       &EACH
         MD 1
       &END EACH
     &END STRESS
     &RESTART
       FILENAME rst-md
       &EACH
         MD 250
       &END EACH
     &END RESTART
   &END PRINT
 &END MOTION

 &FORCE_EVAL
   METHOD QS
   STRESS_TENSOR ANALYTICAL
   &DFT
    BASIS_SET_FILE_NAME ${CP2K_DATA}/GTH_BASIS_SETS
    POTENTIAL_FILE_NAME ${CP2K_DATA}/POTENTIAL
    &MGRID
      CUTOFF 400
    &END MGRID
    &QS
      EPS_DEFAULT 1.0E-14
      EXTRAPOLATION ASPC
    &END QS
    &SCF
      SCF_GUESS ATOMIC
      MAX_SCF 20
      &OUTER_SCF
        MAX_SCF 20
      &END OUTER_SCF
      &OT ON
        MINIMIZER DIIS
      &END OT
      &PRINT
        &RESTART OFF
        &END RESTART
      &END PRINT
    &END SCF
    &XC
      &XC_FUNCTIONAL
        &PBE
         PARAMETRIZATION REVPBE
        &END
      &END XC_FUNCTIONAL
      &VDW_POTENTIAL
        POTENTIAL_TYPE PAIR_POTENTIAL
        &PAIR_POTENTIAL
          TYPE DFTD2
          SCALING 1.0e0
        &END PAIR_POTENTIAL
      &END VDW_POTENTIAL
    &END XC
   &END DFT   
   &SUBSYS
     &KIND H
       BASIS_SET DZVP-GTH
       POTENTIAL GTH-PBE-q1
     &END KIND
     &KIND O
       BASIS_SET DZVP-GTH
       POTENTIAL GTH-PBE-q6
     &END KIND
     &CELL
       ABC 26.6800 27.7260 25.9569
       &CELL_REF
         ABC 26.6800 27.7260 25.9569
       &END CELL_REF
     &END CELL
     &COORD
O           8.4042      7.96733      3.94052
H          8.74996      6.96949      3.98698
H          7.54856      7.98758      4.41086
{ .... more water box coordinates ....}
O          22.2647      1.92667      12.3791
H           22.876      1.27983      11.8524
H          22.4165      1.57844      13.3125
     &END COORD
   &END SUBSYS
 &END FORCE_EVAL



On Wednesday, July 25, 2012 3:22:50 PM UTC-4, IBethune wrote:
>
> Hi, 
>
> I would be very surprised if a machine upgrade could cause the software to 
> start producing numerical nonsense, however it is always a good idea to 
> recompile the code after a new hardware or software upgrade to ensure you 
> are getting good performance.  You should also update your code to a recent 
> SVN version if possible, to pick up any relevant bug-fixes.  This *may* 
> also help with the numerical troubles. 
>
> Beyond that it's hard to say without seeing an input file and more 
> specific detail of the problem. 
>
> Cheers 
>
> - Iain 
>
> -- 
>
> Iain Bethune 
> Applications Consultant, EPCC 
>
> Email: ibet... at epcc.ed.ac.uk 
> Twitter: @IainBethune 
> Tel/Fax: +44 (0)131 650 5201/6555 
> Mob: +44 (0)7598317015 
> Addr: 2404 JCMB, The King's Buildings, Mayfield Road, Edinburgh, EH9 3JZ 
>
> On 25 Jul 2012, at 17:34, DEC014 wrote: 
>
> > I am running DFT MD simulations on a Cray XE6 machine.  They used to
> > run perfectly fine; however, the machines underwent some upgrades.  Now,
> > periodically and seemingly at random, the simulations run into NaN or MPI
> > errors.  In the OUT file, Barostat, Energy Drift, and Conserved Quantity
> > produce NaN, and a corresponding NaN shows up in the ENER file.  I'm
> > re-running a job that completed before on the same system to see if it's a
> > job error or a system error.  I'm guessing the upgrades are the problem, but
> > I'm curious whether any others are running into similar situations.
> >
> > If the upgrades are the problem, what will solve it?  Recompile
> > the software?