mpich? problems on a linux cluster

c.pignedoli c.pig... at gmail.com
Fri Dec 7 12:14:42 UTC 2007


Dear all,
I'm running CP2K on a Linux cluster with the Intel MPI compiler.

I'm trying to run the input shown at the end of this post, which ran
without problems on a Cray XT3.

If I run the job on 10 nodes (each node has 2 dual-core CPUs and 8 GB
of memory), I get the following error (the allocated memory is around
600 MB per processor):


DISTRIBUTION OF THE PARTICLES (COLUMNS)
  Process col   Number of particles   Number of matrix columns
            0                   123                         -1
            1                   122                         -1
            2                   123                         -1
            3                   122                         -1
            4                   122                         -1

          Sum                   612                         -1

 
 *******************************************************************************
 ***                       STARTING GEOMETRY OPTIMIZATION                    ***
 *******************************************************************************


 DISTRIBUTION OF THE NEIGHBOR LISTS

  Process   Number of particle pairs   Number of matrix elements

        0                      10189                     1968031
        1                      10035                     1961313
        2                      10289                     1998317
        3                       9990                     1963291
        4                      10245                     1992513
        5                      10133                     1995053
.
.
.
I cut some lines

      Sum                     166159                    33196800

       of                     187578 ( 88.6 % occupation)
PSIlogger: Child with rank 3 exited on signal 11.
.
.
.
.

 CP2K| Stopped by processor number                                             7
 CP2K|  mpi_alltoallv @ mp_alltoall_c22v
 CP2K| Error number was                                                       18
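
As a rough sanity check on the memory figures quoted above, the headroom per process can be estimated with a short sketch. It assumes one MPI process per core (40 processes on 10 nodes), which is not stated explicitly in this post; the variable names are purely illustrative.

```python
# Rough memory-headroom estimate for the 10-node run described above.
# Assumption (not stated in the post): one MPI process per core.
nodes = 10
cpus_per_node = 2
cores_per_cpu = 2
mem_per_node_mb = 8 * 1024   # 8 GB of memory per node
mem_per_process_mb = 600     # reported allocation, approximate

processes_per_node = cpus_per_node * cores_per_cpu
total_processes = nodes * processes_per_node
used_per_node_mb = processes_per_node * mem_per_process_mb
share_per_process_mb = mem_per_node_mb / processes_per_node

print(f"MPI processes (assumed): {total_processes}")
print(f"Memory used per node: ~{used_per_node_mb} MB of {mem_per_node_mb} MB")
print(f"Fair share per process: {share_per_process_mb:.0f} MB")
```

Under that assumption, the reported 600 MB sits well below the roughly 2 GB available to each process, so the reported allocation alone does not suggest that the nodes ran out of memory.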



On 20 nodes the job completes the wavefunction (WFN) optimization,
but at the first geometry optimization step I get the following error:

  Total electronic density (r-space):      -3345.9999989832        0.0000010168
  Total core charge density (r-space):      3345.9999999953       -0.0000000047
  Total charge density (r-space):                                  0.0000010121
  Total charge density (g-space):                                  0.0000010121

 ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.):          -14081.720436242156211


 --------  Informations at step =     0 ------------
  Optimization Method        =                 BFGS
  Total Energy               =    -14081.7204362422
 ---------------------------------------------------

 --------------------------
 OPTIMIZATION STEP:      1
 --------------------------
PSIlogger: Child with rank 76 exited on signal 11.
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source
cp2k_para.x        00000000019F9A8D  Unknown               Unknown  Unknown
cp2k_para.x        0000000001662A61  Unknown               Unknown  Unknown
cp2k_para.x        00000000016325C6  Unknown               Unknown  Unknown
cp2k_para.x        00000000012D1D48  Unknown               Unknown  Unknown
libmkl_lapack.so   00002B74CC848C86  Unknown               Unknown  Unknown
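
Both failures report a child exiting on signal 11, and the second one also mentions SIGTERM. A minimal sketch for translating such signal numbers into names, using only Python's standard `signal` module (the helper name `signal_name` is illustrative):

```python
import signal

def signal_name(number: int) -> str:
    """Return the symbolic name of a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a known signal on this platform.
        return f"unknown signal {number}"

print(signal_name(11))  # SIGSEGV on Linux
print(signal_name(15))  # SIGTERM on Linux
```

On Linux, signal 11 is SIGSEGV (a segmentation fault), so the SIGTERM lines from the other ranks are most likely the launcher shutting down the remaining processes after the first crash.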



Do you have any suggestions for solving this problem?


Thanks in advance

Carlo Pignedoli



&FORCE_EVAL
  METHOD Quickstep
  &DFT
    BASIS_SET_FILE_NAME BASIS_SET
    POTENTIAL_FILE_NAME GTH_POTENTIALS
    &MGRID
      CUTOFF 320
    &END MGRID
    &QS
     EPS_DEFAULT 1.0E-12
     MAP_CONSISTENT
     EXTRAPOLATION PS
     EXTRAPOLATION_ORDER 2
    &END QS
    &SCF
      MAX_SCF 100
      EPS_SCF 8.0E-6
      MIXING 0.4
      SCF_GUESS RESTART
      WRITE_RESTART_EACH 100
      &OUTER_SCF
       OPTIMIZER DIIS
      &END OUTER_SCF
      &OT
       MINIMIZER CG
       PRECONDITIONER FULL_ALL
       ENERGY_GAP 0.001
      &END OT
    &END SCF
    &XC
      &XC_FUNCTIONAL PADE
      &END XC_FUNCTIONAL
      &XC_GRID
        XC_DERIV SPLINE2_smooth
        XC_SMOOTH_RHO NN10
      &END XC_GRID
    &END XC
  &END DFT
  &SUBSYS
    &CELL
      A 19.98  0           0
      B -9.99 17.30318757  0
      C  0     0          39.673
      UNIT ANGSTROM
    &END CELL
    &COORD
  H         4.9950000000        8.6515937800       -1.0000000000
 Mn        -2.5048024304        8.7997909762        6.5591956999
 Mn        -3.6125984316       10.7073312460        6.5573654752
 Mn         2.5268519596        8.8010351939        6.5480419619
 Mn        -1.0883339946       15.0799127199        6.5561464474
.
.
.    more or less 600 atoms
.

 Al         3.6838441802       15.3157889782       31.0427695918
 Al         0.0034814513        8.8756390665       31.0335343629
 Al        -2.2900875277       12.8617932319       31.0318283490
 Al         2.2900445729       12.8664182533       31.0281697464
    &END COORD
    &KIND H
      BASIS_SET DZV-GTH-PBE
      POTENTIAL GTH-PBE-q1
    &END KIND
    &KIND Mn
      BASIS_SET DZV-GTH-PADE
      POTENTIAL GTH-PBE-q15
    &END KIND
    &KIND Al
      BASIS_SET DZVP-GTH-PADE
      POTENTIAL GTH-PBE-q3
    &END KIND
    &PRINT
      &ATOMIC_COORDINATES
       FILENAME ./COORDINATE
      &END ATOMIC_COORDINATES
      &CELL
      &END CELL
    &END PRINT
  &END SUBSYS
  &PRINT
   &FORCES
    FILENAME ./FORZE_1
   &END FORCES
  &END PRINT
&END FORCE_EVAL
&MOTION
 &CONSTRAINT
  &FIXED_ATOMS
   LIST 1
  &END FIXED_ATOMS
 &END CONSTRAINT
 &GEOOPT
  OPTIMIZER BFGS
  MAX_ITER 400
  RESTART F
 &END GEOOPT
 &PRINT
  &FORCES
   FILENAME ./FORC
  &END FORCES
  &STRESS
  &END STRESS
  &TRAJECTORY
   FILENAME ./TRAJ_1
  &END TRAJECTORY
 &END PRINT
&END MOTION
&EXT_RESTART
 RESTART_POS T
&END EXT_RESTART
&GLOBAL
  FFTLIB FFTSG

  RUN_TYPE GEO_OPT
  PRINT_LEVEL MEDIUM
&END GLOBAL
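
The cell vectors in the &CELL section above appear to describe a hexagonal cell (A and B of equal length, 120 degrees apart, C perpendicular to both). As a quick cross-check, a minimal sketch that recomputes these quantities from the listed components; the helper names `norm` and `angle_deg` are illustrative and not part of CP2K.

```python
import math

# Cell vectors in angstrom, copied from the &CELL section of the input.
A = (19.98, 0.0, 0.0)
B = (-9.99, 17.30318757, 0.0)
C = (0.0, 0.0, 39.673)

def norm(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(x * x for x in v))

def angle_deg(u, v):
    """Angle between two 3-vectors in degrees."""
    dot = sum(x * y for x, y in zip(u, v))
    return math.degrees(math.acos(dot / (norm(u) * norm(v))))

len_a, len_b, len_c = norm(A), norm(B), norm(C)
gamma = angle_deg(A, B)

print(f"|A| = {len_a:.4f}  |B| = {len_b:.4f}  |C| = {len_c:.4f} angstrom")
print(f"angle(A, B) = {gamma:.4f} degrees")
print(f"angle(A, C) = {angle_deg(A, C):.4f} degrees")
```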



