[CP2K:6608] Compilations with Intel (XE 2013) for CP2K-trunk (2.7dev) & regtests errors

Iain Bethune ibet... at epcc.ed.ac.uk
Wed Jun 10 15:24:10 UTC 2015
Hi Rolf,

Let me have a go at answering some of your (many) questions!

* In some of my (not-so-recent) testing, I found that the choice of gfortran or ifort (or the precise optimisation level used with either) makes very little difference to the performance of the code.  In practice, as long as you compile for the vector instruction set of your CPU, most of the runtime gains come from five places: a well-tuned BLAS/LAPACK/BLACS/ScaLAPACK stack (MKL is of similar quality to Cray LibSci here), FFTW3 (or MKL’s FFTW3 interface), libgrid (see cp2k/tools/autotune_grid), libsmm (see cp2k/tools/build_libsmm), and a good MPI library/interconnect.

* CP2K is very well tested with gfortran (in fact I think we test nightly development builds of GCC against CP2K trunk), so it’s possible to compile CP2K at -O3 with every GCC release since 4.6.  We also have good coverage of the Intel compilers (currently the 15.x series), and we do report bugs, which do get fixed, but only after the release of beta compilers, so the turnaround time is much longer, and some outstanding bugs remain (relating to OpenMP only).

* Re: testing with -O2, your email is not clear: your FAILED count is 0, but you said that some tests fail with segfaults.  From my notes, I believe the minimal set of files which must be compiled at low optimisation with ifort 14.0.2 is:

external_potential_types.F    -O1
qs_linres_current.F           -O1
qs_vxc_atom.F                 -O1
mp2_optimize_ri_basis.F       -O0  (I believe a code change was made that should allow this to be compiled with -O1; we are still waiting for Intel to fix the relevant compiler bug, though.)

* Re: numerical errors: these do not mean very much.  Many of the tests are not converged, do not use production-quality settings, or are otherwise numerically unstable (so that they run quickly).  They will therefore vary, sometimes by several orders of magnitude or more, between different optimisation settings, compiler versions etc.  The ‘WRONG’ count should only really be used as a regression test when code changes are made that should not affect the numerical behaviour of the code.

* If you do want to compare against a fixed reference, then the gfortran pdbg build (-O1) is a reasonable baseline - see http://dashboard.cp2k.org/archive/mkrack-pdbg/index.html . The do_regtest script now prints the values that are compared in the test output.  However, to be sure, you really need to run some larger simulations and check that your results are physically sensible (which of course we should always do!).
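
To make the regression-test reading of the summary concrete, here is a minimal Python sketch. It is not part of do_regtest: the names parse_grepme, relative_error and is_wrong are hypothetical, and the relative-error formula is an assumed definition, so the real script may compute it differently. It only illustrates the idea that a test counts as WRONG when its relative error against the reference exceeds the per-test tolerance, and that two builds are compared via their GREPME lines.

```python
import re
from typing import NamedTuple


class RegtestSummary(NamedTuple):
    """Counts in the order they appear on a GREPME line."""
    failed: int
    wrong: int
    correct: int
    new: int
    total: int


def parse_grepme(line: str) -> RegtestSummary:
    """Parse 'GREPME <failed> <wrong> <correct> <new> <total> X'."""
    match = re.search(r"GREPME\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)", line)
    if match is None:
        raise ValueError(f"not a GREPME summary line: {line!r}")
    return RegtestSummary(*(int(group) for group in match.groups()))


def relative_error(value: float, reference: float) -> float:
    """Assumed definition: |value - reference| / |reference|.

    Falls back to the absolute difference when the reference is zero.
    """
    diff = abs(value - reference)
    return diff / abs(reference) if reference != 0.0 else diff


def is_wrong(value: float, reference: float, tolerance: float) -> bool:
    """A test is flagged WRONG when the relative error exceeds its tolerance."""
    return relative_error(value, reference) > tolerance


if __name__ == "__main__":
    # Summary lines taken from the -O3 and -O2 runs quoted below.
    o3 = parse_grepme("GREPME 56 18 2559 16 2649 X")
    o2 = parse_grepme("GREPME 0 17 2616 16 2649 X")
    print("FAILED tests removed by going to -O2:", o3.failed - o2.failed)
```

A check like this is only meaningful between two builds of the same revision; as noted above, differences against an unrelated reference mostly reflect unconverged or unstable test inputs.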

I hope that’s helpful!

Cheers

- Iain

--

Iain Bethune
Project Manager, EPCC

Email: ibet... at epcc.ed.ac.uk
Twitter: @IainBethune
Web: http://www2.epcc.ed.ac.uk/~ibethune
Tel/Fax: +44 (0)131 650 5201/6555
Mob: +44 (0)7598317015
Addr: 2404 JCMB, The King's Buildings, Peter Guthrie Tait Road, Edinburgh, EH9 3FD

> On 10 Jun 2015, at 11:38, Rolf David <rolf.d... at gmail.com> wrote:
> 
> Hi all
> 
> I've encountered several problems with CP2K compilation (trunk-rev-15402, popt) with the Intel compiler/MPI/MKL (icc/ifort 14.0.2 : mpi 4.1 Update 2 : mkl 11.1.2)
> 
> First my "out of the box" arch file (libint is 1.1.4, libxc 2.0.1):
>  
> CC       = mpiicc
> CPP      =
> FC       = mpiifort
> LD       = mpiifort
> AR       = xiar -r
> DFLAGS   = -D__INTEL -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__MKL -D__FFTW3 -D__LIBINT -D__LIBXC2
> CPPFLAGS =
> FCFLAGS  = $(DFLAGS) $(INC) -O3 -axAVX -xSSE4.2 -heap-arrays 64 -funroll-loops -fpp -free
> FCFLAGS2 = $(DFLAGS) $(INC) -O1 -axAVX -xSSE4.2 -heap-arrays 64 -fpp -free
> LDFLAGS  = $(FCFLAGS)
> LIBS = -L$(MKL_LIB) -Wl,-rpath,$(MKL_LIB) \
>         -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 \
>         -lmkl_sequential -lmkl_core \
>         $(FFTW_LIB)/libfftw3xf_intel.a \
>         $(LIBINT_LIB)/libderiv.a $(LIBINT_LIB)/libint.a -lstdc++ \
>         $(LIBXC_LIB)/libxc.a \
>         -lpthread -lm
> OBJECTS_ARCHITECTURE = machine_intel.o
> graphcon.o: graphcon.F
>         $(FC) -c $(FCFLAGS2) $<
> # In order to avoid segv when HF exchange for example
> qs_vxc_atom.o: qs_vxc_atom.F
>         $(FC) -c $(FCFLAGS2) $<
> 
> We are calling it test-O3 
>  
> 
> Number of FAILED  tests 56
> Number of WRONG   tests 18
> Number of CORRECT tests 2559
> Number of NEW     tests 16
> Total number of   tests 2649
> GREPME 56 18 2559 16 2649 X
> 
> Most of the failures are in the Fist regtests (regtest-5, -12, -pol, -6, -15, -1-3, -4, -1-2, -2, -8, -9, -11), plus QS/regtest-ot/H2-BECKE-MD.inp, QMMM/SE/regtest/mol_CSVR_gen*.inp and QMMM/SE/regtest_2/water_g3x3_excl_*m.inp
> 
> 
> 
> If I do the same with -O2 instead of -O3 (test-O2)
> 
> 
> 
> Number of FAILED  tests 0
> Number of WRONG   tests 17
> Number of CORRECT tests 2616
> Number of NEW     tests 16
> Total number of   tests 2649
> GREPME 0 17 2616 16 2649 X
> 
> 
> 
> So I assume some more files have to be compiled with -O1 (on top of the two already at -O1) -> the failures are segfaults
> 
> And 10 errors are "unacceptable" (more than one order of magnitude above the tolerance: with a tolerance of 1e-14, I consider a relative error of 1e-13 OK, but not 1e-12)
> 
> 
> 
> and -O1 instead of -O3 (test-O1)
> 
> 
> 
> Number of FAILED  tests 0
> Number of WRONG   tests 81
> Number of CORRECT tests 2568
> Number of NEW     tests 0
> Total number of   tests 2649
> GREPME 0 81 2568 0 2649 X
> 
> 
> 
> More wrong (9 are "unacceptable" but different from -O2)
> 
> 
> 
> 
> 
> Also I've tried the -O2 on all, and -O1 on two files : (as hinted by Iain Bethune in https://groups.google.com/forum/#!searchin/cp2k/intel$20$20after$3A2014$2F01$2F01/cp2k/YZ3gVI-6Au0/uJZC8QKSzxUJ) (test-IB)
> 
> 
> 
> Number of FAILED  tests 166
> Number of WRONG   tests 16
> Number of CORRECT tests 2467
> Number of NEW     tests 0
> Total number of   tests 2649
> GREPME 166 16 2467 0 2649 X
> 
> 
> 
> This setup is worse than the previous -O2/-O1 one. I assume it was only valid for 2.5.1, as in the post.
> 
> 
> 
> And also using the Arch files from (http://support.euforia-project.eu/phi/popt/regtest-arch, but without -D__HAS_smm_dnn -D__HAS_LIBGRID) (test-EPCC)
> 
> 
> 
> Number of FAILED  tests 159
> Number of WRONG   tests 38
> Number of CORRECT tests 2436
> Number of NEW     tests 16
> Total number of   tests 2649
> GREPME 159 38 2436 16 2649 X
> 
> 
> 
> Many more failures: is this the influence of LIBGRID/smm_dnn? Or maybe the files compiled at -O1 aren't shown. Or is it because it isn't the same compiler (XE 2015 vs XE 2013)?
> 
> 
> 
> 
> 
> So I have some questions. Our first goal is no FAILED tests while keeping the best speed (-O1 is clearly slower, but maybe the difference between -O3 and -O2 is next to nothing; our cluster is small so we need to push it to the limit, which is why we went for -O3 first).
> 
> 
> 
> -Is something wrong in our arch file ?
> 
> -Has anyone managed to compile at -O3 (or -O2) with some files at -O1, using the Intel 2013 compilers (14.0.x versions), without big errors? (I deduced that graphcon.F and qs_vxc_atom.F must be compiled at -O1, but maybe there are others, or some that need -O2.)
> 
> -O2 vs -O3 ?
> 
> -What can I do to see what's wrong in the FAILED/segfault cases? I can add -traceback -g, but what do I look for? (I'm no expert!) Also, is there an easy way to know which 'file.F' sources are involved in each regtest?
> 
> 
> 
> -Also, I noticed that the big errors differ between the -O3/-O2/-O1 builds (the first three arch files I used). Given that, can I assume there is nothing wrong with libint/libxc/mkl, and that it is just the -O flags? Details follow:
> 
> 
> 
> -O3 + -O1 on  graphcon.F and qs_vxc_atom.F (test -O3)
> 
> NEB/regtest-1/2gly_EB-NEB.inp.out 
> 
> 
> NEB/regtest-2/2gly_DIIS-SM.inp.out 
> 
> 
> NEB/regtest-2/2gly_DIIS-DNEB.inp.out
> 
> 
> NEB/regtest-2/2gly_DIIS-NEB.inp.out 
> 
> relative error :   2e-02 >  numerical tolerance = 8e-12/-11/-13
> 
> Fist/regtest-3/water_2_TS_CG.inp.out 
> 
> 
> 
> relative error :   2.21900214e-06 >  numerical tolerance = 1.0E-14
> 
> QS/regtest-ri-mp2/opt_basis_O_auto_gen.inp.out
> 
> 
> relative error :   6.54370492e-02 >  numerical tolerance = 1e-04
> 
> QS/regtest-almo-2/FH-chain.inp.out
> 
> 
> relative error :   2.00884032e-10 >  numerical tolerance = 1e-13
> 
> QS/regtest-almo-1/almo-x.inp.out
> 
> 
> QS/regtest-almo-1/almo-guess.inp.out
> 
> 
> QS/regtest-almo-1/almo-scf.inp.out 
> 
> 
>  relative error :   6e-12 >  numerical tolerance = 4/7/8e-14
> 
> SE/regtest-3-4/Al2O3.inp.out
> 
> 
>  relative error :   2.51373362e-05 >  numerical tolerance = 6e-14
> 
> 
> 
> -O2 + -O1 on  graphcon.F and qs_vxc_atom.F (---> Same errors as test-O3) (test -O2)
> 
> 
> NEB/regtest-1/2gly_EB-NEB.inp.out 
> 
> 
> NEB/regtest-2/2gly_DIIS-SM.inp.out 
> 
> 
> NEB/regtest-2/2gly_DIIS-DNEB.inp.out
> 
> 
> NEB/regtest-2/2gly_DIIS-NEB.inp.out 
> 
> relative error :   2e-02 >  numerical tolerance = 8e-12/-11/-13
> 
> 
> Fist/regtest-3/water_2_TS_CG.inp.out :
> 
> relative error :   2.21900214e-06 >  numerical tolerance = 1.0E-14
> 
> 
> QS/regtest-ri-mp2/opt_basis_O_auto_gen.inp.out
> 
> relative error :   6.54370492e-02 >  numerical tolerance = 1e-04
> 
> 
> QS/regtest-almo-2/FH-chain.inp.out 
> 
> relative error :   2.00884032e-10 >  numerical tolerance = 1e-13
> 
> QS/regtest-almo-1/almo-x.inp.out
> 
> 
> QS/regtest-almo-1/almo-guess.inp.out
> 
> 
> QS/regtest-almo-1/almo-scf.inp.out 
> 
> 
>  relative error :   6e-12 >  numerical tolerance = 4/7/8e-14
> 
> SE/regtest-3-4/Al2O3.inp.out
> 
> 
>  relative error :   2.51373362e-05 >  numerical tolerance = 6e-14
> 
> 
> 
> -O1 on all (---> Different errors from test-O3/-O2) (test -O1)
> 
> 
> 
> QS/regtest-ps-implicit-1-3/Ar_mixed_planar.inp.out
> 
> 
>  relative error :   1.02615640e-09 >  numerical tolerance = 1e-12
> 
> 
> QS/regtest-ps-implicit-2-2/H2O_mixed_periodic_planar.inp.out :
> 
> 
>  relative error :   3.64315287e-07 >  numerical tolerance = 1e-12
> 
> 
> QS/regtest-ps-implicit-2-3/H2O_mixed_periodic_cylindrical.inp.out :
> 
> 
> 
>  relative error :   3.99727816e-07 >  numerical tolerance = 1e-12
> 
> 
> 
> QS/regtest-ps-implicit-1-2/Ar_mixed_periodic_planar.inp.out
> 
> 
> 
>  relative error :   1.63406231e-06 >  numerical tolerance = 1e-12
> 
> 
> QS/regtest-admm-4/MD-1.inp.out
> 
> 
> 
>  relative error :   6.79583221e-11 >  numerical tolerance = 7e-13
> 
> 
> QS/regtest-admm-4/MD-2_no_OT.inp.out
> 
> 
> relative error :   1.05116397e-11 >  numerical tolerance = 1.0E-14
> 
> 
> Fist/regtest-3/2d_pot.inp.out
> 
> 
> relative error :   2.56003763e-01 >  numerical tolerance = 5e-06
> 
> 
> Fist/regtest-1-2/deca_ala_reftraj.inp.out
> 
> 
>  relative error :   5.45234681e-12 >  numerical tolerance = 1.0E-14
> 
> 
> Fist/regtest-4/H2O-meta-combine.inp.out
> 
> 
>  relative error :   2.41671120e-02 >  numerical tolerance = 1.0E-14
> 
> 
> 
> 
> 
> Any help/hint/info/experience will be well received.
> 
> 
> 
> Also we have gcc/gfortran on the cluster. Is intel faster for CP2K or roughly the same as GCC ?
> 
> 
> 
> Thank you for your time if you've read all this !
> 
> 
> 
> Kind regards,
> 
> 
> 
> Rolf David
> 
> 
> 

