CRAY-XT3 - PGI and PathScale

Axel akoh... at gmail.com
Tue Jul 17 04:41:52 UTC 2007


hi teo,

a few remarks on those arch files.

- i would replace: -O3 -Mscalarsse -Mvect=sse
  with: -O2 -Munroll
  in my experience, for almost all codes that use
  a lot of cosine/sine/exponential/power/... SSE
  is counterproductive, as the gain from using SSE does
  not always offset the time to switch between the
  regular floating point unit and the SSE unit, particularly
  in double precision. SSE is most useful for plain linear
  algebra, but for that we have BLAS and (Sca)LAPACK...
  loop unrolling on the other hand helps a lot, and in many
  cases -O3 optimizes too aggressively. this is more visible
  on intel cpus than on amd cpus, but still...

- how about using pgf90 instead of ftn for the serial compile?
  this way you'd get a serial executable that can actually
  run on the frontend (and does not segfault).
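
as a minimal sketch of both suggestions, the relevant lines of a
serial arch file might look like the fragment below. the variable
names (FC, LD, FCFLAGS) are assumed here and may not match the
actual file; all other settings in the existing arch file would
stay as they are.

```make
# hypothetical fragment of a serial arch file (e.g. CRAY-XT3.sopt)
# variable names are assumptions, not copied from the real file

# plain PGI driver instead of the ftn wrapper, so that the
# executable is built for the frontend rather than the compute nodes
FC      = pgf90
LD      = pgf90

# moderate optimization plus loop unrolling, no SSE vectorization
FCFLAGS = -O2 -Munroll
```

whether this is actually faster is worth timing on a representative
input before adopting it.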

ciao,
   axel.

p.s.: i know the flags you use are the ones suggested by cray,
but all the examples they show to illustrate those flags fall under
the plain linear algebra case...

On Jul 16, 3:39 pm, Teodoro Laino <teodor... at gmail.com> wrote:
> FYI:
> Today I updated the list of files that require the -O0 optimization
> level, due to bugs in the Portland compiler, in order
> to run cp2k on the full suite of regtests.
> The new arch files (CRAY-XT3.popt and CRAY-XT3.sopt) contain the full
> list of those subroutines.
>
> The produced executables (both with PGI and PathScale) can run
> the full set of regtests without crashing.
> teo
