CRAY-XT3 - PGI and PathScale
akoh... at gmail.com
Tue Jul 17 06:41:52 CEST 2007
a few remarks on those arch files.
- i would replace: -O3 -Mscalarsse -Mvect=sse
with: -O2 -Munroll
in my experience, for almost all codes that use
a lot of cosine/sine/exponential/power/... SSE
is counterproductive, as the gain of using SSE does
not always offset the time to switch between the regular
floating point unit and the SSE unit, particularly
in double precision. SSE is most useful for plain linear
algebra, but for that we have BLAS and (Sca)LAPACK...
loop unrolling on the other hand helps a lot, and in many
cases -O3 optimizes too aggressively. this is more visible
on intel cpus than on amd cpus, but still...
- how about using pgf90 instead of ftn for the serial compile?
this way you'd get a serial executable that can actually
run on the frontend (and does not segfault).
p.s.: i know the flags you use are the ones suggested by cray,
but all the examples they show to illustrate those flags fall under
the plain linear algebra case...
On Jul 16, 3:39 pm, Teodoro Laino <teodor... at gmail.com> wrote:
> Today I updated the list of files that require an O0 optimization
> level, due to bugs in the Portland compiler, in order
> to run cp2k on the full suite of regtests.
> The new arch files (CRAY-XT3.popt and CRAY-XT3.sopt) contain the full
> list of those subroutines.
> The resulting executables (both with PGI and PathScale) can run
> the full set of regtests without crashing.