[CP2K:1790] Re: FFTSGL compile

Ondrej Marsalek ondrej.... at gmail.com
Tue Feb 24 22:12:55 UTC 2009

Hi again,

On Sun, Feb 22, 2009 at 01:23, Axel <akoh... at gmail.com> wrote:
> On Feb 21, 4:39 pm, Ondrej Marsalek <ondrej.... at gmail.com> wrote:
>> Hi,
>> as I said in another thread, I have built a single precision version of
>> CP2K. It seems to (mostly) work, there were some minor issues in the
>> tests, but I have not explored them. I am only interested in this
>> because of the potential to use CUDA FFT. So far, I have linked against
>> an SP build of FFTW (version 3.2.1).
> please note, that there are two different issues. a full single
> precision version of cp2k and a version with just the FFT in single
> precision.  at the moment, the full single precision version is mostly
> kept alive to keep the code SP clean and for the (improbable) case
> that somebody actually goes over the code to rewrite all the parts
> where single precision use in the current style is leading to
> instabilities. quantum chemistry is a particularly troublesome field,
> since you are interested in the differences of large numbers.

Sorry for the confusion; I meant a CP2K build with only the FFT in SP. I
have not tried to build a fully SP CP2K and will not go to the trouble.
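
The point quoted above about differences of large numbers can be shown
with a minimal Python sketch. This is not CP2K code: the two "energies"
are made-up values, and as_single is a hypothetical helper that rounds a
double to the nearest single using only the standard array module.

```python
import array

def as_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return array.array("f", [x])[0]

# Two made-up large values that differ only around the 8th significant digit.
e_a = 12345.6789
e_b = 12345.6780

# Difference in double precision: about 9.0e-4.
diff_dp = e_a - e_b

# Same difference with operands and result rounded to single precision.
# Near 12345 the spacing between adjacent singles is 2**-10 (about 9.8e-4),
# so the result can only be a multiple of that spacing.
diff_sp = as_single(as_single(e_a) - as_single(e_b))

rel_err = abs(diff_sp - diff_dp) / abs(diff_dp)
print(diff_dp, diff_sp, rel_err)
```

Each operand is accurate to about seven digits in SP, yet the difference
is off by several percent, which is why a fully SP build is much harder
to get right than an SP FFT alone.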

>> I have one question, though. The SP build seems to be a bit slower than
>> a DP build, on identical input and a Core2 processor. Is this to be
>> expected? Is it because SP is no faster than DP on such a CPU and there
>> is the additional overhead of conversion?
> assuming that you talk about the version with the single precision FFT
> only, then, yes, you have an additional copy/conversion of data and
> that should lead to a slowdown. and as long as you have not compiled
> your FFTW with the FPU set to single precision mode (-pc32 with
> intel compilers; you need -pc64 for double precision, and the default
> is -pc80, btw. this governs how many iterations are needed to converge,
> e.g., a square root in the FPU), there cannot be a speed
> difference between single and double precision floating point, except
> for the differences in memory bandwidth requirements.

Just out of curiosity, I rebuilt FFTW in SP with -pc32 and the situation
remained the same: the SP version is still slightly slower than DP. I
don't mind that, since I have no need for it, but I mention it for
completeness.
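
For reference, the extra copy/conversion mentioned above looks roughly
like the following Python sketch. It is an illustration under stated
assumptions, not how CP2K or FFTW implement it: naive_dft is an O(n^2)
stand-in for a real FFT call, dft_via_single is a hypothetical wrapper,
and the arithmetic inside the transform still runs in double precision
here. Only the storage is rounded to single.

```python
import array
import cmath
import math

def to_single(values):
    """Copy a sequence of doubles into singles and widen back to Python floats."""
    return list(array.array("f", values))

def naive_dft(re, im):
    """O(n^2) DFT used as a stand-in for an FFT library call."""
    n = len(re)
    out_re, out_im = [], []
    for k in range(n):
        acc = 0j
        for j in range(n):
            acc += complex(re[j], im[j]) * cmath.exp(-2j * math.pi * j * k / n)
        out_re.append(acc.real)
        out_im.append(acc.imag)
    return out_re, out_im

def dft_via_single(re, im):
    """Hypothetical DP-in, DP-out wrapper around a transform on SP data."""
    re_sp, im_sp = to_single(re), to_single(im)    # extra pass: DP -> SP copy
    out_re, out_im = naive_dft(re_sp, im_sp)       # transform on SP-rounded input
    return to_single(out_re), to_single(out_im)    # extra pass: SP result -> DP

# Storage per real number: 4 bytes in SP versus 8 bytes in DP.
print(array.array("f").itemsize, array.array("d").itemsize)

signal_re = [math.sin(0.3 * j) for j in range(8)]
signal_im = [0.0] * 8
ref_re, ref_im = naive_dft(signal_re, signal_im)
sp_re, sp_im = dft_via_single(signal_re, signal_im)
max_err = max(abs(a - b) for a, b in zip(ref_re + ref_im, sp_re + sp_im))
print(max_err)
```

The two conversion passes are pure overhead on a CPU where SP and DP
arithmetic run at the same speed, so a small slowdown is plausible
unless the halved memory traffic makes up for it.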

>> I should have a Tesla available soon and I'll be happy to test CUDA
> actually an nvidia GTX 260 will be good enough for testing and is much
> cheaper.  the GTX 285 has a much higher internal memory bandwidth,
> which may help for some applications (should help with FFT), but
> already costs double.  haven't checked the GTX 295 (the dual
> GPU version of the 285) yet.

Thanks for the recommendations, but at least the very first round will
be a "borrowed" Tesla S1070 for a limited time for testing. We'll see
after that if we have any use for GPGPU hardware.

