[CP2K:1783] Re: FFTSGL compile
Ondrej Marsalek
ondrej.... at gmail.com
Sat Feb 21 21:39:49 UTC 2009
Hi,
as I said in another thread, I have built a single precision version of
CP2K. It seems to (mostly) work; there were some minor issues in the
tests, but I have not explored them. I am only interested in this
because of the potential to use CUDA FFT. So far, I have linked against
an SP build of FFTW (version 3.2.1).
I have one question, though. The SP build seems to be a bit slower than
a DP build, on identical input and a Core2 processor. Is this to be
expected? Is it because SP is no faster than DP on such a CPU and there
is the additional overhead of conversion?
I should have a Tesla available soon and I'll be happy to test CUDA
support and see the performance. I hope that there will be a clear gain
for a cluster system (meaning a lot of "empty space").
By the way, does anyone have any performance comparisons with and
without CUDA used for FFT?
Best,
Ondrej
On Tue, Feb 17, 2009 at 18:49, Ben Levine <ben.l... at gmail.com> wrote:
>
> For completeness: Iain Bethune submitted a patch which fixes my
> problem. I'm now looking into the CUDA compile to see if it's
> working. Thanks Iain!
>
> On Feb 9, 6:23 pm, Ben Levine <ben.l... at gmail.com> wrote:
>> Okay, well, I seem to have found the problem. Once I get things
>> cleaned up and tested I'll send a patch.
>>
>> On Feb 5, 5:38 pm, Ben Levine <ben.l... at gmail.com> wrote:
>>
>> > Hi Guys,
>> > As mentioned in another thread, I'm once again working with the CUDA
>> > capable version of CP2K. Unfortunately, it's been a long time since I
>> > last ran it and I'm having some difficulties. I'm working with the
>> > most recent version out of CVS. I compiled a serial version of the
>> > code successfully with -D__FFTSGL (with or without -D__CUDA).
>> > However, when I run the executables my jobs die with a seg fault after
>> > printing the line:
>>
>> > GLOBAL| This output is from process 0
>>
>> > I'm using the benchmark jobs from cp2k/tests/benchmarks as test runs
>> > (specifically H2O-64.inp and H2O-512.inp). I have reproduced this
>> > error on two machines, though I use a very similar arch file on both.
>> > I've included one below. Simply removing -D__FFTSGL yields a fully
>> > functioning double precision executable. I wonder if anyone has an
>> > idea what the problem is, and whether others can reproduce it.
>> > Thanks for your time!
>>
>> > Ben
>>
>> > # by default some intel compilers put temporaries on the stack
>> > # this might lead to segmentation faults if the stack limit is set too low
>> > # stack limits can be increased by sysadmins or e.g. with ulimit -s 256000
>> > # furthermore new ifort (10.0?) compilers support the option
>> > # -heap-arrays 64
>> > # add this to the compilation flags if the other options do not work
>> > # The following settings worked for:
>> > # - AMD64 Opteron
>> > # - SUSE Linux Enterprise Server 10.0 (x86_64)
>> > # - Intel(R) Fortran Compiler for Intel(R) EM64T-based applications, Version 10.0
>> > # - AMD acml library version 3.6.0
>> > # - MPICH2-1.0.5p4
>> > # - FFTW 3.1.2
>> > #
>> > PERL = perl
>> > CC = gcc
>> > CPP = cpp
>> > FC = /opt/intel/fce/10.0.025/bin/ifort -FR
>> > LD = /opt/intel/fce/10.0.025/bin/ifort -i-static -openmp
>> > AR = ar -r
>> > #DFLAGS = -D__INTEL -D__FFTMKL -D__FFTSG
>> > DFLAGS = -D__INTEL -D__FFTSG -D__FFTSGL -D__FFTW3
>> > CFLAGS = -O2
>> > CPPFLAGS = -traditional -C $(DFLAGS) -P -I/opt/intel/mkl/10.0.1.014/include/fftw -I/opt/intel/mkl/10.0.1.014/include/
>> > FCFLAGS = $(DFLAGS) -O2 -xW
>> > MKLPATH = /opt/intel/mkl/10.0.1.014/lib/em64t/
>> > LDFLAGS = $(FCFLAGS)
>> > LIBS = -L$(MKLPATH)\
>> > $(MKLPATH)/libmkl_em64t.a\
>> > $(MKLPATH)/libmkl_lapack.a\
>> > $(MKLPATH)/libguide.a\
>> > /usr/local/lib/libfftw3f.a\
>> > -lpthread
>>
>> > OBJECTS_ARCHITECTURE = machine_intel.o
> >
>