[CP2K:8623] problem with cp2k built with intelmpi

Iain Bethune i.be... at epcc.ed.ac.uk
Wed Feb 1 22:58:07 UTC 2017


On my Intel build (the same one used for the dashboard testing), the input runs fine, at least up to the first SCF step, where I stopped it.  The fact that it also works for you with a gfortran build points to a local build configuration issue.  One thing to try is a serial build with -O0 and reference BLAS/LAPACK, as this essentially rules out any problems with compiler optimisation or MKL.  Obviously it will be very slow, but it is enough to get the calculation to the point where it prints the first "Electronic density on regular grids" line.  Assuming this works, you can then start increasing the optimisation level and adding MKL, MPI etc. back in, one at a time, to see which change causes the failure.
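
A minimal sketch of how the check at each stage of that bisection could be automated, assuming the CP2K output of each test build has been saved as text. The function name find_nonfinite_densities is hypothetical and not part of CP2K; the labels are copied from the output quoted further down in this thread.

```python
import math

# Labels copied from the charge-density summary quoted in this thread.
DENSITY_LABELS = (
    "Electronic density on regular grids",
    "Core density on regular grids",
    "Total charge density on r-space grids",
    "Total charge density g-space grids",
)


def find_nonfinite_densities(output_text):
    """Return the labels of density lines that contain NaN or Inf values."""
    bad_labels = []
    for line in output_text.splitlines():
        # Split "label: value [value]" at the first colon.
        label, sep, rest = line.strip().partition(":")
        if not sep or label not in DENSITY_LABELS:
            continue
        values = []
        for token in rest.split():
            try:
                values.append(float(token))  # float() also parses "NaN" and "Infinity"
            except ValueError:
                continue  # ignore anything that is not a number
        if any(not math.isfinite(v) for v in values):
            bad_labels.append(label)
    return bad_labels
```

Run on the text of each build's output, this returns an empty list for a healthy run, so the first build step for which the list is non-empty is the one to look at more closely.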

- Iain


--

Iain Bethune
Project Manager, EPCC

Email: i.be... at epcc.ed.ac.uk
Twitter: @IainBethune @PrimeGrid @CP2Kproject
Web: http://www2.epcc.ed.ac.uk/~ibethune
Tel/Fax: +44 (0)131 651 7183/6555
Mob: +44 (0)7598317015
Addr: 2404 JCMB, The King's Buildings, Peter Guthrie Tait Road, Edinburgh, EH9 3FD

> On 1 Feb 2017, at 14:11, Christopher Knight <cjknig... at gmail.com> wrote:
> 
> Have you already checked whether the issue goes away using "PREFERRED_FFT_LIBRARY FFTSG"?
> 
> I’ve noticed this issue as well on KNL with cp2k-4.1 and Intel 2017.1.132. Using FFTSG appeared to fix the issue for me, but I haven’t debugged further yet (deadlines…).
> 
> chris
> 
> 
> 
>> On Feb 1, 2017, at 7:31 AM, Mariella Ippolito <mariella... at gmail.com> wrote:
>> 
>> Dear Iain,
>> I rebuilt the code (trunk version) using MKL 2017.0.098 and your arch file. Unfortunately, I obtain the same result (Electronic density on regular grids = NaN).
>> I read that some other users experienced a similar problem, but in their case using MKL 2017.0.098 seems to have solved it.
>> Do you have any other suggestions?
>> 
>> Thank you,
>> Mariella
>> 
>> 
>> On Wednesday, February 1, 2017 at 11:53:15 AM UTC+1, Mariella Ippolito wrote:
>> Dear Iain, 
>> Thank you for your quick answer! 
>> I also thought that the problem was related to the MKL library, so I tried building CP2K with the ScaLAPACK, LAPACK and BLAS libraries instead. I also reduced the optimisation level, trying both -O1 and -O0, but the job still gives the same problem.
>> I will try again using the previous version of MKL and your arch file.
>> At the moment I'm using the branch version of CP2K; do you suggest using the trunk?
>> 
>> Thank you,
>> Mariella
>> 
>> 
>> On Wednesday, February 1, 2017 at 9:47:09 AM UTC+1, IBethune wrote:
>> Dear Mariella, 
>> 
>> As per some recent discussions about Intel 2017 on this forum, it looks like there are some bug(s) in MKL 2017.1.143.  The compiler and MPI library in this release appear to be OK, but you will need to use a previous MKL version.  I don’t know if you have had successful Intel builds before, but there are several files which need to be compiled at a lower optimisation level to work around compiler issues.  There is a set of arch files which are known to work with the CP2K trunk available via the CP2K dashboard - see e.g. http://cp2k-www.epcc.ed.ac.uk/phi/psmp/regtest-arch (linked from http://dashboard.cp2k.org).
>> 
>> Cheers 
>> 
>> - Iain 
>> 
>> 
>> > On 1 Feb 2017, at 08:40, Mariella Ippolito <marie... at gmail.com> wrote: 
>> > 
>> > Dear all, 
>> > I am having problems running QS calculations with CP2K 4.1 compiled with intelmpi-2017 (the same run works fine with the executable built with the openmpi-gnu compiler).
>> > In particular, in the output I obtain:
>> > 
>> > ----------------------------------- OT --------------------------------------- 
>> > 
>> >   Step     Update method      Time    Convergence         Total energy    Change 
>> >   ------------------------------------------------------------------------------ 
>> > 
>> >   Trace(PS):                                 1200.0000000051 
>> >   Electronic density on regular grids:                   NaN                 NaN 
>> >   Core density on regular grids:             1200.0000000000       -0.0000000000 
>> >   Total charge density on r-space grids:                 NaN 
>> >   Total charge density g-space grids:          -5.8357006210 
>> > 
>> > By contrast, the code compiled with openmpi-gnu gives:
>> > 
>> >  ----------------------------------- OT --------------------------------------- 
>> > 
>> >   Step     Update method      Time    Convergence         Total energy    Change 
>> >   ------------------------------------------------------------------------------ 
>> > 
>> >   Trace(PS):                                 1199.9999998902 
>> >   Electronic density on regular grids:      -1199.9999998901        0.0000001099 
>> >   Core density on regular grids:             1199.9999999999       -0.0000000001 
>> >   Total charge density on r-space grids:        0.0000001098 
>> >   Total charge density g-space grids:           0.0000001099 
>> > 
>> > Clearly there is something wrong with the quantities 
>> > Electronic density on regular grids 
>> > Total charge density on r-space grids 
>> > 
>> > Looking at the source code, I find that the problem may come from the quantities tot_rho_r and tot_rho_r_arr in qs_ks_utils.F
>> > 
>> > Line 855 in qs_ks_utils.F 
>> > CALL qs_rho_get(rho, tot_rho_r=tot_rho_r_arr, rho_ao_kp=rho_ao) 
>> > 
>> > If I print tot_rho_r_arr after this call, I obtain NaN for both its components
>> > and, as a consequence,
>> > tot_rho_r = accurate_sum(tot_rho_r_arr)
>> > is also NaN,
>> > while if I run the GNU executable it gives the correct value
>> > tot_rho_r = accurate_sum(tot_rho_r_arr) = -1199.99999989
>> > 
>> > I attach the restart file used for the calculations. 
>> > 
>> > Can you help me to fix this problem? 
>> > 
>> > Best regards, 
>> > Mariella 
>> > 
>> > -- 
>> > You received this message because you are subscribed to the Google Groups "cp2k" group. 
>> > To unsubscribe from this group and stop receiving emails from it, send an email to cp2k+... at googlegroups.com. 
>> > To post to this group, send email to cp... at googlegroups.com. 
>> > Visit this group at https://groups.google.com/group/cp2k. 
>> > For more options, visit https://groups.google.com/d/optout. 
>> > <md.restart> 
>> 
>> 
>> -- 
>> The University of Edinburgh is a charitable body, registered in 
>> Scotland, with registration number SC005336. 
>> 
>> 





