[CP2K-user] [CP2K:11132] Re: van der Waals regtests fail on Intel KNL, and build glitches

Alfio Lazzaro alfio.... at gmail.com
Tue Jan 8 10:07:19 UTC 2019


OK, let's focus on this test then.
The message is not really useful. Could you try a single thread? I think 
18.0.5 should be fine, but I would suggest starting with a build compiled 
at -O0. One way or another it should run. Then we can use the output as a 
reference.

Alfio




Il giorno martedì 8 gennaio 2019 10:12:58 UTC+1, Ronald Cohen ha scritto:
>
> Thank you so much. I don’t have 18.0.3 installed. I was also having problems 
> with earlier versions, but did not document them as carefully
> and did not run the regtests. When I run my own job (not a regtest) with 
> non-local vdW, the PSMP build never converges, and it segfaults in the stress 
> calculation.
>
>
>
> Here is the end of the argon07 test:
>
>  Leaving inner SCF loop after reaching     2 steps.
>
>
>   Electronic density on regular grids:        -32.0000000000        0.0000000000
>   Core density on regular grids:               31.9999999977       -0.0000000023
>   Total charge density on r-space grids:       -0.0000000023
>   Total charge density g-space grids:          -0.0000000023
>
>   Overlap energy of the core charge distribution:                0.00000000000000
>   Self energy of the core charge distribution:                -180.54066673528200
>   Core Hamiltonian energy:                                      42.11893140752033
>   Hartree energy:                                               68.47313966072379
>   Exchange-correlation energy:                                 -15.35702018709375
>   Dispersion energy:                                             0.25474393253353
>
>   Total energy:                                                -85.05087192159809
>
>  *** WARNING in qs_scf.F:542 :: SCF run NOT converged ***
>
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> EXIT CODE:  174  MEANING:  RUNTIME FAIL
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
>
>
> ---
> Ron Cohen
> rec... at gmail.com
> skypename: ronaldcohen
> twitter: @recohen3
>
>
>
>
> On Jan 8, 2019, at 9:13 AM, Alfio Lazzaro <alfi... at gmail.com> wrote:
>
> Hi Ron,
> Could you share one of the FAILED logs? For instance, the log of 
> argon07.inp. The regtest script prints the last part of the log at the end 
> of its execution. My suspicion is that these failing tests are dying 
> because of a numerical assert in CP2K, so they can be counted in the WRONG 
> category. Now, you are saying that the problem comes from the PSMP build, so my 
> first (very conservative) try would be to use a single thread and see if it 
> works. Note that CP2K is tested with 2 threads (while you are using 4). Another 
> possibility would be to avoid AVX512 vectorization (CP2K doesn't test it 
> yet). Also, I have just realized that CP2K doesn't test 18.0.5 for PSMP, or 
> 19.x at all (see the CP2K tests at https://dashboard.cp2k.org/ ). So, my 
> suggestion is to reproduce what is already tested by CP2K. A good 
> starting point is this test:
>
> https://www.cp2k.org/static/regtest/trunk/swan-skl28/CRAY-XC40-intel-mkl.psmp_18.0.3.222.out
>
> Alfio
>
>
> Il giorno lunedì 7 gennaio 2019 20:34:23 UTC+1, Ronald Cohen ha scritto:
>>
>> I also built with -fp-model precise, but it did not help. The values are very 
>> wrong, not slightly. Ron
>>
>> Sent from my iPhone
>>
>> On Jan 7, 2019, at 20:13, Anton Kudelin <arch... at gmail.com> wrote:
>>
>> Could you add "-fp-model precise" to CFLAGS and FCFLAGS? It won't fix 
>> 'RUNTIME FAIL', but could help with 'WRONG RESULT'.
>>
>> On Monday, January 7, 2019 at 9:06:28 PM UTC+3, Ronald Cohen wrote:
>>>
>>> So I tried:
>>>
>>> export KMP_STACKSIZE=512M
>>> rcohen@tomcat3:~/CP2K/cp2k$ ./tools/regtesting/do_regtest -arch Linux-x86-64-intel -version psmp \
>>>     -restrictdir QS/regtest-dft-vdw-corr-1/ -restrictdir QS/regtest-dft-vdw-corr-2/ \
>>>     -restrictdir QS/regtest-dft-vdw-corr-3/ -restrictdir QS/regtest-dft-vdw-corr-3/ \
>>>     -nobuild -mpiranks 4 -ompthreads 4 -maxtasks 16 |& tee testwith512MKMP_STACKSIZE.out &
>>> and I still get:
>>>
>>> < 
>>> /home/rcohen/CP2K/cp2k/TEST-Linux-x86-64-intel-psmp-2019-01-07_18-24-16/tests/QS/regtest-dft-vdw-corr-3 
>>> (1 of 3) done in 775.00 sec
>>> >>> 
>>> /home/rcohen/CP2K/cp2k/TEST-Linux-x86-64-intel-psmp-2019-01-07_18-24-16/tests/QS/regtest-dft-vdw-corr-3
>>>     argon05.inp        -85.02462435591488  WRONG RESULT TEST 1
>>>     argon06.inp        -85.18989253445228  WRONG RESULT TEST 1
>>>     argon07.inp        -85.05087192159809  RUNTIME FAIL
>>>     argon08.inp        -85.05201740647929  RUNTIME FAIL
>>>     argon09.inp        -85.05086520280044  RUNTIME FAIL
>>>     argon10.inp        -85.05070440200512  RUNTIME FAIL
>>>     argon11.inp        -84.69892988333885  RUNTIME FAIL
>>>     argon12.inp        -84.69900817368848  RUNTIME FAIL
>>>     argon13.inp        -84.81306482759408  WRONG RESULT TEST 1
>>>     argon14.inp        -84.69889654472566  WRONG RESULT TEST 1
>>>     argon-beef.inp     -42.46311172518392  WRONG RESULT TEST 1
>>>     dftd3bj_t1.inp      -0.00355123783846  OK (   1.19 sec)
>>>     dftd3bj_t2.inp      -0.05897356220363  OK (   2.20 sec)
>>>     dftd3bj_t3.inp      -0.00112424003807  OK (   3.75 sec)
>>>     dftd3bj_t4.inp     -84.2983390350      OK (   3.86 sec)
>>> <<< 
>>> /home/rcohen/CP2K/cp2k/TEST-Linux-x86-64-intel-psmp-2019-01-07_18-24-16/tests/QS/regtest-dft-vdw-corr-3 
>>> (1 of 3) done in 775.00 sec
>>> Starting regression tests in 
>>> /home/rcohen/CP2K/cp2k/TEST-Linux-x86-64-intel-psmp-2019-01-07_18-24-16/tests/QS/regtest-dft-vdw-corr-2 
>>> (2 of 3)
>>> Starting regression tests in 
>>> /home/rcohen/CP2K/cp2k/TEST-Linux-x86-64-intel-psmp-2019-01-07_18-24-16/tests/QS/regtest-dft-vdw-corr-2 
>>> (2 of 3)
>>>
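[Editor's sketch] To see at a glance how the failures in the regtest-dft-vdw-corr-3 output quoted above are distributed, the result lines can be tallied by status. The snippet below is only an illustration in Python: the `SUMMARY` text is copied from the quoted output with each wrapped pair joined onto one line, and the helper name `tally` is made up here; it is not part of do_regtest.

```python
# Result lines copied from the regtest-dft-vdw-corr-3 output quoted above,
# with each wrapped name/value pair joined onto one line.
SUMMARY = """\
argon05.inp      -85.02462435591488  WRONG RESULT TEST 1
argon06.inp      -85.18989253445228  WRONG RESULT TEST 1
argon07.inp      -85.05087192159809  RUNTIME FAIL
argon08.inp      -85.05201740647929  RUNTIME FAIL
argon09.inp      -85.05086520280044  RUNTIME FAIL
argon10.inp      -85.05070440200512  RUNTIME FAIL
argon11.inp      -84.69892988333885  RUNTIME FAIL
argon12.inp      -84.69900817368848  RUNTIME FAIL
argon13.inp      -84.81306482759408  WRONG RESULT TEST 1
argon14.inp      -84.69889654472566  WRONG RESULT TEST 1
argon-beef.inp   -42.46311172518392  WRONG RESULT TEST 1
dftd3bj_t1.inp   -0.00355123783846   OK (   1.19 sec)
dftd3bj_t2.inp   -0.05897356220363   OK (   2.20 sec)
dftd3bj_t3.inp   -0.00112424003807   OK (   3.75 sec)
dftd3bj_t4.inp   -84.2983390350      OK (   3.86 sec)
"""

# Status strings exactly as they appear in the quoted output.
STATUSES = ("WRONG RESULT", "RUNTIME FAIL", "OK")


def tally(summary):
    """Group test names by the status string found on each result line."""
    groups = {status: [] for status in STATUSES}
    for line in summary.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # fields[0] is the input file, fields[1] the energy, the rest the status.
        name, rest = fields[0], " ".join(fields[2:])
        for status in STATUSES:
            if rest.startswith(status):
                groups[status].append(name)
                break
    return groups


if __name__ == "__main__":
    for status, names in tally(SUMMARY).items():
        print(f"{status:13s} {len(names):2d}  {', '.join(names)}")
```

In this particular run the only tests reported OK are the four dftd3bj_* inputs; every argon* input ends in either WRONG RESULT or RUNTIME FAIL.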
>>>
>>> Almost all of the non-vdW tests pass.
>>>
>>> Sincerely,
>>>
>>> Ron
>>>
>>> ---
>>> Ron Cohen
>>> rec... at gmail.com
>>> skypename: ronaldcohen
>>> twitter: @recohen3
>>>
>>>
>>>
>>>
>>> On Jan 7, 2019, at 6:12 PM, Robert Schade <robe... at uni-paderborn.de> 
>>> wrote:
>>>
>>> Could you try setting KMP_STACKSIZE to something large in the terminal
>>> session with "export KMP_STACKSIZE=512m" before you rerun the regtests
>>> with the intel-psmp binary that failed before?
>>> Please also make sure that the general stack size is not the problem
>>> by running "ulimit -s unlimited" in the same terminal where you want to
>>> execute the regtests.
>>> Best Wishes
>>> Robert
>>>
>>> On 07.01.19 18:00, Ronald Cohen wrote:
>>> > BTW, in case it was not clear. My Intel builds of POPT and PSMP
>>> > versions were error free. The problems were all run time.
>>> >
>>> > Ron
>>> >
>>> > ---
>>> > Ron Cohen
>>> > rec... at gmail.com
>>> > skypename: ronaldcohen
>>> > twitter: @recohen3
>>> >
>>> >
>>> >
>>> >
>>> >> On Jan 7, 2019, at 5:39 PM, Robert Schade
>>> >> <robe... at uni-paderborn.de> wrote:
>>> >>
>>> >> r is automatically private because it is the
>>> >> first iteration variable. Every drho(s, i) is only read and
>>> >> written in exactly one loop iteration. The statement
>>> >> "COLLAPSE(3)" collapses the three perfectly nested loops into one
>>> >> loop. So, IMHO, this code looks ok.
>>> >> Best Wishes
>>> >> Robert
>>> >>
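[Editor's sketch] The claim that every drho(s, i) is touched by exactly one iteration can be checked numerically. The snippet below is a small Python illustration, not Fortran and not CP2K code: it takes the index formula s = r*n(2)*n(1) + q*n(1) + p + 1 from the loop quoted below and verifies that it maps each (p, q, r) to a distinct s. The grid sizes and function names are made up for the example. It only addresses whether a REDUCTION clause is needed for that update; it says nothing about how a particular compiler handles the loop.

```python
from itertools import product


def flat_index(p, q, r, n):
    """1-based flat index from the quoted loop: s = r*n(2)*n(1) + q*n(1) + p + 1."""
    n1, n2, _ = n
    return r * n2 * n1 + q * n1 + p + 1


def each_index_hit_once(n):
    """Return True if every s in 1..n1*n2*n3 is produced by exactly one (p, q, r)."""
    n1, n2, n3 = n
    seen = sorted(
        flat_index(p, q, r, n)
        for r, q, p in product(range(n3), range(n2), range(n1))
    )
    # A one-to-one mapping yields each value 1..n1*n2*n3 exactly once.
    return seen == list(range(1, n1 * n2 * n3 + 1))


if __name__ == "__main__":
    # Hypothetical grid sizes, chosen only for illustration.
    for n in [(3, 4, 5), (1, 1, 1), (7, 2, 3)]:
        print(n, each_index_hit_once(n))
```

Because the mapping is one-to-one, no two iterations of the collapsed loop write to the same element of drho for a given i, which is the condition under which the update is safe without a reduction.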
>>> >>
>>> >> On 07.01.19 14:52, Ronald Cohen wrote:
>>> >>> Yes, I agree. I have tried the 2018.05 and the 2019.1 Intel
>>> >>> compilers. The POPT version runs fine, but the PSMP version
>>> >>> fails in the vdW routines. I find things like this in
>>> >>> qs_dispersion_nonloc.F:
>>> >>>
>>> >>> !$OMP PARALLEL DO DEFAULT(NONE)                      &
>>> >>> !$OMP             SHARED(ispin,i,n,lo,drho,drho_r)   &
>>> >>> !$OMP             PRIVATE(s)                         &
>>> >>> !$OMP             COLLAPSE(3)
>>> >>>             DO r = 0, n(3)-1
>>> >>>                DO q = 0, n(2)-1
>>> >>>                   DO p = 0, n(1)-1
>>> >>>                      s = r*n(2)*n(1)+q*n(1)+p+1
>>> >>>                      drho(s, i) = drho(s, i)+drho_r(i, ispin)%pw%cr3d(p+lo(1), q+lo(2), r+lo(3))
>>> >>>                   END DO
>>> >>>                END DO
>>> >>>             END DO
>>> >>> !$OMP END PARALLEL DO
>>> >>>          END DO
>>> >>>       END DO
>>> >>>
>>> >>> Doesn’t this have to be marked as a reduction? And shouldn’t r,
>>> >>> q, p be declared private? Perhaps this is automatic, but I do
>>> >>> not see that stated anywhere. Does GNU treat such loops
>>> >>> differently than Intel? Just ideas.
>>> >>>
>>> >>> I am currently trying the toolchain, but it is building
>>> >>> everything from scratch, including blas, lapack, scalapack etc
>>> >>> etc, so will take days.
>>> >>>
>>> >>> Thank you for your help,
>>> >>>
>>> >>> Sincerely,
>>> >>>
>>> >>> Ron
>>> >>>
>>> >>> ---
>>> >>> Ron Cohen
>>> >>> rec... at gmail.com
>>> >>> skypename: ronaldcohen
>>> >>> twitter: @recohen3
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>> On Jan 7, 2019, at 2:16 PM, Robert Schade
>>> >>>> <robe... at uni-paderborn.de> wrote:
>>> >>>>
>>> >>>> Building cp2k on Intel Xeon Phi Knights Landing (KNL, not to
>>> >>>> be confused with KNC!) is not different from building it on
>>> >>>> any other Intel CPU. Hence, I think that the failing regtests
>>> >>>> point to an underlying issue. Which exact version of the
>>> >>>> Intel Compiler and MKL have you tried?
>>> >>>> Best Wishes
>>> >>>> Robert
>>> >>>>
>>> >>>> On 06.01.19 01:59, Ronald Cohen wrote:
>>> >>>>> OK, sorry for all the noise. I am trying:
>>> >>>>> ./install_cp2k_toolchain.sh --with-elpa=install --with-libint=install --with-gcc=install
>>> >>>>> I hate not being able to use my Intel tools, which work
>>> >>>>> just fine for me for everything else.
>>> >>>>>
>>> >>>>> Ron
>>> >>>>>
>>> >>>>
>>> >>>> --
>>> >>>> Robert Schade
>>> >>>> Paderborn Center for Parallel Computing (PC2)
>>> >>>> University of Paderborn
>>> >>>> Warburger Str. 100
>>> >>>> D-33098 Paderborn
>>> >>>> Germany
>>> >>>> robe... at uni-paderborn.de
>>> >>>> +49/(0)5251/60-5393
>>> >>>>
>>> >>>> --
>>> >>>> You received this message because you are subscribed to a
>>> >>>> topic in the Google Groups "cp2k" group.
>>> >>>> To unsubscribe from this topic, visit
>>> >>>> https://groups.google.com/d/topic/cp2k/gzmRqKNt62U/unsubscribe.
>>> >>>> To unsubscribe from this group and all its topics, send an email
>>> >>>> to cp2k+... at googlegroups.com.
>>> >>>> To post to this group, send email to cp... at googlegroups.com.
>>> >>>> Visit this group at https://groups.google.com/group/cp2k.
>>> >>>> For more options, visit https://groups.google.com/d/optout.
>>> >>>
>>> >>
>>> >> --
>>> >> Robert Schade
>>> >> Paderborn Center for Parallel Computing (PC2)
>>> >> University of Paderborn
>>> >> Warburger Str. 100
>>> >> D-33098 Paderborn
>>> >> Germany
>>> >> robe... at uni-paderborn.de
>>> >> +49/(0)5251/60-5393
>>> >>
>>> >
>>>
>>> --
>>> Robert Schade
>>> Paderborn Center for Parallel Computing (PC2)
>>> University of Paderborn
>>> Warburger Str. 100
>>> D-33098 Paderborn
>>> Germany
>>> robe... at uni-paderborn.de
>>> +49/(0)5251/60-5393
>>>
>>>
>>>