[CP2K-user] [CP2K:20791] Re: compilation problems - LHS and RHS of an assignment statement have incompatible types

Frederick Stein f.stein at hzdr.de
Mon Oct 21 06:58:33 UTC 2024


Dear Bartosz,
I have no idea about the issue with LibXSMM.
Regarding the trace, I do not know either, as there is not much that could
break in pw_derive (it just performs multiplications) and the sequence of
operations is too unspecific. It may be that the code actually breaks
somewhere else. Can you do the same with the ssmp version and post the last
100 lines? This way we avoid the asynchronicity issues that affect
backtraces from the psmp version.
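
For example, reusing the binary path and input name from your log (a
sketch; adjust the paths to your setup):

```
# run the failing input with the OpenMP-only (ssmp) binary
/lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.ssmp -i H2O-9.inp -o H2O-9.out
# post the last 100 lines of the trace
tail -n 100 H2O-9.out
```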
Best,
Frederick

bartosz mazur wrote on Sunday, 20 October 2024 at 16:47:15 UTC+2:

> The error is:
>
> ```
> LIBXSMM_VERSION: develop-1.17-3834 (25693946)
> CLX/DP      TRY    JIT    STA    COL
>    0..13      2      2      0      0
>   14..23      0      0      0      0
>   24..64      0      0      0      0
> Registry and code: 13 MB + 16 KB (gemm=2)
> Command (PID=2607388): /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.psmp -i H2O-9.inp -o H2O-9.out
> Uptime: 5.288243 s
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   RANK 0 PID 2607388 RUNNING AT r21c01b10
> =   KILLED BY SIGNAL: 11 (Segmentation fault)
> ===================================================================================
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   RANK 1 PID 2607389 RUNNING AT r21c01b10
> =   KILLED BY SIGNAL: 9 (Killed)
> ===================================================================================
> ```
>
> and the last 20 lines:
>
> ```
>  000000:000002<<   13     76 pw_copy            0.001 Hostmem: 693 MB GPUmem: 0 MB
>  000000:000002>>   13     19 pw_derive          start Hostmem: 693 MB GPUmem: 0 MB
>  000000:000002<<   13     19 pw_derive          0.002 Hostmem: 693 MB GPUmem: 0 MB
>  000000:000002>>   13    168 pw_pool_create_pw  start Hostmem: 693 MB GPUmem: 0 MB
>  000000:000002>>   14     97 pw_create_c1d      start Hostmem: 693 MB GPUmem: 0 MB
>  000000:000002<<   14     97 pw_create_c1d      0.000 Hostmem: 693 MB GPUmem: 0 MB
>  000000:000002<<   13    168 pw_pool_create_pw  0.000 Hostmem: 693 MB GPUmem: 0 MB
>  000000:000002>>   13     77 pw_copy            start Hostmem: 693 MB GPUmem: 0 MB
>  000000:000002<<   13     77 pw_copy            0.001 Hostmem: 693 MB GPUmem: 0 MB
>  000000:000002>>   13     20 pw_derive          start Hostmem: 693 MB GPUmem: 0 MB
> ```
>
> Thanks!
> On Friday, 18 October 2024 at 17:18:39 UTC+2, Frederick Stein wrote:
>
>> Please pick one of the failing tests, add the TRACE keyword to the 
>> &GLOBAL section, and run the test manually. This increases the size of 
>> the output file dramatically (to a few million lines). Can you send me 
>> the last ~20 lines of the output?
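>>
>> For illustration, a minimal &GLOBAL section with the keyword added (a 
>> sketch; PROJECT and RUN_TYPE are placeholders, keep whatever the chosen 
>> test input already uses):
>>
>> ```
>> &GLOBAL
>>   PROJECT H2O-9    ! placeholder project name
>>   RUN_TYPE ENERGY  ! placeholder run type
>>   TRACE            ! log every routine entry/exit
>> &END GLOBAL
>> ```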
>> bartosz mazur wrote on Friday, 18 October 2024 at 17:09:40 UTC+2:
>>
>>> I'm using the do_regtests.py script, not make regtesting, but I assume 
>>> it makes no difference. As I mentioned in my previous message, with 
>>> `--ompthreads 1` all tests passed for both ssmp and psmp. For ssmp with 
>>> `--ompthreads 2` I observe errors similar to those for psmp with the 
>>> same setting; I attach an example output.
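>>>
>>> For reference, I invoke the script roughly like this (a sketch; the 
>>> arch/version names are those of my build, and the exact options are 
>>> listed in the script's --help):
>>>
>>> ```
>>> ./tests/do_regtests.py --ompthreads 2 local ssmp
>>> ```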
>>>
>>> Thanks
>>> Bartosz
>>>
>>> On Friday, 18 October 2024 at 16:24:16 UTC+2, Frederick Stein wrote:
>>>
>>>> Dear Bartosz,
>>>> What happens if you set the number of OpenMP threads to 1 (add 
>>>> '--ompthreads 1' to TESTOPTS)? What errors do you observe with the 
>>>> ssmp version?
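>>>>
>>>> With the make-based regtesting, that would look roughly as follows (a 
>>>> sketch; ARCH and VERSION must match your build):
>>>>
>>>> ```
>>>> make ARCH=local VERSION=ssmp TESTOPTS="--ompthreads 1" test
>>>> ```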
>>>> Best,
>>>> Frederick
>>>>
>>>> bartosz mazur wrote on Friday, 18 October 2024 at 15:37:43 UTC+2:
>>>>
>>>>> Hi Frederick,
>>>>>
>>>>> thanks again for your help. I have tested different simulation 
>>>>> variants and found that the problem occurs when using OpenMP: for MPI 
>>>>> calculations without OpenMP, all tests pass. I have also tested the 
>>>>> effect of the `OMP_PROC_BIND` and `OMP_PLACES` settings; apart from 
>>>>> affecting the run time, they have no significant effect on the errors 
>>>>> (the environment was set as sketched after the tables below). Below 
>>>>> are the results for ssmp:
>>>>>
>>>>> ```
>>>>> OMP_PROC_BIND, OMP_PLACES, correct, total, wrong, failed, time 
>>>>> spread, threads, 3850, 4144, 4, 290, 186min
>>>>> spread, cores, 3831, 4144, 3, 310, 183min
>>>>> spread, sockets, 3864, 4144, 3, 277, 104min
>>>>> close, threads, 3879, 4144, 3, 262, 171min
>>>>> close, cores, 3854, 4144, 0, 290, 168min
>>>>> close, sockets, 3865, 4144, 3, 276, 104min
>>>>> master, threads, 4121, 4144, 0, 23, 1002min
>>>>> master, cores, 4121, 4144, 0, 23, 986min
>>>>> master, sockets, 3942, 4144, 3, 199, 219min
>>>>> false, threads, 3918, 4144, 0, 226, 178min
>>>>> false, cores, 3919, 4144, 3, 222, 176min
>>>>> false, sockets, 3856, 4144, 4, 284, 104min
>>>>> ```
>>>>>
>>>>> and psmp:
>>>>>
>>>>> ```
>>>>> OMP_PROC_BIND, OMP_PLACES, results
>>>>> spread, threads, Summary: correct: 4097 / 4227; failed: 130; 495min
>>>>> spread, cores, 26 / 362
>>>>> spread, cores, 26 / 362
>>>>> close, threads, Summary: correct: 4133 / 4227; failed: 94; 484min
>>>>> close, cores, 60 / 362
>>>>> close, sockets, 13 / 362
>>>>> master, threads, 13 / 362
>>>>> master, cores, 79 / 362
>>>>> master, sockets, Summary: correct: 4153 / 4227; failed: 74; 563min
>>>>> false, threads, Summary: correct: 4153 / 4227; failed: 74; 556min
>>>>> false, cores, Summary: correct: 4106 / 4227; failed: 121; 511min
>>>>> false, sockets, 96 / 362
>>>>> not specified, not specified, Summary: correct: 4129 / 4227; failed: 98; 263min
>>>>> ```
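>>>>>
>>>>> For each row, the environment was set along these lines before 
>>>>> launching the tests (a sketch; the two values come from the first two 
>>>>> columns, and the thread count is an assumption based on the 
>>>>> `--ompthreads 2` runs above):
>>>>>
>>>>> ```
>>>>> export OMP_NUM_THREADS=2     # assumed thread count
>>>>> export OMP_PROC_BIND=close   # or spread / master / false
>>>>> export OMP_PLACES=cores      # or threads / sockets
>>>>> ```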
>>>>>
>>>>> Do you have any ideas what I could try next to get more information 
>>>>> about the source of the problem, or do you see a potential solution 
>>>>> at this stage? I would appreciate any further help.
>>>>>
>>>>> Best
>>>>> Bartosz
>>>>>
>>>>>
>>>>> On Friday, 11 October 2024 at 14:30:25 UTC+2, Frederick Stein wrote:
>>>>>
>>>>>> Dear Bartosz,
>>>>>> If I am not mistaken, you used 8 OpenMP threads. The tests do not 
>>>>>> run that efficiently with such a large number of threads; 2 should 
>>>>>> be sufficient.
>>>>>> The test results suggest that most of the functionality may work, 
>>>>>> but without a backtrace (or similar information) it is hard to tell 
>>>>>> why the failing tests fail. You could also try to run some of the 
>>>>>> single-node tests to assess the stability of CP2K.
>>>>>> Best,
>>>>>> Frederick
>>>>>>
>>>>>> bartosz mazur wrote on Friday, 11 October 2024 at 13:48:42 UTC+2:
>>>>>>
>>>>>>> Sorry, forgot attachments.
>>>>>>>
>>>>>>>
