[CP2K-user] [CP2K:20905] Re: compilation problems - LHS and RHS of an assignment statement have incompatible types

Frederick Stein f.stein at hzdr.de
Wed Nov 20 15:28:02 UTC 2024


Dear Bartosz,
Without actual CP2K input or output files, I can only guess. The first 
Slurm output states "No space left on device", the others "Cannot allocate 
memory". The first suggests that there is not enough disk space available 
(do you have any additional CP2K output files from each respective rank?), 
the others that you do not have enough RAM available. You can try to run 
CP2K with fewer MPI ranks and more OpenMP threads. This reduces the number 
of additional temporary output files and the memory footprint in RAM, but 
it increases the runtime.
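
As a rough sketch of what I mean: the following Python snippet lists the ways to split the cores of one node between MPI ranks and OpenMP threads and prints a launch line for each. The core count of 48 and the `mpirun -np` launcher are assumptions, not something I know about your cluster; only `cp2k.psmp -i cp2k.inp -o cp2k.out` is taken from your output.

```python
def rank_thread_splits(cores_per_node):
    """List every (mpi_ranks, omp_threads) pair whose product equals the core count."""
    return [
        (ranks, cores_per_node // ranks)
        for ranks in range(1, cores_per_node + 1)
        if cores_per_node % ranks == 0
    ]


def launch_command(mpi_ranks, omp_threads, exe="cp2k.psmp",
                   inp="cp2k.inp", out="cp2k.out"):
    """Build a launch line; `mpirun -np` is an assumed launcher, adapt it to your site."""
    return (f"OMP_NUM_THREADS={omp_threads} mpirun -np {mpi_ranks} "
            f"{exe} -i {inp} -o {out}")


if __name__ == "__main__":
    cores = 48  # hypothetical node size, replace with the real core count
    for ranks, threads in rank_thread_splits(cores):
        print(launch_command(ranks, threads))
```

Moving towards the entries with fewer ranks and more threads is the direction I would try first; which split is fastest depends on the system and has to be tested.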
Best,
Frederick

bartosz mazur wrote on Wednesday, 20 November 2024 at 16:01:01 UTC+1:

> Hi Frederick, 
>
> I am writing this as a follow up to previous discussions. I am currently 
> seeing a recurring problem with CP2K, where tasks are being killed after 
> about 10 days with errors as in the attached outputs. This is not 
> particularly annoying, as a restart is sufficient and the simulation can 
> run on. Unfortunately, I don't think you will be able to reproduce this 
> error, given the very long simulation time. However, if there is anything 
> else I can provide to help understand the source of these problems, let me 
> know. 
>
> Best
> Bartosz
>
> bartosz mazur wrote on Monday, 28 October 2024 at 09:34:45 UTC+1:
>
>> Many thanks Frederick for your help! 
>>
>> Frederick Stein wrote on Friday, 25 October 2024 at 14:27:36 UTC+2:
>>
>>> Regarding the other issues:
>>> I can confirm them but cannot provide fixes for all of them because they 
>>> probably trigger bugs in ifort. Because ifort is already deprecated, these 
>>> bugs will probably not be fixed. Furthermore, we do not see any issues on 
>>> our Intel CI. I will fix what I can but some of them will be left as we 
>>> will focus our efforts on the support of the new ifx compiler.
>>>
>>> Frederick Stein wrote on Friday, 25 October 2024 at 11:46:00 UTC+2:
>>>
>>>> Dear Bartosz, 
>>>> I will check the other issues with your regtests.
>>>> Regarding your latest issue, please provide more information such as an 
>>>> output file or a hint on the context. If I am supposed to retry the 
>>>> calculation on my local machine, I need all additional input files such as 
>>>> your plumed file. I can run your input file up to the point that CP2K needs 
>>>> plumed.
>>>> Best,
>>>> Frederick
>>>> bartosz mazur wrote on Friday, 25 October 2024 at 10:15:19 UTC+2:
>>>>
>>>>> I just got another error with LibXSMM, now in my regular simulation 
>>>>> and without using OpenMP. This is the error:
>>>>>
>>>>> ```
>>>>> [1729843139.920274] [r23c01b04:2913 :0]           ib_md.c:295  UCX  ERROR ibv_reg_mr(address=0x14f0b46fc080, length=7424, access=0xf) failed: Cannot allocate memory
>>>>> [1729843139.920290] [r23c01b04:2913 :0]          ucp_mm.c:70   UCX  ERROR failed to register address 0x14f0b46fc080 (host) length 7424 on md[4]=mlx5_0: Input/output error (md supports: host)
>>>>> [1729843139.932647] [r23c01b04:2945 :0]           ib_md.c:295  UCX  ERROR ibv_reg_mr(address=0x1491f069e040, length=8128, access=0xf) failed: Cannot allocate memory
>>>>> [1729843139.932660] [r23c01b04:2945 :0]          ucp_mm.c:70   UCX  ERROR failed to register address 0x1491f069e040 (host) length 8128 on md[4]=mlx5_0: Input/output error (md supports: host)
>>>>>
>>>>> LIBXSMM_VERSION: develop-1.17-3834 (25693946)
>>>>> CLX/DP      TRY    JIT    STA    COL
>>>>>    0..13      4      4      0      0
>>>>>   14..23      4      4      0      0
>>>>>   24..64      0      0      0      0
>>>>> Registry and code: 13 MB + 80 KB (gemm=8)
>>>>> Command (PID=2913): 
>>>>> /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.psmp -i 
>>>>> cp2k.inp -o cp2k.out
>>>>> Uptime: 407633.177169 s
>>>>> ```
>>>>>
>>>>> and this is simulation input I'm using:
>>>>>
>>>>> ```
>>>>> &GLOBAL
>>>>>   PROJECT uam1o_npt_rms
>>>>>   RUN_TYPE MD
>>>>>   PRINT_LEVEL LOW
>>>>>   PREFERRED_DIAG_LIBRARY SCALAPACK
>>>>> &END GLOBAL
>>>>>
>>>>> &FORCE_EVAL
>>>>>   METHOD QUICKSTEP
>>>>>   STRESS_TENSOR ANALYTICAL
>>>>>   &DFT
>>>>>     BASIS_SET_FILE_NAME BASIS_MOLOPT_UZH
>>>>>     POTENTIAL_FILE_NAME POTENTIAL_UZH
>>>>>     &MGRID
>>>>>       CUTOFF 500
>>>>>     &END MGRID
>>>>>     &XC
>>>>>       &XC_FUNCTIONAL PBE
>>>>>       &END XC_FUNCTIONAL
>>>>>       &VDW_POTENTIAL
>>>>>         POTENTIAL_TYPE PAIR_POTENTIAL
>>>>>         &PAIR_POTENTIAL
>>>>>           TYPE  DFTD3(BJ)
>>>>>           PARAMETER_FILE_NAME  dftd3.dat
>>>>>           REFERENCE_FUNCTIONAL PBE
>>>>>           R_CUTOFF  25.0
>>>>>         &END PAIR_POTENTIAL
>>>>>       &END VDW_POTENTIAL
>>>>>     &END XC
>>>>>   &END DFT
>>>>>
>>>>>   &SUBSYS
>>>>>     &CELL
>>>>>       A      12.2807999       0.0000000       0.0000000
>>>>>       B       7.6258602       9.6257200       0.0000000
>>>>>       C      -2.1557724      -1.0420258      18.0042801
>>>>>     &END CELL
>>>>>     &COORD
>>>>>       Zn      11.37811      4.60286      0.24515
>>>>>       Zn       8.15435      3.05288      8.74518
>>>>>       Zn       6.37590      3.97311     17.74650
>>>>>       Zn       9.59842      5.54014      9.24747
>>>>>       S       11.79344      6.72692     17.10850
>>>>>       S        4.06825      3.00573      9.90358
>>>>>       S        5.95830      1.84422      0.90027
>>>>>       S       13.67407      5.58944      8.10767
>>>>>       O       10.72408      3.58291      1.89315
>>>>>       O        8.51986      4.01962      1.53085
>>>>>       O        6.60135      3.91587      7.68572
>>>>>       O        7.74637      5.79259      8.21600
>>>>>       O       15.32810      8.58246      5.10041
>>>>>       O        9.35608      2.93551      7.09500
>>>>>       O       10.38999      4.93007      7.45977
>>>>>       O       11.66491      6.35111      1.31266
>>>>>       O        9.48582      6.62478      0.77364
>>>>>       O        2.59062      2.40094      3.91496
>>>>>       O        7.03031      4.99173     16.09885
>>>>>       O        9.23544      4.56122     16.46252
>>>>>       O       11.14602      4.67776     10.31440
>>>>>       O       10.00982      2.79915      9.77218
>>>>>       O        2.41388      0.01898     12.91899
>>>>>       O        8.39375      5.66143     10.89628
>>>>>       O        7.36998      3.66087     10.53589
>>>>>       O        6.08863      2.22161     16.68336
>>>>>       O        8.26988      1.95313     17.21650
>>>>>       O       15.16937      6.16381     14.09906
>>>>>       N       13.25907      3.80728      0.04001
>>>>>       N        2.36335     -0.74130     17.33402
>>>>>       N        7.60676      1.08576      8.95623
>>>>>       N       15.77729      5.75974      9.67861
>>>>>       N        4.49430      4.76652     17.95756
>>>>>       N       15.38873      9.31230      0.67467
>>>>>       N       10.14308      7.50848      9.04236
>>>>>       N        1.96529      2.83557      8.33233
>>>>>       C        6.76554      5.18292      7.68414
>>>>>       C       14.28210      4.11624      0.86006
>>>>>       C        9.47998      3.39622      2.09658
>>>>>       C        3.20112      3.42080      0.84626
>>>>>       C        9.91466      1.18589      3.17244
>>>>>       C        9.08210      2.29987      3.02657
>>>>>       C        5.74710      6.04945      7.01821
>>>>>       C        7.83265      2.30920      3.66005
>>>>>       C        3.35793      2.34328     -0.04029
>>>>>       C        4.51663      1.46385     -0.02755
>>>>>       C       16.24194      7.75266      5.73606
>>>>>       C        4.78940      5.52817      6.14198
>>>>>       C        7.40810      1.21174      4.39947
>>>>>       C       16.18016      6.38244      5.49010
>>>>>       C        9.48869      0.06986      3.88005
>>>>>       C       11.27238      1.77457     17.14330
>>>>>       C        5.77166      7.43009      7.27236
>>>>>       C       11.14819      8.24901     17.58588
>>>>>       C        8.22170      0.08058      4.47135
>>>>>       C        0.15087      1.02286     17.07544
>>>>>       C       17.16180      8.28565      6.64351
>>>>>       C       10.57067      7.01060      1.31282
>>>>>       C        6.72654      0.47459      8.14002
>>>>>       C       10.27972      3.79035      6.89470
>>>>>       C       14.15006      8.72843      8.15880
>>>>>       C       11.73751      2.06868      5.82537
>>>>>       C       11.38838      3.41515      5.96966
>>>>>       C       10.52304      8.34339      1.98566
>>>>>       C       12.16584      4.39562      5.33967
>>>>>       C       14.89762      7.93801      9.04648
>>>>>       C       14.86698      6.48365      9.03575
>>>>>       C        2.67167      1.17044      3.27681
>>>>>       C       11.52468      8.76552      2.86608
>>>>>       C       13.29140      4.04007      4.60622
>>>>>       C        3.78230      0.36534      3.52266
>>>>>       C       12.87823      1.70260      5.12344
>>>>>       C        8.27761      0.34001      9.85941
>>>>>       C        9.42677      9.18364      1.73295
>>>>>       C        3.27553      4.45658      9.42657
>>>>>       C       13.66559      2.69775      4.53650
>>>>>       C       15.77023      8.59069      9.93240
>>>>>       C        1.68356      0.78491      2.36643
>>>>>       C       10.98451      3.41041     10.31327
>>>>>       C        3.46873      4.45681     17.14097
>>>>>       C        8.27403      5.18373     15.89814
>>>>>       C       14.54907      5.15099     17.15930
>>>>>       C        7.83119      7.39584     14.82858
>>>>>       C        8.66916      6.28563     14.97331
>>>>>       C       11.99928      2.54577     10.98702
>>>>>       C        9.92072      6.28547     14.34388
>>>>>       C       16.54982      7.26986      0.04271
>>>>>       C       15.39103      8.14919      0.03189
>>>>>       C        1.50023      0.84646     12.27989
>>>>>       C       12.95126      3.06908     11.86817
>>>>>       C       10.34198      7.38826     13.61070
>>>>>       C        1.55836      2.21699     12.52561
>>>>>       C        8.25354      8.51697     14.12666
>>>>>       C        6.48249      6.79770      0.85630
>>>>>       C       11.97760      1.16465     10.73446
>>>>>       C        6.60385      0.32218      0.42301
>>>>>       C        9.52282      8.51550     13.54043
>>>>>       C       17.60321      7.54791      0.92891
>>>>>       C        0.58530      0.31102     11.36884
>>>>>       C        7.18362      1.56332     16.68291
>>>>>       C       11.01926      8.11905      9.86341
>>>>>       C        7.47582      4.80132     11.10039
>>>>>       C        3.59282     -0.13430      9.84955
>>>>>       C        6.01179      6.51430     12.17471
>>>>>       C        6.36853      5.17005     12.02942
>>>>>       C        7.23131      0.22715     16.01652
>>>>>       C        5.59963      4.18477     12.66234
>>>>>       C        2.84614      0.65728      8.96213
>>>>>       C        2.87561      2.11161      8.97508
>>>>>       C       15.08536      7.39548     14.73440
>>>>>       C        6.23001     -0.19920     15.13769
>>>>>       C        4.47482      4.53325     13.40042
>>>>>       C       13.97400      8.19851     14.48576
>>>>>       C        4.87173      6.87322     12.88120
>>>>>       C        9.47231      8.25578      8.14046
>>>>>       C        8.32790     -0.61137     16.27301
>>>>>       C       14.46698      4.13864      8.58475
>>>>>       C        4.09294      5.87331     13.47165
>>>>>       C        1.97640      0.00563      8.07267
>>>>>       C       16.07240      7.78504     15.64417
>>>>>       H       14.10215      4.93465      1.55678
>>>>>       H        3.98110      3.68721      1.55899
>>>>>       H       10.89072      1.19647      2.69205
>>>>>       H        7.19958      3.19021      3.56839
>>>>>       H        4.75923      4.45384      5.96230
>>>>>       H        6.45299      1.21835      4.92062
>>>>>       H       15.44211      6.00062      4.78824
>>>>>       H       17.75043      8.81610      3.97156
>>>>>       H       10.41563      1.57993     16.49923
>>>>>       H        6.49332      7.81303      7.99143
>>>>>       H        0.24800      0.19739     16.37425
>>>>>       H        9.53586     -0.26872      6.84508
>>>>>       H        6.19685      1.12218      7.44173
>>>>>       H       13.45550      8.28133      7.44815
>>>>>       H       11.11633      1.31384      6.30260
>>>>>       H       11.87413      5.44074      5.42962
>>>>>       H       12.38442      8.12016      3.04474
>>>>>       H       13.88694      4.78876      4.08791
>>>>>       H        4.53915      0.70283      4.22717
>>>>>       H        0.88557      0.65625      5.03328
>>>>>       H        8.96418      0.89159     10.50060
>>>>>       H        8.67994      8.85961      1.01083
>>>>>       H       16.35704      8.00331     10.63471
>>>>>       H       13.12606      1.45212      2.16563
>>>>>       H        3.64702      3.63930     16.44281
>>>>>       H       13.76743      4.88477     16.44833
>>>>>       H        6.85355      7.37827     15.30535
>>>>>       H       10.55820      5.40745     14.43410
>>>>>       H       12.97886      4.14375     12.04672
>>>>>       H       11.29905      7.38966     13.09313
>>>>>       H        2.29216      2.60091     13.23073
>>>>>       H       -0.01303     -0.23279     14.03603
>>>>>       H        7.34113      6.99275      1.49776
>>>>>       H       11.26049      0.78023     10.01184
>>>>>       H       17.50743      8.37258      1.63130
>>>>>       H        8.21398      8.86531     11.16822
>>>>>       H       11.54834      7.47018     10.56097
>>>>>       H        4.28503      0.31205     10.56295
>>>>>       H        6.62643      7.27289     11.69479
>>>>>       H        5.89748      3.14154     12.57118
>>>>>       H        5.36986      0.44461     14.95599
>>>>>       H        3.88656      3.78035     13.92095
>>>>>       H       13.21826      7.85764     13.78163
>>>>>       H       16.85773      7.91771     12.97237
>>>>>       H        8.78884      7.70469      7.49554
>>>>>       H        9.07452     -0.28399     16.99402
>>>>>       H        1.39009      0.59398      7.37083
>>>>>       H        4.63062      7.11938     15.84758
>>>>>     &END COORD
>>>>>     &KIND Zn
>>>>>       BASIS_SET TZVP-MOLOPT-PBE-GTH-q12
>>>>>       POTENTIAL GTH-PBE-q12
>>>>>     &END KIND
>>>>>     &KIND S
>>>>>       BASIS_SET TZVP-MOLOPT-PBE-GTH-q6
>>>>>       POTENTIAL GTH-PBE-q6
>>>>>     &END KIND
>>>>>     &KIND O
>>>>>       BASIS_SET TZVP-MOLOPT-PBE-GTH-q6
>>>>>       POTENTIAL GTH-PBE-q6
>>>>>     &END KIND
>>>>>     &KIND N
>>>>>       BASIS_SET TZVP-MOLOPT-PBE-GTH-q5
>>>>>       POTENTIAL GTH-PBE-q5
>>>>>     &END KIND
>>>>>     &KIND C
>>>>>       BASIS_SET TZVP-MOLOPT-PBE-GTH-q4
>>>>>       POTENTIAL GTH-PBE-q4
>>>>>     &END KIND
>>>>>     &KIND H
>>>>>       BASIS_SET TZVP-MOLOPT-PBE-GTH-q1
>>>>>       POTENTIAL GTH-PBE-q1
>>>>>     &END KIND
>>>>>   &END SUBSYS
>>>>> &END FORCE_EVAL
>>>>>
>>>>> &MOTION
>>>>>   &MD
>>>>>     ENSEMBLE NPT_I
>>>>>     TEMPERATURE 298
>>>>>     TIMESTEP 1.0
>>>>>     STEPS 50000
>>>>>     &THERMOSTAT
>>>>>       TYPE NOSE
>>>>>       &NOSE
>>>>>         LENGTH 3
>>>>>         YOSHIDA 3
>>>>>         TIMECON 1000
>>>>>       &END NOSE
>>>>>     &END THERMOSTAT
>>>>>     &BAROSTAT
>>>>>       PRESSURE 1.0
>>>>>       TIMECON 4000
>>>>>     &END BAROSTAT
>>>>>   &END MD
>>>>>   &FREE_ENERGY
>>>>>     METHOD METADYN
>>>>>     &METADYN
>>>>>       USE_PLUMED .TRUE.
>>>>>       PLUMED_INPUT_FILE plumed.dat
>>>>>     &END METADYN
>>>>>   &END FREE_ENERGY
>>>>>   &PRINT
>>>>>     &TRAJECTORY
>>>>>       &EACH
>>>>>         MD 5
>>>>>       &END EACH
>>>>>     &END TRAJECTORY
>>>>>     &FORCES
>>>>>       UNIT eV*angstrom^-1
>>>>>       &EACH
>>>>>         MD 5
>>>>>       &END EACH
>>>>>     &END FORCES
>>>>>     &CELL
>>>>>       &EACH
>>>>>         MD 5
>>>>>       &END EACH
>>>>>     &END CELL
>>>>>   &END PRINT
>>>>> &END MOTION
>>>>> ```
>>>>>
>>>>> This simulation was performed with a previous version of CP2K (so 
>>>>> without your fix). 
>>>>> bartosz mazur wrote on Friday, 25 October 2024 at 09:50:47 UTC+2:
>>>>>
>>>>>> Hi Frederick, 
>>>>>>
>>>>>> It helped with most of the tests! Now only 13 fail. The attachments 
>>>>>> contain the full output from the regtests, and here is the output from 
>>>>>> a single job with TRACE enabled:
>>>>>>
>>>>>> ```
>>>>>> Loading intel/2024a
>>>>>>   Loading requirement: GCCcore/13.3.0 zlib/1.3.1-GCCcore-13.3.0
>>>>>>     binutils/2.42-GCCcore-13.3.0 intel-compilers/2024.2.0
>>>>>>     numactl/2.0.18-GCCcore-13.3.0 UCX/1.16.0-GCCcore-13.3.0
>>>>>>     impi/2021.13.0-intel-compilers-2024.2.0 imkl/2024.2.0 iimpi/2024a
>>>>>>     imkl-FFTW/2024.2.0-iimpi-2024a
>>>>>>
>>>>>> Currently Loaded Modulefiles:
>>>>>>  1) GCCcore/13.3.0                  7) impi/2021.13.0-intel-compilers-2024.2.0
>>>>>>  2) zlib/1.3.1-GCCcore-13.3.0       8) imkl/2024.2.0
>>>>>>  3) binutils/2.42-GCCcore-13.3.0    9) iimpi/2024a
>>>>>>  4) intel-compilers/2024.2.0       10) imkl-FFTW/2024.2.0-iimpi-2024a
>>>>>>  5) numactl/2.0.18-GCCcore-13.3.0  11) intel/2024a
>>>>>>  6) UCX/1.16.0-GCCcore-13.3.0
>>>>>> 2 MPI processes with 2 OpenMP threads each
>>>>>> started at Fri Oct 25 09:34:34 CEST 2024 in /lustre/tmp/slurm/3127182
>>>>>> SIRIUS 7.6.1, git hash: https://api.github.com/repos/electronic-structure/SIRIUS/git/ref/tags/v7.6.1
>>>>>> Warning! Compiled in 'debug' mode with assert statements enabled!
>>>>>>
>>>>>>
>>>>>> LIBXSMM_VERSION: develop-1.17-3834 (25693946)
>>>>>> CLX/DP      TRY    JIT    STA    COL
>>>>>>    0..13      8      8      0      0 
>>>>>>   14..23      0      0      0      0 
>>>>>>   24..64      0      0      0      0 
>>>>>> Registry and code: 13 MB + 64 KB (gemm=8)
>>>>>> Command (PID=423503): 
>>>>>> /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.psmp -i 
>>>>>> dftd3src1.inp -o dftd3src1.out
>>>>>> Uptime: 2.752513 s
>>>>>>
>>>>>>
>>>>>>
>>>>>> ===================================================================================
>>>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>>> =   RANK 0 PID 423503 RUNNING AT r21c01b03
>>>>>>
>>>>>> =   KILLED BY SIGNAL: 11 (Segmentation fault)
>>>>>>
>>>>>> ===================================================================================
>>>>>>
>>>>>>
>>>>>> ===================================================================================
>>>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>>> =   RANK 1 PID 423504 RUNNING AT r21c01b03
>>>>>>
>>>>>> =   KILLED BY SIGNAL: 9 (Killed)
>>>>>>
>>>>>> ===================================================================================
>>>>>> finished at Fri Oct 25 09:34:39 CEST 2024
>>>>>> ```
>>>>>>
>>>>>> and the last lines:
>>>>>>
>>>>>> ```
>>>>>>  000000:000002<<                                  13      3 mp_sendrecv_dm2  0.000 Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                                  13      4 mp_sendrecv_dm2  start Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                                  13      4 mp_sendrecv_dm2  0.000 Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                               12      2 pw_nn_compose_r  0.003 Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                            11      1 xc_pw_derive  0.003 Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                            11      5 pw_zero  start Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                            11      5 pw_zero  0.000 Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                            11      2 xc_pw_derive  start Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                               12      3 pw_nn_compose_r  start Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                                  13      5 mp_sendrecv_dm2  start Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                                  13      5 mp_sendrecv_dm2  0.000 Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                                  13      6 mp_sendrecv_dm2  start Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                                  13      6 mp_sendrecv_dm2  0.000 Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                               12      3 pw_nn_compose_r  0.002 Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                            11      2 xc_pw_derive  0.002 Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                            11      6 pw_zero  start Hostmem: 955 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                            11      6 pw_zero  0.001 Hostmem: 960 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                            11      3 xc_pw_derive  start Hostmem: 960 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                               12      4 pw_nn_compose_r  start Hostmem: 960 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                                  13      7 mp_sendrecv_dm2  start Hostmem: 960 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                                  13      7 mp_sendrecv_dm2  0.000 Hostmem: 960 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                                  13      8 mp_sendrecv_dm2  start Hostmem: 960 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                                  13      8 mp_sendrecv_dm2  0.000 Hostmem: 960 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                               12      4 pw_nn_compose_r  0.002 Hostmem: 960 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                            11      3 xc_pw_derive  0.002 Hostmem: 960 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                            11      1 pw_spline_scale_deriv  start Hostmem: 960 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                            11      1 pw_spline_scale_deriv  0.001 Hostmem: 960 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                            11     20 pw_pool_give_back_pw  start Hostmem: 965 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                            11     20 pw_pool_give_back_pw  0.000 Hostmem: 965 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                            11     21 pw_pool_give_back_pw  start Hostmem: 965 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                            11     21 pw_pool_give_back_pw  0.000 Hostmem: 965 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                            11     22 pw_pool_give_back_pw  start Hostmem: 965 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                            11     22 pw_pool_give_back_pw  0.000 Hostmem: 965 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                            11     23 pw_pool_give_back_pw  start Hostmem: 965 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                            11     23 pw_pool_give_back_pw  0.000 Hostmem: 965 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                            11      1 xc_functional_eval  start Hostmem: 965 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                               12      1 b97_lda_eval  start Hostmem: 965 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                               12      1 b97_lda_eval  0.103 Hostmem: 979 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                            11      1 xc_functional_eval  0.103 Hostmem: 979 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                         10      1 xc_rho_set_and_dset_create  0.120 Hostmem: 979 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                         10      1 check_for_derivatives  start Hostmem: 979 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                         10      1 check_for_derivatives  0.000 Hostmem: 979 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                         10     14 pw_create_r3d  start Hostmem: 979 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                         10     14 pw_create_r3d  0.000 Hostmem: 979 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                         10     15 pw_create_r3d  start Hostmem: 979 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                         10     15 pw_create_r3d  0.000 Hostmem: 979 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                         10     16 pw_create_r3d  start Hostmem: 979 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                         10     16 pw_create_r3d  0.000 Hostmem: 979 MB GPUmem: 0 MB
>>>>>>  000000:000002>>                         10     17 pw_create_r3d  start Hostmem: 979 MB GPUmem: 0 MB
>>>>>>  000000:000002<<                         10     17 pw_create_r3d  0.000 Hostmem: 979 MB GPUmem: 0 MB
>>>>>> ```
>>>>>>
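
In a trace tail like this, the routines that were entered (`>>`) but never left (`<<`) indicate where the process stopped. A minimal sketch of that bookkeeping, assuming the raw TRACE output with one record per line (not the wrapped form that the mail client produces); the sample records are taken from the H2O-9 trace quoted further down, and the function name is only illustrative:

```python
import re

# One trace record: numeric prefix, ">>" (enter) or "<<" (exit),
# stack depth, call count, routine name.
TRACE_RE = re.compile(r"^\s*\d+:\d+(>>|<<)\s+(\d+)\s+(\d+)\s+(\S+)")


def unclosed_calls(trace_text):
    """Return (depth, routine) for every call entered but never exited."""
    stack = []
    for line in trace_text.splitlines():
        match = TRACE_RE.match(line)
        if match is None:
            continue  # not a trace record
        marker, depth, _count, routine = match.groups()
        if marker == ">>":
            stack.append((int(depth), routine))
        elif stack and stack[-1] == (int(depth), routine):
            stack.pop()  # matching exit for the innermost open call
        # exits whose entry lies before the excerpt are ignored
    return stack


sample = """\
 000000:000002>>                            11      4 pw_poisson_set  start Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                               12     28 pw_copy  start Hostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                               12     28 pw_copy  0.001 Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                               12      7 pw_derive  start Hostmem: 697 MB GPUmem: 0 MB
"""

if __name__ == "__main__":
    for depth, routine in unclosed_calls(sample):
        print(depth, routine)
```

For the sample, the innermost open call is `pw_derive`, which matches where the type-conversion problem was found.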
>>>>>> Best
>>>>>> Bartosz
>>>>>>
>>>>>> Frederick Stein wrote on Wednesday, 23 October 2024 at 09:15:33 UTC+2:
>>>>>>
>>>>>>> Dear Bartosz,
>>>>>>> My fix is merged. Can you switch to the CP2K master and try it 
>>>>>>> again? We are still working on a few issues with the Intel compilers so 
>>>>>>> that we may eventually migrate from ifort to ifx.
>>>>>>> Best,
>>>>>>> Frederick
>>>>>>>
>>>>>>> bartosz mazur wrote on Tuesday, 22 October 2024 at 17:45:21 UTC+2:
>>>>>>>
>>>>>>>> Great! Thank you for your help. 
>>>>>>>>
>>>>>>>> Best
>>>>>>>> Bartosz
>>>>>>>>
>>>>>>>> Frederick Stein wrote on Tuesday, 22 October 2024 at 15:24:04 UTC+2:
>>>>>>>>
>>>>>>>>> I have a fix for it. In contrast to my first thought, it is a case 
>>>>>>>>> of invalid type conversion from real to complex numbers (yes, Fortran is 
>>>>>>>>> rather strict about it) in pw_derive. This may also be present in a few 
>>>>>>>>> other spots. I am currently running more tests and I will open a pull 
>>>>>>>>> request within the next few days.
>>>>>>>>> Best,
>>>>>>>>> Frederick
>>>>>>>>>
>>>>>>>>> Frederick Stein wrote on Tuesday, 22 October 2024 at 13:12:49 UTC+2:
>>>>>>>>>
>>>>>>>>>> I can reproduce the error locally. I am investigating it now.
>>>>>>>>>>
>>>>>>>>>> bartosz mazur wrote on Tuesday, 22 October 2024 at 11:58:57 UTC+2:
>>>>>>>>>>
>>>>>>>>>>> I was loading it as it was needed for compilation. I have 
>>>>>>>>>>> unloaded the module, but the error still occurs: 
>>>>>>>>>>>
>>>>>>>>>>> ```
>>>>>>>>>>> LIBXSMM_VERSION: develop-1.17-3834 (25693946)
>>>>>>>>>>> CLX/DP      TRY    JIT    STA    COL
>>>>>>>>>>>    0..13      2      2      0      0 
>>>>>>>>>>>   14..23      0      0      0      0 
>>>>>>>>>>>   24..64      0      0      0      0 
>>>>>>>>>>> Registry and code: 13 MB + 16 KB (gemm=2)
>>>>>>>>>>> Command (PID=15485): 
>>>>>>>>>>> /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.psmp -i 
>>>>>>>>>>> H2O-9.inp -o H2O-9.out
>>>>>>>>>>> Uptime: 1.757102 s
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ===================================================================================
>>>>>>>>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>>>>>>>> =   RANK 0 PID 15485 RUNNING AT r30c01b01
>>>>>>>>>>>
>>>>>>>>>>> =   KILLED BY SIGNAL: 11 (Segmentation fault)
>>>>>>>>>>>
>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ===================================================================================
>>>>>>>>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>>>>>>>> =   RANK 1 PID 15486 RUNNING AT r30c01b01
>>>>>>>>>>>
>>>>>>>>>>> =   KILLED BY SIGNAL: 9 (Killed)
>>>>>>>>>>>
>>>>>>>>>>> ===================================================================================
>>>>>>>>>>> ```
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> and the last 100 lines:
>>>>>>>>>>>
>>>>>>>>>>> ```
>>>>>>>>>>>  000000:000002>>                            11     37 pw_create_c1d  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                            11     37 pw_create_c1d  0.000 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                         10     64 pw_pool_create_pw  0.000 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                         10     25 pw_copy  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                         10     25 pw_copy  0.001 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                         10     17 pw_axpy  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                         10     17 pw_axpy  0.001 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                         10     19 mp_sum_d  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                         10     19 mp_sum_d  0.000 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                         10      3 pw_poisson_solve  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                            11      3 pw_poisson_rebuild  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                            11      3 pw_poisson_rebuild  0.000 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                            11     65 pw_pool_create_pw  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                               12     38 pw_create_c1d  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                               12     38 pw_create_c1d  0.000 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                            11     65 pw_pool_create_pw  0.000 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                            11     26 pw_copy  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                            11     26 pw_copy  0.001 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                            11      3 pw_multiply_with  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                            11      3 pw_multiply_with  0.001 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                            11     27 pw_copy  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                            11     27 pw_copy  0.001 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                            11      3 pw_integral_ab  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                               12     20 mp_sum_d  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                               12     20 mp_sum_d  0.001 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                            11      3 pw_integral_ab  0.004 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                            11      4 pw_poisson_set  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                               12     66 pw_pool_create_pw  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                                  13     39 pw_create_c1d  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                                  13     39 pw_create_c1d  0.000 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                               12     66 pw_pool_create_pw  0.000 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                               12     28 pw_copy  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                               12     28 pw_copy  0.001 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                               12      7 pw_derive  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                               12      7 
>>>>>>>>>>> pw_derive       0.002 H
>>>>>>>>>>>  ostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                               12     67 
>>>>>>>>>>> pw_pool_create_pw      
>>>>>>>>>>>   start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                                  13     40 
>>>>>>>>>>> pw_create_c1d       
>>>>>>>>>>>  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                                  13     40 
>>>>>>>>>>> pw_create_c1d       
>>>>>>>>>>>  0.000 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                               12     67 
>>>>>>>>>>> pw_pool_create_pw      
>>>>>>>>>>>   0.000 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                               12     29 pw_copy 
>>>>>>>>>>>       start Hos
>>>>>>>>>>>  tmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                               12     29 pw_copy 
>>>>>>>>>>>       0.001 Hos
>>>>>>>>>>>  tmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                               12      8 
>>>>>>>>>>> pw_derive       start H
>>>>>>>>>>>  ostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                               12      8 
>>>>>>>>>>> pw_derive       0.002 H
>>>>>>>>>>>  ostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                               12     68 
>>>>>>>>>>> pw_pool_create_pw      
>>>>>>>>>>>   start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                                  13     41 
>>>>>>>>>>> pw_create_c1d       
>>>>>>>>>>>  start Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                                  13     41 
>>>>>>>>>>> pw_create_c1d       
>>>>>>>>>>>  0.000 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                               12     68 
>>>>>>>>>>> pw_pool_create_pw      
>>>>>>>>>>>   0.000 Hostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                               12     30 pw_copy 
>>>>>>>>>>>       start Hos
>>>>>>>>>>>  tmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002<<                               12     30 pw_copy 
>>>>>>>>>>>       0.001 Hos
>>>>>>>>>>>  tmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  000000:000002>>                               12      9 
>>>>>>>>>>> pw_derive       start H
>>>>>>>>>>>  ostmem: 697 MB GPUmem: 0 MB
>>>>>>>>>>>  ```
>>>>>>>>>>>
>>>>>>>>>>> This is the list of currently loaded modules (all come with 
>>>>>>>>>>> intel):
>>>>>>>>>>>
>>>>>>>>>>> ```
>>>>>>>>>>> Currently Loaded Modulefiles:
>>>>>>>>>>>  1) GCCcore/13.3.0                  7) impi/2021.13.0-intel-compilers-2024.2.0
>>>>>>>>>>>  2) zlib/1.3.1-GCCcore-13.3.0       8) imkl/2024.2.0
>>>>>>>>>>>  3) binutils/2.42-GCCcore-13.3.0    9) iimpi/2024a
>>>>>>>>>>>  4) intel-compilers/2024.2.0       10) imkl-FFTW/2024.2.0-iimpi-2024a
>>>>>>>>>>>  5) numactl/2.0.18-GCCcore-13.3.0  11) intel/2024a
>>>>>>>>>>>  6) UCX/1.16.0-GCCcore-13.3.0
>>>>>>>>>>> ```
>>>>>>>>>>> On Tuesday, 22 October 2024 at 11:12:57 UTC+2, Frederick Stein wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dear Bartosz,
>>>>>>>>>>>> I am currently running some tests with the latest Intel 
>>>>>>>>>>>> compiler myself. What bothers me about your setup is the module 
>>>>>>>>>>>> GCC13/13.3.0. Why is it loaded? Can you unload it? This would at least 
>>>>>>>>>>>> reduce potential interference between the Intel and the GCC compilers.
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Frederick
>>>>>>>>>>>>
>>>>>>>>>>>> On Monday, 21 October 2024 at 16:33:45 UTC+2, bartosz mazur wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The error for ssmp is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> LIBXSMM_VERSION: develop-1.17-3834 (25693946)
>>>>>>>>>>>>> CLX/DP      TRY    JIT    STA    COL
>>>>>>>>>>>>>    0..13      4      4      0      0 
>>>>>>>>>>>>>   14..23      0      0      0      0 
>>>>>>>>>>>>>   24..64      0      0      0      0 
>>>>>>>>>>>>> Registry and code: 13 MB + 32 KB (gemm=4)
>>>>>>>>>>>>> Command (PID=54845): /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.ssmp -i H2O-9.inp -o H2O-9.out
>>>>>>>>>>>>> Uptime: 2.861583 s
>>>>>>>>>>>>> /var/spool/slurmd/r30c01b15/job3120330/slurm_script: line 36: 54845 Segmentation fault      (core dumped) /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.ssmp -i H2O-9.inp -o H2O-9.out
>>>>>>>>>>>>> ```
>>>>>>>>>>>>>
>>>>>>>>>>>>> and the last 100 lines of output:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ```
>>>>>>>>>>>>>  000000:000001>>                               12     20 mp_sum_d       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                               12     20 mp_sum_d       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                            11     13 dbcsr_dot_sd       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                         10     12 calculate_ptrace_kp       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                       9      6 evaluate_core_matrix_traces       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                       9      6 rebuild_ks_matrix       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                         10      6 qs_ks_build_kohn_sham_matrix       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                            11    140 pw_pool_create_pw       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                               12     79 pw_create_c1d       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                               12     79 pw_create_c1d       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                            11    140 pw_pool_create_pw       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                            11    141 pw_pool_create_pw       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                               12     80 pw_create_c1d       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                               12     80 pw_create_c1d       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                            11    141 pw_pool_create_pw       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                            11     61 pw_copy       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                            11     61 pw_copy       0.004 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                            11     35 pw_axpy       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                            11     35 pw_axpy       0.002 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                            11      6 pw_poisson_solve       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                               12      6 pw_poisson_rebuild       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                               12      6 pw_poisson_rebuild       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                               12    142 pw_pool_create_pw       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                                  13     81 pw_create_c1d       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                                  13     81 pw_create_c1d       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                               12    142 pw_pool_create_pw       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                               12     62 pw_copy       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                               12     62 pw_copy       0.003 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                               12      6 pw_multiply_with       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                               12      6 pw_multiply_with       0.002 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                               12     63 pw_copy       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                               12     63 pw_copy       0.003 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                               12      6 pw_integral_ab       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                               12      6 pw_integral_ab       0.005 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                               12      7 pw_poisson_set       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                                  13    143 pw_pool_create_pw       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                                     14     82 pw_create_c1d       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                                     14     82 pw_create_c1d       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                                  13    143 pw_pool_create_pw       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                                  13     64 pw_copy       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                                  13     64 pw_copy       0.003 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                                  13     16 pw_derive       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                                  13     16 pw_derive       0.006 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                                  13    144 pw_pool_create_pw       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                                     14     83 pw_create_c1d       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                                     14     83 pw_create_c1d       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                                  13    144 pw_pool_create_pw       0.000 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                                  13     65 pw_copy       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001<<                                  13     65 pw_copy       0.004 Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000001>>                                  13     17 pw_derive       start Hostmem: 380 MB GPUmem: 0 MB
>>>>>>>>>>>>> ```
>>>>>>>>>>>>>
>>>>>>>>>>>>> For psmp, the last 100 lines are:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ```
>>>>>>>>>>>>>  000000:000002<<                       9      7 evaluate_core_matrix_traces       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                       9      7 rebuild_ks_matrix       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                         10      7 qs_ks_build_kohn_sham_matrix       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                            11    164 pw_pool_create_pw       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                               12     93 pw_create_c1d       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                               12     93 pw_create_c1d       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                            11    164 pw_pool_create_pw       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                            11    165 pw_pool_create_pw       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                               12     94 pw_create_c1d       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                               12     94 pw_create_c1d       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                            11    165 pw_pool_create_pw       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                            11     73 pw_copy       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                            11     73 pw_copy       0.001 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                            11     41 pw_axpy       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                            11     41 pw_axpy       0.001 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                            11     52 mp_sum_d       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                            11     52 mp_sum_d       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                            11      7 pw_poisson_solve       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                               12      7 pw_poisson_rebuild       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                               12      7 pw_poisson_rebuild       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                               12    166 pw_pool_create_pw       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                                  13     95 pw_create_c1d       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                                  13     95 pw_create_c1d       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                               12    166 pw_pool_create_pw       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                               12     74 pw_copy       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                               12     74 pw_copy       0.001 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                               12      7 pw_multiply_with       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                               12      7 pw_multiply_with       0.001 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                               12     75 pw_copy       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                               12     75 pw_copy       0.001 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                               12      7 pw_integral_ab       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                                  13     53 mp_sum_d       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                                  13     53 mp_sum_d       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                               12      7 pw_integral_ab       0.003 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                               12      8 pw_poisson_set       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                                  13    167 pw_pool_create_pw       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                                     14     96 pw_create_c1d       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                                     14     96 pw_create_c1d       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                                  13    167 pw_pool_create_pw       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                                  13     76 pw_copy       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                                  13     76 pw_copy       0.001 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                                  13     19 pw_derive       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                                  13     19 pw_derive       0.002 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                                  13    168 pw_pool_create_pw       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                                     14     97 pw_create_c1d       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                                     14     97 pw_create_c1d       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                                  13    168 pw_pool_create_pw       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                                  13     77 pw_copy       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002<<                                  13     77 pw_copy       0.001 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>  000000:000002>>                                  13     20 pw_derive       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>> ```
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Bartosz
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Monday, 21 October 2024 at 08:58:34 UTC+2, Frederick Stein wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dear Bartosz,
>>>>>>>>>>>>>> I have no idea about the issue with LibXSMM.
>>>>>>>>>>>>>> Regarding the trace, I do not know either, as there is not 
>>>>>>>>>>>>>> much that could break in pw_derive (it just performs multiplications) and 
>>>>>>>>>>>>>> the sequence of operations is too unspecific. It may be that the code 
>>>>>>>>>>>>>> actually breaks somewhere else. Can you do the same with the ssmp version 
>>>>>>>>>>>>>> and post the last 100 lines? This way, we remove the asynchronicity issues 
>>>>>>>>>>>>>> that affect backtraces with the psmp version.
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Frederick
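When reading such a trace tail, it can help to list the routines that were entered (`>>`) but not left (`<<`) before the output stops. The following is a minimal Python sketch under two assumptions: each trace entry has been joined back onto a single line, and the two integers before the routine name are the nesting depth and a call counter (an interpretation of the posted output, not taken from the CP2K documentation). The names `TRACE_RE` and `open_routines` are hypothetical.

```python
import re

# rank:size, direction marker, depth, call counter, routine name
TRACE_RE = re.compile(r"^\s*\d+:\d+(>>|<<)\s+(\d+)\s+(\d+)\s+(\S+)")


def open_routines(trace_lines):
    """Return (depth, routine) pairs entered but not exited in the excerpt."""
    stack = []
    for line in trace_lines:
        match = TRACE_RE.match(line)
        if not match:
            continue  # not a trace entry
        direction, depth, _count, name = match.groups()
        entry = (int(depth), name)
        if direction == ">>":
            stack.append(entry)
        elif stack and stack[-1] == entry:
            stack.pop()  # matching exit for the most recent entry
        # exits whose entry lies before the excerpt are ignored
    return stack


# Entries taken from the end of the ssmp trace, one per line.
sample = [
    " 000000:000001>>  12      7 pw_poisson_set       start Hostmem: 380 MB GPUmem: 0 MB",
    " 000000:000001>>  13    144 pw_pool_create_pw    start Hostmem: 380 MB GPUmem: 0 MB",
    " 000000:000001>>  14     83 pw_create_c1d        start Hostmem: 380 MB GPUmem: 0 MB",
    " 000000:000001<<  14     83 pw_create_c1d        0.000 Hostmem: 380 MB GPUmem: 0 MB",
    " 000000:000001<<  13    144 pw_pool_create_pw    0.000 Hostmem: 380 MB GPUmem: 0 MB",
    " 000000:000001>>  13     17 pw_derive            start Hostmem: 380 MB GPUmem: 0 MB",
]
for depth, name in open_routines(sample):
    print(depth, name)
```

Because only the tail of the output is available, routines entered before the excerpt begins do not appear in the result.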
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sunday, 20 October 2024 at 16:47:15 UTC+2, bartosz mazur wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The error is:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> LIBXSMM_VERSION: develop-1.17-3834 (25693946)
>>>>>>>>>>>>>>> CLX/DP      TRY    JIT    STA    COL
>>>>>>>>>>>>>>>    0..13      2      2      0      0
>>>>>>>>>>>>>>>   14..23      0      0      0      0
>>>>>>>>>>>>>>>   24..64      0      0      0      0
>>>>>>>>>>>>>>> Registry and code: 13 MB + 16 KB (gemm=2)
>>>>>>>>>>>>>>> Command (PID=2607388): /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.psmp -i H2O-9.inp -o H2O-9.out
>>>>>>>>>>>>>>> Uptime: 5.288243 s
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>>>>>>>>>>>> =   RANK 0 PID 2607388 RUNNING AT r21c01b10
>>>>>>>>>>>>>>> =   KILLED BY SIGNAL: 11 (Segmentation fault)
>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>>>>>>>>>>>> =   RANK 1 PID 2607389 RUNNING AT r21c01b10
>>>>>>>>>>>>>>> =   KILLED BY SIGNAL: 9 (Killed)
>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> and the last 20 lines:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>  000000:000002<<                                  13     76 pw_copy       0.001 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>>>  000000:000002>>                                  13     19 pw_derive       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>>>  000000:000002<<                                  13     19 pw_derive       0.002 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>>>  000000:000002>>                                  13    168 pw_pool_create_pw       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>>>  000000:000002>>                                     14     97 pw_create_c1d       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>>>  000000:000002<<                                     14     97 pw_create_c1d       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>>>  000000:000002<<                                  13    168 pw_pool_create_pw       0.000 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>>>  000000:000002>>                                  13     77 pw_copy       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>>>  000000:000002<<                                  13     77 pw_copy       0.001 Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>>>  000000:000002>>                                  13     20 pw_derive       start Hostmem: 693 MB GPUmem: 0 MB
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>> On Friday, 18 October 2024 at 17:18:39 UTC+2, Frederick Stein wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please pick one of the failing tests, add the TRACE keyword to 
>>>>>>>>>>>>>>>> the &GLOBAL section, and run the test manually. This increases the size of 
>>>>>>>>>>>>>>>> the output file dramatically (to several million lines). 
>>>>>>>>>>>>>>>> Can you send me the last ~20 lines of the output?
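The step described above (adding `TRACE` to the `&GLOBAL` section before rerunning a failing test) can be scripted. The following is a minimal Python sketch, not part of CP2K or its test tooling: the helper name `enable_trace` and the sample input are hypothetical, and it assumes the usual input layout with `&GLOBAL` and `&END GLOBAL` on their own lines.

```python
def enable_trace(input_text: str) -> str:
    """Return the input with a lone TRACE keyword added to &GLOBAL.

    Hypothetical helper for illustration. If &GLOBAL already contains a
    TRACE line (with any value), the text is returned unchanged.
    """
    lines = input_text.splitlines()
    start = None  # index of the "&GLOBAL" line
    depth = 0     # section nesting depth, counted from &GLOBAL
    for i, line in enumerate(lines):
        tokens = line.split()
        if not tokens:
            continue
        key = tokens[0].upper()
        if start is None:
            if key == "&GLOBAL":
                start, depth = i, 1
            continue
        if key == "&END":
            depth -= 1
            if depth == 0:
                break  # left &GLOBAL without seeing TRACE
        elif key.startswith("&"):
            depth += 1  # subsection inside &GLOBAL
        elif key == "TRACE" and depth == 1:
            return input_text  # TRACE already present
    if start is None:
        raise ValueError("no &GLOBAL section found")
    indent = " " * (len(lines[start]) - len(lines[start].lstrip()) + 2)
    lines.insert(start + 1, indent + "TRACE")
    return "\n".join(lines) + "\n"


# Made-up minimal input, only to show the effect.
sample = "&GLOBAL\n  PROJECT H2O-9\n  RUN_TYPE ENERGY\n&END GLOBAL\n"
print(enable_trace(sample))
```

Only the exact keyword `TRACE` is matched, so related keywords that merely start with the same letters are left alone.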
>>>>>>>>>>>>>>>> On Friday, 18 October 2024 at 17:09:40 UTC+2, bartosz mazur wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm using the do_regtests.py script, not make regtesting, but 
>>>>>>>>>>>>>>>>> I assume it makes no difference. As I mentioned in my previous message, 
>>>>>>>>>>>>>>>>> with `--ompthreads 1` all tests passed for both ssmp and psmp. For ssmp 
>>>>>>>>>>>>>>>>> with `--ompthreads 2` I observe errors similar to those for psmp with the 
>>>>>>>>>>>>>>>>> same setting; I attach an example output. 
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Bartosz
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Friday, 18 October 2024 at 16:24:16 UTC+2, Frederick Stein wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Dear Bartosz,
>>>>>>>>>>>>>>>>>> What happens if you set the number of OpenMP threads to 1 
>>>>>>>>>>>>>>>>>> (add '--ompthreads 1' to TESTOPTS)? What errors do you observe in case of 
>>>>>>>>>>>>>>>>>> the ssmp?
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Frederick
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Friday, 18 October 2024 at 15:37:43 UTC+2, bartosz mazur wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Frederick,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> thanks again for your help. I have tested different 
>>>>>>>>>>>>>>>>>>> simulation variants, and I know that the problem occurs when using OMP. For 
>>>>>>>>>>>>>>>>>>> MPI calculations without OMP, all tests pass. I have also tested the 
>>>>>>>>>>>>>>>>>>> `OMP_PROC_BIND` and `OMP_PLACES` parameters; apart from their effect on 
>>>>>>>>>>>>>>>>>>> simulation time, they have no significant effect on the presence of 
>>>>>>>>>>>>>>>>>>> errors. Below are the results for ssmp:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>> OMP_PROC_BIND, OMP_PLACES, correct, total, wrong, failed, time
>>>>>>>>>>>>>>>>>>> spread, threads, 3850, 4144, 4, 290, 186min
>>>>>>>>>>>>>>>>>>> spread, cores, 3831, 4144, 3, 310, 183min
>>>>>>>>>>>>>>>>>>> spread, sockets, 3864, 4144, 3, 277, 104min
>>>>>>>>>>>>>>>>>>> close, threads, 3879, 4144, 3, 262, 171min
>>>>>>>>>>>>>>>>>>> close, cores, 3854, 4144, 0, 290, 168min
>>>>>>>>>>>>>>>>>>> close, sockets, 3865, 4144, 3, 276, 104min
>>>>>>>>>>>>>>>>>>> master, threads, 4121, 4144, 0, 23, 1002min
>>>>>>>>>>>>>>>>>>> master, cores, 4121, 4144, 0, 23, 986min
>>>>>>>>>>>>>>>>>>> master, sockets, 3942, 4144, 3, 199, 219min
>>>>>>>>>>>>>>>>>>> false, threads, 3918, 4144, 0, 226, 178min
>>>>>>>>>>>>>>>>>>> false, cores, 3919, 4144, 3, 222, 176min
>>>>>>>>>>>>>>>>>>> false, sockets, 3856, 4144, 4, 284, 104min
>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> and psmp:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>> OMP_PROC_BIND, OMP_PLACES, results
>>>>>>>>>>>>>>>>>>> spread, threads, Summary: correct: 4097 / 4227; failed: 130; 495min
>>>>>>>>>>>>>>>>>>> spread, cores, 26 / 362
>>>>>>>>>>>>>>>>>>> spread, cores, 26 / 362
>>>>>>>>>>>>>>>>>>> close, threads, Summary: correct: 4133 / 4227; failed: 94; 484min
>>>>>>>>>>>>>>>>>>> close, cores, 60 / 362
>>>>>>>>>>>>>>>>>>> close, sockets, 13 / 362
>>>>>>>>>>>>>>>>>>> master, threads, 13 / 362
>>>>>>>>>>>>>>>>>>> master, cores, 79 / 362
>>>>>>>>>>>>>>>>>>> master, sockets, Summary: correct: 4153 / 4227; failed: 74; 563min
>>>>>>>>>>>>>>>>>>> false, threads, Summary: correct: 4153 / 4227; failed: 74; 556min
>>>>>>>>>>>>>>>>>>> false, cores, Summary: correct: 4106 / 4227; failed: 121; 511min
>>>>>>>>>>>>>>>>>>> false, sockets, 96 / 362
>>>>>>>>>>>>>>>>>>> not specified, not specified, Summary: correct: 4129 / 4227; failed: 98; 263min
>>>>>>>>>>>>>>>>>>> ```
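The ssmp results quoted above are easier to compare once the failure fraction is computed for each setting. The following is a minimal Python sketch; the numbers are copied from the ssmp table, while the names `SSMP_RESULTS` and `load_results` are hypothetical.

```python
import csv
import io

# Columns: OMP_PROC_BIND, OMP_PLACES, correct, total, wrong, failed, time
SSMP_RESULTS = """\
spread, threads, 3850, 4144, 4, 290, 186min
spread, cores, 3831, 4144, 3, 310, 183min
spread, sockets, 3864, 4144, 3, 277, 104min
close, threads, 3879, 4144, 3, 262, 171min
close, cores, 3854, 4144, 0, 290, 168min
close, sockets, 3865, 4144, 3, 276, 104min
master, threads, 4121, 4144, 0, 23, 1002min
master, cores, 4121, 4144, 0, 23, 986min
master, sockets, 3942, 4144, 3, 199, 219min
false, threads, 3918, 4144, 0, 226, 178min
false, cores, 3919, 4144, 3, 222, 176min
false, sockets, 3856, 4144, 4, 284, 104min
"""


def load_results(text):
    """Parse the comma-separated table into a list of dicts."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not rec:
            continue
        bind, places, correct, total, wrong, failed, minutes = rec
        rows.append({
            "bind": bind,
            "places": places,
            "correct": int(correct),
            "total": int(total),
            "wrong": int(wrong),
            "failed": int(failed),
            "minutes": int(minutes[:-3]),  # drop the "min" suffix
        })
    return rows


rows = load_results(SSMP_RESULTS)
for r in sorted(rows, key=lambda r: r["failed"]):
    frac = r["failed"] / r["total"]
    print(f'{r["bind"]:>6} {r["places"]:<8} '
          f'failed={r["failed"]:4d} ({frac:.1%}) {r["minutes"]:5d} min')
```

In this table, correct, wrong and failed add up to total in every row. The `master` rows with `threads` and `cores` have the fewest failures (23 each) but also by far the longest run times.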
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Do you have any ideas about what I could do next to get more 
>>>>>>>>>>>>>>>>>>> information about the source of the problem, or do you perhaps see a 
>>>>>>>>>>>>>>>>>>> potential solution at this stage? I would appreciate any further help. 
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>>>>> Bartosz
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Friday, 11 October 2024 at 14:30:25 UTC+2, Frederick Stein wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Dear Bartosz,
>>>>>>>>>>>>>>>>>>>> If I am not mistaken, you used 8 OpenMP threads. The 
>>>>>>>>>>>>>>>>>>>> tests do not run that efficiently with such a large number of threads; 2 
>>>>>>>>>>>>>>>>>>>> should be sufficient.
>>>>>>>>>>>>>>>>>>>> The test results suggest that most of the functionality 
>>>>>>>>>>>>>>>>>>>> may work, but without a backtrace (or similar information) it is 
>>>>>>>>>>>>>>>>>>>> hard to tell why the tests fail. You could also try to run some of the 
>>>>>>>>>>>>>>>>>>>> single-node tests to assess the stability of CP2K.
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Frederick
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Friday, 11 October 2024 at 13:48:42 UTC+2, bartosz mazur wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Sorry, forgot attachments.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
