[CP2K-user] [CP2K:17911] Re: Runtime error

Matt Watkins mattwatkinsuk at gmail.com
Thu Oct 20 17:03:35 UTC 2022


I can't comment directly, but multiple MPI tasks per GPU is something to set 
up gradually.
If you can run one MPI process on one GPU, then multiple MPI processes per 
GPU, etc., then explain where things break.
Matt
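[Editor's note] The incremental approach Matt describes could be scripted roughly as below. This is only a sketch: the binary name `cp2k.psmp`, the input file name, the output file names, and the `gpu_bind.sh` affinity wrapper are assumptions about the setup, not details from this thread (the `OMPI_COMM_WORLD_LOCAL_RANK` variable is Open MPI's per-node rank index, which matches the Open MPI/HPC-X stack visible in the backtrace).

```shell
#!/bin/bash
# Sketch of an incremental GPU/MPI scaling test for CP2K.
# Assumptions (not from the thread): binary is cp2k.psmp, input is
# H2O-DFT-LS.inp, launcher is Open MPI's mpirun.

# Step 1: one MPI rank on one GPU -- the simplest configuration.
mpirun -np 1 ./cp2k.psmp -i H2O-DFT-LS.inp -o run_1rank_1gpu.out

# Step 2: several MPI ranks sharing one GPU. Efficient sharing normally
# needs the CUDA MPS daemon; without it, ranks serialize on the device.
mpirun -np 4 ./cp2k.psmp -i H2O-DFT-LS.inp -o run_4ranks_1gpu.out

# Step 3: multiple GPUs, one rank per GPU, pinning each rank to a device
# via a small (hypothetical) wrapper script.
cat > gpu_bind.sh <<'EOF'
#!/bin/bash
export CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK
exec "$@"
EOF
chmod +x gpu_bind.sh
mpirun -np 4 ./gpu_bind.sh ./cp2k.psmp -i H2O-DFT-LS.inp -o run_4ranks_4gpus.out

# The first configuration that fails narrows the problem down to one layer:
# the CP2K build, the MPI/GPU binding, or the MPS/driver stack.
```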

On Wednesday, 19 October 2022 at 16:34:18 UTC+1 LC wrote:

> It seems to break at the Quickstep initialization; could it be a bad 
> installation? I did not see any errors during the build, but could I be 
> missing a required library?
>
>
> On Wednesday, 19 October 2022 at 14:02:38 UTC+1 LC wrote:
>
>> Hello,
>>
>> Running the H2O-DFT-LS benchmark on 4 GPUs (4 MPI tasks per GPU), I get:
>>
>>  CELL_REF| Volume [angstrom^3]:                               25825.145
>>  CELL_REF| Vector a [angstrom]:   29.558    0.000    0.000   |a| = 29.558
>>  CELL_REF| Vector b [angstrom]:    0.000   29.558    0.000   |b| = 29.558
>>  CELL_REF| Vector c [angstrom]:    0.000    0.000   29.558   |c| = 29.558
>>  CELL_REF| Angle (b,c), alpha [degree]:                          90.000
>>  CELL_REF| Angle (a,c), beta  [degree]:                          90.000
>>  CELL_REF| Angle (a,b), gamma [degree]:                          90.000
>>  CELL_REF| Numerically orthorhombic:                                YES
>> [node:117708:0:117708] Caught signal 11 (Segmentation fault: address not 
>> mapped to object at address 0x440000)
>>
>> ==== backtrace (tid: 117716) ====
>>  0 0x0000000000012b20 __funlockfile()  :0
>>  1 0x000000000006d680 PMPI_Comm_set_name()  ???:0
>>  2 0x000000000006d680 PMPI_Comm_size()  /build-result/src/hpcx-v2.12-gcc-MLNX_OFED_LINUX-5-redhat8-cuda11-gdrcopy2-nccl2.12-x86_64/ompi-1c67bf1c6a156f1ae693f86a38f9d859e99eeb1f/ompi/mpi/c/profile/pcomm_size.c:63
>>  3 0x0000000000029e29 MKLMPI_Comm_size()  ???:0
>>  4 0x0000000000027ff1 mkl_blacs_init()  ???:0
>>  5 0x0000000000027f38 Cblacs_pinfo()  ???:0
>>  6 0x000000000001881f blacs_gridmap_()  ???:0
>>  7 0x00000000000181fe blacs_gridinit_()  ???:0
>>  8 0x00000000024c422c __cp_blacs_env_MOD_cp_blacs_env_create()  ???:0
>>  9 0x0000000000bc22b5 __qs_environment_MOD_qs_init()  ???:0
>> 10 0x0000000000c6dded __f77_interface_MOD_create_force_env()  ???:0
>> 11 0x00000000004478e4 __cp2k_runs_MOD_cp2k_run()  cp2k_runs.F90:0
>> 12 0x000000000044a212 __cp2k_runs_MOD_run_input()  ???:0
>> 13 0x000000000043dac1 MAIN__()  cp2k.F90:0
>> 14 0x000000000040ec6d main()  ???:0
>> 15 0x0000000000023493 __libc_start_main()  ???:0
>> 16 0x000000000043ccde _start()  ???:0
>>
>> Any idea why?
>> L
>>
>
