linux-mm.kvack.org archive mirror
* AMD GPU driver load hitting BUG_ON in sync_global_pgds_l5()
@ 2025-04-22  6:34 Bharata B Rao
  2025-04-22  7:14 ` Balbir Singh
  2025-04-22 15:13 ` Dave Hansen
  0 siblings, 2 replies; 8+ messages in thread
From: Bharata B Rao @ 2025-04-22  6:34 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Dave Hansen, luto, peterz, tglx, mingo, bp, x86, hpa, nikunj,
	Balbir Singh, kees, alexander.deucher

Hi,

Nikunj and I have been debugging an issue during AMD GPU driver 
loading where we hit the below failure:

-----------------------------------------
kernel BUG at arch/x86/mm/init_64.c:173!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 4 PID: 1222 Comm: modprobe Tainted: G            E      6.8.12+ #3
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.16.3-0-ga6ed6b70-prebuilt.qemu.org 04/01/2014
RIP: 0010:sync_global_pgds+0x343/0x560
Code: fb 66 9e 01 49 89 c0 48 89 f8 0f 1f 00 48 23 05 4b 92 9f 01 48 25 
00 f0 ff ff 48 03 05 de 66 9e 01 4c 39 c0 0f 84 c8 fd ff ff <0f> 0b 49 
8b 75 00 4c 89 ff e8 af 62 ff ff 90 e9 d3 fd ff ff 48 8b
RSP: 0018:ff52bf8d40a7f4e8 EFLAGS: 00010206
RAX: ff29cef78ad1a000 RBX: fffff1458477e080 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000010ad1a067
RBP: ff52bf8d40a7f530 R08: ff29cef78a0d0000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ff29cef79bd8322c
R13: ffffffffafc3c000 R14: 0000314480400000 R15: ff29cef79df82000
FS:  00007e1c04bf8000(0000) GS:ff29cfe72ea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007e7e161f2a50 CR3: 0000000112c9a004 CR4: 0000000000771ef0
PKRU: 55555554
Call Trace:
  <TASK>
  ? show_regs+0x72/0x90
  ? die+0x38/0xb0
  ? do_trap+0xe3/0x100
  ? do_error_trap+0x75/0xb0
  ? sync_global_pgds+0x343/0x560
  ? exc_invalid_op+0x53/0x80
  ? sync_global_pgds+0x343/0x560
  ? asm_exc_invalid_op+0x1b/0x20
  ? sync_global_pgds+0x343/0x560
  ? sync_global_pgds+0x2d4/0x560
  vmemmap_populate+0x73/0xd0
  __populate_section_memmap+0x1fc/0x440
  sparse_add_section+0x155/0x390
  __add_pages+0xd1/0x190
  add_pages+0x17/0x70
  memremap_pages+0x471/0x6d0
  devm_memremap_pages+0x23/0x70
  kgd2kfd_init_zone_device+0x14a/0x270 [amdgpu]
  amdgpu_device_init+0x3042/0x3150 [amdgpu]
  ? do_pci_enable_device+0xcc/0x110
  amdgpu_driver_load_kms+0x1a/0x1c0 [amdgpu]
  amdgpu_pci_probe+0x1ba/0x610 [amdgpu]
  ? _raw_spin_unlock_irqrestore+0x11/0x60
  local_pci_probe+0x4b/0xb0
  pci_device_probe+0xc8/0x290
  really_probe+0x1d5/0x440
  __driver_probe_device+0x8a/0x190
  driver_probe_device+0x23/0xd0
  __driver_attach+0x10f/0x220
  ? __pfx___driver_attach+0x10/0x10
  bus_for_each_dev+0x7d/0xe0
  driver_attach+0x1e/0x30
  bus_add_driver+0x14e/0x290
  driver_register+0x64/0x140
  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
  __pci_register_driver+0x61/0x70
  amdgpu_init+0x69/0xff0 [amdgpu]
  do_one_initcall+0x49/0x330
  ? kmalloc_trace+0x136/0x380
  do_init_module+0x99/0x2b0
  load_module+0x241e/0x24e0
  init_module_from_file+0x9a/0x100
  ? init_module_from_file+0x9a/0x100
  idempotent_init_module+0x184/0x240
  __x64_sys_finit_module+0x64/0xd0
  x64_sys_call+0x1c4c/0x2660
  do_syscall_64+0x80/0x170
  ? ksys_mmap_pgoff+0x123/0x270
  ? do_syscall_64+0x8c/0x170
  ? syscall_exit_to_user_mode+0x83/0x260
  ? do_syscall_64+0x8c/0x170
  ? do_syscall_64+0x8c/0x170
  ? exc_page_fault+0x95/0x1b0
  entry_SYSCALL_64_after_hwframe+0x78/0x80
RIP: 0033:0x7e1c0431e88d
Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007fffa97770b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 00006198887830f0 RCX: 00007e1c0431e88d
RDX: 0000000000000000 RSI: 0000619887b43cd2 RDI: 000000000000000e
RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000002
R10: 000000000000000e R11: 0000000000000246 R12: 0000619887b43cd2
R13: 0000619888783220 R14: 0000619888782600 R15: 000061988878d190
  </TASK>
-----------------------------------------

A KVM guest (with 5-level page tables enabled) is started with 8 GPUs 
(so the AMD GPU driver gets loaded) and a CoralGemm workload (matrix 
multiplication stress) is run inside the guest. The guest is turned off 
after the workload completes.

This test (start guest, run workload, turn off guest) is repeated 
hundreds of times, and approximately once in 500 such runs the AMD GPU 
driver fails to load because it hits the above-mentioned problem.

As part of GPU driver load, the GPU memory gets hotplugged. When struct 
page mappings are created in vmemmap for the newly added pages, the 
newly created PGD entry is synced into the per-process page tables. 
However, the kernel finds that a different mapping already exists for 
that PGD entry in one of the processes and hence hits the above BUG_ON.

The debug print from __add_pages() shows the pfn that is getting added 
and the number of pages like this:
__add_pages pfn fffc010000 nr_pages 67043328 nid 0

Later, in sync_global_pgds_l5(), the start and end addresses come out 
like this:
start = 0x314480400000 end = 0x3144805fffff

These are supposed to be addresses of struct pages in the vmemmap, but 
such low values aren't valid kernel addresses. The start address was 
obtained from pfn_to_page(), which for the sparsemem-vmemmap case is 
defined like this:

#define __pfn_to_page(pfn)      (vmemmap + (pfn))

When the problem is hit, vmemmap was found to have the value 
0xfffff14580000000. Since vmemmap is a struct page pointer, the addition 
scales the pfn by sizeof(struct page). For the pfn value of 0xfffc010000,

start = 0xfffff14580000000 (vmemmap) + 0xfffc010000 (pfn) * 0x40 
(sizeof(struct page))

overflows (wraps around 2^64) and results in the start address of 
0x314480400000.
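
To make the wraparound concrete, here is a minimal userspace sketch 
(plain C, not kernel code; the constants are the values observed above) 
that redoes the pfn_to_page() arithmetic:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
	uint64_t vmemmap = 0xfffff14580000000ULL;  /* randomized vmemmap_base */
	uint64_t pfn = 0xfffc010000ULL;            /* first hotplugged GPU pfn */

	/* vmemmap + pfn scales by sizeof(struct page), i.e. 0x40 bytes */
	uint64_t start = vmemmap + pfn * 0x40;     /* wraps modulo 2^64 */

	/* prints start = 0x314480400000 */
	printf("start = 0x%" PRIx64 "\n", start);
	return 0;
}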

This points to the selection of vmemmap_base by KASLR in 
kernel_randomize_memory(). Once in a while, due to randomization, 
vmemmap_base gets such a high value that the struct page addresses for 
the hot-plugged pages overflow, resulting in an invalid address that 
causes problems later when the PGDs are synced.
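
For reference, here is a rough userspace model of how 
kernel_randomize_memory() spreads the three region bases across the 
5-level KASLR window (a simplification of arch/x86/mm/kaslr.c: the 
window bounds are __PAGE_OFFSET_BASE_L5 and CPU_ENTRY_AREA_BASE, the 
region sizes are the TB values seen on this setup, and rand() stands 
in for the kernel's seeded PRNG). It illustrates why vmemmap_base can 
land almost anywhere up to near the top of the window:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>

#define TB		(1ULL << 40)
#define PUD_ALIGN	(~((1ULL << 30) - 1))	/* offsets are PUD (1 GB) aligned */

int main(void)
{
	uint64_t vaddr = 0xff11000000000000ULL;           /* __PAGE_OFFSET_BASE_L5 */
	const uint64_t vaddr_end = 0xfffffe0000000000ULL; /* CPU_ENTRY_AREA_BASE */
	const uint64_t size_tb[3] = { 0x1000, 0x3200, 0x40 };
	const char *name[3] = { "page_offset_base", "vmalloc_base", "vmemmap_base" };
	uint64_t remain = vaddr_end - vaddr;
	int i;

	for (i = 0; i < 3; i++)
		remain -= size_tb[i] * TB;

	srand(1);
	for (i = 0; i < 3; i++) {
		/* each region gets a random share of the remaining slack */
		uint64_t entropy = remain / (3 - i);
		uint64_t off = ((((uint64_t)rand() << 32) ^ rand()) % (entropy + 1)) & PUD_ALIGN;

		vaddr += off;
		printf("%s = 0x%" PRIx64 "\n", name[i], vaddr);
		vaddr += size_tb[i] * TB;
		remain -= off;
	}
	return 0;
}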

With KASLR disabled, the test ran for 1000 iterations without hitting 
the issue.

At the outset, the selection of vmemmap_base doesn't appear to consider 
whether there will be enough room to accommodate future hot-plugged 
pages.

Also, as per x86_64/mm.rst, for the 5-level page table case the range 
for vmemmap is ffd4000000000000 - ffd5ffffffffffff. Is it correct for 
vmemmap_base to start at a value outside this prescribed range, as seen 
in this case?

Any pointers on how to correctly address this issue?

Regards,
Bharata.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: AMD GPU driver load hitting BUG_ON in sync_global_pgds_l5()
  2025-04-22  6:34 AMD GPU driver load hitting BUG_ON in sync_global_pgds_l5() Bharata B Rao
@ 2025-04-22  7:14 ` Balbir Singh
  2025-04-22  8:28   ` Bharata B Rao
  2025-04-23  6:40   ` Bharata B Rao
  2025-04-22 15:13 ` Dave Hansen
  1 sibling, 2 replies; 8+ messages in thread
From: Balbir Singh @ 2025-04-22  7:14 UTC (permalink / raw)
  To: Bharata B Rao, linux-kernel, linux-mm
  Cc: Dave Hansen, luto, peterz, tglx, mingo, bp, x86, hpa, nikunj,
	kees, alexander.deucher

On 4/22/25 16:34, Bharata B Rao wrote:
> Hi,
> 
> Nikunj and I have been debugging an issue during AMD GPU driver
> loading where we hit the below failure:
> 
> [snip oops and analysis]
> 
> Any pointers on how to correctly address this issue?

Could you please confirm if this is a new issue? Sounds like you're hitting it on 6.8.12+.
I've never tested this on a system with 5 levels of page tables, but with 5 levels you get
57 bits of VA, and you'll need to look at the KASLR logic (max_pfn + padding) to see where
your ranges are getting assigned.

I'd start by dumping the kaslr_regions array.
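
Something along these lines at the end of kernel_randomize_memory() in
arch/x86/mm/kaslr.c should do -- just a sketch against the 6.8-era
kaslr_regions layout (a base pointer plus a size_tb per region), so
adjust the field names to whatever the kernel under test actually has:

	size_t i;

	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
		pr_info("kaslr_region: base[%zu] %lx size_tb %lx\n",
			i, *kaslr_regions[i].base, kaslr_regions[i].size_tb);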

Balbir Singh



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: AMD GPU driver load hitting BUG_ON in sync_global_pgds_l5()
  2025-04-22  7:14 ` Balbir Singh
@ 2025-04-22  8:28   ` Bharata B Rao
  2025-04-23  6:40   ` Bharata B Rao
  1 sibling, 0 replies; 8+ messages in thread
From: Bharata B Rao @ 2025-04-22  8:28 UTC (permalink / raw)
  To: Balbir Singh, linux-kernel, linux-mm
  Cc: Dave Hansen, luto, peterz, tglx, mingo, bp, x86, hpa, nikunj,
	kees, alexander.deucher


On 22-Apr-25 12:44 PM, Balbir Singh wrote:
> On 4/22/25 16:34, Bharata B Rao wrote:
> 
> Could you please confirm if this is a new issue? Sounds like you're hitting it on 6.8.12+.
> I've never tested this on a system with 5 levels of page tables, but with 5 levels you get
> 57 bits of VA, and you'll need to look at the KASLR logic (max_pfn + padding) to see where
> your ranges are getting assigned.

I haven't been able to test this on the latest upstream yet. Will get 
back on this, as it can take considerable time to recreate.

The same or similar-looking bugs have been discussed earlier too:

https://gitlab.freedesktop.org/drm/amd/-/issues/3244

Disabling 5-level page tables seemed to solve the issue there, though.

> 
> I'd start by dumping the kaslr_regions array.

Sure, thanks.

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: AMD GPU driver load hitting BUG_ON in sync_global_pgds_l5()
  2025-04-22  6:34 AMD GPU driver load hitting BUG_ON in sync_global_pgds_l5() Bharata B Rao
  2025-04-22  7:14 ` Balbir Singh
@ 2025-04-22 15:13 ` Dave Hansen
  2025-04-23  9:30   ` Bharata B Rao
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2025-04-22 15:13 UTC (permalink / raw)
  To: Bharata B Rao, linux-kernel, linux-mm
  Cc: Dave Hansen, luto, peterz, tglx, mingo, bp, x86, hpa, nikunj,
	Balbir Singh, kees, alexander.deucher

On 4/21/25 23:34, Bharata B Rao wrote:
> At the outset, the selection of vmemmap_base doesn't appear to
> consider whether there will be enough room to accommodate future
> hot-plugged pages.

Is this future hotplug area in the memory map at boot?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: AMD GPU driver load hitting BUG_ON in sync_global_pgds_l5()
  2025-04-22  7:14 ` Balbir Singh
  2025-04-22  8:28   ` Bharata B Rao
@ 2025-04-23  6:40   ` Bharata B Rao
  1 sibling, 0 replies; 8+ messages in thread
From: Bharata B Rao @ 2025-04-23  6:40 UTC (permalink / raw)
  To: Balbir Singh, linux-kernel, linux-mm
  Cc: Dave Hansen, luto, peterz, tglx, mingo, bp, x86, hpa, nikunj,
	kees, alexander.deucher

On 22-Apr-25 12:44 PM, Balbir Singh wrote:
> On 4/22/25 16:34, Bharata B Rao wrote:
> 
> Could you please confirm if this is a new issue? Sounds like you're hitting it on 6.8.12+.
> I've never tested this on a system with 5 levels of page tables, but with 5 levels you get
> 57 bits of VA, and you'll need to look at the KASLR logic (max_pfn + padding) to see where
> your ranges are getting assigned.
> 
> I'd start by dumping the kaslr_regions array.

Here is how the ranges look for the upstream kernel across two boots:

Iteration 1:
kaslr_region: base[0] ff4552df80000000 size_tb 1000
kaslr_region: end[0] fffffffffffff
kaslr_region: base[1] ff69c69640000000 size_tb 3200
kaslr_region: base[2] ffd3140680000000 size_tb 40

Iteration 2:
kaslr_region: base[0] ff3a9c84c0000000 size_tb 1000
kaslr_region: end[0] fffffffffffff
kaslr_region: base[1] ff73d27480000000 size_tb 3200
kaslr_region: base[2] fff01edb40000000 size_tb 40

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: AMD GPU driver load hitting BUG_ON in sync_global_pgds_l5()
  2025-04-22 15:13 ` Dave Hansen
@ 2025-04-23  9:30   ` Bharata B Rao
  2025-04-23 16:01     ` Dave Hansen
  0 siblings, 1 reply; 8+ messages in thread
From: Bharata B Rao @ 2025-04-23  9:30 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, linux-mm
  Cc: Dave Hansen, luto, peterz, tglx, mingo, bp, x86, hpa, nikunj,
	Balbir Singh, kees, alexander.deucher

On 22-Apr-25 8:43 PM, Dave Hansen wrote:
> On 4/21/25 23:34, Bharata B Rao wrote:
>> At the outset, the selection of vmemmap_base doesn't appear to
>> consider whether there will be enough room to accommodate future
>> hot-plugged pages.
> 
> Is this future hotplug area in the memory map at boot?

The KVM guest isn't using any -m maxmem option if that's what you are 
hinting at.

BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x000000007ffdafff] usable
BIOS-e820: [mem 0x000000007ffdb000-0x000000007fffffff] reserved
BIOS-e820: [mem 0x00000000b0000000-0x00000000bfffffff] reserved
BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
BIOS-e820: [mem 0x0000000100000000-0x000000f4a3ffffff] usable
BIOS-e820: [mem 0x000000fd00000000-0x000000ffffffffff] reserved

kaslr_region: base[0] ff4552df80000000 size_tb 1000
kaslr_region: end[0] fffffffffffff
kaslr_region: base[1] ff69c69640000000 size_tb 3200
kaslr_region: base[2] ffd3140680000000 size_tb 40

So vmemmap_base is 0xffd3140680000000

Also, last_pfn and max_arch_pfn are reported like this:
last_pfn = 0x7ffdb max_arch_pfn = 0x10000000000

Here is some data for the hotplug that happens for the 8 GPUs.

The driver passes the following values for pgmap->range.start, 
pgmap->range.end and pgmap->type to devm_memremap_pages():

amdgpu: kgd2kfd_init_zone_device: start fffc010000000 end fffffffffffff 
type 1
amdgpu: kgd2kfd_init_zone_device: start fff8020000000 end fffc00fffffff 
type 1
amdgpu: kgd2kfd_init_zone_device: start fff4030000000 end fff801fffffff 
type 1
amdgpu: kgd2kfd_init_zone_device: start fff0040000000 end fff402fffffff 
type 1
amdgpu: kgd2kfd_init_zone_device: start ffec050000000 end fff003fffffff 
type 1
amdgpu: kgd2kfd_init_zone_device: start ffe8060000000 end ffec04fffffff 
type 1
amdgpu: kgd2kfd_init_zone_device: start ffe4070000000 end ffe805fffffff 
type 1
amdgpu: kgd2kfd_init_zone_device: start ffe0080000000 end ffe406fffffff 
type 1

The pfn and the number of pages being added in response to the above:
__add_pages pfn fffc010000 nr_pages 67043328 nid 0
__add_pages pfn fff8020000 nr_pages 67043328 nid 0
__add_pages pfn fff4030000 nr_pages 67043328 nid 0
__add_pages pfn fff0040000 nr_pages 67043328 nid 0
__add_pages pfn ffec050000 nr_pages 67043328 nid 0
__add_pages pfn ffe8060000 nr_pages 67043328 nid 0
__add_pages pfn ffe4070000 nr_pages 67043328 nid 0
__add_pages pfn ffe0080000 nr_pages 67043328 nid 0


For the above vmemmap_base, the (first) addresses seen in
sync_global_pgds_l5() for the above 8 hotplug cases are like this:
start ffd3540580400000, end = ffd35405805fffff
start ffd3540480800000, end = ffd35404809fffff
start ffd3540380c00000, end = ffd3540380dfffff
start ffd3540281000000, end = ffd35402811fffff
start ffd3540181400000, end = ffd35401815fffff
start ffd3540081800000, end = ffd35400819fffff
start ffd353ff81c00000, end = ffd353ff81dfffff
start ffd353fe82000000, end = ffd353fe821fffff
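
(As a consistency check: 0xffd3140680000000 + 0xfffc010000 * 0x40 = 
0xffd3540580400000, matching the first line above, with no wraparound.)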

This is for the case that succeeds; I showed the same data for the 
failing case in the first mail of this thread.

When randomization results in a bad vmemmap_base address, hotplugging 
the first page of the first GPU hits the BUG_ON.

Regards,
Bharata.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: AMD GPU driver load hitting BUG_ON in sync_global_pgds_l5()
  2025-04-23  9:30   ` Bharata B Rao
@ 2025-04-23 16:01     ` Dave Hansen
  2025-04-24 12:54       ` Bharata B Rao
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2025-04-23 16:01 UTC (permalink / raw)
  To: Bharata B Rao, linux-kernel, linux-mm
  Cc: Dave Hansen, luto, peterz, tglx, mingo, bp, x86, hpa, nikunj,
	Balbir Singh, kees, alexander.deucher

On 4/23/25 02:30, Bharata B Rao wrote:
> On 22-Apr-25 8:43 PM, Dave Hansen wrote:
>> On 4/21/25 23:34, Bharata B Rao wrote:
>>> At the outset, the selection of vmemmap_base doesn't appear to
>>> consider whether there will be enough room to accommodate future
>>> hot-plugged pages.
>>
>> Is this future hotplug area in the memory map at boot?
> 
> The KVM guest isn't using any -m maxmem option if that's what you are
> hinting at.

How could vmemmap_base consider future hotplug areas if it isn't told
where they will be?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: AMD GPU driver load hitting BUG_ON in sync_global_pgds_l5()
  2025-04-23 16:01     ` Dave Hansen
@ 2025-04-24 12:54       ` Bharata B Rao
  0 siblings, 0 replies; 8+ messages in thread
From: Bharata B Rao @ 2025-04-24 12:54 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, linux-mm
  Cc: Dave Hansen, luto, peterz, tglx, mingo, bp, x86, hpa, nikunj,
	Balbir Singh, kees, alexander.deucher



On 23-Apr-25 9:31 PM, Dave Hansen wrote:
> On 4/23/25 02:30, Bharata B Rao wrote:
>> On 22-Apr-25 8:43 PM, Dave Hansen wrote:
>>> On 4/21/25 23:34, Bharata B Rao wrote:
>>>> At the outset, the selection of vmemmap_base doesn't appear to
>>>> consider whether there will be enough room to accommodate future
>>>> hot-plugged pages.
>>>
>>> Is this future hotplug area in the memory map at boot?
>>
>> The KVM guest isn't using any -m maxmem option if that's what you are
>> hinting at.
> 
> How could vmemmap_base consider future hotplug areas if it isn't told
> where they will be?

This is device-private memory, which means only struct pages need to be 
mapped. How would the kernel know in advance about the future growth in 
the number of struct pages needed to accommodate incoming device-private 
memory?

In any case, how can KASLR put vmemmap_base completely outside the 
range earmarked for it in mm.rst?

Regards,
Bharata.





^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-04-24 12:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-22  6:34 AMD GPU driver load hitting BUG_ON in sync_global_pgds_l5() Bharata B Rao
2025-04-22  7:14 ` Balbir Singh
2025-04-22  8:28   ` Bharata B Rao
2025-04-23  6:40   ` Bharata B Rao
2025-04-22 15:13 ` Dave Hansen
2025-04-23  9:30   ` Bharata B Rao
2025-04-23 16:01     ` Dave Hansen
2025-04-24 12:54       ` Bharata B Rao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox