linux-mm.kvack.org archive mirror
From: Tytus Rogalewski <tytanick@gmail.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: linux-mm@kvack.org, muchun.song@linux.dev, osalvador@suse.de,
	 Jens Axboe <axboe@kernel.dk>
Subject: Re: walk_pgd_range BUG: unable to handle page fault
Date: Thu, 5 Mar 2026 12:29:30 +0100	[thread overview]
Message-ID: <CANfXJzs29Zv7Qw7cAiEzDLdTJpQEbm6mPeLTpuZkErV5GsDKfQ@mail.gmail.com> (raw)
In-Reply-To: <0dfd588c-3284-4310-b1cd-aab54567f0c0@kernel.org>


[-- Attachment #1.1: Type: text/plain, Size: 12504 bytes --]

Memory space was fine.
Memory usage is high by design: I allocate 95% of RAM into 1G huge pages,
leaving 32-64GB of memory for the system. In any case, I had plenty free at
the time this happened.
It was on kernel 6.18.8-pbk.
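For context, the hugepage/system memory split described above can be confirmed with a quick /proc/meminfo check (standard Linux field names; nothing here is specific to this setup):

```shell
# Show the 1G hugepage reservation vs. remaining system memory.
# HugePages_* and Hugepagesize are standard /proc/meminfo fields.
grep -E '^(MemTotal|MemFree|HugePages_Total|HugePages_Free|Hugepagesize)' /proc/meminfo
```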

Claude was able to remind me of that issue:
walk_pgd_range BUG on PVE12 — Crash Details

  Server: PVE12 (10.10.42.12)
  Date: February 4, 2026, 21:42:16 UTC
  Kernel: 6.18.8-pbk (custom build, compiled January 30, 2026) - this is
from https://prebuiltkernels.com/

  Error:
  BUG: unable to handle page fault for address: ff164aee00000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 4602067 P4D 0
  Oops: 0000 [#1] SMP NOPTI
  CPU: 125  UID: 0  PID: 783442  Comm: qm  Not tainted 6.18.8-pbk #1
PREEMPT(voluntary)
  Hardware: TURIN2D24G-2L+/500W (AMD Turin platform)
  RIP: 0010:walk_pgd_range+0x6ff/0xbb0

  Call Trace:
  walk_pgd_range+0x6ff/0xbb0
   → __walk_page_range+0x8e/0x220
    → walk_page_vma+0x92/0xe0
     → smap_gather_stats.part.0+0x8c/0xd0
      → show_smaps_rollup+0x258/0x420
       → seq_read_iter → seq_read → vfs_read → ksys_read
        → __x64_sys_read → do_syscall_64 → entry_SYSCALL_64

  Root Cause Analysis:
  The qm process (Proxmox QEMU VM manager, PID 783442) triggered a kernel
page fault while reading /proc/<pid>/smaps_rollup. The fault occurred in
walk_pgd_range() when the kernel attempted to dereference address
ff164aee00000000 (stored in R12), a not-present page. The corrupted
pointer pattern suggests a stale or corrupted page table entry.

  Likely Causes (ranked):
  1. Race condition between KVM/QEMU VM memory operations and
/proc/smaps_rollup reads
  2. Bug in vanilla kernel 6.18.x page table handling with VFIO passthrough
(mlx5, vfio-pci modules were loaded)
  3. Possible instability introduced by custom -pbk kernel patches
  4. Memory corruption from VFIO PCI passthrough affecting page table
structures
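The faulting read path itself is easy to exercise on a healthy kernel; the sketch below only demonstrates the trigger (a smaps_rollup read forces a page-table walk), it is not a reproducer for the corruption:

```shell
# Reading smaps_rollup makes the kernel walk the target process's page
# tables (show_smaps_rollup -> walk_page_vma -> walk_pgd_range), the
# same path as in the trace above. Here we read our own entry.
cat /proc/self/smaps_rollup
```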

  Key Registers at Crash:
  - R12 (faulting pointer): ff164aee00000000
  - CR2 (fault address): ff164aee00000000
  - RSI: 000071afffffffff (end of VMA range)
  - R14: 000071afc0000000 (start of VMA range)

  Free Memory: Not checked during that session — no memory statistics were
collected.

  Previous Kernel (boot -1): Same version 6.18.8-pbk — the system was
rebooted after the crash and came back on the same kernel.
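Since no memory statistics were collected, a small snapshot job would help if the crash recurs. A hedged sketch, with LOGDIR and the file naming as illustrative assumptions (run one-shot from cron or a systemd timer):

```shell
# Take one timestamped memory snapshot so the state shortly before a
# crash survives the reboot. LOGDIR is an arbitrary example path.
LOGDIR="${LOGDIR:-/tmp/mem-snapshots}"
mkdir -p "$LOGDIR"
ts=$(date -u +%Y%m%dT%H%M%SZ)
{
    cat /proc/meminfo
    echo '--- vmstat ---'
    cat /proc/vmstat
} > "$LOGDIR/mem-$ts.txt"
```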


[image: CleanShot 2026-03-05 at 12.25.32@2x.png]

--

tel. 790 202 300

*Tytus Rogalewski*

Dolina Krzemowa 6A

83-010 Jagatowo

NIP: 9570976234


On Thu, 5 Mar 2026 at 12:18, David Hildenbrand (Arm) <david@kernel.org>
wrote:

> On 3/5/26 09:11, Tytus Rogalewski wrote:
> > Hi David,
>
> Hi,
>
> let me CC Jens.
>
> @Jens, thread starts here:
>
> https://marc.info/?l=linux-mm&m=176961245128225&w=2
>
> For some reason, I cannot find that mail on lore, only my reply here:
>
>
> https://lore.kernel.org/all/5948f3a6-8f30-4c45-9b86-2af9a6b37405@kernel.org/
>
> >
> > This is strange, but the issue stopped when I changed io_uring to threads.
> > Would that make any sense?
>
> On the screenshot I can see that "Async IO" is changed from Default
> (io_uring) to "threads". I assume that changes QEMU's behavior to not
> use io_uring for I/O.
>
> > We also did a few other smaller things, but honestly the issues stopped.
> > Was it fixed, or could Async IO have caused it? (The qcow2 image is on a
> > FUSE mount.)
> >
> > I am not certain, but if that makes any sense, should I report this to
> > someone working on io_uring, or should it have nothing to do with that
> bug?
>
> Good question. It seems unrelated at first, but maybe it's related to
> memory consumption with io_uring (below).
>
> >
> > CleanShot 2026-03-05 at 09.08.15@2x.png
> >
> > On all those kernels I have seen no such bug in the few weeks since
> > changing Async IO.
> >
>
> What is the latest kernel you are running?
>
> > CleanShot 2026-03-05 at 09.09.03@2x.png
> >
>
> Looking back at the original report, I can see that the system has
> extremely little free memory. Maybe that gets eaten by io_uring somehow?
>
> Do you still see that memory consumption when not using io_uring?
>
>
> Let me copy the original information for Jens, maybe something could
> give him a clue whether io_uring could be involved here.
>
> "It happens on multiple servers (less on 6.18.6, more on 6.19-rc4+).
> All servers are doing KVM with VFIO GPU PCIe passthrough, and it
> happens when I am using 1GB hugepages + QEMU. Basically I am allocating
> 970GB into hugepages, leaving 37GB to KVM. In normal operation I have
> about 20GB free, but when this issue occurs, all RAM is taken, and
> even when I added 100GB of swap, it was also consumed. It can work
> for days or a week without issue.
>
> I did not see that issue when I had hugepages disabled (on normal 4KB
> page allocation in KVM). And I am using hugepages as it is otherwise
> impossible to boot a VM with >200GB of RAM. When that issue happens, ps
> hangs and only top shows something, but the machine needs to be
> rebooted due to many zombie processes.
>
>
> Hardware:
> Motherboard: ASRockRack GENOA2D24G-2L
> CPU: 2x AMD EPYC 9654 96-Core Processor
> System ram: 1024 GB
> GPUs: 8x RTX5090 vfio passthrough
>
> root@pve14:~# uname -a
> Linux pve14 6.18.6-pbk #1 SMP PREEMPT_DYNAMIC Mon Jan 19 20:59:46 UTC 2026
> x86_64 GNU/Linux
>
> [171053.341288] BUG: unable to handle page fault for address:
> ff469ae640000000
> [171053.341310] #PF: supervisor read access in kernel mode
> [171053.341319] #PF: error_code(0x0000) - not-present page
> [171053.341328] PGD 4602067 P4D 0
> [171053.341337] Oops: Oops: 0000 [#1] SMP NOPTI
> [171053.341348] CPU: 16 UID: 0 PID: 3250869 Comm: qm Not tainted
> 6.18.6-pbk #1 PREEMPT(voluntary)
> [171053.341362] Hardware name:  TURIN2D24G-2L+/500W/TURIN2D24G-2L+/500W,
> BIOS 10.20 05/05/2025
> [171053.341373] RIP: 0010:walk_pgd_range+0x6ff/0xbb0
> [171053.341386] Code: 08 49 39 dd 0f 84 8c 01 00 00 49 89 de 49 8d 9e 00
> 00 20 00 48 8b 75 b8 48 81 e3 00 00 e0 ff 48 8d 43 ff 48 39 f0 49 0f 43 dd
> <49> f7 04 24 9f ff ff ff 0f 84 e2 fd ff ff 48 8b 45 c0 41 c7 47 20
> [171053.341406] RSP: 0018:ff59d95d70e6b748 EFLAGS: 00010287
> [171053.341416] RAX: 00007a22401fffff RBX: 00007a2240200000 RCX:
> 0000000000000000
> [171053.341425] RDX: 0000000000000000 RSI: 00007a227fffffff RDI:
> 800008dfc00002b7
> [171053.341435] RBP: ff59d95d70e6b828 R08: 0000000000000080 R09:
> 0000000000000000
> [171053.341444] R10: ffffffff8de588c0 R11: 0000000000000000 R12:
> ff469ae640000000
> [171053.341454] R13: 00007a2280000000 R14: 00007a2240000000 R15:
> ff59d95d70e6b8a8
> [171053.341464] FS:  00007d4e8ec94b80(0000) GS:ff4692876ae7e000(0000)
> knlGS:0000000000000000
> [171053.341476] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [171053.341485] CR2: ff469ae640000000 CR3: 0000008241eed006 CR4:
> 0000000000f71ef0
> [171053.341495] PKRU: 55555554
> [171053.341501] Call Trace:
> [171053.341508]  <TASK>
> [171053.341518]  __walk_page_range+0x8e/0x220
> [171053.341529]  ? sysvec_apic_timer_interrupt+0x57/0xc0
> [171053.341541]  walk_page_vma+0x92/0xe0
> [171053.341551]  smap_gather_stats.part.0+0x8c/0xd0
> [171053.341563]  show_smaps_rollup+0x258/0x420
> [171053.341577]  seq_read_iter+0x137/0x4c0
> [171053.341588]  seq_read+0xf5/0x140
> [171053.341596]  ? __x64_sys_move_mount+0x11/0x40
> [171053.341607]  vfs_read+0xbb/0x350
> [171053.341617]  ? do_syscall_64+0xb8/0xd00
> [171053.341627]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.341637]  ? strncpy_from_user+0x27/0x130
> [171053.341649]  ksys_read+0x69/0xf0
> [171053.341658]  __x64_sys_read+0x19/0x30
> [171053.341666]  x64_sys_call+0x2180/0x25a0
> [171053.341855]  do_syscall_64+0x80/0xd00
> [171053.342029]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.342198]  ? __x64_sys_ioctl+0x83/0x100
> [171053.342367]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.342532]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.342696]  ? x64_sys_call+0xac0/0x25a0
> [171053.342857]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.343019]  ? do_syscall_64+0xb8/0xd00
> [171053.343181]  ? seq_read+0xf5/0x140
> [171053.343341]  ? __x64_sys_move_mount+0x11/0x40
> [171053.343504]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.343662]  ? vfs_read+0xbb/0x350
> [171053.343819]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.343973]  ? ksys_read+0x69/0xf0
> [171053.344126]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.344280]  ? generic_file_llseek+0x21/0x40
> [171053.344432]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.344582]  ? kernfs_fop_llseek+0x7b/0xd0
> [171053.344730]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.344873]  ? ksys_lseek+0x4f/0xd0
> [171053.345010]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.345144]  ? __x64_sys_lseek+0x18/0x30
> [171053.345275]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.345407]  ? x64_sys_call+0x1abe/0x25a0
> [171053.345535]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.345665]  ? do_syscall_64+0xb8/0xd00
> [171053.345792]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.345919]  ? irqentry_exit+0x43/0x50
> [171053.346044]  ? srso_alias_return_thunk+0x5/0xfbef5
> [171053.346169]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [171053.346292] RIP: 0033:0x7d4e8ed61687
> [171053.346417] Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00
> 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05
> <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
> [171053.346687] RSP: 002b:00007ffdd7c76000 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000000
> [171053.346828] RAX: ffffffffffffffda RBX: 00007d4e8ec94b80 RCX:
> 00007d4e8ed61687
> [171053.346969] RDX: 0000000000002000 RSI: 000061ff84297ce0 RDI:
> 0000000000000006
> [171053.347111] RBP: 0000000000002000 R08: 0000000000000000 R09:
> 0000000000000000
> [171053.347253] R10: 0000000000000000 R11: 0000000000000202 R12:
> 000061ff84297ce0
> [171053.347394] R13: 000061ff7d3d62a0 R14: 0000000000000006 R15:
> 000061ff842478c0
> [171053.347542]  </TASK>
> [171053.347684] Modules linked in: sctp ip6_udp_tunnel udp_tunnel
> nf_tables bridge stp llc softdog bonding sunrpc binfmt_misc nfnetlink_log
> amd_atl intel_rapl_msr intel_rapl_common nls_iso8859_1 amd64_edac
> edac_mce_amd kvm_amd snd_pcm snd_timer dax_hmem ipmi_ssif kvm cxl_acpi snd
> polyval_clmulni ghash_clmulni_intel cxl_port soundcore aesni_intel rapl
> cxl_core acpi_ipmi einj pcspkr ast ipmi_si spd5118 ipmi_devintf k10temp ccp
> ipmi_msghandler joydev input_leds mac_hid sch_fq_codel msr vhost_net vhost
> vhost_iotlb tap vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio
> iommufd efi_pstore nfnetlink dmi_sysfs autofs4 btrfs blake2b_generic xor
> raid6_pq mlx5_ib ib_uverbs macsec ib_core dm_thin_pool dm_persistent_data
> dm_bio_prison dm_bufio nvme mlx5_core nvme_core cdc_ether nvme_keyring igb
> mlxfw psample usbnet nvme_auth i2c_algo_bit usbkbd mii hid_generic hkdf tls
> dca ahci i2c_piix4 libahci i2c_smbus usbmouse usbhid hid
> [171053.349092] CR2: ff469ae640000000
> [171053.349269] ---[ end trace 0000000000000000 ]---
> [171054.248409] RIP: 0010:walk_pgd_range+0x6ff/0xbb0
> [171054.248750] Code: 08 49 39 dd 0f 84 8c 01 00 00 49 89 de 49 8d 9e 00
> 00 20 00 48 8b 75 b8 48 81 e3 00 00 e0 ff 48 8d 43 ff 48 39 f0 49 0f 43 dd
> <49> f7 04 24 9f ff ff ff 0f 84 e2 fd ff ff 48 8b 45 c0 41 c7 47 20
> [171054.249177] RSP: 0018:ff59d95d70e6b748 EFLAGS: 00010287
> [171054.249392] RAX: 00007a22401fffff RBX: 00007a2240200000 RCX:
> 0000000000000000
> [171054.249820] RDX: 0000000000000000 RSI: 00007a227fffffff RDI:
> 800008dfc00002b7
> [171054.250036] RBP: ff59d95d70e6b828 R08: 0000000000000080 R09:
> 0000000000000000
> [171054.250253] R10: ffffffff8de588c0 R11: 0000000000000000 R12:
> ff469ae640000000
> [171054.250471] R13: 00007a2280000000 R14: 00007a2240000000 R15:
> ff59d95d70e6b8a8
> [171054.250691] FS:  00007d4e8ec94b80(0000) GS:ff4692876ae7e000(0000)
> knlGS:0000000000000000
> [171054.250914] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [171054.251137] CR2: ff469ae640000000 CR3: 0000008241eed006 CR4:
> 0000000000f71ef0
> [171054.251375] PKRU: 55555554
> [171054.251601] note: qm[3250869] exited with irqs disabled
>
> --
> Cheers,
>
> David
>


[-- Attachment #2: CleanShot 2026-03-05 at 12.25.32@2x.png --]
[-- Type: image/png, Size: 152816 bytes --]


Thread overview: 15+ messages
     [not found] <CANfXJzt4P+FCkdL_=FfmG80_bY8FkzSocJSPeksSQ_vXObRNOQ@mail.gmail.com>
2026-02-04 21:52 ` David Hildenbrand (arm)
2026-02-04 22:24   ` Tytus Rogalewski
2026-02-04 22:50     ` Tytus Rogalewski
2026-02-05 12:44       ` David Hildenbrand (Arm)
2026-02-05 12:46         ` Tytus Rogalewski
2026-02-05 12:57           ` David Hildenbrand (Arm)
2026-02-05 13:20             ` Tytus Rogalewski
2026-03-05  8:11               ` Tytus Rogalewski
2026-03-05 11:17                 ` David Hildenbrand (Arm)
2026-03-05 11:29                   ` Tytus Rogalewski [this message]
2026-03-05 11:33                     ` David Hildenbrand (Arm)
2026-03-05 11:34                       ` Tytus Rogalewski
2026-03-05 11:38                         ` David Hildenbrand (Arm)
2026-03-05 11:39                           ` Tytus Rogalewski
2026-03-05 11:40                             ` David Hildenbrand (Arm)
