From: Tytus Rogalewski <tytanick@gmail.com>
To: "David Hildenbrand (arm)" <david@kernel.org>
Cc: linux-mm@kvack.org, muchun.song@linux.dev, osalvador@suse.de
Subject: Re: walk_pgd_range BUG: unable to handle page fault
Date: Wed, 4 Feb 2026 23:50:12 +0100	[thread overview]
Message-ID: <CANfXJztYakHdsY0757+ssQf0WYtnrWdGd8mFegp6PLKZ_SKnCg@mail.gmail.com> (raw)
In-Reply-To: <CANfXJzsWFyKXJKsESM+7JXoGkDSeQt+Qaimy3FV1-neyXiHZBg@mail.gmail.com>


Just had another crash, and it is still happening on 6.18.8. Here is the
Claude output this time, as I am lazy :)
Let me know what steps I should take next.


Claude:

  Summary (English)

  Crash Type: Kernel Oops - NULL pointer dereference / page fault

  Timeline: February 4, 2026 at 21:42:16

  Environment:
  - Host: pve12 (Proxmox VE)
  - Hardware: AMD EPYC Turin (TURIN2D24G-2L+/500W)
  - Kernel: 6.18.8-pbk (custom build)
  - CPU affected: Core 125 (of 128+ cores)

  Root Cause:
  The kernel encountered a page fault while the qm process (Proxmox QEMU VM
  manager, PID 783442) was reading /proc/<pid>/smaps_rollup. The fault
  occurred in the walk_pgd_range() function at address ff164aee00000000,
  which is an invalid/not-present page.

  Call Stack:
  walk_pgd_range() → __walk_page_range() → walk_page_vma() →
  smap_gather_stats() → show_smaps_rollup() → seq_read() → vfs_read()
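
  Since the trigger is an ordinary read of /proc/<pid>/smaps_rollup, the
  suspected race could be stress-tested with a simple loop (a sketch;
  $VMPID is a placeholder for a running QEMU guest's PID on this host):

    # hammer smaps_rollup while the guest's memory layout changes; every
    # read takes the same smap_gather_stats() walk as the oops above
    while cat /proc/$VMPID/smaps_rollup > /dev/null; do :; done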

  Analysis:
  This is a race condition or memory corruption bug in the kernel's page
  table walking code. The qm tool was attempting to gather memory statistics
  from a VM process when the kernel dereferenced a stale or corrupted page
  table entry (PTE). The address ff164aee00000000 in R12 suggests a
  corrupted pointer that should have pointed to a valid page table
  structure.
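
  To pin down which statement walk_pgd_range+0x6ff corresponds to, the oops
  can be decoded against the exact vmlinux of this build (a sketch; it
  assumes the 6.18.8-pbk build tree with debug info is available):

    # both helpers ship in the kernel source tree under scripts/
    ./scripts/faddr2line vmlinux walk_pgd_range+0x6ff/0xbb0
    # or feed a whole saved oops through the decoder:
    ./scripts/decode_stacktrace.sh vmlinux < oops.txt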

  Potential Causes:
  1. Race condition between VM memory operations and /proc/smaps_rollup
  reads
  2. Bug in kernel 6.18.8 page table handling with KVM/VFIO passthrough
  3. Custom kernel patches (-pbk) introducing instability (note: this is a
  clean Ubuntu config)
  4. Possible memory corruption from VFIO PCI passthrough (mlx5, vfio-pci
  modules loaded)

  Recommendations:
  1. Check upstream kernel 6.18.x changelogs for related fixes
  2. Test with a vanilla kernel, without the -pbk patches (note: it happens
  on other kernels too)
  3. Monitor for recurrence and check whether specific VMs trigger the
  issue (no, multiple ones do)
  4. Consider enabling kernel debugging (CONFIG_DEBUG_VM) to catch similar
  issues earlier; see the config sketch after this list
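
  For recommendation 4, the relevant options can be flipped with the
  kernel's own config helper before rebuilding (a sketch, assuming the -pbk
  source tree; these checks add runtime overhead):

    # scripts/config ships with the kernel source
    ./scripts/config --enable DEBUG_VM \
                     --enable DEBUG_VM_PGTABLE \
                     --enable PAGE_TABLE_CHECK \
                     --enable PAGE_TABLE_CHECK_ENFORCED
    make olddefconfig

  CONFIG_PAGE_TABLE_CHECK in particular is meant to catch corrupted page
  table entries when they are set or cleared, rather than at the next walk.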




Bash(ssh root@10.10.42.12 "journalctl -b -1 -k --no-pager | grep -A 100 'BUG: unable to handle page fault'")

⏺ Crash Logs

  Feb 04 21:42:16 pve12 kernel: BUG: unable to handle page fault for
address: ff164aee00000000
  Feb 04 21:42:16 pve12 kernel: #PF: supervisor read access in kernel mode
  Feb 04 21:42:16 pve12 kernel: #PF: error_code(0x0000) - not-present page
  Feb 04 21:42:16 pve12 kernel: PGD 4602067 P4D 0
  Feb 04 21:42:16 pve12 kernel: Oops: Oops: 0000 [#1] SMP NOPTI
  Feb 04 21:42:16 pve12 kernel: CPU: 125 UID: 0 PID: 783442 Comm: qm Not
tainted 6.18.8-pbk #1 PREEMPT(voluntary)
  Feb 04 21:42:16 pve12 kernel: Hardware name:
 TURIN2D24G-2L+/500W/TURIN2D24G-2L+/500W, BIOS 10.12 11/14/2024
  Feb 04 21:42:16 pve12 kernel: RIP: 0010:walk_pgd_range+0x6ff/0xbb0
  Feb 04 21:42:16 pve12 kernel: Code: 08 49 39 dd 0f 84 8c 01 00 00 49 89
de 49 8d 9e 00 00 20 00 48 8b 75 b8 48 81 e3 00 00 e0 ff 48 8d 43 ff 48 39
f0 49 0f 43 dd <49> f7 04 24 9f ff ff ff 0f 84 e2 fd ff ff 48 8b 45 c0 41
c7 47 20
  Feb 04 21:42:16 pve12 kernel: RSP: 0018:ff643a4c403439a8 EFLAGS: 00010287
  Feb 04 21:42:16 pve12 kernel: RAX: 000071afc01fffff RBX: 000071afc0200000
RCX: 0000000000000000
  Feb 04 21:42:16 pve12 kernel: RDX: 0000000000000000 RSI: 000071afffffffff
RDI: 80002fd0000002b7
  Feb 04 21:42:16 pve12 kernel: RBP: ff643a4c40343a88 R08: 0000000000000080
R09: 0000000000000000
  Feb 04 21:42:16 pve12 kernel: R10: ffffffffb84588c0 R11: 0000000000000000
R12: ff164aee00000000
  Feb 04 21:42:16 pve12 kernel: R13: 000071b000000000 R14: 000071afc0000000
R15: ff643a4c40343b08
  Feb 04 21:42:16 pve12 kernel: FS:  0000767454f93b80(0000)
GS:ff161c21bdefe000(0000) knlGS:0000000000000000
  Feb 04 21:42:16 pve12 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
  Feb 04 21:42:16 pve12 kernel: CR2: ff164aee00000000 CR3: 00000080dff18005
CR4: 0000000000f71ef0
  Feb 04 21:42:16 pve12 kernel: PKRU: 55555554
  Feb 04 21:42:16 pve12 kernel: Call Trace:
  Feb 04 21:42:16 pve12 kernel:  <TASK>
  Feb 04 21:42:16 pve12 kernel:  __walk_page_range+0x8e/0x220
  Feb 04 21:42:16 pve12 kernel:  walk_page_vma+0x92/0xe0
  Feb 04 21:42:16 pve12 kernel:  smap_gather_stats.part.0+0x8c/0xd0
  Feb 04 21:42:16 pve12 kernel:  show_smaps_rollup+0x258/0x420
  Feb 04 21:42:16 pve12 kernel:  seq_read_iter+0x137/0x4c0
  Feb 04 21:42:16 pve12 kernel:  seq_read+0xf5/0x140
  Feb 04 21:42:16 pve12 kernel:  vfs_read+0xbb/0x350
  Feb 04 21:42:16 pve12 kernel:  ksys_read+0x69/0xf0
  Feb 04 21:42:16 pve12 kernel:  __x64_sys_read+0x19/0x30
  Feb 04 21:42:16 pve12 kernel:  x64_sys_call+0x2180/0x25a0
  Feb 04 21:42:16 pve12 kernel:  do_syscall_64+0x80/0xce0
  Feb 04 21:42:16 pve12 kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  Feb 04 21:42:16 pve12 kernel:  </TASK>
  Feb 04 21:42:16 pve12 kernel: Modules linked in: sctp ip6_udp_tunnel
udp_tunnel nf_tables bridge stp llc sunrpc softdog bonding binfmt_misc
nfnetlink_log nls_iso8859_1 intel_rapl_msr amd_atl intel_rapl_common
amd64_edac
  edac_mce_amd snd_pcm snd_timer kvm_amd snd ipmi_ssif soundcore kvm
polyval_clmulni dax_hmem ghash_clmulni_intel cxl_acpi aesni_intel cxl_port
rapl cxl_core pcspkr einj ast acpi_ipmi spd5118 ipmi_si k10temp
ipmi_devintf ccp
  ipmi_msghandler input_leds joydev mac_hid sch_fq_codel msr vhost_net
vhost vhost_iotlb tap vfio_pci vfio_pci_core irqbypass vfio_iommu_type1
vfio iommufd efi_pstore nfnetlink dmi_sysfs autofs4 btrfs blake2b_generic
xor raid6_pq
  mlx5_ib ib_uverbs macsec ib_core cdc_ether mlx5_core usbnet usbkbd mii
mlxfw nvme nvme_core psample igb nvme_keyring i2c_algo_bit nvme_auth dca
tls hkdf i2c_piix4 ahci libahci i2c_smbus hid_generic usbmouse usbhid hid
  Feb 04 21:42:16 pve12 kernel: CR2: ff164aee00000000
  Feb 04 21:42:16 pve12 kernel: ---[ end trace 0000000000000000 ]---
  Feb 04 21:42:16 pve12 kernel: RIP: 0010:walk_pgd_range+0x6ff/0xbb0
  Feb 04 21:42:16 pve12 kernel: note: qm[783442] exited with irqs disabled


--

tel. 790 202 300

*Tytus Rogalewski*

Dolina Krzemowa 6A

83-010 Jagatowo

NIP: 9570976234


On Wed, 4 Feb 2026 at 23:24, Tytus Rogalewski <tytanick@gmail.com> wrote:

> Hi,
>
> "hugepages" is probably a QEMU term.
>
> Yeah, 4k is the default, and booting is hard with that much memory,
> especially if you boot and stop it a few times.
> But this issue might be strictly related to the VFIO passthrough mix.
> I have not actually tested 2 MB pages, because why use them when I have
> 1 GB? Do you think 2 MB pages could be more stable than the 1 GB ones, or
> should it be the same logic?
>
> Well, I started to use the 1 GB ones recently, as I had to get through
> this whole IOMMU/CPU labyrinth of binding the proper GPU to the proper
> memory and proper CPU affinity in KVM. And Proxmox VE does not have such
> logic.
>
> If you tell me what to collect, I can collect it.
>
> I have another symptom, actually. Hmm, maybe it is related, maybe not.
> Still, I have had this second symptom from the beginning, and I did not
> have such crashes on 4k.
> I am using the distributed network storage MooseFS and mounting it via
> FUSE, then using qcow2 VM images.
> I sometimes have freezes in VMs, but that might be related to FUSE, as I
> mount one FUSE share and start as many as 8 VMs from that one mount.
> And from time to time some VMs stop responding or freeze.
> I will soon rewrite it to use NBD instead, and that should fix it if it
> was caused by FUSE.
> Still, I am not sure whether these are separate issues or related, and
> which triggers which.
> If a FUSE process is blocked by VM A, is it possible that VM B might
> throw this page-walk bug, or should it not be related even if the disk
> slows down?
>
>
> On Wed, 4 Feb 2026 at 22:52, David Hildenbrand (arm) <david@kernel.org>
> wrote:
>
>> On 1/28/26 15:14, Tytus Rogalewski wrote:
>> > Hello guys,
>> >
>>
>> Hi!
>>
>> > Recently I reported a slab memory leak, and it was fixed.
>> >
>> > I am having yet another issue and am wondering where to report it.
>> > Would you be able to tell me whether this is the right place, or
>> > should I send it to someone else?
>> > The issue also seems like a memory leak.
>> >
>> > It happens on multiple servers (less often on 6.18.6, more on
>> > 6.19-rc4+). All servers are doing KVM with VFIO GPU PCIe passthrough,
>> > and it happens when I am using 1 GB hugepages + QEMU.
>>
>> Okay, so we'll longterm-pin all guest memory into the iommu.
>>
>> > Basically I am allocating 970GB into hugepages, leaving 37GB to KVM.
>> > In normal operation I have about 20GB free, but when this issue
>> > occurs, all RAM is taken, and even when I added 100GB of swap, it was
>> > also consumed.
>>
>> When you say hugepage you mean 1 GiB hugetlb, correct?
>>
>> > It can work for days or a week without issue.
>> >
>> > I did not see that issue when I had hugepages disabled (on normal 2KB
>> > page allocation in KVM).
>>
>> I assume you meant 4k pages. What about 2 MiB hugetlb?
>>
>> > And I am using hugepages, as it is impossible to boot a VM with >200GB RAM.
>>
>> Oh, really? That's odd.
>>
>> > When that issue happens, ps hangs and only top still shows something,
>> > but the machine needs to be rebooted due to many zombie processes.
>> >
>> > *Hardware: *
>> > Motherboard: ASRockRack GENOA2D24G-2L
>> > CPU: 2x AMD EPYC 9654 96-Core Processor
>> > System ram: 1024 GB
>> > GPUs: 8x RTX5090 vfio passthrough
>> >
>> > root@pve14:~# uname -a
>> > *Linux pve14 6.18.6-pbk* #1 SMP PREEMPT_DYNAMIC Mon Jan 19 20:59:46 UTC
>> > 2026 x86_64 GNU/Linux
>> >
>> > [171053.341288] *BUG: unable to handle page fault for address*:
>> > ff469ae640000000
>> > [171053.341310] #PF: supervisor read access in kernel mode
>> > [171053.341319] #PF: error_code(0x0000) - not-present page
>> > [171053.341328] PGD 4602067 P4D 0
>> > [171053.341337] *Oops*: Oops: 0000 [#1] SMP NOPTI
>> > [171053.341348] CPU: 16 UID: 0 PID: 3250869 Comm: qm Not tainted
>> 6.18.6-
>> > pbk #1 PREEMPT(voluntary)
>> > [171053.341362] Hardware name:
>>  TURIN2D24G-2L+/500W/TURIN2D24G-2L+/500W,
>> > BIOS 10.20 05/05/2025
>> > [171053.341373] RIP: 0010:*walk_pgd_range*+0x6ff/0xbb0
>> > [171053.341386] Code: 08 49 39 dd 0f 84 8c 01 00 00 49 89 de 49 8d 9e
>> 00
>> > 00 20 00 48 8b 75 b8 48 81 e3 00 00 e0 ff 48 8d 43 ff 48 39 f0 49 0f 43
>> > dd <49> f7 04 24 9f ff ff ff 0f 84 e2 fd ff ff 48 8b 45 c0 41 c7 47 20
>> > [171053.341406] RSP: 0018:ff59d95d70e6b748 EFLAGS: 00010287
>> > [171053.341416] RAX: 00007a22401fffff RBX: 00007a2240200000 RCX:
>> > 0000000000000000
>> > [171053.341425] RDX: 0000000000000000 RSI: 00007a227fffffff RDI:
>> > 800008dfc00002b7
>> > [171053.341435] RBP: ff59d95d70e6b828 R08: 0000000000000080 R09:
>> > 0000000000000000
>> > [171053.341444] R10: ffffffff8de588c0 R11: 0000000000000000 R12:
>> > ff469ae640000000
>> > [171053.341454] R13: 00007a2280000000 R14: 00007a2240000000 R15:
>> > ff59d95d70e6b8a8
>> > [171053.341464] FS:  00007d4e8ec94b80(0000) GS:ff4692876ae7e000(0000)
>> > knlGS:0000000000000000
>> > [171053.341476] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > [171053.341485] CR2: ff469ae640000000 CR3: 0000008241eed006 CR4:
>> > 0000000000f71ef0
>> > [171053.341495] PKRU: 55555554
>> > [171053.341501] Call Trace:
>> > [171053.341508]  <TASK>
>> > [171053.341518]  __walk_page_range+0x8e/0x220
>> > [171053.341529]  ? sysvec_apic_timer_interrupt+0x57/0xc0
>> > [171053.341541]  walk_page_vma+0x92/0xe0
>> > [171053.341551]  smap_gather_stats.part.0+0x8c/0xd0
>> > [171053.341563]  show_smaps_rollup+0x258/0x420
>>
>> Hm, so someone is reading /proc/$PID/smaps_rollup and we stumble
>> somewhere into something unexpected while doing a page table walk.
>>
>> [171053.341288] BUG: unable to handle page fault for address:
>> ff469ae640000000
>> [171053.341310] #PF: supervisor read access in kernel mode
>> [171053.341319] #PF: error_code(0x0000) - not-present page
>> [171053.341328] PGD 4602067 P4D 0
>>
>> There is not a lot of information there :(
>>
>> Did you have other splats/symptoms or was it always that?
>>
>> --
>> Cheers,
>>
>> David
>>
>

