From: "David Hildenbrand (arm)" <david@kernel.org>
To: Tytus Rogalewski <tytanick@gmail.com>,
linux-mm@kvack.org, muchun.song@linux.dev, osalvador@suse.de
Subject: Re: walk_pgd_range BUG: unable to handle page fault
Date: Wed, 4 Feb 2026 22:52:27 +0100
Message-ID: <5948f3a6-8f30-4c45-9b86-2af9a6b37405@kernel.org>
In-Reply-To: <CANfXJzt4P+FCkdL_=FfmG80_bY8FkzSocJSPeksSQ_vXObRNOQ@mail.gmail.com>
On 1/28/26 15:14, Tytus Rogalewski wrote:
> Hello guys,
>
Hi!
> Recently i have reported slab memory leak and it was fixed.
>
> I am having yet another issue and wondering where to write with it.
> Would you be able to tell me if this is the right place or should i send
> it to someone else ?
> The issue seems also like memory leak.
>
> It happens on multiple servers (less on 6.18.6, more on 6.19-rc4+).
> All servers are doing KVM with vfio GPU PCIE passthrough and it happens
> when i am using HUGEPAGE 1GB + qemu
Okay, so we'll longterm-pin all guest memory into the iommu.
> Basically i am allocating 970GB into hugepages, leaving 37GB to kvm.
> In normal operation i have about 20GB free space but when this issue
> occurs, all RAM is taken and even when i have added 100GB swap, it was
> also consumed.
When you say hugepage you mean 1 GiB hugetlb, correct?
> It can work for days or week without issue and
>
> I did not seen that issue when i had hugepages disabled (on normal 2KB
> pages allocation in kvm).
I assume you meant 4k pages. What about 2 MiB hugetlb?
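To tell which hugetlb pools (2 MiB vs. 1 GiB) are actually configured on a host, the per-size counters under /sys/kernel/mm/hugepages can be read. A minimal sketch, assuming the standard sysfs layout (directories named hugepages-<size>kB with nr_hugepages and free_hugepages files); the helper names pool_size_kib and read_pools are made up for illustration:

```python
import os
import re

HUGEPAGES_DIR = "/sys/kernel/mm/hugepages"


def pool_size_kib(dirname):
    """Return the page size in KiB encoded in a name like 'hugepages-1048576kB'."""
    match = re.fullmatch(r"hugepages-(\d+)kB", dirname)
    return int(match.group(1)) if match else None


def read_pools(base=HUGEPAGES_DIR):
    """Map page size (KiB) -> (total pages, free pages) for each hugetlb pool."""
    pools = {}
    if not os.path.isdir(base):
        return pools  # no hugetlb support, or not running on Linux
    for name in sorted(os.listdir(base)):
        size = pool_size_kib(name)
        if size is None:
            continue
        try:
            with open(os.path.join(base, name, "nr_hugepages")) as f:
                total = int(f.read().strip())
            with open(os.path.join(base, name, "free_hugepages")) as f:
                free = int(f.read().strip())
        except (OSError, ValueError):
            continue  # skip pools we cannot read or parse
        pools[size] = (total, free)
    return pools


if __name__ == "__main__":
    for size, (total, free) in read_pools().items():
        print(f"{size // 1024} MiB pages: total={total} free={free}")
```

On a host set up as described in the report, the 1048576 kB pool would be the one holding the guest memory.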
> And i am using hugepages as it is impossible to boot VM with >200GB ram.
Oh, really? That's odd.
> When that issue happens, process ps hangs and only top shows
> something but machine needs to be rebooted due to many zombiee processes.
>
> *Hardware: *
> Motherboard: ASRockRack GENOA2D24G-2L
> CPU: 2x AMD EPYC 9654 96-Core Processor
> System ram: 1024 GB
> GPUs: 8x RTX5090 vfio passthrough
>
> root@pve14:~# uname -a
> *Linux pve14 6.18.6-pbk* #1 SMP PREEMPT_DYNAMIC Mon Jan 19 20:59:46 UTC
> 2026 x86_64 GNU/Linux
>
> [171053.341288] *BUG: unable to handle page fault for address*:
> ff469ae640000000
> [171053.341310] #PF: supervisor read access in kernel mode
> [171053.341319] #PF: error_code(0x0000) - not-present page
> [171053.341328] PGD 4602067 P4D 0
> [171053.341337] *Oops*: Oops: 0000 [#1] SMP NOPTI
> [171053.341348] CPU: 16 UID: 0 PID: 3250869 Comm: qm Not tainted 6.18.6-
> pbk #1 PREEMPT(voluntary)
> [171053.341362] Hardware name: TURIN2D24G-2L+/500W/TURIN2D24G-2L+/500W,
> BIOS 10.20 05/05/2025
> [171053.341373] RIP: 0010:*walk_pgd_range*+0x6ff/0xbb0
> [171053.341386] Code: 08 49 39 dd 0f 84 8c 01 00 00 49 89 de 49 8d 9e 00
> 00 20 00 48 8b 75 b8 48 81 e3 00 00 e0 ff 48 8d 43 ff 48 39 f0 49 0f 43
> dd <49> f7 04 24 9f ff ff ff 0f 84 e2 fd ff ff 48 8b 45 c0 41 c7 47 20
> [171053.341406] RSP: 0018:ff59d95d70e6b748 EFLAGS: 00010287
> [171053.341416] RAX: 00007a22401fffff RBX: 00007a2240200000 RCX:
> 0000000000000000
> [171053.341425] RDX: 0000000000000000 RSI: 00007a227fffffff RDI:
> 800008dfc00002b7
> [171053.341435] RBP: ff59d95d70e6b828 R08: 0000000000000080 R09:
> 0000000000000000
> [171053.341444] R10: ffffffff8de588c0 R11: 0000000000000000 R12:
> ff469ae640000000
> [171053.341454] R13: 00007a2280000000 R14: 00007a2240000000 R15:
> ff59d95d70e6b8a8
> [171053.341464] FS: 00007d4e8ec94b80(0000) GS:ff4692876ae7e000(0000)
> knlGS:0000000000000000
> [171053.341476] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [171053.341485] CR2: ff469ae640000000 CR3: 0000008241eed006 CR4:
> 0000000000f71ef0
> [171053.341495] PKRU: 55555554
> [171053.341501] Call Trace:
> [171053.341508] <TASK>
> [171053.341518] __walk_page_range+0x8e/0x220
> [171053.341529] ? sysvec_apic_timer_interrupt+0x57/0xc0
> [171053.341541] walk_page_vma+0x92/0xe0
> [171053.341551] smap_gather_stats.part.0+0x8c/0xd0
> [171053.341563] show_smaps_rollup+0x258/0x420
Hm, so someone is reading /proc/$PID/smaps_rollup and we stumble
somewhere into something unexpected while doing a page table walk.
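For reference, the userspace side of that code path is just a read of the procfs file; in the trace the reading task is "qm", but which PID it was inspecting is not visible. A minimal sketch of such a reader, assuming the usual "Key: <n> kB" line format of smaps_rollup; the function names are made up for illustration:

```python
def parse_smaps_rollup(text):
    """Parse 'Key:   <n> kB' lines into {key: KiB}; the address-range header is skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def read_smaps_rollup(pid="self"):
    """Read /proc/<pid>/smaps_rollup; the kernel walks the page tables of every VMA to fill it."""
    with open(f"/proc/{pid}/smaps_rollup") as f:
        return parse_smaps_rollup(f.read())
```

Calling read_smaps_rollup(pid) against a QEMU process is the kind of read that ends up in show_smaps_rollup(); the Shared_Hugetlb and Private_Hugetlb fields are the ones that account hugetlb mappings.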
> [171053.341288] BUG: unable to handle page fault for address: ff469ae640000000
> [171053.341310] #PF: supervisor read access in kernel mode
> [171053.341319] #PF: error_code(0x0000) - not-present page
> [171053.341328] PGD 4602067 P4D 0
There is not a lot of information there :(
Did you have other splats/symptoms or was it always that?
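The little that the #PF lines do say comes from the x86 page-fault error code. A minimal sketch of decoding its low bits, assuming the architectural bit layout (the constant names mirror the kernel's X86_PF_* flags; describe_pf_error_code is a made-up helper, and the output wording is not the kernel's exact message format):

```python
X86_PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
X86_PF_WRITE = 1 << 1  # 0: read access, 1: write access
X86_PF_USER = 1 << 2   # 0: supervisor access, 1: user access
X86_PF_RSVD = 1 << 3   # reserved bit set in a paging-structure entry
X86_PF_INSTR = 1 << 4  # fault was an instruction fetch


def describe_pf_error_code(code):
    """Return a short human-readable summary of an x86 #PF error code."""
    who = "user" if code & X86_PF_USER else "supervisor"
    if code & X86_PF_INSTR:
        what = "instruction fetch"
    elif code & X86_PF_WRITE:
        what = "write access"
    else:
        what = "read access"
    if code & X86_PF_RSVD:
        why = "reserved bit violation"
    elif code & X86_PF_PROT:
        why = "protection violation"
    else:
        why = "not-present page"
    return f"{who} {what}, {why}"


if __name__ == "__main__":
    print(describe_pf_error_code(0x0000))
```

For error_code(0x0000) this gives a supervisor read of a not-present page, matching the two #PF lines in the report: the kernel read through a pointer (CR2 = ff469ae640000000) that is not mapped.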
--
Cheers,
David