* [PATCH] mm: prepare anon_vma before swapin rmap
From: ZhengYuan Huang @ 2026-04-17 1:16 UTC (permalink / raw)
To: akpm, david, ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko, willy
Cc: linux-mm, linux-kernel, baijiaju1990, r33s3n6, zzzccc427,
ZhengYuan Huang
[BUG]
madvise(MADV_HWPOISON) can fault a swap entry back in through
get_user_pages_fast() and hit:
kernel BUG at mm/rmap.c:1364!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
RIP: 0010:__folio_set_anon mm/rmap.c:1364 [inline]
RIP: 0010:folio_add_new_anon_rmap+0x41d/0xc10 mm/rmap.c:1553
Call Trace:
do_swap_page+0x14c5/0x5c30 mm/memory.c:4963
handle_pte_fault mm/memory.c:6198 [inline]
__handle_mm_fault+0x1512/0x22f0 mm/memory.c:6336
handle_mm_fault+0x42f/0x820 mm/memory.c:6505
faultin_page mm/gup.c:1126 [inline]
__get_user_pages+0x4b3/0x3400 mm/gup.c:1428
__get_user_pages_locked mm/gup.c:1692 [inline]
__gup_longterm_locked+0x945/0x14b0 mm/gup.c:2476
gup_fast_fallback+0x8a3/0x2440 mm/gup.c:3220
get_user_pages_fast+0x64/0xb0 mm/gup.c:3298
madvise_inject_error mm/madvise.c:1456 [inline]
madvise_do_behavior+0x503/0x860 mm/madvise.c:1875
do_madvise+0x17e/0x210 mm/madvise.c:1978
__do_sys_madvise mm/madvise.c:1987 [inline]
__se_sys_madvise mm/madvise.c:1985 [inline]
__x64_sys_madvise+0xae/0x120 mm/madvise.c:1985
...
[CAUSE]
Commit a373baed5a9d ("mm: delay the check for a NULL anon_vma") moved
anon_vma preparation out of the generic fault path and into the fault
handlers that actually need to install anonymous rmap state.
do_swap_page() was left behind. It can still restore anonymous mappings
via folio_add_new_anon_rmap() or folio_add_anon_rmap_ptes(), but it does
not call vmf_anon_prepare() first. On a VMA-lock fault this can leave
vma->anon_vma NULL all the way down to __folio_set_anon(), which BUG_ONs
on that violated invariant.
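For reference, the assertion that fires is the NULL-anon_vma check at the
top of __folio_set_anon(). A simplified sketch (not the verbatim source;
the mm/rmap.c:1364 line number matches the trace above on v6.18 only):

```c
/*
 * Simplified sketch of mm/rmap.c:__folio_set_anon(); body elided.
 * The BUG_ON() below is what fires at mm/rmap.c:1364 in the trace.
 */
static void __folio_set_anon(struct folio *folio, struct vm_area_struct *vma,
			     unsigned long address, bool exclusive)
{
	struct anon_vma *anon_vma = vma->anon_vma;

	/* Callers must have prepared vma->anon_vma, e.g. via vmf_anon_prepare(). */
	BUG_ON(!anon_vma);
	...
}
```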
[FIX]
Prepare the faulting VMA's anon_vma once do_swap_page() has confirmed it
is handling a real swap entry, before any swapin path can install
anonymous rmap state.
vmf_anon_prepare() already handles the retry rules for VMA-lock faults,
and when anon_vma is already present this stays a single likely branch in
the swap fault hot path.
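The retry behaviour referred to above can be sketched as follows (a
simplified model of vmf_anon_prepare()'s semantics, not the verbatim
mm/memory.c implementation; exact locking details differ by release):

```c
/* Sketch: anon_vma present is the likely, zero-cost case; under a
 * VMA-lock fault we cannot safely allocate one, so fall back to a
 * retry under mmap_lock; otherwise allocate it now. */
vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;

	if (likely(vma->anon_vma))
		return 0;
	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
		vma_end_read(vma);
		return VM_FAULT_RETRY;
	}
	if (__anon_vma_prepare(vma))
		return VM_FAULT_OOM;
	return 0;
}
```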
Fixes: a373baed5a9d ("mm: delay the check for a NULL anon_vma")
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
---
I can reproduce this issue deterministically on v6.18, but I have not
been able to reproduce it with the same setup on next-20260415.
However, I have not identified a change that clearly explains the
difference. From code inspection, do_swap_page() still appears able to
reach folio_add_new_anon_rmap()/folio_add_anon_rmap_ptes() without a
prior vmf_anon_prepare(), while __folio_set_anon() still BUG_ONs if
vma->anon_vma is NULL. So although I could not reproduce the issue on
next-20260415, I also could not confirm that it has been fixed there.
Recent changes around the swap fault path may have affected the
reproduction conditions, but I may also be missing another relevant
change.
---
mm/memory.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/memory.c b/mm/memory.c
index ea6568571131..a64bc9826cc5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4850,6 +4850,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
goto out;
}
+ /* Swapin installs anonymous rmap state into the faulting VMA. */
+ ret = vmf_anon_prepare(vmf);
+ if (ret)
+ goto out;
+
/* Prevent swapoff from happening to us. */
si = get_swap_device(entry);
if (unlikely(!si))
--
2.49.0
* Re: [PATCH] mm: prepare anon_vma before swapin rmap
From: Matthew Wilcox @ 2026-04-17 4:03 UTC (permalink / raw)
To: ZhengYuan Huang
Cc: akpm, david, ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko,
linux-mm, linux-kernel, baijiaju1990, r33s3n6, zzzccc427
On Fri, Apr 17, 2026 at 09:16:06AM +0800, ZhengYuan Huang wrote:
> Commit a373baed5a9d ("mm: delay the check for a NULL anon_vma") moved
> anon_vma preparation out of the generic fault path and into the fault
> handlers that actually need to install anonymous rmap state.
>
> do_swap_page() was left behind. It can still restore anonymous mappings
> via folio_add_new_anon_rmap() or folio_add_anon_rmap_ptes(), but it does
> not call vmf_anon_prepare() first. On a VMA-lock fault this can leave
> vma->anon_vma NULL all the way down to __folio_set_anon(), which BUG_ONs
> on that violated invariant.
Huh. Can you share your reproducer? I wonder if there's an equivalent
problem with do_numa_fault(). And maybe the right solution is to put
the call to vmf_anon_prepare() in handle_pte_fault() instead.
I'm asking because I don't quite understand how we get to this point
without an anon_vma being assigned to this VMA. We should allocate one on
the first fault ... so we cannot have ever faulted, but if we've never
faulted, how does madvise() manage to swap out a page if none has been
allocated?
(also if you share your reproducer, perhaps someone will add it to the
self-tests and maybe it'll prevent another bug in the future)
* Re: [PATCH] mm: prepare anon_vma before swapin rmap
From: David Hildenbrand (Arm) @ 2026-04-17 10:53 UTC (permalink / raw)
To: ZhengYuan Huang, akpm, ljs, Liam.Howlett, vbabka, rppt, surenb,
mhocko, willy
Cc: linux-mm, linux-kernel, baijiaju1990, r33s3n6, zzzccc427
On 4/17/26 03:16, ZhengYuan Huang wrote:
> [BUG]
> madvise(MADV_HWPOISON) can fault a swap entry back in through
> get_user_pages_fast() and hit:
>
> kernel BUG at mm/rmap.c:1364!
> Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
> RIP: 0010:__folio_set_anon mm/rmap.c:1364 [inline]
> RIP: 0010:folio_add_new_anon_rmap+0x41d/0xc10 mm/rmap.c:1553
> Call Trace:
> do_swap_page+0x14c5/0x5c30 mm/memory.c:4963
> handle_pte_fault mm/memory.c:6198 [inline]
> __handle_mm_fault+0x1512/0x22f0 mm/memory.c:6336
> handle_mm_fault+0x42f/0x820 mm/memory.c:6505
> faultin_page mm/gup.c:1126 [inline]
> __get_user_pages+0x4b3/0x3400 mm/gup.c:1428
> __get_user_pages_locked mm/gup.c:1692 [inline]
> __gup_longterm_locked+0x945/0x14b0 mm/gup.c:2476
> gup_fast_fallback+0x8a3/0x2440 mm/gup.c:3220
> get_user_pages_fast+0x64/0xb0 mm/gup.c:3298
> madvise_inject_error mm/madvise.c:1456 [inline]
> madvise_do_behavior+0x503/0x860 mm/madvise.c:1875
> do_madvise+0x17e/0x210 mm/madvise.c:1978
> __do_sys_madvise mm/madvise.c:1987 [inline]
> __se_sys_madvise mm/madvise.c:1985 [inline]
> __x64_sys_madvise+0xae/0x120 mm/madvise.c:1985
> ...
>
> [CAUSE]
> Commit a373baed5a9d ("mm: delay the check for a NULL anon_vma") moved
> anon_vma preparation out of the generic fault path and into the fault
> handlers that actually need to install anonymous rmap state.
>
> do_swap_page() was left behind. It can still restore anonymous mappings
> via folio_add_new_anon_rmap() or folio_add_anon_rmap_ptes(), but it does
> not call vmf_anon_prepare() first. On a VMA-lock fault this can leave
> vma->anon_vma NULL all the way down to __folio_set_anon(), which BUG_ONs
> on that violated invariant.
>
> [FIX]
> Prepare the faulting VMA's anon_vma once do_swap_page() has confirmed it
> is handling a real swap entry, before any swapin path can install
> anonymous rmap state.
>
> vmf_anon_prepare() already handles the retry rules for VMA-lock faults,
> and when anon_vma is already present this stays a single likely branch in
> the swap fault hot path.
>
> Fixes: a373baed5a9d ("mm: delay the check for a NULL anon_vma")
> Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
> ---
> I can reproduce this issue deterministically on v6.18, but I have not
> been able to reproduce it with the same setup on next-20260415.
>
> However, I have not identified a change that clearly explains the
> difference. From code inspection, do_swap_page() still appears able to
> reach folio_add_new_anon_rmap()/folio_add_anon_rmap_ptes() without a
> prior vmf_anon_prepare()
If there is an anon page swapped out, then certainly at the allocation
time of that anon folio, an anon_vma would have to have been attached to
the VMA.
During fork, the child's anon_vma would have to be created as well.
Something is off here.
--
Cheers,
David
* Re: [PATCH] mm: prepare anon_vma before swapin rmap
From: David Hildenbrand (Arm) @ 2026-04-17 11:57 UTC (permalink / raw)
To: ZhengYuan Huang, akpm, ljs, Liam.Howlett, vbabka, rppt, surenb,
mhocko, willy
Cc: linux-mm, linux-kernel, baijiaju1990, r33s3n6, zzzccc427
On 4/17/26 12:53, David Hildenbrand (Arm) wrote:
> On 4/17/26 03:16, ZhengYuan Huang wrote:
>> [BUG]
>> madvise(MADV_HWPOISON) can fault a swap entry back in through
>> get_user_pages_fast() and hit:
>>
>> kernel BUG at mm/rmap.c:1364!
>> Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
>> RIP: 0010:__folio_set_anon mm/rmap.c:1364 [inline]
>> RIP: 0010:folio_add_new_anon_rmap+0x41d/0xc10 mm/rmap.c:1553
>> Call Trace:
>> do_swap_page+0x14c5/0x5c30 mm/memory.c:4963
>> handle_pte_fault mm/memory.c:6198 [inline]
>> __handle_mm_fault+0x1512/0x22f0 mm/memory.c:6336
>> handle_mm_fault+0x42f/0x820 mm/memory.c:6505
>> faultin_page mm/gup.c:1126 [inline]
>> __get_user_pages+0x4b3/0x3400 mm/gup.c:1428
>> __get_user_pages_locked mm/gup.c:1692 [inline]
>> __gup_longterm_locked+0x945/0x14b0 mm/gup.c:2476
>> gup_fast_fallback+0x8a3/0x2440 mm/gup.c:3220
>> get_user_pages_fast+0x64/0xb0 mm/gup.c:3298
>> madvise_inject_error mm/madvise.c:1456 [inline]
>> madvise_do_behavior+0x503/0x860 mm/madvise.c:1875
>> do_madvise+0x17e/0x210 mm/madvise.c:1978
>> __do_sys_madvise mm/madvise.c:1987 [inline]
>> __se_sys_madvise mm/madvise.c:1985 [inline]
>> __x64_sys_madvise+0xae/0x120 mm/madvise.c:1985
>> ...
>>
>> [CAUSE]
>> Commit a373baed5a9d ("mm: delay the check for a NULL anon_vma") moved
>> anon_vma preparation out of the generic fault path and into the fault
>> handlers that actually need to install anonymous rmap state.
>>
>> do_swap_page() was left behind. It can still restore anonymous mappings
>> via folio_add_new_anon_rmap() or folio_add_anon_rmap_ptes(), but it does
>> not call vmf_anon_prepare() first. On a VMA-lock fault this can leave
>> vma->anon_vma NULL all the way down to __folio_set_anon(), which BUG_ONs
>> on that violated invariant.
>>
>> [FIX]
>> Prepare the faulting VMA's anon_vma once do_swap_page() has confirmed it
>> is handling a real swap entry, before any swapin path can install
>> anonymous rmap state.
>>
>> vmf_anon_prepare() already handles the retry rules for VMA-lock faults,
>> and when anon_vma is already present this stays a single likely branch in
>> the swap fault hot path.
>>
>> Fixes: a373baed5a9d ("mm: delay the check for a NULL anon_vma")
>> Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
>> ---
>> I can reproduce this issue deterministically on v6.18, but I have not
>> been able to reproduce it with the same setup on next-20260415.
>>
>> However, I have not identified a change that clearly explains the
>> difference. From code inspection, do_swap_page() still appears able to
>> reach folio_add_new_anon_rmap()/folio_add_anon_rmap_ptes() without a
>> prior vmf_anon_prepare()
>
>
> If there is an anon page swapped out, certainly at the allocation time
> of that anon folio, there would have to be a anon_rmap allocated.
>
> During fork, the anon_rmap would have to be created as well.
>
> Something is off here.
>
Just speculating, we had
commit 3b617fd3d317bf9dd7e2c233e56eafef05734c9d
Author: Lorenzo Stoakes <ljs@kernel.org>
Date: Mon Jan 5 20:11:49 2026 +0000
mm/vma: enforce VMA fork limit on unfaulted,faulted mremap merge too
That went into v6.19.
Maybe there was a scenario where we could have lost vma->anon_vma during
a merge, resulting in a swapped-out page in a VMA without an anon_vma.
If this cannot be reproduced on 6.19+, there is nothing to worry about.
--
Cheers,
David
* Re: [PATCH] mm: prepare anon_vma before swapin rmap
From: Matthew Wilcox @ 2026-04-17 13:03 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: ZhengYuan Huang, akpm, ljs, Liam.Howlett, vbabka, rppt, surenb,
mhocko, linux-mm, linux-kernel, baijiaju1990, r33s3n6, zzzccc427
On Fri, Apr 17, 2026 at 01:57:59PM +0200, David Hildenbrand (Arm) wrote:
> On 4/17/26 12:53, David Hildenbrand (Arm) wrote:
> > On 4/17/26 03:16, ZhengYuan Huang wrote:
> >> [BUG]
> >> madvise(MADV_HWPOISON) can fault a swap entry back in through
> >> get_user_pages_fast() and hit:
...
> >> I can reproduce this issue deterministically on v6.18, but I have not
> >> been able to reproduce it with the same setup on next-20260415.
>
> Just speculating, we had
>
> commit 3b617fd3d317bf9dd7e2c233e56eafef05734c9d
> Author: Lorenzo Stoakes <ljs@kernel.org>
> Date: Mon Jan 5 20:11:49 2026 +0000
>
> mm/vma: enforce VMA fork limit on unfaulted,faulted mremap merge too
>
> Go into v6.19.
>
> Maybe there was a scenario where we could have lost vma->anon_vma during
> a merge, resulting in a swapped page in an anon_vma.
>
> If this cannot be reproduced on 6.19+,there is nothing to worry about.
... except that 6.18 is LTS so we need a fix for that kernel version.
And maybe 6.12 as well (a373baed5a9d went into 6.9, so no need to
go further back than that).