linux-mm.kvack.org archive mirror
* [PATCH] mm: prepare anon_vma before swapin rmap
@ 2026-04-17  1:16 ZhengYuan Huang
  2026-04-17  4:03 ` Matthew Wilcox
  2026-04-17 10:53 ` David Hildenbrand (Arm)
  0 siblings, 2 replies; 7+ messages in thread
From: ZhengYuan Huang @ 2026-04-17  1:16 UTC (permalink / raw)
  To: akpm, david, ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko, willy
  Cc: linux-mm, linux-kernel, baijiaju1990, r33s3n6, zzzccc427,
	ZhengYuan Huang

[BUG]
madvise(MADV_HWPOISON) can fault a swap entry back in through
get_user_pages_fast() and hit:

kernel BUG at mm/rmap.c:1364!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
RIP: 0010:__folio_set_anon mm/rmap.c:1364 [inline]
RIP: 0010:folio_add_new_anon_rmap+0x41d/0xc10 mm/rmap.c:1553
Call Trace:
 do_swap_page+0x14c5/0x5c30 mm/memory.c:4963
 handle_pte_fault mm/memory.c:6198 [inline]
 __handle_mm_fault+0x1512/0x22f0 mm/memory.c:6336
 handle_mm_fault+0x42f/0x820 mm/memory.c:6505
 faultin_page mm/gup.c:1126 [inline]
 __get_user_pages+0x4b3/0x3400 mm/gup.c:1428
 __get_user_pages_locked mm/gup.c:1692 [inline]
 __gup_longterm_locked+0x945/0x14b0 mm/gup.c:2476
 gup_fast_fallback+0x8a3/0x2440 mm/gup.c:3220
 get_user_pages_fast+0x64/0xb0 mm/gup.c:3298
 madvise_inject_error mm/madvise.c:1456 [inline]
 madvise_do_behavior+0x503/0x860 mm/madvise.c:1875
 do_madvise+0x17e/0x210 mm/madvise.c:1978
 __do_sys_madvise mm/madvise.c:1987 [inline]
 __se_sys_madvise mm/madvise.c:1985 [inline]
 __x64_sys_madvise+0xae/0x120 mm/madvise.c:1985
 ...

[CAUSE]
Commit a373baed5a9d ("mm: delay the check for a NULL anon_vma") moved
anon_vma preparation out of the generic fault path and into the fault
handlers that actually need to install anonymous rmap state.

do_swap_page() was left behind. It can still restore anonymous mappings
via folio_add_new_anon_rmap() or folio_add_anon_rmap_ptes(), but it does
not call vmf_anon_prepare() first. On a VMA-lock fault this can leave
vma->anon_vma NULL all the way down to __folio_set_anon(), which BUG_ONs
on that violated invariant.

[FIX]
Prepare the faulting VMA's anon_vma once do_swap_page() has confirmed it
is handling a real swap entry, before any swapin path can install
anonymous rmap state.

vmf_anon_prepare() already handles the retry rules for VMA-lock faults,
and when anon_vma is already present this stays a single likely branch in
the swap fault hot path.

Fixes: a373baed5a9d ("mm: delay the check for a NULL anon_vma")
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
---
I can reproduce this issue deterministically on v6.18, but I have not
been able to reproduce it with the same setup on next-20260415.

However, I have not identified a change that clearly explains the
difference. From code inspection, do_swap_page() still appears able to
reach folio_add_new_anon_rmap()/folio_add_anon_rmap_ptes() without a
prior vmf_anon_prepare(), while __folio_set_anon() still BUG_ONs if
vma->anon_vma is NULL. So although I could not reproduce the issue on
next-20260415, I also could not confirm that it has been fixed there.
Recent changes around the swap fault path may have affected the
reproduction conditions, but I may also be missing another relevant
change.
---
mm/memory.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index ea6568571131..a64bc9826cc5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4850,6 +4850,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out;
 	}
 
+	/* Swapin installs anonymous rmap state into the faulting VMA. */
+	ret = vmf_anon_prepare(vmf);
+	if (ret)
+		goto out;
+
 	/* Prevent swapoff from happening to us. */
 	si = get_swap_device(entry);
 	if (unlikely(!si))
-- 
2.49.0
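
The ordering argument in [CAUSE] and [FIX] can be illustrated with a small
userspace model. This is only a sketch: every type and function prefixed
with model_ is hypothetical and stands in for the kernel's vm_fault,
vm_area_struct, vmf_anon_prepare() and __folio_set_anon(). It assumes, as
a simplification, that a VMA-lock fault with no anon_vma always asks for a
retry; the real helper may behave differently depending on kernel version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; these are not the real layouts. */
struct model_anon_vma { int placeholder; };
struct model_vma { struct model_anon_vma *anon_vma; };
struct model_fault {
	struct model_vma *vma;
	bool vma_locked;	/* models a fault taken under the per-VMA lock */
};

enum model_result { MODEL_OK, MODEL_RETRY, MODEL_BUG };

static struct model_anon_vma model_shared_anon_vma;

/*
 * Models the role of vmf_anon_prepare(): make sure vma->anon_vma exists.
 * Simplification (assumption): a VMA-lock fault without an anon_vma
 * always asks the caller to retry.
 */
static enum model_result model_anon_prepare(struct model_fault *f)
{
	if (f->vma->anon_vma)
		return MODEL_OK;	/* common case: a single branch */
	if (f->vma_locked)
		return MODEL_RETRY;
	f->vma->anon_vma = &model_shared_anon_vma;
	return MODEL_OK;
}

/* Models the invariant checked in __folio_set_anon(): anon_vma must be set. */
static enum model_result model_set_anon(const struct model_vma *vma)
{
	return vma->anon_vma ? MODEL_OK : MODEL_BUG;
}

/* Models the swapin path with or without the preparation step added by the patch. */
static enum model_result model_swapin(struct model_fault *f, bool prepare_first)
{
	assert(f != NULL && f->vma != NULL);
	if (prepare_first) {
		enum model_result ret = model_anon_prepare(f);

		if (ret != MODEL_OK)
			return ret;
	}
	return model_set_anon(f->vma);
}
```

Calling model_swapin() with prepare_first set to false on a VMA whose
anon_vma is NULL yields MODEL_BUG, which corresponds to the BUG_ON in the
report. With prepare_first set to true, the same call yields either
MODEL_OK or MODEL_RETRY and never reaches the invariant check with a NULL
anon_vma.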



* Re: [PATCH] mm: prepare anon_vma before swapin rmap
  2026-04-17  1:16 [PATCH] mm: prepare anon_vma before swapin rmap ZhengYuan Huang
@ 2026-04-17  4:03 ` Matthew Wilcox
  2026-04-17 10:53 ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2026-04-17  4:03 UTC (permalink / raw)
  To: ZhengYuan Huang
  Cc: akpm, david, ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	linux-mm, linux-kernel, baijiaju1990, r33s3n6, zzzccc427

On Fri, Apr 17, 2026 at 09:16:06AM +0800, ZhengYuan Huang wrote:
> Commit a373baed5a9d ("mm: delay the check for a NULL anon_vma") moved
> anon_vma preparation out of the generic fault path and into the fault
> handlers that actually need to install anonymous rmap state.
> 
> do_swap_page() was left behind. It can still restore anonymous mappings
> via folio_add_new_anon_rmap() or folio_add_anon_rmap_ptes(), but it does
> not call vmf_anon_prepare() first. On a VMA-lock fault this can leave
> vma->anon_vma NULL all the way down to __folio_set_anon(), which BUG_ONs
> on that violated invariant.

Huh.  Can you share your reproducer?  I wonder if there's an equivalent
problem with do_numa_page().  And maybe the right solution is to put the
call to vmf_anon_prepare() in handle_pte_fault() instead.

I'm asking because I don't quite understand how we get to this point
without an anon_vma being assigned to this VMA.  We should allocate one on
the first fault ... so we cannot have ever faulted, but if we've never
faulted, how does madvise() manage to swap out a page if none has been
allocated?

(also if you share your reproducer, perhaps someone will add it to the
self-tests and maybe it'll prevent another bug in the future)



* Re: [PATCH] mm: prepare anon_vma before swapin rmap
  2026-04-17  1:16 [PATCH] mm: prepare anon_vma before swapin rmap ZhengYuan Huang
  2026-04-17  4:03 ` Matthew Wilcox
@ 2026-04-17 10:53 ` David Hildenbrand (Arm)
  2026-04-17 11:57   ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 7+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-17 10:53 UTC (permalink / raw)
  To: ZhengYuan Huang, akpm, ljs, Liam.Howlett, vbabka, rppt, surenb,
	mhocko, willy
  Cc: linux-mm, linux-kernel, baijiaju1990, r33s3n6, zzzccc427

On 4/17/26 03:16, ZhengYuan Huang wrote:
> [BUG]
> madvise(MADV_HWPOISON) can fault a swap entry back in through
> get_user_pages_fast() and hit:
> 
> kernel BUG at mm/rmap.c:1364!
> Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
> RIP: 0010:__folio_set_anon mm/rmap.c:1364 [inline]
> RIP: 0010:folio_add_new_anon_rmap+0x41d/0xc10 mm/rmap.c:1553
> Call Trace:
>  do_swap_page+0x14c5/0x5c30 mm/memory.c:4963
>  handle_pte_fault mm/memory.c:6198 [inline]
>  __handle_mm_fault+0x1512/0x22f0 mm/memory.c:6336
>  handle_mm_fault+0x42f/0x820 mm/memory.c:6505
>  faultin_page mm/gup.c:1126 [inline]
>  __get_user_pages+0x4b3/0x3400 mm/gup.c:1428
>  __get_user_pages_locked mm/gup.c:1692 [inline]
>  __gup_longterm_locked+0x945/0x14b0 mm/gup.c:2476
>  gup_fast_fallback+0x8a3/0x2440 mm/gup.c:3220
>  get_user_pages_fast+0x64/0xb0 mm/gup.c:3298
>  madvise_inject_error mm/madvise.c:1456 [inline]
>  madvise_do_behavior+0x503/0x860 mm/madvise.c:1875
>  do_madvise+0x17e/0x210 mm/madvise.c:1978
>  __do_sys_madvise mm/madvise.c:1987 [inline]
>  __se_sys_madvise mm/madvise.c:1985 [inline]
>  __x64_sys_madvise+0xae/0x120 mm/madvise.c:1985
>  ...
> 
> [CAUSE]
> Commit a373baed5a9d ("mm: delay the check for a NULL anon_vma") moved
> anon_vma preparation out of the generic fault path and into the fault
> handlers that actually need to install anonymous rmap state.
> 
> do_swap_page() was left behind. It can still restore anonymous mappings
> via folio_add_new_anon_rmap() or folio_add_anon_rmap_ptes(), but it does
> not call vmf_anon_prepare() first. On a VMA-lock fault this can leave
> vma->anon_vma NULL all the way down to __folio_set_anon(), which BUG_ONs
> on that violated invariant.
> 
> [FIX]
> Prepare the faulting VMA's anon_vma once do_swap_page() has confirmed it
> is handling a real swap entry, before any swapin path can install
> anonymous rmap state.
> 
> vmf_anon_prepare() already handles the retry rules for VMA-lock faults,
> and when anon_vma is already present this stays a single likely branch in
> the swap fault hot path.
> 
> Fixes: a373baed5a9d ("mm: delay the check for a NULL anon_vma")
> Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
> ---
> I can reproduce this issue deterministically on v6.18, but I have not
> been able to reproduce it with the same setup on next-20260415.
> 
> However, I have not identified a change that clearly explains the
> difference. From code inspection, do_swap_page() still appears able to
> reach folio_add_new_anon_rmap()/folio_add_anon_rmap_ptes() without a
> prior vmf_anon_prepare()


If an anon page has been swapped out, then an anon_vma must already
have been allocated when that anon folio was allocated.

During fork, the anon_vma would have to be created as well.

Something is off here.

-- 
Cheers,

David



* Re: [PATCH] mm: prepare anon_vma before swapin rmap
  2026-04-17 10:53 ` David Hildenbrand (Arm)
@ 2026-04-17 11:57   ` David Hildenbrand (Arm)
  2026-04-17 13:03     ` Matthew Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-17 11:57 UTC (permalink / raw)
  To: ZhengYuan Huang, akpm, ljs, Liam.Howlett, vbabka, rppt, surenb,
	mhocko, willy
  Cc: linux-mm, linux-kernel, baijiaju1990, r33s3n6, zzzccc427

On 4/17/26 12:53, David Hildenbrand (Arm) wrote:
> On 4/17/26 03:16, ZhengYuan Huang wrote:
>> [BUG]
>> madvise(MADV_HWPOISON) can fault a swap entry back in through
>> get_user_pages_fast() and hit:
>>
>> kernel BUG at mm/rmap.c:1364!
>> Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
>> RIP: 0010:__folio_set_anon mm/rmap.c:1364 [inline]
>> RIP: 0010:folio_add_new_anon_rmap+0x41d/0xc10 mm/rmap.c:1553
>> Call Trace:
>>  do_swap_page+0x14c5/0x5c30 mm/memory.c:4963
>>  handle_pte_fault mm/memory.c:6198 [inline]
>>  __handle_mm_fault+0x1512/0x22f0 mm/memory.c:6336
>>  handle_mm_fault+0x42f/0x820 mm/memory.c:6505
>>  faultin_page mm/gup.c:1126 [inline]
>>  __get_user_pages+0x4b3/0x3400 mm/gup.c:1428
>>  __get_user_pages_locked mm/gup.c:1692 [inline]
>>  __gup_longterm_locked+0x945/0x14b0 mm/gup.c:2476
>>  gup_fast_fallback+0x8a3/0x2440 mm/gup.c:3220
>>  get_user_pages_fast+0x64/0xb0 mm/gup.c:3298
>>  madvise_inject_error mm/madvise.c:1456 [inline]
>>  madvise_do_behavior+0x503/0x860 mm/madvise.c:1875
>>  do_madvise+0x17e/0x210 mm/madvise.c:1978
>>  __do_sys_madvise mm/madvise.c:1987 [inline]
>>  __se_sys_madvise mm/madvise.c:1985 [inline]
>>  __x64_sys_madvise+0xae/0x120 mm/madvise.c:1985
>>  ...
>>
>> [CAUSE]
>> Commit a373baed5a9d ("mm: delay the check for a NULL anon_vma") moved
>> anon_vma preparation out of the generic fault path and into the fault
>> handlers that actually need to install anonymous rmap state.
>>
>> do_swap_page() was left behind. It can still restore anonymous mappings
>> via folio_add_new_anon_rmap() or folio_add_anon_rmap_ptes(), but it does
>> not call vmf_anon_prepare() first. On a VMA-lock fault this can leave
>> vma->anon_vma NULL all the way down to __folio_set_anon(), which BUG_ONs
>> on that violated invariant.
>>
>> [FIX]
>> Prepare the faulting VMA's anon_vma once do_swap_page() has confirmed it
>> is handling a real swap entry, before any swapin path can install
>> anonymous rmap state.
>>
>> vmf_anon_prepare() already handles the retry rules for VMA-lock faults,
>> and when anon_vma is already present this stays a single likely branch in
>> the swap fault hot path.
>>
>> Fixes: a373baed5a9d ("mm: delay the check for a NULL anon_vma")
>> Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
>> ---
>> I can reproduce this issue deterministically on v6.18, but I have not
>> been able to reproduce it with the same setup on next-20260415.
>>
>> However, I have not identified a change that clearly explains the
>> difference. From code inspection, do_swap_page() still appears able to
>> reach folio_add_new_anon_rmap()/folio_add_anon_rmap_ptes() without a
>> prior vmf_anon_prepare()
> 
> 
> If an anon page has been swapped out, then an anon_vma must already
> have been allocated when that anon folio was allocated.
> 
> During fork, the anon_vma would have to be created as well.
> 
> Something is off here.
> 

Just speculating, we had

commit 3b617fd3d317bf9dd7e2c233e56eafef05734c9d
Author: Lorenzo Stoakes <ljs@kernel.org>
Date:   Mon Jan 5 20:11:49 2026 +0000

    mm/vma: enforce VMA fork limit on unfaulted,faulted mremap merge too

go into v6.19.

Maybe there was a scenario where we could have lost vma->anon_vma during
a merge, resulting in a swapped-out page in a VMA without an anon_vma.

If this cannot be reproduced on 6.19+, there is nothing to worry about.

-- 
Cheers,

David



* Re: [PATCH] mm: prepare anon_vma before swapin rmap
  2026-04-17 11:57   ` David Hildenbrand (Arm)
@ 2026-04-17 13:03     ` Matthew Wilcox
  2026-04-17 13:36       ` Vlastimil Babka (SUSE)
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2026-04-17 13:03 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: ZhengYuan Huang, akpm, ljs, Liam.Howlett, vbabka, rppt, surenb,
	mhocko, linux-mm, linux-kernel, baijiaju1990, r33s3n6, zzzccc427

On Fri, Apr 17, 2026 at 01:57:59PM +0200, David Hildenbrand (Arm) wrote:
> On 4/17/26 12:53, David Hildenbrand (Arm) wrote:
> > On 4/17/26 03:16, ZhengYuan Huang wrote:
> >> [BUG]
> >> madvise(MADV_HWPOISON) can fault a swap entry back in through
> >> get_user_pages_fast() and hit:
...
> >> I can reproduce this issue deterministically on v6.18, but I have not
> >> been able to reproduce it with the same setup on next-20260415.
> 
> Just speculating, we had
> 
> commit 3b617fd3d317bf9dd7e2c233e56eafef05734c9d
> Author: Lorenzo Stoakes <ljs@kernel.org>
> Date:   Mon Jan 5 20:11:49 2026 +0000
> 
>     mm/vma: enforce VMA fork limit on unfaulted,faulted mremap merge too
> 
> go into v6.19.
> 
> Maybe there was a scenario where we could have lost vma->anon_vma during
> a merge, resulting in a swapped-out page in a VMA without an anon_vma.
> 
> If this cannot be reproduced on 6.19+, there is nothing to worry about.

... except that 6.18 is LTS so we need a fix for that kernel version.
And maybe 6.12 as well (a373baed5a9d went into 6.9, so no need to
go further back than that)



* Re: [PATCH] mm: prepare anon_vma before swapin rmap
  2026-04-17 13:03     ` Matthew Wilcox
@ 2026-04-17 13:36       ` Vlastimil Babka (SUSE)
  2026-04-17 15:09         ` Matthew Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-04-17 13:36 UTC (permalink / raw)
  To: Matthew Wilcox, David Hildenbrand (Arm)
  Cc: ZhengYuan Huang, akpm, ljs, Liam.Howlett, rppt, surenb, mhocko,
	linux-mm, linux-kernel, baijiaju1990, r33s3n6, zzzccc427

On 4/17/26 15:03, Matthew Wilcox wrote:
> On Fri, Apr 17, 2026 at 01:57:59PM +0200, David Hildenbrand (Arm) wrote:
>> On 4/17/26 12:53, David Hildenbrand (Arm) wrote:
>> > On 4/17/26 03:16, ZhengYuan Huang wrote:
>> >> [BUG]
>> >> madvise(MADV_HWPOISON) can fault a swap entry back in through
>> >> get_user_pages_fast() and hit:
> ...
>> >> I can reproduce this issue deterministically on v6.18, but I have not
>> >> been able to reproduce it with the same setup on next-20260415.
>> 
>> Just speculating, we had
>> 
>> commit 3b617fd3d317bf9dd7e2c233e56eafef05734c9d
>> Author: Lorenzo Stoakes <ljs@kernel.org>
>> Date:   Mon Jan 5 20:11:49 2026 +0000
>> 
>>     mm/vma: enforce VMA fork limit on unfaulted,faulted mremap merge too
>> 
>> go into v6.19.
>> 
>> Maybe there was a scenario where we could have lost vma->anon_vma during
>> a merge, resulting in a swapped-out page in a VMA without an anon_vma.
>> 
>> If this cannot be reproduced on 6.19+, there is nothing to worry about.
> 
> ... except that 6.18 is LTS so we need a fix for that kernel version.

It's there: https://kernel.dance/#3b617fd3d317bf9dd7e2c233e56eafef05734c9d

> And maybe 6.12 as well (a373baed5a9d went into 6.9, so no need to
> go further back than that)

Not there (yet?) but it was tagged stable.



* Re: [PATCH] mm: prepare anon_vma before swapin rmap
  2026-04-17 13:36       ` Vlastimil Babka (SUSE)
@ 2026-04-17 15:09         ` Matthew Wilcox
  0 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2026-04-17 15:09 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE)
  Cc: David Hildenbrand (Arm),
	ZhengYuan Huang, akpm, ljs, Liam.Howlett, rppt, surenb, mhocko,
	linux-mm, linux-kernel, baijiaju1990, r33s3n6, zzzccc427

On Fri, Apr 17, 2026 at 03:36:59PM +0200, Vlastimil Babka (SUSE) wrote:
> On 4/17/26 15:03, Matthew Wilcox wrote:
> > On Fri, Apr 17, 2026 at 01:57:59PM +0200, David Hildenbrand (Arm) wrote:
> >> Just speculating, we had
> >> 
> >> commit 3b617fd3d317bf9dd7e2c233e56eafef05734c9d
> >> Author: Lorenzo Stoakes <ljs@kernel.org>
> >> Date:   Mon Jan 5 20:11:49 2026 +0000
> >> 
> >>     mm/vma: enforce VMA fork limit on unfaulted,faulted mremap merge too
> >> 
> >> go into v6.19.
> >> 
> >> Maybe there was a scenario where we could have lost vma->anon_vma during
> >> a merge, resulting in a swapped-out page in a VMA without an anon_vma.
> >> 
> >> If this cannot be reproduced on 6.19+, there is nothing to worry about.
> > 
> > ... except that 6.18 is LTS so we need a fix for that kernel version.
> 
> It's there: https://kernel.dance/#3b617fd3d317bf9dd7e2c233e56eafef05734c9d
> 
> > And maybe 6.12 as well (a373baed5a9d went into 6.9, so no need to
> > go further back than that)
> 
> Not there (yet?) but it was tagged stable.

879bca0a2c4f (the commit fixed by 3b617fd3d317) went into 6.15, so if
that's the problem then there's no need to backport 3b617fd3d317 to 6.12.
Unless someone backported 879bca0a2c4f to 6.12 which they might have
since it was a fix for a 2011 commit.  But so far nobody seems to have
done that, and since it's only a performance bug (right?), probably
nobody will bother.
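
The backport reasoning above can be written down as a small check. This is
a sketch, not a stable-tree tool: struct kver, kver_cmp() and
tree_may_need_fix() are hypothetical names, and the check assumes the
offending commit was not itself backported to an older tree (the caveat
mentioned for 879bca0a2c4f). Versions are compared field by field because
6.9 is older than 6.12, which a decimal comparison would get wrong.

```c
#include <assert.h>
#include <stdbool.h>

/* A kernel release such as 6.12, kept as two integers. */
struct kver {
	int major;
	int minor;
};

/* Returns <0, 0 or >0 when a is older than, equal to, or newer than b. */
static int kver_cmp(struct kver a, struct kver b)
{
	if (a.major != b.major)
		return a.major < b.major ? -1 : 1;
	if (a.minor != b.minor)
		return a.minor < b.minor ? -1 : 1;
	return 0;
}

/*
 * A tree can only contain the bug if it is at least as new as the release
 * that first shipped the offending commit. Assumption: the offending
 * commit was not separately backported to an older stable tree.
 */
static bool tree_may_need_fix(struct kver tree, struct kver introduced_in)
{
	/* Reject nonsensical negative version fields. */
	assert(tree.major >= 0 && tree.minor >= 0);
	assert(introduced_in.major >= 0 && introduced_in.minor >= 0);
	return kver_cmp(tree, introduced_in) >= 0;
}
```

With a commit introduced in 6.15, tree_may_need_fix() is false for a 6.12
tree and true for a 6.18 tree. With a commit introduced in 6.9, it is true
for both.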

