From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Harry Yoo <harry.yoo@oracle.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>,
Jeongjun Park <aha310510@gmail.com>,
Liam.Howlett@oracle.com, akpm@linux-foundation.org,
jannh@google.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, riel@surriel.com,
syzbot+b165fc2e11771c66d8ba@syzkaller.appspotmail.com,
syzkaller-bugs@googlegroups.com, vbabka@suse.cz
Subject: Re: [syzbot] [mm?] WARNING in folio_remove_rmap_ptes
Date: Fri, 2 Jan 2026 16:30:57 +0000
Message-ID: <75ba8e22-9f00-489b-989a-373d374244f5@lucifer.local>
In-Reply-To: <aVd-UZQGW4ltH6hY@hyeyoo>
OK I have figured out the issue. Big long-winded explanation below and rough
patch included ONLY for reference, I'll send the fix properly once I've figured
out a less horrible repro.
This isn't a product of commit d23cb648e365 ("mm/mremap: permit mremap() move of
multiple VMAs"), that's a red herring as it seems it's just what syzbot needed
to trigger this (as well as Jann's assert obviously).
It is rather commit 879bca0a2c4f ("mm/vma: fix incorrectly disallowed anonymous
VMA merges"), another of mine (mea culpa!) that seems to have caused it due to a
really subtle corner case.
We initially set things up like this:
0x200000000000                                                        0x200001000000
|---------------------------/\/---------------------------------------|
|           anon                                                      | <guard VMA>
|---------------------------/\/---------------------------------------|
->
Then we mmap file0 at 0x200000ffc000 for size 0x4000.
First we unmap the existing:
0x200000000000                 0x200000ffc000                         0x200001000000
|---------------------------/\/|                                      |
|           anon               |                                      | <guard VMA>
|---------------------------/\/|                                      |
Then map in the file0, having split anon:
->
0x200000000000                 0x200000ffc000                         0x200001000000
|---------------------------/\/|--------------------------------------|
|           anon               | file0, no uprobe, unfaulted          | <guard VMA>
|---------------------------/\/|--------------------------------------|
Then we do the BPF shenanigans to install a uprobe in file0 when touched.
Then we mremap() [0x200000ffc000, 0x200001000000) to 0x200000002000.
This means we first unmap a region to fit it (err sorry diagram not to scale :):
->
0x200000000000                 0x200000ffc000                         0x200001000000
|-------|             |-----/\/|--------------------------------------|
| anon  |             | anon   | file0, no uprobe, unfaulted          |
|-------|             |-----/\/|--------------------------------------|
Then copy the VMA over, and since there's no page tables to copy, both the
source and destination VMA are basically equivalent due to MREMAP_DONTUNMAP:
->
0x200000000000                 0x200000ffc000                         0x200001000000
|-------|-------------|-----/\/|--------------------------------------|
| anon  |file0,!u,!f  |        | file0, no uprobe, unfaulted          |
|-------|-------------|-----/\/|--------------------------------------|
        0x200000002000
                      0x200000006000
Note that the MREMAP_DONTUNMAP means we leave the file0 as-is since it's
unfaulted and no uprobe so no page tables, nothing.
Now - the repro isn't very clear (to say the least...!) but note that it makes
it possible for these mremap()'s to carry on in the background while other
things happen.
After this, we open then mmap() /dev/comedi3 at 0x200000ffe000, which overwrites
the latter portion of the file0 VMA at [0x200000ffc000, 0x200001000000).
Firstly we unmap the portion we are about to install, which as a result, causes
a VMA split. In this mode, we create a new duplicate VMA of the file0 VMA, which
means that when __split_vma() calls vma_complete() it invokes this logic:
	if (vp->insert && vp->file)
		uprobe_mmap(vp->insert);
Which faults in the VMA:
->
0x200000000000                 0x200000ffc000      0x200000ffe000     0x200001000000
|-------|-------------|-----/\/|-------------------|                  |
| anon  |file0,!u,!f  |        | file0, faulted in |                  | <guard VMA>
|-------|-------------|-----/\/|-------------------|                  |
        0x200000002000
                      0x200000006000
We then put the /dev/comedi3 VMA in place:
->
0x200000000000                 0x200000ffc000      0x200000ffe000     0x200001000000
|-------|-------------|-----/\/|-------------------|------------------|
| anon  |file0,!u,!f  |        | file0, faulted in | comedi           |
|-------|-------------|-----/\/|-------------------|------------------|
        0x200000002000
                      0x200000006000
Now, with the background mremap()'ing going on, we might end up triggering the
multi-VMA move logic, which will separately move the file0 and comedi VMAs.
Note that this is not necessary to the bug, it's just what syzkaller happened to
end up using.
So if we at this stage mremap() the range [0x200000ffc000, 0x200001000000) this
will be executed as two separate moves - [0x200000ffc000, 0x200000ffe000) to
[0x200000002000, 0x200000004000) and [0x200000ffe000, 0x200001000000) to
[0x200000004000, 0x200000006000).
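To make the arithmetic concrete, here is a minimal userspace sketch (NOT kernel code) of how each piece's destination follows from keeping its offset within the moved range. The names `struct range` and `dest_for()` are hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Given the whole source range being moved, the destination base address and
 * one VMA-sized piece of the source, return where that piece lands. Each piece
 * keeps its offset relative to the start of the source range.
 */
static struct range dest_for(struct range src, uint64_t dst_base,
			     struct range piece)
{
	struct range dst;

	/* The piece must lie entirely within the range being moved. */
	assert(piece.start >= src.start && piece.end <= src.end);

	dst.start = dst_base + (piece.start - src.start);
	dst.end = dst_base + (piece.end - src.start);
	return dst;
}
```

Feeding in the file0 piece [0x200000ffc000, 0x200000ffe000) and the comedi piece [0x200000ffe000, 0x200001000000) with a destination base of 0x200000002000 gives the two destination ranges quoted above.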
So we start with the file0 move.
mremap() will first unmap the range prior to the unfaulted file0 that we
previously copied:
->
0x200000000000                 0x200000ffc000      0x200000ffe000     0x200001000000
|-------|     |-------|-----/\/|-------------------|------------------|
| anon  |  ^  |f1,!u!f|        | file0, faulted in | comedi           |
|-------|  |  |-------|-----/\/|-------------------|------------------|
        0x200000002000                   |
           |  0x200000004000             | to move
           |          0x200000006000     |
           |-----------------------------|
Then it will do the copy, and try to merge. A merge will succeed, as the
unfaulted file0 and faulted file0 are compatible - very importantly, as
these are MAP_PRIVATE mappings of files, the vma->vm_pgoff offsets will be
compatible even with the faulted in 0x200000ffc000 VMA.
If these were anonymous, the vma->vm_pgoff would not be compatible.
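A rough userspace sketch of the pgoff point, NOT the kernel's actual merge check: two adjacent VMAs are only mergeable if the page offset runs on linearly from one to the next. Everything here is an assumption made for illustration: the `toy_vma` type, the helper names, 4 KiB pages, and file0 being mapped from file offset 0 (the repro details aren't spelled out above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* Assumption: 4 KiB pages. */

/* Toy stand-in for the few vm_area_struct fields this sketch needs. */
struct toy_vma {
	uint64_t vm_start;
	uint64_t vm_end;
	uint64_t vm_pgoff;	/* In pages. */
};

/* True if 'right' carries on from 'left' in both address and page offset. */
static bool pgoff_contiguous(const struct toy_vma *left,
			     const struct toy_vma *right)
{
	uint64_t nr_pages;

	assert(left->vm_end > left->vm_start);

	if (left->vm_end != right->vm_start)
		return false;

	nr_pages = (left->vm_end - left->vm_start) >> TOY_PAGE_SHIFT;
	return left->vm_pgoff + nr_pages == right->vm_pgoff;
}

/*
 * A faulted anonymous VMA keeps the pgoff derived from the address it was
 * originally mapped at, even after being moved.
 */
static uint64_t anon_pgoff(uint64_t orig_addr)
{
	return orig_addr >> TOY_PAGE_SHIFT;
}
```

With file-backed VMAs the pgoff is a file offset, so under the assumptions above the moved piece (pgoff 0, two pages) lines up with a next VMA at pgoff 2 wherever it lands. An anonymous VMA faulted at 0x200000ffc000 would carry pgoff 0x200000ffc with it and would not line up with a neighbour at 0x200000004000.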
They are compatible because of commit 879bca0a2c4f ("mm/vma: fix incorrectly
disallowed anonymous VMA merges") - the source of the bug AFAICT.
They're compatible because of case 1 in is_mergeable_anon_vma():
	/* Case 1 - we will dup_anon_vma() from src into tgt. */
	if (!tgt_anon && src_anon)
		return !vma_had_uncowed_parents(src);
And because the _new VMA_ merge will treat the 0x200000004000 as the target VMA,
which will be expanded downwards to perform the merge.
The merge will be performed by copy_vma() via vma_merge_new_range(), which
ultimately invokes vma_expand() (big credit to Harry for homing in on this!)
This logic contains:
	if (next && (target != next) && (vmg->end == next->vm_end)) {
		...
		ret = dup_anon_vma(target, next, &anon_dup);
		...
	}
However note that our target _is_ next.
So we end up with a folio that points at the 0x200000ffc000 VMA's anon_vma, and
the moved VMA does _not_ have 0x200000ffc000's anon_vma pointing at it at all,
there are no entries in the interval tree from it to the 0x200000002000 VMA, nor
does the 0x200000002000 VMA have its vma->anon_vma point at it.
So in the dontunmap_complete() function in mm/mremap.c we invoke
unlink_anon_vmas() for the 0x200000ffc000 VMA:
	if (new_vma != vrm->vma && start == old_start && end == old_end)
		unlink_anon_vmas(vrm->vma);
And this function methodically goes through each entry in the anon_vma chain of
the 0x200000ffc000 VMA in two cycles - first removing interval tree entries to
the VMA:
	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
		...
		anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
		...
Then, if no interval tree edges exist, we leave it in the VMA's anon_vma_chain
list for later processing:
		if (RB_EMPTY_ROOT(&anon_vma->rb_root.rb_root)) {
			anon_vma->parent->num_children--;
			continue;
		}
And note this _will_ be the case here because we did NOT invoke dup_anon_vma()
and did NOT install any interval tree edges pointing at the 0x200000002000 VMA.
Then on the second loop, we will put the anon_vma, which has a refcount of 1,
which means it's freed:
	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
		...
		put_anon_vma(anon_vma);
		...
	}
And thus now the folio faulted in for the 0x200000ffc000 VMA has a pointer at an
anon_vma of refcount 0 (it's a SLAB_TYPESAFE_BY_RCU object, so it can still be
accessed in this refcount state until an RCU grace period elapses).
And thus we trigger Jann's assert :)
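The failure mode can be modelled with a toy userspace sketch. This is NOT the kernel implementation, just the bookkeeping described above reduced to two counters; `toy_anon_vma`, `toy_folio`, `toy_link_vma()`, `toy_unlink_vma()` and `toy_folio_dangling()` are all hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct anon_vma: only what this sketch needs. */
struct toy_anon_vma {
	int refcount;	/* Reference count, 1 while the anon_vma is live. */
	int nr_edges;	/* Interval tree entries, one per linked VMA. */
};

/* Toy stand-in for a folio whose mapping points at an anon_vma. */
struct toy_folio {
	struct toy_anon_vma *anon_vma;
};

/* Link one more VMA to the anon_vma, loosely what dup_anon_vma() achieves. */
static void toy_link_vma(struct toy_anon_vma *av)
{
	av->nr_edges++;
}

/* Unlink one VMA, following the two-pass shape described above. */
static void toy_unlink_vma(struct toy_anon_vma *av)
{
	assert(av->nr_edges > 0);

	/* First pass: remove this VMA's interval tree entry. */
	av->nr_edges--;

	/* Other VMAs are still linked, so the anon_vma is not put. */
	if (av->nr_edges > 0)
		return;

	/* Second pass: the tree is empty, so the reference is dropped. */
	av->refcount--;
}

/* True if the folio still points at an anon_vma with no references left. */
static bool toy_folio_dangling(const struct toy_folio *folio)
{
	return folio->anon_vma != NULL && folio->anon_vma->refcount == 0;
}
```

Unlinking the only linked VMA leaves the folio pointing at a refcount-0 object, which is the state the assert catches. Calling `toy_link_vma()` for the merged VMA first keeps an edge in place, so the refcount is never dropped.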
So taking a step back - it turns out that copying a VMA when performing an
mremap(), i.e. via copy_vma(), is the _only_ case in which vma_expand() is
invoked where it isn't just expanding a prev VMA and there might be an anon_vma
to propagate.
So what we need here is to actually keep track of the VMA we're copying from,
and dup_anon_vma() this in this special case (since the duplication logic needs
access to the old VMA's ->anon_vma_chain).
I have code that does this, and it fixes the bug.
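As a toy model of the decision only (userspace, NOT the kernel code), the sketch below contrasts which VMA the anon_vma gets duplicated from before and after such a change. `toy_vma`, `enum dup_source` and both helpers are hypothetical names, and the "after" variant only mirrors the shape of the rough patch at the end of this mail:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the few vm_area_struct fields this sketch needs. */
struct toy_vma {
	uint64_t vm_start;
	uint64_t vm_end;
};

/* Which VMA, if any, the anon_vma is duplicated from. */
enum dup_source {
	DUP_NONE,
	DUP_FROM_NEXT,
	DUP_FROM_COPIED,
};

/* Existing condition: nothing is duplicated when target is next. */
static enum dup_source dup_source_before(const struct toy_vma *target,
					 const struct toy_vma *next,
					 uint64_t end)
{
	assert(target != NULL);

	if (next && target != next && end == next->vm_end)
		return DUP_FROM_NEXT;
	return DUP_NONE;
}

/* Proposed shape: when target is next, fall back to the copied-from VMA. */
static enum dup_source dup_source_after(const struct toy_vma *target,
					const struct toy_vma *next,
					const struct toy_vma *copied_from,
					uint64_t end)
{
	assert(target != NULL);

	if (!next || end != next->vm_end)
		return DUP_NONE;
	if (target != next)
		return DUP_FROM_NEXT;
	return copied_from ? DUP_FROM_COPIED : DUP_NONE;
}
```

In the scenario above target and next are the same VMA (the one at 0x200000004000), so the existing condition duplicates nothing, whereas the proposed shape picks the copied-from VMA at 0x200000ffc000.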
So it seems to me therefore that commit 879bca0a2c4f ("mm/vma: fix incorrectly
disallowed anonymous VMA merges") introduced the bug, as I didn't consider this
very subtle corner case.
I am working on a less horrifying repro, but I already have a patch to fix this.
Which I include below for reference (against mm-new tree which bizarrely still
doesn't have my latest anon_vma work in it). I'm going to try to get this better
repro sorted so I can add a test or at least reference the repro.
Once I have that sorted out I'll send a proper patch. Not sure I'll catch the
next -rc but will highlight it's urgent.
The uprobe being invoked like that after an adjacent unmap is err... strange but
I guess maybe what we want? That's something else to consider anyway.
But one thing at a time, we need to get a fix out for this ASAP focusing on the
anon_vma bug, so this will roughly look like:
---
 mm/vma.c | 55 +++++++++++++++++++++++++++++++++++++++++--------------
 mm/vma.h |  3 +++
 2 files changed, 44 insertions(+), 14 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 7c712e0be28f..56bb46a2126c 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1135,26 +1135,47 @@ int vma_expand(struct vma_merge_struct *vmg)
 	mmap_assert_write_locked(vmg->mm);
 	vma_start_write(target);
-	if (next && (target != next) && (vmg->end == next->vm_end)) {
+	if (next && vmg->end == next->vm_end) {
 		int ret;
-		sticky_flags |= next->vm_flags & VM_STICKY;
-		remove_next = true;
-		/* This should already have been checked by this point. */
-		VM_WARN_ON_VMG(!can_merge_remove_vma(next), vmg);
-		vma_start_write(next);
-		/*
-		 * In this case we don't report OOM, so vmg->give_up_on_mm is
-		 * safe.
-		 */
-		ret = dup_anon_vma(target, next, &anon_dup);
-		if (ret)
-			return ret;
+		if (target != next) {
+			sticky_flags |= next->vm_flags & VM_STICKY;
+			remove_next = true;
+			/* This should already have been checked by this point. */
+			VM_WARN_ON_VMG(!can_merge_remove_vma(next), vmg);
+			vma_start_write(next);
+			/*
+			 * In this case we don't report OOM, so vmg->give_up_on_mm is
+			 * safe.
+			 */
+			ret = dup_anon_vma(target, next, &anon_dup);
+			if (ret)
+				return ret;
+		} else if (vmg->copied_from) {
+			/*
+			 * We are copying from a VMA (i.e. mremap()'ing) having
+			 * unmapped the target range. If we merge into next,
+			 * then we must ensure the anon_vma is correctly
+			 * propagated.
+			 */
+			ret = dup_anon_vma(target, vmg->copied_from, &anon_dup);
+			if (ret)
+				return ret;
+		} else {
+			/* In no other case may the anon_vma differ. */
+			VM_WARN_ON_VMG(target->anon_vma != next->anon_vma, vmg);
+		}
 	}
 	/* Not merging but overwriting any part of next is not handled. */
 	VM_WARN_ON_VMG(next && !remove_next &&
 		       next != target && vmg->end > next->vm_start, vmg);
+	/*
+	 * We should only see a copy with next as the target on a new merge
+	 * which sets the end to the next of next.
+	 */
+	VM_WARN_ON_VMG(target == next && vmg->copied_from &&
+		       vmg->end != next->vm_end, vmg);
 	/* Only handles expanding */
 	VM_WARN_ON_VMG(target->vm_start < vmg->start ||
 		       target->vm_end > vmg->end, vmg);
@@ -1823,6 +1844,13 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 	VMA_ITERATOR(vmi, mm, addr);
 	VMG_VMA_STATE(vmg, &vmi, NULL, vma, addr, addr + len);
+	/*
+	 * VMG_VMA_STATE() installs vma in middle, but this is a new VMA, inform
+	 * merging logic correctly.
+	 */
+	vmg.copied_from = vma;
+	vmg.middle = NULL;
+
 	/*
 	 * If anonymous vma has not yet been faulted, update new pgoff
 	 * to match new location, to increase its chance of merging.
@@ -1844,7 +1872,6 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 	if (new_vma && new_vma->vm_start < addr + len)
 		return NULL; /* should never get here */
-	vmg.middle = NULL; /* New VMA range. */
 	vmg.pgoff = pgoff;
 	vmg.next = vma_iter_next_rewind(&vmi, &prev);
 	prev_start = prev->vm_start;
diff --git a/mm/vma.h b/mm/vma.h
index e4c7bd79de5f..50f0bdb0eb79 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -106,6 +106,9 @@ struct vma_merge_struct {
 	struct anon_vma_name *anon_name;
 	enum vma_merge_state state;
+	/* If we are copying a VMA, which VMA are we copying from? */
+	struct vm_area_struct *copied_from;
+
 	/* Flags which callers can use to modify merge behaviour: */
 	/*
--
2.52.0
Cheers, Lorenzo