Re: [PATCH] mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge

linux-mm.kvack.org archive mirror

From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Pedro Falcato <pfalcato@suse.de>,
	Yeoreum Yun <yeoreum.yun@arm.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Jeongjun Park <aha310510@gmail.com>,
	Rik van Riel <riel@surriel.com>, Harry Yoo <harry.yoo@oracle.com>
Subject: Re: [PATCH] mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge
Date: Mon, 5 Jan 2026 12:53:18 +0000
Message-ID: <6ec35ba9-1829-4e40-ae5e-25d189397e26@lucifer.local>
In-Reply-To: <dc6a1474-70b4-47d5-b07d-a012b2c37b95@kernel.org>

On Sun, Jan 04, 2026 at 08:25:08PM +0100, David Hildenbrand (Red Hat) wrote:
> On 1/2/26 21:55, Lorenzo Stoakes wrote:
> > Commit 879bca0a2c4f ("mm/vma: fix incorrectly disallowed anonymous VMA
> > merges") introduced the ability to merge previously unavailable VMA merge
> > scenarios.
> >
> > The key piece of logic introduced was the ability to merge a faulted VMA
> > immediately next to an unfaulted VMA, which relies upon dup_anon_vma() to
> > correctly handle anon_vma state.
> >
> > In the case of the merge of an existing VMA (that is changing properties of
> > a VMA and then merging if those properties are shared by adjacent VMAs),
> > dup_anon_vma() is invoked correctly.
> >
> > However in the case of the merge of a new VMA, a corner case peculiar to
> > mremap() was missed.
> >
> > The issue is that vma_expand() only performs dup_anon_vma() if the target
> > (the VMA that will ultimately become the merged VMA) is not the next VMA,
> > i.e. the one that appears after the range in which the new VMA is to be
> > established.
> >
> > A key insight here is that in all cases other than mremap(), a new VMA
> > merge either expands an existing VMA, meaning that the target VMA will be
> > that VMA, or has a NULL anon_vma.
> >
> > Specifically:
> >
> > * __mmap_region() - no anon_vma in place, initial mapping.
> > * do_brk_flags() - expanding an existing VMA.
> > * vma_merge_extend() - expanding an existing VMA.
> > * relocate_vma_down() - no anon_vma in place, initial mapping.
> >
> > In addition, we are in the unique situation of needing to duplicate
> > anon_vma state from a VMA that is neither the previous nor the next VMA
> > being merged with.
> >
> > To account for this, introduce a new field in struct vma_merge_struct
> > specifically for the mremap() case, and update vma_expand() to explicitly
> > check for this case and invoke dup_anon_vma() to ensure anon_vma state is
> > correctly propagated.
> >
> > This issue can be observed most directly by invoking mremap() with the
> > MREMAP_DONTUNMAP flag specified to move a VMA around and cause this kind of
> > merge.
> >
> > This will result in unlink_anon_vmas() being called after failing to
> > duplicate anon_vma state to the target VMA, which results in the anon_vma
> > itself being freed with folios still possessing dangling pointers to the
> > anon_vma and thus a use-after-free bug.
>
> Makes sense to me.
>
> >
> > This bug was discovered via a syzbot report, which this patch resolves.
> >
> > The following program reproduces the issue (and is fixed by this patch):
> >
> > #define _GNU_SOURCE
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > #include <sys/mman.h>
> >
> > #define RESERVED_PGS	(100)
> > #define VMA_A_PGS	(10)
> > #define VMA_B_PGS	(10)
> > #define NUM_ITERS	(1000)
> >
> > static void trigger_bug(void)
> > {
> > 	unsigned long page_size = sysconf(_SC_PAGE_SIZE);
> > 	char *reserved, *ptr_a, *ptr_b;
> >
> > 	/*
> > 	 * The goal here is to achieve:
> > 	 *
> > 	 * mremap() with MREMAP_DONTUNMAP such that A and B merge:
> > 	 *
> > 	 *      |-------------------------|
> > 	 *      |                         |
> > 	 *      |    |-----------|   |---------|
> > 	 *      v    | unfaulted |   | faulted |
> > 	 *           |-----------|   |---------|
> > 	 *                 B              A
> > 	 *
> > 	 * Then unmap VMA A to trigger the bug.
> > 	 */
> >
> > 	/* Reserve a region of memory to operate in. */
> > 	reserved = mmap(NULL, RESERVED_PGS * page_size, PROT_NONE,
> > 			MAP_PRIVATE | MAP_ANON, -1, 0);
> > 	if (reserved == MAP_FAILED) {
> > 		perror("mmap reserved");
> > 		exit(EXIT_FAILURE);
> > 	}
> >
> > 	/* Map VMA A into place. */
> > 	ptr_a = mmap(&reserved[page_size], VMA_A_PGS * page_size,
> > 		     PROT_READ | PROT_WRITE,
> > 		     MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
> > 	if (ptr_a == MAP_FAILED) {
> > 		perror("mmap VMA A");
> > 		exit(EXIT_FAILURE);
> > 	}
> > 	/* Fault it in. */
> > 	ptr_a[0] = 'x';
> >
> > 	/*
> > 	 * Now move it out of the way so we can place VMA B in position,
> > 	 * unfaulted.
> > 	 */
> > 	ptr_a = mremap(ptr_a, VMA_A_PGS * page_size, VMA_A_PGS * page_size,
> > 		       MREMAP_FIXED | MREMAP_MAYMOVE, &reserved[50 * page_size]);
> > 	if (ptr_a == MAP_FAILED) {
> > 		perror("mremap VMA A out of the way");
> > 		exit(EXIT_FAILURE);
> > 	}
> >
> > 	/* Map VMA B into place. */
> > 	ptr_b = mmap(&reserved[page_size + VMA_A_PGS * page_size],
> > 		     VMA_B_PGS * page_size, PROT_READ | PROT_WRITE,
> > 		     MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
> > 	if (ptr_b == MAP_FAILED) {
> > 		perror("mmap VMA B");
> > 		exit(EXIT_FAILURE);
> > 	}
> >
> > 	/* Now move VMA A into position w/MREMAP_DONTUNMAP + free anon_vma. */
> > 	ptr_a = mremap(ptr_a, VMA_A_PGS * page_size, VMA_A_PGS * page_size,
> > 		       MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP,
> > 		       &reserved[page_size]);
> > 	if (ptr_a == MAP_FAILED) {
> > 		perror("mremap VMA A with MREMAP_DONTUNMAP");
> > 		exit(EXIT_FAILURE);
> > 	}
> >
> > 	/* Finally, unmap VMA A which should trigger the bug. */
> > 	munmap(ptr_a, VMA_A_PGS * page_size);
> >
> > 	/* Cleanup in case bug didn't trigger sufficiently visibly... */
> > 	munmap(reserved, RESERVED_PGS * page_size);
> > }
> >
> > int main(void)
> > {
> > 	int i;
> >
> > 	for (i = 0; i < NUM_ITERS; i++)
> > 		trigger_bug();
>
> Just wondering: why do we have to loop? I would have thought that this
> would trigger deterministically.
>
> >
> > 	return EXIT_SUCCESS;
> > }
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Fixes: 879bca0a2c4f ("mm/vma: fix incorrectly disallowed anonymous VMA merges")
> > Reported-by: syzbot+b165fc2e11771c66d8ba@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/all/694a2745.050a0220.19928e.0017.GAE@google.com/
> > Cc: stable@kernel.org
>
> I was wondering whether this commit actually fixes older reports that Jann
> mentioned in his commit a222439e1e27 ("mm/rmap: add anon_vma lifetime debug check").
>
>     [1] https://lore.kernel.org/r/67abaeaf.050a0220.110943.0041.GAE@google.com
>     [2] https://lore.kernel.org/r/67a76f33.050a0220.3d72c.0028.GAE@google.com

They feel similar given the splats.

So what we had before was:

static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
		 struct anon_vma *anon_vma2, struct vm_area_struct *vma)
{
	/*
	 * The list_is_singular() test is to avoid merging VMA cloned from
	 * parents. This can improve scalability caused by anon_vma lock.
	 */
	if ((!anon_vma1 || !anon_vma2) && (!vma ||
		list_is_singular(&vma->anon_vma_chain)))
		return true;
	return anon_vma1 == anon_vma2;
}


static bool can_vma_merge_before(struct vma_merge_struct *vmg)
{
	pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);

	if (is_mergeable_vma(vmg, /* merge_next = */ true) &&
	    is_mergeable_anon_vma(vmg->anon_vma, vmg->next->anon_vma, vmg->next)) {
		if (vmg->next->vm_pgoff == vmg->pgoff + pglen)
			return true;
	}

	return false;
}

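So we'd disallow this kind of merge: vma == next, which is unfaulted and so
has an empty (not singular) anon_vma_chain, while vmg->anon_vma is the
faulted-in VMA's anon_vma, so we fall through to anon_vma1 == anon_vma2,
which is false.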


And after 6.16:

static bool is_mergeable_anon_vma(struct vma_merge_struct *vmg, bool merge_next)
{
	struct vm_area_struct *tgt = merge_next ? vmg->next : vmg->prev;
	struct vm_area_struct *src = vmg->middle; /* existing merge case. */
	struct anon_vma *tgt_anon = tgt->anon_vma;
	struct anon_vma *src_anon = vmg->anon_vma;

	/*
	 * We _can_ have !src, vmg->anon_vma via copy_vma(). In this instance we
	 * will remove the existing VMA's anon_vma's so there's no scalability
	 * concerns.
	 */
	VM_WARN_ON(src && src_anon != src->anon_vma);

	/* Case 1 - we will dup_anon_vma() from src into tgt. */
	if (!tgt_anon && src_anon)
		return !vma_had_uncowed_parents(src);
	/* Case 2 - we will simply use tgt's anon_vma. */
	if (tgt_anon && !src_anon)
		return !vma_had_uncowed_parents(tgt);
	/* Case 3 - the anon_vma's are already shared. */
	return src_anon == tgt_anon;
}

Here we _will_ allow this merge, via case 1.

So I don't think this can be the same case.

That is bizarre.

By the way, this also makes me think we should do something like:

&& !vma_had_uncowed_parents(vmg->copied_from)

for case 1, as otherwise we're treating this case differently from merging
with the VMA once it has already been moved.

Really using vma_merge_new_range() in copy_vma() is a hack as we're not merging
a _new_ VMA.

Hm, maybe it's clearer to add vma_merge_copied_range() and just put all this
horrid stuff in one single place. Let me play around with this.

I will be adding tests.

>
>
> 879bca0a2c4f went into v6.16, but [1] and [2] are against v6.14.
>
> So naturally I wonder, could it be that we had a bug even before 879bca0a2c4f that
> resulted in similar symptoms?
>
> Option (1): [1] and [2] are already fixed
>
> Option (2): [1] and [2] are still broken

Probably this.

>
> Option (3): [1] and [2] would be fixed by your patch as well
>
> But we don't even have reproducers, so [1] and [2] could just be a side-effect of
> another bug, maybe.

Yeah... so good idea to keep this assert here :)

>
>
>
> > ---
> >   mm/vma.c | 58 ++++++++++++++++++++++++++++++++++++++++++--------------
> >   mm/vma.h |  3 +++
> >   2 files changed, 47 insertions(+), 14 deletions(-)
> >
> > diff --git a/mm/vma.c b/mm/vma.c
> > index 6377aa290a27..2268f518a89b 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -1130,26 +1130,50 @@ int vma_expand(struct vma_merge_struct *vmg)
> >   	mmap_assert_write_locked(vmg->mm);
> >
> >   	vma_start_write(target);
> > -	if (next && (target != next) && (vmg->end == next->vm_end)) {
> > +	if (next && vmg->end == next->vm_end) {
> > +		struct vm_area_struct *copied_from = vmg->copied_from;
> >   		int ret;
> >
> > -		sticky_flags |= next->vm_flags & VM_STICKY;
> > -		remove_next = true;
> > -		/* This should already have been checked by this point. */
> > -		VM_WARN_ON_VMG(!can_merge_remove_vma(next), vmg);
> > -		vma_start_write(next);
> > -		/*
> > -		 * In this case we don't report OOM, so vmg->give_up_on_mm is
> > -		 * safe.
> > -		 */
> > -		ret = dup_anon_vma(target, next, &anon_dup);
> > -		if (ret)
> > -			return ret;
> > +		if (target != next) {
> > +			sticky_flags |= next->vm_flags & VM_STICKY;
> > +			remove_next = true;
> > +			/* This should already have been checked by this point. */
> > +			VM_WARN_ON_VMG(!can_merge_remove_vma(next), vmg);
> > +			vma_start_write(next);
> > +			/*
> > +			 * In this case we don't report OOM, so vmg->give_up_on_mm is
> > +			 * safe.
> > +			 */
> > +			ret = dup_anon_vma(target, next, &anon_dup);
> > +			if (ret)
> > +				return ret;
> > +		} else if (copied_from) {
> > +			vma_start_write(next);
> > +
> > +			/*
> > +			 * We are copying from a VMA (i.e. mremap()'ing) to
> > +			 * next, and thus must ensure that either anon_vma's are
> > +			 * already compatible (in which case this call is a nop)
> > +			 * or all anon_vma state is propagated to next
> > +			 */
> > +			ret = dup_anon_vma(next, copied_from, &anon_dup);
> > +			if (ret)
> > +				return ret;
> > +		} else {
> > +			/* In no other case may the anon_vma differ. */
> > +			VM_WARN_ON_VMG(target->anon_vma != next->anon_vma, vmg);
> > +		}
>
>
> No expert on that code, but looks reasonable to me.
>
> Wondering whether we want to pull the vma_start_write(next) before the loop
> (for the warn we certainly don't care).

Actually I think we don't need vma_start_write(next) after all, since we're not
removing next in this case.

>
> --
> Cheers
>
> David

v2 incoming...

Thanks, Lorenzo

Thread overview: 8+ messages
2026-01-02 20:55 Lorenzo Stoakes
2026-01-02 21:00 ` Lorenzo Stoakes
2026-01-04 19:25 ` David Hildenbrand (Red Hat)
2026-01-05 12:53   ` Lorenzo Stoakes [this message]
2026-01-05  5:11 ` Harry Yoo
2026-01-05  9:12   ` Lorenzo Stoakes
2026-01-05 15:24   ` Liam R. Howlett
2026-01-05 15:32     ` Lorenzo Stoakes
