linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	David Hildenbrand <david@kernel.org>,
	Rik van Riel <riel@surriel.com>, Harry Yoo <harry.yoo@oracle.com>,
	Jann Horn <jannh@google.com>, Mike Rapoport <rppt@kernel.org>,
	Michal Hocko <mhocko@suse.com>, Pedro Falcato <pfalcato@suse.de>,
	Chris Li <chriscli@google.com>,
	Barry Song <v-songbaohua@oppo.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 2/9] mm/rmap: eliminate partial anon_vma tear-down in anon_vma_fork()
Date: Sun, 18 Jan 2026 14:50:38 +0000	[thread overview]
Message-ID: <9923da5f8b095dd1e8d677692dcaf95859de0ef5.1768746221.git.lorenzo.stoakes@oracle.com> (raw)
In-Reply-To: <cover.1768746221.git.lorenzo.stoakes@oracle.com>

We have spun a web of unnecessary headaches for ourselves in
anon_vma_fork() with the introduction of the anon_vma reuse logic, as
introduced by commit 7a3ef208e662 ("mm: prevent endless growth of anon_vma
hierarchy").

When we clone anon_vma's linked to a VMA via vma->anon_vma_chain, we check
each anon_vma for specific conditions, and if met we set vma->anon_vma to
this anon_vma to indicate we will reuse it rather than allocating a new
one.

It triggers upon the first ancestor anon_vma found that possesses at most 1
child, and no active VMAs.

This was implemented such that if you continually fork and free VMAs, you
would achieve anon_vma reuse rather than continually allocating unnecessary
new anon_vma's.

This however brings an unfortunate situation should a memory allocation
fail during this process. anon_vma_fork():

1. Clones the anon_vma.
2. If no reuse (i.e. !vma->anon_vma), tries to allocate anon_vma, AVC.
3. If 2 fails, we are forced to unwind step 1 by invoking
   unlink_anon_vmas(vma).

This means that we send a partially set up (i.e. invalid) VMA to
unlink_anon_vmas().

Doing this is dangerous and confusing - it is reasonable for kernel
developers to assume unlink_anon_vmas() is called on a correctly
established vma, and thus confusion arises if logic is implemented there to
account for invalid VMAs, and further development might introduce subtle
bugs.

It is especially problematic in the anon rmap implementation which is
essentially a broken abstraction.

The patch solves the issue by simply trying to allocate the anon_vma and
AVC ahead of time - i.e. optimising for the usual case - and freeing them
should reuse occur or an error arise in anon_vma_clone().

This is not egregious performance-wise, as this function is called on the
fork path which already performs a great number of allocations, and thus it
is already a slow-path in this respect.

It is additionally not egregious in terms of memory usage - the allocations
are too-small-to-fail anyway unless, for instance, a fatal signal may have
arisen, and any OOM for such tiny allocations that may arise would indicate
the system is under so much memory pressure that the associated process is
not long for this world anyway.

We also update anon_vma->num_active_vmas to 1 directly rather than
incrementing the newly allocated anon_vma's active VMA count - this makes
it clear that this detached anon_vma can have only 1 num_active_vma at this
point.

Finally we eliminate the out_error and out_error_free_anon_vma labels which
makes the logic much easier to follow. We also correct a small comment
typo.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/rmap.c | 46 ++++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index cdb7618c10b1..a45b011e9846 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -360,7 +360,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 {
 	struct anon_vma_chain *avc;
 	struct anon_vma *anon_vma;
-	int error;
+	int rc;
 
 	/* Don't bother if the parent process has no anon_vma here. */
 	if (!pvma->anon_vma)
@@ -369,27 +369,35 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
 	vma->anon_vma = NULL;
 
+	anon_vma = anon_vma_alloc();
+	if (!anon_vma)
+		return -ENOMEM;
+	avc = anon_vma_chain_alloc(GFP_KERNEL);
+	if (!avc) {
+		put_anon_vma(anon_vma);
+		return -ENOMEM;
+	}
+
 	/*
 	 * First, attach the new VMA to the parent VMA's anon_vmas,
 	 * so rmap can find non-COWed pages in child processes.
 	 */
-	error = anon_vma_clone(vma, pvma);
-	if (error)
-		return error;
-
-	/* An existing anon_vma has been reused, all done then. */
-	if (vma->anon_vma)
-		return 0;
+	rc = anon_vma_clone(vma, pvma);
+	/* An error arose or an existing anon_vma was reused, all done then. */
+	if (rc || vma->anon_vma) {
+		put_anon_vma(anon_vma);
+		anon_vma_chain_free(avc);
+		return rc;
+	}
 
-	/* Then add our own anon_vma. */
-	anon_vma = anon_vma_alloc();
-	if (!anon_vma)
-		goto out_error;
-	anon_vma->num_active_vmas++;
-	avc = anon_vma_chain_alloc(GFP_KERNEL);
-	if (!avc)
-		goto out_error_free_anon_vma;
+	/*
+	 * OK no reuse, so add our own anon_vma.
+	 *
+	 * Since it is not linked anywhere we can safely manipulate anon_vma
+	 * fields without a lock.
+	 */
 
+	anon_vma->num_active_vmas = 1;
 	/*
 	 * The root anon_vma's rwsem is the lock actually used when we
 	 * lock any of the anon_vmas in this anon_vma tree.
@@ -410,12 +418,6 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	anon_vma_unlock_write(anon_vma);
 
 	return 0;
-
- out_error_free_anon_vma:
-	put_anon_vma(anon_vma);
- out_error:
-	unlink_anon_vmas(vma);
-	return -ENOMEM;
 }
 
 /*
-- 
2.52.0



  parent reply	other threads:[~2026-01-18 14:52 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-18 14:50 [PATCH v3 0/9] mm: clean up anon_vma implementation Lorenzo Stoakes
2026-01-18 14:50 ` [PATCH v3 1/9] mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts Lorenzo Stoakes
2026-01-18 14:50 ` Lorenzo Stoakes [this message]
2026-01-18 14:50 ` [PATCH v3 3/9] mm/rmap: skip unfaulted VMAs on anon_vma clone, unlink Lorenzo Stoakes
2026-01-18 14:50 ` [PATCH v3 4/9] mm/rmap: remove unnecessary root lock dance in anon_vma clone, unmap Lorenzo Stoakes
2026-01-18 14:50 ` [PATCH v3 5/9] mm/rmap: remove anon_vma_merge() function Lorenzo Stoakes
2026-01-18 14:50 ` [PATCH v3 6/9] mm/rmap: make anon_vma functions internal Lorenzo Stoakes
2026-01-18 14:50 ` [PATCH v3 7/9] mm/mmap_lock: add vma_is_attached() helper Lorenzo Stoakes
2026-01-18 14:50 ` [PATCH v3 8/9] mm/rmap: allocate anon_vma_chain objects unlocked when possible Lorenzo Stoakes
2026-01-18 14:50 ` [PATCH v3 9/9] mm/rmap: separate out fork-only logic on anon_vma_clone() Lorenzo Stoakes
2026-01-18 20:06 ` [PATCH v3 0/9] mm: clean up anon_vma implementation Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9923da5f8b095dd1e8d677692dcaf95859de0ef5.1768746221.git.lorenzo.stoakes@oracle.com \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=chriscli@google.com \
    --cc=david@kernel.org \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=pfalcato@suse.de \
    --cc=riel@surriel.com \
    --cc=rppt@kernel.org \
    --cc=shakeel.butt@linux.dev \
    --cc=surenb@google.com \
    --cc=v-songbaohua@oppo.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox