From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Suren Baghdasaryan <surenb@google.com>,
Vlastimil Babka <vbabka@suse.cz>,
Shakeel Butt <shakeel.butt@linux.dev>,
David Hildenbrand <david@kernel.org>,
Rik van Riel <riel@surriel.com>, Harry Yoo <harry.yoo@oracle.com>,
Jann Horn <jannh@google.com>, Mike Rapoport <rppt@kernel.org>,
Michal Hocko <mhocko@suse.com>, Pedro Falcato <pfalcato@suse.de>,
Chris Li <chriscli@google.com>,
Barry Song <v-songbaohua@oppo.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 7/8] mm/rmap: allocate anon_vma_chain objects unlocked when possible
Date: Tue, 6 Jan 2026 14:02:48 -0500
Message-ID: <r57ezh55nc7x2uzepv4xxacmt5ar5mdlb732vllfh5juxnavv4@x3e33a6m2gxt>
In-Reply-To: <ff2651a4d5b73c7cc8160c55d46d6e5385996e62.1767711638.git.lorenzo.stoakes@oracle.com>
* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [260106 10:04]:
> There is no reason to allocate the anon_vma_chain under the anon_vma write
> lock when cloning - we can in fact assign these to the destination VMA
> safely as we hold the exclusive mmap lock and therefore preclude anybody
> else accessing these fields.
>
> We only need to take the anon_vma write lock when we link rbtree edges from
> the anon_vma to the newly established AVCs.
>
> This also allows us to eliminate the weird GFP_NOWAIT, GFP_KERNEL dance
> introduced in commit dd34739c03f2 ("mm: avoid anon_vma_chain allocation
> under anon_vma lock"), further simplifying this logic.
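
A minimal userspace sketch of the two-phase scheme described above, under loudly stated assumptions: model_anon_vma, model_avc, model_clone() and model_cleanup_partial() are hypothetical stand-ins, not kernel APIs; a bool flag models anon_vma_lock_write() on the shared root; and a counter models the rb_root interval tree. It shows the AVCs being allocated and assigned with no anon_vma lock held, the lock being taken once only to link them into the trees, and the failure path needing only to free the AVCs because nothing has been inserted yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct anon_vma and struct anon_vma_chain. */
struct model_anon_vma {
	struct model_anon_vma *root;	/* all anon_vmas in a hierarchy share one root */
	bool write_locked;		/* models anon_vma_lock_write() on the root */
	int tree_entries;		/* models entries in anon_vma->rb_root */
};

struct model_avc {
	struct model_anon_vma *anon_vma;
	struct model_avc *next;		/* models the vma->anon_vma_chain list */
};

/* Failure injection for the sketch: allocation number N (0-based) fails. */
static int model_fail_at = -1;
static int model_allocs;

static struct model_avc *model_avc_alloc(void)
{
	if (model_allocs++ == model_fail_at)
		return NULL;
	return malloc(sizeof(struct model_avc));
}

/* Nothing is in any tree yet, so cleanup only unlinks and frees. */
static void model_cleanup_partial(struct model_avc **dst)
{
	struct model_avc *avc = *dst, *next;

	for (; avc; avc = next) {
		next = avc->next;
		free(avc);
	}
	*dst = NULL;
}

/* Assumes *dst starts empty, as for a freshly created destination VMA. */
static int model_clone(struct model_avc **dst, const struct model_avc *src)
{
	const struct model_avc *pavc;
	struct model_avc *avc;
	struct model_anon_vma *root;

	if (!src)
		return 0;

	/* Phase 1: allocate and assign with no anon_vma lock held. */
	for (pavc = src; pavc; pavc = pavc->next) {
		avc = model_avc_alloc();
		if (!avc) {
			model_cleanup_partial(dst);
			return -1;	/* -ENOMEM in the kernel */
		}
		avc->anon_vma = pavc->anon_vma;
		avc->next = *dst;
		*dst = avc;
	}

	/* Phase 2: a single lock acquisition on the shared root to link into trees. */
	root = src->anon_vma->root;
	root->write_locked = true;
	for (avc = *dst; avc; avc = avc->next) {
		assert(avc->anon_vma->root->write_locked);
		avc->anon_vma->tree_entries++;
	}
	root->write_locked = false;
	return 0;
}
```

Because allocation can no longer fail while the lock is held, there is no need to drop and retake it around a GFP_KERNEL retry, which is what the GFP_NOWAIT fallback in the old code had to do.
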
>
> This should reduce anon_vma lock contention, and it clarifies exactly where
> the anon_vma lock is required.
>
> We cannot adjust __anon_vma_prepare() in the same way as it is only
> protected by the VMA read lock, so we have to perform the allocation here under
> the anon_vma write lock and page_table_lock (to protect against racing
> threads), and we wish to retain the lock ordering.
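
To illustrate why __anon_vma_prepare() is different, a minimal sketch of the allocate-then-recheck pattern it relies on. Assumptions: the model_* names are hypothetical; a single pthread mutex stands in for both the anon_vma write lock and mm->page_table_lock (the real code takes them in that order); and the anon_vma is always freshly allocated, whereas the kernel may reuse a mergeable one. Since racing faulting threads hold only the VMA read lock, the AVC is assigned to the VMA only after the re-check under the lock succeeds; a thread that loses the race frees its preallocations.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's struct anon_vma etc. */
struct model_anon_vma {
	int num_active_vmas;
};

struct model_avc {
	struct model_anon_vma *anon_vma;
};

struct model_vma {
	pthread_mutex_t *page_table_lock;	/* models mm->page_table_lock */
	struct model_anon_vma *anon_vma;
	struct model_avc *avc;			/* models vma->anon_vma_chain */
};

/* Counts preallocations thrown away by a caller that lost the race. */
static int model_discarded;

static int model_anon_vma_prepare(struct model_vma *vma)
{
	/* Allocate up front, before any lock is taken. */
	struct model_anon_vma *allocated = calloc(1, sizeof(*allocated));
	struct model_avc *avc = calloc(1, sizeof(*avc));

	if (!allocated || !avc) {
		free(allocated);
		free(avc);
		return -1;	/* -ENOMEM in the kernel */
	}

	pthread_mutex_lock(vma->page_table_lock);
	/* Re-check: another thread holding only the VMA read lock may have won. */
	if (!vma->anon_vma) {
		vma->anon_vma = allocated;
		avc->anon_vma = allocated;	/* assign the AVC only once we have won */
		vma->avc = avc;
		allocated->num_active_vmas++;
		allocated = NULL;		/* ownership passed to the VMA */
		avc = NULL;
	}
	pthread_mutex_unlock(vma->page_table_lock);

	/* Lost the race: discard both preallocations. */
	if (allocated) {
		free(allocated);
		free(avc);
		model_discarded++;
	}
	return 0;
}
```

Calling the function a second time on the same VMA plays the part of the losing racer: it finds vma->anon_vma already set and discards its copies.
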
>
> With this change we can simplify cleanup_partial_anon_vmas() even further -
> since we allocate AVCs without any lock taken and do not insert anything
> into the interval tree until all allocations have succeeded, we can remove
> all logic pertaining to this and simply free the AVCs.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> mm/rmap.c | 78 +++++++++++++++++++++++++------------------------------
> 1 file changed, 35 insertions(+), 43 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 6ac42671bedd..8f4393546bce 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -147,14 +147,13 @@ static void anon_vma_chain_free(struct anon_vma_chain *anon_vma_chain)
> kmem_cache_free(anon_vma_chain_cachep, anon_vma_chain);
> }
>
> -static void anon_vma_chain_link(struct vm_area_struct *vma,
> - struct anon_vma_chain *avc,
> - struct anon_vma *anon_vma)
> +static void anon_vma_chain_assign(struct vm_area_struct *vma,
> + struct anon_vma_chain *avc,
> + struct anon_vma *anon_vma)
> {
> avc->vma = vma;
> avc->anon_vma = anon_vma;
> list_add(&avc->same_vma, &vma->anon_vma_chain);
> - anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
> }
>
> /**
> @@ -211,7 +210,8 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
> spin_lock(&mm->page_table_lock);
> if (likely(!vma->anon_vma)) {
> vma->anon_vma = anon_vma;
> - anon_vma_chain_link(vma, avc, anon_vma);
> + anon_vma_chain_assign(vma, avc, anon_vma);
> + anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
> anon_vma->num_active_vmas++;
> allocated = NULL;
> avc = NULL;
> @@ -292,21 +292,31 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>
> check_anon_vma_clone(dst, src);
>
> - /* All anon_vma's share the same root. */
> + /*
> + * Allocate AVCs. We don't need an anon_vma lock for this as we
> + * are not updating the anon_vma rbtree nor are we changing
> + * anon_vma statistics.
> + *
> + * We hold the exclusive mmap write lock so there's no possibility of
> + * the unlinked AVCs being observed yet.
> + */
> + list_for_each_entry(pavc, &src->anon_vma_chain, same_vma) {
> + avc = anon_vma_chain_alloc(GFP_KERNEL);
> + if (!avc)
> + goto enomem_failure;
> +
> + anon_vma_chain_assign(dst, avc, pavc->anon_vma);
> + }
> +
> + /*
> + * Now link the anon_vma's back to the newly inserted AVCs.
> + * Note that all anon_vma's share the same root.
> + */
> anon_vma_lock_write(src->anon_vma);
> - list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> - struct anon_vma *anon_vma;
> -
> - avc = anon_vma_chain_alloc(GFP_NOWAIT);
> - if (unlikely(!avc)) {
> - anon_vma_unlock_write(src->anon_vma);
> - avc = anon_vma_chain_alloc(GFP_KERNEL);
> - if (!avc)
> - goto enomem_failure;
> - anon_vma_lock_write(src->anon_vma);
> - }
> - anon_vma = pavc->anon_vma;
> - anon_vma_chain_link(dst, avc, anon_vma);
> + list_for_each_entry_reverse(avc, &dst->anon_vma_chain, same_vma) {
> + struct anon_vma *anon_vma = avc->anon_vma;
> +
> + anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
>
> /*
> * Reuse existing anon_vma if it has no vma and only one
> @@ -322,7 +332,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> }
> if (dst->anon_vma)
> dst->anon_vma->num_active_vmas++;
> -
> anon_vma_unlock_write(src->anon_vma);
> return 0;
>
> @@ -384,8 +393,10 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> get_anon_vma(anon_vma->root);
> /* Mark this anon_vma as the one where our new (COWed) pages go. */
> vma->anon_vma = anon_vma;
> + anon_vma_chain_assign(vma, avc, anon_vma);
> + /* Now let rmap see it. */
> anon_vma_lock_write(anon_vma);
> - anon_vma_chain_link(vma, avc, anon_vma);
> + anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
> anon_vma->parent->num_children++;
> anon_vma_unlock_write(anon_vma);
>
> @@ -402,34 +413,15 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> * In the unfortunate case of anon_vma_clone() failing to allocate memory we
> * have to clean things up.
> *
> - * On clone we hold the exclusive mmap write lock, so we can't race
> - * unlink_anon_vmas(). Since we're cloning, we know we can't have empty
> - * anon_vma's, since existing anon_vma's are what we're cloning from.
> - *
> - * So this function needs only traverse the anon_vma_chain and free each
> - * allocated anon_vma_chain.
> + * Since we allocate anon_vma_chain objects before we insert them into the
> + * interval trees, we simply have to free the AVCs and remove the entries from
> + * the VMA's anon_vma_chain.
> */
> static void cleanup_partial_anon_vmas(struct vm_area_struct *vma)
> {
> struct anon_vma_chain *avc, *next;
> - bool locked = false;
> -
> - /*
> - * We exclude everybody else from being able to modify anon_vma's
> - * underneath us.
> - */
> - mmap_assert_locked(vma->vm_mm);
>
> list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
> - struct anon_vma *anon_vma = avc->anon_vma;
> -
> - /* All anon_vma's share the same root. */
> - if (!locked) {
> - anon_vma_lock_write(anon_vma);
> - locked = true;
> - }
> -
> - anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
> list_del(&avc->same_vma);
> anon_vma_chain_free(avc);
> }
> --
> 2.52.0
>
Thread overview: 32+ messages
2026-01-06 15:04 [PATCH v2 0/8] mm: clean up anon_vma implementation Lorenzo Stoakes
2026-01-06 15:04 ` [PATCH v2 1/8] mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts Lorenzo Stoakes
2026-01-14 16:14 ` Suren Baghdasaryan
2026-01-14 18:33 ` Lorenzo Stoakes
2026-01-14 18:48 ` Lorenzo Stoakes
2026-01-14 19:02 ` Lorenzo Stoakes
2026-01-14 21:59 ` Andrew Morton
2026-01-15 9:15 ` Lorenzo Stoakes
2026-01-06 15:04 ` [PATCH v2 2/8] mm/rmap: skip unfaulted VMAs on anon_vma clone, unlink Lorenzo Stoakes
2026-01-06 18:34 ` Liam R. Howlett
2026-01-14 16:47 ` Suren Baghdasaryan
2026-01-06 15:04 ` [PATCH v2 3/8] mm/rmap: remove unnecessary root lock dance in anon_vma clone, unmap Lorenzo Stoakes
2026-01-06 18:42 ` Liam R. Howlett
2026-01-14 16:55 ` Suren Baghdasaryan
2026-01-14 17:54 ` Lorenzo Stoakes
2026-01-14 18:01 ` Suren Baghdasaryan
2026-01-14 22:12 ` Andrew Morton
2026-01-15 9:14 ` Lorenzo Stoakes
2026-01-06 15:04 ` [PATCH v2 4/8] mm/rmap: remove anon_vma_merge() function Lorenzo Stoakes
2026-01-06 18:42 ` Liam R. Howlett
2026-01-06 15:04 ` [PATCH v2 5/8] mm/rmap: make anon_vma functions internal Lorenzo Stoakes
2026-01-06 18:54 ` Liam R. Howlett
2026-01-06 15:04 ` [PATCH v2 6/8] mm/mmap_lock: add vma_is_attached() helper Lorenzo Stoakes
2026-01-06 18:56 ` Liam R. Howlett
2026-01-06 15:04 ` [PATCH v2 7/8] mm/rmap: allocate anon_vma_chain objects unlocked when possible Lorenzo Stoakes
2026-01-06 19:02 ` Liam R. Howlett [this message]
2026-01-08 18:51 ` Lorenzo Stoakes
2026-01-06 15:04 ` [PATCH v2 8/8] mm/rmap: separate out fork-only logic on anon_vma_clone() Lorenzo Stoakes
2026-01-06 19:27 ` Liam R. Howlett
2026-01-08 17:58 ` Lorenzo Stoakes
2026-01-14 17:26 ` Suren Baghdasaryan
2026-01-08 18:52 ` Lorenzo Stoakes