From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>,
Shakeel Butt <shakeel.butt@linux.dev>,
David Hildenbrand <david@kernel.org>,
Rik van Riel <riel@surriel.com>, Harry Yoo <harry.yoo@oracle.com>,
Jann Horn <jannh@google.com>, Mike Rapoport <rppt@kernel.org>,
Michal Hocko <mhocko@suse.com>, Pedro Falcato <pfalcato@suse.de>,
Chris Li <chriscli@google.com>,
Barry Song <v-songbaohua@oppo.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 7/8] mm/rmap: allocate anon_vma_chain objects unlocked when possible
Date: Tue, 6 Jan 2026 14:17:18 +0000 [thread overview]
Message-ID: <03723727-7fdf-4f06-8117-bbe2d6c2b7f7@lucifer.local> (raw)
In-Reply-To: <CAJuCfpHWUj6npkpMsgHmLHMk4NHRzaDyDG2oQk1SuQ53swcMSA@mail.gmail.com>
On Tue, Dec 30, 2025 at 01:35:41PM -0800, Suren Baghdasaryan wrote:
> On Wed, Dec 17, 2025 at 4:27 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > There is no reason to allocate the anon_vma_chain under the anon_vma write
> > lock when cloning - we can in fact assign these to the destination VMA
> > safely as we hold the exclusive mmap lock and therefore preclude anybody
> > else accessing these fields.
> >
> > We only need to take the anon_vma write lock when we link rbtree edges from
> > the anon_vma to the newly established AVCs.
> >
> > This also allows us to eliminate the weird GFP_NOWAIT, GFP_KERNEL dance
> > introduced in commit dd34739c03f2 ("mm: avoid anon_vma_chain allocation
> > under anon_vma lock"), further simplifying this logic.
> >
> > This should reduce anon_vma lock contention and clarify exactly where
> > the anon_vma lock is required.
> >
> > We cannot adjust __anon_vma_prepare() in the same way as this is only
> > protected by VMA read lock, so we have to perform the allocation here under
> > the anon_vma write lock and page_table_lock (to protect against racing
> > threads), and we wish to retain the lock ordering.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> One nit but otherwise nice cleanup.
>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Thanks!
One nice thing about the separate cleanup_partial_anon_vmas() function
introduced as part of this review (thanks for the good spot!) is that we can
now simplify this _even further_: since we don't insert anything into the
interval tree at the point of allocation, freeing on error is just a case of
freeing up the AVCs.
>
> > ---
> > mm/rmap.c | 49 +++++++++++++++++++++++++++++--------------------
> > 1 file changed, 29 insertions(+), 20 deletions(-)
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 60134a566073..de9de6d71c23 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -146,14 +146,13 @@ static void anon_vma_chain_free(struct anon_vma_chain *anon_vma_chain)
> > kmem_cache_free(anon_vma_chain_cachep, anon_vma_chain);
> > }
> >
> > -static void anon_vma_chain_link(struct vm_area_struct *vma,
> > - struct anon_vma_chain *avc,
> > - struct anon_vma *anon_vma)
> > +static void anon_vma_chain_assign(struct vm_area_struct *vma,
> > + struct anon_vma_chain *avc,
> > + struct anon_vma *anon_vma)
> > {
> > avc->vma = vma;
> > avc->anon_vma = anon_vma;
> > list_add(&avc->same_vma, &vma->anon_vma_chain);
> > - anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
> > }
> >
> > /**
> > @@ -210,7 +209,8 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
> > spin_lock(&mm->page_table_lock);
> > if (likely(!vma->anon_vma)) {
> > vma->anon_vma = anon_vma;
> > - anon_vma_chain_link(vma, avc, anon_vma);
> > + anon_vma_chain_assign(vma, avc, anon_vma);
> > + anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
> > anon_vma->num_active_vmas++;
> > allocated = NULL;
> > avc = NULL;
> > @@ -287,20 +287,28 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> >
> > check_anon_vma_clone(dst, src);
> >
> > + /*
> > + * Allocate AVCs. We don't need an anon_vma lock for this as we
> > + * are not updating the anon_vma rbtree nor are we changing
> > + * anon_vma statistics.
> > + *
> > + * We hold the mmap write lock so there's no possibility of
>
> To be more specific, we are holding src's mmap write lock. I think
> clarifying that will avoid any confusion.
Well, it's the same mm for both, right? :) And actually the observations
would be made around dst, no? As that's where the unlinked AVCs are being
established.
I think clearer would be 'We hold the exclusive mmap write lock', just to
highlight that it excludes anybody else from accessing these fields in the
VMA.
>
> > + * the unlinked AVCs being observed yet.
> > + */
> > + list_for_each_entry(pavc, &src->anon_vma_chain, same_vma) {
> > + avc = anon_vma_chain_alloc(GFP_KERNEL);
> > + if (!avc)
> > + goto enomem_failure;
> > +
> > + anon_vma_chain_assign(dst, avc, pavc->anon_vma);
> > + }
> > +
> > + /* Now link the anon_vma's back to the newly inserted AVCs. */
> > anon_vma_lock_write(src->anon_vma);
> > - list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> > - struct anon_vma *anon_vma;
> > -
> > - avc = anon_vma_chain_alloc(GFP_NOWAIT);
> > - if (unlikely(!avc)) {
> > - anon_vma_unlock_write(src->anon_vma);
> > - avc = anon_vma_chain_alloc(GFP_KERNEL);
> > - if (!avc)
> > - goto enomem_failure;
> > - anon_vma_lock_write(src->anon_vma);
> > - }
> > - anon_vma = pavc->anon_vma;
> > - anon_vma_chain_link(dst, avc, anon_vma);
> > + list_for_each_entry_reverse(avc, &dst->anon_vma_chain, same_vma) {
> > + struct anon_vma *anon_vma = avc->anon_vma;
> > +
> > + anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
> >
> > /*
> > * Reuse existing anon_vma if it has no vma and only one
> > @@ -316,7 +324,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> > }
> > if (dst->anon_vma)
> > dst->anon_vma->num_active_vmas++;
> > -
> > anon_vma_unlock_write(src->anon_vma);
> > return 0;
> >
> > @@ -385,8 +392,10 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> > get_anon_vma(anon_vma->root);
> > /* Mark this anon_vma as the one where our new (COWed) pages go. */
> > vma->anon_vma = anon_vma;
> > + anon_vma_chain_assign(vma, avc, anon_vma);
> > + /* Now let rmap see it. */
> > anon_vma_lock_write(anon_vma);
> > - anon_vma_chain_link(vma, avc, anon_vma);
> > + anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
> > anon_vma->parent->num_children++;
> > anon_vma_unlock_write(anon_vma);
> >
> > --
> > 2.52.0
> >