From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>,
Shakeel Butt <shakeel.butt@linux.dev>,
David Hildenbrand <david@kernel.org>,
Rik van Riel <riel@surriel.com>, Harry Yoo <harry.yoo@oracle.com>,
Jann Horn <jannh@google.com>, Mike Rapoport <rppt@kernel.org>,
Michal Hocko <mhocko@suse.com>, Pedro Falcato <pfalcato@suse.de>,
Chris Li <chriscli@google.com>,
Barry Song <v-songbaohua@oppo.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/8] mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts
Date: Wed, 14 Jan 2026 18:33:39 +0000
Message-ID: <3da4df0f-4205-4f69-bd3a-cebfce5b6f9c@lucifer.local>
In-Reply-To: <CAJuCfpHoNdg4p__pmdsQepzxSpLe4j_7s17=Y5bicETacP4a+Q@mail.gmail.com>

On Wed, Jan 14, 2026 at 08:14:23AM -0800, Suren Baghdasaryan wrote:
> On Tue, Jan 6, 2026 at 7:13 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > Add kdoc comments and describe exactly what these functions are used for in
> > detail, pointing out importantly that the anon_vma_clone() !dst->anon_vma
> > && src->anon_vma dance is ONLY for fork.
> >
> > Both are confusing functions that will be refactored in a subsequent patch
> > but the first stage is establishing documentation and some invariants.
> >
> > Add some basic CONFIG_DEBUG_VM asserts that help document expected state,
> > specifically:
> >
> > anon_vma_clone()
> > - mmap write lock held.
> > - We do nothing if src VMA is not faulted.
> > - The destination VMA has no anon_vma_chain yet.
> > - We are always operating on the same active VMA (i.e. vma->anon_vma).
> > - If not forking, must operate on the same mm_struct.
> >
> > unlink_anon_vmas()
> > - mmap lock held (write lock except when freeing page tables).
> > - That unfaulted VMAs are no-ops.
> >
> > We are presented with a special case when anon_vma_clone() fails to
> > allocate memory, where we have a VMA with partially set up anon_vma state.
> > Since we hold the exclusive mmap write lock, and since we are cloning from
> > a source VMA which consequently can't also have its anon_vma state
> > modified, we know no anon_vma referenced can be empty.
> >
> > This allows us to significantly simplify this case and just remove
> > anon_vma_chain objects associated with the VMA, so we add a specific
> > partial cleanup path for this scenario.
> >
> > This also allows us to drop the hack of setting vma->anon_vma to NULL
> > before unlinking anon_vma state in this scenario.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> > ---
> > mm/rmap.c | 130 +++++++++++++++++++++++++++++++++++++++++++-----------
> > 1 file changed, 104 insertions(+), 26 deletions(-)
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index c86f1135222b..54ccf884d90a 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -258,30 +258,62 @@ static inline void unlock_anon_vma_root(struct anon_vma *root)
> > up_write(&root->rwsem);
> > }
> >
> > -/*
> > - * Attach the anon_vmas from src to dst.
> > - * Returns 0 on success, -ENOMEM on failure.
> > - *
> > - * anon_vma_clone() is called by vma_expand(), vma_merge(), __split_vma(),
> > - * copy_vma() and anon_vma_fork(). The first four want an exact copy of src,
> > - * while the last one, anon_vma_fork(), may try to reuse an existing anon_vma to
> > - * prevent endless growth of anon_vma. Since dst->anon_vma is set to NULL before
> > - * call, we can identify this case by checking (!dst->anon_vma &&
> > - * src->anon_vma).
> > - *
> > - * If (!dst->anon_vma && src->anon_vma) is true, this function tries to find
> > - * and reuse existing anon_vma which has no vmas and only one child anon_vma.
> > - * This prevents degradation of anon_vma hierarchy to endless linear chain in
> > - * case of constantly forking task. On the other hand, an anon_vma with more
> > - * than one child isn't reused even if there was no alive vma, thus rmap
> > - * walker has a good chance of avoiding scanning the whole hierarchy when it
> > - * searches where page is mapped.
> > +static void check_anon_vma_clone(struct vm_area_struct *dst,
> > + struct vm_area_struct *src)
> > +{
> > + /* The write lock must be held. */
> > + mmap_assert_write_locked(src->vm_mm);
> > + /* If not a fork (implied by dst->anon_vma) then must be on same mm. */
> > + VM_WARN_ON_ONCE(dst->anon_vma && dst->vm_mm != src->vm_mm);
> > +
> > + /* If we have anything to do src->anon_vma must be provided. */
> > + VM_WARN_ON_ONCE(!src->anon_vma && !list_empty(&src->anon_vma_chain));
> > + VM_WARN_ON_ONCE(!src->anon_vma && dst->anon_vma);
> > + /* We are establishing a new anon_vma_chain. */
> > + VM_WARN_ON_ONCE(!list_empty(&dst->anon_vma_chain));
> > + /*
> > + * On fork, dst->anon_vma is set NULL (temporarily). Otherwise, anon_vma
> > + * must be the same across dst and src.
> > + */
> > + VM_WARN_ON_ONCE(dst->anon_vma && dst->anon_vma != src->anon_vma);
> > +}
> > +
> > +static void cleanup_partial_anon_vmas(struct vm_area_struct *vma);
> > +
> > +/**
> > + * anon_vma_clone - Establishes new anon_vma_chain objects in @dst linking to
> > + * all of the anon_vma objects contained within @src anon_vma_chain's.
> > + * @dst: The destination VMA with an empty anon_vma_chain.
> > + * @src: The source VMA we wish to duplicate.
> > + *
> > + * This is the heart of the VMA side of the anon_vma implementation - we invoke
> > + * this function whenever we need to set up a new VMA's anon_vma state.
> > + *
> > + * This is invoked for:
> > + *
> > + * - VMA Merge, but only when @dst is unfaulted and @src is faulted - meaning we
> > + * clone @src into @dst.
> > + * - VMA split.
> > + * - VMA (m)remap.
> > + * - Fork of faulted VMA.
> > + *
> > + * In all cases other than fork this is simply a duplication. Fork additionally
> > + * adds a new active anon_vma.
> > + *
> > + * ONLY in the case of fork do we try to 'reuse' existing anon_vma's in an
> > + * anon_vma hierarchy, reusing anon_vma's which have no VMA associated with them
> > + * but do have a single child. This is to avoid waste of memory when repeatedly
> > + * forking.
> > + *
> > + * Returns: 0 on success, -ENOMEM on failure.
> > */
> > int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> > {
> > struct anon_vma_chain *avc, *pavc;
> > struct anon_vma *root = NULL;
> >
> > + check_anon_vma_clone(dst, src);
> > +
> > list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> > struct anon_vma *anon_vma;
> >
> > @@ -315,14 +347,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> > return 0;
> >
> > enomem_failure:
> > - /*
> > - * dst->anon_vma is dropped here otherwise its num_active_vmas can
> > - * be incorrectly decremented in unlink_anon_vmas().
> > - * We can safely do this because callers of anon_vma_clone() don't care
> > - * about dst->anon_vma if anon_vma_clone() failed.
> > - */
> > - dst->anon_vma = NULL;
> > - unlink_anon_vmas(dst);
> > + cleanup_partial_anon_vmas(dst);
> > return -ENOMEM;
> > }
> >
> > @@ -393,11 +418,64 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> > return -ENOMEM;
> > }
> >
> > +/*
> > + * In the unfortunate case of anon_vma_clone() failing to allocate memory we
> > + * have to clean things up.
> > + *
> > + * On clone we hold the exclusive mmap write lock, so we can't race
> > + * unlink_anon_vmas(). Since we're cloning, we know we can't have empty
> > + * anon_vma's, since existing anon_vma's are what we're cloning from.
>
> nit: At first I got confused because it's possible that vma->anon_vma
> (which is dst->anon_vma) can be NULL but then I realized you are
> talking about avc->anon_vma here. Maybe change the comment to use
> avc->anon_vma instead of anon_vma for clarity?

I'm not sure it's useful, because even if I say that, a person reading it who
was under the impression that vma->anon_vma was part of this would _still_ be
confused, right?

vma->anon_vma is badly named, and I'm a little conscious of not over-documenting
with so much detail that we cover all possible levels of confusion.

This series is emphatically meant to be a _foundation_ for fixing the obscene
mess, not to polish up a horror show.

So I'm not sure I wouldn't just be adding more questions by doing that, or
needing to add more paragraphs to explain that 'hey, we're cleaning up the
anon_vma's linked via AVCs here, which isn't the same as the _active_ anon_vma
where faulted folios point' etc. etc.
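
For readers following the distinction, here is a minimal userspace sketch of
the two kinds of reference being discussed. The struct layouts and the helper
names (count_linked(), partial_clone_linked()) are simplified stand-ins assumed
for illustration, not the kernel definitions. It models a fork-time clone that
failed part-way: the destination's active anon_vma is still NULL, yet its chain
already links to existing anon_vma's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-ins; NOT the kernel definitions. */
struct anon_vma {
	int id;
};

struct anon_vma_chain {
	struct anon_vma *anon_vma;	/* the anon_vma this link points at */
	struct anon_vma_chain *next;	/* next link on the same VMA */
};

struct vma {
	struct anon_vma *anon_vma;	/* the _active_ anon_vma, may be NULL */
	struct anon_vma_chain *chain;	/* every anon_vma linked via AVCs */
};

/* Count the anon_vma's reachable through the VMA's chain. */
static size_t count_linked(const struct vma *vma)
{
	size_t n = 0;

	for (const struct anon_vma_chain *avc = vma->chain; avc; avc = avc->next) {
		/* Each link always points at an existing anon_vma. */
		assert(avc->anon_vma != NULL);
		n++;
	}
	return n;
}

/*
 * Hypothetical scenario: a fork-time clone that linked two AVCs and then
 * failed, so the active anon_vma was never assigned.
 */
static size_t partial_clone_linked(bool *active_is_null)
{
	static struct anon_vma parent = { 1 }, grandparent = { 2 };
	static struct anon_vma_chain links[2];
	struct vma dst = { .anon_vma = NULL, .chain = NULL };

	links[0].anon_vma = &grandparent;
	links[0].next = NULL;
	links[1].anon_vma = &parent;
	links[1].next = &links[0];
	dst.chain = &links[1];

	*active_is_null = (dst.anon_vma == NULL);
	return count_linked(&dst);
}
```

In this model, the cleanup path only ever walks the chain member, so whether
the active pointer is NULL has no bearing on what gets freed.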
>
> > + *
> > + * So this function needs only traverse the anon_vma_chain and free each
> > + * allocated anon_vma_chain.
> > + */
> > +static void cleanup_partial_anon_vmas(struct vm_area_struct *vma)
> > +{
> > + struct anon_vma_chain *avc, *next;
> > + bool locked = false;
> > +
> > + /*
> > + * We exclude everybody else from being able to modify anon_vma's
> > + * underneath us.
> > + */
> > + mmap_assert_locked(vma->vm_mm);
> > +
> > + list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
> > + struct anon_vma *anon_vma = avc->anon_vma;
> > +
> > + /* All anon_vma's share the same root. */
> > + if (!locked) {
> > + anon_vma_lock_write(anon_vma);
> > + locked = true;
> > + }
> > +
> > + anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
> > + list_del(&avc->same_vma);
> > + anon_vma_chain_free(avc);
> > + }
>
> Are you missing "if (locked) anon_vma_unlock_write()" here?

Ugh god yes. This is an annoying irrelevance as we eliminate it anyway later, but
I'll do a fix-patch to avoid a bisection hazard.

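
As a rough illustration of the missing step, the following userspace sketch
takes a shared root lock once on the first link and drops it after the loop.
It is not the actual fix-patch: the lock is modelled as a plain counter, and
root_lock, lock_write(), unlock_write(), push_link() and cleanup_partial() are
hypothetical names standing in for the kernel's types and helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-ins; NOT kernel types. */
struct root_lock {
	int depth;			/* 1 while write-held, 0 otherwise */
};

struct anon_vma {
	struct root_lock *root;		/* all anon_vma's here share one root */
};

struct anon_vma_chain {
	struct anon_vma *anon_vma;
	struct anon_vma_chain *next;
};

static void lock_write(struct anon_vma *av)   { av->root->depth++; }
static void unlock_write(struct anon_vma *av) { av->root->depth--; }

/* Prepend a freshly allocated link pointing at @av. */
static struct anon_vma_chain *push_link(struct anon_vma_chain *head,
					struct anon_vma *av)
{
	struct anon_vma_chain *avc = malloc(sizeof(*avc));

	if (!avc)
		abort();
	avc->anon_vma = av;
	avc->next = head;
	return avc;
}

/*
 * Free every link on the chain. The shared root lock is taken once, on the
 * first link, and released after the loop. Returns the number of links freed.
 */
static int cleanup_partial(struct anon_vma_chain **head)
{
	struct anon_vma_chain *avc, *next;
	struct anon_vma *locked_av = NULL;
	bool locked = false;
	int freed = 0;

	for (avc = *head; avc; avc = next) {
		next = avc->next;	/* saved before the link is freed */

		if (!locked) {
			locked_av = avc->anon_vma;
			lock_write(locked_av);
			locked = true;
		}

		free(avc);
		freed++;
	}
	*head = NULL;

	/* The step under discussion: drop the lock if it was taken. */
	if (locked)
		unlock_write(locked_av);

	return freed;
}
```

With an empty chain the lock is never taken, so the trailing check keeps the
lock and unlock calls balanced in both cases.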
> You could also avoid using "locked" variable by setting anon_vma =
> NULL initially and using "if (anon_vma)" as an equivalent of "if
> (locked)"

I know, I did this on purpose to be explicit. The compiler will optimise this
away.

Anyway, it's moot as this gets removed later, so I say let's keep it as it is.

>
> > +}
> > +
> > +/**
> > + * unlink_anon_vmas() - remove all links between a VMA and anon_vma's, freeing
> > + * anon_vma_chain objects.
> > + * @vma: The VMA whose links to anon_vma objects is to be severed.
> > + *
> > + * As part of the process anon_vma_chain's are freed,
> > + * anon_vma->num_children,num_active_vmas is updated as required and, if the
> > + * relevant anon_vma references no further VMAs, its reference count is
> > + * decremented.
> > + */
> > void unlink_anon_vmas(struct vm_area_struct *vma)
> > {
> > struct anon_vma_chain *avc, *next;
> > struct anon_vma *root = NULL;
> >
> > + /* Always hold mmap lock, read-lock on unmap possibly. */
> > + mmap_assert_locked(vma->vm_mm);
> > +
> > + /* Unfaulted is a no-op. */
> > + VM_WARN_ON_ONCE(!vma->anon_vma && !list_empty(&vma->anon_vma_chain));
> > +
> > /*
> > * Unlink each anon_vma chained to the VMA. This list is ordered
> > * from newest to oldest, ensuring the root anon_vma gets freed last.
> > --
> > 2.52.0
> >
2026-01-06 15:04 [PATCH v2 0/8] mm: clean up anon_vma implementation Lorenzo Stoakes
2026-01-06 15:04 ` [PATCH v2 1/8] mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts Lorenzo Stoakes
2026-01-14 16:14 ` Suren Baghdasaryan
2026-01-14 18:33 ` Lorenzo Stoakes [this message]
2026-01-14 18:48 ` Lorenzo Stoakes
2026-01-14 19:02 ` Lorenzo Stoakes
2026-01-06 15:04 ` [PATCH v2 2/8] mm/rmap: skip unfaulted VMAs on anon_vma clone, unlink Lorenzo Stoakes
2026-01-06 18:34 ` Liam R. Howlett
2026-01-14 16:47 ` Suren Baghdasaryan
2026-01-06 15:04 ` [PATCH v2 3/8] mm/rmap: remove unnecessary root lock dance in anon_vma clone, unmap Lorenzo Stoakes
2026-01-06 18:42 ` Liam R. Howlett
2026-01-14 16:55 ` Suren Baghdasaryan
2026-01-14 17:54 ` Lorenzo Stoakes
2026-01-14 18:01 ` Suren Baghdasaryan
2026-01-06 15:04 ` [PATCH v2 4/8] mm/rmap: remove anon_vma_merge() function Lorenzo Stoakes
2026-01-06 18:42 ` Liam R. Howlett
2026-01-06 15:04 ` [PATCH v2 5/8] mm/rmap: make anon_vma functions internal Lorenzo Stoakes
2026-01-06 18:54 ` Liam R. Howlett
2026-01-06 15:04 ` [PATCH v2 6/8] mm/mmap_lock: add vma_is_attached() helper Lorenzo Stoakes
2026-01-06 18:56 ` Liam R. Howlett
2026-01-06 15:04 ` [PATCH v2 7/8] mm/rmap: allocate anon_vma_chain objects unlocked when possible Lorenzo Stoakes
2026-01-06 19:02 ` Liam R. Howlett
2026-01-08 18:51 ` Lorenzo Stoakes
2026-01-06 15:04 ` [PATCH v2 8/8] mm/rmap: separate out fork-only logic on anon_vma_clone() Lorenzo Stoakes
2026-01-06 19:27 ` Liam R. Howlett
2026-01-08 17:58 ` Lorenzo Stoakes
2026-01-14 17:26 ` Suren Baghdasaryan
2026-01-08 18:52 ` Lorenzo Stoakes