linux-mm.kvack.org archive mirror
From: Suren Baghdasaryan <surenb@google.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	 David Hildenbrand <david@kernel.org>,
	Rik van Riel <riel@surriel.com>, Harry Yoo <harry.yoo@oracle.com>,
	 Jann Horn <jannh@google.com>, Mike Rapoport <rppt@kernel.org>,
	Michal Hocko <mhocko@suse.com>,  Pedro Falcato <pfalcato@suse.de>,
	Chris Li <chriscli@google.com>,
	 Barry Song <v-songbaohua@oppo.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/8] mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts
Date: Tue, 6 Jan 2026 10:52:45 -0800
Message-ID: <CAJuCfpGt7=RXW_MHy+1kEN3N=kV9zO+E_b9ybqtpWz79eVhnew@mail.gmail.com>
In-Reply-To: <13c66c95-ca0d-4711-b755-676ec4066811@lucifer.local>

On Tue, Jan 6, 2026 at 4:54 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Mon, Dec 29, 2025 at 01:18:04PM -0800, Suren Baghdasaryan wrote:
> > On Fri, Dec 19, 2025 at 10:22 AM Liam R. Howlett
> > <Liam.Howlett@oracle.com> wrote:
> > >
> > > * Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [251217 07:27]:
> > > > Add kdoc comments, describe exactly what these functions are used for in
> > > > detail, pointing out importantly that the anon_vma_clone() !dst->anon_vma
> > > > && src->anon_vma dance is ONLY for fork.
> > > >
> > > > Both are confusing functions that will be refactored in a subsequent patch
> > > > but the first stage is establishing documentation and some invariants.
> > > >
> > > > Add some basic CONFIG_DEBUG_VM asserts that help document expected state,
> > > > specifically:
> > > >
> > > > anon_vma_clone()
> > > > - mmap write lock held.
> > > > - We do nothing if src VMA is not faulted.
> > > > - The destination VMA has no anon_vma_chain yet.
> > > > - We are always operating on the same active VMA (i.e. vma->anon-vma).
> >
> > nit: s/vma->anon-vma/vma->anon_vma
>
> Thanks will correct.
>
> >
> > > > - If not forking, must operate on the same mm_struct.
> > > >
> > > > unlink_anon_vmas()
> > > > - mmap lock held (read on unmap downgraded).
> >
> > Out of curiosity I looked for the place where unlink_anon_vmas() is
> > called with mmap_lock downgraded to read but could not find it. Could
> > you please point me to it?
>
> In brk() we call:
>
> -> do_vmi_align_munmap()
> -> ... (below)
>
> On munmap() we call:
>
> -> __vm_munmap()
> -> do_vmi_munmap()
> -> do_vmi_align_munmap()
> -> ... (below)
>
> On mremap() when shrinking a VMA in place we call:
>
> -> mremap_at()
> -> shrink_vma()
> -> do_vmi_munmap()
> -> do_vmi_align_munmap()
> -> ... (below)
>
> And the ... is:
>
> -> vms_complete_munmap_vmas() [ does downgrade since vms->unlock ]
> -> vms_clear_ptes()
> -> free_pgtables()
>
> I've improved the comment anyway to make it a little clearer.

Ah, now I see. Thanks!
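
For readers following along, the downgrade case can be pictured with a small
userspace model. This is only a sketch: the names lock_state, mm_model,
unlink_view and the model_*() helpers are invented for illustration and are not
kernel APIs. It assumes, as the call chain above implies, that the lock has
already been downgraded by the time unlink_anon_vmas() is reached under
free_pgtables().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state of mm->mmap_lock. */
enum lock_state { UNLOCKED, READ_LOCKED, WRITE_LOCKED };

struct mm_model {
	enum lock_state mmap_lock;
};

/* Models mmap_assert_locked(): satisfied by a read or a write hold. */
bool model_assert_locked(const struct mm_model *mm)
{
	return mm->mmap_lock != UNLOCKED;
}

/* Models mmap_assert_write_locked(): satisfied only by a write hold. */
bool model_assert_write_locked(const struct mm_model *mm)
{
	return mm->mmap_lock == WRITE_LOCKED;
}

/* Which of the two asserts would hold at unlink time. */
struct unlink_view {
	bool locked_ok;
	bool write_locked_ok;
};

/*
 * Models the munmap path: start write-locked, optionally downgrade
 * (the vms->unlock case), then look at the lock where the unlink happens.
 */
struct unlink_view model_munmap(bool downgrade)
{
	struct mm_model mm = { .mmap_lock = WRITE_LOCKED };
	struct unlink_view view;

	if (downgrade)
		mm.mmap_lock = READ_LOCKED;

	view.locked_ok = model_assert_locked(&mm);
	view.write_locked_ok = model_assert_write_locked(&mm);
	return view;
}
```

In this model model_munmap(true) satisfies the "locked" check but not the
"write locked" one, which is why the patch uses mmap_assert_locked() in
unlink_anon_vmas() rather than the write variant.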

>
> >
> > > > - That unfaulted VMAs are no-ops.
> > > >
> > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > >
> > > Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> > >
> > > > ---
> > > >  mm/rmap.c | 82 +++++++++++++++++++++++++++++++++++++++++++------------
> > > >  1 file changed, 64 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > > index d6799afe1114..0e34c0a69fbc 100644
> > > > --- a/mm/rmap.c
> > > > +++ b/mm/rmap.c
> > > > @@ -257,30 +257,60 @@ static inline void unlock_anon_vma_root(struct anon_vma *root)
> > > >               up_write(&root->rwsem);
> > > >  }
> > > >
> > > > -/*
> > > > - * Attach the anon_vmas from src to dst.
> > > > - * Returns 0 on success, -ENOMEM on failure.
> > > > - *
> > > > - * anon_vma_clone() is called by vma_expand(), vma_merge(), __split_vma(),
> > > > - * copy_vma() and anon_vma_fork(). The first four want an exact copy of src,
> > > > - * while the last one, anon_vma_fork(), may try to reuse an existing anon_vma to
> > > > - * prevent endless growth of anon_vma. Since dst->anon_vma is set to NULL before
> > > > - * call, we can identify this case by checking (!dst->anon_vma &&
> > > > - * src->anon_vma).
> > > > - *
> > > > - * If (!dst->anon_vma && src->anon_vma) is true, this function tries to find
> > > > - * and reuse existing anon_vma which has no vmas and only one child anon_vma.
> > > > - * This prevents degradation of anon_vma hierarchy to endless linear chain in
> > > > - * case of constantly forking task. On the other hand, an anon_vma with more
> > > > - * than one child isn't reused even if there was no alive vma, thus rmap
> > > > - * walker has a good chance of avoiding scanning the whole hierarchy when it
> > > > - * searches where page is mapped.
> > > > +static void check_anon_vma_clone(struct vm_area_struct *dst,
> > > > +                              struct vm_area_struct *src)
> > > > +{
> > > > +     /* The write lock must be held. */
> > > > +     mmap_assert_write_locked(src->vm_mm);
> > > > +     /* If not a fork (implied by dst->anon_vma) then must be on same mm. */
> > > > +     VM_WARN_ON_ONCE(dst->anon_vma && dst->vm_mm != src->vm_mm);
> > > > +
> > > > +     /* No source anon_vma is a no-op. */
> >
> > I'm confused about the above comment. Do you mean that if
> > !src->anon_vma then it's a no-op and therefore this function shouldn't
> > be called? If so, we could simply have VM_WARN_ON_ONCE(!src->anon_vma)
>
> It's a no-op :) so it makes no sense to specify other fields. In a later commit
> we literally bail out of anon_vma_clone() if it's not specified. In fact the
> very next patch...
>
> > but checks below have more conditions. Can this comment be perhaps
> > expanded please so that the reader clearly understands what is allowed
> > and what is not. For example, combination (!src->anon_vma &&
> > !dst->anon_vma) is allowed and we are correctly not triggering a warning
> > here, however that's still a no-op IIUC.
>
> Yup it's correct and fine but it's a no-op, hence we have nothing to do, as you
> say.
>
> I thought it was self-documenting, given I literally spell out the expected
> conditions in the asserts but obviously this isn't entirely clear. I'm trying
> _not_ to write paragraphs here as that can actually make things _more_
> confusing.

Yeah, that comment just confused me a bit. If it's a no-op then other
conditions should not matter, yet we are asserting them. Anyway, I
understand the intention, and either the new comment or no comment at
all is fine with me.

>
> Will update the comment to say more:
>
>         /* If we have anything to do src->anon_vma must be provided. */
>
> >
> > > > +     VM_WARN_ON_ONCE(!src->anon_vma && !list_empty(&src->anon_vma_chain));
> > > > +     VM_WARN_ON_ONCE(!src->anon_vma && dst->anon_vma);
> > > > +     /* We are establishing a new anon_vma_chain. */
> > > > +     VM_WARN_ON_ONCE(!list_empty(&dst->anon_vma_chain));
> > > > +     /*
> > > > +      * On fork, dst->anon_vma is set NULL (temporarily). Otherwise, anon_vma
> > > > +      * must be the same across dst and src.
> >
> > This is the second time in this small function where we have to remind
> > that dst->anon_vma==NULL means that we are forking. Maybe it's better
> > to introduce a `bool forking = dst->anon_vma==NULL;` variable at the
> > beginning and use it in all these checks?
>
> Later we make changes along these lines, so for the purposes of keeping things
> broken up I'd rather not.
>
> And yes, anon_vma is a complicated mess, this is why I'm trying to do things one
> step at a time, so we document the things you'd have to go research to
> understand, later we change the code.
>
> >
> > I know, I'm nitpicking but as you said, anon_vma code is very
> > complicated, so the more clarity we can bring to it the better.
>
> Right, sure, but it has to be one thing at a time.

Ack.
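
Just to make the `forking` suggestion concrete, here is a rough userspace
sketch of the same five checks written against a single named flag. It is not
proposed kernel code: struct vma_model, struct anon_vma_model, the chain_empty
field and clone_state_is_valid() are hypothetical stand-ins, and the function
returns false where the quoted patch would hit a VM_WARN_ON_ONCE().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's structures. */
struct anon_vma_model {
	int id;
};

struct vma_model {
	const void *mm;                  /* stands in for vma->vm_mm */
	struct anon_vma_model *anon_vma; /* stands in for vma->anon_vma */
	bool chain_empty;                /* stands in for list_empty(&vma->anon_vma_chain) */
};

/* Returns true when none of the quoted patch's asserts would fire. */
bool clone_state_is_valid(const struct vma_model *dst,
			  const struct vma_model *src)
{
	/* A NULL dst->anon_vma is how the quoted patch infers a fork. */
	const bool forking = dst->anon_vma == NULL;

	/* If not a fork, both VMAs must belong to the same mm. */
	if (!forking && dst->mm != src->mm)
		return false;
	/* An unfaulted source must not carry any chain entries. */
	if (!src->anon_vma && !src->chain_empty)
		return false;
	/* An unfaulted source cannot be paired with a faulted destination. */
	if (!src->anon_vma && !forking)
		return false;
	/* The destination chain is being newly established. */
	if (!dst->chain_empty)
		return false;
	/* Outside fork, dst and src must share the same anon_vma. */
	if (!forking && dst->anon_vma != src->anon_vma)
		return false;

	return true;
}
```

One caveat with the flag: `forking` is also true in the unfaulted-to-unfaulted
case, i.e. the (!src->anon_vma && !dst->anon_vma) combination mentioned above,
which is allowed and a no-op, so the name is only an approximation there.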

>
> >
> > > > +      */
> > > > +     VM_WARN_ON_ONCE(dst->anon_vma && dst->anon_vma != src->anon_vma);
> > > > +}
> > > > +
> > > > +/**
> > > > + * anon_vma_clone - Establishes new anon_vma_chain objects in @dst linking to
> > > > + * all of the anon_vma objects contained within @src anon_vma_chain's.
> > > > + * @dst: The destination VMA with an empty anon_vma_chain.
> > > > + * @src: The source VMA we wish to duplicate.
> > > > + *
> > > > + * This is the heart of the VMA side of the anon_vma implementation - we invoke
> > > > + * this function whenever we need to set up a new VMA's anon_vma state.
> > > > + *
> > > > + * This is invoked for:
> > > > + *
> > > > + * - VMA Merge, but only when @dst is unfaulted and @src is faulted - meaning we
> > > > + *   clone @src into @dst.
> > > > + * - VMA split.
> > > > + * - VMA (m)remap.
> > > > + * - Fork of faulted VMA.
> > > > + *
> > > > + * In all cases other than fork this is simply a duplication. Fork additionally
> > > > + * adds a new active anon_vma.
> > > > + *
> > > > + * ONLY in the case of fork do we try to 'reuse' existing anon_vma's in an
> > > > + * anon_vma hierarchy, reusing anon_vma's which have no VMA associated with them
> > > > + * but do have a single child. This is to avoid waste of memory when repeatedly
> > > > + * forking.
> > > > + *
> > > > + * Returns: 0 on success, -ENOMEM on failure.
> > > >   */
> > > >  int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> > > >  {
> > > >       struct anon_vma_chain *avc, *pavc;
> > > >       struct anon_vma *root = NULL;
> > > >
> > > > +     check_anon_vma_clone(dst, src);
> > > > +
> > > >       list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> > > >               struct anon_vma *anon_vma;
> > > >
> > > > @@ -392,11 +422,27 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> > > >       return -ENOMEM;
> > > >  }
> > > >
> > > > +/**
> > > > + * unlink_anon_vmas() - remove all links between a VMA and anon_vma's, freeing
> > > > + * anon_vma_chain objects.
> > > > + * @vma: The VMA whose links to anon_vma objects is to be severed.
> > > > + *
> > > > + * As part of the process anon_vma_chain's are freed,
> > > > + * anon_vma->num_children,num_active_vmas is updated as required and, if the
> > > > + * relevant anon_vma references no further VMAs, its reference count is
> > > > + * decremented.
> > > > + */
> > > >  void unlink_anon_vmas(struct vm_area_struct *vma)
> > > >  {
> > > >       struct anon_vma_chain *avc, *next;
> > > >       struct anon_vma *root = NULL;
> > > >
> > > > +     /* Always hold mmap lock, read-lock on unmap possibly. */
> > > > +     mmap_assert_locked(vma->vm_mm);
> > > > +
> > > > +     /* Unfaulted is a no-op. */
> > > > +     VM_WARN_ON_ONCE(!vma->anon_vma && !list_empty(&vma->anon_vma_chain));
> >
> > Hmm. anon_vma_clone() calls unlink_anon_vmas() after setting
> > dst->anon_vma=NULL in the enomem_failure path. This warning would
> > imply that in such case dst->anon_vma_chain is always non-empty. But I
> > don't think we can always expect that... What if the very first call
> > to anon_vma_chain_alloc() in anon_vma_clone()'s loop failed, I think
> > this would result in dst->anon_vma_chain being empty, no?
>
> OK well that's a good spot, though this is never going to actually happen in
> reality as an allocation failure here would really be 'too small to fail'.
>
> It's a pity we have to give up a completely sensible invariant because of
> terribly written code for an event that will never happen.
>
> But sure will drop this then, that's awful to have to do though :/
>
> Hey maybe we'd have bot reports on this (would require fault injection) if this
> had been taken to any tree at any point. Ah well.

I'll look into the new version to see the final result. Thanks!
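
For the record, a toy model of the state unlink_anon_vmas() sees on the
enomem_failure path. This is a sketch under assumptions, not the kernel code:
struct clone_result, model_clone() and unfaulted_assert_would_fire() are made
up for illustration, and the loop only counts chain entries instead of
allocating anything. The point is that once dst->anon_vma has been cleared, the
chain length depends entirely on which allocation failed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of dst after a modelled anon_vma_clone() call. */
struct clone_result {
	bool anon_vma_set; /* false models dst->anon_vma == NULL */
	size_t chain_len;  /* entries linked into dst's chain so far */
	int err;           /* 0 on success, -ENOMEM on modelled failure */
};

/*
 * Models the clone loop over n_src source chain entries. The allocation
 * with 0-based index fail_at fails; pass fail_at >= n_src for no failure.
 * On failure the model clears anon_vma, as the enomem_failure path does
 * before calling unlink_anon_vmas().
 */
struct clone_result model_clone(size_t n_src, size_t fail_at)
{
	struct clone_result res = { .anon_vma_set = true, .chain_len = 0, .err = 0 };

	for (size_t i = 0; i < n_src; i++) {
		if (i == fail_at) {
			res.anon_vma_set = false;
			res.err = -ENOMEM;
			return res;
		}
		res.chain_len++;
	}
	return res;
}

/* Models the condition !vma->anon_vma && !list_empty(&vma->anon_vma_chain). */
bool unfaulted_assert_would_fire(struct clone_result res)
{
	return !res.anon_vma_set && res.chain_len != 0;
}
```

In this model a failure on the very first allocation leaves an empty chain,
while a later failure leaves a NULL anon_vma with a populated chain, so the
chain state on entry to unlink_anon_vmas() cannot be assumed either way.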

>
> >
> > > > +
> > > >       /*
> > > >        * Unlink each anon_vma chained to the VMA.  This list is ordered
> > > >        * from newest to oldest, ensuring the root anon_vma gets freed last.
> > > > --
> > > > 2.52.0
> > > >
>
> Thanks, Lorenzo

