linux-mm.kvack.org archive mirror
From: Suren Baghdasaryan <surenb@google.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	 David Hildenbrand <david@kernel.org>,
	Rik van Riel <riel@surriel.com>, Harry Yoo <harry.yoo@oracle.com>,
	 Jann Horn <jannh@google.com>, Mike Rapoport <rppt@kernel.org>,
	Michal Hocko <mhocko@suse.com>,  Pedro Falcato <pfalcato@suse.de>,
	Chris Li <chriscli@google.com>,
	 Barry Song <v-songbaohua@oppo.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/8] mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts
Date: Wed, 14 Jan 2026 08:14:23 -0800
Message-ID: <CAJuCfpHoNdg4p__pmdsQepzxSpLe4j_7s17=Y5bicETacP4a+Q@mail.gmail.com>
In-Reply-To: <5f55507a877028add5fdf8f207f5e333c7a3fc85.1767711638.git.lorenzo.stoakes@oracle.com>

On Tue, Jan 6, 2026 at 7:13 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> Add kdoc comments and describe exactly what these functions are used for in
> detail, pointing out importantly that the anon_vma_clone() !dst->anon_vma
> && src->anon_vma dance is ONLY for fork.
>
> Both are confusing functions that will be refactored in a subsequent patch
> but the first stage is establishing documentation and some invariants.
>
> Add some basic CONFIG_DEBUG_VM asserts that help document expected state,
> specifically:
>
> anon_vma_clone()
> - mmap write lock held.
> - We do nothing if src VMA is not faulted.
> - The destination VMA has no anon_vma_chain yet.
> - We are always operating on the same active VMA (i.e. vma->anon_vma).
> - If not forking, must operate on the same mm_struct.
>
> unlink_anon_vmas()
> - mmap lock held (write lock except when freeing page tables).
> - That unfaulted VMAs are no-ops.
>
> We are presented with a special case when anon_vma_clone() fails to
> allocate memory, where we have a VMA with partially set up anon_vma state.
> Since we hold the exclusive mmap write lock, and since we are cloning from
> a source VMA which consequently can't also have its anon_vma state
> modified, we know no anon_vma referenced can be empty.
>
> This allows us to significantly simplify this case and just remove
> anon_vma_chain objects associated with the VMA, so we add a specific
> partial cleanup path for this scenario.
>
> This also allows us to drop the hack of setting vma->anon_vma to NULL
> before unlinking anon_vma state in this scenario.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
>  mm/rmap.c | 130 +++++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 104 insertions(+), 26 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index c86f1135222b..54ccf884d90a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -258,30 +258,62 @@ static inline void unlock_anon_vma_root(struct anon_vma *root)
>                 up_write(&root->rwsem);
>  }
>
> -/*
> - * Attach the anon_vmas from src to dst.
> - * Returns 0 on success, -ENOMEM on failure.
> - *
> - * anon_vma_clone() is called by vma_expand(), vma_merge(), __split_vma(),
> - * copy_vma() and anon_vma_fork(). The first four want an exact copy of src,
> - * while the last one, anon_vma_fork(), may try to reuse an existing anon_vma to
> - * prevent endless growth of anon_vma. Since dst->anon_vma is set to NULL before
> - * call, we can identify this case by checking (!dst->anon_vma &&
> - * src->anon_vma).
> - *
> - * If (!dst->anon_vma && src->anon_vma) is true, this function tries to find
> - * and reuse existing anon_vma which has no vmas and only one child anon_vma.
> - * This prevents degradation of anon_vma hierarchy to endless linear chain in
> - * case of constantly forking task. On the other hand, an anon_vma with more
> - * than one child isn't reused even if there was no alive vma, thus rmap
> - * walker has a good chance of avoiding scanning the whole hierarchy when it
> - * searches where page is mapped.
> +static void check_anon_vma_clone(struct vm_area_struct *dst,
> +                                struct vm_area_struct *src)
> +{
> +       /* The write lock must be held. */
> +       mmap_assert_write_locked(src->vm_mm);
> +       /* If not a fork (implied by dst->anon_vma) then must be on same mm. */
> +       VM_WARN_ON_ONCE(dst->anon_vma && dst->vm_mm != src->vm_mm);
> +
> +       /* If we have anything to do src->anon_vma must be provided. */
> +       VM_WARN_ON_ONCE(!src->anon_vma && !list_empty(&src->anon_vma_chain));
> +       VM_WARN_ON_ONCE(!src->anon_vma && dst->anon_vma);
> +       /* We are establishing a new anon_vma_chain. */
> +       VM_WARN_ON_ONCE(!list_empty(&dst->anon_vma_chain));
> +       /*
> +        * On fork, dst->anon_vma is set NULL (temporarily). Otherwise, anon_vma
> +        * must be the same across dst and src.
> +        */
> +       VM_WARN_ON_ONCE(dst->anon_vma && dst->anon_vma != src->anon_vma);
> +}
> +
> +static void cleanup_partial_anon_vmas(struct vm_area_struct *vma);
> +
> +/**
> + * anon_vma_clone - Establishes new anon_vma_chain objects in @dst linking to
> + * all of the anon_vma objects contained within @src anon_vma_chain's.
> + * @dst: The destination VMA with an empty anon_vma_chain.
> + * @src: The source VMA we wish to duplicate.
> + *
> + * This is the heart of the VMA side of the anon_vma implementation - we invoke
> + * this function whenever we need to set up a new VMA's anon_vma state.
> + *
> + * This is invoked for:
> + *
> + * - VMA Merge, but only when @dst is unfaulted and @src is faulted - meaning we
> + *   clone @src into @dst.
> + * - VMA split.
> + * - VMA (m)remap.
> + * - Fork of faulted VMA.
> + *
> + * In all cases other than fork this is simply a duplication. Fork additionally
> + * adds a new active anon_vma.
> + *
> + * ONLY in the case of fork do we try to 'reuse' existing anon_vma's in an
> + * anon_vma hierarchy, reusing anon_vma's which have no VMA associated with them
> + * but do have a single child. This is to avoid waste of memory when repeatedly
> + * forking.
> + *
> + * Returns: 0 on success, -ENOMEM on failure.
>   */
>  int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>  {
>         struct anon_vma_chain *avc, *pavc;
>         struct anon_vma *root = NULL;
>
> +       check_anon_vma_clone(dst, src);
> +
>         list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>                 struct anon_vma *anon_vma;
>
> @@ -315,14 +347,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>         return 0;
>
>   enomem_failure:
> -       /*
> -        * dst->anon_vma is dropped here otherwise its num_active_vmas can
> -        * be incorrectly decremented in unlink_anon_vmas().
> -        * We can safely do this because callers of anon_vma_clone() don't care
> -        * about dst->anon_vma if anon_vma_clone() failed.
> -        */
> -       dst->anon_vma = NULL;
> -       unlink_anon_vmas(dst);
> +       cleanup_partial_anon_vmas(dst);
>         return -ENOMEM;
>  }
>
> @@ -393,11 +418,64 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>         return -ENOMEM;
>  }
>
> +/*
> + * In the unfortunate case of anon_vma_clone() failing to allocate memory we
> + * have to clean things up.
> + *
> + * On clone we hold the exclusive mmap write lock, so we can't race
> + * unlink_anon_vmas(). Since we're cloning, we know we can't have empty
> + * anon_vma's, since existing anon_vma's are what we're cloning from.

nit: At first I was confused, because vma->anon_vma (which is
dst->anon_vma) can be NULL, but then I realized you are talking about
avc->anon_vma here. Maybe change the comment to say avc->anon_vma
instead of anon_vma for clarity?

> + *
> + * So this function needs only traverse the anon_vma_chain and free each
> + * allocated anon_vma_chain.
> + */
> +static void cleanup_partial_anon_vmas(struct vm_area_struct *vma)
> +{
> +       struct anon_vma_chain *avc, *next;
> +       bool locked = false;
> +
> +       /*
> +        * We exclude everybody else from being able to modify anon_vma's
> +        * underneath us.
> +        */
> +       mmap_assert_locked(vma->vm_mm);
> +
> +       list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
> +               struct anon_vma *anon_vma = avc->anon_vma;
> +
> +               /* All anon_vma's share the same root. */
> +               if (!locked) {
> +                       anon_vma_lock_write(anon_vma);
> +                       locked = true;
> +               }
> +
> +               anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
> +               list_del(&avc->same_vma);
> +               anon_vma_chain_free(avc);
> +       }

Are you missing an "if (locked) anon_vma_unlock_write()" after the
loop here? As written, the lock taken on the first iteration is never
released.
You could also avoid the "locked" variable by initializing anon_vma to
NULL and using "if (anon_vma)" as the equivalent of "if (locked)".
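
To illustrate the shape I have in mind, here is a minimal userspace
sketch, not kernel code. Every mock_* name is a hypothetical stand-in
that I made up for this example (a plain int models the root rwsem,
and a singly linked list models the same_vma list). It only shows the
"lock on first entry, unlock after the loop if anything was locked"
pattern, with the pointer itself doubling as the "locked" flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct anon_vma: only the lock state is modeled. */
struct mock_anon_vma {
	struct mock_anon_vma *root;	/* all entries in one chain share a root */
	int write_locked;		/* meaningful on the root only */
};

/* Hypothetical stand-in for struct anon_vma_chain. */
struct mock_avc {
	struct mock_anon_vma *anon_vma;
	struct mock_avc *next;
};

/* Models taking the root lock for write; asserts it is not already held. */
static void mock_lock_write(struct mock_anon_vma *av)
{
	assert(!av->root->write_locked);
	av->root->write_locked = 1;
}

/* Models releasing the root lock; asserts it is currently held. */
static void mock_unlock_write(struct mock_anon_vma *av)
{
	assert(av->root->write_locked);
	av->root->write_locked = 0;
}

/* Pushes a new chain entry onto the list; returns 0 or -1 on allocation failure. */
static int mock_chain_push(struct mock_avc **head, struct mock_anon_vma *av)
{
	struct mock_avc *avc = malloc(sizeof(*avc));

	if (!avc)
		return -1;
	avc->anon_vma = av;
	avc->next = *head;
	*head = avc;
	return 0;
}

/*
 * Frees every chain entry and returns how many were freed. The lock is
 * taken once, on the first entry, and dropped after the loop only if it
 * was taken, so an empty chain never touches the lock.
 */
static int mock_cleanup_partial(struct mock_avc **head)
{
	struct mock_anon_vma *locked = NULL;	/* NULL means "not locked" */
	struct mock_avc *avc = *head;
	int freed = 0;

	while (avc) {
		struct mock_avc *next = avc->next;

		if (!locked) {
			locked = avc->anon_vma;
			mock_lock_write(locked);
		}

		free(avc);
		freed++;
		avc = next;
	}
	*head = NULL;

	if (locked)
		mock_unlock_write(locked);

	return freed;
}
```

In the real function the loop body would still do the interval tree
removal, list_del() and anon_vma_chain_free(); the sketch leaves those
out and keeps only the locking structure.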

> +}
> +
> +/**
> + * unlink_anon_vmas() - remove all links between a VMA and anon_vma's, freeing
> + * anon_vma_chain objects.
> + * @vma: The VMA whose links to anon_vma objects is to be severed.
> + *
> + * As part of the process anon_vma_chain's are freed,
> + * anon_vma->num_children,num_active_vmas is updated as required and, if the
> + * relevant anon_vma references no further VMAs, its reference count is
> + * decremented.
> + */
>  void unlink_anon_vmas(struct vm_area_struct *vma)
>  {
>         struct anon_vma_chain *avc, *next;
>         struct anon_vma *root = NULL;
>
> +       /* Always hold mmap lock, read-lock on unmap possibly. */
> +       mmap_assert_locked(vma->vm_mm);
> +
> +       /* Unfaulted is a no-op. */
> +       VM_WARN_ON_ONCE(!vma->anon_vma && !list_empty(&vma->anon_vma_chain));
> +
>         /*
>          * Unlink each anon_vma chained to the VMA.  This list is ordered
>          * from newest to oldest, ensuring the root anon_vma gets freed last.
> --
> 2.52.0
>

