Re: [DISCUSSION] anon_vma root lock contention and per anon_vma lock


From: Suren Baghdasaryan <surenb@google.com>
To: Barry Song <21cnbao@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Nicolas Geoffray <ngeoffray@google.com>,
	 Lokesh Gidra <lokeshgidra@google.com>,
	David Hildenbrand <david@redhat.com>,
	 Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Harry Yoo <harry.yoo@oracle.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@surriel.com>,
	 "Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	 Linux-MM <linux-mm@kvack.org>,
	Kalesh Singh <kaleshsingh@google.com>,
	 SeongJae Park <sj@kernel.org>,
	Barry Song <v-songbaohua@oppo.com>, Peter Xu <peterx@redhat.com>
Subject: Re: [DISCUSSION] anon_vma root lock contention and per anon_vma lock
Date: Sun, 14 Sep 2025 18:47:48 -0700
Message-ID: <CAJuCfpG1DHeLtsTcD3A46vDV5BwBDAD7B-EVG4TQEY-GSvzfeg@mail.gmail.com>
In-Reply-To: <CAGsJ_4xDRB_F-T42WnhqpmwLyiZRwLGqx9vDf_d5TFALsCRX4A@mail.gmail.com>

On Sun, Sep 14, 2025 at 5:23 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Mon, Sep 15, 2025 at 7:53 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Sep 11, 2025 at 07:17:01PM +1200, Barry Song wrote:
> > > In the process tree, many processes may share anon_vma->root, even if
> > > they don’t share the anon_vma itself. This causes serious lock contention
> > > between memory reclamation (which calls folio_referenced and try_to_unmap)
> > > and other processes calling fork(), exit(), mprotect(), etc.
> > >
> > > On Android, this issue becomes more severe since many processes are
> > > descendants of zygote.
> >
> > I'm not nearly as familiar with anon_vma as, well, the rest of you
> > are.  As I understand this situation, usually after fork(), a process
> > calls exec() and the VMAs evaporate.  Android is different in that after
> > the zygote calls fork(), there is no exec() and so the VMAs stay COW.
> >
> > I wonder if we could fix this by adding a new syscall:
> >
> >         mremap(addr, size, size, MREMAP_COW_NOW);
> >
> > That would create a new VMA that contains the COWed pages from the
> > old VMA, but crucially no longer attached to the anon_vma root of
> > the zygote.  You wouldn't want to call this for every VMA, of course.
> > Just the ones which are likely to be fully COWed.
> >
> > Maybe this isn't practical, but I thought it worth suggesting.
>
> Thank you for the suggestion, Matthew.
>
> Lorenzo suggested possibly unlinking the child anon_vma from the root once all
> folios have been CoW-ed:
>
> "Right now, even if you entirely CoW everything in a VMA, we are still
> attached to parents with all the overhead. That's something I can look at."
>
> My concern is that it’s difficult to determine whether a VMA has been completely
> CoW-ed, and a single shared folio would prevent the unlink.
> So I’m not sure this approach would work.
>
> You seem to be proposing a forced CoW as a way to safely unlink from the root.
>
> A side effect is the potential for sudden, heavy memory allocation,
> whereas CoW lets asynchronous tasks such as kswapd work concurrently.
>
> Another issue is the extra memory use from folios that could have been
> shared but aren’t—likely minor on Android, since only a small portion
> of memory is actually shared, based on our observations.
>
> Calling mremap for each VMA might be difficult. Something applied to the
> whole process could be more practical—similar to exec, but only
> performing CoW and unlinking the anon_vma root.
>
> On the other hand, most anon folios are not actually shared, yet
> folio_referenced and try_to_unmap still take the entire root lock.
> In reality, they only care about their own node—no need to iterate
> the whole tree.
>
> I still think optimizing from that angle could be a better entry point :-)

Hi Barry,
Thanks for raising this issue. I think technically the optimization
you are suggesting is possible and it does look similar to per-vma
locking in that:
- The reader tries to read-lock a specific interval and on failure
falls back to locking the entire tree (root);
- The writer write-locks the root first and then one or more
individual nodes in the tree. Once the writer is done it unlocks all
the nodes it locked and then the root.
But as Lorenzo pointed out, this will not be pretty, as it adds yet
another lock and more locking/unlocking into the writer path.
In the case of the pagefault path, improving its performance at the
expense of the writers was not questioned due to pagefault being such
a hot path. I'm not sure reclaim will be given the same benefit...
Something to consider.
In any case, I'm very interested in continuing this discussion and
would love to test a POC or discuss this at LPC.
Thanks,
Suren.

>
> Thanks
> Barry

