linux-mm.kvack.org archive mirror
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Jann Horn <jannh@google.com>
Cc: Barry Song <21cnbao@gmail.com>,
	Nicolas Geoffray <ngeoffray@google.com>,
	Lokesh Gidra <lokeshgidra@google.com>,
	David Hildenbrand <david@redhat.com>,
	Harry Yoo <harry.yoo@oracle.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@surriel.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Linux-MM <linux-mm@kvack.org>,
	Kalesh Singh <kaleshsingh@google.com>,
	SeongJae Park <sj@kernel.org>, Barry Song <v-songbaohua@oppo.com>,
	Peter Xu <peterx@redhat.com>
Subject: Re: [DISCUSSION] anon_vma root lock contention and per anon_vma lock
Date: Fri, 12 Sep 2025 05:49:30 +0100
Message-ID: <6558e0c9-fb6a-4f2f-b9e7-0647ff64ba66@lucifer.local>
In-Reply-To: <CAG48ez0GVWV024kPe6kSV8c0LO7coACYXf9-85iqw+T+paUi3Q@mail.gmail.com>

On Thu, Sep 11, 2025 at 08:22:13PM +0200, Jann Horn wrote:
> On Thu, Sep 11, 2025 at 10:29 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> > On Thu, Sep 11, 2025 at 07:17:01PM +1200, Barry Song wrote:
> > > Hi All,
> > >
> > > I’m aware that Lokesh started a discussion on the concurrency issue
> > > between userfaultfd_move and memory reclamation [1]. However, my
> > > concern is different, so I’m starting a separate discussion.
> > >
> > > In the process tree, many processes may share anon_vma->root, even if
> > > they don’t share the anon_vma itself. This causes serious lock contention
> > > between memory reclamation (which calls folio_referenced and try_to_unmap)
> > > and other processes calling fork(), exit(), mprotect(), etc.
> >
> > Well, when you say lock contention, I mean - we need to have a lock that is held
> > over the entire fork tree, as we are cloning references to them.
> >
> > This is at the anon_vma level - so the folio might be exclusive, but other
> > folios there might not be.
> >
> > Note that I'm working on a radical rework of anon_vma's at the moment (time
> > is not in my favour given other tasks + review workload, but it _is_
> > happening).
> >
> > So I'm interested to gather real world usecase data on how best to
> > implement things and this is interesting re: that.
> >
> > My proposed approach would use something like ranged locks. It's a bit
> > fuzzy right now so definitely interested in putting some meat on that.
> >
> > >
> > > On Android, this issue becomes more severe since many processes are
> > > descendants of zygote.
> > >
> > > Memory reclamation path:
> > >   folio_lock_anon_vma_read
> > >
> > > mprotect path:
> > >   mprotect
> > >     split_vma
> > >       anon_vma_clone
> > >
> > > fork / copy_process path:
> > >   copy_process
> > >     dup_mmap
> > >       anon_vma_fork
> > >
> > > exit path:
> > >   exit_mmap
> > >     free_pgtables
> > >       unlink_anon_vmas
> > >
> > > To be honest, memory reclamation—especially folio_referenced()—is a
> > > problem. It is called very frequently and can block other important
> > > user threads waiting for the anon_vma root lock, causing UI lag.
> > >
> > > I have a rough idea: since the vast majority of anon folios are actually
> > > exclusive (I observed almost 98% of Android anon folios fall into this
> > > category), they don’t need to iterate the anon_vma tree. They belong to
> > > a single process, and even for rmap, it is per-process.
> > >
> > > I propose introducing a per-anon_vma lock. For exclusive folios whose
> > > anon_vma is not shared, we could use this per-anon_vma lock.
> >
> > I'm not sure how adding _more_ locks is going to reduce contention :) and
> > the anon_vma's are all linked to their parents etc. etc. so it's simply not
> > ok to hold one lock and not the others when making changes.
>
> folio_referenced() only wants to look at mappings of a single folio,
> right? And it only uses the anon_vma of that folio? So as long as we
> can guarantee that the folio can't concurrently change which anon_vma
> it is associated with, folio_referenced() really only cares about the
> specific anon_vma that the folio is associated with, and the anon_vmas
> of other folios in the VMAs we traverse are irrelevant?

Right yeah, true. But the AVCs (anon_vma_chains) link you to 'related'
VMAs across the hierarchy.

I think really the refined way of saying this is - yes, you could, but
you're then putting the weight on the VMA side, and the VMA side is
being invoked _all the time_.

>
> Basically I think paths that come through the rmap would usually be
> able to use such a fine-grained lock, while paths that come through
> the MM would often have to use more coarse locking.

They'd have to use _both_ sets of locking.

And this is on every single fork, merge, etc. etc.

So we'd reduce lock acquisition from the rmap end, and significantly
increase it on the VMA side, scaling with 'how far forked we are' ;)

So this is the fundamental issue.

>
> Of course paths requiring coarse locking (like for splitting VMAs and
> such) would then have to take a pile of locks, one lock per anon_vma
> associated with a given VMA. That part shouldn't be overly complicated
> though, we'd mainly have to make sure that there is a consistent lock
> ordering (such as "if you want to lock multiple anon_vmas, you have to
> lock the root anon_vma before the others").
>

I mean already this lock ordering is not so fun :)

I suspect there'd be other issues.

But perhaps a way forward is, since I'm working in this area already, to
try and hack together an RFC which we could use to figure out how heavy the
cost is...

So let me try and do that.
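
To illustrate the ordering rule quoted above ("lock the root anon_vma
before the others"), here is a minimal userspace sketch using pthread
mutexes. It is not kernel code and not a proposed patch: struct avs and
the avs_*() helpers are hypothetical names invented for this example, and
ordering the non-root locks by address is an extra assumption, added only
so that two threads taking overlapping sets agree on a single order.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for an anon_vma; not the kernel struct. */
struct avs {
	pthread_mutex_t lock;
	struct avs *root;	/* the root points to itself */
};

static void avs_init(struct avs *av, struct avs *root)
{
	pthread_mutex_init(&av->lock, NULL);
	av->root = root ? root : av;
}

/* Locking order: the root sorts first, everything else by address. */
static int avs_cmp(const void *a, const void *b)
{
	const struct avs *x = *(struct avs *const *)a;
	const struct avs *y = *(struct avs *const *)b;
	int x_root = (x->root == x), y_root = (y->root == y);

	if (x_root != y_root)
		return y_root - x_root;
	return ((uintptr_t)x > (uintptr_t)y) - ((uintptr_t)x < (uintptr_t)y);
}

/*
 * Sort v[] in place into locking order, then take every lock once.
 * Assumes no duplicate entries and that all entries share one root.
 */
static void avs_lock_all(struct avs **v, size_t n)
{
	qsort(v, n, sizeof(*v), avs_cmp);
	for (size_t i = 0; i < n; i++) {
		assert(v[i]->root == v[0]->root);
		pthread_mutex_lock(&v[i]->lock);
	}
}

/* Release in reverse order of acquisition. */
static void avs_unlock_all(struct avs **v, size_t n)
{
	while (n--)
		pthread_mutex_unlock(&v[n]->lock);
}
```

Even in this toy form, a coarse path has to take n locks where today it
takes one, which is the cost on the VMA side discussed above.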

