From: Jann Horn <jannh@google.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Barry Song <21cnbao@gmail.com>,
Nicolas Geoffray <ngeoffray@google.com>,
Lokesh Gidra <lokeshgidra@google.com>,
David Hildenbrand <david@redhat.com>,
Harry Yoo <harry.yoo@oracle.com>,
Suren Baghdasaryan <surenb@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@surriel.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Linux-MM <linux-mm@kvack.org>,
Kalesh Singh <kaleshsingh@google.com>,
SeongJae Park <sj@kernel.org>,
Barry Song <v-songbaohua@oppo.com>, Peter Xu <peterx@redhat.com>
Subject: Re: [DISCUSSION] anon_vma root lock contention and per anon_vma lock
Date: Fri, 12 Sep 2025 13:37:35 +0200
Message-ID: <CAG48ez1FXUMsvKEcJCFcUBUP0BM2=LTWzH=iKiHhL4PurN6q8g@mail.gmail.com>
In-Reply-To: <6558e0c9-fb6a-4f2f-b9e7-0647ff64ba66@lucifer.local>
On Fri, Sep 12, 2025 at 6:49 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> On Thu, Sep 11, 2025 at 08:22:13PM +0200, Jann Horn wrote:
> > On Thu, Sep 11, 2025 at 10:29 AM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > > On Thu, Sep 11, 2025 at 07:17:01PM +1200, Barry Song wrote:
> > > > Hi All,
> > > >
> > > > I’m aware that Lokesh started a discussion on the concurrency issue
> > > > between userfaultfd_move and memory reclamation [1]. However, my
> > > > concern is different, so I’m starting a separate discussion.
> > > >
> > > > In the process tree, many processes may share anon_vma->root, even if
> > > > they don’t share the anon_vma itself. This causes serious lock contention
> > > > between memory reclamation (which calls folio_referenced and try_to_unmap)
> > > > and other processes calling fork(), exit(), mprotect(), etc.
> > >
> > > Well, when you say lock contention - keep in mind that we need a lock
> > > held over the entire fork tree, as we are cloning references to the
> > > anon_vmas.
> > >
> > > This is at the anon_vma level - so the folio might be exclusive, but other
> > > folios there might not be.
> > >
> > > Note that I'm working on a radical rework of anon_vmas at the moment (time
> > > is not in my favour given other tasks + review workload, but it _is_
> > > happening).
> > >
> > > So I'm interested in gathering real-world use case data on how best to
> > > implement things, and this discussion is useful input for that.
> > >
> > > My proposed approach would use something like ranged locks. It's a bit
> > > fuzzy right now, so I'm definitely interested in putting some meat on it.
> > >
> > > >
> > > > On Android, this issue becomes more severe since many processes are
> > > > descendants of zygote.
> > > >
> > > > Memory reclamation path:
> > > > folio_lock_anon_vma_read
> > > >
> > > > mprotect path:
> > > > mprotect
> > > > split_vma
> > > > anon_vma_clone
> > > >
> > > > fork / copy_process path:
> > > > copy_process
> > > > dup_mmap
> > > > anon_vma_fork
> > > >
> > > > exit path:
> > > > exit_mmap
> > > > free_pgtables
> > > > unlink_anon_vmas
> > > >
> > > > To be honest, memory reclamation—especially folio_referenced()—is a
> > > > problem. It is called very frequently and can block other important
> > > > user threads waiting for the anon_vma root lock, causing UI lag.
> > > >
> > > > I have a rough idea: since the vast majority of anon folios are actually
> > > > exclusive (I observed almost 98% of Android anon folios fall into this
> > > > category), they don’t need to iterate the anon_vma tree. They belong to
> > > > a single process, and their rmap is per-process as well.
> > > >
> > > > I propose introducing a per-anon_vma lock. For exclusive folios whose
> > > > anon_vma is not shared, we could use this per-anon_vma lock.
> > >
> > > I'm not sure how adding _more_ locks is going to reduce contention :) -
> > > and the anon_vmas are all linked to their parents etc., so it's simply
> > > not OK to hold one lock and not the others when making changes.
> >
> > folio_referenced() only wants to look at mappings of a single folio,
> > right? And it only uses the anon_vma of that folio? So as long as we
> > can guarantee that the folio can't concurrently change which anon_vma
> > it is associated with, folio_referenced() really only cares about the
> > specific anon_vma that the folio is associated with, and the anon_vmas
> > of other folios in the VMAs we traverse are irrelevant?
>
> Right yeah, true. But the AVCs (anon_vma_chains) link you to 'related'
> VMAs across the hierarchy.
>
> I think really the refined way of saying this is - yes, you could, but
> you're then putting the weight on the VMA side, and the VMA side is
> being invoked _all the time_.

Ah, fair.

I guess one approach would be to do something hazard-pointer-ish? Like
a semaphore-like thing in the root anon_vma that contains a normal
reader count, a hazard-pointer reader count (limited to some small
number like 2 or 4), and a writer count (up to 1), combined with a
limited number of hazard pointer slots; where a writer can ignore the
hazard-pointer reader count if none of the hazard pointers match any
anon_vma it wants to look at (but readers still always have to wait
for writers). The write-locking fastpath would just be an
"atomically add N if zero", like with normal locking, and only the
case where there actually are hazard-pointer readers would make the
locking more expensive...
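
Very roughly, just to illustrate the shape (all of the names below -
av_hp_lock, AV_HP_SLOTS and friends - are made up, and the field
packing, memory ordering, reader-side publication and the actual
waiting are completely hand-waved):

#include <linux/atomic.h>

struct anon_vma;

#define AV_HP_SLOTS	4	/* max concurrent hazard-pointer readers */

/*
 * Single counts word: bit 0 = writer, bits 1-3 = hazard-pointer
 * readers (at most AV_HP_SLOTS), bits 4+ = normal readers.
 */
#define AV_HP_WRITER	(1UL << 0)
#define AV_HP_HPREADER	(1UL << 1)
#define AV_HP_READER	(1UL << 4)

struct av_hp_lock {
	atomic_long_t	counts;
	/* anon_vma each hazard-pointer reader has published. */
	struct anon_vma	*hp_slot[AV_HP_SLOTS];
};

/*
 * Writer fast path: same as an uncontended rwsem, a single
 * "set the writer bit if the whole word is zero".
 */
static inline bool av_hp_write_trylock(struct av_hp_lock *lock)
{
	return atomic_long_cmpxchg(&lock->counts, 0, AV_HP_WRITER) == 0;
}

/*
 * Writer slow path helper: a hazard-pointer reader only blocks this
 * writer if it has published an anon_vma the writer intends to touch;
 * otherwise the hp reader count can be ignored. Normal readers (and
 * readers seeing a writer) behave exactly like today.
 */
static inline bool av_hp_writer_must_wait(struct av_hp_lock *lock,
					  struct anon_vma *target)
{
	int i;

	for (i = 0; i < AV_HP_SLOTS; i++) {
		if (READ_ONCE(lock->hp_slot[i]) == target)
			return true;
	}
	return false;
}

The hp_slot scan only runs once the plain cmpxchg has failed, so the
uncontended write-lock path stays a single atomic.
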
But inventing more artisanal locking schemes is probably not a great idea...