From: Jann Horn <jannh@google.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Barry Song <21cnbao@gmail.com>,
	Nicolas Geoffray <ngeoffray@google.com>,
	 Lokesh Gidra <lokeshgidra@google.com>,
	David Hildenbrand <david@redhat.com>,
	 Harry Yoo <harry.yoo@oracle.com>,
	Suren Baghdasaryan <surenb@google.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@surriel.com>,
	 "Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Linux-MM <linux-mm@kvack.org>,
	 Kalesh Singh <kaleshsingh@google.com>,
	SeongJae Park <sj@kernel.org>,
	 Barry Song <v-songbaohua@oppo.com>, Peter Xu <peterx@redhat.com>
Subject: Re: [DISCUSSION] anon_vma root lock contention and per anon_vma lock
Date: Fri, 12 Sep 2025 13:37:35 +0200
Message-ID: <CAG48ez1FXUMsvKEcJCFcUBUP0BM2=LTWzH=iKiHhL4PurN6q8g@mail.gmail.com>
In-Reply-To: <6558e0c9-fb6a-4f2f-b9e7-0647ff64ba66@lucifer.local>

On Fri, Sep 12, 2025 at 6:49 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> On Thu, Sep 11, 2025 at 08:22:13PM +0200, Jann Horn wrote:
> > On Thu, Sep 11, 2025 at 10:29 AM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > > On Thu, Sep 11, 2025 at 07:17:01PM +1200, Barry Song wrote:
> > > > Hi All,
> > > >
> > > > I’m aware that Lokesh started a discussion on the concurrency issue
> > > > between userfaultfd_move and memory reclamation [1]. However, my
> > > > concern is different, so I’m starting a separate discussion.
> > > >
> > > > In the process tree, many processes may share anon_vma->root, even if
> > > > they don’t share the anon_vma itself. This causes serious lock contention
> > > > between memory reclamation (which calls folio_referenced and try_to_unmap)
> > > > and other processes calling fork(), exit(), mprotect(), etc.
> > >
> > > Well, when you say lock contention, I mean - we need to have a lock that is held
> > > over the entire fork tree, as we are cloning references to them.
> > >
> > > This is at the anon_vma level - so the folio might be exclusive, but other
> > > folios there might not be.
> > >
> > > Note that I'm working on a radical rework of anon_vma's at the moment (time
> > > is not in my favour given other tasks + review workload, but it _is_
> > > happening).
> > >
> > > So I'm interested to gather real world usecase data on how best to
> > > implement things and this is interesting re: that.
> > >
> > > My proposed approach would use something like ranged locks. It's a bit
> > > fuzzy right now so definitely interested in putting some meat on that.
> > >
> > > >
> > > > On Android, this issue becomes more severe since many processes are
> > > > descendants of zygote.
> > > >
> > > > Memory reclamation path:
> > > >   folio_lock_anon_vma_read
> > > >
> > > > mprotect path:
> > > >   mprotect
> > > >     split_vma
> > > >       anon_vma_clone
> > > >
> > > > fork / copy_process path:
> > > >   copy_process
> > > >     dup_mmap
> > > >       anon_vma_fork
> > > >
> > > > exit path:
> > > >   exit_mmap
> > > >     free_pgtables
> > > >       unlink_anon_vmas
> > > >
> > > > To be honest, memory reclamation—especially folio_referenced()—is a
> > > > problem. It is called very frequently and can block other important
> > > > user threads waiting for the anon_vma root lock, causing UI lag.
> > > >
> > > > I have a rough idea: since the vast majority of anon folios are actually
> > > > exclusive (I observed almost 98% of Android anon folios fall into this
> > > > category), they don’t need to iterate the anon_vma tree. They belong to
> > > > a single process, and even for rmap, it is per-process.
> > > >
> > > > I propose introducing a per-anon_vma lock. For exclusive folios whose
> > > > anon_vma is not shared, we could use this per-anon_vma lock.
> > >
> > > I'm not sure how adding _more_ locks is going to reduce contention :) and
> > > the anon_vma's are all linked to their parents etc. etc. so it's simply not
> > > ok to hold one lock and not the others when making changes.
> >
> > folio_referenced() only wants to look at mappings of a single folio,
> > right? And it only uses the anon_vma of that folio? So as long as we
> > can guarantee that the folio can't concurrently change which anon_vma
> > it is associated with, folio_referenced() really only cares about the
> > specific anon_vma that the folio is associated with, and the anon_vmas
> > of other folios in the VMAs we traverse are irrelevant?
>
> Right yeah, true. But the AVC's link you to 'related' VMA's which are
> across the hierarchy.
>
> I think really the refined way of saying this is - yes, you could, but
> you're then putting the weight on the VMA side, and the VMA side is
> being invoked _all the time_.

Ah, fair.

I guess one approach would be to do something hazard-pointer-ish? Like
a semaphore-like thing in the root anon_vma that contains a normal
reader count, a hazard-pointer reader count (limited to some small
number like 2 or 4), and a writer count (up to 1), combined with a
limited number of hazard pointer slots. A writer could then ignore the
hazard-pointer reader count if none of the hazard pointers match any
anon_vma it wants to look at (but readers still always have to wait
for writers). The write-locking fastpath would just be a normal
"atomically add N if zero" just like with normal locking, and only the
case where there actually are hazard-pointer readers would make the
locking more expensive...
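
To make that a bit more concrete, here is a rough user-space sketch of
the idea using C11 atomics. It is not kernel code and not a proposed
patch: every name in it (struct root_lock, root_hp_read_trylock(), the
bit layout of the state word, HP_SLOTS = 4) is made up for illustration,
and it only implements trylock variants, so all the sleeping/wakeup
logic a real lock would need is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: bit 31 = writer, bits 24-30 = hazard-pointer
 * reader count, bits 0-23 = normal reader count. */
#define HP_SLOTS    4
#define WRITER_BIT  (1u << 31)
#define HP_ONE      (1u << 24)
#define READER_MASK (HP_ONE - 1)

struct root_lock {
	_Atomic uint32_t state;
	_Atomic(void *) slot[HP_SLOTS];	/* NULL means "slot is free" */
};

/* Normal reader: always excluded by a writer. */
static bool root_read_trylock(struct root_lock *l)
{
	uint32_t old = atomic_load(&l->state);

	do {
		if (old & WRITER_BIT)
			return false;
	} while (!atomic_compare_exchange_weak(&l->state, &old, old + 1));
	return true;
}

static void root_read_unlock(struct root_lock *l)
{
	atomic_fetch_sub(&l->state, 1);
}

/* Hazard-pointer reader: publish the anon_vma we care about first, then
 * join the hazard count. Returns the slot index, or -1 on failure. */
static int root_hp_read_trylock(struct root_lock *l, void *target)
{
	for (int i = 0; i < HP_SLOTS; i++) {
		void *expected = NULL;
		uint32_t old;

		if (!atomic_compare_exchange_strong(&l->slot[i], &expected, target))
			continue;	/* slot already taken, try the next one */
		old = atomic_load(&l->state);
		do {
			if (old & WRITER_BIT) {
				/* Readers still have to wait for writers. */
				atomic_store(&l->slot[i], NULL);
				return -1;
			}
		} while (!atomic_compare_exchange_weak(&l->state, &old, old + HP_ONE));
		return i;
	}
	return -1;	/* no free slot; a caller could fall back to a normal read lock */
}

static void root_hp_read_unlock(struct root_lock *l, int slot)
{
	atomic_store(&l->slot[slot], NULL);
	atomic_fetch_sub(&l->state, HP_ONE);
}

/* Writer: targets[] lists the anon_vmas the writer wants to look at. */
static bool root_write_trylock(struct root_lock *l, void *const *targets, size_t n)
{
	uint32_t old = 0;

	/* Fastpath: the usual "set the writer bit if the word is zero". */
	if (atomic_compare_exchange_strong(&l->state, &old, WRITER_BIT))
		return true;

	/* Slowpath: only worth trying if all holders are hazard readers. */
	do {
		if (old & (WRITER_BIT | READER_MASK))
			return false;
	} while (!atomic_compare_exchange_weak(&l->state, &old, old | WRITER_BIT));

	/* The writer bit now keeps new readers out, so scan the slots. */
	for (int i = 0; i < HP_SLOTS; i++) {
		void *p = atomic_load(&l->slot[i]);

		for (size_t j = 0; p && j < n; j++) {
			if (p == targets[j]) {
				/* Real conflict: back off again. */
				atomic_fetch_and(&l->state, ~WRITER_BIT);
				return false;
			}
		}
	}
	return true;
}

static void root_write_unlock(struct root_lock *l)
{
	atomic_fetch_and(&l->state, ~WRITER_BIT);
}
```

The ordering is the part that matters in this sketch: a hazard reader
fills its slot before it bumps the count, and the writer sets its bit
before it scans the slots, so any hazard reader that got in ahead of the
writer already has its pointer visible. A writer that backs off after a
slot match can make concurrent readers fail spuriously for a moment,
which is one of the costs of this kind of scheme.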

But inventing more artisanal locking schemes is probably not a great idea...

