linux-mm.kvack.org archive mirror
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Barry Song <21cnbao@gmail.com>
Cc: Nicolas Geoffray <ngeoffray@google.com>,
	Lokesh Gidra <lokeshgidra@google.com>,
	David Hildenbrand <david@redhat.com>,
	Harry Yoo <harry.yoo@oracle.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@surriel.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Linux-MM <linux-mm@kvack.org>,
	Kalesh Singh <kaleshsingh@google.com>,
	SeongJae Park <sj@kernel.org>, Barry Song <v-songbaohua@oppo.com>,
	Peter Xu <peterx@redhat.com>
Subject: Re: [DISCUSSION] anon_vma root lock contention and per anon_vma lock
Date: Thu, 11 Sep 2025 09:28:51 +0100
Message-ID: <a67129f8-9ff6-4109-bbbf-4209f6dfa3be@lucifer.local>
In-Reply-To: <CAGsJ_4x=YsQR=nNcHA-q=0vg0b7ok=81C_qQqKmoJ+BZ+HVduQ@mail.gmail.com>

On Thu, Sep 11, 2025 at 07:17:01PM +1200, Barry Song wrote:
> Hi All,
>
> I’m aware that Lokesh started a discussion on the concurrency issue
> between userfaultfd_move and memory reclamation [1]. However, my
> concern is different, so I’m starting a separate discussion.
>
> In the process tree, many processes may share anon_vma->root, even if
> they don’t share the anon_vma itself. This causes serious lock contention
> between memory reclamation (which calls folio_referenced and try_to_unmap)
> and other processes calling fork(), exit(), mprotect(), etc.

Well, when you say lock contention, I mean - we need to have a lock that is
held over the entire fork tree, as we are cloning references to the
anon_vma's in it.

This is at the anon_vma level - so the folio might be exclusive, but other
folios there might not be.
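
To make the shared-root point concrete, here is a minimal userspace sketch.
It is not kernel code: struct anon_vma_model, toy_rwsem and every helper
below are hypothetical names made up for illustration, and the toy lock is
single-threaded bookkeeping rather than a real rw_semaphore. It only models
the fact that lockers are redirected to the root's lock, so a reader working
on one process's anon_vma and a writer working on another's meet on the same
lock whenever both descend from the same ancestor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy reader/writer state. NOT thread-safe: it only records who holds what. */
struct toy_rwsem {
	int readers;
	bool writer;
};

static bool toy_try_read(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static bool toy_try_write(struct toy_rwsem *s)
{
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

static void toy_unlock_read(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

static void toy_unlock_write(struct toy_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
}

/* Hypothetical model of an anon_vma: each one remembers the tree's root. */
struct anon_vma_model {
	struct anon_vma_model *root;	/* points to itself at the root */
	struct anon_vma_model *parent;	/* NULL at the root */
	struct toy_rwsem rwsem;		/* only the root's copy is ever used */
};

static void model_init(struct anon_vma_model *av, struct anon_vma_model *parent)
{
	av->parent = parent;
	av->root = parent ? parent->root : av;
	av->rwsem.readers = 0;
	av->rwsem.writer = false;
}

/* Whichever anon_vma a caller starts from, it ends up at the root's lock. */
static struct toy_rwsem *model_lock_of(struct anon_vma_model *av)
{
	return &av->root->rwsem;
}

/* A reclaim-like reader in one app vs. a fork/exit/mprotect-like writer in another. */
static bool writer_blocked_by_unrelated_reader(void)
{
	struct anon_vma_model zygote, app_a, app_b;

	model_init(&zygote, NULL);
	model_init(&app_a, &zygote);
	model_init(&app_b, &zygote);

	bool reading = toy_try_read(model_lock_of(&app_a));	/* rmap walk */
	bool wrote = toy_try_write(model_lock_of(&app_b));	/* tree update */

	if (wrote)
		toy_unlock_write(model_lock_of(&app_b));
	if (reading)
		toy_unlock_read(model_lock_of(&app_a));
	return reading && !wrote;
}
```

In this model writer_blocked_by_unrelated_reader() returns true even though
app_a and app_b share no anon_vma, only an ancestor.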

Note that I'm working on a radical rework of anon_vma's at the moment (time
is not in my favour given other tasks + review workload, but it _is_
happening).

So I'm interested to gather real world usecase data on how best to
implement things and this is interesting re: that.

My proposed approach would use something like ranged locks. It's a bit
fuzzy right now so definitely interested in putting some meat on that.
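
For readers unfamiliar with the term, a ranged lock serialises only callers
whose ranges overlap. The sketch below is a generic, single-threaded
illustration of that idea and says nothing about how the rework will
actually do it: struct range_lock, range_trylock(), range_unlock() and
MAX_HELD are all hypothetical names, and a real implementation would need
atomicity, waiting and most likely an interval tree rather than a small
array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD 8	/* arbitrary capacity for the sketch */

/* One held range, half-open: [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
	bool used;
};

struct range_lock {
	struct range held[MAX_HELD];
};

static bool ranges_overlap(const struct range *r,
			   unsigned long start, unsigned long end)
{
	return r->start < end && start < r->end;
}

/*
 * Try to take [start, end). Returns a slot index (>= 0) on success, or -1
 * if the range is empty, overlaps a held range, or the table is full.
 */
static int range_trylock(struct range_lock *rl,
			 unsigned long start, unsigned long end)
{
	int free_slot = -1;

	if (start >= end)
		return -1;

	for (int i = 0; i < MAX_HELD; i++) {
		if (!rl->held[i].used) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (ranges_overlap(&rl->held[i], start, end))
			return -1;
	}

	if (free_slot < 0)
		return -1;

	rl->held[free_slot] = (struct range){ start, end, true };
	return free_slot;
}

/* Release a range previously returned by range_trylock(). */
static void range_unlock(struct range_lock *rl, int slot)
{
	assert(slot >= 0 && slot < MAX_HELD && rl->held[slot].used);
	rl->held[slot].used = false;
}
```

With this shape, two callers touching disjoint address ranges never contend,
whereas a single tree-wide lock makes them contend regardless of range.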

>
> On Android, this issue becomes more severe since many processes are
> descendants of zygote.
>
> Memory reclamation path:
>   folio_lock_anon_vma_read
>
> mprotect path:
>   mprotect
>     split_vma
>       anon_vma_clone
>
> fork / copy_process path:
>   copy_process
>     dup_mmap
>       anon_vma_fork
>
> exit path:
>   exit_mmap
>     free_pgtables
>       unlink_anon_vmas
>
> To be honest, memory reclamation—especially folio_referenced()—is a
> problem. It is called very frequently and can block other important
> user threads waiting for the anon_vma root lock, causing UI lag.
>
> I have a rough idea: since the vast majority of anon folios are actually
> exclusive (I observed almost 98% of Android anon folios fall into this
> category), they don’t need to iterate the anon_vma tree. They belong to
> a single process, and even for rmap, it is per-process.
>
> I propose introducing a per-anon_vma lock. For exclusive folios whose
> anon_vma is not shared, we could use this per-anon_vma lock.

I'm not sure how adding _more_ locks is going to reduce contention :) The
anon_vma's are all linked to their parents etc. etc., so it's simply not
ok to hold one lock and not the others when making changes.

> folio_referenced declares that it will begin reading, and Lokesh’s
> folio_lock may also help maintain folios as exclusive, so I am
> somewhat in favor of his RFC. Any thread writing to such an anon_vma

Will reply on his latest re: Lokesh's approach.

> would take the per-vma write lock, and possibly also the anon_vma
> root write lock. If folio_referenced fails to declare the per-vma lock,
> it can fall back to the global anon_vma->root read mutex, similar to
> mmap_lock.

Again, we actually _need_ to hold a lock over this range. So you can't just
hold the root and a descendant, it has to be all of them.
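
As a toy model of the quoted proposal and of the objection to it (one
possible reading only, nothing here is kernel code): readers of exclusive
folios first try a per-anon_vma lock and fall back to the root lock, while
writers take their own per-anon_vma lock plus the root lock. own_rwsem,
reader_lock(), writer_lock() and the enum are hypothetical, and the lock is
single-threaded bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy reader/writer state. NOT thread-safe: it only records who holds what. */
struct toy_rwsem {
	int readers;
	bool writer;
};

static bool toy_try_read(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static bool toy_try_write(struct toy_rwsem *s)
{
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

struct anon_vma_model {
	struct anon_vma_model *root;	/* points to itself at the root */
	struct toy_rwsem root_rwsem;	/* only meaningful on the root */
	struct toy_rwsem own_rwsem;	/* hypothetical per-anon_vma lock */
};

enum read_path { READ_FAILED, READ_FAST_PER_ANON_VMA, READ_SLOW_ROOT };

/* Reader: fast path for exclusive folios, else fall back to the root lock. */
static enum read_path reader_lock(struct anon_vma_model *av, bool folio_exclusive)
{
	if (folio_exclusive && toy_try_read(&av->own_rwsem))
		return READ_FAST_PER_ANON_VMA;
	if (toy_try_read(&av->root->root_rwsem))
		return READ_SLOW_ROOT;
	return READ_FAILED;
}

/* Writer as proposed: its own per-anon_vma lock plus the root lock. */
static bool writer_lock(struct anon_vma_model *av)
{
	if (!toy_try_write(&av->own_rwsem))
		return false;
	if (!toy_try_write(&av->root->root_rwsem)) {
		av->own_rwsem.writer = false;	/* back out */
		return false;
	}
	return true;
}

/* What does a fast-path reader in a sibling see while a writer is active? */
static enum read_path sibling_fast_path_during_write(void)
{
	struct anon_vma_model root = {0}, b = {0}, c = {0};

	root.root = &root;
	b.root = &root;
	c.root = &root;

	bool locked = writer_lock(&b);	/* holds b's own lock and the root lock */
	assert(locked);
	return reader_lock(&c, true);	/* c's own lock was never taken by the writer */
}
```

Here sibling_fast_path_during_write() returns READ_FAST_PER_ANON_VMA: the
writer holding the root and b does nothing to exclude a fast-path reader on
c. If the writer's changes can reach anon_vma's linked to b, that reader is
unprotected, which is why holding the root plus one descendant is not
enough.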

>
> I haven’t carefully considered this or written any code yet—just a
> very rough idea. Sorry if it comes across as too naive.

It's fine, though I do wish we'd have a _little_ less workload this cycle,
can barely breathe at the moment, but that's not your fault ;)

I do wonder whether part of the problem here is keeping anon_vma's
connected to parents when they don't need to be.

Right now, even if you entirely CoW everything in a VMA, we are still
attached to parents with all the overhead. That's something I can look at.

But also perhaps worth considering how we approach the whole clone thing.

My (very early) anon_vma rework would do away with anon_vma_chain's
altogether and make forking simpler.

There'd be a per-mm object that connects to others via (probably) interval
tree edges for ranges that are CoW, so splitting for instance would be
easier.

Early days with it though...

>
> [1] https://lore.kernel.org/linux-mm/CA+EESO4Z6wtX7ZMdDHQRe5jAAS_bQ-POq5+4aDx5jh2DvY6UHg@mail.gmail.com/
>
> Thanks
> Barry
>

Cheers, Lorenzo

