From: Lokesh Gidra <lokeshgidra@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
Harry Yoo <harry.yoo@oracle.com>, Zi Yan <ziy@nvidia.com>,
Barry Song <21cnbao@gmail.com>,
"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>,
Peter Xu <peterx@redhat.com>,
Suren Baghdasaryan <surenb@google.com>,
Kalesh Singh <kaleshsingh@google.com>,
android-mm <android-mm@google.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
Jann Horn <jannh@google.com>, Rik van Riel <riel@surriel.com>,
Vlastimil Babka <vbabka@suse.cz>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>
Subject: Re: [DISCUSSION] Unconditionally lock folios when calling rmap_walk()
Date: Mon, 25 Aug 2025 11:46:02 -0700 [thread overview]
Message-ID: <CA+EESO5Phk1W64mNm=YG8E1oNEXENP94cd5FUuq0PhcUsOe7+Q@mail.gmail.com> (raw)
In-Reply-To: <dc92aef8-757f-4432-923e-70d92d13fb37@redhat.com>
On Mon, Aug 25, 2025 at 8:19 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 22.08.25 19:29, Lokesh Gidra wrote:
> > Hi all,
> >
> > Currently, some callers of rmap_walk() conditionally avoid try-locking
> > non-ksm anon folios. This necessitates serialization through anon_vma
> > write-lock elsewhere when folio->mapping and/or folio->index (fields
> > involved in rmap_walk()) are to be updated. This hurts scalability due
> > to coarse granularity of the lock. For instance, when multiple threads
> > invoke userfaultfd’s MOVE ioctl simultaneously to move distinct pages
> > from the same src VMA, they all contend for the corresponding
> > anon_vma’s lock. Field traces for arm64 android devices reveal over
> > 30ms of uninterruptible sleep in the main UI thread, leading to janky
> > user interactions.
> >
> > Among all rmap_walk() callers that don’t lock anon folios,
> > folio_referenced() is the most critical (others are
> > page_idle_clear_pte_refs(), damon_folio_young(), and
> > damon_folio_mkold()). The relevant code in folio_referenced() is:
> >
> > if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
> > we_locked = folio_trylock(folio);
> > if (!we_locked)
> > return 1;
> > }
> >
> > It’s unclear why locking anon_vma exclusively (when updating
> > folio->mapping, like in uffd MOVE) is beneficial over walking rmap
> > with folio locked. It’s in the reclaim path, so should not be a
> > critical path that necessitates some special treatment, unless I’m
> > missing something.
> >
> > Therefore, I propose simplifying the locking mechanism by ensuring the
> > folio is locked before calling rmap_walk().
>
> Essentially, what you mean is roughly:
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 34333ae3bd80f..0800e73c0796e 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1005,7 +1005,7 @@ int folio_referenced(struct folio *folio, int is_locked,
> if (!folio_raw_mapping(folio))
> return 0;
>
> - if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
> + if (!is_locked) {
> we_locked = folio_trylock(folio);
> if (!we_locked)
> return 1;
>
>
> The downside of that change is that ordinary (!ksm) folios will observe being locked
> when we are actually only trying to asses if they were referenced.
>
> Does it matter?
>
> I can only speculate that it might have been very relevant before
> 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive").
>
> Essentially any R/O fault would have resulted in us copying the page, simply because
> there is concurrent folio_referenced() happening.
>
> Before 09854ba94c6a ("mm: do_wp_page() simplification") that wasn't an issue, but
> it would have meant that the write fault would be stuck until folio_referenced()
> would be done, which is also suboptimal.
>
> So likely, avoiding grabbing the folio lock was beneficial.
>
>
> Today, this would only affect R/O pages after fork (PageAnonExclusive not set).
>
>
> Staring at shrink_active_list()->folio_referenced(), we isolate the folio first
> (grabbing reference+clearing LRU), so do_wp_page()->wp_can_reuse_anon_folio()
> would already refuse to reuse immediately, because it would spot a raised reference.
> The folio lock does not make a difference anymore.
>
>
> Is there any other anon-specific (!ksm) folio locking? Nothing exciting comes to mind,
> except maybe some folio splitting or khugepaged that maybe would have to wait.
>
> But khugepaged would already also fail to isolate these folios, so probably it's not that
> relevant anymore ...
Thanks so much for your thorough analysis. Very useful!
For folio splitting, it seems anon_vma lock is acquired exclusively,
so it serializes against folio_referenced() anyways.
>
> --
> Cheers
>
> David / dhildenb
>
next prev parent reply other threads:[~2025-08-25 18:46 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-22 17:29 Lokesh Gidra
2025-08-22 17:44 ` Matthew Wilcox
2025-08-22 18:08 ` Lorenzo Stoakes
2025-08-24 4:18 ` Lokesh Gidra
2025-08-24 5:31 ` Harry Yoo
2025-08-24 6:45 ` Lokesh Gidra
2025-08-25 10:52 ` Harry Yoo
2025-08-28 11:32 ` Lorenzo Stoakes
2025-08-25 15:19 ` David Hildenbrand
2025-08-25 18:46 ` Lokesh Gidra [this message]
2025-08-28 12:04 ` Lorenzo Stoakes
2025-08-29 0:23 ` Lokesh Gidra
2025-08-29 8:42 ` David Hildenbrand
2025-08-29 9:04 ` Lorenzo Stoakes
2025-09-02 18:59 ` Lokesh Gidra
2025-09-02 19:01 ` David Hildenbrand
2025-09-02 19:04 ` Lokesh Gidra
2025-09-08 4:56 ` Lokesh Gidra
2025-08-29 9:01 ` Lorenzo Stoakes
2025-08-26 15:52 ` Lorenzo Stoakes
2025-08-26 22:23 ` Lokesh Gidra
2025-08-28 11:31 ` Lorenzo Stoakes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CA+EESO5Phk1W64mNm=YG8E1oNEXENP94cd5FUuq0PhcUsOe7+Q@mail.gmail.com' \
--to=lokeshgidra@google.com \
--cc=21cnbao@gmail.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=android-mm@google.com \
--cc=david@redhat.com \
--cc=harry.yoo@oracle.com \
--cc=jannh@google.com \
--cc=kaleshsingh@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=peterx@redhat.com \
--cc=riel@surriel.com \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox