linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lokesh Gidra <lokeshgidra@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Harry Yoo <harry.yoo@oracle.com>, Zi Yan <ziy@nvidia.com>,
	Barry Song <21cnbao@gmail.com>,
	 "open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>,
	Peter Xu <peterx@redhat.com>,
	 Suren Baghdasaryan <surenb@google.com>,
	Kalesh Singh <kaleshsingh@google.com>,
	 android-mm <android-mm@google.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	 Jann Horn <jannh@google.com>, Rik van Riel <riel@surriel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	 "Liam R. Howlett" <Liam.Howlett@oracle.com>
Subject: Re: [DISCUSSION] Unconditionally lock folios when calling rmap_walk()
Date: Mon, 25 Aug 2025 11:46:02 -0700	[thread overview]
Message-ID: <CA+EESO5Phk1W64mNm=YG8E1oNEXENP94cd5FUuq0PhcUsOe7+Q@mail.gmail.com> (raw)
In-Reply-To: <dc92aef8-757f-4432-923e-70d92d13fb37@redhat.com>

On Mon, Aug 25, 2025 at 8:19 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 22.08.25 19:29, Lokesh Gidra wrote:
> > Hi all,
> >
> > Currently, some callers of rmap_walk() conditionally avoid try-locking
> > non-ksm anon folios. This necessitates serialization through anon_vma
> > write-lock elsewhere when folio->mapping and/or folio->index (fields
> > involved in rmap_walk()) are to be updated. This hurts scalability due
> > to coarse granularity of the lock. For instance, when multiple threads
> > invoke userfaultfd’s MOVE ioctl simultaneously to move distinct pages
> > from the same src VMA, they all contend for the corresponding
> > anon_vma’s lock. Field traces for arm64 android devices reveal over
> > 30ms of uninterruptible sleep in the main UI thread, leading to janky
> > user interactions.
> >
> > Among all rmap_walk() callers that don’t lock anon folios,
> > folio_referenced() is the most critical (others are
> > page_idle_clear_pte_refs(), damon_folio_young(), and
> > damon_folio_mkold()). The relevant code in folio_referenced() is:
> >
> > if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
> >          we_locked = folio_trylock(folio);
> >          if (!we_locked)
> >                  return 1;
> > }
> >
> > It’s unclear why locking anon_vma exclusively (when updating
> > folio->mapping, like in uffd MOVE) is beneficial over walking rmap
> > with folio locked. It’s in the reclaim path, so should not be a
> > critical path that necessitates some special treatment, unless I’m
> > missing something.
> >
> > Therefore, I propose simplifying the locking mechanism by ensuring the
> > folio is locked before calling rmap_walk().
>
> Essentially, what you mean is roughly:
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 34333ae3bd80f..0800e73c0796e 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1005,7 +1005,7 @@ int folio_referenced(struct folio *folio, int is_locked,
>          if (!folio_raw_mapping(folio))
>                  return 0;
>
> -       if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
> +       if (!is_locked) {
>                  we_locked = folio_trylock(folio);
>                  if (!we_locked)
>                          return 1;
>
>
> The downside of that change is that ordinary (!ksm) folios will observe being locked
> when we are actually only trying to asses if they were referenced.
>
> Does it matter?
>
> I can only speculate that it might have been very relevant before
> 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive").
>
> Essentially any R/O fault would have resulted in us copying the page, simply because
> there is concurrent folio_referenced() happening.
>
> Before 09854ba94c6a ("mm: do_wp_page() simplification") that wasn't an issue, but
> it would have meant that the write fault would be stuck until folio_referenced()
> would be done, which is also suboptimal.
>
> So likely, avoiding grabbing the folio lock was beneficial.
>
>
> Today, this would only affect R/O pages after fork (PageAnonExclusive not set).
>
>
> Staring at shrink_active_list()->folio_referenced(), we isolate the folio first
> (grabbing reference+clearing LRU), so do_wp_page()->wp_can_reuse_anon_folio()
> would already refuse to reuse immediately, because it would spot a raised reference.
> The folio lock does not make a difference anymore.
>
>
> Is there any other anon-specific (!ksm) folio locking? Nothing exciting comes to mind,
> except maybe some folio splitting or khugepaged that maybe would have to wait.
>
> But khugepaged would already also fail to isolate these folios, so probably it's not that
> relevant anymore ...

Thanks so much for your thorough analysis. Very useful!

For folio splitting, it seems anon_vma lock is acquired exclusively,
so it serializes against folio_referenced() anyways.
>
> --
> Cheers
>
> David / dhildenb
>


  reply	other threads:[~2025-08-25 18:46 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-22 17:29 Lokesh Gidra
2025-08-22 17:44 ` Matthew Wilcox
2025-08-22 18:08   ` Lorenzo Stoakes
2025-08-24  4:18 ` Lokesh Gidra
2025-08-24  5:31   ` Harry Yoo
2025-08-24  6:45     ` Lokesh Gidra
2025-08-25 10:52       ` Harry Yoo
2025-08-28 11:32         ` Lorenzo Stoakes
2025-08-25 15:19 ` David Hildenbrand
2025-08-25 18:46   ` Lokesh Gidra [this message]
2025-08-28 12:04   ` Lorenzo Stoakes
2025-08-29  0:23     ` Lokesh Gidra
2025-08-29  8:42       ` David Hildenbrand
2025-08-29  9:04         ` Lorenzo Stoakes
2025-09-02 18:59           ` Lokesh Gidra
2025-09-02 19:01             ` David Hildenbrand
2025-09-02 19:04               ` Lokesh Gidra
2025-09-08  4:56                 ` Lokesh Gidra
2025-08-29  9:01       ` Lorenzo Stoakes
2025-08-26 15:52 ` Lorenzo Stoakes
2025-08-26 22:23   ` Lokesh Gidra
2025-08-28 11:31     ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+EESO5Phk1W64mNm=YG8E1oNEXENP94cd5FUuq0PhcUsOe7+Q@mail.gmail.com' \
    --to=lokeshgidra@google.com \
    --cc=21cnbao@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=android-mm@google.com \
    --cc=david@redhat.com \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=kaleshsingh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox