linux-mm.kvack.org archive mirror
* [PATCH 0/2] Improve UFFDIO_MOVE scalability by removing anon_vma lock
@ 2025-09-18  5:51 Lokesh Gidra
  2025-09-18  5:51 ` [PATCH 1/2] mm: always call rmap_walk() on locked folios Lokesh Gidra
  2025-09-18  5:51 ` [PATCH 2/2] mm/userfaultfd: don't lock anon_vma when performing UFFDIO_MOVE Lokesh Gidra
  0 siblings, 2 replies; 26+ messages in thread
From: Lokesh Gidra @ 2025-09-18  5:51 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, kaleshsingh, ngeoffray, jannh, Lokesh Gidra,
	David Hildenbrand, Lorenzo Stoakes, Harry Yoo, Peter Xu,
	Suren Baghdasaryan, Barry Song, SeongJae Park

Userfaultfd has a scalability issue in its UFFDIO_MOVE ioctl, which is
heavily used in Android as its Java garbage collector uses it for
concurrent heap compaction.

The issue arises because UFFDIO_MOVE updates folio->mapping to an
anon_vma with a different root, in order to move the folio from a src
VMA to dst VMA. It performs the operation with the folio locked, but
this is insufficient, because rmap_walk() can be performed on non-KSM
anonymous folios without folio lock.

This means that UFFDIO_MOVE has to acquire the anon_vma write lock
of the root anon_vma belonging to the folio it wishes to move.

This causes a scalability bottleneck when multiple threads perform
UFFDIO_MOVE simultaneously on distinct pages of the same src VMA. In
field traces of arm64 Android devices, we have observed janky user
interactions due to long (sometimes over ~50ms) uninterruptible
sleeps on the main UI thread caused by anon_vma lock contention in
UFFDIO_MOVE. This is particularly severe at the beginning of the
GC's compaction phase, when multiple threads are likely to be
involved.
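
To illustrate why distinct pages of the same src VMA still contend, here
is a minimal userspace sketch. It is not kernel code: the toy_* types,
the move_scheme enum and both helpers are hypothetical names invented
for this example. It only models which lock is the coarsest one a mover
needs under each scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration; these are NOT the kernel's structures. */
struct toy_lock { int id; };

struct toy_anon_vma {
	struct toy_anon_vma *root;	/* root of the anon_vma tree */
	struct toy_lock rwsem;		/* only the root's lock is modelled as used */
};

struct toy_folio {
	struct toy_anon_vma *mapping;	/* models folio->mapping */
	struct toy_lock lock;		/* models the per-folio lock */
};

/* Hypothetical names for the two locking schemes discussed above. */
enum move_scheme { LOCK_ROOT_ANON_VMA, LOCK_FOLIO_ONLY };

/* Return the coarsest lock a mover must hold for this folio. */
static struct toy_lock *coarsest_lock_for_move(struct toy_folio *folio,
					       enum move_scheme scheme)
{
	assert(folio && folio->mapping && folio->mapping->root);

	if (scheme == LOCK_ROOT_ANON_VMA)
		return &folio->mapping->root->rwsem;
	return &folio->lock;
}

/* Two movers serialise against each other iff they need the same lock. */
static bool movers_contend(struct toy_folio *a, struct toy_folio *b,
			   enum move_scheme scheme)
{
	return coarsest_lock_for_move(a, scheme) ==
	       coarsest_lock_for_move(b, scheme);
}
```

In this model, two folios whose anon_vma shares a root always contend
under LOCK_ROOT_ANON_VMA, and never contend under LOCK_FOLIO_ONLY as
long as they are different folios.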

This series resolves the issue by removing the exception in rmap_walk()
for non-KSM anon folios, ensuring that all folios are locked during an
rmap walk. This is less problematic than it might seem, as the only
major caller which utilises this mode is shrink_active_list().

To assess the impact of locking non-KSM anon folios in
shrink_active_list(), we performed an app cycle test on an arm64
Android device. Over the whole duration of the test there were over
140k invocations of the function, of which over 29k had at least
one non-KSM anon folio on which folio_referenced() was called.
folio_trylock() did not fail in any of these invocations.
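
The kind of counting described above can be sketched in userspace as
follows. This is an assumption-laden toy, not the instrumentation we
used: toy_folio, scan_stats and scan_folios() are hypothetical names, a
C11 atomic_flag stands in for the folio lock, and on trylock failure the
sketch simply skips the folio and counts the failure, which may differ
from what the kernel does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy folio: NOT the kernel's struct folio. */
struct toy_folio {
	atomic_flag locked;	/* models the folio lock bit */
	bool referenced;	/* what a real rmap walk would discover */
};

/* Counters analogous to the ones reported in the test above. */
struct scan_stats {
	size_t walked;		/* folios walked with the lock held */
	size_t trylock_failed;	/* folios skipped because trylock failed */
};

static bool toy_folio_trylock(struct toy_folio *f)
{
	/* test_and_set returns the previous value: false means we got it */
	return !atomic_flag_test_and_set(&f->locked);
}

static void toy_folio_unlock(struct toy_folio *f)
{
	atomic_flag_clear(&f->locked);
}

/* Walk each folio only under its lock; return how many were referenced. */
static size_t scan_folios(struct toy_folio *folios, size_t n,
			  struct scan_stats *stats)
{
	size_t nr_referenced = 0;

	assert(folios && stats);
	for (size_t i = 0; i < n; i++) {
		struct toy_folio *f = &folios[i];

		if (!toy_folio_trylock(f)) {
			/* Assumption: skip and count; never sleep here. */
			stats->trylock_failed++;
			continue;
		}
		stats->walked++;
		if (f->referenced)
			nr_referenced++;
		toy_folio_unlock(f);
	}
	return nr_referenced;
}
```

A caller would zero a struct scan_stats, run scan_folios() over a batch,
and compare trylock_failed against walked to judge how often the extra
lock actually gets in the way.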

Of course, we now take a lock where we wouldn't previously have. In the
past it would have had a major impact in causing a CoW write fault to
copy a page in do_wp_page(), as commit 09854ba94c6a ("mm: do_wp_page()
simplification") caused a failure to obtain folio lock to result in a
page copy even if one wasn't necessary.

However, since commit 6c287605fd56 ("mm: remember exclusively mapped
anonymous pages with PG_anon_exclusive"), and the introduction of the
folio anon exclusive flag, this issue is significantly mitigated.

The only case remaining that we might worry about from this perspective
is that of read-only folios immediately after fork where the anon
exclusive bit will not have been set yet.

We note however in the case of read-only just-forked folios that
wp_can_reuse_anon_folio() will notice the raised reference count
established by shrink_active_list() via isolate_lru_folios() and refuse
to reuse in any case, so this will in fact have no impact - the folio
lock is ultimately immaterial here.
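
The argument of the last few paragraphs can be condensed into a small
decision function. This is a deliberately simplified userspace model,
not the logic of do_wp_page() or wp_can_reuse_anon_folio(): the
toy_folio fields, the wp_action enum and the "refcount > 1" test are
assumptions made for illustration, and the real checks are more
involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy folio state: NOT the kernel's struct folio. */
struct toy_folio {
	int refcount;		/* models the folio reference count */
	bool anon_exclusive;	/* models the anon exclusive flag */
	bool locked_by_other;	/* someone else holds the folio lock */
};

enum wp_action { WP_REUSE, WP_COPY };

/* Decide what a write fault on a read-only anon folio would do. */
static enum wp_action toy_wp_fault(const struct toy_folio *f)
{
	assert(f);

	/* Exclusive folios are reused without consulting the folio lock. */
	if (f->anon_exclusive)
		return WP_REUSE;

	/*
	 * An extra reference (for example one taken while isolating the
	 * folio from the LRU) forces a copy whether or not the lock is held.
	 */
	if (f->refcount > 1)
		return WP_COPY;

	/* Only at this point does a failed trylock cost us a copy. */
	if (f->locked_by_other)
		return WP_COPY;

	return WP_REUSE;
}
```

For a just-forked, non-exclusive folio with a raised reference count,
the model returns WP_COPY with locked_by_other set or clear, which is
the sense in which the folio lock is immaterial in that case.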

All in all, it appears that there is little opportunity for meaningful
negative impact from this change.

As a result of changing our approach to locking, we can remove all
the code that took steps to acquire an anon_vma write lock instead
of a folio lock. This results in a significant simplification and
scalability improvement of the code (currently only in UFFDIO_MOVE).
Furthermore, as a side-effect, folio_lock_anon_vma_read() gets simpler
as we don't need to worry that folio->mapping may have changed under us.

Prior discussions on this can be found at [1, 2].

[1] https://lore.kernel.org/all/CA+EESO4Z6wtX7ZMdDHQRe5jAAS_bQ-POq5+4aDx5jh2DvY6UHg@mail.gmail.com/
[2] https://lore.kernel.org/all/20250908044950.311548-1-lokeshgidra@google.com/

Lokesh Gidra (2):
  mm: always call rmap_walk() on locked folios
  mm/userfaultfd: don't lock anon_vma when performing UFFDIO_MOVE

CC: David Hildenbrand <david@redhat.com>
CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
CC: Harry Yoo <harry.yoo@oracle.com>
CC: Peter Xu <peterx@redhat.com>
CC: Suren Baghdasaryan <surenb@google.com>
CC: Barry Song <baohua@kernel.org>
CC: SeongJae Park <sj@kernel.org>
---
 mm/damon/ops-common.c | 16 +++--------
 mm/huge_memory.c      | 22 +--------------
 mm/memory-failure.c   |  3 +++
 mm/page_idle.c        |  8 ++----
 mm/rmap.c             | 42 +++++++++--------------------
 mm/userfaultfd.c      | 62 ++++++++-----------------------------------
 6 files changed, 33 insertions(+), 120 deletions(-)


base-commit: 27efecc552641210647138ad3936229e7dacdf42
-- 
2.51.0.384.g4c02a37b29-goog




end of thread, other threads:[~2025-11-03 16:38 UTC | newest]

Thread overview: 26+ messages
2025-09-18  5:51 [PATCH 0/2] Improve UFFDIO_MOVE scalability by removing anon_vma lock Lokesh Gidra
2025-09-18  5:51 ` [PATCH 1/2] mm: always call rmap_walk() on locked folios Lokesh Gidra
2025-09-18 11:57   ` Lorenzo Stoakes
2025-09-19  5:45     ` Lokesh Gidra
2025-09-19  9:59       ` Lorenzo Stoakes
2025-11-03 14:58       ` Lorenzo Stoakes
2025-11-03 15:46         ` Lokesh Gidra
2025-11-03 16:38           ` Lorenzo Stoakes
2025-09-18 12:15   ` David Hildenbrand
2025-09-19  6:09     ` Lokesh Gidra
2025-09-24 10:00       ` David Hildenbrand
2025-09-24 19:17         ` Lokesh Gidra
2025-09-25 11:06           ` David Hildenbrand
2025-10-02  6:46             ` Lokesh Gidra
2025-10-02  7:22               ` David Hildenbrand
2025-10-02  7:48                 ` Lokesh Gidra
2025-10-03 23:02                 ` Peter Xu
2025-10-06  6:43                   ` David Hildenbrand
2025-10-06 19:49                     ` Peter Xu
2025-10-06 20:02                       ` David Hildenbrand
2025-10-06 20:50                         ` Peter Xu
2025-09-18  5:51 ` [PATCH 2/2] mm/userfaultfd: don't lock anon_vma when performing UFFDIO_MOVE Lokesh Gidra
2025-09-18 12:38   ` Lorenzo Stoakes
2025-09-19  6:30     ` Lokesh Gidra
2025-09-19  9:57       ` Lorenzo Stoakes
2025-09-19 18:34         ` Lokesh Gidra
