From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Lokesh Gidra <lokeshgidra@google.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	kaleshsingh@google.com, ngeoffray@google.com, jannh@google.com,
	David Hildenbrand <david@redhat.com>,
	Harry Yoo <harry.yoo@oracle.com>, Peter Xu <peterx@redhat.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Barry Song <baohua@kernel.org>, SeongJae Park <sj@kernel.org>
Subject: Re: [PATCH v2 1/2] mm: always call rmap_walk() on locked folios
Date: Mon, 3 Nov 2025 17:51:08 +0000
Message-ID: <e3cf072c-7523-4931-a060-a2890a30733a@lucifer.local>
In-Reply-To: <20250923071019.775806-2-lokeshgidra@google.com>

Apologies for the late review - I somehow missed this...!

On Tue, Sep 23, 2025 at 12:10:18AM -0700, Lokesh Gidra wrote:
> Guarantee that rmap_walk() is called on locked folios, so that threads
> changing folio->mapping and folio->index for non-KSM anon folios can
> serialize on the fine-grained folio lock rather than on the anon_vma
> lock. Other folio types are already always locked before rmap_walk().
> With this, we go from 'not necessarily' locking the non-KSM anon folio
> to 'definitely' locking it during rmap walks.
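
Just to make the new serialization concrete - a rough sketch of the two
sides that now both operate under the folio lock (simplified, not verbatim
kernel code; dst_vma/dst_addr are placeholder names):

	/* Mover side (UFFDIO_MOVE, patch 2/2): folio->mapping and
	 * folio->index are only rewritten while the folio lock is held. */
	folio_lock(folio);
	folio_move_anon_rmap(folio, dst_vma);	/* updates folio->mapping */
	folio->index = linear_page_index(dst_vma, dst_addr);
	folio_unlock(folio);

	/* Walker side (e.g. damon_folio_mkold() below): the lock is now
	 * always taken before walking, so the anon_vma seen is stable. */
	if (folio_trylock(folio)) {
		rmap_walk(folio, &rwc);
		folio_unlock(folio);
	}
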
>
> This patch is in preparation for removing anon_vma write-lock from
> UFFDIO_MOVE.
>
> With this patch, three functions are now expected to be called with a
> locked folio. To be careful not to miss any case, here is an exhaustive
> list of all their callers.
>
> 1) rmap_walk() is called from:
>
> a) folio_referenced()
> b) damon_folio_mkold()
> c) damon_folio_young()
> d) page_idle_clear_pte_refs()
> e) try_to_unmap()
> f) try_to_migrate()
> g) folio_mkclean()
> h) remove_migration_ptes()
>
> In the above list, the first four are changed in this patch to try-lock
> non-KSM anon folios, as is already done for other folio types. The
> remaining functions in the list already hold the folio lock when calling
> rmap_walk().
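
For readers following along, the pattern the four converted callers now
share boils down to the below (a minimal sketch based on the damon/
page_idle hunks further down; the helper name is purely illustrative):

	static void walk_folio_locked(struct folio *folio,
				      struct rmap_walk_control *rwc)
	{
		/* All folio types, non-KSM anon included, are try-locked. */
		if (!folio_trylock(folio))
			return;		/* best-effort: skip if contended */

		rmap_walk(folio, rwc);
		folio_unlock(folio);
	}

Dropping the need_lock bookkeeping is a nice cleanup in its own right.
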
>
> 2) folio_lock_anon_vma_read() is called from the following functions:
>
> a) collect_procs_anon()
> b) page_idle_clear_pte_refs()
> c) damon_folio_mkold()
> d) damon_folio_young()
> e) folio_referenced()
> f) try_to_unmap()
> g) try_to_migrate()
>
> All the functions in the above list, except collect_procs_anon(), are
> covered by the rmap_walk() list above. For collect_procs_anon(),
> kill_procs_now() is changed in this patch to take the folio lock,
> ensuring that all callers of folio_lock_anon_vma_read() now hold it.
>
> 3) folio_get_anon_vma() is called from the following functions, all of
>    which already hold the folio lock:
>
> a) move_pages_huge_pmd()
> b) __folio_split()
> c) move_pages_ptes()
> d) migrate_folio_unmap()
> e) unmap_and_move_huge_page()
>
> Functionally, this patch doesn't break the logic: rmap walkers generally
> perform some other check to verify that what they expect to be mapped
> actually is, or otherwise treat the walk as best-effort.
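
Right - and for the anon case specifically, rmap_walk_anon() (visible in
the hunk below) bails out if the folio has lost its anon_vma in the
meantime, roughly:

	anon_vma = folio_anon_vma(folio);
	if (!anon_vma)		/* anon_vma disappeared under us? */
		return;

while the individual rmap_one() handlers only act on PTEs that actually
map the folio, so a skipped or stale walk is harmless.
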
>
> Among the 4 functions changed in this patch, folio_referenced() is the
> only core-mm function, and is also frequently called. To assess the
> impact of locking non-KSM anon folios in the
> shrink_active_list()->folio_referenced() path, we performed an app cycle
> test on an arm64 android device. During the test there were over 140k
> invocations of shrink_active_list(), of which over 29k had at least one
> non-KSM anon folio on which folio_referenced() was called. In none of
> these invocations did folio_trylock() fail.
>
> Of course, we now take a lock where we previously wouldn't have. In the
> past this would have had a major impact, causing a CoW write fault to
> copy a page in do_wp_page(), as commit 09854ba94c6a ("mm: do_wp_page()
> simplification") made failure to obtain the folio lock result in a page
> copy even when one wasn't necessary.
>
> However, since commit 6c287605fd56 ("mm: remember exclusively mapped
> anonymous pages with PG_anon_exclusive"), and the introduction of the
> folio anon exclusive flag, this issue is significantly mitigated.

Thanks this is great!

>
> The only case remaining that we might worry about from this perspective
> is that of read-only folios immediately after fork where the anon
> exclusive bit will not have been set yet.
>
> We note, however, that in the case of read-only just-forked folios,
> wp_can_reuse_anon_folio() will notice the raised reference count
> established by shrink_active_list() via isolate_lru_folios() and refuse
> to reuse the folio in any case, so this will in fact have no impact -
> the folio lock is ultimately immaterial here.

Great!
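
For completeness, the reuse decision in the write fault path is roughly
the below (paraphrased from mm/memory.c, not verbatim):

	/* do_wp_page(): reuse only if we certainly own the folio. */
	if (folio_test_anon(folio) &&
	    (PageAnonExclusive(vmf->page) ||
	     wp_can_reuse_anon_folio(folio, vma)))
		return wp_page_reuse(vmf, folio);

	/* wp_can_reuse_anon_folio(): an extra reference - such as the one
	 * taken by isolate_lru_folios() - defeats the reuse check. */
	if (folio_ref_count(folio) > 1 + folio_test_swapcache(folio))
		return false;

So a folio that shrink_active_list() has isolated gets copied regardless
of whether folio_referenced() is holding its lock, as you say.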

>
> All-in-all it appears that there is little opportunity for meaningful
> negative impact from this change.

Thanks.

>
> CC: David Hildenbrand <david@redhat.com>
> CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> CC: Harry Yoo <harry.yoo@oracle.com>
> CC: Peter Xu <peterx@redhat.com>
> CC: Suren Baghdasaryan <surenb@google.com>
> CC: Barry Song <baohua@kernel.org>
> CC: SeongJae Park <sj@kernel.org>
> Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>

LGTM, so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
>  mm/damon/ops-common.c | 16 ++++------------
>  mm/memory-failure.c   |  3 +++
>  mm/page_idle.c        |  8 ++------
>  mm/rmap.c             | 42 ++++++++++++------------------------------
>  4 files changed, 21 insertions(+), 48 deletions(-)
>
> diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
> index 998c5180a603..f61d6dde13dc 100644
> --- a/mm/damon/ops-common.c
> +++ b/mm/damon/ops-common.c
> @@ -162,21 +162,17 @@ void damon_folio_mkold(struct folio *folio)
>  		.rmap_one = damon_folio_mkold_one,
>  		.anon_lock = folio_lock_anon_vma_read,
>  	};
> -	bool need_lock;
>
>  	if (!folio_mapped(folio) || !folio_raw_mapping(folio)) {
>  		folio_set_idle(folio);
>  		return;
>  	}
>
> -	need_lock = !folio_test_anon(folio) || folio_test_ksm(folio);
> -	if (need_lock && !folio_trylock(folio))
> +	if (!folio_trylock(folio))
>  		return;
>
>  	rmap_walk(folio, &rwc);
> -
> -	if (need_lock)
> -		folio_unlock(folio);
> +	folio_unlock(folio);
>
>  }
>
> @@ -228,7 +224,6 @@ bool damon_folio_young(struct folio *folio)
>  		.rmap_one = damon_folio_young_one,
>  		.anon_lock = folio_lock_anon_vma_read,
>  	};
> -	bool need_lock;
>
>  	if (!folio_mapped(folio) || !folio_raw_mapping(folio)) {
>  		if (folio_test_idle(folio))
> @@ -237,14 +232,11 @@ bool damon_folio_young(struct folio *folio)
>  			return true;
>  	}
>
> -	need_lock = !folio_test_anon(folio) || folio_test_ksm(folio);
> -	if (need_lock && !folio_trylock(folio))
> +	if (!folio_trylock(folio))
>  		return false;
>
>  	rmap_walk(folio, &rwc);
> -
> -	if (need_lock)
> -		folio_unlock(folio);
> +	folio_unlock(folio);
>
>  	return accessed;
>  }
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index a24806bb8e82..f698df156bf8 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2143,7 +2143,10 @@ static void kill_procs_now(struct page *p, unsigned long pfn, int flags,
>  {
>  	LIST_HEAD(tokill);
>
> +	folio_lock(folio);
>  	collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
> +	folio_unlock(folio);
> +
>  	kill_procs(&tokill, true, pfn, flags);
>  }
>
> diff --git a/mm/page_idle.c b/mm/page_idle.c
> index a82b340dc204..9bf573d22e87 100644
> --- a/mm/page_idle.c
> +++ b/mm/page_idle.c
> @@ -101,19 +101,15 @@ static void page_idle_clear_pte_refs(struct folio *folio)
>  		.rmap_one = page_idle_clear_pte_refs_one,
>  		.anon_lock = folio_lock_anon_vma_read,
>  	};
> -	bool need_lock;
>
>  	if (!folio_mapped(folio) || !folio_raw_mapping(folio))
>  		return;
>
> -	need_lock = !folio_test_anon(folio) || folio_test_ksm(folio);
> -	if (need_lock && !folio_trylock(folio))
> +	if (!folio_trylock(folio))
>  		return;
>
>  	rmap_walk(folio, &rwc);
> -
> -	if (need_lock)
> -		folio_unlock(folio);
> +	folio_unlock(folio);
>  }
>
>  static ssize_t page_idle_bitmap_read(struct file *file, struct kobject *kobj,
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 0bc7cf8b7359..fd9f18670440 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -489,17 +489,15 @@ void __init anon_vma_init(void)
>   * if there is a mapcount, we can dereference the anon_vma after observing
>   * those.
>   *
> - * NOTE: the caller should normally hold folio lock when calling this.  If
> - * not, the caller needs to double check the anon_vma didn't change after
> - * taking the anon_vma lock for either read or write (UFFDIO_MOVE can modify it
> - * concurrently without folio lock protection). See folio_lock_anon_vma_read()
> - * which has already covered that, and comment above remap_pages().
> + * NOTE: the caller should hold folio lock when calling this.
>   */
>  struct anon_vma *folio_get_anon_vma(const struct folio *folio)
>  {
>  	struct anon_vma *anon_vma = NULL;
>  	unsigned long anon_mapping;
>
> +	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> +
>  	rcu_read_lock();
>  	anon_mapping = (unsigned long)READ_ONCE(folio->mapping);
>  	if ((anon_mapping & FOLIO_MAPPING_FLAGS) != FOLIO_MAPPING_ANON)
> @@ -546,7 +544,8 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
>  	struct anon_vma *root_anon_vma;
>  	unsigned long anon_mapping;
>
> -retry:
> +	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> +
>  	rcu_read_lock();
>  	anon_mapping = (unsigned long)READ_ONCE(folio->mapping);
>  	if ((anon_mapping & FOLIO_MAPPING_FLAGS) != FOLIO_MAPPING_ANON)
> @@ -557,17 +556,6 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
>  	anon_vma = (struct anon_vma *) (anon_mapping - FOLIO_MAPPING_ANON);
>  	root_anon_vma = READ_ONCE(anon_vma->root);
>  	if (down_read_trylock(&root_anon_vma->rwsem)) {
> -		/*
> -		 * folio_move_anon_rmap() might have changed the anon_vma as we
> -		 * might not hold the folio lock here.
> -		 */
> -		if (unlikely((unsigned long)READ_ONCE(folio->mapping) !=
> -			     anon_mapping)) {
> -			up_read(&root_anon_vma->rwsem);
> -			rcu_read_unlock();
> -			goto retry;
> -		}
> -
>  		/*
>  		 * If the folio is still mapped, then this anon_vma is still
>  		 * its anon_vma, and holding the mutex ensures that it will
> @@ -602,18 +590,6 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
>  	rcu_read_unlock();
>  	anon_vma_lock_read(anon_vma);
>
> -	/*
> -	 * folio_move_anon_rmap() might have changed the anon_vma as we might
> -	 * not hold the folio lock here.
> -	 */
> -	if (unlikely((unsigned long)READ_ONCE(folio->mapping) !=
> -		     anon_mapping)) {
> -		anon_vma_unlock_read(anon_vma);
> -		put_anon_vma(anon_vma);
> -		anon_vma = NULL;
> -		goto retry;
> -	}
> -
>  	if (atomic_dec_and_test(&anon_vma->refcount)) {
>  		/*
>  		 * Oops, we held the last refcount, release the lock
> @@ -988,7 +964,7 @@ int folio_referenced(struct folio *folio, int is_locked,
>  	if (!folio_raw_mapping(folio))
>  		return 0;
>
> -	if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
> +	if (!is_locked) {
>  		we_locked = folio_trylock(folio);
>  		if (!we_locked)
>  			return 1;
> @@ -2820,6 +2796,12 @@ static void rmap_walk_anon(struct folio *folio,
>  	pgoff_t pgoff_start, pgoff_end;
>  	struct anon_vma_chain *avc;
>
> +	/*
> +	 * The folio lock ensures that folio->mapping can't be changed under us
> +	 * to an anon_vma with different root.
> +	 */
> +	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> +
>  	if (locked) {
>  		anon_vma = folio_anon_vma(folio);
>  		/* anon_vma disappear under us? */
> --
> 2.51.0.534.gc79095c0ca-goog
>
>

