Re: [PATCH 1/2] mm: always call rmap_walk() on locked folios

From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Lokesh Gidra <lokeshgidra@google.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	kaleshsingh@google.com, ngeoffray@google.com, jannh@google.com,
	David Hildenbrand <david@redhat.com>,
	Harry Yoo <harry.yoo@oracle.com>, Peter Xu <peterx@redhat.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Barry Song <baohua@kernel.org>, SeongJae Park <sj@kernel.org>
Subject: Re: [PATCH 1/2] mm: always call rmap_walk() on locked folios
Date: Fri, 19 Sep 2025 10:59:31 +0100
Message-ID: <67875b3a-69b0-4fe6-8e37-6289568e7921@lucifer.local>
In-Reply-To: <CA+EESO7yyhNH6TYza+j7h9JZb1_1eHdd1x3ASmuNga2yYng4JQ@mail.gmail.com>

On Thu, Sep 18, 2025 at 10:45:21PM -0700, Lokesh Gidra wrote:
> On Thu, Sep 18, 2025 at 4:57 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > On Wed, Sep 17, 2025 at 10:51:34PM -0700, Lokesh Gidra wrote:
> > > Guarantee that rmap_walk() is called on locked folios so that threads
> > > changing folio->mapping and folio->index for non-KSM anon folios can
> > > serialize on fine-grained folio lock rather than anon_vma lock. Other
> > > folio types are already always locked before rmap_walk().
> >
> > Be good to explain why you're doing certain things, like adding the folio
> > lock to kill_procs_now().
>
> Agreed! I'll add in the next version.

Great, thanks! :)

> >
> > Also worth noting that you're going from _definitely_ locking non-KSM anon
> > to _not necessarily_ locking it.
>
> Will do. But, just to be clear, you mean the opposite right?
> >
> > You should explain why you think this is fine (in general - rmap callers do
> > > some other check to see if what is expected to be mapped did happen so it's
> > fine, or otherwise treat things as best-effort).
>
> Sure thing.

Thanks!
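
The best-effort pattern discussed above (trylock the folio, skip the walk
on contention) can be sketched in userspace C. This is an illustrative
model only, not kernel code: toy_folio, toy_folio_trylock(),
toy_folio_unlock() and toy_mkold() are hypothetical names, and a C11
atomic_flag stands in for the folio lock bit.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct folio: a lock bit and a "young" bit. */
struct toy_folio {
	atomic_flag locked;
	bool young;
};

/* Returns true if the lock was free and is now held by the caller. */
bool toy_folio_trylock(struct toy_folio *f)
{
	/* test_and_set returns the previous value: false means we won. */
	return !atomic_flag_test_and_set_explicit(&f->locked,
						  memory_order_acquire);
}

void toy_folio_unlock(struct toy_folio *f)
{
	atomic_flag_clear_explicit(&f->locked, memory_order_release);
}

/*
 * Best-effort "mkold": on contention do nothing and report that the
 * walk was skipped. Clearing the young bit stands in for rmap_walk().
 */
bool toy_mkold(struct toy_folio *f)
{
	if (!toy_folio_trylock(f))
		return false;	/* contended: skip, caller tolerates this */

	f->young = false;	/* the "walk" runs only under the lock */
	toy_folio_unlock(f);
	return true;
}
```

The point of the sketch is that a failed trylock is not an error: the
caller has to treat "walk skipped" as an acceptable outcome, which is the
property the commit message should spell out for each converted caller.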

> >
> > You should probably also put some information about performance impact
> > here, I think Barry provided some?
> >
> I added that in the cover letter. Basically the impact of trylocking
> non-KSM anon folios (in folio_referenced()) on shrink_active_list().
> I'll move it here.
> > >
> > > This is in preparation for removing anon_vma write-lock from
> > > UFFDIO_MOVE.
> >
> > Thanks for mentioning this :)
> >
> I got your message loud and clear the last time :)
> > >
> > > CC: David Hildenbrand <david@redhat.com>
> > > CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > CC: Harry Yoo <harry.yoo@oracle.com>
> > > CC: Peter Xu <peterx@redhat.com>
> > > CC: Suren Baghdasaryan <surenb@google.com>
> > > CC: Barry Song <baohua@kernel.org>
> > > CC: SeongJae Park <sj@kernel.org>
> > > Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
> >
> > OK so you're making:
> >
> > folio_lock_anon_vma_read()
> > folio_get_anon_vma()
> >
> > Require folio locks.
> >
> > folio_lock_anon_vma_read() is called from:
> >
> > collect_procs_anon() - changed to take folio lock.
> > page_idle_clear_pte_refs() - changed to take folio lock.
> > damon_folio_mkold() - changed to take folio lock.
> > damon_folio_young() - changed to take folio lock.
> > folio_referenced() - changed to take folio lock.
> > try_to_unmap() - ???
> > try_to_migrate() - ???
> >
> > (I note that we allow a TTU_RMAP_LOCKED walk in the above unmap, migrate
> > cases too, wonder how these will interact?)
> >
> > folio_get_anon_vma() is called from:
> >
> > move_pages_huge_pmd() - already holds folio lock.
> > __folio_split() - already holds folio lock.
> > move_pages_ptes() [uffd] - already holds folio lock.
> > migrate_folio_unmap() - already holds folio lock.
> > unmap_and_move_huge_page() - already holds folio lock.
> >
> > Can you:
> >
> > a. Confirm the try_to_unmap() and try_to_migrate() cases take the folio
> >    lock. Explicitly list the callers and how they acquire the folio lock.
> >
> Description comments of both try_to_migrate() and try_to_unmap() say
> that the caller must hold folio lock. But just to be safe, I went
> through all the callers, and all of them are holding the folio lock:
>
> try_to_unmap() is called from:
> unmap_folio()
> collapse_file()
> shrink_folio_list()
> shrink_folio_list()->unmap_poisoned_folio()
> do_migrate_range()->unmap_poisoned_folio()
> try_memory_failure_hugetlb()->hwpoison_user_mappings()->unmap_poisoned_folio()
> memory_failure()->hwpoison_user_mappings()->unmap_poisoned_folio()
>
> try_to_migrate() is called from:
> unmap_folio()
> unmap_and_move_huge_page()
> migrate_folio_unmap()
> migrate_vma_collect()->migrate_vma_collect_pmd() acquires the folio lock in case of
> migrate_vma_setup()->migrate_vma_unmap()->migrate_device_unmap()
> migrate_device_pfn_lock() acquires folio locks in the following cases:
> migrate_device_range()->migrate_device_unmap()
> migrate_device_pfns()->migrate_device_unmap()
>
> All the callers of rmap_walk()/rmap_walk_locked() are already covered
> in the folio_lock_anon_vma_read() list that you added, except
> remove_migration_ptes(), which is called from:
> __folio_split()->remap_page()
> migrate_device_unmap()
> __migrate_device_finalize()
> unmap_and_move_huge_page()
> migrate_folio_unmap() locks the folio for the following two in
> migrate_pages_batch():
> migrate_folio_move()
> migrate_folio_undo_src()

Awesome thanks for checking that! I did think it was probably fine, but
it's important to be as thorough as we can be.
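
The invariant this audit establishes (every caller takes the folio lock,
and the walker only asserts it) can be pictured with a small
single-threaded C model. Everything here is hypothetical and for
illustration: demo_folio, demo_rmap_walk(), demo_locked_walk() and the
demo_warnings counter are made-up names, with the counter standing in for
what VM_WARN_ON_FOLIO() would report.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct folio. */
struct demo_folio {
	bool locked;		/* stands in for the folio lock bit */
	const void *mapping;	/* stands in for folio->mapping */
};

/* Counts how often the walker was entered without the lock held. */
int demo_warnings;

/*
 * The walker does not take the lock itself. It only checks that the
 * caller did, then reads the mapping once with no recheck.
 */
const void *demo_rmap_walk(const struct demo_folio *f)
{
	if (!f->locked)
		demo_warnings++;	/* models a VM_WARN_ON-style check */
	return f->mapping;
}

/* A well-behaved caller: lock, walk, unlock. */
const void *demo_locked_walk(struct demo_folio *f)
{
	const void *mapping;

	f->locked = true;
	mapping = demo_rmap_walk(f);
	f->locked = false;
	return mapping;
}
```

A caller that goes through demo_locked_walk() never bumps the counter,
while calling demo_rmap_walk() directly on an unlocked folio does. That
is why each call chain needs to be listed in the commit message: the
assertion catches a missed caller only if that path is actually
exercised on a debug build.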

>
> > b. Update the commit message to include the above. You're making a _very_
> >    sensitive locking change here, it's important to demonstrate that you've
> >    considered all cases.
> Certainly, will do.
> >
> > Thanks!
> >
> > > ---
> > >  mm/damon/ops-common.c | 16 ++++------------
> > >  mm/memory-failure.c   |  3 +++
> > >  mm/page_idle.c        |  8 ++------
> > >  mm/rmap.c             | 42 ++++++++++++------------------------------
> > >  4 files changed, 21 insertions(+), 48 deletions(-)
> > >
> > > diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
> > > index 998c5180a603..f61d6dde13dc 100644
> > > --- a/mm/damon/ops-common.c
> > > +++ b/mm/damon/ops-common.c
> > > @@ -162,21 +162,17 @@ void damon_folio_mkold(struct folio *folio)
> > >               .rmap_one = damon_folio_mkold_one,
> > >               .anon_lock = folio_lock_anon_vma_read,
> > >       };
> > > -     bool need_lock;
> > >
> > >       if (!folio_mapped(folio) || !folio_raw_mapping(folio)) {
> > >               folio_set_idle(folio);
> > >               return;
> > >       }
> > >
> > > -     need_lock = !folio_test_anon(folio) || folio_test_ksm(folio);
> > > -     if (need_lock && !folio_trylock(folio))
> > > +     if (!folio_trylock(folio))
> > >               return;
> >
> > This _seems_ to be best effort and not relying on anon always
> > succeeding. So should be fine.
> >
> > >
> > >       rmap_walk(folio, &rwc);
> > > -
> > > -     if (need_lock)
> > > -             folio_unlock(folio);
> > > +     folio_unlock(folio);
> > >
> > >  }
> > >
> > > @@ -228,7 +224,6 @@ bool damon_folio_young(struct folio *folio)
> > >               .rmap_one = damon_folio_young_one,
> > >               .anon_lock = folio_lock_anon_vma_read,
> > >       };
> > > -     bool need_lock;
> > >
> > >       if (!folio_mapped(folio) || !folio_raw_mapping(folio)) {
> > >               if (folio_test_idle(folio))
> > > @@ -237,14 +232,11 @@ bool damon_folio_young(struct folio *folio)
> > >                       return true;
> > >       }
> > >
> > > -     need_lock = !folio_test_anon(folio) || folio_test_ksm(folio);
> > > -     if (need_lock && !folio_trylock(folio))
> > > +     if (!folio_trylock(folio))
> > >               return false;
> >
> > Same as above here.
> >
> > >
> > >       rmap_walk(folio, &rwc);
> > > -
> > > -     if (need_lock)
> > > -             folio_unlock(folio);
> > > +     folio_unlock(folio);
> > >
> > >       return accessed;
> > >  }
> > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > index a24806bb8e82..f698df156bf8 100644
> > > --- a/mm/memory-failure.c
> > > +++ b/mm/memory-failure.c
> > > @@ -2143,7 +2143,10 @@ static void kill_procs_now(struct page *p, unsigned long pfn, int flags,
> > >  {
> > >       LIST_HEAD(tokill);
> > >
> > > +     folio_lock(folio);
> > >       collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
> > > +     folio_unlock(folio);
> > > +
> >
> > Good. I hate how this works.
> >
> > >       kill_procs(&tokill, true, pfn, flags);
> > >  }
> > >
> > > diff --git a/mm/page_idle.c b/mm/page_idle.c
> > > index a82b340dc204..9bf573d22e87 100644
> > > --- a/mm/page_idle.c
> > > +++ b/mm/page_idle.c
> > > @@ -101,19 +101,15 @@ static void page_idle_clear_pte_refs(struct folio *folio)
> > >               .rmap_one = page_idle_clear_pte_refs_one,
> > >               .anon_lock = folio_lock_anon_vma_read,
> > >       };
> > > -     bool need_lock;
> > >
> > >       if (!folio_mapped(folio) || !folio_raw_mapping(folio))
> > >               return;
> > >
> > > -     need_lock = !folio_test_anon(folio) || folio_test_ksm(folio);
> > > -     if (need_lock && !folio_trylock(folio))
> > > +     if (!folio_trylock(folio))
> > >               return;
> >
> > This checks folio idle bit after so that's fine for anon to not succeed due
> > to contention.
> >
> > >
> > >       rmap_walk(folio, &rwc);
> > > -
> > > -     if (need_lock)
> > > -             folio_unlock(folio);
> > > +     folio_unlock(folio);
> > >  }
> > >
> > >  static ssize_t page_idle_bitmap_read(struct file *file, struct kobject *kobj,
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 34333ae3bd80..90584f5da379 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -489,17 +489,15 @@ void __init anon_vma_init(void)
> > >   * if there is a mapcount, we can dereference the anon_vma after observing
> > >   * those.
> > >   *
> > > - * NOTE: the caller should normally hold folio lock when calling this.  If
> > > - * not, the caller needs to double check the anon_vma didn't change after
> > > - * taking the anon_vma lock for either read or write (UFFDIO_MOVE can modify it
> > > - * concurrently without folio lock protection). See folio_lock_anon_vma_read()
> > > - * which has already covered that, and comment above remap_pages().
> > > + * NOTE: the caller should hold folio lock when calling this.
> > >   */
> > >  struct anon_vma *folio_get_anon_vma(const struct folio *folio)
> > >  {
> > >       struct anon_vma *anon_vma = NULL;
> > >       unsigned long anon_mapping;
> > >
> > > +     VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> > > +
> > >       rcu_read_lock();
> > >       anon_mapping = (unsigned long)READ_ONCE(folio->mapping);
> > >       if ((anon_mapping & FOLIO_MAPPING_FLAGS) != FOLIO_MAPPING_ANON)
> > > @@ -546,7 +544,8 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
> > >       struct anon_vma *root_anon_vma;
> > >       unsigned long anon_mapping;
> > >
> > > -retry:
> > > +     VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> > > +
> > >       rcu_read_lock();
> > >       anon_mapping = (unsigned long)READ_ONCE(folio->mapping);
> > >       if ((anon_mapping & FOLIO_MAPPING_FLAGS) != FOLIO_MAPPING_ANON)
> > > @@ -557,17 +556,6 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
> > >       anon_vma = (struct anon_vma *) (anon_mapping - FOLIO_MAPPING_ANON);
> > >       root_anon_vma = READ_ONCE(anon_vma->root);
> > >       if (down_read_trylock(&root_anon_vma->rwsem)) {
> > > -             /*
> > > -              * folio_move_anon_rmap() might have changed the anon_vma as we
> > > -              * might not hold the folio lock here.
> > > -              */
> > > -             if (unlikely((unsigned long)READ_ONCE(folio->mapping) !=
> > > -                          anon_mapping)) {
> > > -                     up_read(&root_anon_vma->rwsem);
> > > -                     rcu_read_unlock();
> > > -                     goto retry;
> > > -             }
> > > -
> > >               /*
> > >                * If the folio is still mapped, then this anon_vma is still
> > >                * its anon_vma, and holding the mutex ensures that it will
> > > @@ -602,18 +590,6 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
> > >       rcu_read_unlock();
> > >       anon_vma_lock_read(anon_vma);
> > >
> > > -     /*
> > > -      * folio_move_anon_rmap() might have changed the anon_vma as we might
> > > -      * not hold the folio lock here.
> > > -      */
> > > -     if (unlikely((unsigned long)READ_ONCE(folio->mapping) !=
> > > -                  anon_mapping)) {
> > > -             anon_vma_unlock_read(anon_vma);
> > > -             put_anon_vma(anon_vma);
> > > -             anon_vma = NULL;
> > > -             goto retry;
> > > -     }
> > > -
> > >       if (atomic_dec_and_test(&anon_vma->refcount)) {
> > >               /*
> > >                * Oops, we held the last refcount, release the lock
> > > @@ -1005,7 +981,7 @@ int folio_referenced(struct folio *folio, int is_locked,
> > >       if (!folio_raw_mapping(folio))
> > >               return 0;
> > >
> > > -     if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
> > > +     if (!is_locked) {
> > >               we_locked = folio_trylock(folio);
> > >               if (!we_locked)
> > >                       return 1;
> > > @@ -2815,6 +2791,12 @@ static void rmap_walk_anon(struct folio *folio,
> > >       pgoff_t pgoff_start, pgoff_end;
> > >       struct anon_vma_chain *avc;
> > >
> > > +     /*
> > > +      * The folio lock ensures that folio->mapping can't be changed under us
> > > +      * to an anon_vma with different root.
> > > +      */
> > > +     VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> > > +
> > >       if (locked) {
> > >               anon_vma = folio_anon_vma(folio);
> > >               /* anon_vma disappear under us? */
> > > --
> > > 2.51.0.384.g4c02a37b29-goog
> > >
> > >
> >
> > All of the above actual changes to the locking logic look ok to me though.
> Awesome! :)
>

:) we're nearly there on this... now that you, David and I have gone through
this in detail, I'm confident this is actually a really quite nice
improvement in general, and by unravelling some of the locking hell we might
actually get some other benefits out of this (as referenced by Barry) :)
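
As one concrete piece of that unravelling, the recheck-and-retry that the
patch drops from folio_lock_anon_vma_read() can be modelled roughly as
follows. This is a single-threaded sketch under stated assumptions, not
the kernel logic: sim_folio, sim_race_window() and
sim_read_mapping_recheck() are hypothetical names, and the "pending"
field simulates a concurrent mover changing the mapping exactly once
between the first read and the lock acquisition.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct folio. */
struct sim_folio {
	const void *mapping;	/* stands in for folio->mapping */
	const void *pending;	/* non-NULL: a simulated mover installs this once */
};

/* Simulates the window between reading ->mapping and taking the lock. */
void sim_race_window(struct sim_folio *f)
{
	if (f->pending) {
		f->mapping = f->pending;
		f->pending = NULL;
	}
}

/*
 * Scheme without the folio lock: snapshot the mapping, "take the lock",
 * recheck, and retry if the mapping changed. *attempts counts passes.
 */
const void *sim_read_mapping_recheck(struct sim_folio *f, int *attempts)
{
	const void *snap;

	*attempts = 0;
	do {
		(*attempts)++;
		snap = f->mapping;
		sim_race_window(f);
	} while (f->mapping != snap);	/* mapping moved under us: retry */

	return snap;
}
```

With the folio lock held by every caller, the simulated mover cannot run
inside that window, so the loop always completes in a single pass and the
recheck becomes dead code.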

Cheers, Lorenzo
