linux-mm.kvack.org archive mirror
From: Lokesh Gidra <lokeshgidra@google.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	kaleshsingh@google.com,  ngeoffray@google.com, jannh@google.com,
	David Hildenbrand <david@redhat.com>,
	 Peter Xu <peterx@redhat.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Barry Song <baohua@kernel.org>
Subject: Re: [PATCH 2/2] mm/userfaultfd: don't lock anon_vma when performing UFFDIO_MOVE
Date: Fri, 19 Sep 2025 11:34:01 -0700
Message-ID: <CA+EESO7b0HvYCDTqPdhe5Oq5xo1YdGQjW=pCRzyX+8Yi0dfJGQ@mail.gmail.com>
In-Reply-To: <e8fcbb82-9029-456f-a5c1-eb5cf4b05ba3@lucifer.local>

On Fri, Sep 19, 2025 at 2:58 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Thu, Sep 18, 2025 at 11:30:48PM -0700, Lokesh Gidra wrote:
> > On Thu, Sep 18, 2025 at 5:38 AM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > On Wed, Sep 17, 2025 at 10:51:35PM -0700, Lokesh Gidra wrote:
> > > > Now that rmap_walk() is guaranteed to be called with the folio lock
> > > > held, we can stop serializing on the src VMA anon_vma lock when moving
> > > > an exclusive folio from a src VMA to a dst VMA in UFFDIO_MOVE ioctl.
> > > >
> > > > When moving a folio, we modify folio->mapping through
> > > > folio_move_anon_rmap() and adjust folio->index accordingly. Doing that
> > > > while we could have concurrent RMAP walks would be dangerous. Therefore,
> > > > to avoid that, we had to acquire anon_vma of src VMA in write-mode. That
> > > > meant that when multiple threads called UFFDIO_MOVE concurrently on
> > > > distinct pages of the same src VMA, they would serialize on it, hurting
> > > > scalability.
> > > >
> > > > In addition to avoiding the scalability bottleneck, this patch also
> > > > simplifies the complicated lock dance that UFFDIO_MOVE has to go through
> > > > between RCU, folio-lock, ptl, and anon_vma.
> > > >
> > > > folio_move_anon_rmap() already enforces that the folio is locked. So
> > > > when we have the folio locked we can no longer race with concurrent
> > > > rmap_walk() as used by folio_referenced() and hence the anon_vma lock
> > >
> > > And other rmap callers right?
> > Right. Will fix it in the next version.
>
> Thanks!
>
> > >
> > > > is no longer required.
> > > >
> > > > Note that this handling is now the same as for other
> > > > folio_move_anon_rmap() users that also do not hold the anon_vma lock --
> > > > namely COW reuse handling. These users never required the anon_vma lock
> > > > as they are only moving the anon VMA closer to the anon_vma leaf of the
> > > > VMA, for example, from an anon_vma root to a leaf of that root. rmap
> > > > walks were always able to tolerate that scenario.
> > >
> > > Which users?
> >
> > The COW reusers, namely:
> > do_wp_page()->wp_can_reuse_anon_folio()
> > do_huge_pmd_wp_page()
> > hugetlb_wp()
>
> Right let's put this in the commit message is what I mean :)
>
> >
> > >
> > > >
> > > > CC: David Hildenbrand <david@redhat.com>
> > > > CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > > CC: Peter Xu <peterx@redhat.com>
> > > > CC: Suren Baghdasaryan <surenb@google.com>
> > > > CC: Barry Song <baohua@kernel.org>
> > > > Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
> > > > ---
> > > >  mm/huge_memory.c | 22 +----------------
> > > >  mm/userfaultfd.c | 62 +++++++++---------------------------------------
> > > >  2 files changed, 12 insertions(+), 72 deletions(-)
> > > >
> > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > index 5acca24bbabb..f444c142a8be 100644
> > > > --- a/mm/huge_memory.c
> > > > +++ b/mm/huge_memory.c
> > > > @@ -2533,7 +2533,6 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
> > > >       pmd_t _dst_pmd, src_pmdval;
> > > >       struct page *src_page;
> > > >       struct folio *src_folio;
> > > > -     struct anon_vma *src_anon_vma;
> > > >       spinlock_t *src_ptl, *dst_ptl;
> > > >       pgtable_t src_pgtable;
> > > >       struct mmu_notifier_range range;
> > > > @@ -2582,23 +2581,9 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
> > > >                               src_addr + HPAGE_PMD_SIZE);
> > > >       mmu_notifier_invalidate_range_start(&range);
> > > >
> > > > -     if (src_folio) {
> > > > +     if (src_folio)
> > > >               folio_lock(src_folio);
> > > >
> > > > -             /*
> > > > -              * split_huge_page walks the anon_vma chain without the page
> > > > -              * lock. Serialize against it with the anon_vma lock, the page
> > > > -              * lock is not enough.
> > > > -              */
> > > > -             src_anon_vma = folio_get_anon_vma(src_folio);
> > > > -             if (!src_anon_vma) {
> > > > -                     err = -EAGAIN;
> > > > -                     goto unlock_folio;
> > > > -             }
> > > > -             anon_vma_lock_write(src_anon_vma);
> > > > -     } else
> > > > -             src_anon_vma = NULL;
> > > > -
> > >
> > > Hmm this seems an odd thing to include in the uffd change. Why not just include
> > > it in the last commit or as a separate commit?
>
> You're changing move_pages_huge_pmd() here in a change that's about the uffd
> change, seems unrelated no?

This function is a part of UFFDIO_MOVE only :) It handles the
huge-page case of UFFDIO_MOVE and there are no other callers. But let
me know if you would like this in a separate patch.
>
> >
> > I'm not sure I follow. What am I including here?
> >
> > BTW, IMHO, the comment is wrong here. folio split code already
> > acquires folio lock. The anon_vma lock is required here for the same
> > reason as non-large page case - to avoid concurrent rmap walks.
>
> This is called via split_huge_page() used by KSM and memory failure, not the
> usual folio split logic afaict.
>
> But those callers all take the folio lock afaict :)
Sorry, yes, that's what I meant. The real reason the anon_vma lock was
required here is also rmap_walk(), not what the comment describes.
>
> So yeah the comment is wrong it seems!
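
To make the locking argument above concrete, here is a minimal userspace
sketch. It is not kernel code: struct toy_vma, struct toy_folio and every
toy_* function are hypothetical stand-ins invented for illustration, and the
spinning atomic_flag only approximates the real (sleeping) folio lock. It
assumes, as the thread does, that every rmap walker takes the folio lock
before reading the mapping/index pair. Under that assumption a mover can
update both fields while holding only the per-folio lock, so movers working
on distinct folios of the same source VMA never touch a shared lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA; only an identity is needed here. */
struct toy_vma {
	int id;
};

/* Hypothetical stand-in for a folio. */
struct toy_folio {
	atomic_flag locked;        /* plays the role of the folio lock */
	struct toy_vma *mapping;   /* plays the role of folio->mapping */
	unsigned long index;       /* plays the role of folio->index */
};

static void toy_folio_init(struct toy_folio *f, struct toy_vma *vma,
			   unsigned long index)
{
	atomic_flag_clear(&f->locked);
	f->mapping = vma;
	f->index = index;
}

/* Spin until the per-folio lock is acquired (the real lock sleeps). */
static void toy_folio_lock(struct toy_folio *f)
{
	while (atomic_flag_test_and_set_explicit(&f->locked,
						 memory_order_acquire))
		;
}

static void toy_folio_unlock(struct toy_folio *f)
{
	atomic_flag_clear_explicit(&f->locked, memory_order_release);
}

/*
 * Mover: retarget the folio to dst and adjust its index. Only the
 * per-folio lock is held; no lock shared by the whole source VMA.
 */
static void toy_move_folio(struct toy_folio *f, struct toy_vma *dst,
			   unsigned long dst_index)
{
	toy_folio_lock(f);
	f->mapping = dst;
	f->index = dst_index;
	toy_folio_unlock(f);
}

/*
 * Walker: snapshot mapping and index under the same per-folio lock, so
 * it sees either the old pair or the new pair, never a mix of the two.
 */
static void toy_rmap_walk(struct toy_folio *f, struct toy_vma **mapping,
			  unsigned long *index)
{
	toy_folio_lock(f);
	*mapping = f->mapping;
	*index = f->index;
	toy_folio_unlock(f);
}
```

Calling toy_move_folio() on one folio leaves every other folio's lock
untouched, which is the scalability point of the patch. The old scheme
would correspond to adding one more lock inside struct toy_vma and taking
it in every mover, which serializes all movers on the same source VMA.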


Thread overview: 26+ messages
2025-09-18  5:51 [PATCH 0/2] Improve UFFDIO_MOVE scalability by removing anon_vma lock Lokesh Gidra
2025-09-18  5:51 ` [PATCH 1/2] mm: always call rmap_walk() on locked folios Lokesh Gidra
2025-09-18 11:57   ` Lorenzo Stoakes
2025-09-19  5:45     ` Lokesh Gidra
2025-09-19  9:59       ` Lorenzo Stoakes
2025-11-03 14:58       ` Lorenzo Stoakes
2025-11-03 15:46         ` Lokesh Gidra
2025-11-03 16:38           ` Lorenzo Stoakes
2025-09-18 12:15   ` David Hildenbrand
2025-09-19  6:09     ` Lokesh Gidra
2025-09-24 10:00       ` David Hildenbrand
2025-09-24 19:17         ` Lokesh Gidra
2025-09-25 11:06           ` David Hildenbrand
2025-10-02  6:46             ` Lokesh Gidra
2025-10-02  7:22               ` David Hildenbrand
2025-10-02  7:48                 ` Lokesh Gidra
2025-10-03 23:02                 ` Peter Xu
2025-10-06  6:43                   ` David Hildenbrand
2025-10-06 19:49                     ` Peter Xu
2025-10-06 20:02                       ` David Hildenbrand
2025-10-06 20:50                         ` Peter Xu
2025-09-18  5:51 ` [PATCH 2/2] mm/userfaultfd: don't lock anon_vma when performing UFFDIO_MOVE Lokesh Gidra
2025-09-18 12:38   ` Lorenzo Stoakes
2025-09-19  6:30     ` Lokesh Gidra
2025-09-19  9:57       ` Lorenzo Stoakes
2025-09-19 18:34         ` Lokesh Gidra [this message]
