* [PATCH 0/2] move_pages_pte() fixes
@ 2025-02-26 18:55 Suren Baghdasaryan
2025-02-26 18:55 ` [PATCH 1/2] userfaultfd: do not block on locking a large folio with raised refcount Suren Baghdasaryan
2025-02-26 18:55 ` [PATCH 2/2] userfaultfd: fix PTE unmapping stack-allocated PTE copies Suren Baghdasaryan
0 siblings, 2 replies; 5+ messages in thread
From: Suren Baghdasaryan @ 2025-02-26 18:55 UTC (permalink / raw)
To: akpm
Cc: lokeshgidra, aarcange, 21cnbao, v-songbaohua, david, peterx,
willy, Liam.Howlett, lorenzo.stoakes, hughd, jannh, kaleshsingh,
surenb, linux-mm, linux-kernel, stable
Patchset bundles two *unrelated* fixes in move_pages_pte because otherwise
they would create a merge conflict. The first fix which was posted before
at [1] fixes a livelock issue. The second change corrects the use of PTEs
when unmapping them.
The patchset applies cleanly over mm-hotfixes-unstable which contains
Barry's fix [2] that changes related code.
[1] https://lore.kernel.org/all/20250225204613.2316092-1-surenb@google.com/
[2] https://lore.kernel.org/all/20250226003234.0B98FC4CEDD@smtp.kernel.org/
Suren Baghdasaryan (2):
userfaultfd: do not block on locking a large folio with raised
refcount
userfaultfd: fix PTE unmapping stack-allocated PTE copies
mm/userfaultfd.c | 37 ++++++++++++++++++++++++++-----------
1 file changed, 26 insertions(+), 11 deletions(-)
base-commit: a88b5ef577dd7ddb8606ef233c0634f05e884d4a
--
2.48.1.658.g4767266eb4-goog
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH 1/2] userfaultfd: do not block on locking a large folio with raised refcount 2025-02-26 18:55 [PATCH 0/2] move_pages_pte() fixes Suren Baghdasaryan @ 2025-02-26 18:55 ` Suren Baghdasaryan 2025-02-26 20:23 ` Liam R. Howlett 2025-02-26 18:55 ` [PATCH 2/2] userfaultfd: fix PTE unmapping stack-allocated PTE copies Suren Baghdasaryan 1 sibling, 1 reply; 5+ messages in thread From: Suren Baghdasaryan @ 2025-02-26 18:55 UTC (permalink / raw) To: akpm Cc: lokeshgidra, aarcange, 21cnbao, v-songbaohua, david, peterx, willy, Liam.Howlett, lorenzo.stoakes, hughd, jannh, kaleshsingh, surenb, linux-mm, linux-kernel, stable Lokesh recently raised an issue about UFFDIO_MOVE getting into a deadlock state when it goes into split_folio() with raised folio refcount. split_folio() expects the reference count to be exactly mapcount + num_pages_in_folio + 1 (see can_split_folio()) and fails with EAGAIN otherwise. If multiple processes are trying to move the same large folio, they raise the refcount (all tasks succeed in that) then one of them succeeds in locking the folio, while others will block in folio_lock() while keeping the refcount raised. The winner of this race will proceed with calling split_folio() and will fail returning EAGAIN to the caller and unlocking the folio. The next competing process will get the folio locked and will go through the same flow. In the meantime the original winner will be retried and will block in folio_lock(), getting into the queue of waiting processes only to repeat the same path. All this results in a livelock. An easy fix would be to avoid waiting for the folio lock while holding folio refcount, similar to madvise_free_huge_pmd() where folio lock is acquired before raising the folio refcount. Since we lock and take a refcount of the folio while holding the PTE lock, changing the order of these operations should not break anything. Modify move_pages_pte() to try locking the folio first and if that fails and the folio is large then return EAGAIN without touching the folio refcount. If the folio is single-page then split_folio() is not called, so we don't have this issue. Lokesh has a reproducer [1] and I verified that this change fixes the issue. [1] https://github.com/lokeshgidra/uffd_move_ioctl_deadlock Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI") Reported-by: Lokesh Gidra <lokeshgidra@google.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: stable@vger.kernel.org --- Note this patch is v2 of [2] but I did not bump up the version because now it's part of the patchset which is at its v1. Hopefully that's not too confusing. Changes since v1 [2]: - Rebased over mm-hotfixes-unstable to avoid merge conflicts with [3] - Added Reviewed-by, per Peter Xu - Added a note about PTL lock in the changelog, per Liam R. Howlett - CC'ed stable [2] https://lore.kernel.org/all/20250225204613.2316092-1-surenb@google.com/ [3] https://lore.kernel.org/all/20250226003234.0B98FC4CEDD@smtp.kernel.org/ mm/userfaultfd.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 8eae4ea3cafd..e0f1e38ac5d8 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -1250,6 +1250,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, */ if (!src_folio) { struct folio *folio; + bool locked; /* * Pin the page while holding the lock to be sure the @@ -1269,12 +1270,26 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, goto out; } + locked = folio_trylock(folio); + /* + * We avoid waiting for folio lock with a raised refcount + * for large folios because extra refcounts will result in + * split_folio() failing later and retrying. If multiple + * tasks are trying to move a large folio we can end + * livelocking. + */ + if (!locked && folio_test_large(folio)) { + spin_unlock(src_ptl); + err = -EAGAIN; + goto out; + } + folio_get(folio); src_folio = folio; src_folio_pte = orig_src_pte; spin_unlock(src_ptl); - if (!folio_trylock(src_folio)) { + if (!locked) { pte_unmap(&orig_src_pte); pte_unmap(&orig_dst_pte); src_pte = dst_pte = NULL; -- 2.48.1.658.g4767266eb4-goog ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] userfaultfd: do not block on locking a large folio with raised refcount 2025-02-26 18:55 ` [PATCH 1/2] userfaultfd: do not block on locking a large folio with raised refcount Suren Baghdasaryan @ 2025-02-26 20:23 ` Liam R. Howlett 0 siblings, 0 replies; 5+ messages in thread From: Liam R. Howlett @ 2025-02-26 20:23 UTC (permalink / raw) To: Suren Baghdasaryan Cc: akpm, lokeshgidra, aarcange, 21cnbao, v-songbaohua, david, peterx, willy, lorenzo.stoakes, hughd, jannh, kaleshsingh, linux-mm, linux-kernel, stable * Suren Baghdasaryan <surenb@google.com> [250226 13:55]: > Lokesh recently raised an issue about UFFDIO_MOVE getting into a deadlock > state when it goes into split_folio() with raised folio refcount. > split_folio() expects the reference count to be exactly > mapcount + num_pages_in_folio + 1 (see can_split_folio()) and fails with > EAGAIN otherwise. If multiple processes are trying to move the same > large folio, they raise the refcount (all tasks succeed in that) then > one of them succeeds in locking the folio, while others will block in > folio_lock() while keeping the refcount raised. The winner of this > race will proceed with calling split_folio() and will fail returning > EAGAIN to the caller and unlocking the folio. The next competing process > will get the folio locked and will go through the same flow. In the > meantime the original winner will be retried and will block in > folio_lock(), getting into the queue of waiting processes only to repeat > the same path. All this results in a livelock. > An easy fix would be to avoid waiting for the folio lock while holding > folio refcount, similar to madvise_free_huge_pmd() where folio lock is > acquired before raising the folio refcount. Since we lock and take a > refcount of the folio while holding the PTE lock, changing the order of > these operations should not break anything. > Modify move_pages_pte() to try locking the folio first and if that fails > and the folio is large then return EAGAIN without touching the folio > refcount. If the folio is single-page then split_folio() is not called, > so we don't have this issue. > Lokesh has a reproducer [1] and I verified that this change fixes the > issue. > > [1] https://github.com/lokeshgidra/uffd_move_ioctl_deadlock > > Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI") > Reported-by: Lokesh Gidra <lokeshgidra@google.com> > Signed-off-by: Suren Baghdasaryan <surenb@google.com> > Reviewed-by: Peter Xu <peterx@redhat.com> > Cc: stable@vger.kernel.org Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com> > --- > Note this patch is v2 of [2] but I did not bump up the version because now > it's part of the patchset which is at its v1. Hopefully that's not too > confusing. > > Changes since v1 [2]: > - Rebased over mm-hotfixes-unstable to avoid merge conflicts with [3] > - Added Reviewed-by, per Peter Xu > - Added a note about PTL lock in the changelog, per Liam R. Howlett > - CC'ed stable > > [2] https://lore.kernel.org/all/20250225204613.2316092-1-surenb@google.com/ > [3] https://lore.kernel.org/all/20250226003234.0B98FC4CEDD@smtp.kernel.org/ > > mm/userfaultfd.c | 17 ++++++++++++++++- > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > index 8eae4ea3cafd..e0f1e38ac5d8 100644 > --- a/mm/userfaultfd.c > +++ b/mm/userfaultfd.c > @@ -1250,6 +1250,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, > */ > if (!src_folio) { > struct folio *folio; > + bool locked; > > /* > * Pin the page while holding the lock to be sure the > @@ -1269,12 +1270,26 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, > goto out; > } > > + locked = folio_trylock(folio); > + /* > + * We avoid waiting for folio lock with a raised refcount > + * for large folios because extra refcounts will result in > + * split_folio() failing later and retrying. If multiple > + * tasks are trying to move a large folio we can end > + * livelocking. > + */ > + if (!locked && folio_test_large(folio)) { > + spin_unlock(src_ptl); > + err = -EAGAIN; > + goto out; > + } > + > folio_get(folio); > src_folio = folio; > src_folio_pte = orig_src_pte; > spin_unlock(src_ptl); > > - if (!folio_trylock(src_folio)) { > + if (!locked) { > pte_unmap(&orig_src_pte); > pte_unmap(&orig_dst_pte); > src_pte = dst_pte = NULL; > -- > 2.48.1.658.g4767266eb4-goog > ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 2/2] userfaultfd: fix PTE unmapping stack-allocated PTE copies 2025-02-26 18:55 [PATCH 0/2] move_pages_pte() fixes Suren Baghdasaryan 2025-02-26 18:55 ` [PATCH 1/2] userfaultfd: do not block on locking a large folio with raised refcount Suren Baghdasaryan @ 2025-02-26 18:55 ` Suren Baghdasaryan 2025-02-26 20:43 ` Peter Xu 1 sibling, 1 reply; 5+ messages in thread From: Suren Baghdasaryan @ 2025-02-26 18:55 UTC (permalink / raw) To: akpm Cc: lokeshgidra, aarcange, 21cnbao, v-songbaohua, david, peterx, willy, Liam.Howlett, lorenzo.stoakes, hughd, jannh, kaleshsingh, surenb, linux-mm, linux-kernel, stable Current implementation of move_pages_pte() copies source and destination PTEs in order to detect concurrent changes to PTEs involved in the move. However these copies are also used to unmap the PTEs, which will fail if CONFIG_HIGHPTE is enabled because the copies are allocated on the stack. Fix this by using the actual PTEs which were kmap()ed. Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI") Reported-by: Peter Xu <peterx@redhat.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: stable@vger.kernel.org --- mm/userfaultfd.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index e0f1e38ac5d8..dda1c9a3662a 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -1290,8 +1290,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, spin_unlock(src_ptl); if (!locked) { - pte_unmap(&orig_src_pte); - pte_unmap(&orig_dst_pte); + pte_unmap(src_pte); + pte_unmap(dst_pte); src_pte = dst_pte = NULL; /* now we can block and wait */ folio_lock(src_folio); @@ -1307,8 +1307,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, /* at this point we have src_folio locked */ if (folio_test_large(src_folio)) { /* split_folio() can block */ - pte_unmap(&orig_src_pte); - pte_unmap(&orig_dst_pte); + pte_unmap(src_pte); + pte_unmap(dst_pte); src_pte = dst_pte = NULL; err = split_folio(src_folio); if (err) @@ -1333,8 +1333,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, goto out; } if (!anon_vma_trylock_write(src_anon_vma)) { - pte_unmap(&orig_src_pte); - pte_unmap(&orig_dst_pte); + pte_unmap(src_pte); + pte_unmap(dst_pte); src_pte = dst_pte = NULL; /* now we can block and wait */ anon_vma_lock_write(src_anon_vma); @@ -1352,8 +1352,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, entry = pte_to_swp_entry(orig_src_pte); if (non_swap_entry(entry)) { if (is_migration_entry(entry)) { - pte_unmap(&orig_src_pte); - pte_unmap(&orig_dst_pte); + pte_unmap(src_pte); + pte_unmap(dst_pte); src_pte = dst_pte = NULL; migration_entry_wait(mm, src_pmd, src_addr); err = -EAGAIN; @@ -1396,8 +1396,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, src_folio = folio; src_folio_pte = orig_src_pte; if (!folio_trylock(src_folio)) { - pte_unmap(&orig_src_pte); - pte_unmap(&orig_dst_pte); + pte_unmap(src_pte); + pte_unmap(dst_pte); src_pte = dst_pte = NULL; /* now we can block and wait */ folio_lock(src_folio); -- 2.48.1.658.g4767266eb4-goog ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 2/2] userfaultfd: fix PTE unmapping stack-allocated PTE copies 2025-02-26 18:55 ` [PATCH 2/2] userfaultfd: fix PTE unmapping stack-allocated PTE copies Suren Baghdasaryan @ 2025-02-26 20:43 ` Peter Xu 0 siblings, 0 replies; 5+ messages in thread From: Peter Xu @ 2025-02-26 20:43 UTC (permalink / raw) To: Suren Baghdasaryan Cc: akpm, lokeshgidra, aarcange, 21cnbao, v-songbaohua, david, willy, Liam.Howlett, lorenzo.stoakes, hughd, jannh, kaleshsingh, linux-mm, linux-kernel, stable On Wed, Feb 26, 2025 at 10:55:09AM -0800, Suren Baghdasaryan wrote: > Current implementation of move_pages_pte() copies source and destination > PTEs in order to detect concurrent changes to PTEs involved in the move. > However these copies are also used to unmap the PTEs, which will fail if > CONFIG_HIGHPTE is enabled because the copies are allocated on the stack. > Fix this by using the actual PTEs which were kmap()ed. > > Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI") > Reported-by: Peter Xu <peterx@redhat.com> > Signed-off-by: Suren Baghdasaryan <surenb@google.com> > Cc: stable@vger.kernel.org Reviewed-by: Peter Xu <peterx@redhat.com> -- Peter Xu ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-02-26 20:43 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2025-02-26 18:55 [PATCH 0/2] move_pages_pte() fixes Suren Baghdasaryan 2025-02-26 18:55 ` [PATCH 1/2] userfaultfd: do not block on locking a large folio with raised refcount Suren Baghdasaryan 2025-02-26 20:23 ` Liam R. Howlett 2025-02-26 18:55 ` [PATCH 2/2] userfaultfd: fix PTE unmapping stack-allocated PTE copies Suren Baghdasaryan 2025-02-26 20:43 ` Peter Xu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox