linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: lokeshgidra@google.com, aarcange@redhat.com, 21cnbao@gmail.com,
	 v-songbaohua@oppo.com, david@redhat.com, peterx@redhat.com,
	 willy@infradead.org, Liam.Howlett@oracle.com,
	lorenzo.stoakes@oracle.com,  hughd@google.com, jannh@google.com,
	kaleshsingh@google.com, surenb@google.com,  linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: [PATCH 1/2] userfaultfd: do not block on locking a large folio with raised refcount
Date: Wed, 26 Feb 2025 10:55:08 -0800	[thread overview]
Message-ID: <20250226185510.2732648-2-surenb@google.com> (raw)
In-Reply-To: <20250226185510.2732648-1-surenb@google.com>

Lokesh recently raised an issue about UFFDIO_MOVE getting into a deadlock
state when it goes into split_folio() with raised folio refcount.
split_folio() expects the reference count to be exactly
mapcount + num_pages_in_folio + 1 (see can_split_folio()) and fails with
EAGAIN otherwise. If multiple processes are trying to move the same
large folio, they raise the refcount (all tasks succeed in that) then
one of them succeeds in locking the folio, while others will block in
folio_lock() while keeping the refcount raised. The winner of this
race will proceed with calling split_folio() and will fail returning
EAGAIN to the caller and unlocking the folio. The next competing process
will get the folio locked and will go through the same flow. In the
meantime the original winner will be retried and will block in
folio_lock(), getting into the queue of waiting processes only to repeat
the same path. All this results in a livelock.
An easy fix would be to avoid waiting for the folio lock while holding
folio refcount, similar to madvise_free_huge_pmd() where folio lock is
acquired before raising the folio refcount. Since we lock and take a
refcount of the folio while holding the PTE lock, changing the order of
these operations should not break anything.
Modify move_pages_pte() to try locking the folio first and if that fails
and the folio is large then return EAGAIN without touching the folio
refcount. If the folio is single-page then split_folio() is not called,
so we don't have this issue.
Lokesh has a reproducer [1] and I verified that this change fixes the
issue.

[1] https://github.com/lokeshgidra/uffd_move_ioctl_deadlock

Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Reported-by: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
---
Note this patch is v2 of [2] but I did not bump up the version because now
it's part of the patchset which is at its v1. Hopefully that's not too
confusing.

Changes since v1 [2]:
- Rebased over mm-hotfixes-unstable to avoid merge conflicts with [3]
- Added Reviewed-by, per Peter Xu
- Added a note about PTL lock in the changelog, per Liam R. Howlett
- CC'ed stable

[2] https://lore.kernel.org/all/20250225204613.2316092-1-surenb@google.com/
[3] https://lore.kernel.org/all/20250226003234.0B98FC4CEDD@smtp.kernel.org/

 mm/userfaultfd.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 8eae4ea3cafd..e0f1e38ac5d8 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1250,6 +1250,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 		 */
 		if (!src_folio) {
 			struct folio *folio;
+			bool locked;
 
 			/*
 			 * Pin the page while holding the lock to be sure the
@@ -1269,12 +1270,26 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 				goto out;
 			}
 
+			locked = folio_trylock(folio);
+			/*
+			 * We avoid waiting for folio lock with a raised refcount
+			 * for large folios because extra refcounts will result in
+			 * split_folio() failing later and retrying. If multiple
+			 * tasks are trying to move a large folio we can end
+			 * livelocking.
+			 */
+			if (!locked && folio_test_large(folio)) {
+				spin_unlock(src_ptl);
+				err = -EAGAIN;
+				goto out;
+			}
+
 			folio_get(folio);
 			src_folio = folio;
 			src_folio_pte = orig_src_pte;
 			spin_unlock(src_ptl);
 
-			if (!folio_trylock(src_folio)) {
+			if (!locked) {
 				pte_unmap(&orig_src_pte);
 				pte_unmap(&orig_dst_pte);
 				src_pte = dst_pte = NULL;
-- 
2.48.1.658.g4767266eb4-goog



  reply	other threads:[~2025-02-26 18:55 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-26 18:55 [PATCH 0/2] move_pages_pte() fixes Suren Baghdasaryan
2025-02-26 18:55 ` Suren Baghdasaryan [this message]
2025-02-26 20:23   ` [PATCH 1/2] userfaultfd: do not block on locking a large folio with raised refcount Liam R. Howlett
2025-02-26 18:55 ` [PATCH 2/2] userfaultfd: fix PTE unmapping stack-allocated PTE copies Suren Baghdasaryan
2025-02-26 20:43   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250226185510.2732648-2-surenb@google.com \
    --to=surenb@google.com \
    --cc=21cnbao@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=kaleshsingh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lokeshgidra@google.com \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=peterx@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=v-songbaohua@oppo.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox