From: Lokesh Gidra
Date: Tue, 25 Feb 2025 12:56:22 -0800
Subject: Re: [PATCH 1/1] userfaultfd: do not block on locking a large folio with raised refcount
To: Suren Baghdasaryan
Cc: akpm@linux-foundation.org, aarcange@redhat.com, 21cnbao@gmail.com, v-songbaohua@oppo.com, david@redhat.com, peterx@redhat.com, willy@infradead.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, hughd@google.com, jannh@google.com, kaleshsingh@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20250225204613.2316092-1-surenb@google.com>
In-Reply-To: <20250225204613.2316092-1-surenb@google.com>

On Tue, Feb 25, 2025 at 12:46 PM Suren Baghdasaryan wrote:
>
> Lokesh recently raised an issue about UFFDIO_MOVE getting into a deadlock
> state when it goes into split_folio() with raised folio refcount.
> split_folio() expects the reference count to be exactly
> mapcount + num_pages_in_folio + 1 (see can_split_folio()) and fails with
> EAGAIN otherwise. If multiple processes are trying to move the same
> large folio, they raise the refcount (all tasks succeed in that) then
> one of them succeeds in locking the folio, while others will block in
> folio_lock() while keeping the refcount raised. The winner of this
> race will proceed with calling split_folio() and will fail returning
> EAGAIN to the caller and unlocking the folio. The next competing process
> will get the folio locked and will go through the same flow. In the
> meantime the original winner will be retried and will block in
> folio_lock(), getting into the queue of waiting processes only to repeat
> the same path. All this results in a livelock.
> An easy fix would be to avoid waiting for the folio lock while holding
> folio refcount, similar to madvise_free_huge_pmd() where folio lock is
> acquired before raising the folio refcount.
> Modify move_pages_pte() to try locking the folio first and if that fails
> and the folio is large then return EAGAIN without touching the folio
> refcount. If the folio is single-page then split_folio() is not called,
> so we don't have this issue.
> Lokesh has a reproducer [1] and I verified that this change fixes the
> issue.
>
> [1] https://github.com/lokeshgidra/uffd_move_ioctl_deadlock
>
Thanks so much for fixing this, Suren.

> Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
> Reported-by: Lokesh Gidra
> Signed-off-by: Suren Baghdasaryan
> ---
>  mm/userfaultfd.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 867898c4e30b..f17f8290c523 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -1236,6 +1236,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>                  */
>                 if (!src_folio) {
>                         struct folio *folio;
> +                       bool locked;
>
>                         /*
>                          * Pin the page while holding the lock to be sure the
> @@ -1255,12 +1256,26 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>                                 goto out;
>                         }
>
> +                       locked = folio_trylock(folio);
> +                       /*
> +                        * We avoid waiting for folio lock with a raised refcount
> +                        * for large folios because extra refcounts will result in
> +                        * split_folio() failing later and retrying. If multiple
> +                        * tasks are trying to move a large folio we can end
> +                        * livelocking.
> +                        */
> +                       if (!locked && folio_test_large(folio)) {
> +                               spin_unlock(src_ptl);
> +                               err = -EAGAIN;
> +                               goto out;
> +                       }
> +
>                         folio_get(folio);
>                         src_folio = folio;
>                         src_folio_pte = orig_src_pte;
>                         spin_unlock(src_ptl);
>
> -                       if (!folio_trylock(src_folio)) {
> +                       if (!locked) {
>                                 pte_unmap(&orig_src_pte);
>                                 pte_unmap(&orig_dst_pte);
>                                 src_pte = dst_pte = NULL;
>
> base-commit: 801d47bd96ce22acd43809bc09e004679f707c39
> --
> 2.48.1.658.g4767266eb4-goog
>
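
For anyone following the thread, here is a rough userspace sketch of why the ordering matters. It is not kernel code: toy_folio, toy_trylock(), toy_unlock(), toy_split(), old_order(), new_acquire() and new_order() are made-up names, only large folios are modeled, and the sleeping waiters are collapsed into a single pass. So it shows one round of the retry loop, not the livelock itself.

```c
/*
 * Userspace toy model of the refcount/lock ordering; NOT kernel code.
 * Every toy_* name is hypothetical and only mirrors the idea in the patch.
 * Only large folios are modeled.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_folio {
	int base_refs;	/* stands in for mapcount + num_pages_in_folio */
	int refcount;	/* base_refs plus extra references held by movers */
	bool locked;
};

static bool toy_trylock(struct toy_folio *f)
{
	if (f->locked)
		return false;
	f->locked = true;
	return true;
}

static void toy_unlock(struct toy_folio *f)
{
	assert(f->locked);
	f->locked = false;
}

/* Split succeeds only when the caller holds the single extra reference. */
static int toy_split(const struct toy_folio *f)
{
	return f->refcount == f->base_refs + 1 ? 0 : -EAGAIN;
}

/* Old ordering: every mover pins the folio first, then competes for the lock. */
static int old_order(struct toy_folio *f, int movers)
{
	bool locked;
	int err;

	f->refcount += movers;		/* all movers raise the refcount */
	locked = toy_trylock(f);	/* one wins, the rest would sleep */
	err = locked ? toy_split(f) : -EAGAIN;	/* split sees the waiters' refs */
	if (locked)
		toy_unlock(f);
	f->refcount -= movers;		/* model: everyone backs off to retry */
	return err;
}

/* New ordering, one mover: trylock first, pin only once the lock is held. */
static bool new_acquire(struct toy_folio *f)
{
	if (!toy_trylock(f))
		return false;	/* back off with -EAGAIN, refcount untouched */
	f->refcount++;
	return true;
}

/* New ordering, all movers: at most one gets through and can split. */
static int new_order(struct toy_folio *f, int movers)
{
	int err = -EAGAIN;
	int holders = 0;

	for (int i = 0; i < movers; i++)
		holders += new_acquire(f);
	if (holders == 1) {
		err = toy_split(f);
		f->refcount--;
		toy_unlock(f);
	}
	return err;
}
```

With base_refs = 5 and refcount = 5, old_order(&f, 3) returns -EAGAIN because the split sees three extra references, while new_order(&f, 3) returns 0 because the two losers never touch the refcount.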