From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A6B43C52D7B for ; Tue, 13 Aug 2024 14:58:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3CCEB6B008C; Tue, 13 Aug 2024 10:58:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 37CC36B0092; Tue, 13 Aug 2024 10:58:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 244E76B0095; Tue, 13 Aug 2024 10:58:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 03F286B008C for ; Tue, 13 Aug 2024 10:58:30 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id A8CE91A0873 for ; Tue, 13 Aug 2024 14:58:30 +0000 (UTC) X-FDA: 82447528380.15.3C13D75 Received: from mail-ed1-f49.google.com (mail-ed1-f49.google.com [209.85.208.49]) by imf30.hostedemail.com (Postfix) with ESMTP id CD2B38002B for ; Tue, 13 Aug 2024 14:58:28 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=kfQ9zItK; spf=pass (imf30.hostedemail.com: domain of jannh@google.com designates 209.85.208.49 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1723561073; a=rsa-sha256; cv=none; b=GfFPncM4tk7XVo/2AgPK11HztVzfP/eidhc3eX65/niiXVO0wlU15yyGIZE04jt9Emfc9P KrgtqIb1jEpfjv/h3eppxtlNAAkCtB9PdZUslcqy6Ai7XWHBQ2dI1JQDPTyRO1knVqAWIM 4tklTOWTnTx+cREyqHTS8VNUplqPYb8= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=kfQ9zItK; spf=pass (imf30.hostedemail.com: domain of jannh@google.com designates 209.85.208.49 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1723561073; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Px8iqdxxWIK7rvwP9ZrwWvLe40dNsz+A9PyUAYGuJmc=; b=pzV7RT33fdfaT7ofdWxS8Dqp5B7O6JODVdhdfNZTHD622lCHweu5Qh4J6XIyp6+H3hYi0g t8Qa2BeipDaBVedUaTw3XDQ98P7BIzCSpRDQYu7vZngcoaMFEWVNGRW0yJfGJDR/xnPYhu B1cEfkXGc6XnHqa1E9Ihwjy+zYE+czw= Received: by mail-ed1-f49.google.com with SMTP id 4fb4d7f45d1cf-5a28b61b880so7333a12.1 for ; Tue, 13 Aug 2024 07:58:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1723561107; x=1724165907; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Px8iqdxxWIK7rvwP9ZrwWvLe40dNsz+A9PyUAYGuJmc=; b=kfQ9zItKUYR93CmFgJ9M7iDg1m/kIBamMaBH4wwHUW+9uGYzhgEp2vvmvPf45MX/Nk oKlkUtxTjw3ZWhN9mPTpmF0lqpoBfbeAQcRGz2faVNMCjRzfWvdfqzvq833munhusrY+ 43CTdj+c3uuQmhU1XkCmWRmp7qdNx7nLk7dUcAsdUviZEMUB07/TG2NCk/QOrIC9je7C rIlT8MhlOfa/ZT7kanlLzQl6Kv4Vwr+voueMfZnX+uIa/+S4/NLbjn1XNjEx8eXNDalS +ncmT2NUqQwDhT+tmxSugOdo1d3NfmxSE0MfjSZaHeFxRg1UbfupQCcO9jlFj4PXFhI6 7+mQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723561107; x=1724165907; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Px8iqdxxWIK7rvwP9ZrwWvLe40dNsz+A9PyUAYGuJmc=; b=ghpsjzhAqAqSK+q/4+azSt/YNvvN1PbBc+s3AYedWSCrEH8qvSjttTZF/bKuSYYcaY gpbrz+s5hit0XpaFQ4SHYLiAxGQKP2mdrngwvXq7DXR/pVXxxbSNfODsX2ycafRJhPd5 mJZKkb0OLbwxKS4lH9JXy/jp/g/X6INIlJjZnDraaeiil8XeCLUnEfaPYD30zxG18mJ+ 8ADx6tPoCaQLq1CqElrf0/iE6b8LH+Ij45NNiDtFYwoYrRd9rXH+mDFFLOkq/7gU6qTi JlOKZryqSkPIQtx+rmpxkHLI01mamUiJwHi23WJ0IG4bYnbkZcQDwr6UBJiAMlbfojvj /vBQ== X-Forwarded-Encrypted: i=1; AJvYcCXLKUOaxBKn1LV3jWCYoWgHijqySPESzHYnOJpcfJCys9UuzArm5+paoLYij0ODTD9dl3cUQWZXCdlPs0gqAwmaW/s= X-Gm-Message-State: AOJu0YwWu4sGC0HrW/WDDzeP54wYEs9W7XIM9l4Zk8GxHgzCk+/0vxEM nBLf6NBdjmqg0EACw/+aAzu8XaiV/zVC/7EqpeTCb2a/1pQTNNb2sN1glE3UthVaTkNKvwkvah8 09jKcqxkPZOrPKwlMZYHX5aIUQckq4XbM2z0X X-Google-Smtp-Source: AGHT+IGOsd8SzOXCFhPWeyAenAK6WO3rwmgPcyiNCHk3I9lmR9JbamLzSgDR2g8gJ9VBCKK9WwmsBSAZFt+lugxKjpI= X-Received: by 2002:a05:6402:3493:b0:5b4:df4a:48bb with SMTP id 4fb4d7f45d1cf-5bd471c09a5mr98555a12.0.1723561106621; Tue, 13 Aug 2024 07:58:26 -0700 (PDT) MIME-Version: 1.0 References: <20240812-uffd-thp-flip-fix-v1-0-4fc1db7ccdd0@google.com> <20240812-uffd-thp-flip-fix-v1-1-4fc1db7ccdd0@google.com> <59bf3c2e-d58b-41af-ab10-3e631d802229@bytedance.com> In-Reply-To: <59bf3c2e-d58b-41af-ab10-3e631d802229@bytedance.com> From: Jann Horn Date: Tue, 13 Aug 2024 16:57:48 +0200 Message-ID: Subject: Re: [PATCH 1/2] userfaultfd: Fix pmd_trans_huge() recheck race To: Qi Zheng Cc: Andrew Morton , Pavel Emelianov , Andrea Arcangeli , Hugh Dickins , linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: y18q5qhc343ypjr585ezk4onkdou4oak X-Rspamd-Queue-Id: CD2B38002B X-Rspam-User: X-Rspamd-Server: rspam10 X-HE-Tag: 1723561108-527553 X-HE-Meta: U2FsdGVkX1/3a0pEcTUvSuD5fnf20OKK+SLkTSvNybuoPbjvlyFWA7mO1gDt9bLt6xUOxjPGc0f5RDeJYNRUFaVwunOAWsQREDfOHhE9LVMiUxNsBiHi52JUx3CeHdgSPddvSwUj1guC7sEr4Adt5ty23jQNp/P48fMc2VcaCO4VgyzKdIiFHNFbHEFyf3f5afiFbUW+2KC4nv+VFcOUABxxtuMrDjG54UtWxrREOuxOLJ+lhog2A9+OtkDvomSpgyjcilJ/lWfDgRM3xe0+tFzFcw7/SXq8YSHYpNgik3wprbK99OJr/U8E03Vo52/m04CUQpMEWInNRKcDVUOWNf7s+MTt0uTb4xgIGTsfqguJkaLKENumJDWr4D/dKBBylmJCXKLDnKzzu8wvGEBPbNxy2ZgLuTI6nIIWg9ozrxb2E1q0/GQHPrBJdIL7u6gH30+w3erAUcTWIAap5KOlQ+LPsWPos4z0iXpAeYtgdeGYkkDcphu+I80MDSUfpZBQnXUYMnoUAVrZbpNLEEeLkwDhEjcp/wT6mkBWVMj7XhqL35h4/2YPmOj0K1WQxKXiQloprhEpigNCNw5Ttm1xn4ntHuEaW4QWGDR414gut8ec9A7jHSNkMtjMP01lo5z2U7hdy/26CbvIwOm2RCumDw6lDXvmxT8dqHwZTSTSKKDe+AJBOzpf3GF3o56UvCS/dRw4K2avquIyDDwNLQV4ejh7SS45Lq9FsRZ7EHTYMBf1ndlIu+cydh8gIJDivTYfLKKGWaX3+YYgl24dsZrILnlE1JtQg9eGAqAPHLcSf7lDp6cyhfpJ+juPwqPS0Kp+T2QwrEyLbi7jan3dOQZy1uJW7z9dG4LHychDmsDUiEjHWp0o5cObZXtthczPHqW4eMs1697joS2CjtPKwB/E2pNlGw9dITxVtVgeOpvzo32aDdVySKHbclqJoQq4j/hkw17LkQG43h3cOhbGlAi 25Topiws yhuOnReFcUW2wL3Oj60tT0z9cHsUpcC1K2JrFUf1jGO+bksClaoc1BTz+JyNdFoSRQMu/4k4aBl7N1qdpXqlKaNPdC8wV69jleHADRHMZSpGzL3jusJ00eufrhCuBE2az4zPkPPStEPfzBM9A+MUIRNzTsdas0wcgEU5maLDrEndZB+hAsaCPd7vr9pdOurBkuCq94b+oaqtbL822luScs6xSp0rY3UR2OygG6nWb1mf6kYcmyhQj8cPZPjK5LGAw0hi3Hk+8XrazVeJoOClYnZETLNatP4RyNOoAD8OU2KtGrLjad37qVrDOn/3s+eW8GtI96s+m+fdbPivMDs1GO4ajxL74re9AroPC X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Aug 13, 2024 at 8:19=E2=80=AFAM Qi Zheng wrote: > > Hi Jann, > > On 2024/8/13 00:42, Jann Horn wrote: > > The following race can occur: > > > > mfill_atomic other thread > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D > > > > pmdp_get_lockless() [reads none pmd] > > > > > > > > __pte_alloc [no-op] > > > > > > BUG_ON(pmd_none(*dst_pmd)) > > > > I have experimentally verified this in a kernel with extra mdelay() cal= ls; > > the BUG_ON(pmd_none(*dst_pmd)) triggers. > > > > On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow > > pte_offset_map[_lock]() to fail"), this can't lead to anything worse th= an > > a BUG_ON(), since the page table access helpers are actually designed t= o > > deal with page tables concurrently disappearing; but on older kernels > > (<=3D6.4), I think we could probably theoretically race past the two BU= G_ON() > > checks and end up treating a hugepage as a page table. > > > > Cc: stable@vger.kernel.org > > Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_= COPY|UFFDIO_ZEROPAGE preparation") > > Signed-off-by: Jann Horn > > --- > > mm/userfaultfd.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > > index e54e5c8907fa..ec3750467aa5 100644 > > --- a/mm/userfaultfd.c > > +++ b/mm/userfaultfd.c > > @@ -801,7 +801,8 @@ static __always_inline ssize_t mfill_atomic(struct = userfaultfd_ctx *ctx, > > break; > > } > > /* If an huge pmd materialized from under us fail */ > > - if (unlikely(pmd_trans_huge(*dst_pmd))) { > > + dst_pmdval =3D pmdp_get_lockless(dst_pmd); > > + if (unlikely(pmd_none(dst_pmdval) || pmd_trans_huge(dst_p= mdval))) { > > Before commit 0d940a9b270b, should we also check for > is_pmd_migration_entry(), pmd_devmap() and pmd_bad() here? Oooh. I think you're right that this check is insufficient, thanks for spotting that. I think I should probably change the check to something like this? if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) || pmd_devmap(dst_pmdval) || pmd_bad(dst_pmdval))) { !pmd_present() implies !is_pmd_migration_entry(). And the pmd_bad() at the end shouldn't be necessary if everything is working right, I'm just tacking it on to be safe. I'll send a v2 with this change soon. (Alternatively, pmd_leaf() might be useful here, but then we'd have to figure an alternate way of doing this for the backport.)