References: <20230405155120.3608140-1-peterx@redhat.com>
In-Reply-To: <20230405155120.3608140-1-peterx@redhat.com>
From: Yang Shi
Date: Wed, 5 Apr 2023 09:59:15 -0700
Subject: Re: [PATCH] mm/khugepaged: Check again on anon uffd-wp during isolation
To: Peter Xu
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Axel Rasmussen, Nadav Amit, David Hildenbrand, Andrew Morton, Andrea Arcangeli, Mike Rapoport, linux-stable

On Wed, Apr 5, 2023 at 8:51 AM Peter Xu wrote:
>
> Khugepaged collapse an anonymous thp in two rounds of scans.  The 2nd round
> done in __collapse_huge_page_isolate() after hpage_collapse_scan_pmd(),
> during which all the locks will be released temporarily.
> It means the pgtable can change during this phase before 2nd round starts.
>
> It's logically possible some ptes got wr-protected during this phase, and
> we can errornously collapse a thp without noticing some ptes are
> wr-protected by userfault.  e1e267c7928f wanted to avoid it but it only did
> that for the 1st phase, not the 2nd phase.
>
> Since __collapse_huge_page_isolate() happens after a round of small page
> swapins, we don't need to worry on any !present ptes - if it existed
> khugepaged will already bail out.  So we only need to check present ptes
> with uffd-wp bit set there.
>
> This is something I found only but never had a reproducer, I thought it was
> one caused a bug in Muhammad's recent pagemap new ioctl work, but it turns
> out it's not the cause of that but an userspace bug.  However this seems to
> still be a real bug even with a very small race window, still worth to have
> it fixed and copy stable.

Yeah, I agree. But I'm confused about userfaultfd_wp(vma) versus
pte_uffd_wp(pte). If a vma is armed with uffd-wp, should we skip the whole
vma? If so, would it be better to just check the vma? We do revalidate the
vma once we reacquire mmap_lock, so we should be able to bail out earlier.

>
> Cc: linux-stable
> Fixes: e1e267c7928f ("khugepaged: skip collapse if uffd-wp detected")
> Signed-off-by: Peter Xu
> ---
>  mm/khugepaged.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index a19aa140fd52..42ac93b4bd87 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -575,6 +575,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>                         result = SCAN_PTE_NON_PRESENT;
>                         goto out;
>                 }
> +               if (pte_uffd_wp(pteval)) {
> +                       result = SCAN_PTE_UFFD_WP;
> +                       goto out;
> +               }
>                 page = vm_normal_page(vma, address, pteval);
>                 if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
>                         result = SCAN_PAGE_NULL;
> --
> 2.39.1
>
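
To make the vma-vs-pte question concrete, here is a rough userspace toy model, not kernel code. It only contrasts the two policies being discussed: bailing out when any present pte carries the uffd-wp bit (what the patch does), versus bailing out as soon as the vma itself is armed for uffd-wp. Every toy_* name below is made up for illustration, TOY_SCAN_VMA_UFFD_WP is an invented result code, and the pte walk is heavily simplified (no max_ptes_none handling, no locking).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code: all toy_* names are hypothetical. */

enum toy_scan_result {
	TOY_SCAN_SUCCEED,
	TOY_SCAN_PTE_NON_PRESENT,
	TOY_SCAN_PTE_UFFD_WP,
	TOY_SCAN_VMA_UFFD_WP,	/* invented here for the per-vma policy */
};

struct toy_pte {
	bool present;
	bool uffd_wp;		/* stands in for the per-pte uffd-wp bit */
};

struct toy_vma {
	bool uffd_wp_armed;	/* stands in for the vma being registered for uffd-wp */
};

/* Per-pte policy: same order of checks as the hunk quoted above. */
static enum toy_scan_result
toy_isolate_per_pte(const struct toy_pte *ptes, size_t nr)
{
	assert(ptes != NULL);

	for (size_t i = 0; i < nr; i++) {
		if (!ptes[i].present)
			return TOY_SCAN_PTE_NON_PRESENT;
		if (ptes[i].uffd_wp)
			return TOY_SCAN_PTE_UFFD_WP;
	}
	return TOY_SCAN_SUCCEED;
}

/* Per-vma policy: refuse the whole range if the vma is armed at all. */
static enum toy_scan_result
toy_isolate_per_vma(const struct toy_vma *vma,
		    const struct toy_pte *ptes, size_t nr)
{
	assert(vma != NULL);

	if (vma->uffd_wp_armed)
		return TOY_SCAN_VMA_UFFD_WP;
	return toy_isolate_per_pte(ptes, nr);
}
```

In this model the two policies only disagree when the vma is armed but none of the scanned ptes is currently wr-protected: the per-pte check would still let the collapse go ahead, while the per-vma check would skip the range outright. Whether that extra strictness is desirable is exactly the open question.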