Date: Wed, 5 Apr 2023 14:09:59 -0400
From: Peter Xu
To: Yang Shi
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Axel Rasmussen, Nadav Amit, David Hildenbrand, Andrew Morton, Andrea Arcangeli, Mike Rapoport, linux-stable
Subject: Re: [PATCH] mm/khugepaged: Check again on anon uffd-wp during isolation
References: <20230405155120.3608140-1-peterx@redhat.com>

On Wed, Apr 05, 2023 at 09:59:15AM -0700, Yang Shi wrote:
> On Wed, Apr 5, 2023 at 8:51 AM Peter Xu wrote:
> >
> > Khugepaged collapses an anonymous thp in two rounds of scans. The 2nd round
> > is done in __collapse_huge_page_isolate() after hpage_collapse_scan_pmd(),
> > and between the two all the locks are released temporarily. It means the
> > pgtable can change during this phase before the 2nd round starts.
> >
> > It's logically possible that some ptes got wr-protected during this phase,
> > and we can erroneously collapse a thp without noticing that some ptes are
> > wr-protected by userfault. e1e267c7928f wanted to avoid it but it only did
> > that for the 1st phase, not the 2nd phase.
> >
> > Since __collapse_huge_page_isolate() happens after a round of small page
> > swapins, we don't need to worry about any !present ptes - if any existed,
> > khugepaged would already have bailed out. So we only need to check present
> > ptes with the uffd-wp bit set there.
> >
> > This is something I found but never had a reproducer for. I thought it was
> > what caused a bug in Muhammad's recent pagemap new ioctl work, but it turns
> > out it's not the cause of that; that was a userspace bug. However this
> > still seems to be a real bug even with a very small race window, so it is
> > still worth fixing and copying stable.
>
> Yeah, I agree.
> But I got confused by userfaultfd_wp(vma) and
> pte_uffd_wp(pte). If a vma is armed with uffd wp, shall we skip the
> whole vma? If so, would it be better to just check the vma? We do
> revalidate the vma once reacquiring mmap_lock, so we should be able to
> bail out earlier.

Checking against the VMA is safe too. The difference is that the current
code still allows a thp to be collapsed as long as none of the pages is
explicitly protected within the thp range, even if the range is registered
with userfault-wp. That's also what e1e267c7928f does.

Here we have slightly different handling between anon and file thps (file
thps check against the vma flags), IMHO mostly because we don't scan
pgtables when deciding whether to collapse a shmem thp, so we kept it simple
by checking the vma flags. We could make it the same as anon, but it might
be overkill to scan the entries just for uffd-wp purposes. For anon we
always scan the pgtable anyway, so it's easier to make a more accurate
decision.

Thanks,

-- 
Peter Xu
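
To make the difference between the two checks concrete, here is a minimal
userspace sketch. It is not kernel code: the toy_* names, the flag values and
the flat array standing in for a page table are all assumptions made for
illustration. It only models the two policies discussed above: bailing out on
the vma registration versus bailing out only when a present pte actually
carries the uffd-wp bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for a toy pte; not the kernel's pte encoding. */
#define TOY_PTE_PRESENT (1u << 0)
#define TOY_PTE_UFFD_WP (1u << 1)

/* Hypothetical stand-in for a vma; models only userfaultfd_wp(vma). */
struct toy_vma {
	bool uffd_wp_registered;
};

/*
 * Coarse policy (what the thread describes for file thps): refuse to
 * collapse anything inside a range registered with userfault-wp.
 */
static bool toy_can_collapse_by_vma(const struct toy_vma *vma)
{
	assert(vma != NULL);
	return !vma->uffd_wp_registered;
}

/*
 * Fine-grained policy (what the thread describes for anon thps): refuse
 * only if some present pte in the range has the uffd-wp bit set.
 * Non-present entries are ignored here, mirroring the argument that
 * they were already handled before isolation.
 */
static bool toy_can_collapse_by_pte(const unsigned int *ptes, size_t n)
{
	assert(ptes != NULL);
	for (size_t i = 0; i < n; i++) {
		if ((ptes[i] & TOY_PTE_PRESENT) && (ptes[i] & TOY_PTE_UFFD_WP))
			return false;
	}
	return true;
}
```

With a registered range in which no page has been write-protected yet, the
vma-level check refuses the collapse while the pte-level check allows it.
Once any present pte in the range gets the uffd-wp bit, both refuse.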