Date: Tue, 21 Mar 2023 15:50:57 -0400
From: Peter Xu
To: David Hildenbrand
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrea Arcangeli,
	Andrew Morton, Axel Rasmussen, Mike Rapoport, Nadav Amit,
	Muhammad Usama Anjum, linux-stable
Subject: Re: [PATCH] mm/hugetlb: Fix uffd wr-protection for CoW optimization path
References: <20230321191840.1897940-1-peterx@redhat.com>
	<44aae7fc-fb1f-b38e-bc17-504abf054e3f@redhat.com>
In-Reply-To: <44aae7fc-fb1f-b38e-bc17-504abf054e3f@redhat.com>

On Tue, Mar 21, 2023 at 08:36:35PM +0100, David Hildenbrand wrote:
> On 21.03.23 20:18, Peter Xu wrote:
> > This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be
> > writable even with uffd-wp bit set.
> > It only happens with all these
> > conditions met: (1) hugetlb memory (2) private mapping (3) original mapping
> > was missing, then (4) being wr-protected (IOW, pte marker installed).  Then
> > write to the page to trigger.
> >
> > Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before
> > even reaching hugetlb_wp() to avoid taking more locks that userfault won't
> > need.  However there's one CoW optimization path for missing hugetlb page
> > that can trigger hugetlb_wp() inside hugetlb_no_page(), that can bypass the
> > userfaultfd-wp traps.
> >
> > A few ways to resolve this:
> >
> >    (1) Skip the CoW optimization for hugetlb private mapping, considering
> >    that private mappings for hugetlb should be very rare, so it may not
> >    really be helpful to major workloads.  The worst case is we only skip the
> >    optimization if userfaultfd_wp(vma)==true, because uffd-wp needs another
> >    fault anyway.
> >
> >    (2) Move the userfaultfd-wp handling for hugetlb from hugetlb_fault()
> >    into hugetlb_wp().  The major cons is there're a bunch of locks taken
> >    when calling hugetlb_wp(), and that will make the changeset unnecessarily
> >    complicated due to the lock operations.
> >
> >    (3) Carry over uffd-wp bit in hugetlb_wp(), so it'll need to fault again
> >    for uffd-wp privately mapped pages.
> >
> > This patch chose option (3) which contains the minimum changeset (simplest
> > for backport) and also make sure hugetlb_wp() itself will start to be
> > always safe with uffd-wp ptes even if called elsewhere in the future.
> >
> > This patch will be needed for v5.19+ hence copy stable.
> >
> > Reported-by: Muhammad Usama Anjum
> > Cc: linux-stable
> > Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
> > Signed-off-by: Peter Xu
> > ---
> >  mm/hugetlb.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 8bfd07f4c143..22337b191eae 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
> >  		       struct folio *pagecache_folio, spinlock_t *ptl)
> >  {
> >  	const bool unshare = flags & FAULT_FLAG_UNSHARE;
> > -	pte_t pte;
> > +	pte_t pte, newpte;
> >  	struct hstate *h = hstate_vma(vma);
> >  	struct page *old_page;
> >  	struct folio *new_folio;
> > @@ -5622,8 +5622,10 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
> >  		mmu_notifier_invalidate_range(mm, range.start, range.end);
> >  		page_remove_rmap(old_page, vma, true);
> >  		hugepage_add_new_anon_rmap(new_folio, vma, haddr);
> > -		set_huge_pte_at(mm, haddr, ptep,
> > -				make_huge_pte(vma, &new_folio->page, !unshare));
> > +		newpte = make_huge_pte(vma, &new_folio->page, !unshare);
> > +		if (huge_pte_uffd_wp(pte))
> > +			newpte = huge_pte_mkuffd_wp(newpte);
> > +		set_huge_pte_at(mm, haddr, ptep, newpte);
> >  		folio_set_hugetlb_migratable(new_folio);
> >  		/* Make the old page be freed below */
> >  		new_folio = page_folio(old_page);
>
> Looks correct to me. Do we have a reproducer?

I used a reproducer for the async mode I wrote (patch 2 attached, need to
change to VM_PRIVATE):

https://lore.kernel.org/all/ZBNr4nohj%2FTw4Zhw@x1n/

I don't think the kernel kselftests can trigger it, because we don't do
strict checks on the uffd-wp bits yet.  I've already started looking into
cleaning up the test cases, and I do plan to add new tests to cover this.

Meanwhile, let's also wait for an ack from Muhammad.
Even though the async mode is not part of the code base, it'll be a good
test for verifying every single uffd-wp bit being set or cleared as
expected.

> Acked-by: David Hildenbrand

Thanks,

-- 
Peter Xu
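
For readers following the thread, the effect of option (3) from the quoted
commit message can be illustrated with a small userspace model.  This is only
a sketch under stated assumptions, not kernel code: every toy_* name and both
bit positions are hypothetical, and the model assumes that marking an entry
uffd-wp also drops write permission, in line with the commit message's note
that such pages need to fault again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy page table entry. Bit positions are hypothetical, not any real layout. */
typedef uint64_t toy_pte_t;
#define TOY_PTE_WRITE   (1ULL << 1)
#define TOY_PTE_UFFD_WP (1ULL << 10)

static bool toy_pte_write(toy_pte_t pte)   { return (pte & TOY_PTE_WRITE) != 0; }
static bool toy_pte_uffd_wp(toy_pte_t pte) { return (pte & TOY_PTE_UFFD_WP) != 0; }

/* Stand-in for make_huge_pte(): a fresh entry, writable when requested. */
static toy_pte_t toy_make_pte(bool writable)
{
	return writable ? TOY_PTE_WRITE : 0;
}

/*
 * Stand-in for huge_pte_mkuffd_wp(). Assumption of this model: setting the
 * uffd-wp bit also clears write permission, so the next write faults again.
 */
static toy_pte_t toy_pte_mkuffd_wp(toy_pte_t pte)
{
	return (pte | TOY_PTE_UFFD_WP) & ~TOY_PTE_WRITE;
}

/* Pre-patch shape: the new entry is built without looking at the old one. */
static toy_pte_t toy_cow_unpatched(toy_pte_t old, bool unshare)
{
	(void)old;
	return toy_make_pte(!unshare);
}

/* Patched shape: carry the uffd-wp bit from the old entry to the new one. */
static toy_pte_t toy_cow_patched(toy_pte_t old, bool unshare)
{
	toy_pte_t newpte = toy_make_pte(!unshare);

	if (toy_pte_uffd_wp(old))
		newpte = toy_pte_mkuffd_wp(newpte);
	return newpte;
}
```

In this model, an old entry carrying only the uffd-wp bit comes out of
toy_cow_unpatched() writable with the bit lost, while toy_cow_patched()
keeps the bit and leaves the entry non-writable.  Entries without the bit
are unaffected by the change.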