From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02E6DC6FD20 for ; Fri, 24 Mar 2023 14:11:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6562E6B0072; Fri, 24 Mar 2023 10:11:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6062D6B0074; Fri, 24 Mar 2023 10:11:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4CF056B0075; Fri, 24 Mar 2023 10:11:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 3CACD6B0072 for ; Fri, 24 Mar 2023 10:11:40 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 113F91207B4 for ; Fri, 24 Mar 2023 14:11:40 +0000 (UTC) X-FDA: 80603979960.02.AA74FD9 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf01.hostedemail.com (Postfix) with ESMTP id 068764000A for ; Fri, 24 Mar 2023 14:11:36 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=XoNmC+Xq; spf=pass (imf01.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679667097; a=rsa-sha256; cv=none; b=PGPug2+pGH6IMg8LHO7Ya4jRYVMxkyPmPMxTht75zeOP/80/mW/1U30PG8VRTyK7tLW9zY AsVPnt6JvmC7FH6tRXte20buetaAU3Z5t1yG7/r6rFCYx0dGvN6yJbrTJbl/GxmH3mCxS5 ifpZKc387FfddlTjgHk4gGemOonAp+E= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=XoNmC+Xq; spf=pass (imf01.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679667097; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=67+EwHu+QToRo/sF9RBeGijI61E5QqD34PaUertJaVA=; b=4J64+ZkZs94L5uvMT9ZknRMdSqYqnXgpwVlfLzmr2uPL+jfGuzZvDsPpMiInNGXfrxkURp elcP01ei/uPSwFCBL9WGkpzHsd2YqC5M7UpOT1AxlR5hlYFowed8CWmSGdbMfw5Px/B8sS kFMV3qUmzKxLKRxHlCFmFSIeKclDry8= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1679667096; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=67+EwHu+QToRo/sF9RBeGijI61E5QqD34PaUertJaVA=; b=XoNmC+Xqq28yp/NgZmy1ENWY1Rtp9Z5OwKAUcXt1fX03xUjVFcsDoAOufd8vpxy7tA9AoT 4PXwUvqagsGtte9RO/fNjVDkkGTESF4Mmyf1EheVqqb/hqAuoRmaOu+7Y4/x37uj+Xlr1A tksDRqYlTJjSqWvtLCA+C3lpsA2Rlc8= Received: from mail-qv1-f69.google.com (mail-qv1-f69.google.com [209.85.219.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-338-8gxXmKG3MQuVUzLrrFEDNA-1; Fri, 24 Mar 2023 10:11:34 -0400 X-MC-Unique: 8gxXmKG3MQuVUzLrrFEDNA-1 Received: by mail-qv1-f69.google.com with SMTP id a10-20020a0ccdca000000b005d70160fbb0so1042072qvn.21 for ; Fri, 24 Mar 2023 07:11:34 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679667094; x=1682259094; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=67+EwHu+QToRo/sF9RBeGijI61E5QqD34PaUertJaVA=; b=Q7vt3dc1S21LwwPJ5JA0OhQ83H3jOSc1moOqSAQftk9bPffT7VITPug/e5daDd5HIe MWHVczry5pplT/Ie8fV2nrnsiymEIoZMx+zaE6vl+o/RK4CGn5XbI8CzIBpfmm5sxo5j skmwGdH1WmjOdUBUe/bVpfBeCeqqqvuPTsXFwzpSn6QL1Z28H+wFDKak0BmPHlB3PfBN ey3uJ7fkfeLGn3b3FdYYTdZIg7pylD6CG25ZvOXqc+kw8qdRWiWGotUuOyDMuJOQTQLH esOM6joDGnMaB83/677SCi2VhMXrOUhH0CGegAsAHdpJnsQteRcrGCZ6hE2tVzL8rL1c waqQ== X-Gm-Message-State: AO0yUKVcP+MV8F86YPluXd64IRct59gEjxbcN4pr7u0cQpD1b8O6Qilp bdawWbXsrlEszUtgS6CRBNPkQ8yv4ibRX1njwQW1edhmtp6BZ/sbUlPJk+iuV6uGVyv8aZspaEP comwmRMbvrdk= X-Received: by 2002:a05:622a:1829:b0:3e3:8f8c:b92f with SMTP id t41-20020a05622a182900b003e38f8cb92fmr5523902qtc.5.1679667094004; Fri, 24 Mar 2023 07:11:34 -0700 (PDT) X-Google-Smtp-Source: AK7set8y3AMK7uJVBJ0ROFsgRE9C0G+cSC8bK88uYmM8vWRjvzsRIBReBY87Ayov21XFeK4yUFjTSg== X-Received: by 2002:a05:622a:1829:b0:3e3:8f8c:b92f with SMTP id t41-20020a05622a182900b003e38f8cb92fmr5523859qtc.5.1679667093676; Fri, 24 Mar 2023 07:11:33 -0700 (PDT) Received: from x1n (bras-base-aurron9127w-grc-40-70-52-229-124.dsl.bell.ca. [70.52.229.124]) by smtp.gmail.com with ESMTPSA id c9-20020ac80549000000b003d8d1ec2672sm12584782qth.89.2023.03.24.07.11.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Mar 2023 07:11:33 -0700 (PDT) Date: Fri, 24 Mar 2023 10:11:31 -0400 From: Peter Xu To: David Hildenbrand Cc: Muhammad Usama Anjum , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Mike Rapoport , Nadav Amit , linux-stable Subject: Re: [PATCH] mm/hugetlb: Fix uffd wr-protection for CoW optimization path Message-ID: References: <20230321191840.1897940-1-peterx@redhat.com> <44aae7fc-fb1f-b38e-bc17-504abf054e3f@redhat.com> MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Queue-Id: 068764000A X-Rspamd-Server: rspam01 X-Stat-Signature: ahs5wzhja4ct81sdy71ons36o7h8w5cb X-HE-Tag: 1679667096-726099 X-HE-Meta: U2FsdGVkX19Vkp/aFYYlzv7HqY0PHyVWgg63EDaOeNGeDGudvQeXTCqPHQ90WwiGgKIRDQox5c4xw1H/nCNIrY2HT+gkAJkA4d1DSQcjr7PmP4SHue0CJEEmMF86/gdR0r3ASv2w5UmZI2TaDFDjc2Oba4nv8ulsEMTVQ2ALCI5UnL0Gj3iw9Lj+U9Px5vSIzzlGcOjk2SiOR16Rcl0+NRwTDgMeT5eZhc0l4aRXdd8kgP8XaId16uNdft4MB00O3FkEaSsGuyoI9zCRBVXpXqIarGt/13iZfiX9K1Misi6fXatbfud5k5Mxl8r8EXFY08Qu2XNcNQEA3tauVIB++dqQ2M5/2VxQDtAP1zsvxCed71TAJhI5numZcVySfw2CH5OEcg9uD+myoY2PQtBiBf39yWdjTqdGhyffwGfYjGoHgTeNs2rSykidnByEhPecgjOmVU9beB11aZZK2Iq22EYQDWPjdEgRUSJKupsYheFs8uysK3Gzwcz/I0gav8MEg1ydjSjJFqxEqwtE4FIwKNfYevSkINIh3G/TUROQH7RNv0eRgHZt2eoyR4fpuNTRJ2KE5kyDKodX7rzKOTlQ2chK+YgzWL2WpnkIayLq3osHaEd6PHiGM7X5lPbAbMDfa8ankVzZVDM4yY/fgIf7wxkIFCwFvr7oXeiijLJz1GskQ7XrZYtSDQoE4Dj2e0oqrvSDj6Aeh0MTFaGzso4TQZxYbzdjyuUuFEV/eWET1ZAcY+4l/T3uxPQM760CI/wmttPjZct1H1pmhW0aDvLosu3M1mm6yL1sPbnSkzn9nOATEQr0vGFDgjJ/tRvneaOWsYtuIDXe067qzU9BRBvi3PF20n9O2mk6IUE07sd9jpVwQ6cCfeUpbmAtCfL8P/ChkOSOTrv9ke2d3iq+GTW/eeaPTI5L4UMIBZIVfOqViiIDE7ySbtgojtOj8WKtZwuRwIcyrNqPxKT/m24jbyu AlFlEnkk yVpUIyY9Zn1rUWEOO1l30109g4uUaHbYXkkP6uIrroBngYFDd5leXsBM0grDtMoj9PpZg/onrTNC+lk71vQd2YuG0axOaf8b/i2sqC1+Gj5IZZQRvuCxnIxr1AIqNn73ep0eztYz1lrBoM9B2E53qvDQq/RbrwT94t61YkirfdNhKU4c3GGAOhVQ0NyWXMKhWvvcZB78Y5/Yup7HM/Mn+5dVtua6laQ/VpqNghRKLYtLx5n78nr2V/NpopIp5HKJ0gQcnGHiHBspY44KvE5+geU/VBLq0icAn3K7LVmawRSeMVylYeVju6RIc0wbLvHU1qDvkDkmsV20n6Mg4NkPh9cn/1dMqijwlQNoPUUiXItIPSUtLGZGVfEEJLQ67b/+SQHTnYYk6vIFN/mxeu1ibFWmLseot31nO7IzT5WnPhRWOmYxx1vmuDgPs9V442JlMC+CyupPwaps3tYp2iMqe1udQ3kxijHR5aGGOHLULgBfwCspHww7GrD/HYvnlPfIlrCCpS/DT6wRVoyb70kzrcLtDF3F4UaW0q38q9sxvxoaqtkzruC+DZ5I4kAALfj1gIfiiTM0E18ZlVnn5wF2OEoTqSpX/tlqKq/PwWdYyxFltIiY= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Mar 24, 2023 at 09:51:27AM +0100, David Hildenbrand wrote: > On 23.03.23 23:11, Peter Xu wrote: > > On Thu, Mar 23, 2023 at 08:33:07PM +0500, Muhammad Usama Anjum wrote: > > > Hi Peter, > > > > > > Sorry for late reply. > > > > > > On 3/22/23 12:50 AM, Peter Xu wrote: > > > > On Tue, Mar 21, 2023 at 08:36:35PM +0100, David Hildenbrand wrote: > > > > > On 21.03.23 20:18, Peter Xu wrote: > > > > > > This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be > > > > > > writable even with uffd-wp bit set. It only happens with all these > > > > > > conditions met: (1) hugetlb memory (2) private mapping (3) original mapping > > > > > > was missing, then (4) being wr-protected (IOW, pte marker installed). Then > > > > > > write to the page to trigger. > > > > > > > > > > > > Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before > > > > > > even reaching hugetlb_wp() to avoid taking more locks that userfault won't > > > > > > need. However there's one CoW optimization path for missing hugetlb page > > > > > > that can trigger hugetlb_wp() inside hugetlb_no_page(), that can bypass the > > > > > > userfaultfd-wp traps. > > > > > > > > > > > > A few ways to resolve this: > > > > > > > > > > > > (1) Skip the CoW optimization for hugetlb private mapping, considering > > > > > > that private mappings for hugetlb should be very rare, so it may not > > > > > > really be helpful to major workloads. The worst case is we only skip the > > > > > > optimization if userfaultfd_wp(vma)==true, because uffd-wp needs another > > > > > > fault anyway. > > > > > > > > > > > > (2) Move the userfaultfd-wp handling for hugetlb from hugetlb_fault() > > > > > > into hugetlb_wp(). The major cons is there're a bunch of locks taken > > > > > > when calling hugetlb_wp(), and that will make the changeset unnecessarily > > > > > > complicated due to the lock operations. > > > > > > > > > > > > (3) Carry over uffd-wp bit in hugetlb_wp(), so it'll need to fault again > > > > > > for uffd-wp privately mapped pages. > > > > > > > > > > > > This patch chose option (3) which contains the minimum changeset (simplest > > > > > > for backport) and also make sure hugetlb_wp() itself will start to be > > > > > > always safe with uffd-wp ptes even if called elsewhere in the future. > > > > > > > > > > > > This patch will be needed for v5.19+ hence copy stable. > > > > > > > > > > > > Reported-by: Muhammad Usama Anjum > > > > > > Cc: linux-stable > > > > > > Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") > > > > > > Signed-off-by: Peter Xu > > > > > > --- > > > > > > mm/hugetlb.c | 8 +++++--- > > > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > > > > > index 8bfd07f4c143..22337b191eae 100644 > > > > > > --- a/mm/hugetlb.c > > > > > > +++ b/mm/hugetlb.c > > > > > > @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, > > > > > > struct folio *pagecache_folio, spinlock_t *ptl) > > > > > > { > > > > > > const bool unshare = flags & FAULT_FLAG_UNSHARE; > > > > > > - pte_t pte; > > > > > > + pte_t pte, newpte; > > > > > > struct hstate *h = hstate_vma(vma); > > > > > > struct page *old_page; > > > > > > struct folio *new_folio; > > > > > > @@ -5622,8 +5622,10 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, > > > > > > mmu_notifier_invalidate_range(mm, range.start, range.end); > > > > > > page_remove_rmap(old_page, vma, true); > > > > > > hugepage_add_new_anon_rmap(new_folio, vma, haddr); > > > > > > - set_huge_pte_at(mm, haddr, ptep, > > > > > > - make_huge_pte(vma, &new_folio->page, !unshare)); > > > > > > + newpte = make_huge_pte(vma, &new_folio->page, !unshare); > > > > > > + if (huge_pte_uffd_wp(pte)) > > > > > > + newpte = huge_pte_mkuffd_wp(newpte); > > > > > > + set_huge_pte_at(mm, haddr, ptep, newpte); > > > > > > folio_set_hugetlb_migratable(new_folio); > > > > > > /* Make the old page be freed below */ > > > > > > new_folio = page_folio(old_page); > > > > > > > > > > Looks correct to me. Do we have a reproducer? > > > > > > > > I used a reproducer for the async mode I wrote (patch 2 attached, need to > > > > change to VM_PRIVATE): > > > > > > > > https://lore.kernel.org/all/ZBNr4nohj%2FTw4Zhw@x1n/ > > > > > > > > I don't think kernel kselftest can trigger it because we don't do strict > > > > checks yet with uffd-wp bits. I've already started looking into cleanup > > > > the test cases and I do plan to add new tests to cover this. > > > > > > > > Meanwhile, let's also wait for an ack from Muhammad. Even though the async > > > > mode is not part of the code base, it'll be a good test for verifying every > > > > single uffd-wp bit being set or cleared as expected. > > > I've tested by applying this patch. But the bug is still there. Just like > > > Peter has mentioned, we are using our in progress patches related to > > > pagemap_scan ioctl and userfaultd wp async patches to reproduce it. > > > > > > To reproduce please build kernel and run pagemap_ioctl test in mm in > > > hugetlb_mem_reproducer branch: > > > https://gitlab.collabora.com/usama.anjum/linux-mainline/-/tree/hugetlb_mem_reproducer > > > > > > In case you have any question on how to reproduce, please let me know. I'll > > > try to provide a cleaner alternative. > > > > Hmm, I think my current fix is incomplete if not wrong. The root cause > > should still be valid, however I overlooked another path: > > > > if (page_mapcount(old_page) == 1 && PageAnon(old_page)) { > > if (!PageAnonExclusive(old_page)) > > page_move_anon_rmap(old_page, vma); > > if (likely(!unshare)) > > set_huge_ptep_writable(vma, haddr, ptep); > > > > delayacct_wpcopy_end(); > > return 0; > > } > > > > We should bail out early in this path, and it'll be even easier we always > > bail out hugetlb_wp() as long as uffd-wp is detected because userfault > > should always be handled before any decision to CoW. > > > > v2 attached.. Please give it another shot. > > Hmmm, I think you must only do that for !unshare (FAULT_FLAG_WRITE). > Otherwise you'll never be able to resolve an unsharing request on a r/o > mapped hugetlb page that has the uffd-wp set? > > Or am I missing something? No, I think you're right. I'll fix that up when posting a v3. Thanks, -- Peter Xu