From: David Hildenbrand
Organization: Red Hat
Date: Fri, 24 Mar 2023 09:51:27 +0100
Subject: Re: [PATCH] mm/hugetlb: Fix uffd wr-protection for CoW optimization path
To: Peter Xu, Muhammad Usama Anjum
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrea Arcangeli, Andrew Morton, Axel Rasmussen, Mike Rapoport, Nadav Amit, linux-stable
References: <20230321191840.1897940-1-peterx@redhat.com> <44aae7fc-fb1f-b38e-bc17-504abf054e3f@redhat.com>

On 23.03.23 23:11, Peter Xu wrote:
> On Thu, Mar 23, 2023 at 08:33:07PM +0500, Muhammad Usama Anjum wrote:
>> Hi Peter,
>>
>> Sorry for late reply.
>>
>> On 3/22/23 12:50 AM, Peter Xu wrote:
>>> On Tue, Mar 21, 2023 at 08:36:35PM +0100, David Hildenbrand wrote:
>>>> On 21.03.23 20:18, Peter Xu wrote:
>>>>> This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be
>>>>> writable even with uffd-wp bit set. It only happens with all these
>>>>> conditions met: (1) hugetlb memory (2) private mapping (3) original mapping
>>>>> was missing, then (4) being wr-protected (IOW, pte marker installed). Then
>>>>> write to the page to trigger.
>>>>>
>>>>> Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before
>>>>> even reaching hugetlb_wp() to avoid taking more locks that userfault won't
>>>>> need. However there's one CoW optimization path for missing hugetlb page
>>>>> that can trigger hugetlb_wp() inside hugetlb_no_page(), that can bypass the
>>>>> userfaultfd-wp traps.
>>>>>
>>>>> A few ways to resolve this:
>>>>>
>>>>> (1) Skip the CoW optimization for hugetlb private mapping, considering
>>>>> that private mappings for hugetlb should be very rare, so it may not
>>>>> really be helpful to major workloads. The worst case is we only skip the
>>>>> optimization if userfaultfd_wp(vma)==true, because uffd-wp needs another
>>>>> fault anyway.
>>>>>
>>>>> (2) Move the userfaultfd-wp handling for hugetlb from hugetlb_fault()
>>>>> into hugetlb_wp(). The major cons is there're a bunch of locks taken
>>>>> when calling hugetlb_wp(), and that will make the changeset unnecessarily
>>>>> complicated due to the lock operations.
>>>>>
>>>>> (3) Carry over uffd-wp bit in hugetlb_wp(), so it'll need to fault again
>>>>> for uffd-wp privately mapped pages.
>>>>>
>>>>> This patch chose option (3) which contains the minimum changeset (simplest
>>>>> for backport) and also make sure hugetlb_wp() itself will start to be
>>>>> always safe with uffd-wp ptes even if called elsewhere in the future.
>>>>>
>>>>> This patch will be needed for v5.19+ hence copy stable.
>>>>>
>>>>> Reported-by: Muhammad Usama Anjum
>>>>> Cc: linux-stable
>>>>> Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
>>>>> Signed-off-by: Peter Xu
>>>>> ---
>>>>>  mm/hugetlb.c | 8 +++++---
>>>>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>> index 8bfd07f4c143..22337b191eae 100644
>>>>> --- a/mm/hugetlb.c
>>>>> +++ b/mm/hugetlb.c
>>>>> @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>>>>>                         struct folio *pagecache_folio, spinlock_t *ptl)
>>>>>  {
>>>>>          const bool unshare = flags & FAULT_FLAG_UNSHARE;
>>>>> -        pte_t pte;
>>>>> +        pte_t pte, newpte;
>>>>>          struct hstate *h = hstate_vma(vma);
>>>>>          struct page *old_page;
>>>>>          struct folio *new_folio;
>>>>> @@ -5622,8 +5622,10 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>>>>>                  mmu_notifier_invalidate_range(mm, range.start, range.end);
>>>>>                  page_remove_rmap(old_page, vma, true);
>>>>>                  hugepage_add_new_anon_rmap(new_folio, vma, haddr);
>>>>> -                set_huge_pte_at(mm, haddr, ptep,
>>>>> -                                make_huge_pte(vma, &new_folio->page, !unshare));
>>>>> +                newpte = make_huge_pte(vma, &new_folio->page, !unshare);
>>>>> +                if (huge_pte_uffd_wp(pte))
>>>>> +                        newpte = huge_pte_mkuffd_wp(newpte);
>>>>> +                set_huge_pte_at(mm, haddr, ptep, newpte);
>>>>>                  folio_set_hugetlb_migratable(new_folio);
>>>>>                  /* Make the old page be freed below */
>>>>>                  new_folio = page_folio(old_page);
>>>>
>>>> Looks correct to me. Do we have a reproducer?
>>>
>>> I used a reproducer for the async mode I wrote (patch 2 attached, need to
>>> change to VM_PRIVATE):
>>>
>>> https://lore.kernel.org/all/ZBNr4nohj%2FTw4Zhw@x1n/
>>>
>>> I don't think kernel kselftest can trigger it because we don't do strict
>>> checks yet with uffd-wp bits. I've already started looking into cleanup
>>> the test cases and I do plan to add new tests to cover this.
>>>
>>> Meanwhile, let's also wait for an ack from Muhammad. Even though the async
>>> mode is not part of the code base, it'll be a good test for verifying every
>>> single uffd-wp bit being set or cleared as expected.
>> I've tested by applying this patch. But the bug is still there. Just like
>> Peter has mentioned, we are using our in progress patches related to
>> pagemap_scan ioctl and userfaultfd wp async patches to reproduce it.
>>
>> To reproduce please build kernel and run pagemap_ioctl test in mm in
>> hugetlb_mem_reproducer branch:
>> https://gitlab.collabora.com/usama.anjum/linux-mainline/-/tree/hugetlb_mem_reproducer
>>
>> In case you have any question on how to reproduce, please let me know. I'll
>> try to provide a cleaner alternative.
>
> Hmm, I think my current fix is incomplete if not wrong. The root cause
> should still be valid, however I overlooked another path:
>
>         if (page_mapcount(old_page) == 1 && PageAnon(old_page)) {
>                 if (!PageAnonExclusive(old_page))
>                         page_move_anon_rmap(old_page, vma);
>                 if (likely(!unshare))
>                         set_huge_ptep_writable(vma, haddr, ptep);
>
>                 delayacct_wpcopy_end();
>                 return 0;
>         }
>
> We should bail out early in this path, and it'll be even easier we always
> bail out hugetlb_wp() as long as uffd-wp is detected because userfault
> should always be handled before any decision to CoW.
>
> v2 attached.. Please give it another shot.

Hmmm, I think you must only do that for !unshare (FAULT_FLAG_WRITE).

Otherwise you'll never be able to resolve an unsharing request on a r/o
mapped hugetlb page that has the uffd-wp set?

Or am I missing something?

-- 
Thanks,

David / dhildenb
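
To make the !unshare condition concrete, here is a minimal userspace sketch
in C. It is a toy model under stated assumptions, not kernel code and not the
v2 patch (which is not shown in this mail): the toy_* names and flag values
are made up for illustration. It only encodes the rule that an early bail-out
for uffd-wp should apply to write faults, while unshare requests still get
resolved.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for this sketch; not the kernel's definitions. */
enum toy_fault_flag {
	TOY_FAULT_FLAG_WRITE   = 1u << 0,
	TOY_FAULT_FLAG_UNSHARE = 1u << 1,
};

/*
 * Toy model of the early bail-out decision: return true when the handler
 * should return without doing CoW, so the userfaultfd-wp trap is taken
 * first. Only the uffd-wp state and the fault flags are modelled.
 */
static bool toy_wp_should_bail(unsigned int flags, bool pte_uffd_wp)
{
	const bool unshare = flags & TOY_FAULT_FLAG_UNSHARE;

	/* Unshare requests must still be resolved, even with uffd-wp set. */
	return pte_uffd_wp && !unshare;
}
```

With this rule, a write fault on a uffd-wp pte bails out, while an unshare
fault on the same pte proceeds, which is the case an unconditional bail-out
would leave unresolvable.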