From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 80F0DC6FD1A for ; Tue, 7 Mar 2023 16:11:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0F7AD280002; Tue, 7 Mar 2023 11:11:58 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 07F466B0073; Tue, 7 Mar 2023 11:11:58 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E8965280002; Tue, 7 Mar 2023 11:11:57 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id D3B8C6B0071 for ; Tue, 7 Mar 2023 11:11:57 -0500 (EST) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 77D8F1607A2 for ; Tue, 7 Mar 2023 16:11:57 +0000 (UTC) X-FDA: 80542593474.03.6873E99 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf21.hostedemail.com (Postfix) with ESMTP id F41DF1C000B for ; Tue, 7 Mar 2023 16:11:54 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=UUC9YGtx; spf=pass (imf21.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1678205515; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=oo9Ql82WQk+WwZEhSCLKfZVtCnkfKktd5ammh9bPW10=; b=cz0WRgU8R+EIuBVtDgGwuGX9ic//ToKE7qY42bauYItZ+46DIfVQatBnx0sp/rKIeQN7p+ VM7oaYmPfU+5fCp6iyhc7gVDZbF4upe7FymO8d8Qk4l8lYEr3Br7vSybzu0uXXwEDWA9MH E6yobiNwv0QDZvDHwedpYpXtyOjZx7g= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=UUC9YGtx; spf=pass (imf21.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1678205515; a=rsa-sha256; cv=none; b=BhRyEjCqZQ5PRBCauUMiwFqOEFYKpSEkkcb/s4k0Xm2Urxs5CStgV/YNSDqjWmXxGteE1Y XNdLhrefJ3W/Ab+mmwM3XR9VhUCX8KTMH+Y4UYAeUDyzRk4SXTDHqzBYU5FekxJ0QS/olJ YcKbtJJMxPGgxToMjtvLtbjedOaX7/M= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1678205514; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oo9Ql82WQk+WwZEhSCLKfZVtCnkfKktd5ammh9bPW10=; b=UUC9YGtxNSVQuwhBW+B0yu9edvBLVVKPTUBaBIpdvDCZQyoj45xac4Dv/qfdlOSsOPbDDG oJWnl2LRU0pfPIATHi6cG62MeOjo6zCW1kUMQ2VsSODuSDtyQ/29j61q5yfZv3ScpGdlKv CKa9ZWdUrTaZBB7JVYLX1qvDixg841c= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-534-VcGkTIfpNk-cjqTANF03LQ-1; Tue, 07 Mar 2023 11:11:28 -0500 X-MC-Unique: VcGkTIfpNk-cjqTANF03LQ-1 Received: by mail-wm1-f70.google.com with SMTP id bi21-20020a05600c3d9500b003e836e354e0so5144240wmb.5 for ; Tue, 07 Mar 2023 08:11:09 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1678205469; h=content-transfer-encoding:in-reply-to:subject:organization:from :references:cc:to:content-language:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=oo9Ql82WQk+WwZEhSCLKfZVtCnkfKktd5ammh9bPW10=; b=Jb9kn1sddELHkeW8gg1qYVYewZQZiJmYAI26SG1ThdSbzPmJ9zxnb/dD9Hk8dKKPo/ 6tFYxY3QtD8Eg9EY+mDscHxP+Docj2dZt+rCsEoa9an9ANYURX84l944TLoKXXHxMGOG 2ieG9kLtzaqPqneSswcCiAftkgeFc7D/9L5BgUyny+STIbSUqEVk5O/3WixWNKSHjkqK 02665tnYMOve9hrHhNVHic3K2k6WyIHyDmIA+U+XLv2NTmDuSh0kJqAU0xnBrfKtflLE kPr6IzGIbtuRUVyMlL3Nf/00ppjsAp6Iuan+8tSsq5QQh1BDFev9nGqRZVmyv+v5ZBLE a/3A== X-Gm-Message-State: AO0yUKWB9HZ7jAci7+RQhmzIEa9yVq9iQeFGtSaMt3ug7hxRLZErKPr4 edroW46Tc/ed7hDp+EgSzuvtu8fTA7QfuU5Fnc7eyM/nzeabTSYbWHer8zWTi2dUVKPOi/yT5Sm QiMjEZScs8dA= X-Received: by 2002:a5d:658b:0:b0:2c9:79f4:101f with SMTP id q11-20020a5d658b000000b002c979f4101fmr10742131wru.34.1678205468789; Tue, 07 Mar 2023 08:11:08 -0800 (PST) X-Google-Smtp-Source: AK7set83og5nEYla97Fnw+XisXoZwVed2N+iJd6f789nKsBveRVprqXqIQ2BzzXK+rD9vLm4ROQKXw== X-Received: by 2002:a5d:658b:0:b0:2c9:79f4:101f with SMTP id q11-20020a5d658b000000b002c979f4101fmr10742101wru.34.1678205468396; Tue, 07 Mar 2023 08:11:08 -0800 (PST) Received: from ?IPV6:2003:cb:c707:a100:e20:41da:c49b:8974? (p200300cbc707a1000e2041dac49b8974.dip0.t-ipconnect.de. [2003:cb:c707:a100:e20:41da:c49b:8974]) by smtp.gmail.com with ESMTPSA id d18-20020a5d6452000000b002c71dd1109fsm12715822wrw.47.2023.03.07.08.11.07 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 07 Mar 2023 08:11:07 -0800 (PST) Message-ID: <94be7b9f-c33e-c6dc-4132-6cb78f7c0624@redhat.com> Date: Tue, 7 Mar 2023 17:11:07 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0 To: Peter Xu , linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Muhammad Usama Anjum , Andrea Arcangeli , Axel Rasmussen , Mike Rapoport , Nadav Amit , Paul Gofman , Andrew Morton References: <20230306213925.617814-1-peterx@redhat.com> <20230306213925.617814-2-peterx@redhat.com> From: David Hildenbrand Organization: Red Hat Subject: Re: [PATCH v3 1/2] mm/uffd: UFFD_FEATURE_WP_UNPOPULATED In-Reply-To: <20230306213925.617814-2-peterx@redhat.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: F41DF1C000B X-Stat-Signature: o3fxsfr8hyjfh6q6y1qzdgsrr1yd1udo X-HE-Tag: 1678205514-794515 X-HE-Meta: U2FsdGVkX1/hozH2vDR2BY5sdxMaZuWbgwco2r+/D6Yv4pLlbcrR1HKBZz4hPOw8xkuLTqMliIfFxkiK/gfydGw/LeueBt9qQGolftNIsEsrr5XB8BHAm4JgCt+49ovh6X3iGx0j3IG/MQS79H9bqI32tR0Y6W7BwPb3VD8qWEnc17WcRA8dG+i7vUWU5kiFnt2zPedthJvTyfcQ1+XYxxerp22l6lTypNzd4oE0TrFInaCCG4KY4r2k1g61smPM5IZDnP+X7C8cJjlHigpsxjk4dOYEqs29daD8QaYlSdDQnicAAlBwj9sfmPs4WyGvbb8h/pXkl1B7yTXybhhXeUTR0tOSoqAGjivjFCUkIe3l4v7prI8nZYGGNeIKBHW6JpEnvOLxGp9bH637qVmBNF13ZzgTm5Cu1OqBe/6HuWhscq7G5AyNC+kn1/1TrGPP6+pELOCOSoIrM8s19S1xs1Dwu1h2iiWGKsXwBopKGxXOqcMty2Q92SA7hGT04RZEZ6owNigkghHBnOAeo/AP5ln12C0TzLLcnDCKdFPF1wG13XqkEfb+9IKtw9JqCwElhhnX1LAuxzk/0gS48HnMRm6EJsTfzaXcvU8piKihp9V3u+5WLNCBt8mj/O1VXeKjScgQ5j5d8iaf3KkghYTrYX8gODsUE+ISq2dSmOiyk/vCL27xlbuVK0v8HpxCTxM8rvfVloaDhBxEqRpVxsSplAhGyZjeK2fIljYQ7q2jASYddcqhIu5lXgPqym1QEDuFhyd+Bl2hMirc6hqSlCuPeU79HRiXBOJ67IabPmLKZofiLK5jx3daAMen9X7xPvfmUBXwd3I0I6G1GCOsqKxwllb3SGLLZ3LMoR6GD3b+eN/9idrNNlSuMEeWN31ntxgj6NY/v5iPEOZITP5bIHRD0/T1r7E1y7WD3Mu3ltL1KZ9BAFDAOLWyw/LFeM1i2Vwnnryn9Fta2AmIeGUPt+Z Y5OnBI00 qXISvqPKC3+PAZoUcEmI6WRlHkXRustyL/XkgsW1xqFwNiVISIsU2aBGxXWCLPlTbYKvOAW1M75Aay4GB7bCnDY7WzYZOjw0k7tcmErj8+GCczvOoSjc7sqgvUUl31gBCiUMZGB/JxjD8y1LbqsMP+qkZRNYdkS1gLB4xc9NqWW69BN9j6f0GplJ9DwSlSH5uurgh/ThAFcLnw3P5rsSGAe0dV3jkVZ4K5u9mZXKCK4FswTnZ3mUQhNoqy2/kw5JTfZyHWU3gZVsfOspRdM8NyhVGgPa3INnFBSqT61OFzt5CXx/w+2NEkWvFDJ367Pz99ewXYbpM9huJ8V8tfGSFSZd3fOf/PsoO79f8sy/7u7t9OZXbZrdKPan/7qdnkmzANOngq+RmQEhy/jmLHlH883CUxFa0EPex3vV+v8PMJese+VC+qMSoBdHW7QmSFhxIu4H00dgMBvOfOpfOmP+iKXTZn+p0MCde33GqfMOeCS26sfYhqvCFmaNA/YfvWPCOXbDfaveM/9oOTVR8c7aeEFMcJeP2qWpLj0mMiCuw2sUC0BB6UEp/6L1mwpwConfgfVmNUHw6EUvAu7hERN9HL/DfIR0U6lcEEXo19sbuHjJO+Ep8wF9fiPtiu2hHyw+8kpJp3R/B4RltxQA= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 06.03.23 22:39, Peter Xu wrote: Note that I wodnered for a second if we'd call it "UFFD_FEATURE_WP_MISSING" instead (similar to the definition of MISSING uffd that triggers when we have nothing mapped). Just a thought. [...] > With WP_UNPOPUATED, application like QEMU can avoid pre-read faults all the > memory before wr-protect during taking a live snapshot. Quotting from > Muhammad's test result here [3] based on a simple program [4]: > > (1) With huge page disabled > echo madvise > /sys/kernel/mm/transparent_hugepage/enabled > ./uffd_wp_perf > Test DEFAULT: 4 > Test PRE-READ: 1111453 (pre-fault 1101011) > Test MADVISE: 278276 (pre-fault 266378) > Test WP-UNPOPULATE: 11712 > > (2) With Huge page enabled > echo always > /sys/kernel/mm/transparent_hugepage/enabled > ./uffd_wp_perf > Test DEFAULT: 4 > Test PRE-READ: 22521 (pre-fault 22348) > Test MADVISE: 4909 (pre-fault 4743) > Test WP-UNPOPULATE: 14448 > > There'll be a great perf boost for no-thp case, while for thp enabled with > extreme case of all-thp-zero WP_UNPOPULATED can be slower than MADVISE, but > that's low possibility in reality, also the overhead was not reduced but > postponed until a follow up write on any huge zero thp, so potentitially it s/potentitially/potentially/ > is faster by making the follow up writes slower. What I realized, interrestingly not only the writes, but also the reads. In case of background snapshots we'll be reading all VM memory I think ... but we could optimize in QEMU by consulting the pagemap if there is anything mapped at all, and not read zeros in that case [an optimization brought up several times already]. I am not sure yet if we want to change the QEMU implementation. But anyhow, that's a different discussion. > > [1] https://lore.kernel.org/all/20210401092226.102804-4-andrey.gruzdev@virtuozzo.com/ > [2] https://lore.kernel.org/all/Y+v2HJ8+3i%2FKzDBu@x1n/ > [3] https://lore.kernel.org/all/d0eb0a13-16dc-1ac1-653a-78b7273781e3@collabora.com/ > [4] https://github.com/xzpeter/clibs/blob/master/uffd-test/uffd-wp-perf.c > > Signed-off-by: Peter Xu > --- > fs/userfaultfd.c | 14 ++++++++ > include/linux/mm_inline.h | 6 ++++ > include/linux/userfaultfd_k.h | 6 ++++ > include/uapi/linux/userfaultfd.h | 10 +++++- > mm/memory.c | 56 ++++++++++++++++++++++-------- > mm/mprotect.c | 59 ++++++++++++++++++++++++++------ > 6 files changed, 126 insertions(+), 25 deletions(-) [...] > > +static vm_fault_t handle_pte_missing(struct vm_fault *vmf) > +{ > + if (vma_is_anonymous(vmf->vma)) > + return do_anonymous_page(vmf); > + else > + return do_fault(vmf); > +} > + > /* > * This is actually a page-missing access, but with uffd-wp special pte > * installed. It means this pte was wr-protected before being unmapped. > @@ -3634,11 +3664,10 @@ static vm_fault_t pte_marker_handle_uffd_wp(struct vm_fault *vmf) > * Just in case there're leftover special ptes even after the region > * got unregistered - we can simply clear them. > */ > - if (unlikely(!userfaultfd_wp(vmf->vma) || vma_is_anonymous(vmf->vma))) > + if (unlikely(!userfaultfd_wp(vmf->vma))) > return pte_marker_clear(vmf); > > - /* do_fault() can handle pte markers too like none pte */ > - return do_fault(vmf); > + return handle_pte_missing(vmf); > } > > static vm_fault_t handle_pte_marker(struct vm_fault *vmf) > @@ -4008,6 +4037,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > */ > static vm_fault_t do_anonymous_page(struct vm_fault *vmf) > { > + bool uffd_wp = vmf_orig_pte_uffd_wp(vmf); > struct vm_area_struct *vma = vmf->vma; > struct folio *folio; > vm_fault_t ret = 0; > @@ -4041,7 +4071,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) > vma->vm_page_prot)); > vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, > vmf->address, &vmf->ptl); > - if (!pte_none(*vmf->pte)) { > + if (vmf_pte_changed(vmf)) { > update_mmu_tlb(vma, vmf->address, vmf->pte); > goto unlock; > } > @@ -4081,7 +4111,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) > > vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, > &vmf->ptl); > - if (!pte_none(*vmf->pte)) { > + if (vmf_pte_changed(vmf)) { > update_mmu_tlb(vma, vmf->address, vmf->pte); > goto release; > } > @@ -4101,6 +4131,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) > folio_add_new_anon_rmap(folio, vma, vmf->address); > folio_add_lru_vma(folio, vma); > setpte: > + if (uffd_wp) > + entry = pte_mkuffd_wp(entry); > set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry); > > /* No need to invalidate - it was non-present before */ > @@ -4268,7 +4300,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page) > void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr) > { > struct vm_area_struct *vma = vmf->vma; > - bool uffd_wp = pte_marker_uffd_wp(vmf->orig_pte); > + bool uffd_wp = vmf_orig_pte_uffd_wp(vmf); > bool write = vmf->flags & FAULT_FLAG_WRITE; > bool prefault = vmf->address != addr; > pte_t entry; > @@ -4915,12 +4947,8 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf) > } > } > > - if (!vmf->pte) { > - if (vma_is_anonymous(vmf->vma)) > - return do_anonymous_page(vmf); > - else > - return do_fault(vmf); > - } > + if (!vmf->pte) > + return handle_pte_missing(vmf); It would better blend in if it would be called "do_pte_missing()". > > if (!pte_present(vmf->orig_pte)) > return do_swap_page(vmf); > diff --git a/mm/mprotect.c b/mm/mprotect.c > index 231929f119d9..6a2df93158ee 100644 > --- a/mm/mprotect.c > +++ b/mm/mprotect.c > @@ -276,7 +276,16 @@ static long change_pte_range(struct mmu_gather *tlb, > } else { > /* It must be an none page, or what else?.. */ > WARN_ON_ONCE(!pte_none(oldpte)); > - if (unlikely(uffd_wp && !vma_is_anonymous(vma))) { > + > + /* > + * Nobody plays with any none ptes besides > + * userfaultfd when applying the protections. > + */ > + if (likely(!uffd_wp)) > + continue; > + > + if (!vma_is_anonymous(vma) || > + userfaultfd_wp_unpopulated(vma)) { I think it would make sense to replace all 3 instances of this check by a new function (userfaultfd_wp_use_markers() ? ) and move some doc from pgtable_populate_needed() in there. > /* > * For file-backed mem, we need to be able to > * wr-protect a none pte, because even if the > @@ -320,23 +329,53 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd) > return 0; > } > > -/* Return true if we're uffd wr-protecting file-backed memory, or false */ > +/* > + * Return true if we want to split huge thps in change protection > + * procedure, false otherwise. > + */ > static inline bool > -uffd_wp_protect_file(struct vm_area_struct *vma, unsigned long cp_flags) > +pgtable_split_needed(struct vm_area_struct *vma, unsigned long cp_flags) > { > + /* > + * pte markers only resides in pte level, if we need pte markers, > + * we need to split. We cannot wr-protect shmem thp because file > + * thp is handled differently when split by erasing the pmd so far. > + */ > return (cp_flags & MM_CP_UFFD_WP) && !vma_is_anonymous(vma); > } > > /* > - * If wr-protecting the range for file-backed, populate pgtable for the case > - * when pgtable is empty but page cache exists. When {pte|pmd|...}_alloc() > - * failed we treat it the same way as pgtable allocation failures during > - * page faults by kicking OOM and returning error. > + * Return true if we want to populate pgtables in change protection > + * procedure, false otherwise > + */ > +static inline bool > +pgtable_populate_needed(struct vm_area_struct *vma, unsigned long cp_flags) > +{ > + /* If not within ioctl(UFFDIO_WRITEPROTECT), then don't bother */ > + if (!(cp_flags & MM_CP_UFFD_WP)) > + return false; > + > + /* Either if this is file-based, we need it for pte markers */ > + if (!vma_is_anonymous(vma)) > + return true; > + > + /* > + * Or anonymous, we only need this if WP_ZEROPAGE enabled (to > + * install zero pages). s/WP_ZEROPAGE/WP_UNPOPULATED/ > + */ > + return userfaultfd_wp_unpopulated(vma); > +} > + -- Thanks, David / dhildenb