From: David Hildenbrand
Organization: Red Hat
Date: Mon, 20 Mar 2023 11:21:13 +0100
To: Peter Xu, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, Axel Rasmussen, Paul Gofman, Muhammad Usama Anjum, Mike Rapoport, Andrea Arcangeli, Andrew Morton
Subject: Re: [PATCH v4 1/2] mm/uffd: UFFD_FEATURE_WP_UNPOPULATED
In-Reply-To: <20230309223711.823547-2-peterx@redhat.com>
References: <20230309223711.823547-1-peterx@redhat.com> <20230309223711.823547-2-peterx@redhat.com>

> (1) With huge page disabled
> echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
> ./uffd_wp_perf
> Test DEFAULT: 4
> Test PRE-READ: 1111453 (pre-fault 1101011)
> Test MADVISE: 278276 (pre-fault 266378)

Thinking about it, I guess the biggest slowdown here is the "one fake
pagefault at a time" handling.

> Test WP-UNPOPULATE: 11712
>
> (2) With Huge page enabled
> echo always > /sys/kernel/mm/transparent_hugepage/enabled
> ./uffd_wp_perf
> Test DEFAULT: 4
> Test PRE-READ: 22521 (pre-fault 22348)
> Test MADVISE: 4909 (pre-fault 4743)
> Test WP-UNPOPULATE: 14448
>
> There'll be a great perf boost for no-thp case, while for thp enabled with
> extreme case of all-thp-zero WP_UNPOPULATED can be slower than MADVISE, but
> that's low possibility in reality, also the overhead was not reduced but
> postponed until a follow up write on any huge zero thp, so potentially it
> is faster by making the follow up writes slower.
>
> [1] https://lore.kernel.org/all/20210401092226.102804-4-andrey.gruzdev@virtuozzo.com/
> [2] https://lore.kernel.org/all/Y+v2HJ8+3i%2FKzDBu@x1n/
> [3] https://lore.kernel.org/all/d0eb0a13-16dc-1ac1-653a-78b7273781e3@collabora.com/
> [4] https://github.com/xzpeter/clibs/blob/master/uffd-test/uffd-wp-perf.c
>
> Signed-off-by: Peter Xu
> ---
>  Documentation/admin-guide/mm/userfaultfd.rst | 17 ++++++
>  fs/userfaultfd.c                             | 16 ++++++
>  include/linux/mm_inline.h                    |  6 +++
>  include/linux/userfaultfd_k.h                | 23 ++++++++
>  include/uapi/linux/userfaultfd.h             | 10 +++-
>  mm/memory.c                                  | 56 +++++++++++++++-----
>  mm/mprotect.c                                | 51 ++++++++++++++----
>  7 files changed, 154 insertions(+), 25 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
> index 7dc823b56ca4..c86b56c95ea6 100644
> --- a/Documentation/admin-guide/mm/userfaultfd.rst
> +++ b/Documentation/admin-guide/mm/userfaultfd.rst
> @@ -219,6 +219,23 @@ former will have ``UFFD_PAGEFAULT_FLAG_WP`` set, the latter
>  you still need to supply a page when ``UFFDIO_REGISTER_MODE_MISSING`` was
>  used.
>
> +Userfaultfd write-protect mode currently behave differently on none ptes
> +(when e.g. page is missing) over different types of memories.
> +
> +For anonymous memory, ``ioctl(UFFDIO_WRITEPROTECT)`` will ignore none ptes
> +(e.g. when pages are missing and not populated). For file-backed memories
> +like shmem and hugetlbfs, none ptes will be write protected just like a
> +present pte. In other words, there will be a userfaultfd write fault
> +message generated when writting to a missing page on file typed memories,

s/writting/writing/

> +as long as the page range was write-protected before. Such a message will
> +not be generated on anonymous memories by default.
> +
> +If the application wants to be able to write protect none ptes on anonymous
> +memory, one can pre-populate the memory with e.g. MADV_POPULATE_READ. On
> +newer kernels, one can also detect the feature UFFD_FEATURE_WP_UNPOPULATED
> +and set the feature bit in advance to make sure none ptes will also be
> +write protected even upon anonymous memory.
> +

[...]

>  /*
>   * A number of key systems in x86 including ioremap() rely on the assumption
> @@ -1350,6 +1364,10 @@ zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
>  			      unsigned long addr, pte_t *pte,
>  			      struct zap_details *details, pte_t pteval)
>  {
> +	/* Zap on anonymous always means dropping everything */
> +	if (vma_is_anonymous(vma))
> +		return;
> +
>  	if (zap_drop_file_uffd_wp(details))
>  		return;
>
> @@ -1456,8 +1474,12 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  				continue;
>  			rss[mm_counter(page)]--;
>  		} else if (pte_marker_entry_uffd_wp(entry)) {
> -			/* Only drop the uffd-wp marker if explicitly requested */
> -			if (!zap_drop_file_uffd_wp(details))
> +			/*
> +			 * For anon: always drop the marker; for file: only
> +			 * drop the marker if explicitly requested.
> +			 */

So MADV_DONTNEED on a pte marker in an anonymous VMA will always remove
that marker.
Is that the same handling as for MADV_DONTNEED on shmem or for
fallocate(PUNCHHOLE) on shmem?

> +			if (!vma_is_anonymous(vma) &&
> +			    !zap_drop_file_uffd_wp(details))
>  				continue;

Maybe it would be nicer to have a zap_drop_uffd_wp_marker(vma, details)
and have the comment in there. Especially because of the other hunk
above.

So zap_drop_file_uffd_wp(details) -> zap_drop_uffd_wp_marker(vma, details)
and move the anon handling + comment in there.

>  		} else if (is_hwpoison_entry(entry) ||
>  			   is_swapin_error_entry(entry)) {
> @@ -3624,6 +3646,14 @@ static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
>  	return 0;
>  }
>
> +static vm_fault_t do_pte_missing(struct vm_fault *vmf)
> +{
> +	if (vma_is_anonymous(vmf->vma))
> +		return do_anonymous_page(vmf);
> +	else
> +		return do_fault(vmf);

No need for the "else" statement.

> +}
> +
>  /*
>   * This is actually a page-missing access, but with uffd-wp special pte
>   * installed. It means this pte was wr-protected before being unmapped.
> @@ -3634,11 +3664,10 @@ static vm_fault_t pte_marker_handle_uffd_wp(struct vm_fault *vmf)
>  	 * Just in case there're leftover special ptes even after the region
>  	 * got unregistered - we can simply clear them.
>  	 */
> -	if (unlikely(!userfaultfd_wp(vmf->vma) || vma_is_anonymous(vmf->vma)))
> +	if (unlikely(!userfaultfd_wp(vmf->vma)))
>  		return pte_marker_clear(vmf);
>
> -	/* do_fault() can handle pte markers too like none pte */
> -	return do_fault(vmf);
> +	return do_pte_missing(vmf);
>  }
>

[...]

> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 231929f119d9..455f7051098f 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -276,7 +276,15 @@ static long change_pte_range(struct mmu_gather *tlb,
>  		} else {
>  			/* It must be an none page, or what else?.. */
>  			WARN_ON_ONCE(!pte_none(oldpte));
> -			if (unlikely(uffd_wp && !vma_is_anonymous(vma))) {
> +
> +			/*
> +			 * Nobody plays with any none ptes besides
> +			 * userfaultfd when applying the protections.
> +			 */
> +			if (likely(!uffd_wp))
> +				continue;
> +
> +			if (userfaultfd_wp_use_markers(vma)) {
>  				/*
>  				 * For file-backed mem, we need to be able to
>  				 * wr-protect a none pte, because even if the
> @@ -320,23 +328,46 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd)
>  	return 0;
>  }
>
> -/* Return true if we're uffd wr-protecting file-backed memory, or false */
> +/*
> + * Return true if we want to split huge thps in change protection

"huge thps" sounds redundant. "if we want to PTE-map a huge PMD" ?

> + * procedure, false otherwise.

In general,

Acked-by: David Hildenbrand

-- 
Thanks,

David / dhildenb