From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 8 Dec 2022 12:46:53 +0100
Subject: Re: [PATCH v1] mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Ives van Hoorne, Peter Xu, stable@vger.kernel.org, Andrew Morton, Alistair Popple, Mike Rapoport, Nadav Amit, Andrea Arcangeli, Hugh Dickins
References: <20221208114137.35035-1-david@redhat.com> <83259ac7-1aa4-e186-43d9-2b280795e510@redhat.com>
In-Reply-To: <83259ac7-1aa4-e186-43d9-2b280795e510@redhat.com>
Organization: Red Hat

On 08.12.22 12:45, David Hildenbrand wrote:
> On 08.12.22 12:41, David Hildenbrand wrote:
>> Currently, we don't enable writenotify when enabling userfaultfd-wp on
>> a shared writable mapping (for now only shmem and hugetlb). The consequence
>> is that vma->vm_page_prot will still include write permissions, to be set
>> as default for all PTEs that get remapped (e.g., mprotect(), NUMA hinting,
>> page migration, ...).
>>
>> So far, vma->vm_page_prot is assumed to be a safe default, meaning that
>> we only add permissions (e.g., mkwrite) but not remove permissions (e.g.,
>> wrprotect).
>> For example, when enabling softdirty tracking, we enable
>> writenotify. With uffd-wp on shared mappings, that changed. More details
>> on vma->vm_page_prot semantics were summarized in [1].
>>
>> This is problematic for uffd-wp: we'd have to manually check for
>> uffd-wp PTEs/PMDs and manually write-protect PTEs/PMDs, which is error
>> prone. Prone to such issues is any code that uses vma->vm_page_prot to set
>> PTE permissions: primarily pte_modify() and mk_pte().
>>
>> Instead, let's enable writenotify such that PTEs/PMDs/... will be mapped
>> write-protected as default and we will only allow selected PTEs that are
>> definitely safe to be mapped without write-protection (see
>> can_change_pte_writable()) to be writable. In the future, we might want
>> to enable write-bit recovery -- e.g., can_change_pte_writable() -- at
>> more locations, for example, also when removing uffd-wp protection.
>>
>> This fixes two known cases:
>>
>> (a) remove_migration_pte() mapping uffd-wp'ed PTEs writable, resulting
>>     in uffd-wp not triggering on write access.
>> (b) do_numa_page() / do_huge_pmd_numa_page() mapping uffd-wp'ed PTEs/PMDs
>>     writable, resulting in uffd-wp not triggering on write access.
>>
>> Note that do_numa_page() / do_huge_pmd_numa_page() can be reached even
>> without NUMA hinting (which currently doesn't seem to be applicable to
>> shmem), for example, by using uffd-wp with a PROT_WRITE shmem VMA.
>> On such a VMA, userfaultfd-wp is currently non-functional.
>>
>> Note that when enabling userfaultfd-wp, there is no need to walk page
>> tables to enforce the new default protection for the PTEs: we know that
>> they cannot be uffd-wp'ed yet, because that can only happen after
>> enabling uffd-wp for the VMA in general.
>>
>> Also note that this makes mprotect() on ranges with uffd-wp'ed PTEs not
>> accidentally set the write bit -- which would result in uffd-wp not
>> triggering on later write access.
>> This commit makes uffd-wp on shmem behave
>> just like uffd-wp on anonymous memory (iow, less special) in that regard,
>> even though mixing mprotect with uffd-wp is controversial.
>>
>> [1] https://lkml.kernel.org/r/92173bad-caa3-6b43-9d1e-9a471fdbc184@redhat.com
>>
>> Reported-by: Ives van Hoorne
>> Debugged-by: Peter Xu
>> Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
>> Cc: stable@vger.kernel.org
>> Cc: Andrew Morton
>> Cc: Hugh Dickins
>
> No idea how a wrong mail address from Hugh sneaked in (I assume,
> copy-paste issue from de1ccfb64824). Let's properly cc him and keep the
> full patch. This time really ;)
>
>> Cc: Alistair Popple
>> Cc: Mike Rapoport
>> Cc: Nadav Amit
>> Cc: Andrea Arcangeli
>> Signed-off-by: David Hildenbrand
>> ---
>>
>> As discussed in [2], this is supposed to replace the fix by Peter:
>>   [PATCH v3 1/2] mm/migrate: Fix read-only page got writable when recover
>>   pte
>>
>> This survives vm/selftests and my reproducers:
>> * migrating pages that are uffd-wp'ed using mbind() on a machine with 2
>>   NUMA nodes
>> * Using a PROT_WRITE mapping with uffd-wp
>> * Using a PROT_READ|PROT_WRITE mapping with uffd-wp'ed pages and
>>   mprotect()'ing it PROT_WRITE
>> * Using a PROT_READ|PROT_WRITE mapping with uffd-wp'ed pages and
>>   temporarily mprotect()'ing it PROT_READ
>>
>> uffd-wp properly triggers in all cases. On v6.1-rc8, all my reproducers
>> fail.
>>
>> It would be good to get some more testing feedback and review.
>>
>> [2] https://lkml.kernel.org/r/20221202122748.113774-1-david@redhat.com
>>
>> ---
>>  fs/userfaultfd.c | 28 ++++++++++++++++++++++------
>>  mm/mmap.c        |  4 ++++
>>  2 files changed, 26 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
>> index 98ac37e34e3d..fb0733f2e623 100644
>> --- a/fs/userfaultfd.c
>> +++ b/fs/userfaultfd.c
>> @@ -108,6 +108,21 @@ static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx)
>>  	return ctx->features & UFFD_FEATURE_INITIALIZED;
>>  }
>>
>> +static void userfaultfd_set_vm_flags(struct vm_area_struct *vma,
>> +				     vm_flags_t flags)
>> +{
>> +	const bool uffd_wp = !!((vma->vm_flags | flags) & VM_UFFD_WP);
>> +
>> +	vma->vm_flags = flags;
>> +	/*
>> +	 * For shared mappings, we want to enable writenotify while
>> +	 * userfaultfd-wp is enabled (see vma_wants_writenotify()). We'll simply
>> +	 * recalculate vma->vm_page_prot whenever userfaultfd-wp is involved.
>> +	 */
>> +	if ((vma->vm_flags & VM_SHARED) && uffd_wp)
>> +		vma_set_page_prot(vma);
>> +}
>> +
>>  static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
>>  				     int wake_flags, void *key)
>>  {
>> @@ -618,7 +633,8 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
>>  		for_each_vma(vmi, vma) {
>>  			if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) {
>>  				vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
>> -				vma->vm_flags &= ~__VM_UFFD_FLAGS;
>> +				userfaultfd_set_vm_flags(vma,
>> +							 vma->vm_flags & ~__VM_UFFD_FLAGS);
>>  			}
>>  		}
>>  		mmap_write_unlock(mm);
>> @@ -652,7 +668,7 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs)
>>  	octx = vma->vm_userfaultfd_ctx.ctx;
>>  	if (!octx || !(octx->features & UFFD_FEATURE_EVENT_FORK)) {
>>  		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
>> -		vma->vm_flags &= ~__VM_UFFD_FLAGS;
>> +		userfaultfd_set_vm_flags(vma, vma->vm_flags & ~__VM_UFFD_FLAGS);
>>  		return 0;
>>  	}
>>
>> @@ -733,7 +749,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma,
>>  	} else {
>>  		/* Drop uffd context if remap feature not enabled */
>>  		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
>> -		vma->vm_flags &= ~__VM_UFFD_FLAGS;
>> +		userfaultfd_set_vm_flags(vma, vma->vm_flags & ~__VM_UFFD_FLAGS);
>>  	}
>>  }
>>
>> @@ -895,7 +911,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
>>  			prev = vma;
>>  		}
>>
>> -		vma->vm_flags = new_flags;
>> +		userfaultfd_set_vm_flags(vma, new_flags);
>>  		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
>>  	}
>>  	mmap_write_unlock(mm);
>> @@ -1463,7 +1479,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>>  		 * the next vma was merged into the current one and
>>  		 * the current one has not been updated yet.
>>  		 */
>> -		vma->vm_flags = new_flags;
>> +		userfaultfd_set_vm_flags(vma, new_flags);
>>  		vma->vm_userfaultfd_ctx.ctx = ctx;
>>
>>  		if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma))
>> @@ -1651,7 +1667,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>>  		 * the next vma was merged into the current one and
>>  		 * the current one has not been updated yet.
>>  		 */
>> -		vma->vm_flags = new_flags;
>> +		userfaultfd_set_vm_flags(vma, new_flags);
>>  		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
>>
>>  	skip:
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index a5eb2f175da0..6033d20198b0 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -1525,6 +1525,10 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
>>  	if (vma_soft_dirty_enabled(vma) && !is_vm_hugetlb_page(vma))
>>  		return 1;
>>
>> +	/* Do we need write faults for uffd-wp tracking? */
>> +	if (userfaultfd_wp(vma))
>> +		return 1;
>> +
>>  	/* Specialty mapping? */
>>  	if (vm_flags & VM_PFNMAP)
>>  		return 0;
>>
>> base-commit: 8ed710da2873c2aeb3bb805864a699affaf1d03b
>

-- 
Thanks,

David / dhildenb
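
For illustration only, here is a minimal userspace sketch of the behavior the quoted patch describes. It is a toy model and not kernel code: the toy_ names, the flag values and the boolean that stands in for the write permission in vma->vm_page_prot are all assumptions made for this example, and the real vma_wants_writenotify() checks considerably more. It tries to show why the helper looks at both the old and the new flags: the default protection has to be recomputed when uffd-wp gets enabled and again when it gets dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model only; not the kernel's. */
#define TOY_VM_WRITE   0x1u
#define TOY_VM_SHARED  0x2u
#define TOY_VM_UFFD_WP 0x4u

struct toy_vma {
	unsigned int vm_flags;
	bool prot_writable;	/* stands in for write permission in vm_page_prot */
};

/* Heavily simplified stand-in for vma_wants_writenotify(). */
static bool toy_wants_writenotify(const struct toy_vma *vma)
{
	const unsigned int sw = TOY_VM_WRITE | TOY_VM_SHARED;

	/* Only shared writable mappings are candidates. */
	if ((vma->vm_flags & sw) != sw)
		return false;
	/* The check modeled after the mm/mmap.c hunk. */
	return (vma->vm_flags & TOY_VM_UFFD_WP) != 0;
}

/* Stand-in for vma_set_page_prot(): derive the default PTE protection. */
static void toy_set_page_prot(struct toy_vma *vma)
{
	const unsigned int sw = TOY_VM_WRITE | TOY_VM_SHARED;
	bool writable = (vma->vm_flags & sw) == sw;

	if (toy_wants_writenotify(vma))
		writable = false;	/* map write-protected by default */
	vma->prot_writable = writable;
}

/* Modeled after userfaultfd_set_vm_flags() from the quoted patch. */
static void toy_set_vm_flags(struct toy_vma *vma, unsigned int flags)
{
	/* True if uffd-wp is set before or after the update. */
	const bool uffd_wp = ((vma->vm_flags | flags) & TOY_VM_UFFD_WP) != 0;

	vma->vm_flags = flags;
	if ((vma->vm_flags & TOY_VM_SHARED) && uffd_wp)
		toy_set_page_prot(vma);
}

/* Usage: register uffd-wp on a shared writable VMA, then unregister. */
static void toy_demo(void)
{
	struct toy_vma vma = { .vm_flags = TOY_VM_WRITE | TOY_VM_SHARED };

	toy_set_page_prot(&vma);
	assert(vma.prot_writable);	/* plain shared writable mapping */

	toy_set_vm_flags(&vma, vma.vm_flags | TOY_VM_UFFD_WP);
	assert(!vma.prot_writable);	/* default is now write-protected */

	toy_set_vm_flags(&vma, vma.vm_flags & ~TOY_VM_UFFD_WP);
	assert(vma.prot_writable);	/* restored once uffd-wp is gone */
}
```

In this model, code that rebuilds a PTE from the VMA default would no longer hand out write permission while uffd-wp is enabled, which is the property cases (a) and (b) in the quoted patch rely on.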