From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Ives van Hoorne <ives@codesandbox.io>,
stable@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hugh@veritas.com>,
Alistair Popple <apopple@nvidia.com>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
Nadav Amit <nadav.amit@gmail.com>,
Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [PATCH v1] mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
Date: Thu, 8 Dec 2022 17:44:35 +0100
Message-ID: <b9162f04-7d8e-1ada-f428-85fd84327d1c@redhat.com>
In-Reply-To: <Y5IQzJkBSYwPOtiP@x1n>
On 08.12.22 17:29, Peter Xu wrote:
> On Thu, Dec 08, 2022 at 12:41:37PM +0100, David Hildenbrand wrote:
>> Currently, we don't enable writenotify when enabling userfaultfd-wp on
>> a shared writable mapping (for now only shmem and hugetlb). The consequence
>> is that vma->vm_page_prot will still include write permissions, to be set
>> as default for all PTEs that get remapped (e.g., mprotect(), NUMA hinting,
>> page migration, ...).
>>
>> So far, vma->vm_page_prot is assumed to be a safe default, meaning that
>> we only add permissions (e.g., mkwrite) but not remove permissions (e.g.,
>> wrprotect). For example, when enabling softdirty tracking, we enable
>> writenotify. With uffd-wp on shared mappings, that changed. More details
>> on vma->vm_page_prot semantics were summarized in [1].
>>
>> This is problematic for uffd-wp: we'd have to manually check for
>> uffd-wp PTEs/PMDs and manually write-protect PTEs/PMDs, which is error
>> prone. Prone to such issues is any code that uses vma->vm_page_prot to set
>> PTE permissions: primarily pte_modify() and mk_pte().
>>
>> Instead, let's enable writenotify such that PTEs/PMDs/... will be mapped
>> write-protected as default and we will only allow selected PTEs that are
>> definitely safe to be mapped without write-protection (see
>> can_change_pte_writable()) to be writable. In the future, we might want
>> to enable write-bit recovery -- e.g., can_change_pte_writable() -- at
>> more locations, for example, also when removing uffd-wp protection.
>>
>> This fixes two known cases:
>>
>> (a) remove_migration_pte() mapping uffd-wp'ed PTEs writable, resulting
>> in uffd-wp not triggering on write access.
>> (b) do_numa_page() / do_huge_pmd_numa_page() mapping uffd-wp'ed PTEs/PMDs
>> writable, resulting in uffd-wp not triggering on write access.
>>
>> Note that do_numa_page() / do_huge_pmd_numa_page() can be reached even
>> without NUMA hinting (which currently doesn't seem to be applicable to
>> shmem), for example, by using uffd-wp with a PROT_WRITE shmem VMA.
>> On such a VMA, userfaultfd-wp is currently non-functional.
>>
>> Note that when enabling userfaultfd-wp, there is no need to walk page
>> tables to enforce the new default protection for the PTEs: we know that
>> they cannot be uffd-wp'ed yet, because that can only happen after
>> enabling uffd-wp for the VMA in general.
>>
>> Also note that this makes mprotect() on ranges with uffd-wp'ed PTEs not
>> accidentally set the write bit -- which would result in uffd-wp not
>> triggering on later write access. This commit makes uffd-wp on shmem behave
>> just like uffd-wp on anonymous memory (iow, less special) in that regard,
>> even though mixing mprotect() with uffd-wp is controversial.
>>
>> [1] https://lkml.kernel.org/r/92173bad-caa3-6b43-9d1e-9a471fdbc184@redhat.com
>>
>> Reported-by: Ives van Hoorne <ives@codesandbox.io>
>> Debugged-by: Peter Xu <peterx@redhat.com>
>> Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
>> Cc: stable@vger.kernel.org
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Hugh Dickins <hugh@veritas.com>
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
>> Cc: Nadav Amit <nadav.amit@gmail.com>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Acked-by: Peter Xu <peterx@redhat.com>
>
> One trivial nit.
>
>> ---
>>
>> As discussed in [2], this is supposed to replace the fix by Peter:
>> [PATCH v3 1/2] mm/migrate: Fix read-only page got writable when recover
>> pte
>>
>> This survives vm/selftests and my reproducers:
>> * migrating pages that are uffd-wp'ed using mbind() on a machine with 2
>> NUMA nodes
>> * Using a PROT_WRITE mapping with uffd-wp
>> * Using a PROT_READ|PROT_WRITE mapping with uffd-wp'ed pages and
>> mprotect()'ing it PROT_WRITE
>> * Using a PROT_READ|PROT_WRITE mapping with uffd-wp'ed pages and
>> temporarily mprotect()'ing it PROT_READ
>>
>> uffd-wp properly triggers in all cases. On v6.1-rc8, all my reproducers
>> fail.
>>
>> It would be good to get some more testing feedback and review.
>>
>> [2] https://lkml.kernel.org/r/20221202122748.113774-1-david@redhat.com
>>
>> ---
>> fs/userfaultfd.c | 28 ++++++++++++++++++++++------
>> mm/mmap.c | 4 ++++
>> 2 files changed, 26 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
>> index 98ac37e34e3d..fb0733f2e623 100644
>> --- a/fs/userfaultfd.c
>> +++ b/fs/userfaultfd.c
>> @@ -108,6 +108,21 @@ static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx)
>> return ctx->features & UFFD_FEATURE_INITIALIZED;
>> }
>>
>> +static void userfaultfd_set_vm_flags(struct vm_area_struct *vma,
>> + vm_flags_t flags)
>> +{
>> + const bool uffd_wp = !!((vma->vm_flags | flags) & VM_UFFD_WP);
>
> IIUC this can be "uffd_wp_changed" then switch "|" to "^". But not a hot
> path at all, so shouldn't matter a lot.
Yes, let's do that (we can also remove the !! here):
This hunk will be:
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 98ac37e34e3d..a988485ada05 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -108,6 +108,21 @@ static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx)
 	return ctx->features & UFFD_FEATURE_INITIALIZED;
 }
 
+static void userfaultfd_set_vm_flags(struct vm_area_struct *vma,
+				     vm_flags_t flags)
+{
+	const bool uffd_wp_changed = (vma->vm_flags ^ flags) & VM_UFFD_WP;
+
+	vma->vm_flags = flags;
+	/*
+	 * For shared mappings, we want to enable writenotify while
+	 * userfaultfd-wp is enabled (see vma_wants_writenotify()). We'll simply
+	 * recalculate vma->vm_page_prot whenever userfaultfd-wp changes.
+	 */
+	if ((vma->vm_flags & VM_SHARED) && uffd_wp_changed)
+		vma_set_page_prot(vma);
+}
+
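
To make the "^" vs. "|" difference concrete, here is a minimal userspace
sketch of the helper's logic. It is only a toy model: the toy_* names, the
flag values and the recalcs counter are hypothetical stand-ins and not the
kernel's definitions, and toy_set_page_prot() merely assumes that a shared
mapping with uffd-wp enabled loses the write bit in its default protection.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these values are NOT the kernel's definitions. */
#define TOY_VM_WRITE   0x1u
#define TOY_VM_SHARED  0x2u
#define TOY_VM_UFFD_WP 0x4u
#define TOY_PROT_WRITE 0x1u

struct toy_vma {
	unsigned int vm_flags;
	unsigned int vm_page_prot;	/* default protection for remapped PTEs */
	int recalcs;			/* counts toy_set_page_prot() calls */
};

/* Toy writenotify rule (assumption): shared + uffd-wp drops the write bit. */
static void toy_set_page_prot(struct toy_vma *vma)
{
	unsigned int prot = (vma->vm_flags & TOY_VM_WRITE) ? TOY_PROT_WRITE : 0;

	if ((vma->vm_flags & TOY_VM_SHARED) && (vma->vm_flags & TOY_VM_UFFD_WP))
		prot &= ~TOY_PROT_WRITE;
	vma->vm_page_prot = prot;
	vma->recalcs++;
}

static void toy_set_vm_flags(struct toy_vma *vma, unsigned int flags)
{
	/* XOR: true only if the UFFD_WP bit differs between old and new flags. */
	const bool uffd_wp_changed = (vma->vm_flags ^ flags) & TOY_VM_UFFD_WP;

	vma->vm_flags = flags;
	if ((vma->vm_flags & TOY_VM_SHARED) && uffd_wp_changed)
		toy_set_page_prot(vma);
}
```

In this model, enabling or disabling the toy UFFD_WP bit on a shared VMA
recalculates the default protection exactly once, while re-applying
unchanged flags does nothing. With "|" instead of "^", the condition would
also be true whenever the bit is set in either the old or the new flags, so
the recalculation would additionally run on calls that do not change uffd-wp
at all (harmless, but redundant).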
I'll wait for some more feedback (and retest) before I resend tomorrow.
Thanks!
--
Thanks,
David / dhildenb