From: Nadav Amit <nadav.amit@gmail.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Mike Rapoport <rppt@linux.ibm.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Nadav Amit <namit@vmware.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Andy Lutomirski <luto@kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Hildenbrand <david@redhat.com>,
Peter Xu <peterx@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
Nick Piggin <npiggin@gmail.com>
Subject: [RFC PATCH 01/14] userfaultfd: set dirty and young on writeprotect
Date: Mon, 18 Jul 2022 05:01:59 -0700
Message-ID: <20220718120212.3180-2-namit@vmware.com>
In-Reply-To: <20220718120212.3180-1-namit@vmware.com>
From: Nadav Amit <namit@vmware.com>

When userfaultfd makes a PTE writable, it can now, in some cases, change
the PTE directly, without triggering a page-fault first. Yet, doing so
might leave the PTE that was write-unprotected as old and clean. At
least on x86, this would cause an overhead of over 500 cycles when the
PTE is first accessed, since the CPU must then set the access and dirty
bits during the page walk.

Use MM_CP_WILL_NEED to set the PTE as young and dirty when userfaultfd
gets a hint that the page is likely to be used. Avoid making the PTE
young and dirty in other cases, to prevent excessive writeback and
interference with the page-reclamation logic.
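
For illustration, a userspace sketch of write-unprotecting a range while
hinting imminent use (uffd, addr and len are assumed to be set up
already; the *_LIKELY mode names are hypothetical stand-ins for the uapi
bits introduced elsewhere in this series, which map to the internal
UFFD_FLAGS_ACCESS_LIKELY/UFFD_FLAGS_WRITE_LIKELY flags):

	#include <err.h>
	#include <sys/ioctl.h>
	#include <linux/userfaultfd.h>

	struct uffdio_writeprotect wp = {
		.range = { .start = (unsigned long)addr, .len = len },
		/*
		 * UFFDIO_WRITEPROTECT_MODE_WP is clear: resolve the
		 * write-protection.  The two hint bits below are
		 * HYPOTHETICAL names used only for this sketch.
		 */
		.mode = UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY |
			UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY,
	};

	if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) == -1)
		err(1, "UFFDIO_WRITEPROTECT");

With either hint set, mwriteprotect_range() ORs MM_CP_WILL_NEED into
cp_flags, so the unprotected PTE can come out young, dirty and writable
without a further fault.
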
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Nick Piggin <npiggin@gmail.com>
---
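
A condensed view of the mprotect change, for orientation (the diff below
is authoritative): mark the PTE young whenever the will-need hint is
set, but dirty only once the PTE is actually made writable, since
dirtying a still-read-only PTE could cause needless writeback:

	if (will_need)
		ptent = pte_mkyoung(ptent);

	if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pte_write(ptent) &&
	    can_change_pte_writable(vma, addr, ptent)) {
		ptent = pte_mkwrite(ptent);
		if (will_need)
			ptent = pte_mkdirty(ptent);	/* only once writable */
	}
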
 include/linux/mm.h | 2 ++
 mm/mprotect.c      | 9 ++++++++-
 mm/userfaultfd.c   | 8 ++++++--
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9cc02a7e503b..4afd75ce5875 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1988,6 +1988,8 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma,
 /* Whether this change is for write protecting */
 #define  MM_CP_UFFD_WP                     (1UL << 2) /* do wp */
 #define  MM_CP_UFFD_WP_RESOLVE             (1UL << 3) /* Resolve wp */
+/* Whether to try to mark entries as dirty as they are to be written */
+#define  MM_CP_WILL_NEED                   (1UL << 4)
 #define  MM_CP_UFFD_WP_ALL                 (MM_CP_UFFD_WP | \
                                             MM_CP_UFFD_WP_RESOLVE)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 996a97e213ad..34c2dfb68c42 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -82,6 +82,7 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
 	bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
 	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+	bool will_need = cp_flags & MM_CP_WILL_NEED;
 
 	tlb_change_page_size(tlb, PAGE_SIZE);
@@ -172,6 +173,9 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
 				ptent = pte_clear_uffd_wp(ptent);
 			}
 
+			if (will_need)
+				ptent = pte_mkyoung(ptent);
+
 			/*
 			 * In some writable, shared mappings, we might want
 			 * to catch actual write access -- see
@@ -187,8 +191,11 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
 			 */
 			if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
 			    !pte_write(ptent) &&
-			    can_change_pte_writable(vma, addr, ptent))
+			    can_change_pte_writable(vma, addr, ptent)) {
 				ptent = pte_mkwrite(ptent);
+				if (will_need)
+					ptent = pte_mkdirty(ptent);
+			}
 
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			if (pte_needs_flush(oldpte, ptent))
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 954c6980b29f..e0492f5f06a0 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -749,6 +749,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 	bool enable_wp = uffd_flags & UFFD_FLAGS_WP;
 	struct vm_area_struct *dst_vma;
 	unsigned long page_mask;
+	unsigned long cp_flags;
 	struct mmu_gather tlb;
 	pgprot_t newprot;
 	int err;
@@ -795,9 +796,12 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 	else
 		newprot = vm_get_page_prot(dst_vma->vm_flags);
 
+	cp_flags = enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE;
+	if (uffd_flags & (UFFD_FLAGS_ACCESS_LIKELY|UFFD_FLAGS_WRITE_LIKELY))
+		cp_flags |= MM_CP_WILL_NEED;
+
 	tlb_gather_mmu(&tlb, dst_mm);
-	change_protection(&tlb, dst_vma, start, start + len, newprot,
-			  enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE);
+	change_protection(&tlb, dst_vma, start, start + len, newprot, cp_flags);
 	tlb_finish_mmu(&tlb);
 
 	err = 0;
--
2.25.1