linux-mm.kvack.org archive mirror
From: Ryan Roberts <ryan.roberts@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Christoph Hellwig <hch@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Alexandre Ghiti <alexghiti@rivosinc.com>,
	Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v2 07/14] arm64/mm: Avoid barriers for invalid or userspace mappings
Date: Mon, 17 Feb 2025 14:07:59 +0000	[thread overview]
Message-ID: <20250217140809.1702789-8-ryan.roberts@arm.com> (raw)
In-Reply-To: <20250217140809.1702789-1-ryan.roberts@arm.com>

__set_pte_complete(), set_pmd(), set_pud(), set_p4d() and set_pgd() are
used to write entries into pgtables. They issue barriers (currently dsb
and isb) to ensure that the written values are observed by the table
walker prior to any program-order-future memory access to the mapped
location.

Over the years some of these functions have been optimized: commit
7f0b1bf04511 ("arm64: Fix barriers used for page table modifications")
made set_pte() (now __set_pte_complete()) emit the barriers only for
valid kernel mappings, and commit 0795edaf3f1f ("arm64: pgtable:
Implement p[mu]d_valid() and check in set_p[mu]d()") made
set_pmd()/set_pud() emit them only for valid mappings.
set_p4d()/set_pgd() still emit the barriers unconditionally.

This is all very confusing to the casual observer; surely the rules
should be invariant to the level? Let's change this so that every level
consistently emits the barriers only when setting valid, non-user
entries (both table and leaf).

It seems obvious that if it is ok to elide barriers for all but valid
kernel mappings at pte level, it must also be ok to do this for leaf
entries at other levels: if an entry is being set to invalid, a TLB
maintenance operation must surely follow to synchronise the TLB, and
that operation carries the required barriers. If a valid user mapping is
being set, the previous mapping must have been invalid and a TLB
maintenance operation (complete with barriers) must have occurred to
honour break-before-make. So the worst that can happen is that we take
an extra fault (which implies the DSB + ISB) and conclude that there is
nothing to do. These are the arguments for doing this optimization at
pte level, and they apply equally to leaf mappings at other levels.

For table entries, the same arguments hold: if a table entry is being
unset, TLB maintenance is required and will emit the required barriers.
If a table entry is being set, the previous value must have been invalid
and the table walker must already be able to observe that. Additionally,
the contents of the pgtable pointed to by the newly set entry must be
visible before the entry is written; this is enforced via smp_wmb()
(dmb) in the pgtable allocation functions and in
__split_huge_pmd_locked(). That last part could never have been enforced
by the barriers in set_pXd() anyway, because they occur after updating
the entry. So ultimately, the worst that can happen by eliding these
barriers for user table entries is an extra fault.

I observe roughly the same number of page faults (107M) with and without
this change when compiling the kernel on Apple M2.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 34 ++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index e4b1946b261f..51128c2956f8 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -767,6 +767,19 @@ static inline bool in_swapper_pgdir(void *addr)
 	        ((unsigned long)swapper_pg_dir & PAGE_MASK);
 }
 
+static inline bool pmd_valid_not_user(pmd_t pmd)
+{
+	/*
+	 * User-space table entries always have (PXN && !UXN). All other
+	 * combinations indicate it's a table entry for kernel space.
+	 * Valid-not-user leaf entries follow the same rules as
+	 * pte_valid_not_user().
+	 */
+	if (pmd_table(pmd))
+		return !((pmd_val(pmd) & (PMD_TABLE_PXN | PMD_TABLE_UXN)) == PMD_TABLE_PXN);
+	return pte_valid_not_user(pmd_pte(pmd));
+}
+
 static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
 #ifdef __PAGETABLE_PMD_FOLDED
@@ -778,7 +791,7 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 
 	WRITE_ONCE(*pmdp, pmd);
 
-	if (pmd_valid(pmd)) {
+	if (pmd_valid_not_user(pmd)) {
 		dsb(ishst);
 		isb();
 	}
@@ -833,6 +846,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 #define pud_valid(pud)		pte_valid(pud_pte(pud))
 #define pud_user(pud)		pte_user(pud_pte(pud))
 #define pud_user_exec(pud)	pte_user_exec(pud_pte(pud))
+#define pud_valid_not_user(pud)	pmd_valid_not_user(pte_pmd(pud_pte(pud)))
 
 static inline bool pgtable_l4_enabled(void);
 
@@ -845,7 +859,7 @@ static inline void set_pud(pud_t *pudp, pud_t pud)
 
 	WRITE_ONCE(*pudp, pud);
 
-	if (pud_valid(pud)) {
+	if (pud_valid_not_user(pud)) {
 		dsb(ishst);
 		isb();
 	}
@@ -916,6 +930,7 @@ static inline bool mm_pud_folded(const struct mm_struct *mm)
 #define p4d_none(p4d)		(pgtable_l4_enabled() && !p4d_val(p4d))
 #define p4d_bad(p4d)		(pgtable_l4_enabled() && !(p4d_val(p4d) & P4D_TABLE_BIT))
 #define p4d_present(p4d)	(!p4d_none(p4d))
+#define p4d_valid_not_user(p4d)	pmd_valid_not_user(pte_pmd(p4d_pte(p4d)))
 
 static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
 {
@@ -925,8 +940,11 @@ static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
 	}
 
 	WRITE_ONCE(*p4dp, p4d);
-	dsb(ishst);
-	isb();
+
+	if (p4d_valid_not_user(p4d)) {
+		dsb(ishst);
+		isb();
+	}
 }
 
 static inline void p4d_clear(p4d_t *p4dp)
@@ -1043,6 +1061,7 @@ static inline bool mm_p4d_folded(const struct mm_struct *mm)
 #define pgd_none(pgd)		(pgtable_l5_enabled() && !pgd_val(pgd))
 #define pgd_bad(pgd)		(pgtable_l5_enabled() && !(pgd_val(pgd) & PGD_TABLE_BIT))
 #define pgd_present(pgd)	(!pgd_none(pgd))
+#define pgd_valid_not_user(pgd)	pmd_valid_not_user(pte_pmd(pgd_pte(pgd)))
 
 static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
 {
@@ -1052,8 +1071,11 @@ static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
 	}
 
 	WRITE_ONCE(*pgdp, pgd);
-	dsb(ishst);
-	isb();
+
+	if (pgd_valid_not_user(pgd)) {
+		dsb(ishst);
+		isb();
+	}
 }
 
 static inline void pgd_clear(pgd_t *pgdp)
-- 
2.43.0



