From: Kevin Brodsky <kevin.brodsky@arm.com>
To: linux-hardening@vger.kernel.org
Cc: linux-kernel@vger.kernel.org,
Kevin Brodsky <kevin.brodsky@arm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andy Lutomirski <luto@kernel.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Hildenbrand <david@redhat.com>,
Ira Weiny <ira.weiny@intel.com>, Jann Horn <jannh@google.com>,
Jeff Xu <jeffxu@chromium.org>, Joey Gouly <joey.gouly@arm.com>,
Kees Cook <kees@kernel.org>,
Linus Walleij <linus.walleij@linaro.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Marc Zyngier <maz@kernel.org>, Mark Brown <broonie@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
Maxwell Bland <mbland@motorola.com>,
"Mike Rapoport (IBM)" <rppt@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Pierre Langlois <pierre.langlois@arm.com>,
Quentin Perret <qperret@google.com>,
Rick Edgecombe <rick.p.edgecombe@intel.com>,
Ryan Roberts <ryan.roberts@arm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Vlastimil Babka <vbabka@suse.cz>, Will Deacon <will@kernel.org>,
Yang Shi <yang@os.amperecomputing.com>,
Yeoreum Yun <yeoreum.yun@arm.com>,
linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
x86@kernel.org
Subject: [PATCH v6 18/30] mm: kpkeys: Introduce early page table allocator
Date: Fri, 27 Feb 2026 17:55:06 +0000
Message-ID: <20260227175518.3728055-19-kevin.brodsky@arm.com>
In-Reply-To: <20260227175518.3728055-1-kevin.brodsky@arm.com>
The kpkeys_hardened_pgtables feature aims to protect all page table
pages (PTPs) by mapping them with a privileged pkey. This is primarily
handled by kpkeys_pgtable_alloc(), called from pagetable_alloc().
However, this does not cover PTPs allocated early, before the
buddy allocator is available. These PTPs are allocated by architecture
code, either 1. from static pools or 2. using the memblock allocator,
and should also be protected.
This patch addresses the second category: PTPs allocated via memblock.
Such PTPs are notably used to create the linear map. Protecting them as
soon as they are allocated would require modifying the linear map while
it is being created, which seems at best difficult. Instead, a simple
allocator is introduced, refilling a cache with memblock and keeping
track of all allocated ranges to set their pkey once it is safe to do
so. PTPs allocated at that stage are not freed, so there is no need
to manage a free list.
The refill size/alignment is the same as for the pkeys block allocator.
For systems that use large block mappings, the same rationale applies
(reducing fragmentation of the linear map). This is also used for other
systems, as this reduces the number of calls to memblock, without much
downside.
The number of PTPs required to create the linear map is proportional to
the amount of available memory, which means it may be large. At that
stage the memblock allocator may however only track a limited number of
regions, and we size our own tracking array (full_ranges) accordingly.
The array may be quite large as a result (16KB on arm64), but it is
discarded once boot has completed (__initdata).
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
The full_ranges array will end up mostly empty on most systems, but
relying on INIT_MEMBLOCK_MEMORY_REGIONS seemed to be the only way to
guarantee that we can track all ranges regardless of the size and layout
of physical memory.
An alternative would be to rebuild the ranges by walking the kernel page
tables in init_late(), but that's arguably at least as complex
(requiring stop_machine()).
---
include/linux/kpkeys.h | 7 ++
mm/kpkeys_hardened_pgtables.c | 165 ++++++++++++++++++++++++++++++++++
2 files changed, 172 insertions(+)
diff --git a/include/linux/kpkeys.h b/include/linux/kpkeys.h
index 8cfeb6e5af56..73b456ecec65 100644
--- a/include/linux/kpkeys.h
+++ b/include/linux/kpkeys.h
@@ -139,6 +139,8 @@ void kpkeys_hardened_pgtables_init(void);
*/
void kpkeys_hardened_pgtables_init_late(void);
+phys_addr_t kpkeys_physmem_pgtable_alloc(void);
+
#else /* CONFIG_KPKEYS_HARDENED_PGTABLES */
static inline bool kpkeys_hardened_pgtables_enabled(void)
@@ -167,6 +169,11 @@ static inline void kpkeys_hardened_pgtables_init(void) {}
static inline void kpkeys_hardened_pgtables_init_late(void) {}
+static inline phys_addr_t kpkeys_physmem_pgtable_alloc(void)
+{
+ return 0;
+}
+
#endif /* CONFIG_KPKEYS_HARDENED_PGTABLES */
#endif /* _LINUX_KPKEYS_H */
diff --git a/mm/kpkeys_hardened_pgtables.c b/mm/kpkeys_hardened_pgtables.c
index dcc5e6da7c85..1b649812f474 100644
--- a/mm/kpkeys_hardened_pgtables.c
+++ b/mm/kpkeys_hardened_pgtables.c
@@ -3,6 +3,7 @@
#include <linux/list.h>
#include <linux/highmem.h>
#include <linux/kpkeys.h>
+#include <linux/memblock.h>
#include <linux/memcontrol.h>
#include <linux/mm.h>
#include <linux/mutex.h>
@@ -41,6 +42,9 @@ static bool pba_ready_for_direct_map_split(void);
static void pba_init(void);
static void pba_init_late(void);
+/* pkeys physmem allocator (PPA) - implemented below */
+static void ppa_finalize(void);
+
/* Trivial allocator in case the linear map is PTE-mapped (no block mapping) */
static struct page *noblock_pgtable_alloc(gfp_t gfp)
{
@@ -113,8 +117,14 @@ void __init kpkeys_hardened_pgtables_init_late(void)
if (!arch_kpkeys_enabled())
return;
+ /*
+ * Called first to avoid relying on pba_early_region for splitting
+ * the linear map in the subsequent calls.
+ */
if (pba_enabled())
pba_init_late();
+
+ ppa_finalize();
}
/*
@@ -751,3 +761,158 @@ static int __init pba_init_shrinker(void)
return 0;
}
late_initcall(pba_init_shrinker);
+
+/*
+ * pkeys physmem allocator (PPA): block-based allocator for very early page
+ * tables (especially for creating the linear map), based on memblock. Blocks
+ * are tracked so that their pkey can be set once it is safe to do so.
+ */
+
+/*
+ * We may have to track many ranges when allocating page tables for the linear
+ * map, as their number grows with the amount of available memory. Assuming that
+ * memblock returns contiguous blocks whenever possible, the number of ranges
+ * to track cannot however exceed the number of regions that memblock itself
+ * tracks. memblock_allow_resize() hasn't been called yet at that point, so
+ * that limit is the size of the statically allocated array.
+ */
+#define PHYSMEM_MAX_RANGES INIT_MEMBLOCK_MEMORY_REGIONS
+
+/*
+ * We allocate ranges with the same size and alignment as the maximum refill
+ * size for the regular block allocator, with the same rationale (minimising
+ * splitting and optimising TLB usage).
+ */
+#define PHYSMEM_REFILL_SIZE (PAGE_SIZE << refill_orders[0])
+
+struct physmem_range {
+ phys_addr_t addr;
+ phys_addr_t size;
+};
+
+struct pkeys_physmem_allocator {
+ struct physmem_range free_range;
+
+ struct physmem_range full_ranges[PHYSMEM_MAX_RANGES];
+ unsigned int nr_full_ranges;
+};
+
+static struct pkeys_physmem_allocator pkeys_physmem_allocator __initdata;
+
+static int __init set_pkey_pgtable_phys(phys_addr_t pa, phys_addr_t size)
+{
+ unsigned long addr = (unsigned long)__va(pa);
+ int ret;
+
+ ret = set_memory_pkey(addr, size / PAGE_SIZE, KPKEYS_PKEY_PGTABLES);
+ pr_debug("%s: addr=%pa, size=%pa\n", __func__, &addr, &size);
+
+ WARN_ON(ret);
+ return ret;
+}
+
+static bool __init ppa_try_extend_last_range(phys_addr_t addr, phys_addr_t size)
+{
+ struct pkeys_physmem_allocator *ppa = &pkeys_physmem_allocator;
+ struct physmem_range *range;
+
+ if (!ppa->nr_full_ranges)
+ return false;
+
+ range = &ppa->full_ranges[ppa->nr_full_ranges - 1];
+
+ /* Merge the new range into the last range if they are contiguous */
+ if (addr == range->addr + range->size) {
+ range->size += size;
+ return true;
+ } else if (addr + size == range->addr) {
+ range->addr -= size;
+ range->size += size;
+ return true;
+ }
+
+ return false;
+}
+
+static void __init ppa_register_full_range(phys_addr_t addr)
+{
+ struct pkeys_physmem_allocator *ppa = &pkeys_physmem_allocator;
+ struct physmem_range *range;
+
+ if (!addr)
+ return;
+
+ if (ppa_try_extend_last_range(addr, PHYSMEM_REFILL_SIZE))
+ return;
+
+ /* Could not extend the last range, create a new one */
+ if (WARN_ON(ppa->nr_full_ranges >= PHYSMEM_MAX_RANGES))
+ return;
+
+ range = &ppa->full_ranges[ppa->nr_full_ranges++];
+ range->addr = addr;
+ range->size = PHYSMEM_REFILL_SIZE;
+}
+
+static void __init ppa_refill(void)
+{
+ struct pkeys_physmem_allocator *ppa = &pkeys_physmem_allocator;
+ phys_addr_t size = PHYSMEM_REFILL_SIZE;
+ phys_addr_t addr;
+
+ /*
+ * There should be plenty of contiguous physical memory available this
+ * early during boot, so there should be no need for fallback sizes.
+ */
+ addr = memblock_phys_alloc_range(size, size, 0,
+ MEMBLOCK_ALLOC_NOLEAKTRACE);
+ WARN_ON(!addr);
+
+ pr_debug("%s: addr=%pa\n", __func__, &addr);
+
+ ppa->free_range.addr = addr;
+ ppa->free_range.size = (addr ? size : 0);
+}
+
+static void __init ppa_finalize(void)
+{
+ struct pkeys_physmem_allocator *ppa = &pkeys_physmem_allocator;
+
+ if (ppa->free_range.addr) {
+ struct physmem_range *free_range = &ppa->free_range;
+
+ /* Protect the range that was allocated, and free the rest */
+ set_pkey_pgtable_phys(free_range->addr + free_range->size,
+ PHYSMEM_REFILL_SIZE - free_range->size);
+
+ if (free_range->size)
+ memblock_free_late(free_range->addr, free_range->size);
+
+ free_range->addr = 0;
+ free_range->size = 0;
+ }
+
+ for (unsigned int i = 0; i < ppa->nr_full_ranges; i++) {
+ struct physmem_range *range = &ppa->full_ranges[i];
+
+ set_pkey_pgtable_phys(range->addr, range->size);
+ }
+}
+
+phys_addr_t __init kpkeys_physmem_pgtable_alloc(void)
+{
+ struct pkeys_physmem_allocator *ppa = &pkeys_physmem_allocator;
+
+ if (!ppa->free_range.size) {
+ ppa_register_full_range(ppa->free_range.addr);
+ ppa_refill();
+ }
+
+ if (!ppa->free_range.addr)
+ /* Refilling failed - allocate untracked memory */
+ return memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
+ MEMBLOCK_ALLOC_NOLEAKTRACE);
+
+ ppa->free_range.size -= PAGE_SIZE;
+ return ppa->free_range.addr + ppa->free_range.size;
+}
--
2.51.2