From: Brendan Jackman <jackmanb@google.com>
To: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Andy Lutomirski <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Alexandre Chartre <alexandre.chartre@oracle.com>,
Liran Alon <liran.alon@oracle.com>,
Jan Setje-Eilers <jan.setjeeilers@oracle.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@suse.de>,
Lorenzo Stoakes <lstoakes@gmail.com>,
David Hildenbrand <david@redhat.com>,
Vlastimil Babka <vbabka@suse.cz>,
Michal Hocko <mhocko@kernel.org>,
Khalid Aziz <khalid.aziz@oracle.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Valentin Schneider <vschneid@redhat.com>,
Paul Turner <pjt@google.com>, Reiji Watanabe <reijiw@google.com>,
Junaid Shahid <junaids@google.com>,
Ofir Weisse <oweisse@google.com>,
Yosry Ahmed <yosryahmed@google.com>,
Patrick Bellasi <derkling@google.com>,
KP Singh <kpsingh@google.com>,
Alexandra Sandulescu <aesa@google.com>,
Matteo Rizzo <matteorizzo@google.com>,
Jann Horn <jannh@google.com>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kvm@vger.kernel.org, Brendan Jackman <jackmanb@google.com>
Subject: [PATCH 17/26] mm: asi: Map kernel text and static data as nonsensitive
Date: Fri, 12 Jul 2024 17:00:35 +0000
Message-ID: <20240712-asi-rfc-24-v1-17-144b319a40d8@google.com>
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>

Basically we need to map the kernel code and all its static variables.
Per-CPU variables need to be treated specially, as described in the
comments. The cpu_entry_area is similar: it needs to be nonsensitive
so that the CPU can access the GDT etc. when handling a page fault.
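For illustration only (nothing below is part of this patch; the
variable and function names are made up, and asi_map_percpu is static
to asi.c so this is shown as if it lived in that file): a static
percpu variable has one image copy at its linker address plus one live
copy per possible CPU, so mapping just that variable nonsensitive
would go through asi_map_percpu() rather than plain asi_map():

    static DEFINE_PER_CPU(u64, asi_example_counter);

    static int __init asi_map_example(void)
    {
            /*
             * asi_map_percpu() resolves each CPU's live copy via
             * per_cpu_ptr(), instead of only covering the linker-placed
             * image between __per_cpu_start and __per_cpu_end.
             */
            return asi_map_percpu(ASI_GLOBAL_NONSENSITIVE,
                                  &asi_example_counter,
                                  sizeof(asi_example_counter));
    }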
Under 5-level paging, most of the kernel memory comes under a single PGD
entry (see Documentation/x86/x86_64/mm.rst; basically, the mapping for
this big region is the same as under 4-level paging, just wrapped in an
outer PGD entry). For that region, the "clone" logic is moved down one
step of the paging hierarchy.
Note that the p4d_alloc in asi_clone_p4d won't actually be used in
practice; the relevant PGD entry will always have been populated by
prior asi_map calls, so this code would "work" if we just wrote
p4d_offset (but asi_clone_p4d would be broken if viewed in isolation).
The vmemmap area is not under this single PGD entry; it has its own
2-PGD area, so we still use asi_clone_pgd for that one.
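(As a rough sanity check of that corollary, using the usual 4-level
numbers, which this patch does not restate: sizeof(struct page) is 64
bytes, PAGE_SIZE is 4 KiB, MAXMEM is 2^46 bytes and a PGD entry covers
512 GiB, so the vmemmap needs 2^46 / 2^12 * 2^6 = 2^40 bytes = 1 TiB,
i.e. exactly two PGD entries.)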
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/x86/mm/asi.c | 106 +++++++++++++++++++++++++++++++++++++-
include/asm-generic/vmlinux.lds.h | 11 ++++
2 files changed, 116 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 6e106f25abbb..891b8d351df8 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -7,8 +7,8 @@
#include <linux/init.h>
#include <linux/pgtable.h>
-#include <asm/asi.h>
#include <asm/cmdline.h>
+#include <asm/page.h>
#include <asm/pgalloc.h>
#include <asm/mmu_context.h>
#include <asm/traps.h>
@@ -184,8 +184,68 @@ void __init asi_check_boottime_disable(void)
pr_info("ASI enablement ignored due to incomplete implementation.\n");
}
+/*
+ * Map data by sharing sub-PGD pagetables with the unrestricted mapping. This is
+ * more efficient than asi_map, but only works when you know the whole top-level
+ * page needs to be mapped in the restricted tables. Note that the size of the
+ * mappings this creates differs between 4 and 5-level paging.
+ */
+static void asi_clone_pgd(pgd_t *dst_table, pgd_t *src_table, size_t addr)
+{
+	pgd_t *src = pgd_offset_pgd(src_table, addr);
+	pgd_t *dst = pgd_offset_pgd(dst_table, addr);
+
+	if (!pgd_val(*dst))
+		set_pgd(dst, *src);
+	else
+		WARN_ON_ONCE(pgd_val(*dst) != pgd_val(*src));
+}
+
+/*
+ * For 4-level paging this is exactly the same as asi_clone_pgd. For 5-level
+ * paging it clones one level lower. So this always creates a mapping of the
+ * same size.
+ */
+static void asi_clone_p4d(pgd_t *dst_table, pgd_t *src_table, size_t addr)
+{
+	pgd_t *src_pgd = pgd_offset_pgd(src_table, addr);
+	pgd_t *dst_pgd = pgd_offset_pgd(dst_table, addr);
+	p4d_t *src_p4d = p4d_alloc(&init_mm, src_pgd, addr);
+	p4d_t *dst_p4d = p4d_alloc(&init_mm, dst_pgd, addr);
+
+	if (!p4d_val(*dst_p4d))
+		set_p4d(dst_p4d, *src_p4d);
+	else
+		WARN_ON_ONCE(p4d_val(*dst_p4d) != p4d_val(*src_p4d));
+}
+
+/*
+ * percpu_addr is where the linker put the percpu variable. asi_map_percpu finds
+ * the place where the percpu allocator copied the data during boot.
+ *
+ * This is necessary even when the page allocator defaults to
+ * global-nonsensitive, because the percpu allocator uses the memblock allocator
+ * for early allocations.
+ */
+static int asi_map_percpu(struct asi *asi, void *percpu_addr, size_t len)
+{
+	int cpu, err;
+	void *ptr;
+
+	for_each_possible_cpu(cpu) {
+		ptr = per_cpu_ptr(percpu_addr, cpu);
+		err = asi_map(asi, ptr, len);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
static int __init asi_global_init(void)
{
+	int err;
+
if (!boot_cpu_has(X86_FEATURE_ASI))
return 0;
@@ -205,6 +265,46 @@ static int __init asi_global_init(void)
VMALLOC_START, VMALLOC_END,
"ASI Global Non-sensitive vmalloc");
+	/* Map all kernel text and static data */
+	err = asi_map(ASI_GLOBAL_NONSENSITIVE, (void *)__START_KERNEL,
+		      (size_t)_end - __START_KERNEL);
+	if (WARN_ON(err))
+		return err;
+	err = asi_map(ASI_GLOBAL_NONSENSITIVE, (void *)FIXADDR_START,
+		      FIXADDR_SIZE);
+	if (WARN_ON(err))
+		return err;
+	/* Map all static percpu data */
+	err = asi_map_percpu(
+		ASI_GLOBAL_NONSENSITIVE,
+		__per_cpu_start, __per_cpu_end - __per_cpu_start);
+	if (WARN_ON(err))
+		return err;
+
+	/*
+	 * The next areas are mapped using shared sub-P4D paging structures
+	 * (asi_clone_p4d instead of asi_map), since we know the whole P4D will
+	 * be mapped.
+	 */
+	asi_clone_p4d(asi_global_nonsensitive_pgd, init_mm.pgd,
+		      CPU_ENTRY_AREA_BASE);
+#ifdef CONFIG_X86_ESPFIX64
+	asi_clone_p4d(asi_global_nonsensitive_pgd, init_mm.pgd,
+		      ESPFIX_BASE_ADDR);
+#endif
+	/*
+	 * The vmemmap area actually _must_ be cloned via shared paging
+	 * structures, since mappings can potentially change dynamically when
+	 * hugetlbfs pages are created or broken down.
+	 *
+	 * We always clone 2 PGDs; this is a corollary of the sizes of struct
+	 * page, a page, and the physical address space.
+	 */
+	WARN_ON(sizeof(struct page) * MAXMEM / PAGE_SIZE != 2 * (1UL << PGDIR_SHIFT));
+	asi_clone_pgd(asi_global_nonsensitive_pgd, init_mm.pgd, VMEMMAP_START);
+	asi_clone_pgd(asi_global_nonsensitive_pgd, init_mm.pgd,
+		      VMEMMAP_START + (1UL << PGDIR_SHIFT));
+
return 0;
}
subsys_initcall(asi_global_init)
@@ -482,6 +582,10 @@ static bool follow_physaddr(
* Map the given range into the ASI page tables. The source of the mapping is
* the regular unrestricted page tables. Can be used to map any kernel memory.
*
+ * In contrast to some internal ASI logic (asi_clone_pgd and asi_clone_p4d),
+ * this never shares pagetables between restricted and unrestricted address
+ * spaces; instead it creates wholly new equivalent mappings.
+ *
* The caller MUST ensure that the source mapping will not change during this
* function. For dynamic kernel memory, this is generally ensured by mapping the
* memory within the allocator.
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index f7749d0f2562..4eca33d62950 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -1021,6 +1021,16 @@
COMMON_DISCARDS \
}
+/*
+ * ASI maps certain sections with certain sensitivity levels, so they need to
+ * have a page-aligned size.
+ */
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+#define ASI_ALIGN() ALIGN(PAGE_SIZE)
+#else
+#define ASI_ALIGN() .
+#endif
+
/**
* PERCPU_INPUT - the percpu input sections
* @cacheline: cacheline size
@@ -1042,6 +1052,7 @@
*(.data..percpu) \
*(.data..percpu..shared_aligned) \
PERCPU_DECRYPTED_SECTION \
+ . = ASI_ALIGN(); \
__per_cpu_end = .;
/**
--
2.45.2.993.g49e7a77208-goog