From: Brendan Jackman <jackmanb@google.com>
To: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Andy Lutomirski <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Alexandre Chartre <alexandre.chartre@oracle.com>,
Liran Alon <liran.alon@oracle.com>,
Jan Setje-Eilers <jan.setjeeilers@oracle.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@suse.de>,
Lorenzo Stoakes <lstoakes@gmail.com>,
David Hildenbrand <david@redhat.com>,
Vlastimil Babka <vbabka@suse.cz>,
Michal Hocko <mhocko@kernel.org>,
Khalid Aziz <khalid.aziz@oracle.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Valentin Schneider <vschneid@redhat.com>,
Paul Turner <pjt@google.com>, Reiji Watanabe <reijiw@google.com>,
Junaid Shahid <junaids@google.com>,
Ofir Weisse <oweisse@google.com>,
Yosry Ahmed <yosryahmed@google.com>,
Patrick Bellasi <derkling@google.com>,
KP Singh <kpsingh@google.com>,
Alexandra Sandulescu <aesa@google.com>,
Matteo Rizzo <matteorizzo@google.com>,
Jann Horn <jannh@google.com>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kvm@vger.kernel.org, Brendan Jackman <jackmanb@google.com>
Subject: [PATCH 12/26] mm: asi: asi_exit() on PF, skip handling if address is accessible
Date: Fri, 12 Jul 2024 17:00:30 +0000 [thread overview]
Message-ID: <20240712-asi-rfc-24-v1-12-144b319a40d8@google.com> (raw)
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
From: Ofir Weisse <oweisse@google.com>
On a page-fault - do asi_exit(). Then check if now after the exit the
address is accessible. We do this by refactoring spurious_kernel_fault()
into two parts:
1. Verify that the error code value is something that could arise from a
lazy TLB update.
2. Walk the page table and verify permissions, which is now called
is_address_accessible(). We also define PTE_PRESENT() and PMD_PRESENT()
which are suitable for checking userspace pages. For the sake of
spurious faults, pte_present() and pmd_present() are only good for
kernelspace pages. This is because these macros might return true even
if the present bit is 0 (only relevant for userspace).
Signed-off-by: Ofir Weisse <oweisse@google.com>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/x86/mm/fault.c | 119 +++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 104 insertions(+), 15 deletions(-)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index bba4e020dd64..e0bc5006c371 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -942,7 +942,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address);
}
-static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte)
+static __always_inline int kernel_protection_ok(unsigned long error_code, pte_t *pte)
{
if ((error_code & X86_PF_WRITE) && !pte_write(*pte))
return 0;
@@ -953,6 +953,9 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte)
return 1;
}
+static inline_or_noinstr int kernel_access_ok(
+ unsigned long error_code, unsigned long address, pgd_t *pgd);
+
/*
* Handle a spurious fault caused by a stale TLB entry.
*
@@ -978,11 +981,6 @@ static noinline int
spurious_kernel_fault(unsigned long error_code, unsigned long address)
{
pgd_t *pgd;
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *pte;
- int ret;
/*
* Only writes to RO or instruction fetches from NX may cause
@@ -998,6 +996,50 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address)
return 0;
pgd = init_mm.pgd + pgd_index(address);
+ return kernel_access_ok(error_code, address, pgd);
+}
+NOKPROBE_SYMBOL(spurious_kernel_fault);
+
+/*
+ * For kernel addresses, pte_present and pmd_present are sufficient for
+ * is_address_accessible. For user addresses these functions will return true
+ * even though the pte is not actually accessible by hardware (i.e _PAGE_PRESENT
+ * is not set). This happens in cases where the pages are physically present in
+ * memory, but they are not made accessible to hardware as they need software
+ * handling first:
+ *
+ * - ptes/pmds with _PAGE_PROTNONE need autonuma balancing (see pte_protnone(),
+ * change_prot_numa(), and do_numa_page()).
+ *
+ * - pmds with _PAGE_PSE & !_PAGE_PRESENT are undergoing splitting (see
+ * split_huge_page()).
+ *
+ * Here, we care about whether the hardware can actually access the page right
+ * now.
+ *
+ * These issues aren't currently present for PUD but we also have a custom
+ * PUD_PRESENT for a layer of future-proofing.
+ */
+#define PUD_PRESENT(pud) (pud_flags(pud) & _PAGE_PRESENT)
+#define PMD_PRESENT(pmd) (pmd_flags(pmd) & _PAGE_PRESENT)
+#define PTE_PRESENT(pte) (pte_flags(pte) & _PAGE_PRESENT)
+
+/*
+ * Check if an access by the kernel would cause a page fault. The access is
+ * described by a page fault error code (whether it was a write/instruction
+ * fetch) and address. This doesn't check for types of faults that are not
+ * expected to affect the kernel, e.g. PKU. The address can be user or kernel
+ * space, if user then we assume the access would happen via the uaccess API.
+ */
+static inline_or_noinstr int
+kernel_access_ok(unsigned long error_code, unsigned long address, pgd_t *pgd)
+{
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+ int ret;
+
if (!pgd_present(*pgd))
return 0;
@@ -1006,27 +1048,27 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address)
return 0;
if (p4d_leaf(*p4d))
- return spurious_kernel_fault_check(error_code, (pte_t *) p4d);
+ return kernel_protection_ok(error_code, (pte_t *) p4d);
pud = pud_offset(p4d, address);
- if (!pud_present(*pud))
+ if (!PUD_PRESENT(*pud))
return 0;
if (pud_leaf(*pud))
- return spurious_kernel_fault_check(error_code, (pte_t *) pud);
+ return kernel_protection_ok(error_code, (pte_t *) pud);
pmd = pmd_offset(pud, address);
- if (!pmd_present(*pmd))
+ if (!PMD_PRESENT(*pmd))
return 0;
if (pmd_leaf(*pmd))
- return spurious_kernel_fault_check(error_code, (pte_t *) pmd);
+ return kernel_protection_ok(error_code, (pte_t *) pmd);
pte = pte_offset_kernel(pmd, address);
- if (!pte_present(*pte))
+ if (!PTE_PRESENT(*pte))
return 0;
- ret = spurious_kernel_fault_check(error_code, pte);
+ ret = kernel_protection_ok(error_code, pte);
if (!ret)
return 0;
@@ -1034,12 +1076,11 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address)
* Make sure we have permissions in PMD.
* If not, then there's a bug in the page tables:
*/
- ret = spurious_kernel_fault_check(error_code, (pte_t *) pmd);
+ ret = kernel_protection_ok(error_code, (pte_t *) pmd);
WARN_ONCE(!ret, "PMD has incorrect permission bits\n");
return ret;
}
-NOKPROBE_SYMBOL(spurious_kernel_fault);
int show_unhandled_signals = 1;
@@ -1483,6 +1524,29 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
}
}
+static __always_inline void warn_if_bad_asi_pf(
+ unsigned long error_code, unsigned long address)
+{
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+ struct asi *target;
+
+ /*
+ * It's a bug to access sensitive data from the "critical section", i.e.
+ * on the path between asi_enter and asi_relax, where untrusted code
+ * gets run. #PF in this state sees asi_intr_nest_depth() as 1 because
+ * #PF increments it. We can't think of a better way to determine if
+ * this has happened than to check the ASI pagetables, hence we can't
+ * really have this check in non-debug builds unfortunately.
+ */
+ VM_WARN_ONCE(
+ (target = asi_get_target(current)) != NULL &&
+ asi_intr_nest_depth() == 1 &&
+ !kernel_access_ok(error_code, address, asi_pgd(target)),
+ "ASI-sensitive data access from critical section, addr=%px error_code=%lx class=%s",
+ (void *) address, error_code, target->class->name);
+#endif
+}
+
DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
{
irqentry_state_t state;
@@ -1490,6 +1554,31 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
address = cpu_feature_enabled(X86_FEATURE_FRED) ? fred_event_data(regs) : read_cr2();
+ if (static_asi_enabled() && !user_mode(regs)) {
+ pgd_t *pgd;
+
+ /* Can be a NOP even for ASI faults, because of NMIs */
+ asi_exit();
+
+ /*
+ * handle_page_fault() might oops if we run it for a kernel
+ * address. This might be the case if we got here due to an ASI
+ * fault. We avoid this case by checking whether the address is
+ * now, after asi_exit(), accessible by hardware. If it is -
+ * there's nothing to do. Note that this is a bit of a shotgun;
+ * we can also bail early from user-address faults here that
+ * weren't actually caused by ASI. So we might wanna move this
+ * logic later in the handler. In particular, we might be losing
+ * some stats here. However for now this keeps ASI page faults
+ * nice and fast.
+ */
+ pgd = (pgd_t *)__va(read_cr3_pa()) + pgd_index(address);
+ if (kernel_access_ok(error_code, address, pgd)) {
+ warn_if_bad_asi_pf(error_code, address);
+ return;
+ }
+ }
+
prefetchw(¤t->mm->mmap_lock);
/*
--
2.45.2.993.g49e7a77208-goog
next prev parent reply other threads:[~2024-07-12 17:01 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-12 17:00 [PATCH 00/26] Address Space Isolation (ASI) 2024 Brendan Jackman
2024-07-12 17:00 ` [PATCH 01/26] mm: asi: Make some utility functions noinstr compatible Brendan Jackman
2024-10-25 11:41 ` Borislav Petkov
2024-10-25 13:21 ` Brendan Jackman
2024-10-29 17:38 ` Junaid Shahid
2024-10-29 19:12 ` Thomas Gleixner
2024-11-01 1:44 ` Junaid Shahid
2024-11-01 10:06 ` Brendan Jackman
2024-11-01 20:27 ` Thomas Gleixner
2024-11-05 21:40 ` Junaid Shahid
2024-12-13 14:45 ` Brendan Jackman
2024-07-12 17:00 ` [PATCH 02/26] x86: Create CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION Brendan Jackman
2024-07-22 7:55 ` Geert Uytterhoeven
2024-07-12 17:00 ` [PATCH 03/26] mm: asi: Introduce ASI core API Brendan Jackman
2024-07-12 17:00 ` [PATCH 04/26] objtool: let some noinstr functions make indirect calls Brendan Jackman
2024-07-12 17:00 ` [PATCH 05/26] mm: asi: Add infrastructure for boot-time enablement Brendan Jackman
2024-07-12 17:00 ` [PATCH 06/26] mm: asi: ASI support in interrupts/exceptions Brendan Jackman
2024-07-12 17:00 ` [PATCH 07/26] mm: asi: Switch to unrestricted address space before a context switch Brendan Jackman
2024-07-12 17:00 ` [PATCH 08/26] mm: asi: Use separate PCIDs for restricted address spaces Brendan Jackman
2024-07-12 17:00 ` [PATCH 09/26] mm: asi: Make __get_current_cr3_fast() ASI-aware Brendan Jackman
2024-07-12 17:00 ` [PATCH 10/26] mm: asi: Avoid warning from NMI userspace accesses in ASI context Brendan Jackman
2024-07-14 3:59 ` kernel test robot
2024-07-12 17:00 ` [PATCH 11/26] mm: asi: ASI page table allocation functions Brendan Jackman
2024-07-12 17:00 ` Brendan Jackman [this message]
2024-07-12 17:00 ` [PATCH 13/26] mm: asi: Functions to map/unmap a memory range into ASI page tables Brendan Jackman
2024-07-12 17:00 ` [PATCH 14/26] mm: asi: Add basic infrastructure for global non-sensitive mappings Brendan Jackman
2024-07-12 17:00 ` [PATCH 15/26] mm: Add __PAGEFLAG_FALSE Brendan Jackman
2024-07-12 17:00 ` [PATCH 16/26] mm: asi: Map non-user buddy allocations as nonsensitive Brendan Jackman
2024-08-21 13:59 ` Brendan Jackman
2024-07-12 17:00 ` [PATCH 17/26] mm: asi: Map kernel text and static data " Brendan Jackman
2024-07-12 17:00 ` [PATCH 18/26] mm: asi: Map vmalloc/vmap data as nonsesnitive Brendan Jackman
2024-07-13 15:53 ` kernel test robot
2024-07-12 17:00 ` [PATCH 19/26] percpu: clean up all mappings when pcpu_map_pages() fails Brendan Jackman
2024-07-16 1:33 ` Yosry Ahmed
2024-07-12 17:00 ` [PATCH 20/26] mm: asi: Map dynamic percpu memory as nonsensitive Brendan Jackman
2024-07-12 17:00 ` [PATCH 21/26] KVM: x86: asi: Restricted address space for VM execution Brendan Jackman
2024-07-12 17:00 ` [PATCH 22/26] KVM: x86: asi: Stabilize CR3 when potentially accessing with ASI Brendan Jackman
2024-07-12 17:00 ` [PATCH 23/26] mm: asi: Stabilize CR3 in switch_mm_irqs_off() Brendan Jackman
2024-07-12 17:00 ` [PATCH 24/26] mm: asi: Make TLB flushing correct under ASI Brendan Jackman
2024-07-12 17:00 ` [PATCH 25/26] mm: asi: Stop ignoring asi=on cmdline flag Brendan Jackman
2024-07-12 17:00 ` [PATCH 26/26] KVM: x86: asi: Add some mitigations on address space transitions Brendan Jackman
2024-07-14 5:02 ` kernel test robot
2024-08-20 10:52 ` Shivank Garg
2024-08-21 9:38 ` Brendan Jackman
2024-08-21 16:00 ` Shivank Garg
2024-07-12 17:09 ` [PATCH 00/26] Address Space Isolation (ASI) 2024 Brendan Jackman
2024-09-11 16:37 ` Brendan Jackman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240712-asi-rfc-24-v1-12-144b319a40d8@google.com \
--to=jackmanb@google.com \
--cc=aesa@google.com \
--cc=akpm@linux-foundation.org \
--cc=alexandre.chartre@oracle.com \
--cc=bp@alien8.de \
--cc=catalin.marinas@arm.com \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=derkling@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=hpa@zytor.com \
--cc=jan.setjeeilers@oracle.com \
--cc=jannh@google.com \
--cc=junaids@google.com \
--cc=juri.lelli@redhat.com \
--cc=khalid.aziz@oracle.com \
--cc=kpsingh@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=liran.alon@oracle.com \
--cc=lstoakes@gmail.com \
--cc=luto@kernel.org \
--cc=mark.rutland@arm.com \
--cc=matteorizzo@google.com \
--cc=mgorman@suse.de \
--cc=mhocko@kernel.org \
--cc=mingo@redhat.com \
--cc=oweisse@google.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=reijiw@google.com \
--cc=rostedt@goodmis.org \
--cc=seanjc@google.com \
--cc=tglx@linutronix.de \
--cc=vbabka@suse.cz \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=will@kernel.org \
--cc=x86@kernel.org \
--cc=yosryahmed@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox