From: Brendan Jackman <jackmanb@google.com>
To: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Andy Lutomirski <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Alexandre Chartre <alexandre.chartre@oracle.com>,
Liran Alon <liran.alon@oracle.com>,
Jan Setje-Eilers <jan.setjeeilers@oracle.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@suse.de>,
Lorenzo Stoakes <lstoakes@gmail.com>,
David Hildenbrand <david@redhat.com>,
Vlastimil Babka <vbabka@suse.cz>,
Michal Hocko <mhocko@kernel.org>,
Khalid Aziz <khalid.aziz@oracle.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Valentin Schneider <vschneid@redhat.com>,
Paul Turner <pjt@google.com>, Reiji Watanabe <reijiw@google.com>,
Junaid Shahid <junaids@google.com>,
Ofir Weisse <oweisse@google.com>,
Yosry Ahmed <yosryahmed@google.com>,
Patrick Bellasi <derkling@google.com>,
KP Singh <kpsingh@google.com>,
Alexandra Sandulescu <aesa@google.com>,
Matteo Rizzo <matteorizzo@google.com>,
Jann Horn <jannh@google.com>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kvm@vger.kernel.org, Brendan Jackman <jackmanb@google.com>
Subject: [PATCH 26/26] KVM: x86: asi: Add some mitigations on address space transitions
Date: Fri, 12 Jul 2024 17:00:44 +0000 [thread overview]
Message-ID: <20240712-asi-rfc-24-v1-26-144b319a40d8@google.com> (raw)
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Here we start actually turning ASI into a real exploit mitigation. On
all CPUs we attempt to obliterate any indirect branch predictor training
before mapping in any secrets. We can also flush side channels on the
inverse transition. So, in this iteration we flush L1D, but only on CPUs
affected by L1TF.
The rationale for this is: L1TF seems to have been a relative outlier in
terms of its impact, and the mitigation is obviously rather devastating.
On the other hand, Spectre-type attacks are continuously being found,
and it's quite reasonable to assume that existing systems are vulnerable
to variations that are not currently mitigated by bespoke techniques
like Safe RET.
This is clearly an incomplete policy, for example it probably makes
sense to perform MDS mitigations in post_asi_enter, and there is clearly
a wide range of alternative postures with regard to per-platform vs
blanket mitigation configurations. This also ought to be integrated more
intelligently with bugs.c - this will probably require a fair bit of
discussion so it might warrant a patchset all to itself. For now though,
this ouhgt to provide an example of the kind of thing we might do with
ASI.
The changes to the inline asm for L1D flushes are to avoid duplicate
jump labels breaking the build in the case that vmx_l1d_flush() gets
inlined at multiple locations (as it seems to do in my builds).
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/include/asm/nospec-branch.h | 2 +
arch/x86/kvm/vmx/vmx.c | 88 ++++++++++++++++++++++++------------
arch/x86/kvm/x86.c | 33 +++++++++++++-
arch/x86/lib/retpoline.S | 7 +++
5 files changed, 101 insertions(+), 31 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6c3326cb8273c..8b7226dd2e027 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1840,6 +1840,8 @@ struct kvm_x86_init_ops {
struct kvm_x86_ops *runtime_ops;
struct kvm_pmu_ops *pmu_ops;
+
+ void (*post_asi_enter)(void);
};
struct kvm_arch_async_pf {
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index ff5f1ecc7d1e6..9502bdafc1edd 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -605,6 +605,8 @@ static __always_inline void mds_idle_clear_cpu_buffers(void)
mds_clear_cpu_buffers();
}
+extern void fill_return_buffer(void);
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1105d666a8ade..6efcbddf6ce27 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6629,37 +6629,18 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
* is not exactly LRU. This could be sized at runtime via topology
* information but as all relevant affected CPUs have 32KiB L1D cache size
* there is no point in doing so.
+ *
+ * Must be reentrant, for use by vmx_post_asi_enter.
*/
-static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
+static inline_or_noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
{
int size = PAGE_SIZE << L1D_CACHE_ORDER;
/*
- * This code is only executed when the flush mode is 'cond' or
- * 'always'
+ * In theory we lose some of these increments to reentrancy under ASI.
+ * We just tolerate imprecise stats rather than deal with synchronizing.
+ * Anyway in practice on 64 bit it's gonna be a single instruction.
*/
- if (static_branch_likely(&vmx_l1d_flush_cond)) {
- bool flush_l1d;
-
- /*
- * Clear the per-vcpu flush bit, it gets set again
- * either from vcpu_run() or from one of the unsafe
- * VMEXIT handlers.
- */
- flush_l1d = vcpu->arch.l1tf_flush_l1d;
- vcpu->arch.l1tf_flush_l1d = false;
-
- /*
- * Clear the per-cpu flush bit, it gets set again from
- * the interrupt handlers.
- */
- flush_l1d |= kvm_get_cpu_l1tf_flush_l1d();
- kvm_clear_cpu_l1tf_flush_l1d();
-
- if (!flush_l1d)
- return;
- }
-
vcpu->stat.l1d_flush++;
if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
@@ -6670,26 +6651,57 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
asm volatile(
/* First ensure the pages are in the TLB */
"xorl %%eax, %%eax\n"
- ".Lpopulate_tlb:\n\t"
+ ".Lpopulate_tlb_%=:\n\t"
"movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
"addl $4096, %%eax\n\t"
"cmpl %%eax, %[size]\n\t"
- "jne .Lpopulate_tlb\n\t"
+ "jne .Lpopulate_tlb_%=\n\t"
"xorl %%eax, %%eax\n\t"
"cpuid\n\t"
/* Now fill the cache */
"xorl %%eax, %%eax\n"
- ".Lfill_cache:\n"
+ ".Lfill_cache_%=:\n"
"movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
"addl $64, %%eax\n\t"
"cmpl %%eax, %[size]\n\t"
- "jne .Lfill_cache\n\t"
+ "jne .Lfill_cache_%=\n\t"
"lfence\n"
:: [flush_pages] "r" (vmx_l1d_flush_pages),
[size] "r" (size)
: "eax", "ebx", "ecx", "edx");
}
+static noinstr void vmx_maybe_l1d_flush(struct kvm_vcpu *vcpu)
+{
+ /*
+ * This code is only executed when the flush mode is 'cond' or
+ * 'always'
+ */
+ if (static_branch_likely(&vmx_l1d_flush_cond)) {
+ bool flush_l1d;
+
+ /*
+ * Clear the per-vcpu flush bit, it gets set again
+ * either from vcpu_run() or from one of the unsafe
+ * VMEXIT handlers.
+ */
+ flush_l1d = vcpu->arch.l1tf_flush_l1d;
+ vcpu->arch.l1tf_flush_l1d = false;
+
+ /*
+ * Clear the per-cpu flush bit, it gets set again from
+ * the interrupt handlers.
+ */
+ flush_l1d |= kvm_get_cpu_l1tf_flush_l1d();
+ kvm_clear_cpu_l1tf_flush_l1d();
+
+ if (!flush_l1d)
+ return;
+ }
+
+ vmx_l1d_flush(vcpu);
+}
+
static void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
{
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
@@ -7284,7 +7296,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
* This is only after asi_enter() for performance reasons.
*/
if (static_branch_unlikely(&vmx_l1d_should_flush))
- vmx_l1d_flush(vcpu);
+ vmx_maybe_l1d_flush(vcpu);
else if (static_branch_unlikely(&mmio_stale_data_clear) &&
kvm_arch_has_assigned_device(vcpu->kvm))
mds_clear_cpu_buffers();
@@ -8321,6 +8333,14 @@ gva_t vmx_get_untagged_addr(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags
return (sign_extend64(gva, lam_bit) & ~BIT_ULL(63)) | (gva & BIT_ULL(63));
}
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+static noinstr void vmx_post_asi_enter(void)
+{
+ if (boot_cpu_has_bug(X86_BUG_L1TF))
+ vmx_l1d_flush(kvm_get_running_vcpu());
+}
+#endif
+
static struct kvm_x86_ops vmx_x86_ops __initdata = {
.name = KBUILD_MODNAME,
@@ -8727,6 +8747,14 @@ static struct kvm_x86_init_ops vmx_init_ops __initdata = {
.runtime_ops = &vmx_x86_ops,
.pmu_ops = &intel_pmu_ops,
+
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+ /*
+ * Only Intel CPUs currently do anything in post-enter, so this is a
+ * vendor hook for now.
+ */
+ .post_asi_enter = vmx_post_asi_enter,
+#endif
};
static void vmx_cleanup_l1d_flush(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b9947e88d4ac6..b5e4df2aa1636 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9695,6 +9695,36 @@ static void kvm_x86_check_cpu_compat(void *ret)
*(int *)ret = kvm_x86_check_processor_compatibility();
}
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+
+static noinstr void pre_asi_exit(void)
+{
+ /*
+ * Flush out prediction trainings by the guest before we go to access
+ * secrets.
+ */
+
+ /* Clear normal indirect branch predictions, if we haven't */
+ if (cpu_feature_enabled(X86_FEATURE_IBPB) &&
+ !cpu_feature_enabled(X86_FEATURE_IBPB_ON_VMEXIT))
+ __wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);
+
+ /* Flush the RAS/RSB if we haven't already. */
+ if (!IS_ENABLED(CONFIG_RETPOLINE) ||
+ !cpu_feature_enabled(X86_FEATURE_RSB_VMEXIT))
+ fill_return_buffer();
+}
+
+struct asi_hooks asi_hooks = {
+ .pre_asi_exit = pre_asi_exit,
+ /* post_asi_enter populated later. */
+};
+
+#else /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+struct asi_hooks asi_hooks = {};
+#endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+
+
int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
{
u64 host_pat;
@@ -9753,7 +9783,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
if (r)
goto out_free_percpu;
- r = asi_register_class("KVM", NULL);
+ asi_hooks.post_asi_enter = ops->post_asi_enter;
+ r = asi_register_class("KVM", &asi_hooks);
if (r < 0)
goto out_mmu_exit;
kvm_asi_index = r;
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 391059b2c6fbc..db5b8ee01efeb 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -396,3 +396,10 @@ SYM_CODE_END(__x86_return_thunk)
EXPORT_SYMBOL(__x86_return_thunk)
#endif /* CONFIG_MITIGATION_RETHUNK */
+
+.pushsection .noinstr.text, "ax"
+SYM_CODE_START(fill_return_buffer)
+ __FILL_RETURN_BUFFER(%_ASM_AX,RSB_CLEAR_LOOPS)
+ RET
+SYM_CODE_END(fill_return_buffer)
+.popsection
--
2.45.2.993.g49e7a77208-goog
next prev parent reply other threads:[~2024-07-12 17:02 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-12 17:00 [PATCH 00/26] Address Space Isolation (ASI) 2024 Brendan Jackman
2024-07-12 17:00 ` [PATCH 01/26] mm: asi: Make some utility functions noinstr compatible Brendan Jackman
2024-10-25 11:41 ` Borislav Petkov
2024-10-25 13:21 ` Brendan Jackman
2024-10-29 17:38 ` Junaid Shahid
2024-10-29 19:12 ` Thomas Gleixner
2024-11-01 1:44 ` Junaid Shahid
2024-11-01 10:06 ` Brendan Jackman
2024-11-01 20:27 ` Thomas Gleixner
2024-11-05 21:40 ` Junaid Shahid
2024-12-13 14:45 ` Brendan Jackman
2024-07-12 17:00 ` [PATCH 02/26] x86: Create CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION Brendan Jackman
2024-07-22 7:55 ` Geert Uytterhoeven
2024-07-12 17:00 ` [PATCH 03/26] mm: asi: Introduce ASI core API Brendan Jackman
2024-07-12 17:00 ` [PATCH 04/26] objtool: let some noinstr functions make indirect calls Brendan Jackman
2024-07-12 17:00 ` [PATCH 05/26] mm: asi: Add infrastructure for boot-time enablement Brendan Jackman
2024-07-12 17:00 ` [PATCH 06/26] mm: asi: ASI support in interrupts/exceptions Brendan Jackman
2024-07-12 17:00 ` [PATCH 07/26] mm: asi: Switch to unrestricted address space before a context switch Brendan Jackman
2024-07-12 17:00 ` [PATCH 08/26] mm: asi: Use separate PCIDs for restricted address spaces Brendan Jackman
2024-07-12 17:00 ` [PATCH 09/26] mm: asi: Make __get_current_cr3_fast() ASI-aware Brendan Jackman
2024-07-12 17:00 ` [PATCH 10/26] mm: asi: Avoid warning from NMI userspace accesses in ASI context Brendan Jackman
2024-07-14 3:59 ` kernel test robot
2024-07-12 17:00 ` [PATCH 11/26] mm: asi: ASI page table allocation functions Brendan Jackman
2024-07-12 17:00 ` [PATCH 12/26] mm: asi: asi_exit() on PF, skip handling if address is accessible Brendan Jackman
2024-07-12 17:00 ` [PATCH 13/26] mm: asi: Functions to map/unmap a memory range into ASI page tables Brendan Jackman
2024-07-12 17:00 ` [PATCH 14/26] mm: asi: Add basic infrastructure for global non-sensitive mappings Brendan Jackman
2024-07-12 17:00 ` [PATCH 15/26] mm: Add __PAGEFLAG_FALSE Brendan Jackman
2024-07-12 17:00 ` [PATCH 16/26] mm: asi: Map non-user buddy allocations as nonsensitive Brendan Jackman
2024-08-21 13:59 ` Brendan Jackman
2024-07-12 17:00 ` [PATCH 17/26] mm: asi: Map kernel text and static data " Brendan Jackman
2024-07-12 17:00 ` [PATCH 18/26] mm: asi: Map vmalloc/vmap data as nonsesnitive Brendan Jackman
2024-07-13 15:53 ` kernel test robot
2024-07-12 17:00 ` [PATCH 19/26] percpu: clean up all mappings when pcpu_map_pages() fails Brendan Jackman
2024-07-16 1:33 ` Yosry Ahmed
2024-07-12 17:00 ` [PATCH 20/26] mm: asi: Map dynamic percpu memory as nonsensitive Brendan Jackman
2024-07-12 17:00 ` [PATCH 21/26] KVM: x86: asi: Restricted address space for VM execution Brendan Jackman
2024-07-12 17:00 ` [PATCH 22/26] KVM: x86: asi: Stabilize CR3 when potentially accessing with ASI Brendan Jackman
2024-07-12 17:00 ` [PATCH 23/26] mm: asi: Stabilize CR3 in switch_mm_irqs_off() Brendan Jackman
2024-07-12 17:00 ` [PATCH 24/26] mm: asi: Make TLB flushing correct under ASI Brendan Jackman
2024-07-12 17:00 ` [PATCH 25/26] mm: asi: Stop ignoring asi=on cmdline flag Brendan Jackman
2024-07-12 17:00 ` Brendan Jackman [this message]
2024-07-14 5:02 ` [PATCH 26/26] KVM: x86: asi: Add some mitigations on address space transitions kernel test robot
2024-08-20 10:52 ` Shivank Garg
2024-08-21 9:38 ` Brendan Jackman
2024-08-21 16:00 ` Shivank Garg
2024-07-12 17:09 ` [PATCH 00/26] Address Space Isolation (ASI) 2024 Brendan Jackman
2024-09-11 16:37 ` Brendan Jackman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240712-asi-rfc-24-v1-26-144b319a40d8@google.com \
--to=jackmanb@google.com \
--cc=aesa@google.com \
--cc=akpm@linux-foundation.org \
--cc=alexandre.chartre@oracle.com \
--cc=bp@alien8.de \
--cc=catalin.marinas@arm.com \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=derkling@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=hpa@zytor.com \
--cc=jan.setjeeilers@oracle.com \
--cc=jannh@google.com \
--cc=junaids@google.com \
--cc=juri.lelli@redhat.com \
--cc=khalid.aziz@oracle.com \
--cc=kpsingh@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=liran.alon@oracle.com \
--cc=lstoakes@gmail.com \
--cc=luto@kernel.org \
--cc=mark.rutland@arm.com \
--cc=matteorizzo@google.com \
--cc=mgorman@suse.de \
--cc=mhocko@kernel.org \
--cc=mingo@redhat.com \
--cc=oweisse@google.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=reijiw@google.com \
--cc=rostedt@goodmis.org \
--cc=seanjc@google.com \
--cc=tglx@linutronix.de \
--cc=vbabka@suse.cz \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=will@kernel.org \
--cc=x86@kernel.org \
--cc=yosryahmed@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox