From: Valentin Schneider <vschneid@redhat.com>
To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
x86@kernel.org, rcu@vger.kernel.org,
linux-kselftest@vger.kernel.org
Cc: "Peter Zijlstra" <peterz@infradead.org>,
"Nicolas Saenz Julienne" <nsaenzju@redhat.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Masami Hiramatsu" <mhiramat@kernel.org>,
"Jonathan Corbet" <corbet@lwn.net>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Ingo Molnar" <mingo@redhat.com>,
"Borislav Petkov" <bp@alien8.de>,
"Dave Hansen" <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Wanpeng Li" <wanpengli@tencent.com>,
"Vitaly Kuznetsov" <vkuznets@redhat.com>,
"Andy Lutomirski" <luto@kernel.org>,
"Frederic Weisbecker" <frederic@kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
"Neeraj Upadhyay" <quic_neeraju@quicinc.com>,
"Joel Fernandes" <joel@joelfernandes.org>,
"Josh Triplett" <josh@joshtriplett.org>,
"Boqun Feng" <boqun.feng@gmail.com>,
"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
"Lai Jiangshan" <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang1211@gmail.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Uladzislau Rezki" <urezki@gmail.com>,
"Christoph Hellwig" <hch@infradead.org>,
"Lorenzo Stoakes" <lstoakes@gmail.com>,
"Josh Poimboeuf" <jpoimboe@kernel.org>,
"Jason Baron" <jbaron@akamai.com>,
"Kees Cook" <keescook@chromium.org>,
"Sami Tolvanen" <samitolvanen@google.com>,
"Ard Biesheuvel" <ardb@kernel.org>,
"Nicholas Piggin" <npiggin@gmail.com>,
"Juerg Haefliger" <juerg.haefliger@canonical.com>,
"Nicolas Saenz Julienne" <nsaenz@kernel.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
"Nadav Amit" <namit@vmware.com>,
"Dan Carpenter" <error27@gmail.com>,
"Chuang Wang" <nashuiliang@gmail.com>,
"Yang Jihong" <yangjihong1@huawei.com>,
"Petr Mladek" <pmladek@suse.com>,
"Jason A. Donenfeld" <Jason@zx2c4.com>,
"Song Liu" <song@kernel.org>,
"Julian Pidancet" <julian.pidancet@oracle.com>,
"Tom Lendacky" <thomas.lendacky@amd.com>,
"Dionna Glaze" <dionnaglaze@google.com>,
"Thomas Weißschuh" <linux@weissschuh.net>,
"Juri Lelli" <juri.lelli@redhat.com>,
"Marcelo Tosatti" <mtosatti@redhat.com>,
"Yair Podemsky" <ypodemsk@redhat.com>,
"Daniel Wagner" <dwagner@suse.de>,
"Petr Tesarik" <ptesarik@suse.com>
Subject: [RFC PATCH v3 12/15] context_tracking,x86: Defer kernel text patching IPIs
Date: Tue, 19 Nov 2024 16:34:59 +0100
Message-ID: <20241119153502.41361-13-vschneid@redhat.com>
In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com>
text_poke_bp_batch() sends IPIs to all online CPUs to synchronize
them against the newly patched instruction(s). CPUs that are executing in
userspace do not need this synchronization to happen immediately, and the
IPI is actually harmful interference for NOHZ_FULL CPUs.
As the synchronization IPIs are sent using a blocking call, returning from
text_poke_bp_batch() implies all CPUs will observe the patched
instruction(s), and this should be preserved even if the IPI is deferred.
In other words, to safely defer this synchronization, any kernel
instruction leading to the execution of the deferred instruction
sync (ct_work_flush()) must *not* be mutable (patchable) at runtime.
This means we must pay attention to mutable instructions in the early entry
code:
- alternatives
- static keys
- all sorts of probes (kprobes/ftrace/bpf/???)
The early entry code leading to ct_work_flush() is noinstr, which gets rid
of the probes.
Alternatives are safe because they are patched at boot time (before SMP is
even brought up), which is before any IPI deferral can happen.
This leaves us with static keys. Any static key used in early entry code
should only be set at boot time and never change afterwards, IOW be
__ro_after_init (pretty much like alternatives). Exceptions are marked as
forceful and will always generate an IPI when flipped. Objtool is now able
to point at static keys that don't respect this, and all static keys used in
early entry code have now been verified to follow this rule.
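As an illustration of the forceful rule, here is a minimal userspace sketch
of how a batch can pick its sync flavour: a single forceful location makes
the whole batch fall back to the immediate IPI. struct poke_loc, enum
sync_kind and batch_sync_kind() are hypothetical names for this sketch only;
the patch itself keeps the flag in struct text_poke_loc and selects between
text_poke_sync() and text_poke_sync_deferrable().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct text_poke_loc: only the flag matters. */
struct poke_loc {
	bool force_ipi;
};

/* Hypothetical labels for the two sync flavours. */
enum sync_kind { SYNC_DEFERRABLE, SYNC_IMMEDIATE };

/*
 * Accumulate the per-location flag over the batch. One forceful entry is
 * enough to require the immediate (non-deferred) IPI for every sync step.
 */
static enum sync_kind batch_sync_kind(const struct poke_loc *tp, size_t nr)
{
	bool force_ipi = false;
	size_t i;

	for (i = 0; i < nr; i++)
		force_ipi |= tp[i].force_ipi;

	return force_ipi ? SYNC_IMMEDIATE : SYNC_DEFERRABLE;
}
```

The choice is made once per batch rather than per location because all
locations in a batch share the same sync points.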
Leverage the new context_tracking infrastructure to defer sync_core() IPIs
to a target CPU's next kernel entry.
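The deferral itself can be modelled in userspace as below. This is only a
sketch under stated assumptions: the per-CPU state word, the SIM_* constants
and every function name are made up for illustration, and the real
ct_set_cpu_work() comes from an earlier patch in this series and is not
shown here. The one property taken from this patch is that the IPI is sent
exactly when queueing the work fails.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SIM_NR_CPUS	4
#define SIM_IN_USER	(1u << 0)	/* hypothetical "CPU is in userspace" bit */
#define SIM_WORK_SYNC	(1u << 1)	/* hypothetical deferred-sync work bit */

static _Atomic unsigned int sim_state[SIM_NR_CPUS];	/* 0 == in kernel */
static int sim_sync_count[SIM_NR_CPUS];			/* syncs run per CPU */

/* Queue work only if the CPU is in userspace; true means "deferred". */
static bool sim_set_cpu_work(int cpu, unsigned int work)
{
	unsigned int old = atomic_load(&sim_state[cpu]);

	do {
		if (!(old & SIM_IN_USER))
			return false;	/* in kernel: caller must send the IPI */
	} while (!atomic_compare_exchange_weak(&sim_state[cpu], &old, old | work));

	return true;
}

/* Stand-in for the sync_core() a CPU executes. */
static void sim_do_sync(int cpu)
{
	sim_sync_count[cpu]++;
}

/* Mirrors the cond-function shape: "IPI" only CPUs where deferral failed. */
static int sim_poke_sync_deferrable(void)
{
	int cpu, ipis = 0;

	for (cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		if (!sim_set_cpu_work(cpu, SIM_WORK_SYNC)) {
			sim_do_sync(cpu);
			ipis++;
		}
	}
	return ipis;
}

static void sim_enter_user(int cpu)
{
	atomic_store(&sim_state[cpu], SIM_IN_USER);
}

/* On kernel entry, flush any work queued while the CPU was in userspace. */
static void sim_enter_kernel(int cpu)
{
	unsigned int old = atomic_exchange(&sim_state[cpu], 0);

	if (old & SIM_WORK_SYNC)
		sim_do_sync(cpu);
}
```

In this model a CPU that stays in userspace is never interrupted, and it
runs the sync before executing anything else after it re-enters the kernel,
which is why the path leading to the flush must not itself be patchable.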
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
arch/x86/include/asm/context_tracking_work.h | 6 ++--
arch/x86/include/asm/text-patching.h | 1 +
arch/x86/kernel/alternative.c | 33 +++++++++++++++++---
arch/x86/kernel/kprobes/core.c | 4 +--
arch/x86/kernel/kprobes/opt.c | 4 +--
arch/x86/kernel/module.c | 2 +-
include/linux/context_tracking_work.h | 4 +--
7 files changed, 41 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/context_tracking_work.h b/arch/x86/include/asm/context_tracking_work.h
index 5bc29e6b2ed38..2c66687ce00e2 100644
--- a/arch/x86/include/asm/context_tracking_work.h
+++ b/arch/x86/include/asm/context_tracking_work.h
@@ -2,11 +2,13 @@
#ifndef _ASM_X86_CONTEXT_TRACKING_WORK_H
#define _ASM_X86_CONTEXT_TRACKING_WORK_H
+#include <asm/sync_core.h>
+
static __always_inline void arch_context_tracking_work(int work)
{
switch (work) {
- case CONTEXT_WORK_n:
- // Do work...
+ case CONTEXT_WORK_SYNC:
+ sync_core();
break;
}
}
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index e34de36cab61e..37344e95f738f 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -33,6 +33,7 @@ extern void apply_relocation(u8 *buf, const u8 * const instr, size_t instrlen, u
*/
extern void *text_poke(void *addr, const void *opcode, size_t len);
extern void text_poke_sync(void);
+extern void text_poke_sync_deferrable(void);
extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
extern void *text_poke_copy(void *addr, const void *opcode, size_t len);
extern void *text_poke_copy_locked(void *addr, const void *opcode, size_t len, bool core_ok);
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 954c4c0f7fc58..4ce224d927b03 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -18,6 +18,7 @@
#include <linux/mmu_context.h>
#include <linux/bsearch.h>
#include <linux/sync_core.h>
+#include <linux/context_tracking.h>
#include <asm/text-patching.h>
#include <asm/alternative.h>
#include <asm/sections.h>
@@ -2080,9 +2081,24 @@ static void do_sync_core(void *info)
sync_core();
}
+static bool do_sync_core_defer_cond(int cpu, void *info)
+{
+ return !ct_set_cpu_work(cpu, CONTEXT_WORK_SYNC);
+}
+
+static void __text_poke_sync(smp_cond_func_t cond_func)
+{
+ on_each_cpu_cond(cond_func, do_sync_core, NULL, 1);
+}
+
void text_poke_sync(void)
{
- on_each_cpu(do_sync_core, NULL, 1);
+ __text_poke_sync(NULL);
+}
+
+void text_poke_sync_deferrable(void)
+{
+ __text_poke_sync(do_sync_core_defer_cond);
}
/*
@@ -2257,6 +2273,8 @@ static int tp_vec_nr;
static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
{
unsigned char int3 = INT3_INSN_OPCODE;
+ bool force_ipi = false;
+ void (*sync_fn)(void);
unsigned int i;
int do_sync;
@@ -2291,11 +2309,18 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
* First step: add a int3 trap to the address that will be patched.
*/
for (i = 0; i < nr_entries; i++) {
+ /*
+ * Record that we need to send the IPI if at least one location
+ * in the batch requires it.
+ */
+ force_ipi |= tp[i].force_ipi;
tp[i].old = *(u8 *)text_poke_addr(&tp[i]);
text_poke(text_poke_addr(&tp[i]), &int3, INT3_INSN_SIZE);
}
- text_poke_sync();
+ sync_fn = force_ipi ? text_poke_sync : text_poke_sync_deferrable;
+
+ sync_fn();
/*
* Second step: update all but the first byte of the patched range.
@@ -2357,7 +2382,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
* not necessary and we'd be safe even without it. But
* better safe than sorry (plus there's not only Intel).
*/
- text_poke_sync();
+ sync_fn();
}
/*
@@ -2378,7 +2403,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
}
if (do_sync)
- text_poke_sync();
+ sync_fn();
/*
* Remove and wait for refs to be zero.
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 72e6a45e7ec24..c2fd2578fd5fc 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -817,7 +817,7 @@ void arch_arm_kprobe(struct kprobe *p)
u8 int3 = INT3_INSN_OPCODE;
text_poke(p->addr, &int3, 1);
- text_poke_sync();
+ text_poke_sync_deferrable();
perf_event_text_poke(p->addr, &p->opcode, 1, &int3, 1);
}
@@ -827,7 +827,7 @@ void arch_disarm_kprobe(struct kprobe *p)
perf_event_text_poke(p->addr, &int3, 1, &p->opcode, 1);
text_poke(p->addr, &p->opcode, 1);
- text_poke_sync();
+ text_poke_sync_deferrable();
}
void arch_remove_kprobe(struct kprobe *p)
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 36d6809c6c9e1..b2ce4d9c3ba56 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -513,11 +513,11 @@ void arch_unoptimize_kprobe(struct optimized_kprobe *op)
JMP32_INSN_SIZE - INT3_INSN_SIZE);
text_poke(addr, new, INT3_INSN_SIZE);
- text_poke_sync();
+ text_poke_sync_deferrable();
text_poke(addr + INT3_INSN_SIZE,
new + INT3_INSN_SIZE,
JMP32_INSN_SIZE - INT3_INSN_SIZE);
- text_poke_sync();
+ text_poke_sync_deferrable();
perf_event_text_poke(op->kp.addr, old, JMP32_INSN_SIZE, new, JMP32_INSN_SIZE);
}
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 837450b6e882f..00e71ad30c01d 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -191,7 +191,7 @@ static int write_relocate_add(Elf64_Shdr *sechdrs,
write, apply);
if (!early) {
- text_poke_sync();
+ text_poke_sync_deferrable();
mutex_unlock(&text_mutex);
}
diff --git a/include/linux/context_tracking_work.h b/include/linux/context_tracking_work.h
index fb74db8876dd2..13fc97b395030 100644
--- a/include/linux/context_tracking_work.h
+++ b/include/linux/context_tracking_work.h
@@ -5,12 +5,12 @@
#include <linux/bitops.h>
enum {
- CONTEXT_WORK_n_OFFSET,
+ CONTEXT_WORK_SYNC_OFFSET,
CONTEXT_WORK_MAX_OFFSET
};
enum ct_work {
- CONTEXT_WORK_n = BIT(CONTEXT_WORK_n_OFFSET),
+ CONTEXT_WORK_SYNC = BIT(CONTEXT_WORK_SYNC_OFFSET),
CONTEXT_WORK_MAX = BIT(CONTEXT_WORK_MAX_OFFSET)
};
--
2.43.0