linux-mm.kvack.org archive mirror
From: Frederic Weisbecker <frederic@kernel.org>
To: Valentin Schneider <vschneid@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andy Lutomirski <luto@kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Jason Baron <jbaron@akamai.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ard Biesheuvel <ardb@kernel.org>,
	Sami Tolvanen <samitolvanen@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Masahiro Yamada <masahiroy@kernel.org>,
	Han Shen <shenhan@google.com>, Rik van Riel <riel@surriel.com>,
	Jann Horn <jannh@google.com>,
	Dan Carpenter <dan.carpenter@linaro.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Clark Williams <williams@redhat.com>,
	Tomas Glozar <tglozar@redhat.com>,
	Yair Podemsky <ypodemsk@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Daniel Wagner <dwagner@suse.de>, Petr Tesarik <ptesarik@suse.com>,
	Shrikanth Hegde <sshegde@linux.ibm.com>
Subject: Re: [RFC PATCH v8 09/10] context_tracking,x86: Defer kernel text patching IPIs when tracking CR3 switches
Date: Wed, 15 Apr 2026 14:11:35 +0200
Message-ID: <ad-Ad44-IyU-yKJf@localhost.localdomain>
In-Reply-To: <20260324094801.3092968-10-vschneid@redhat.com>

On Tue, Mar 24, 2026 at 10:48:00AM +0100, Valentin Schneider wrote:
> text_poke_bp_batch() sends IPIs to all online CPUs to synchronize
> them vs the newly patched instruction. CPUs that are executing in userspace
> do not need this synchronization to happen immediately, and this is
> actually harmful interference for NOHZ_FULL CPUs.
> 
> As the synchronization IPIs are sent using a blocking call, returning from
> text_poke_bp_batch() implies all CPUs will observe the patched
> instruction(s), and this should be preserved even if the IPI is deferred.
> In other words, to safely defer this synchronization, any kernel
> instruction leading to the execution of the deferred instruction
> sync must *not* be mutable (patchable) at runtime.
> 
> This means we must pay attention to mutable instructions in the early entry
> code:
> - alternatives
> - static keys
> - static calls
> - all sorts of probes (kprobes/ftrace/bpf/???)
> 
> The early entry code is noinstr, which gets rid of the probes.
> 
> Alternatives are safe, because it's boot-time patching (before SMP is
> even brought up) which is before any IPI deferral can happen.
> 
> This leaves us with static keys and static calls. Any static key used in
> early entry code should be only forever-enabled at boot time, IOW
> __ro_after_init (pretty much like alternatives). Exceptions to that will
> now be caught by objtool.
> 
> The deferred instruction sync is the CR3 RMW done as part of
> kPTI when switching to the kernel page table:
> 
>   SDM vol2 chapter 4.3 - Move to/from control registers:
>   ```
>   MOV CR* instructions, except for MOV CR8, are serializing instructions.
>   ```
> 
> Leverage the new kernel_cr3_loaded signal and the kPTI CR3 RMW to defer
> sync_core() IPIs targeting NOHZ_FULL CPUs.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
>  arch/x86/include/asm/text-patching.h |  5 ++++
>  arch/x86/kernel/alternative.c        | 34 +++++++++++++++++++++++-----
>  arch/x86/kernel/kprobes/core.c       |  4 ++--
>  arch/x86/kernel/kprobes/opt.c        |  4 ++--
>  arch/x86/kernel/module.c             |  2 +-
>  5 files changed, 38 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
> index f2d142a0a862e..628e80f8318cd 100644
> --- a/arch/x86/include/asm/text-patching.h
> +++ b/arch/x86/include/asm/text-patching.h
> @@ -33,6 +33,11 @@ extern void text_poke_apply_relocation(u8 *buf, const u8 * const instr, size_t i
>   */
>  extern void *text_poke(void *addr, const void *opcode, size_t len);
>  extern void smp_text_poke_sync_each_cpu(void);
> +#ifdef CONFIG_TRACK_CR3
> +extern void smp_text_poke_sync_each_cpu_deferrable(void);
> +#else
> +#define smp_text_poke_sync_each_cpu_deferrable smp_text_poke_sync_each_cpu
> +#endif
>  extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
>  extern void *text_poke_copy(void *addr, const void *opcode, size_t len);
>  #define text_poke_copy text_poke_copy
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 28518371d8bf3..f3af77d7c533c 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -6,6 +6,7 @@
>  #include <linux/vmalloc.h>
>  #include <linux/memory.h>
>  #include <linux/execmem.h>
> +#include <linux/sched/isolation.h>
>  
>  #include <asm/text-patching.h>
>  #include <asm/insn.h>
> @@ -13,6 +14,7 @@
>  #include <asm/ibt.h>
>  #include <asm/set_memory.h>
>  #include <asm/nmi.h>
> +#include <asm/tlbflush.h>
>  
>  int __read_mostly alternatives_patched;
>  
> @@ -2706,11 +2708,29 @@ static void do_sync_core(void *info)
>  	sync_core();
>  }
>  
> +static void __smp_text_poke_sync_each_cpu(smp_cond_func_t cond_func)
> +{
> +	on_each_cpu_cond(cond_func, do_sync_core, NULL, 1);
> +}
> +
>  void smp_text_poke_sync_each_cpu(void)
>  {
> -	on_each_cpu(do_sync_core, NULL, 1);
> +	__smp_text_poke_sync_each_cpu(NULL);
> +}
> +
> +#ifdef CONFIG_TRACK_CR3
> +static bool do_sync_core_defer_cond(int cpu, void *info)
> +{
> +	return housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE) ||
> +	       per_cpu(kernel_cr3_loaded, cpu);

Should this || be && instead?
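
To make the question concrete, here is a minimal userspace sketch of the two
candidate predicates. It is not kernel code: the two booleans are stand-ins for
housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE) and per_cpu(kernel_cr3_loaded, cpu),
and the helper names ipi_with_or() / ipi_with_and() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. "housekeeping" stands in for
 * housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE) and "kernel_cr3_loaded"
 * for per_cpu(kernel_cr3_loaded, cpu). Returning true means "send the
 * sync IPI to this CPU now".
 */
static bool ipi_with_or(bool housekeeping, bool kernel_cr3_loaded)
{
	/* As posted: skip only isolated CPUs that are not on the kernel CR3. */
	return housekeeping || kernel_cr3_loaded;
}

static bool ipi_with_and(bool housekeeping, bool kernel_cr3_loaded)
{
	/* Alternative: target only housekeeping CPUs that are on the kernel CR3. */
	return housekeeping && kernel_cr3_loaded;
}
```

In this model the || form skips exactly one case (isolated CPU, kernel CR3 not
loaded), while the && form also skips isolated CPUs that are in the kernel and
housekeeping CPUs that are not.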

Also, I would again expect full ordering here, with an smp_mb() before the
check, so that:

CPU 0                            CPU 1
-----                            -----
//enter_kernel                   //do_sync_core_defer_cond
kernel_cr3_loaded = 1            WRITE page table
smp_mb()                         smp_mb()
WRITE cr3                        READ kernel_cr3_loaded

But I'm not sure if that ordering is enough to imply that if CPU 1 observes
kernel_cr3_loaded == 0, then subsequent CPU 0 entering the kernel is guaranteed
to flush the TLB with the latest page table write.
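
For reference, the diagram above has the shape of the classic store-buffering
pattern. Below is a rough userspace sketch of it with C11 atomics and pthreads,
under loud assumptions: atomic_thread_fence(memory_order_seq_cst) stands in for
smp_mb(), the variable `patched` is a hypothetical stand-in for CPU 1's write,
and CPU 0's "WRITE cr3" is modelled as a plain load of `patched`. Whether the
real CR3 write behaves like that load is precisely what I'm unsure about, so
this only illustrates the memory-ordering half.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int kernel_cr3_loaded;	/* models the per-CPU flag */
static atomic_int patched;		/* hypothetical: models CPU 1's write */
static int cpu0_saw_patch;
static int cpu1_saw_flag;

/* CPU 0: set the flag, full barrier, then observe CPU 1's write. */
static void *cpu0_enter_kernel(void *arg)
{
	(void)arg;
	atomic_store_explicit(&kernel_cr3_loaded, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stands in for smp_mb() */
	/* Assumption: the CR3 write is modelled as this load. */
	cpu0_saw_patch = atomic_load_explicit(&patched, memory_order_relaxed);
	return NULL;
}

/* CPU 1: do the write, full barrier, then check the flag. */
static void *cpu1_defer_cond(void *arg)
{
	(void)arg;
	atomic_store_explicit(&patched, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stands in for smp_mb() */
	cpu1_saw_flag = atomic_load_explicit(&kernel_cr3_loaded,
					     memory_order_relaxed);
	return NULL;
}

/* Returns true if both sides missed each other's write in one run. */
static bool both_missed_once(void)
{
	pthread_t t0, t1;

	atomic_store(&kernel_cr3_loaded, 0);
	atomic_store(&patched, 0);
	pthread_create(&t0, NULL, cpu0_enter_kernel, NULL);
	pthread_create(&t1, NULL, cpu1_defer_cond, NULL);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return cpu0_saw_patch == 0 && cpu1_saw_flag == 0;
}

/* Counts how many of the given runs ended with both sides missing. */
static int count_both_missed(int iterations)
{
	int hits = 0;

	for (int i = 0; i < iterations; i++)
		hits += both_missed_once();
	return hits;
}
```

With both fences in place, the C11 model forbids the outcome where CPU 1 reads
kernel_cr3_loaded == 0 and CPU 0 also misses CPU 1's write, so
count_both_missed() should stay at zero. Dropping either fence makes that
outcome legal.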

Thoughts?

Thanks.

-- 
Frederic Weisbecker
SUSE Labs

