From: Sean Christopherson <seanjc@google.com>
To: Valentin Schneider <vschneid@redhat.com>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
virtualization@lists.linux.dev,
linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
linux-riscv@lists.infradead.org,
linux-perf-users@vger.kernel.org,
xen-devel@lists.xenproject.org, kvm@vger.kernel.org,
linux-arch@vger.kernel.org, rcu@vger.kernel.org,
linux-hardening@vger.kernel.org, linux-mm@kvack.org,
linux-kselftest@vger.kernel.org, bpf@vger.kernel.org,
bcm-kernel-feedback-list@broadcom.com,
Peter Zijlstra <peterz@infradead.org>,
Nicolas Saenz Julienne <nsaenzju@redhat.com>,
Juergen Gross <jgross@suse.com>,
Ajay Kaher <ajay.kaher@broadcom.com>,
Alexey Makhalov <alexey.amakhalov@broadcom.com>,
Russell King <linux@armlinux.org.uk>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Huacai Chen <chenhuacai@kernel.org>,
WANG Xuerui <kernel@xen0n.name>,
Paul Walmsley <paul.walmsley@sifive.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
Kan Liang <kan.liang@linux.intel.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
Frederic Weisbecker <frederic@kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Jason Baron <jbaron@akamai.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ard Biesheuvel <ardb@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joel@joelfernandes.org>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun.feng@gmail.com>,
Uladzislau Rezki <urezki@gmail.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang1211@gmail.com>,
Juri Lelli <juri.lelli@redhat.com>,
Clark Williams <williams@redhat.com>,
Yair Podemsky <ypodemsk@redhat.com>,
Tomas Glozar <tglozar@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Kees Cook <kees@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Hellwig <hch@infradead.org>,
Shuah Khan <shuah@kernel.org>,
Sami Tolvanen <samitolvanen@google.com>,
Miguel Ojeda <ojeda@kernel.org>,
Alice Ryhl <aliceryhl@google.com>,
"Mike Rapoport (Microsoft)" <rppt@kernel.org>,
Samuel Holland <samuel.holland@sifive.com>,
Rong Xu <xur@google.com>,
Geert Uytterhoeven <geert@linux-m68k.org>,
Yosry Ahmed <yosryahmed@google.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
"Masami Hiramatsu (Google)" <mhiramat@kernel.org>,
Jinghao Jia <jinghao7@illinois.edu>,
Luis Chamberlain <mcgrof@kernel.org>,
Randy Dunlap <rdunlap@infradead.org>,
Tiezhu Yang <yangtiezhu@loongson.cn>
Subject: Re: [PATCH v4 25/30] context_tracking,x86: Defer kernel text patching IPIs
Date: Tue, 14 Jan 2025 13:13:57 -0800
Message-ID: <Z4bTlZkqihaAyGb4@google.com>
In-Reply-To: <20250114175143.81438-26-vschneid@redhat.com>
On Tue, Jan 14, 2025, Valentin Schneider wrote:
> text_poke_bp_batch() sends IPIs to all online CPUs to synchronize
> them vs the newly patched instruction. CPUs that are executing in userspace
> do not need this synchronization to happen immediately, and this is
> actually harmful interference for NOHZ_FULL CPUs.
...
> This leaves us with static keys and static calls.
...
> @@ -2317,11 +2334,20 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
>  	 * First step: add a int3 trap to the address that will be patched.
>  	 */
>  	for (i = 0; i < nr_entries; i++) {
> -		tp[i].old = *(u8 *)text_poke_addr(&tp[i]);
> -		text_poke(text_poke_addr(&tp[i]), &int3, INT3_INSN_SIZE);
> +		void *addr = text_poke_addr(&tp[i]);
> +
> +		/*
> +		 * There's no safe way to defer IPIs for patching text in
> +		 * .noinstr, record whether there is at least one such poke.
> +		 */
> +		if (is_kernel_noinstr_text((unsigned long)addr))
> +			cond = NULL;

Maybe pre-check "cond", especially if multiple ranges need to be checked?  I.e.

	if (cond && is_kernel_noinstr_text(...))

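
To illustrate the suggestion, here is a minimal userspace sketch, not kernel
code. The section bounds, the defer_cond_t callback type, example_cond(),
scan_pokes() and the nr_range_checks counter are all hypothetical stand-ins;
only the "cond &&" short-circuit mirrors the snippet above. Once one poke has
landed in .noinstr and cleared "cond", the remaining entries skip the range
check entirely:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the linker-provided .noinstr bounds. */
static const uintptr_t noinstr_start = 0x1000;
static const uintptr_t noinstr_end   = 0x2000;

/* Counts how often the range check actually runs (illustration only). */
static unsigned int nr_range_checks;

/* Hypothetical type for the "may this CPU's IPI be deferred?" callback. */
typedef bool (*defer_cond_t)(int cpu);

/* Hypothetical condition; a real one would consult context tracking. */
static bool example_cond(int cpu)
{
	(void)cpu;
	return true;
}

/* Half-open range check, same shape as is_kernel_noinstr_text(). */
static bool is_noinstr_text(uintptr_t addr)
{
	nr_range_checks++;
	return addr >= noinstr_start && addr < noinstr_end;
}

/*
 * Walk the addresses to be poked.  Returns NULL if at least one of them
 * lies in .noinstr (deferral is unsafe), otherwise returns cond unchanged.
 */
static defer_cond_t scan_pokes(const uintptr_t *addrs, size_t nr,
			       defer_cond_t cond)
{
	for (size_t i = 0; i < nr; i++) {
		/* Pre-check cond: no range check once it has been cleared. */
		if (cond && is_noinstr_text(addrs[i]))
			cond = NULL;
	}
	return cond;
}
```

With several ranges to walk per address, the short-circuit would save
proportionally more work.
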
> +
> +		tp[i].old = *((u8 *)addr);
> +		text_poke(addr, &int3, INT3_INSN_SIZE);
>  	}
>  
> -	text_poke_sync();
> +	__text_poke_sync(cond);
>  
>  	/*
>  	 * Second step: update all but the first byte of the patched range.
...
> +/**
> + * is_kernel_noinstr_text - checks if the pointer address is located in the
> + * .noinstr section
> + *
> + * @addr: address to check
> + *
> + * Returns: true if the address is located in .noinstr, false otherwise.
> + */
> +static inline bool is_kernel_noinstr_text(unsigned long addr)
> +{
> +	return addr >= (unsigned long)__noinstr_text_start &&
> +	       addr < (unsigned long)__noinstr_text_end;
> +}

This doesn't do the right thing for modules, which matters because KVM can be
built as a module on x86, and because context tracking understands transitions
to GUEST mode, i.e. CPUs that are running in a KVM guest will be treated as not
being in the kernel, and thus will have IPIs deferred.  If KVM uses a static key
or branch between guest_state_enter_irqoff() and guest_state_exit_irqoff(), the
patching code won't wait for CPUs to exit guest mode, i.e. KVM could theoretically
use the wrong static path.

I don't expect this to ever cause problems in practice, because patching code in
KVM's VM-Enter/VM-Exit path that has *functional* implications, while CPUs are
actively running guest code, would be all kinds of crazy.  But I do think we
should plug the hole.

If this issue is unique to KVM, i.e. is not a generic problem for all modules (I
assume module code generally isn't allowed in the entry path, even via NMI?), one
idea would be to let KVM register its noinstr section for text poking.
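
One possible shape for such a registration hook, again as a userspace sketch
under stated assumptions: register_noinstr_range(),
addr_in_registered_noinstr(), the fixed-size table and the example addresses
are invented for illustration and are not existing kernel APIs.  A real
implementation would also need unregistration on module unload and some form of
synchronization against concurrent patching, both omitted here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NOINSTR_RANGES 8	/* arbitrary limit for the sketch */

/* One half-open [start, end) range of noinstr text. */
struct noinstr_range {
	uintptr_t start;
	uintptr_t end;
};

static struct noinstr_range noinstr_ranges[MAX_NOINSTR_RANGES];
static size_t nr_noinstr_ranges;

/*
 * Hypothetical hook: core code would register its own .noinstr section at
 * boot, and a module such as KVM would register its section at load time.
 * Returns 0 on success, -1 if the range is empty or the table is full.
 */
static int register_noinstr_range(uintptr_t start, uintptr_t end)
{
	if (start >= end || nr_noinstr_ranges == MAX_NOINSTR_RANGES)
		return -1;

	noinstr_ranges[nr_noinstr_ranges].start = start;
	noinstr_ranges[nr_noinstr_ranges].end = end;
	nr_noinstr_ranges++;
	return 0;
}

/* True if addr falls inside any registered noinstr range. */
static bool addr_in_registered_noinstr(uintptr_t addr)
{
	for (size_t i = 0; i < nr_noinstr_ranges; i++) {
		if (addr >= noinstr_ranges[i].start &&
		    addr < noinstr_ranges[i].end)
			return true;
	}
	return false;
}
```

The text poking code would then consult the table instead of a single pair of
section symbols, so a poke targeting a registered module range would also
force the non-deferred IPI.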