linux-mm.kvack.org archive mirror
From: Mark Rutland <mark.rutland@arm.com>
To: Valentin Schneider <vschneid@redhat.com>
Cc: Jann Horn <jannh@google.com>,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	virtualization@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
	linux-riscv@lists.infradead.org,
	linux-perf-users@vger.kernel.org, xen-devel@lists.xenproject.org,
	kvm@vger.kernel.org, linux-arch@vger.kernel.org,
	rcu@vger.kernel.org, linux-hardening@vger.kernel.org,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	bpf@vger.kernel.org, bcm-kernel-feedback-list@broadcom.com,
	Juergen Gross <jgross@suse.com>,
	Ajay Kaher <ajay.kaher@broadcom.com>,
	Alexey Makhalov <alexey.amakhalov@broadcom.com>,
	Russell King <linux@armlinux.org.uk>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Huacai Chen <chenhuacai@kernel.org>,
	WANG Xuerui <kernel@xen0n.name>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	"Liang, Kan" <kan.liang@linux.intel.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Frederic Weisbecker <frederic@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Jason Baron <jbaron@akamai.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ard Biesheuvel <ardb@kernel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Zqiang <qiang.zhang1211@gmail.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Clark Williams <williams@redhat.com>,
	Yair Podemsky <ypodemsk@redhat.com>,
	Tomas Glozar <tglozar@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Kees Cook <kees@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>,
	Shuah Khan <shuah@kernel.org>,
	Sami Tolvanen <samitolvanen@google.com>,
	Miguel Ojeda <ojeda@kernel.org>,
	Alice Ryhl <aliceryhl@google.com>,
	"Mike Rapoport (Microsoft)" <rppt@kernel.org>,
	Samuel Holland <samuel.holland@sifive.com>,
	Rong Xu <xur@google.com>,
	Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Yosry Ahmed <yosryahmed@google.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Masami Hiramatsu (Google)" <mhiramat@kernel.org>,
	Jinghao Jia <jinghao7@illinois.edu>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Tiezhu Yang <yangtiezhu@loongson.cn>
Subject: Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
Date: Tue, 11 Feb 2025 14:03:08 +0000
Message-ID: <Z6tYnOEBkOlT_ehp@J2N7QTR9R3>
In-Reply-To: <xhsmh34gkk3ls.mognet@vschneid-thinkpadt14sgen2i.remote.csb>

On Tue, Feb 11, 2025 at 02:33:51PM +0100, Valentin Schneider wrote:
> On 10/02/25 23:08, Jann Horn wrote:
> > On Mon, Feb 10, 2025 at 7:36 PM Valentin Schneider <vschneid@redhat.com> wrote:
> >> What if isolated CPUs unconditionally did a TLBi as late as possible in
> >> the stack right before returning to userspace? This would mean that upon
> >> re-entering the kernel, an isolated CPU's TLB wouldn't contain any kernel
> >> range translation - with the exception of whatever lies between the
> >> last-minute flush and the actual userspace entry, which should be feasible
> >> to vet? Then AFAICT there wouldn't be any work/flush to defer, the IPI
> >> could be entirely silenced if it targets an isolated CPU.
> >
> > Two issues with that:
> 
> Firstly, thank you for entertaining the idea :-)
> 
> > 1. I think the "Common not Private" feature Will Deacon referred to is
> > incompatible with this idea:
> > <https://developer.arm.com/documentation/101811/0104/Address-spaces/Common-not-Private>
> > says "When the CnP bit is set, the software promises to use the ASIDs
> > and VMIDs in the same way on all processors, which allows the TLB
> > entries that are created by one processor to be used by another"
> 
> Sorry for being obtuse - I can understand inconsistent TLB states (old vs
> new translations being present in separate TLBs) due to not sending the
> flush IPI causing an issue with that, but not "flushing early". Even if TLB
> entries can be shared/accessed between CPUs, a CPU should be allowed not to
> have a shared entry in its TLB - what am I missing?
> 
> > 2. It's wrong to assume that TLB entries are only populated for
> > addresses you access - thanks to speculative execution, you have to
> > assume that the CPU might be populating random TLB entries all over
> > the place.
> 
> Gotta love speculation. Now it is supposed to be limited to genuinely
> accessible data & code, right? Say theoretically we have a full TLBi as
> literally the last thing before doing the return-to-userspace, speculation
> should be limited to executing maybe bits of the return-from-userspace
> code?

I think it's easier to ignore speculation entirely, and just assume that
the MMU can arbitrarily fill TLB entries from any page table entries
which are valid/accessible in the active page tables. Hardware
prefetchers can do that regardless of the specific path of speculative
execution.

Thus TLB fills are not limited to VAs which would be used on that
return-to-userspace path.

> Furthermore, I would hope that once a CPU is executing in userspace, it's
> not going to populate the TLB with kernel address translations - AIUI the
> whole vulnerability mitigation debacle was about preventing this sort of
> thing.

The CPU can definitely do that; the vulnerability mitigations are all
about what userspace can observe rather than what the CPU can do in the
background. Additionally, there are features like SPE and TRBE that use
kernel addresses while the CPU is executing userspace instructions.

The latest ARM Architecture Reference Manual (ARM DDI 0487 L.a) is fairly clear
about that in section D8.16 "Translation Lookaside Buffers", where it says
(among other things):

  When address translation is enabled, if a translation table entry
  meets all of the following requirements, then that translation table
  entry is permitted to be cached in a TLB or intermediate TLB caching
  structure at any time:
  • The translation table entry itself does not generate a Translation
    fault, an Address size fault, or an Access flag fault.
  • The translation table entry is not from a translation regime
    configured by an Exception level that is lower than the current
    Exception level.

Here "permitted to be cached in a TLB" also implies that the HW is
allowed to fetch the translation table entry (which is what ARM calls
page table entries).
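To make the quoted rule concrete, here is a minimal C sketch that models
only the two conditions from D8.16, assuming address translation is
enabled. The names (struct tte, enum tte_fault, tte_may_be_cached()) are
hypothetical and not taken from the Arm ARM or from Linux; other
architectural rules (TLB maintenance, break-before-make, and so on) are
deliberately not modelled.

```c
#include <stdbool.h>

/* Hypothetical model of a translation table entry ("page table entry").
 * Names are illustrative only; they do not come from the Arm ARM or Linux. */
enum tte_fault {
	TTE_OK,			/* the entry itself would not fault */
	TTE_TRANSLATION_FAULT,
	TTE_ADDR_SIZE_FAULT,
	TTE_ACCESS_FLAG_FAULT,
};

struct tte {
	enum tte_fault fault;	/* fault the entry itself would generate */
	int regime_el;		/* EL that configured the translation regime */
};

/*
 * Returns true if the entry is permitted to be cached in a TLB at any
 * time, per the two quoted conditions (assumes translation is enabled).
 * Note that nothing here depends on which instructions are executing.
 */
bool tte_may_be_cached(const struct tte *e, int current_el)
{
	if (e->fault != TTE_OK)
		return false;	/* translation/address size/access flag fault */
	if (e->regime_el < current_el)
		return false;	/* regime configured by a lower EL */
	return true;
}
```

For example, a valid kernel entry from the EL1&0 regime (configured by
EL1) satisfies both conditions while the CPU executes at EL0, so under
this model it may be cached even after a "last-minute" flush on the
return-to-userspace path.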

The PDF can be found at:

  https://developer.arm.com/documentation/ddi0487/la/?lang=en

Mark.



Thread overview: 86+ messages
2025-01-14 17:51 [PATCH v4 00/30] context_tracking,x86: Defer some IPIs until a user->kernel transition Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 01/30] objtool: Make validate_call() recognize indirect calls to pv_ops[] Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 02/30] objtool: Flesh out warning related to pv_ops[] calls Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 03/30] rcu: Add a small-width RCU watching counter debug option Valentin Schneider
2025-01-21 13:56   ` Frederic Weisbecker
2025-01-14 17:51 ` [PATCH v4 04/30] rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE Valentin Schneider
2025-01-21 14:00   ` Frederic Weisbecker
2025-01-14 17:51 ` [PATCH v4 05/30] jump_label: Add annotations for validating noinstr usage Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 06/30] static_call: Add read-only-after-init static calls Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 07/30] x86/paravirt: Mark pv_sched_clock static call as __ro_after_init Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 08/30] x86/idle: Mark x86_idle " Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 09/30] x86/paravirt: Mark pv_steal_clock " Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 10/30] riscv/paravirt: " Valentin Schneider
2025-01-14 18:29   ` Andrew Jones
2025-01-14 17:51 ` [PATCH v4 11/30] loongarch/paravirt: " Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 12/30] arm64/paravirt: " Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 13/30] arm/paravirt: " Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 14/30] perf/x86/amd: Mark perf_lopwr_cb " Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 15/30] sched/clock: Mark sched_clock_running key " Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 16/30] x86/speculation/mds: Mark mds_idle_clear key as allowed in .noinstr Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 17/30] sched/clock, x86: Mark __sched_clock_stable " Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 18/30] x86/kvm/vmx: Mark vmx_l1d_should flush and vmx_l1d_flush_cond keys " Valentin Schneider
2025-01-14 21:19   ` Sean Christopherson
2025-01-17  9:50     ` Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 19/30] stackleack: Mark stack_erasing_bypass key " Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 20/30] objtool: Add noinstr validation for static branches/calls Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 21/30] context_tracking: Explicitely use CT_STATE_KERNEL where it is missing Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 22/30] context_tracking: Exit CT_STATE_IDLE upon irq/nmi entry Valentin Schneider
2025-01-22  0:22   ` Frederic Weisbecker
2025-01-22  1:04     ` Sean Christopherson
2025-01-27 11:17     ` Valentin Schneider
2025-02-07 17:06       ` Valentin Schneider
2025-02-07 18:37         ` Frederic Weisbecker
2025-02-10 17:36           ` Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 23/30] context_tracking: Turn CT_STATE_* into bits Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 24/30] context-tracking: Introduce work deferral infrastructure Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 25/30] context_tracking,x86: Defer kernel text patching IPIs Valentin Schneider
2025-01-14 21:13   ` Sean Christopherson
2025-01-14 21:48     ` Sean Christopherson
2025-01-17  9:54       ` Valentin Schneider
2025-01-17  9:47     ` Valentin Schneider
2025-01-17 17:15       ` Sean Christopherson
2025-01-20 13:53         ` Valentin Schneider
2025-01-14 21:26   ` Sean Christopherson
2025-01-24 10:48   ` K Prateek Nayak
2025-01-14 17:51 ` [PATCH v4 26/30] x86,tlb: Make __flush_tlb_global() noinstr-compliant Valentin Schneider
2025-01-14 21:45   ` Dave Hansen
2025-01-17 13:44     ` Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 27/30] x86/tlb: Make __flush_tlb_local() noinstr-compliant Valentin Schneider
2025-01-14 21:24   ` Sean Christopherson
2025-01-14 17:51 ` [PATCH v4 28/30] x86/tlb: Make __flush_tlb_all() noinstr Valentin Schneider
2025-01-14 17:51 ` [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs Valentin Schneider
2025-01-14 18:16   ` Jann Horn
2025-01-17 15:25     ` Valentin Schneider
2025-01-17 15:52       ` Jann Horn
2025-01-17 16:53         ` Valentin Schneider
2025-02-19 15:05           ` Joel Fernandes
2025-02-19 16:18             ` Valentin Schneider
2025-02-19 17:08               ` Joel Fernandes
2025-02-19 20:32                 ` Dave Hansen
2025-01-27 15:51         ` Will Deacon
2025-02-10 18:36         ` Valentin Schneider
2025-02-10 22:08           ` Jann Horn
2025-02-11 13:33             ` Valentin Schneider
2025-02-11 14:03               ` Mark Rutland [this message]
2025-02-11 16:09                 ` Valentin Schneider
2025-02-11 14:22               ` Dave Hansen
2025-02-11 16:10                 ` Valentin Schneider
2025-02-18 22:40                 ` Valentin Schneider
2025-02-19  0:39                   ` Dave Hansen
2025-02-19 15:13                     ` Valentin Schneider
2025-02-19 20:25                       ` Dave Hansen
2025-02-20 17:10                         ` Valentin Schneider
2025-02-20 17:38                           ` Dave Hansen
2025-02-26 16:52                             ` Valentin Schneider
2025-03-25 17:52                             ` Valentin Schneider
2025-03-25 18:41                               ` Jann Horn
2025-03-26  8:56                                 ` Valentin Schneider
2025-01-17 16:11       ` Uladzislau Rezki
2025-01-17 17:00         ` Valentin Schneider
2025-01-20 11:15           ` Uladzislau Rezki
2025-01-20 16:09             ` Valentin Schneider
2025-01-21 17:00               ` Uladzislau Rezki
2025-01-24 15:22                 ` Valentin Schneider
2025-01-27 10:36                   ` Uladzislau Rezki
2025-01-14 17:51 ` [PATCH v4 30/30] context-tracking: Add a Kconfig to enable IPI deferral for NO_HZ_IDLE Valentin Schneider
