* [PATCH v5 0/2] skip redundant sync IPIs when TLB flush sent them
@ 2026-03-02 6:30 Lance Yang
2026-03-02 6:30 ` [PATCH v5 1/2] mm/mmu_gather: prepare to skip redundant sync IPIs Lance Yang
2026-03-02 6:30 ` [PATCH v5 2/2] x86/tlb: skip redundant sync IPIs for native TLB flush Lance Yang
0 siblings, 2 replies; 5+ messages in thread
From: Lance Yang @ 2026-03-02 6:30 UTC (permalink / raw)
To: akpm
Cc: peterz, david, dave.hansen, dave.hansen, ypodemsk, hughd, will,
aneesh.kumar, npiggin, tglx, mingo, bp, x86, hpa, arnd,
lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, shy828301, riel, jannh, jgross,
seanjc, pbonzini, boris.ostrovsky, virtualization, kvm,
linux-arch, linux-mm, linux-kernel, ioworker0
Hi all,
When page table operations require synchronization with software/lockless
walkers, they call tlb_remove_table_sync_{one,rcu}() after flushing the
TLB (tlb->freed_tables or tlb->unshared_tables).
On architectures where the TLB flush already sends IPIs to all target CPUs,
the subsequent sync IPI broadcast is redundant. This is not only costly on
large systems where it disrupts all CPUs even for single-process page table
operations, but has also been reported to hurt RT workloads[1].
This series introduces tlb_table_flush_implies_ipi_broadcast() to check if
the prior TLB flush already provided the necessary synchronization. When
true, the sync calls can early-return.
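The early-return shape described above boils down to a tiny userspace sketch. This is a hedged model, not kernel code: the global flag and the IPI counter below are illustrative stand-ins for the arch-provided property and for smp_call_function().

```c
#include <assert.h>
#include <stdbool.h>

static bool flush_implies_ipi_broadcast; /* stand-in for the arch property */
static int sync_ipis_sent;               /* stand-in for smp_call_function() */

/* Generic default of tlb_table_flush_implies_ipi_broadcast(): false, so
 * every architecture keeps today's behavior unless it overrides the hook. */
static bool tlb_table_flush_implies_ipi_broadcast(void)
{
	return flush_implies_ipi_broadcast;
}

/* Model of tlb_remove_table_sync_one(): the sync IPI broadcast is skipped
 * when the preceding TLB flush already reached every target CPU via IPI. */
static void tlb_remove_table_sync_one(void)
{
	if (tlb_table_flush_implies_ipi_broadcast())
		return;
	sync_ipis_sent++;
}
```

The second call in a scenario where the flush already IPI'd everyone leaves the counter untouched, which is exactly the redundancy the series eliminates.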
A few cases rely on this synchronization:
1) hugetlb PMD unshare[2]: The problem is not the freeing but the reuse
of the PMD table for other purposes by the last remaining user after
unsharing.
2) khugepaged collapse[3]: Ensure no concurrent GUP-fast before collapsing
and (possibly) freeing the page table / re-depositing it.
Two-step plan as David suggested[4]:
Step 1 (this series): Skip the redundant sync when we're 100% certain the
TLB flush sent IPIs. INVLPGB is excluded because, when it is supported, we
cannot guarantee IPIs were sent; this keeps the logic clean and simple.
Step 2 (future work): Send targeted IPIs only to CPUs actually doing
software/lockless page table walks, benefiting all architectures.
Regarding Step 2, it only applies to setups where Step 1 does not, such as
x86 with INVLPGB or arm64. Step 2 work is ongoing; early attempts showed
~3% GUP-fast overhead. Reducing that overhead requires more work and
tuning, so it will be submitted separately once ready.
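As a rough illustration of the Step 2 idea, here is a userspace sketch of the per-CPU tracking. The array name follows the dropped `active_lockless_pt_walk_mm` prototype mentioned below in the v4 -> v5 notes, but the fixed CPU count and helper names are illustrative only, not part of this series.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Which mm (if any) each CPU is currently lockless-walking; models the
 * per-CPU variable a GUP-fast walker would set on entry, clear on exit. */
static const void *active_lockless_pt_walk_mm[NR_CPUS];

static void lockless_walk_begin(int cpu, const void *mm)
{
	active_lockless_pt_walk_mm[cpu] = mm;
}

static void lockless_walk_end(int cpu)
{
	active_lockless_pt_walk_mm[cpu] = NULL;
}

/* Model of a targeted tlb_remove_table_sync_mm(): count how many CPUs
 * would actually need an IPI for this mm, instead of IPI-ing everyone. */
static int count_targeted_ipis(const void *mm)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (active_lockless_pt_walk_mm[cpu] == mm)
			n++;
	return n;
}
```

The point of the design is visible even in this toy: a single-process page table operation disturbs only the CPUs walking that mm, not the whole machine.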
David Hildenbrand did the initial implementation. I built on his work and
relied on off-list discussions to push it further - thanks a lot David!
[1] https://lore.kernel.org/linux-mm/1b27a3fa-359a-43d0-bdeb-c31341749367@kernel.org/
[2] https://lore.kernel.org/linux-mm/6a364356-5fea-4a6c-b959-ba3b22ce9c88@kernel.org/
[3] https://lore.kernel.org/linux-mm/2cb4503d-3a3f-4f6c-8038-7b3d1c74b3c2@kernel.org/
[4] https://lore.kernel.org/linux-mm/bbfdf226-4660-4949-b17b-0d209ee4ef8c@kernel.org/
v4 -> v5:
- Drop per-CPU tracking (active_lockless_pt_walk_mm) from this series;
defer to Step 2 as it adds ~3% GUP-fast overhead
- Keep the pv_ops property false for PV backends like KVM: preempted vCPUs
  cannot be assumed safe (per Sean)
https://lore.kernel.org/linux-mm/aaCP95l-m8ISXF78@google.com/
- https://lore.kernel.org/linux-mm/20260202074557.16544-1-lance.yang@linux.dev/
v3 -> v4:
- Rework based on David's two-step direction and per-CPU idea:
1) Targeted IPIs: per-CPU variable when entering/leaving lockless page
table walk; tlb_remove_table_sync_mm() IPIs only those CPUs.
2) On x86, pv_mmu_ops property set at init to skip the extra sync when
flush_tlb_multi() already sends IPIs.
https://lore.kernel.org/linux-mm/bbfdf226-4660-4949-b17b-0d209ee4ef8c@kernel.org/
- https://lore.kernel.org/linux-mm/20260106120303.38124-1-lance.yang@linux.dev/
v2 -> v3:
- Complete rewrite: use dynamic IPI tracking instead of static checks
(per Dave Hansen, thanks!)
- Track IPIs via mmu_gather: native_flush_tlb_multi() sets flag when
actually sending IPIs
- Motivation for skipping redundant IPIs explained by David:
https://lore.kernel.org/linux-mm/1b27a3fa-359a-43d0-bdeb-c31341749367@kernel.org/
- https://lore.kernel.org/linux-mm/20251229145245.85452-1-lance.yang@linux.dev/
v1 -> v2:
- Fix cover letter encoding to resolve send-email issues. Apologies for
any email flood caused by the failed send attempts :(
RFC -> v1:
- Use a callback function in pv_mmu_ops instead of comparing function
pointers (per David)
- Embed the check directly in tlb_remove_table_sync_one() instead of
requiring every caller to check explicitly (per David)
- Move tlb_table_flush_implies_ipi_broadcast() outside of
CONFIG_MMU_GATHER_RCU_TABLE_FREE to fix build error on architectures
that don't enable this config.
https://lore.kernel.org/oe-kbuild-all/202512142156.cShiu6PU-lkp@intel.com/
- https://lore.kernel.org/linux-mm/20251213080038.10917-1-lance.yang@linux.dev/
Lance Yang (2):
mm/mmu_gather: prepare to skip redundant sync IPIs
x86/tlb: skip redundant sync IPIs for native TLB flush
arch/x86/include/asm/paravirt_types.h | 5 +++++
arch/x86/include/asm/smp.h | 7 +++++++
arch/x86/include/asm/tlb.h | 20 +++++++++++++++++++-
arch/x86/kernel/paravirt.c | 16 ++++++++++++++++
arch/x86/kernel/smpboot.c | 1 +
include/asm-generic/tlb.h | 17 +++++++++++++++++
mm/mmu_gather.c | 15 +++++++++++++++
7 files changed, 80 insertions(+), 1 deletion(-)
--
2.49.0
^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v5 1/2] mm/mmu_gather: prepare to skip redundant sync IPIs
From: Lance Yang <lance.yang@linux.dev> @ 2026-03-02  6:30 UTC (permalink / raw)

When page table operations require synchronization with software/lockless
walkers, they call tlb_remove_table_sync_{one,rcu}() after flushing the
TLB (tlb->freed_tables or tlb->unshared_tables).

On architectures where the TLB flush already sends IPIs to all target CPUs,
the subsequent sync IPI broadcast is redundant. This is not only costly on
large systems where it disrupts all CPUs even for single-process page table
operations, but has also been reported to hurt RT workloads[1].

Introduce tlb_table_flush_implies_ipi_broadcast() to check if the prior TLB
flush already provided the necessary synchronization. When true, the sync
calls can early-return.

A few cases rely on this synchronization:

1) hugetlb PMD unshare[2]: The problem is not the freeing but the reuse
   of the PMD table for other purposes by the last remaining user after
   unsharing.

2) khugepaged collapse[3]: Ensure no concurrent GUP-fast before collapsing
   and (possibly) freeing the page table / re-depositing it.

Currently always returns false (no behavior change). The follow-up patch
will enable the optimization for x86.
[1] https://lore.kernel.org/linux-mm/1b27a3fa-359a-43d0-bdeb-c31341749367@kernel.org/
[2] https://lore.kernel.org/linux-mm/6a364356-5fea-4a6c-b959-ba3b22ce9c88@kernel.org/
[3] https://lore.kernel.org/linux-mm/2cb4503d-3a3f-4f6c-8038-7b3d1c74b3c2@kernel.org/

Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
 include/asm-generic/tlb.h | 17 +++++++++++++++++
 mm/mmu_gather.c           | 15 +++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index bdcc2778ac64..cb41cc6a0024 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -240,6 +240,23 @@ static inline void tlb_remove_table(struct mmu_gather *tlb, void *table)
 }
 #endif /* CONFIG_MMU_GATHER_TABLE_FREE */
 
+/**
+ * tlb_table_flush_implies_ipi_broadcast - does TLB flush imply IPI sync
+ *
+ * When page table operations require synchronization with software/lockless
+ * walkers, they flush the TLB (tlb->freed_tables or tlb->unshared_tables)
+ * then call tlb_remove_table_sync_{one,rcu}(). If the flush already sent
+ * IPIs to all CPUs, the sync call is redundant.
+ *
+ * Returns false by default. Architectures can override by defining this.
+ */
+#ifndef tlb_table_flush_implies_ipi_broadcast
+static inline bool tlb_table_flush_implies_ipi_broadcast(void)
+{
+	return false;
+}
+#endif
+
 #ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 /*
  * This allows an architecture that does not use the linux page-tables for
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 3985d856de7f..37a6a711c37e 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -283,6 +283,14 @@ void tlb_remove_table_sync_one(void)
 	 * It is however sufficient for software page-table walkers that rely on
 	 * IRQ disabling.
 	 */
+
+	/*
+	 * Skip IPI if the preceding TLB flush already synchronized with
+	 * all CPUs that could be doing software/lockless page table walks.
+	 */
+	if (tlb_table_flush_implies_ipi_broadcast())
+		return;
+
 	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
 }
 
@@ -312,6 +320,13 @@ static void tlb_remove_table_free(struct mmu_table_batch *batch)
  */
 void tlb_remove_table_sync_rcu(void)
 {
+	/*
+	 * Skip RCU wait if the preceding TLB flush already synchronized
+	 * with all CPUs that could be doing software/lockless page table walks.
+	 */
+	if (tlb_table_flush_implies_ipi_broadcast())
+		return;
+
 	synchronize_rcu();
 }
-- 
2.49.0
* [PATCH v5 2/2] x86/tlb: skip redundant sync IPIs for native TLB flush
From: Lance Yang <lance.yang@linux.dev> @ 2026-03-02  6:30 UTC (permalink / raw)

Enable the optimization introduced in the previous patch for x86.

Add pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast to track whether
flush_tlb_multi() sends real IPIs. Initialize it once in
native_pv_tlb_init() during boot. On CONFIG_PARAVIRT systems,
tlb_table_flush_implies_ipi_broadcast() reads the pv_ops property. On
non-PARAVIRT, it directly checks for INVLPGB.

PV backends (KVM, Xen, Hyper-V) typically have their own implementations
and don't call native_flush_tlb_multi() directly, so they cannot be
trusted to provide the IPI guarantees we need. They keep the property
false.

Two-step plan as David suggested[1]:

Step 1 (this patch): Skip redundant sync when we're 100% certain the TLB
flush sent IPIs. INVLPGB is excluded because when supported, we cannot
guarantee IPIs were sent, keeping it clean and simple.

Step 2 (future work): Send targeted IPIs only to CPUs actually doing
software/lockless page table walks, benefiting all architectures.

Regarding Step 2, it obviously only applies to setups where Step 1 does
not apply: like x86 with INVLPGB or arm64.
[1] https://lore.kernel.org/linux-mm/bbfdf226-4660-4949-b17b-0d209ee4ef8c@kernel.org/

Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
 arch/x86/include/asm/paravirt_types.h |  5 +++++
 arch/x86/include/asm/smp.h            |  7 +++++++
 arch/x86/include/asm/tlb.h            | 20 +++++++++++++++++++-
 arch/x86/kernel/paravirt.c            | 16 ++++++++++++++++
 arch/x86/kernel/smpboot.c             |  1 +
 5 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 9bcf6bce88f6..ec01268f2e3e 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -112,6 +112,11 @@ struct pv_mmu_ops {
 	void (*flush_tlb_multi)(const struct cpumask *cpus,
 				const struct flush_tlb_info *info);
 
+	/*
+	 * True if flush_tlb_multi() sends real IPIs to all target CPUs.
+	 */
+	bool flush_tlb_multi_implies_ipi_broadcast;
+
 	/* Hook for intercepting the destruction of an mm_struct. */
 	void (*exit_mmap)(struct mm_struct *mm);
 	void (*notify_page_enc_status_changed)(unsigned long pfn, int npages, bool enc);
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 84951572ab81..4ac175414ac1 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -105,6 +105,13 @@ void native_smp_prepare_boot_cpu(void);
 void smp_prepare_cpus_common(void);
 void native_smp_prepare_cpus(unsigned int max_cpus);
 void native_smp_cpus_done(unsigned int max_cpus);
+
+#ifdef CONFIG_PARAVIRT
+void __init native_pv_tlb_init(void);
+#else
+static inline void native_pv_tlb_init(void) { }
+#endif
+
 int common_cpu_up(unsigned int cpunum, struct task_struct *tidle);
 int native_kick_ap(unsigned int cpu, struct task_struct *tidle);
 int native_cpu_disable(void);
diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 866ea78ba156..87ef7147eac8 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -5,10 +5,23 @@
 #define tlb_flush tlb_flush
 static inline void tlb_flush(struct mmu_gather *tlb);
 
+#define tlb_table_flush_implies_ipi_broadcast tlb_table_flush_implies_ipi_broadcast
+static inline bool tlb_table_flush_implies_ipi_broadcast(void);
+
 #include <asm-generic/tlb.h>
 #include <linux/kernel.h>
 #include <vdso/bits.h>
 #include <vdso/page.h>
+#include <asm/paravirt.h>
+
+static inline bool tlb_table_flush_implies_ipi_broadcast(void)
+{
+#ifdef CONFIG_PARAVIRT
+	return pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast;
+#else
+	return !cpu_feature_enabled(X86_FEATURE_INVLPGB);
+#endif
+}
 
 static inline void tlb_flush(struct mmu_gather *tlb)
 {
@@ -20,7 +33,12 @@ static inline void tlb_flush(struct mmu_gather *tlb)
 		end = tlb->end;
 	}
 
-	flush_tlb_mm_range(tlb->mm, start, end, stride_shift, tlb->freed_tables);
+	/*
+	 * Pass both freed_tables and unshared_tables so that lazy-TLB CPUs
+	 * also receive IPIs during unsharing page tables.
+	 */
+	flush_tlb_mm_range(tlb->mm, start, end, stride_shift,
+			   tlb->freed_tables || tlb->unshared_tables);
 }
 
 static inline void invlpg(unsigned long addr)
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index a6ed52cae003..b681b8319295 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -154,6 +154,7 @@ struct paravirt_patch_template pv_ops = {
 	.mmu.flush_tlb_kernel = native_flush_tlb_global,
 	.mmu.flush_tlb_one_user = native_flush_tlb_one_user,
 	.mmu.flush_tlb_multi = native_flush_tlb_multi,
+	.mmu.flush_tlb_multi_implies_ipi_broadcast = false,
 
 	.mmu.exit_mmap = paravirt_nop,
 	.mmu.notify_page_enc_status_changed = paravirt_nop,
@@ -221,3 +222,18 @@ NOKPROBE_SYMBOL(native_load_idt);
 
 EXPORT_SYMBOL(pv_ops);
 EXPORT_SYMBOL_GPL(pv_info);
+
+void __init native_pv_tlb_init(void)
+{
+	/*
+	 * If PV backend already set the property, respect it.
+	 * Otherwise, check if native TLB flush sends real IPIs to all target
+	 * CPUs (i.e., not using INVLPGB broadcast invalidation).
+	 */
+	if (pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast)
+		return;
+
+	if (pv_ops.mmu.flush_tlb_multi == native_flush_tlb_multi &&
+	    !cpu_feature_enabled(X86_FEATURE_INVLPGB))
+		pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast = true;
+}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5cd6950ab672..3cdb04162843 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1167,6 +1167,7 @@ void __init native_smp_prepare_boot_cpu(void)
 	switch_gdt_and_percpu_base(me);
 
 	native_pv_lock_init();
+	native_pv_tlb_init();
 }
 
 void __init native_smp_cpus_done(unsigned int max_cpus)
-- 
2.49.0
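The boot-time decision in native_pv_tlb_init() reduces to a small predicate. Here is a hedged userspace model of that logic; the function name and the bare bool parameters are illustrative stand-ins for the pv_ops check and the CPU feature test, not the kernel API.

```c
#include <stdbool.h>

/*
 * Model of the native_pv_tlb_init() decision: the property may only end
 * up true when the flush op is the native one and INVLPGB is not in use,
 * or when a PV backend explicitly opted in.
 */
static bool flush_implies_ipi_broadcast(bool pv_backend_set_property,
					bool flush_is_native,
					bool has_invlpgb)
{
	/* A PV backend that guarantees IPIs may opt in explicitly. */
	if (pv_backend_set_property)
		return true;

	/* Native flush sends real IPIs unless INVLPGB broadcasts instead. */
	return flush_is_native && !has_invlpgb;
}
```

Note how this encodes the series' conservative stance: an unknown PV flush implementation (flush_is_native == false) never enables the skip.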
* Re: [PATCH v5 2/2] x86/tlb: skip redundant sync IPIs for native TLB flush
From: Peter Zijlstra @ 2026-03-02 14:56 UTC (permalink / raw)

On Mon, Mar 02, 2026 at 02:30:36PM +0800, Lance Yang wrote:

> @@ -221,3 +222,18 @@ NOKPROBE_SYMBOL(native_load_idt);
>  
>  EXPORT_SYMBOL(pv_ops);
>  EXPORT_SYMBOL_GPL(pv_info);
> +
> +void __init native_pv_tlb_init(void)
> +{
> +	/*
> +	 * If PV backend already set the property, respect it.
> +	 * Otherwise, check if native TLB flush sends real IPIs to all target
> +	 * CPUs (i.e., not using INVLPGB broadcast invalidation).
> +	 */
> +	if (pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast)
> +		return;
> +
> +	if (pv_ops.mmu.flush_tlb_multi == native_flush_tlb_multi &&
> +	    !cpu_feature_enabled(X86_FEATURE_INVLPGB))
> +		pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast = true;
> +}

How about making this a static_branch instead?
> diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
> index 866ea78ba156..87ef7147eac8 100644
> --- a/arch/x86/include/asm/tlb.h
> +++ b/arch/x86/include/asm/tlb.h
> @@ -5,10 +5,23 @@
>  #define tlb_flush tlb_flush
>  static inline void tlb_flush(struct mmu_gather *tlb);
>  
> +#define tlb_table_flush_implies_ipi_broadcast tlb_table_flush_implies_ipi_broadcast
> +static inline bool tlb_table_flush_implies_ipi_broadcast(void);
> +
>  #include <asm-generic/tlb.h>
>  #include <linux/kernel.h>
>  #include <vdso/bits.h>
>  #include <vdso/page.h>
> +#include <asm/paravirt.h>
> +
> +static inline bool tlb_table_flush_implies_ipi_broadcast(void)
> +{
> +#ifdef CONFIG_PARAVIRT
> +	return pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast;
> +#else
> +	return !cpu_feature_enabled(X86_FEATURE_INVLPGB);
> +#endif
> +}

Then this turns into:

static inline bool tlb_table_flush_implies_ipi_broadcast(void)
{
	return static_branch_likely(&tlb_ipi_broadcast_key);
}
* Re: [PATCH v5 2/2] x86/tlb: skip redundant sync IPIs for native TLB flush
From: Lance Yang @ 2026-03-02 15:48 UTC (permalink / raw)

On 2026/3/2 22:56, Peter Zijlstra wrote:
> On Mon, Mar 02, 2026 at 02:30:36PM +0800, Lance Yang wrote:
>> @@ -221,3 +222,18 @@ NOKPROBE_SYMBOL(native_load_idt);
>>  
>>  EXPORT_SYMBOL(pv_ops);
>>  EXPORT_SYMBOL_GPL(pv_info);
>> +
>> +void __init native_pv_tlb_init(void)
>> +{
>> +	/*
>> +	 * If PV backend already set the property, respect it.
>> +	 * Otherwise, check if native TLB flush sends real IPIs to all target
>> +	 * CPUs (i.e., not using INVLPGB broadcast invalidation).
>> +	 */
>> +	if (pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast)
>> +		return;
>> +
>> +	if (pv_ops.mmu.flush_tlb_multi == native_flush_tlb_multi &&
>> +	    !cpu_feature_enabled(X86_FEATURE_INVLPGB))
>> +		pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast = true;
>> +}
>
> How about making this a static_branch instead?

Cool. Thanks for the suggestion!
>> diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
>> index 866ea78ba156..87ef7147eac8 100644
>> --- a/arch/x86/include/asm/tlb.h
>> +++ b/arch/x86/include/asm/tlb.h
>> @@ -5,10 +5,23 @@
>>  #define tlb_flush tlb_flush
>>  static inline void tlb_flush(struct mmu_gather *tlb);
>>  
>> +#define tlb_table_flush_implies_ipi_broadcast tlb_table_flush_implies_ipi_broadcast
>> +static inline bool tlb_table_flush_implies_ipi_broadcast(void);
>> +
>>  #include <asm-generic/tlb.h>
>>  #include <linux/kernel.h>
>>  #include <vdso/bits.h>
>>  #include <vdso/page.h>
>> +#include <asm/paravirt.h>
>> +
>> +static inline bool tlb_table_flush_implies_ipi_broadcast(void)
>> +{
>> +#ifdef CONFIG_PARAVIRT
>> +	return pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast;
>> +#else
>> +	return !cpu_feature_enabled(X86_FEATURE_INVLPGB);
>> +#endif
>> +}
>
> Then this turns into:
>
> static inline bool tlb_table_flush_implies_ipi_broadcast(void)
> {
> 	return static_branch_likely(&tlb_ipi_broadcast_key);
> }

Right. That would be cleaner and faster, eliminating the branch overhead.
Trying static_branch on top of this series, something like:

---8<---
diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 87ef7147eac8..409bbf335f26 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -10,17 +10,16 @@ static inline bool tlb_table_flush_implies_ipi_broadcast(void);
 
 #include <asm-generic/tlb.h>
 #include <linux/kernel.h>
+#include <linux/jump_label.h>
 #include <vdso/bits.h>
 #include <vdso/page.h>
 #include <asm/paravirt.h>
 
+DECLARE_STATIC_KEY_FALSE(tlb_ipi_broadcast_key);
+
 static inline bool tlb_table_flush_implies_ipi_broadcast(void)
 {
-#ifdef CONFIG_PARAVIRT
-	return pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast;
-#else
-	return !cpu_feature_enabled(X86_FEATURE_INVLPGB);
-#endif
+	return static_branch_likely(&tlb_ipi_broadcast_key);
 }
 
 static inline void tlb_flush(struct mmu_gather *tlb)
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index b681b8319295..bcf28980c319 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -15,6 +15,7 @@
 #include <linux/kprobes.h>
 #include <linux/pgtable.h>
 #include <linux/static_call.h>
+#include <linux/jump_label.h>
 
 #include <asm/bug.h>
 #include <asm/paravirt.h>
@@ -223,6 +224,8 @@ NOKPROBE_SYMBOL(native_load_idt);
 
 EXPORT_SYMBOL(pv_ops);
 EXPORT_SYMBOL_GPL(pv_info);
 
+DEFINE_STATIC_KEY_FALSE(tlb_ipi_broadcast_key);
+
 void __init native_pv_tlb_init(void)
 {
 	/*
@@ -230,10 +233,14 @@ void __init native_pv_tlb_init(void)
 	 * Otherwise, check if native TLB flush sends real IPIs to all target
 	 * CPUs (i.e., not using INVLPGB broadcast invalidation).
 	 */
-	if (pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast)
+	if (pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast) {
+		static_branch_enable(&tlb_ipi_broadcast_key);
 		return;
+	}
 
 	if (pv_ops.mmu.flush_tlb_multi == native_flush_tlb_multi &&
-	    !cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	    !cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
 		pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast = true;
+		static_branch_enable(&tlb_ipi_broadcast_key);
+	}
 }
---

Thanks,
Lance