From: Dave Hansen <dave.hansen@intel.com>
To: Rik van Riel <riel@surriel.com>, x86@kernel.org
Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org,
dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com,
nadav.amit@gmail.com, thomas.lendacky@amd.com,
kernel-team@meta.com, linux-mm@kvack.org,
akpm@linux-foundation.org, jackmanb@google.com, jannh@google.com,
mhklinux@outlook.com, andrew.cooper3@citrix.com,
Manali Shukla <Manali.Shukla@amd.com>
Subject: Re: [PATCH v11 09/12] x86/mm: enable broadcast TLB invalidation for multi-threaded processes
Date: Fri, 14 Feb 2025 11:53:04 -0800
Message-ID: <b067a9fc-ff5f-4baa-a1ff-3fa749ae4d44@intel.com>
In-Reply-To: <20250213161423.449435-10-riel@surriel.com>
On 2/13/25 08:14, Rik van Riel wrote:
> Use broadcast TLB invalidation, using the INVLPGB instruction, on AMD EPYC 3
> and newer CPUs.
Could we please just zap the "on AMD EPYC 3 and newer CPUs" from all of
these patches? It can be mentioned once in the cover letter or
something, but it doesn't need to be repeated.
> In order to not exhaust PCID space, and keep TLB flushes local for single
> threaded processes, we only hand out broadcast ASIDs to processes active on
> 4 or more CPUs.
Please no "we's". Use imperative voice. This also needs some fleshing
out. Perhaps:
There is not enough room in the 12-bit ASID address space to
hand out broadcast ASIDs to every process. Only hand out
broadcast ASIDs to processes when they are observed to be
simultaneously running on 4 or more CPUs.
Most importantly, this ensures that single threaded processes
continue to use the cheaper, legacy, local TLB invalidation
instructions like INVLPG.
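For reference, here is roughly what I'd expect that policy to boil down
to (just a sketch; the patch may well count CPUs differently, and
mm_wants_global_asid() is a made-up name):

static bool mm_wants_global_asid(struct mm_struct *mm)
{
	/*
	 * mm_cpumask() accumulates the CPUs this mm has been active
	 * on, so its weight approximates "running on N CPUs".
	 */
	return cpumask_weight(mm_cpumask(mm)) >= 4;
}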
> +#ifdef CONFIG_X86_BROADCAST_TLB_FLUSH
> + u16 global_asid;
> + bool asid_transition;
> +#endif
> +
> } mm_context_t;
Please give these at least a line or two comment explaining what they do.
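Something along these lines, even (adjusted to whatever they actually
mean, of course):

#ifdef CONFIG_X86_BROADCAST_TLB_FLUSH
	/*
	 * The process-wide ASID used for INVLPGB-based flushing,
	 * or 0 if this mm is still using per-CPU dynamic ASIDs.
	 */
	u16 global_asid;

	/*
	 * Set while some CPUs may still be running this mm with a
	 * dynamic ASID after global_asid has been assigned.
	 */
	bool asid_transition;
#endif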
> #define INIT_MM_CONTEXT(mm) \
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 795fdd53bd0a..d670699d32c2 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -139,6 +139,8 @@ static inline void mm_reset_untag_mask(struct mm_struct *mm)
> #define enter_lazy_tlb enter_lazy_tlb
> extern void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk);
>
> +extern void destroy_context_free_global_asid(struct mm_struct *mm);
> +
> /*
> * Init a new mm. Used on mm copies, like at fork()
> * and on mm's that are brand-new, like at execve().
> @@ -161,6 +163,14 @@ static inline int init_new_context(struct task_struct *tsk,
> mm->context.execute_only_pkey = -1;
> }
> #endif
> +
> +#ifdef CONFIG_X86_BROADCAST_TLB_FLUSH
> + if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
> + mm->context.global_asid = 0;
> + mm->context.asid_transition = false;
> + }
> +#endif
> +
> mm_reset_untag_mask(mm);
> init_new_context_ldt(mm);
> return 0;
> @@ -170,6 +180,10 @@ static inline int init_new_context(struct task_struct *tsk,
> static inline void destroy_context(struct mm_struct *mm)
> {
> destroy_context_ldt(mm);
> +#ifdef CONFIG_X86_BROADCAST_TLB_FLUSH
> + if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
> + destroy_context_free_global_asid(mm);
> +#endif
> }
>
> extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index bda7080dec83..3080cb8d21dc 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -6,6 +6,7 @@
> #include <linux/mmu_notifier.h>
> #include <linux/sched.h>
>
> +#include <asm/barrier.h>
> #include <asm/processor.h>
> #include <asm/cpufeature.h>
> #include <asm/special_insns.h>
> @@ -239,6 +240,78 @@ void flush_tlb_one_kernel(unsigned long addr);
> void flush_tlb_multi(const struct cpumask *cpumask,
> const struct flush_tlb_info *info);
>
> +#ifdef CONFIG_X86_BROADCAST_TLB_FLUSH
> +static inline bool is_dyn_asid(u16 asid)
> +{
> + if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
> + return true;
> +
> + return asid < TLB_NR_DYN_ASIDS;
> +}
If possible, could you avoid double-defining the helpers that compile
fine both with and without CONFIG_X86_BROADCAST_TLB_FLUSH? Just put the
#ifdef around the ones that *need* it.
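For instance (a sketch):

/* Builds fine with or without CONFIG_X86_BROADCAST_TLB_FLUSH: */
static inline bool is_dyn_asid(u16 asid)
{
	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
		return true;

	return asid < TLB_NR_DYN_ASIDS;
}

#ifdef CONFIG_X86_BROADCAST_TLB_FLUSH
/* Only the helpers that touch mm->context.global_asid live in here. */
#endif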
...
> +static inline bool in_asid_transition(struct mm_struct *mm)
> +{
> + if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
> + return false;
> +
> + return mm && READ_ONCE(mm->context.asid_transition);
> +}
> +
> +static inline u16 mm_global_asid(struct mm_struct *mm)
> +{
> + u16 asid;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
> + return 0;
> +
> + asid = smp_load_acquire(&mm->context.global_asid);
> +
> + /* mm->context.global_asid is either 0, or a global ASID */
> + VM_WARN_ON_ONCE(asid && is_dyn_asid(asid));
> +
> + return asid;
> +}
Yay, some kind of custom lock. :)
Could you give us a little idea of what the locking rules are here and
why this needs the READ_ONCE() and smp_load_acquire()?
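Even a comment would go a long way. I'm guessing at the actual rules,
but something like:

	/*
	 * mm->context.global_asid is written at most once, under
	 * global_asid_lock. Readers are lockless: the
	 * smp_load_acquire() pairs with an smp_store_release() on
	 * the writer side, so a reader that sees a nonzero ASID
	 * also sees the rest of the transition state.
	 */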
...
> + /*
> + * TLB consistency for global ASIDs is maintained with broadcast TLB
> + * flushing. The TLB is never outdated, and does not need flushing.
> + */
This is another case where I think using the word "broadcast" is not
helping.
Here's the problem: INVLPGB is an "INVLPG" that's broadcast. So the name
INVLPGB is fine. INVLPGB is *a* way to broadcast INVLPG, which is *a*
kind of TLB invalidation.
But, to me, "broadcast TLB flushing" is a broader term. It arguably
includes both INVLPGB and normal IPI-based flushing. Just like the
function naming in the earlier patch, I think we need a better term here.
> + if (static_cpu_has(X86_FEATURE_INVLPGB)) {
> + u16 global_asid = mm_global_asid(next);
> +
> + if (global_asid) {
> + *new_asid = global_asid;
> + *need_flush = false;
> + return;
> + }
> + }
> +#ifdef CONFIG_X86_BROADCAST_TLB_FLUSH
How cleanly could we throw this hunk into a new file? I always dislike
big #ifdefs like this. They seem like magnets for causing weird compile
problems.
> +/*
> + * Logic for broadcast TLB invalidation.
> + */
> +static DEFINE_RAW_SPINLOCK(global_asid_lock);
The global lock definitely needs some discussion in the changelog.
> +static u16 last_global_asid = MAX_ASID_AVAILABLE;
> +static DECLARE_BITMAP(global_asid_used, MAX_ASID_AVAILABLE) = { 0 };
> +static DECLARE_BITMAP(global_asid_freed, MAX_ASID_AVAILABLE) = { 0 };
Isn't the initialization to all 0's superfluous for a global variable?
Statics and globals get zero-initialized anyway.
> +static int global_asid_available = MAX_ASID_AVAILABLE - TLB_NR_DYN_ASIDS - 1;
> +
> +static void reset_global_asid_space(void)
> +{
> + lockdep_assert_held(&global_asid_lock);
> +
> + /*
> + * A global TLB flush guarantees that any stale entries from
> + * previously freed global ASIDs get flushed from the TLB
> + * everywhere, making these global ASIDs safe to reuse.
> + */
> + invlpgb_flush_all_nonglobals();
Ugh, my suggestion to use the term "global ASID" really doesn't work
here, does it?
Also, isn't an invlpgb_flush_all_nonglobals() _relatively_ slow? It has
to go out and talk over the fabric to every CPU, right? And it is done
here while holding a global lock.
That seems worrisome.
> + /*
> + * Clear all the previously freed global ASIDs from the
> + * broadcast_asid_used bitmap, now that the global TLB flush
> + * has made them actually available for re-use.
> + */
> + bitmap_andnot(global_asid_used, global_asid_used,
> + global_asid_freed, MAX_ASID_AVAILABLE);
> + bitmap_clear(global_asid_freed, 0, MAX_ASID_AVAILABLE);
> +
> + /*
> + * ASIDs 0-TLB_NR_DYN_ASIDS are used for CPU-local ASID
> + * assignments, for tasks doing IPI based TLB shootdowns.
> + * Restart the search from the start of the global ASID space.
> + */
> + last_global_asid = TLB_NR_DYN_ASIDS;
> +}
> +
> +static u16 get_global_asid(void)
> +{
> +
> + u16 asid;
> +
> + lockdep_assert_held(&global_asid_lock);
> +
> + /* The previously allocated ASID is at the top of the address space. */
> + if (last_global_asid >= MAX_ASID_AVAILABLE - 1)
> + reset_global_asid_space();
> +
> + asid = find_next_zero_bit(global_asid_used, MAX_ASID_AVAILABLE, last_global_asid);
> +
> + if (asid >= MAX_ASID_AVAILABLE) {
> + /* This should never happen. */
> + VM_WARN_ONCE(1, "Unable to allocate global ASID despite %d available\n",
> + global_asid_available);
> + return 0;
> + }
> +
> + /* Claim this global ASID. */
> + __set_bit(asid, global_asid_used);
> + last_global_asid = asid;
> + global_asid_available--;
> + return asid;
> +}
> +
> +/*
> + * Returns true if the mm is transitioning from a CPU-local ASID to a global
> + * (INVLPGB) ASID, or the other way around.
> + */
> +static bool needs_global_asid_reload(struct mm_struct *next, u16 prev_asid)
> +{
> + u16 global_asid = mm_global_asid(next);
> +
> + /* Process is transitioning to a global ASID */
> + if (global_asid && prev_asid != global_asid)
> + return true;
> +
> + /* Transition from global->local ASID does not currently happen. */
> + if (!global_asid && is_global_asid(prev_asid))
> + return true;
> +
> + return false;
> +}
I'm going to throw in the towel at this point. This patch needs to get
broken up. It's more at once than my poor little brain can handle.
The _least_ it can do is first introduce the stub functions and the
hooks into the existing code. Then, in a second patch, introduce the
real implementation.
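Something like this shape in the first patch, with the real bodies
coming later (names copied from this patch; the bodies are obviously
just placeholders):

/* Patch 1: wire up the call sites against do-nothing stubs: */
static inline u16 mm_global_asid(struct mm_struct *mm)
{
	return 0;
}

static inline bool in_asid_transition(struct mm_struct *mm)
{
	return false;
}

/* Patch 2: replace the stubs with the real implementations. */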
I also suspect that a big chunk of the ASID allocator could be broken
out and introduced separately.
Another example is broadcast_tlb_flush(). To reduce complexity in _this_
patch, it could do something suboptimal like always doing an
invlpgb_flush_all_nonglobals() regardless of the kind of flush it gets.
Then, in a later patch, you could optimize it.
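Something as dumb as this would be fine for the intermediate step (a
sketch; I'm assuming the signature here and reusing the helper from the
earlier patch):

static void broadcast_tlb_flush(struct flush_tlb_info *info)
{
	/*
	 * Deliberately ignore info->start/info->end for now and
	 * flush all non-global translations everywhere. Correct,
	 * just suboptimal.
	 */
	invlpgb_flush_all_nonglobals();
}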