From: Jann Horn <jannh@google.com>
To: Rik van Riel <riel@surriel.com>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, bp@alien8.de,
	peterz@infradead.org, dave.hansen@linux.intel.com,
	zhengqi.arch@bytedance.com, nadav.amit@gmail.com,
	thomas.lendacky@amd.com, kernel-team@meta.com,
	linux-mm@kvack.org, akpm@linux-foundation.org
Subject: Re: [PATCH v4 10/12] x86,tlb: do targeted broadcast flushing from tlbbatch code
Date: Mon, 13 Jan 2025 18:05:56 +0100
Message-ID: <CAG48ez1QU0+Vkp2cbvTb2pLWP3X8oaz1mQxk3PEAPy07raUGBA@mail.gmail.com>
In-Reply-To: <20250112155453.1104139-11-riel@surriel.com>

On Sun, Jan 12, 2025 at 4:55 PM Rik van Riel <riel@surriel.com> wrote:
> Instead of doing a system-wide TLB flush from arch_tlbbatch_flush,
> queue up asynchronous, targeted flushes from arch_tlbbatch_add_pending.
>
> This also allows us to avoid adding the CPUs of processes using broadcast
> flushing to the batch->cpumask, and will hopefully further reduce TLB
> flushing from the reclaim and compaction paths.
[...]
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 80375ef186d5..532911fbb12a 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -1658,9 +1658,7 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>          * a local TLB flush is needed. Optimize this use-case by calling
>          * flush_tlb_func_local() directly in this case.
>          */
> -       if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
> -               invlpgb_flush_all_nonglobals();
> -       } else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
> +       if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
>                 flush_tlb_multi(&batch->cpumask, info);
>         } else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
>                 lockdep_assert_irqs_enabled();
> @@ -1669,12 +1667,49 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>                 local_irq_enable();
>         }
>
> +       /*
> +        * If we issued (asynchronous) INVLPGB flushes, wait for them here.
> +        * The cpumask above contains only CPUs that were running tasks
> +        * not using broadcast TLB flushing.
> +        */
> +       if (cpu_feature_enabled(X86_FEATURE_INVLPGB) && batch->used_invlpgb) {
> +               tlbsync();
> +               migrate_enable();
> +               batch->used_invlpgb = false;
> +       }
> +
>         cpumask_clear(&batch->cpumask);
>
>         put_flush_tlb_info();
>         put_cpu();
>  }
>
> +void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> +                                            struct mm_struct *mm,
> +                                            unsigned long uaddr)
> +{
> +       if (static_cpu_has(X86_FEATURE_INVLPGB) && mm_global_asid(mm)) {
> +               u16 asid = mm_global_asid(mm);
> +               /*
> +                * Queue up an asynchronous invalidation. The corresponding
> +                * TLBSYNC is done in arch_tlbbatch_flush(), and must be done
> +                * on the same CPU.
> +                */
> +               if (!batch->used_invlpgb) {
> +                       batch->used_invlpgb = true;
> +                       migrate_disable();
> +               }
> +               invlpgb_flush_user_nr_nosync(kern_pcid(asid), uaddr, 1, false);
> +               /* Do any CPUs supporting INVLPGB need PTI? */
> +               if (static_cpu_has(X86_FEATURE_PTI))
> +                       invlpgb_flush_user_nr_nosync(user_pcid(asid), uaddr, 1, false);
> +       } else {
> +               inc_mm_tlb_gen(mm);
> +               cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
> +       }
> +       mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
> +}

How does this work if the MM is currently transitioning to a global
ASID? Should the "mm_global_asid(mm)" check maybe be replaced with
something that checks if the MM has fully transitioned to a global
ASID, so that we keep using the classic path if there might be holdout
CPUs?
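
To make that concrete, here is a rough sketch of the kind of check I
have in mind. The helper mm_fully_global_asid() is hypothetical, just
to illustrate the idea; it would have to return true only once no CPU
can still be running this mm with a per-CPU ASID:

	if (static_cpu_has(X86_FEATURE_INVLPGB) &&
	    mm_fully_global_asid(mm)) {
		u16 asid = mm_global_asid(mm);

		/* Queue the asynchronous broadcast flush, as in the patch. */
		if (!batch->used_invlpgb) {
			batch->used_invlpgb = true;
			migrate_disable();
		}
		invlpgb_flush_user_nr_nosync(kern_pcid(asid), uaddr, 1, false);
		if (static_cpu_has(X86_FEATURE_PTI))
			invlpgb_flush_user_nr_nosync(user_pcid(asid), uaddr, 1, false);
	} else {
		/*
		 * Classic path: the IPI-based flush also reaches any
		 * holdout CPUs that are still using a per-CPU ASID.
		 */
		inc_mm_tlb_gen(mm);
		cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
	}

That way an mm that is still mid-transition keeps taking the
batch->cpumask path until the switch to the global ASID is complete
on all CPUs.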

