From: Manali Shukla <manali.shukla@amd.com>
To: Rik van Riel <riel@surriel.com>, x86@kernel.org
Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org,
dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com,
nadav.amit@gmail.com, thomas.lendacky@amd.com,
kernel-team@meta.com, linux-mm@kvack.org,
akpm@linux-foundation.org, jannh@google.com,
mhklinux@outlook.com, andrew.cooper3@citrix.com,
Manali Shukla <manali.shukla@amd.com>
Subject: Re: [PATCH v6 00/12] AMD broadcast TLB invalidation
Date: Fri, 24 Jan 2025 17:11:34 +0530 [thread overview]
Message-ID: <431442a7-e62b-4892-9f3c-6fbdb97a7722@amd.com> (raw)
In-Reply-To: <20250120024104.1924753-1-riel@surriel.com>
On 1/20/2025 8:10 AM, Rik van Riel wrote:
> Add support for broadcast TLB invalidation using AMD's INVLPGB instruction.
>
> This allows the kernel to invalidate TLB entries on remote CPUs without
> needing to send IPIs, without having to wait for remote CPUs to handle
> those interrupts, and with less interruption to what was running on
> those CPUs.
>
> Because x86 PCID space is limited, and there are some very large
> systems out there, broadcast TLB invalidation is only used for
> processes that are active on 3 or more CPUs, with the threshold
> being gradually increased the more the PCID space gets exhausted.
>
> Combined with the removal of unnecessary lru_add_drain calls
> (see https://lkml.org/lkml/2024/12/19/1388) this results in a
> nice performance boost for the will-it-scale tlb_flush2_threads
> test on an AMD Milan system with 36 cores:
>
> - vanilla kernel: 527k loops/second
> - lru_add_drain removal: 731k loops/second
> - only INVLPGB: 527k loops/second
> - lru_add_drain + INVLPGB: 1157k loops/second
>
> Profiling with only the INVLPGB changes showed while
> TLB invalidation went down from 40% of the total CPU
> time to only around 4% of CPU time, the contention
> simply moved to the LRU lock.
>
> Fixing both at the same time about doubles the
> number of iterations per second from this case.
>
> Some numbers closer to real world performance
> can be found at Phoronix, thanks to Michael:
>
> https://www.phoronix.com/news/AMD-INVLPGB-Linux-Benefits
>
> My current plan is to implement support for Intel's RAR
> (Remote Action Request) TLB flushing in a follow-up series,
> after this thing has been merged into -tip. Making things
> any larger would just be unwieldy for reviewers.
>
> v6:
> - fix info->end check in flush_tlb_kernel_range (Michael)
> - disable broadcast TLB flushing on 32 bit x86
> v5:
> - use byte assembly for compatibility with older toolchains (Borislav, Michael)
> - ensure a panic on an invalid number of extra pages (Dave, Tom)
> - add cant_migrate() assertion to tlbsync (Jann)
> - a bunch more cleanups (Nadav)
> - key TCE enabling off X86_FEATURE_TCE (Andrew)
> - fix a race between reclaim and ASID transition (Jann)
> v4:
> - Use only bitmaps to track free global ASIDs (Nadav)
> - Improved AMD initialization (Borislav & Tom)
> - Various naming and documentation improvements (Peter, Nadav, Tom, Dave)
> - Fixes for subtle race conditions (Jann)
> v3:
> - Remove paravirt tlb_remove_table call (thank you Qi Zheng)
> - More suggested cleanups and changelog fixes by Peter and Nadav
> v2:
> - Apply suggestions by Peter and Borislav (thank you!)
> - Fix bug in arch_tlbbatch_flush, where we need to do both
> the TLBSYNC, and flush the CPUs that are in the cpumask.
> - Some updates to comments and changelogs based on questions.
>
>
>
I have collected the performance data using will-it-scale tlb_flush2_threads test
and AMD broadcast TLB invalidations v6 + LRU drain removal patches on my AMD Milan,
Genoa and Turin machines. I have not observed any discrepancy in the data. As seen
in the below table, LRU drain removal and INVLPGB patches combined is giving nice
performance boost as expected.
Since v7 has already been posted, I plan to collect the similar data for V7 too.
------------------------------------------------------------------------------------------------------------------------------------------------
| ./tlb_flush2_threads -s 5 -t 128 | Milan 1P (NPS1) | Milan 1P (NPS2) | Genoa 1P (NPS1) | Genoa 1P (NPS2) | Turin 2P (NPS1) | Turin 2P (NPS2) |
------------------------------------------------------------------------------------------------------------------------------------------------
| Vanila | 369880 | 399732 | 311936 | 326639 | 371146 | 377428 |
------------------------------------------------------------------------------------------------------------------------------------------------
| LRU drain removal | 781684 | 794648 | 541434 | 531191 | 549773 | 487340 |
------------------------------------------------------------------------------------------------------------------------------------------------
| INVLPGB | 554438 | 937696 | 501677 | 565055 | 531544 | 487342 |
------------------------------------------------------------------------------------------------------------------------------------------------
| LRU drain removal + INVLPGB | 1091792 | 1096667 | 971744 | 956330 | 1387741 | 1388581 |
------------------------------------------------------------------------------------------------------------------------------------------------
| LRU drain vs. Vanila | 52.68% | 49.70% | 42.39% | 38.51% | 32.49% | 22.55% |
------------------------------------------------------------------------------------------------------------------------------------------------
| INVLPGB vs. Vanila | 33.29% | 57.37% | 37.82% | 42.19% | 30.18% | 22.55% |
------------------------------------------------------------------------------------------------------------------------------------------------
| (LRU drain + INVLPGB) vs. Vanila | 66.12% | 63.55% | 67.90% | 65.84% | 73.26% | 72.82% |
------------------------------------------------------------------------------------------------------------------------------------------------
-Manali
prev parent reply other threads:[~2025-01-24 11:41 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-20 2:40 Rik van Riel
2025-01-20 2:40 ` [PATCH v6 01/12] x86/mm: make MMU_GATHER_RCU_TABLE_FREE unconditional Rik van Riel
2025-01-20 19:32 ` David Hildenbrand
2025-01-20 2:40 ` [PATCH v6 02/12] x86/mm: remove pv_ops.mmu.tlb_remove_table call Rik van Riel
2025-01-20 19:47 ` David Hildenbrand
2025-01-21 1:03 ` Rik van Riel
2025-01-21 7:46 ` David Hildenbrand
2025-01-21 8:54 ` Peter Zijlstra
2025-01-22 15:48 ` Rik van Riel
2025-01-20 2:40 ` [PATCH v6 03/12] x86/mm: consolidate full flush threshold decision Rik van Riel
2025-01-20 2:40 ` [PATCH v6 04/12] x86/mm: get INVLPGB count max from CPUID Rik van Riel
2025-01-20 2:40 ` [PATCH v6 05/12] x86/mm: add INVLPGB support code Rik van Riel
2025-01-21 9:45 ` Peter Zijlstra
2025-01-22 16:58 ` Rik van Riel
2025-01-20 2:40 ` [PATCH v6 06/12] x86/mm: use INVLPGB for kernel TLB flushes Rik van Riel
2025-01-20 2:40 ` [PATCH v6 07/12] x86/tlb: use INVLPGB in flush_tlb_all Rik van Riel
2025-01-20 2:40 ` [PATCH v6 08/12] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing Rik van Riel
2025-01-20 2:40 ` [PATCH v6 09/12] x86/mm: enable broadcast TLB invalidation for multi-threaded processes Rik van Riel
2025-01-20 14:02 ` Nadav Amit
2025-01-20 16:09 ` Rik van Riel
2025-01-20 20:04 ` Nadav Amit
2025-01-20 22:44 ` Rik van Riel
2025-01-21 7:31 ` Nadav Amit
2025-01-21 9:55 ` Peter Zijlstra
2025-01-21 10:33 ` Peter Zijlstra
2025-01-23 1:40 ` Rik van Riel
2025-01-21 18:48 ` Dave Hansen
2025-01-22 8:38 ` Peter Zijlstra
2025-01-23 1:13 ` Rik van Riel
2025-01-23 9:07 ` Peter Zijlstra
2025-01-23 12:42 ` Rik van Riel
2025-01-20 2:40 ` [PATCH v6 10/12] x86,tlb: do targeted broadcast flushing from tlbbatch code Rik van Riel
2025-01-20 2:40 ` [PATCH v6 11/12] x86/mm: enable AMD translation cache extensions Rik van Riel
2025-01-20 2:40 ` [PATCH v6 12/12] x86/mm: only invalidate final translations with INVLPGB Rik van Riel
2025-01-20 5:58 ` [PATCH v6 00/12] AMD broadcast TLB invalidation Michael Kelley
2025-01-24 11:41 ` Manali Shukla [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=431442a7-e62b-4892-9f3c-6fbdb97a7722@amd.com \
--to=manali.shukla@amd.com \
--cc=akpm@linux-foundation.org \
--cc=andrew.cooper3@citrix.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=jannh@google.com \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhklinux@outlook.com \
--cc=nadav.amit@gmail.com \
--cc=peterz@infradead.org \
--cc=riel@surriel.com \
--cc=thomas.lendacky@amd.com \
--cc=x86@kernel.org \
--cc=zhengqi.arch@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox