From: "David Hildenbrand (Red Hat)" <david@kernel.org>
To: Dave Hansen <dave.hansen@intel.com>,
Lance Yang <lance.yang@linux.dev>,
akpm@linux-foundation.org
Cc: will@kernel.org, aneesh.kumar@kernel.org, npiggin@gmail.com,
peterz@infradead.org, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, arnd@arndb.de, lorenzo.stoakes@oracle.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com,
Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
dev.jain@arm.com, baohua@kernel.org, ioworker0@gmail.com,
shy828301@gmail.com, riel@surriel.com, jannh@google.com,
linux-arch@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 0/3] skip redundant TLB sync IPIs
Date: Wed, 31 Dec 2025 13:33:30 +0100
Message-ID: <1b27a3fa-359a-43d0-bdeb-c31341749367@kernel.org>
In-Reply-To: <f81b98e5-87c0-4c21-9a75-ad5f9b6af6aa@intel.com>

On 12/31/25 05:26, Dave Hansen wrote:
> On 12/29/25 06:52, Lance Yang wrote:
> ...
>> This series introduces a way for architectures to indicate their TLB flush
>> already provides full synchronization, allowing the redundant IPI to be
>> skipped. For now, the optimization is implemented for x86 first and applied
>> to all page table operations that free or unshare tables.
>
> I really don't like all the complexity here. Even on x86, there are
> three or more ways of deriving this. Having the pv_ops check the value
> of another pv op is also a bit unsettling.

Right. What I actually meant is that we simply have a property "bool
flush_tlb_multi_implies_ipi_broadcast" that is only ever set to true from
the initialization code, without comparing the pv_ops.

That should reduce the complexity quite a bit IMHO.

But maybe you have an even better, very simple way to indicate support.

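To illustrate the property idea above, here is a minimal user-space sketch,
not kernel code. Only the flag name comes from the proposal; the functions
model_arch_init() and model_table_sync_one() and the counter sync_ipis_sent
are hypothetical names made up for this illustration, and the real hook
points in the kernel may look quite different.

```c
#include <stdbool.h>

/*
 * User-space model only. The flag name follows the property proposed
 * above; every other identifier here is hypothetical.
 */
bool flush_tlb_multi_implies_ipi_broadcast;	/* defaults to false */
unsigned int sync_ipis_sent;			/* counts modeled sync broadcasts */

/*
 * Stand-in for architecture initialization code. The flag is only ever
 * set to true here and is never cleared or derived from other state.
 */
void model_arch_init(bool flush_uses_ipis)
{
	if (flush_uses_ipis)
		flush_tlb_multi_implies_ipi_broadcast = true;
}

/* Stand-in for the extra synchronization step that follows a TLB flush. */
void model_table_sync_one(void)
{
	/* Assumption: the preceding flush already sent the needed IPIs. */
	if (flush_tlb_multi_implies_ipi_broadcast)
		return;

	/* Otherwise, pay for a separate IPI broadcast. */
	sync_ipis_sent++;
}
```

The point of the sketch is that the check is a single boolean read, with no
comparison of function pointers at the call site.
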
>
> That said, complexity can be worth it with sufficient demonstrated
> gains. But:
>
>> When unsharing hugetlb PMD page tables or collapsing pages in khugepaged,
>> we send two IPIs: one for TLB invalidation, and another to synchronize
>> with concurrent GUP-fast walkers.
>
> Those aren't exactly hot paths. khugepaged is fundamentally rate
> limited. I don't think unsharing hugetlb PMD page tables just is all
> that common either.

Given that the added IPIs during unsharing broke Oracle DBs rather badly
[1], I think this is actually a case worth optimizing.

I'd assume that the impact can be measured on a many-core/many-socket
system with an adjusted reproducer of [1]. The impact will not be as big
as what [1] fixed (we reduced the tlb_remove_table_sync_one()
invocations quite drastically).

After all, tlb_remove_table_sync_one() sends an IPI to *all* CPUs in the
system, not just the ones in the MM CPU mask, which is rather bad on
systems with a lot of CPUs. Of course, this way we can only optimize on
systems that actually send IPIs during TLB flushes.

For other systems, it will be trickier to avoid these broadcast IPIs.

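To make the cost argument above concrete, here is a rough user-space model
that counts IPI targets for a "flush, then sync" sequence. It is only a
sketch under stated assumptions: CPUs are bits in a 64-bit mask, the TLB
flush is assumed to interrupt exactly the other CPUs in the MM CPU mask, and
the sync step is assumed to interrupt every other online CPU. The names
model_count_cpus() and model_ipi_targets() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count the set bits in a CPU mask (one bit per CPU, up to 64 CPUs). */
unsigned int model_count_cpus(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit */
		n++;
	return n;
}

/*
 * Number of CPUs interrupted by one modeled "flush + sync" sequence.
 *   online_mask: all online CPUs
 *   mm_mask:     CPUs in the MM CPU mask
 *   self_bit:    bit of the calling CPU, which does not IPI itself
 *   skip_sync:   true if the flush is known to make the sync redundant
 */
unsigned int model_ipi_targets(uint64_t online_mask, uint64_t mm_mask,
			       uint64_t self_bit, bool skip_sync)
{
	/* Assumption: the flush targets only other CPUs in the MM mask. */
	unsigned int flush = model_count_cpus(mm_mask & online_mask & ~self_bit);
	/* Assumption: the sync targets all other online CPUs. */
	unsigned int sync = skip_sync ? 0 : model_count_cpus(online_mask & ~self_bit);

	return flush + sync;
}
```

For example, with 8 online CPUs (mask 0xFF), an MM running on CPUs 0-2
(mask 0x07) and the caller on CPU 0, the model gives 2 + 7 = 9 interrupted
CPUs without the optimization and 2 with it. The sync term grows with the
size of the machine, while the flush term depends only on the MM CPU mask.
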
(I have the faint recollection that the IPI broadcast through
tlb_remove_table_sync_one() is a problem when called from
__tlb_remove_table_one() on RT systems ...)
[1] https://lkml.kernel.org/r/20251223214037.580860-1-david@kernel.org
--
Cheers
David