From: Dave Hansen <dave.hansen@intel.com>
To: "David Hildenbrand (Red Hat)" <david@kernel.org>,
	Lance Yang <lance.yang@linux.dev>,
	akpm@linux-foundation.org
Cc: will@kernel.org, aneesh.kumar@kernel.org, npiggin@gmail.com,
	peterz@infradead.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, arnd@arndb.de, lorenzo.stoakes@oracle.com,
	ziy@nvidia.com, baolin.wang@linux.alibaba.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, baohua@kernel.org, ioworker0@gmail.com,
	shy828301@gmail.com, riel@surriel.com, jannh@google.com,
	linux-arch@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 0/3] skip redundant TLB sync IPIs
Date: Fri, 2 Jan 2026 08:41:50 -0800
Message-ID: <cea71c01-68e7-4f7f-9931-017109d95ef0@intel.com>
In-Reply-To: <1b27a3fa-359a-43d0-bdeb-c31341749367@kernel.org>

On 12/31/25 04:33, David Hildenbrand (Red Hat) wrote:
> On 12/31/25 05:26, Dave Hansen wrote:
>> On 12/29/25 06:52, Lance Yang wrote:
>> ...
>>> This series introduces a way for architectures to indicate their TLB flush
>>> already provides full synchronization, allowing the redundant IPI to be
>>> skipped. For now, the optimization is implemented for x86 first and applied
>>> to all page table operations that free or unshare tables.
>>
>> I really don't like all the complexity here. Even on x86, there are
>> three or more ways of deriving this. Having the pv_ops check the value
>> of another pv op is also a bit unsettling.
> 
> Right. What I actually meant is that we simply have a property "bool
> flush_tlb_multi_implies_ipi_broadcast" that we set only to true from the
> initialization code.
> 
> Without comparing the pv_ops.
> 
> That should reduce the complexity quite a bit IMHO.

Yeah, that sounds promising.
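
To make that concrete, here is a minimal user-space sketch of such a
property: set once by initialization code and consulted before the extra
sync IPI. Only the flag name comes from this thread. The helpers
init_tlb_flush_backend() and sync_with_gup_fast_walkers() and the counter
sync_ipis_sent are hypothetical stand-ins for illustration, not kernel APIs.

```c
#include <stdbool.h>

/* Property named in the thread; everything else here is a hypothetical model. */
static bool flush_tlb_multi_implies_ipi_broadcast;

/* Counts the extra sync IPIs the model would have sent. */
static int sync_ipis_sent;

/*
 * Stand-in for initialization code: it is the only place that sets the
 * property, based on whether the chosen flush backend always uses IPIs.
 */
static void init_tlb_flush_backend(bool uses_ipi_broadcast)
{
	flush_tlb_multi_implies_ipi_broadcast = uses_ipi_broadcast;
}

/*
 * Stand-in for the sync step after page tables are freed or unshared.
 * If the TLB flush already interrupted every CPU, the extra IPI is skipped.
 */
static void sync_with_gup_fast_walkers(void)
{
	if (flush_tlb_multi_implies_ipi_broadcast)
		return;

	sync_ipis_sent++;
}
```

The point of the design is that the check is a plain read of one boolean,
with no comparison of pv_ops at the call site.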

> But maybe you have an even better way on how to indicate support, in a
> very simple way.

Rather than having some kind of explicit support enumeration, the other
idea I had would be to actually track the state about what needs to get
flushed somewhere. For instance, even CPUs with INVLPGB support enabled
still use IPIs sometimes. That makes the
tlb_table_flush_implies_ipi_broadcast() check a bit imperfect as is
because it will force the extra sync IPI even when INVLPGB isn't being
used for an mm.

First, we already save some semblance of support for doing different
flushes when freeing page tables in mmu_gather->freed_tables. But the call
sites in question here are for a single flush and don't use mmu_gathers.

The other pretty straightforward thing to do would be to add something
to mm->context that indicates that page tables need to be freed but
there might still be wild gup walkers out there that need an IPI. It
would get set when the page tables are modified and cleared at all the
sites where IPIs are sent.
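
A rough user-space model of that idea, just to show the set/clear
protocol. All names here (struct mm_model, gup_sync_pending,
modify_page_tables(), flush_tlb(), sync_before_free()) are hypothetical
and only illustrate the flow; this is not a proposed kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-mm arch context. */
struct mm_context_model {
	/* Set when page tables change; cleared wherever an IPI is sent. */
	bool gup_sync_pending;
};

struct mm_model {
	struct mm_context_model context;
	int ipis_sent;	/* bookkeeping for the model only */
};

/* Page tables were modified: lockless gup walkers may still see old tables. */
static void modify_page_tables(struct mm_model *mm)
{
	mm->context.gup_sync_pending = true;
}

/*
 * TLB flush. If it was done with IPIs, the walkers have been synchronized
 * as a side effect, so the flag is cleared. A broadcast invalidation that
 * sends no IPI (INVLPGB-style) leaves the flag set.
 */
static void flush_tlb(struct mm_model *mm, bool via_ipi)
{
	if (via_ipi) {
		mm->ipis_sent++;
		mm->context.gup_sync_pending = false;
	}
}

/* Before freeing tables: send the sync IPI only if one is still owed. */
static void sync_before_free(struct mm_model *mm)
{
	if (!mm->context.gup_sync_pending)
		return;

	mm->ipis_sent++;
	mm->context.gup_sync_pending = false;
}
```

In this model an mm whose flush went out via IPI pays for one IPI in
total, while an mm flushed without IPIs still gets the sync IPI it needs.
The decision is made per mm from tracked state rather than from a global
capability check.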


>> That said, complexity can be worth it with sufficient demonstrated
>> gains. But:
>>
>>> When unsharing hugetlb PMD page tables or collapsing pages in khugepaged,
>>> we send two IPIs: one for TLB invalidation, and another to synchronize
>>> with concurrent GUP-fast walkers.
>>
>> Those aren't exactly hot paths. khugepaged is fundamentally rate
>> limited. I don't think unsharing hugetlb PMD page tables is all
>> that common either.
> 
> Given that the added IPIs during unsharing broke Oracle DBs rather badly
> [1], I think this is actually a case worth optimizing.
...
> [1] https://lkml.kernel.org/r/20251223214037.580860-1-david@kernel.org

Gah, that's good context, thanks.

Are there any tests out there that might catch this case better?
It might be something good to have 0day watch for.

