linux-mm.kvack.org archive mirror
From: Lance Yang <lance.yang@linux.dev>
To: Dave Hansen <dave.hansen@intel.com>, david@kernel.org
Cc: dave.hansen@linux.intel.com, will@kernel.org,
	aneesh.kumar@kernel.org, npiggin@gmail.com, peterz@infradead.org,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, hpa@zytor.com, arnd@arndb.de,
	lorenzo.stoakes@oracle.com, ziy@nvidia.com,
	baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
	npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
	baohua@kernel.org, shy828301@gmail.com, riel@surriel.com,
	jannh@google.com, linux-arch@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, ioworker0@gmail.com,
	akpm@linux-foundation.org
Subject: Re: [PATCH RESEND v3 1/2] mm/tlb: skip redundant IPI when TLB flush already synchronized
Date: Wed, 7 Jan 2026 10:47:50 +0800
Message-ID: <2adc4355-f1e2-4355-b04e-efae4425a3d3@linux.dev>
In-Reply-To: <a20ab449-e6b4-45c7-86df-bb194304503c@intel.com>



On 2026/1/7 00:24, Dave Hansen wrote:
> On 1/6/26 04:03, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> When unsharing hugetlb PMD page tables, we currently send two IPIs: one
>> for TLB invalidation, and another to synchronize with concurrent GUP-fast
>> walkers via tlb_remove_table_sync_one().
>>
>> However, if the TLB flush already sent IPIs to all CPUs (when freed_tables
>> or unshared_tables is true), the second IPI is redundant. GUP-fast runs
>> with IRQs disabled, so when the TLB flush IPI completes, any concurrent
>> GUP-fast must have finished.
>>
>> To avoid the redundant IPI, we add a flag to mmu_gather to track whether
>> the TLB flush sent IPIs. We pass the mmu_gather pointer through the TLB
>> flush path via flush_tlb_info, so native_flush_tlb_multi() can set the
>> flag when it sends IPIs for freed_tables. We also set the flag for
>> local-only flushes, since disabling IRQs provides the same guarantee.
> 
> The lack of imperative voice is killing me. :)

Oops.
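
The reasoning in the quoted changelog can be modeled outside the kernel. What follows is a hypothetical user-space C model, not kernel code: `struct gather_model`, `model_tlb_flush()` and `model_needs_sync_ipi()` are invented names, and an IPI is reduced to a boolean. It captures only the changelog's claim. If the flush has already sent IPIs to every CPU (because tables were freed or unshared), or the flush was local-only, then no IRQ-disabled GUP-fast walker can still be running when the flush returns, so the second IPI is redundant. The model does not cover when such a flag would be reset between flushes.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space stand-in for the mmu_gather fields the patch
 * touches. This is not the kernel's struct mmu_gather.
 */
struct gather_model {
	bool freed_tables;    /* page tables were freed */
	bool unshared_tables; /* hugetlb PMD tables were unshared */
	bool flush_synced;    /* hypothetical: flush already synced with GUP-fast */
};

/*
 * Model of the TLB flush step. 'remote_cpus' counts the other CPUs that
 * may be using the mm. Per the changelog, the flush sends IPIs to all
 * CPUs when tables were freed or unshared. GUP-fast runs with IRQs
 * disabled, so once those IPIs complete no walker is still in flight.
 * The changelog treats a local-only flush as giving the same guarantee.
 */
static void model_tlb_flush(struct gather_model *g, int remote_cpus)
{
	bool ipi_to_all = g->freed_tables || g->unshared_tables;

	assert(remote_cpus >= 0);
	if (ipi_to_all || remote_cpus == 0)
		g->flush_synced = true;
}

/*
 * The second IPI (tlb_remove_table_sync_one() in the kernel) is needed
 * only when the flush did not already synchronize.
 */
static bool model_needs_sync_ipi(const struct gather_model *g)
{
	return !g->flush_synced;
}
```

In this model, an unshare that flushes three remote CPUs skips the extra IPI. A flush with neither flag set and remote CPUs present still needs it.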

> 
>> diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
>> index 866ea78ba156..c5950a92058c 100644
>> --- a/arch/x86/include/asm/tlb.h
>> +++ b/arch/x86/include/asm/tlb.h
>> @@ -20,7 +20,8 @@ static inline void tlb_flush(struct mmu_gather *tlb)
>>   		end = tlb->end;
>>   	}
>>   
>> -	flush_tlb_mm_range(tlb->mm, start, end, stride_shift, tlb->freed_tables);
>> +	flush_tlb_mm_range(tlb->mm, start, end, stride_shift,
>> +			   tlb->freed_tables || tlb->unshared_tables, tlb);
>>   }
> 
> I think this hunk sums up v3 pretty well. Where there was a single boolean, now there are two. To add to that, the structure that contains the booleans is itself being passed in. The boolean is still named 'freed_tables', and is going from:
> 
> 	tlb->freed_tables
> 
> which is pretty obviously correct to:
> 
> 	tlb->freed_tables || tlb->unshared_tables
> 
> which is _far_ from obviously correct.
> 
> I'm at a loss for why the patch wouldn't just do this:
> 
> -	flush_tlb_mm_range(tlb->mm, start, end, stride_shift, tlb->freed_tables);
> +	flush_tlb_mm_range(tlb->mm, start, end, stride_shift, tlb);
> 
> I suspect these were sent out in a bit of haste, which isn't the first time I've gotten that feeling with this series.
> 
> Could we slow down, please?

Sorry, I went too fast ...
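
Here is a sketch of the call shape Dave suggests, where the flush takes the gather itself instead of a boolean. It uses the same kind of hypothetical user-space model as above. `struct gather_model2`, `gather_needs_ipi_to_all()` and `model_flush_tlb_mm_range()` are invented names, and the model returns a bool purely so the decision can be checked. The main idea is that the `freed_tables || unshared_tables` condition moves out of the caller and into a single named helper. The caller then no longer passes a parameter called `freed_tables` that means something broader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mmu_gather; only two fields modeled. */
struct gather_model2 {
	bool freed_tables;
	bool unshared_tables;
};

/*
 * Hypothetical helper: the one place that decides whether a flush must
 * reach every CPU, instead of each caller OR-ing booleans together.
 */
static bool gather_needs_ipi_to_all(const struct gather_model2 *tlb)
{
	assert(tlb != NULL);
	return tlb->freed_tables || tlb->unshared_tables;
}

/*
 * Model of the proposed shape: flush_tlb_mm_range(mm, start, end,
 * stride_shift, tlb). The real function returns void; this model
 * returns whether the "IPI all CPUs" path would be taken.
 */
static bool model_flush_tlb_mm_range(void *mm, unsigned long start,
				     unsigned long end,
				     unsigned int stride_shift,
				     const struct gather_model2 *tlb)
{
	(void)mm;
	(void)start;
	(void)end;
	(void)stride_shift;
	return gather_needs_ipi_to_all(tlb);
}
```

With this shape, tlb_flush() passes `tlb` unchanged. The fact that unshared tables also need a broadcast is then written down only once, inside the helper.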

> 
>>   static inline void invlpg(unsigned long addr)
>> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
>> index 00daedfefc1b..83c260c88b80 100644
>> --- a/arch/x86/include/asm/tlbflush.h
>> +++ b/arch/x86/include/asm/tlbflush.h
>> @@ -220,6 +220,7 @@ struct flush_tlb_info {
>>   	 *   will be zero.
>>   	 */
>>   	struct mm_struct	*mm;
>> +	struct mmu_gather	*tlb;
>>   	unsigned long		start;
>>   	unsigned long		end;
>>   	u64			new_tlb_gen;
> 
> This also gives me pause.
> 
> There is a *lot* of redundant information between 'struct mmu_gather' and 'struct tlb_flush_info'. There needs to at least be a description of what the relationship is and how these relate to each other. I would have naively thought that the right move here would be to pull the mmu_gather data out at one discrete time rather than store a pointer to it.
> 
> What I see here is, I suspect, the most expedient way to do it. I'd _certainly_ have done this myself if I was just hacking something together to play with as quickly as possible.
> 
> So, in the end, I don't hate the approach here (yet). But it is almost impossible to evaluate it because the series is taking some rather egregious shortcuts and is lacking any real semblance of a refactoring effort.

The flag lifetime issue David pointed out is real, and you're right
about the messy parameters :)

And, yeah, I need to think more about those. Maybe v3 can be fixed, or maybe
v2 is actually sufficient - it's conservative but safe (no false positives).

Will take more time, thanks!
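
Dave's alternative is to pull the mmu_gather data out "at one discrete time" rather than store a pointer in flush_tlb_info. It could look roughly like the hypothetical model below. All type and function names are invented for illustration. Having the flush report its result through a return value, so it never writes through a stored gather pointer, is an assumption about one possible design. It is not what v3 or any later revision does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of struct mmu_gather. */
struct gather_src {
	unsigned long start, end;
	bool freed_tables, unshared_tables;
};

/* Hypothetical subset of struct flush_tlb_info: plain values, no back-pointer. */
struct flush_info_model {
	unsigned long start, end;
	bool ipi_to_all;
};

/* Copy what the flush needs out of the gather once, at a single point in time. */
static struct flush_info_model snapshot_gather(const struct gather_src *tlb)
{
	struct flush_info_model info = {
		.start = tlb->start,
		.end = tlb->end,
		.ipi_to_all = tlb->freed_tables || tlb->unshared_tables,
	};

	assert(info.start <= info.end);
	return info;
}

/*
 * The flush works only from the snapshot and reports whether it already
 * synchronized with GUP-fast (broadcast IPI or local-only flush). The
 * caller can record that in its own state.
 */
static bool model_flush_from_info(const struct flush_info_model *info,
				  int remote_cpus)
{
	return info->ipi_to_all || remote_cpus == 0;
}
```

Because the snapshot holds values rather than a pointer, changing the gather after snapshot_gather() returns has no effect on what the flush sees. This decoupling is what a stored `tlb` pointer in flush_tlb_info gives up.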



Thread overview: 16+ messages
2026-01-06 12:03 [PATCH RESEND v3 0/2] skip redundant TLB sync IPIs Lance Yang
2026-01-06 12:03 ` [PATCH RESEND v3 1/2] mm/tlb: skip redundant IPI when TLB flush already synchronized Lance Yang
2026-01-06 15:19   ` David Hildenbrand (Red Hat)
2026-01-06 16:10     ` Lance Yang
2026-01-07  6:37       ` Lance Yang
2026-01-09 14:11         ` David Hildenbrand (Red Hat)
2026-01-09 14:13       ` David Hildenbrand (Red Hat)
2026-01-09 15:30         ` Lance Yang
2026-01-09 15:40           ` David Hildenbrand (Red Hat)
2026-01-06 16:24   ` Dave Hansen
2026-01-07  2:47     ` Lance Yang [this message]
2026-01-06 12:03 ` [PATCH RESEND v3 2/2] mm: introduce pmdp_collapse_flush_sync() to skip redundant IPI Lance Yang
2026-01-06 15:07   ` David Hildenbrand (Red Hat)
2026-01-06 15:41     ` Lance Yang
2026-01-07  9:46   ` kernel test robot
2026-01-07 10:52   ` kernel test robot
