Re: [PATCH v1 4/4] mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "David Hildenbrand (Red Hat)" <david@kernel.org>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-mm@kvack.org, Will Deacon <will@kernel.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Nick Piggin <npiggin@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Muchun Song <muchun.song@linux.dev>,
	Oscar Salvador <osalvador@suse.de>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Pedro Falcato <pfalcato@suse.de>, Rik van Riel <riel@surriel.com>,
	Harry Yoo <harry.yoo@oracle.com>,
	Laurence Oberman <loberman@redhat.com>,
	Prakash Sangappa <prakash.sangappa@oracle.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v1 4/4] mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather
Date: Mon, 15 Dec 2025 15:47:57 +0100	[thread overview]
Message-ID: <937a4525-910d-4596-a9c4-29e47ca53667@kernel.org> (raw)
In-Reply-To: <07e8b94e-b4a1-4541-84ed-a5d57058d5a1@lucifer.local>


>>>
>>> As Nadav points out, should also initialise fully_unshared_tables.
>>
>> Right, but on an earlier init path, not on the range reset path here.
> 
> Shouldn't we reset it also?
> 
> I mean __tlb_reset_range() is also called by __tlb_gather_mmu() (invoked by
> tlb_gather_mmu[_fullmm]()).
> 

__tlb_reset_range() is all about flushing the TLB.

In v2 I have:

diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 247e3f9db6c7a..030a162a263ba 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -426,6 +426,7 @@ static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
  #endif
         tlb->vma_pfn = 0;
  
+       tlb->fully_unshared_tables = 0;
         __tlb_reset_range(tlb);
         inc_tlb_flush_pending(tlb->mm);
  }


>>
>>>
>>>>    	/*
>>>>    	 * Do not reset mmu_gather::vma_* fields here, we do not
>>>>    	 * call into tlb_start_vma() again to set them if there is an
>>>> @@ -484,7 +496,7 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
>>>>    	 * these bits.
>>>>    	 */
>>>>    	if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
>>>> -	      tlb->cleared_puds || tlb->cleared_p4ds))
>>>> +	      tlb->cleared_puds || tlb->cleared_p4ds || tlb->unshared_tables))
>>>
>>> What about fully_unshared_tables? I guess though unshared_tables implies
>>> fully_unshared_tables.
>>
>> fully_unshared_tables is only for triggering IPIs and consequently not about
>> flushing TLBs.
>>
>> The TLB part is taken care of by unshared_tables, and we will always set
>> unshared_tables when unsharing any page tables (incl. fully unshared ones).
> 
> OK, so is there ever a situation where fully_unshared_tables would be set
> without unshared_tables? Presumably not.

tlb_unshare_pmd_ptdesc() will always set "unshared_tables" but only conditionally
sets "fully_unshared_tables".

"unshared_tables" might get handled by a prior TLB flush,
leaving only fully_unshared_tables set to perform the IPI in tlb_flush_unshared_tables().

So the important part is that whenever we unshare, we set "unshared_tables".

[...]

>>>> +{
>>>> +	/*
>>>> +	 * As soon as the caller drops locks to allow for reuse of
>>>> +	 * previously-shared tables, these tables could get modified and
>>>> +	 * even reused outside of hugetlb context. So flush the TLB now.
>>>
>>> Hmm but you're doing this in both the case of unshare and fully unsharing, so is
>>> this the right place to make this comment?
>>
>> That's why I start the comment below with "Similarly", to make it clear that
>> the comments build up on each other.
>>
>> But I'm afraid I might not be getting your point fully here :/
> 
> what I mean is, if we are not at the point of the table being fully unshared,
> nobody else can come in and reuse it right? Because we're still using it, just
> dropped a ref + flushed tlb?

After we drop the lock, someone else could fully unshare it. And that other (MM) would
not be able to flush the TLB for us -- in contrast to the IPI that would affect all
CPUs.

> 
> Isn't really the correct comment here that ranges that previously mapped the
> shared pages might no longer, so we must clear the TLB? I may be missing
> something :)

There are cases where we defer flushing the TLB until we dropped all (exclusive) locks.
In particular, MADV_DONTNEED does that in some cases, essentially deferring the flush
to the tlb_finish_mmu().

free_pgtables() will also defer the flush, performing the TLB flush during tlb_finish_mmu(),
before

The point is (as I tried to make clear in the comment), for unsharing we have no control
whenn the page table gets freed after we drop the lock.

So we must flush the TLB now and cannot defer it like we do in the other cases.

> 
> Or maybe the right thing is 'we must always flush the TLB because <blahdyblah>,
> and if we are fully unsharing tables we must avoid reuse of previously-shared
> tables when the caller drops the locks' or something?

I hope the description above made it clearer why I spell out that the TLB must be flushed
now.

> 
>>
>>>
>>> Surely here this is about flushing TLBs for the unsharer only as it no longer
>>> uses it?
>>>
>>>> +	 *
>>>> +	 * Note that we cannot defer the flush to a later point even if we are
>>>> +	 * not the last sharer of the page table.
>>>> +	 */
>>>
>>> Not hugely clear, some double negative here. Maybe worth saying something like:
>>>
>>> 'Even if we are not fully unsharing a PMD table, we must flush the TLB for the
>>> unsharer who no longer has access to this memory'
>>>
>>> Or something? Assuming this is accurate :)
>>
>> I'll adjust it to "Not that even if we are not fully unsharing a PMD table,
>> we must flush the TLB for the unsharer now.".
> 
> I guess you mean Note or that's even more confusing :P

:) Yeah, I did that in v2.

-- 
Cheers

David

next prev parent reply	other threads:[~2025-12-15 14:48 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-05 21:35 [PATCH v1 0/4] mm/hugetlb: fixes for PMD table sharing (incl. using mmu_gather) David Hildenbrand (Red Hat)
2025-12-05 21:35 ` [PATCH v1 1/4] mm/hugetlb: fix hugetlb_pmd_shared() David Hildenbrand (Red Hat)
2025-12-06  2:18   ` Rik van Riel
2025-12-06  5:55   ` Lance Yang
2025-12-06  6:24     ` Lance Yang
2025-12-08  2:32   ` Lance Yang
2025-12-08 11:01     ` David Hildenbrand (Red Hat)
2025-12-10 11:15     ` Lorenzo Stoakes
2025-12-08  9:08   ` Harry Yoo
2025-12-10 11:16   ` Lorenzo Stoakes
2025-12-11  5:38   ` Oscar Salvador
2025-12-05 21:35 ` [PATCH v1 2/4] mm/hugetlb: fix two comments related to huge_pmd_unshare() David Hildenbrand (Red Hat)
2025-12-06  2:26   ` Rik van Riel
2025-12-10 11:22   ` Lorenzo Stoakes
2025-12-11  1:58     ` David Hildenbrand (Red Hat)
2025-12-15  9:54       ` Lorenzo Stoakes
2025-12-11  5:41   ` Oscar Salvador
2025-12-05 21:35 ` [PATCH v1 3/4] mm/rmap: " David Hildenbrand (Red Hat)
2025-12-06  2:50   ` Rik van Riel
2025-12-10 11:24   ` Lorenzo Stoakes
2025-12-11  5:42   ` Oscar Salvador
2025-12-05 21:35 ` [PATCH v1 4/4] mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather David Hildenbrand (Red Hat)
2025-12-07 12:15   ` Nadav Amit
2025-12-07 12:24     ` Nadav Amit
2025-12-07 12:39       ` David Hildenbrand (Red Hat)
2025-12-10 15:06   ` Lorenzo Stoakes
2025-12-11  2:27     ` David Hildenbrand (Red Hat)
2025-12-15 10:06       ` Lorenzo Stoakes
2025-12-15 14:47         ` David Hildenbrand (Red Hat) [this message]
2025-12-06 19:53 ` [PATCH v1 0/4] mm/hugetlb: fixes for PMD table sharing (incl. using mmu_gather) Laurence Oberman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=937a4525-910d-4596-a9c4-29e47ca53667@kernel.org \
    --to=david@kernel.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@kernel.org \
    --cc=arnd@arndb.de \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=loberman@redhat.com \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=nadav.amit@gmail.com \
    --cc=npiggin@gmail.com \
    --cc=osalvador@suse.de \
    --cc=peterz@infradead.org \
    --cc=pfalcato@suse.de \
    --cc=prakash.sangappa@oracle.com \
    --cc=riel@surriel.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox