From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
linux-mm@kvack.org, Will Deacon <will@kernel.org>,
"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Nick Piggin <npiggin@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Arnd Bergmann <arnd@arndb.de>,
Muchun Song <muchun.song@linux.dev>,
Oscar Salvador <osalvador@suse.de>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
Pedro Falcato <pfalcato@suse.de>, Rik van Riel <riel@surriel.com>,
Harry Yoo <harry.yoo@oracle.com>,
Laurence Oberman <loberman@redhat.com>,
Prakash Sangappa <prakash.sangappa@oracle.com>,
Nadav Amit <nadav.amit@gmail.com>,
stable@vger.kernel.org
Subject: Re: [PATCH v1 4/4] mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather
Date: Tue, 16 Dec 2025 10:45:43 +0000 [thread overview]
Message-ID: <4037214b-d1c6-4e0b-ad9d-6722aea7aba9@lucifer.local> (raw)
In-Reply-To: <937a4525-910d-4596-a9c4-29e47ca53667@kernel.org>
On Mon, Dec 15, 2025 at 03:47:57PM +0100, David Hildenbrand (Red Hat) wrote:
>
> > > >
> > > > As Nadav points out, should also initialise fully_unshared_tables.
> > >
> > > Right, but on an earlier init path, not on the range reset path here.
> >
> > Shouldn't we reset it also?
> >
> > I mean __tlb_reset_range() is also called by __tlb_gather_mmu() (invoked by
> > tlb_gather_mmu[_fullmm]()).
> >
>
> __tlb_reset_range() is all about flushing the TLB.
>
> In v2 I have:
(Was waiting for reply here before looking there of course :)
>
> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> index 247e3f9db6c7a..030a162a263ba 100644
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -426,6 +426,7 @@ static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
> #endif
> tlb->vma_pfn = 0;
> + tlb->fully_unshared_tables = 0;
> __tlb_reset_range(tlb);
> inc_tlb_flush_pending(tlb->mm);
> }
>
OK so I guess since this isn't tied to flushing the TLB, but rather only to the
IPI we're good.
>
> > >
> > > >
> > > > > /*
> > > > > * Do not reset mmu_gather::vma_* fields here, we do not
> > > > > * call into tlb_start_vma() again to set them if there is an
> > > > > @@ -484,7 +496,7 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
> > > > > * these bits.
> > > > > */
> > > > > if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
> > > > > - tlb->cleared_puds || tlb->cleared_p4ds))
> > > > > + tlb->cleared_puds || tlb->cleared_p4ds || tlb->unshared_tables))
> > > >
> > > > What about fully_unshared_tables? I guess though unshared_tables implies
> > > > fully_unshared_tables.
> > >
> > > fully_unshared_tables is only for triggering IPIs and consequently not about
> > > flushing TLBs.
> > >
> > > The TLB part is taken care of by unshared_tables, and we will always set
> > > unshared_tables when unsharing any page tables (incl. fully unshared ones).
> >
> > OK, so is there ever a situation where fully_unshared_tables would be set
> > without unshared_tables? Presumably not.
>
> tlb_unshare_pmd_ptdesc() will always set "unshared_tables" but only conditionally
> sets "fully_unshared_tables".
Of course.
>
> "unshared_tables" might get handled by a prior TLB flush,
> leaving only fully_unshared_tables set to perform the IPI in tlb_flush_unshared_tables().
I see, right, so we can in fact have one set without the other.
But this logic pertains to the TLB flush so we're ok.
>
> So the important part is that whenever we unshare, we set "unshared_tables".
Yes.
>
> [...]
>
> > > > > +{
> > > > > + /*
> > > > > + * As soon as the caller drops locks to allow for reuse of
> > > > > + * previously-shared tables, these tables could get modified and
> > > > > + * even reused outside of hugetlb context. So flush the TLB now.
> > > >
> > > > Hmm but you're doing this in both the case of unshare and fully unsharing, so is
> > > > this the right place to make this comment?
> > >
> > > That's why I start the comment below with "Similarly", to make it clear that
> > > the comments build up on each other.
> > >
> > > But I'm afraid I might not be getting your point fully here :/
> >
> > what I mean is, if we are not at the point of the table being fully unshared,
> > nobody else can come in and reuse it right? Because we're still using it, just
> > dropped a ref + flushed tlb?
>
> After we drop the lock, someone else could fully unshare it. And that other (MM) would
> not be able to flush the TLB for us -- in contrast to the IPI that would affect all
> CPUs.
Ah so it's about TLB flushing for _us_ whereas the other user who fully unshares
will flush for _them_ which may not affect the CPUs we're using.
Yeah what fun this is :)
>
> >
> > Isn't really the correct comment here that ranges that previously mapped the
> > shared pages might no longer, so we must clear the TLB? I may be missing
> > something :)
>
> There are cases where we defer flushing the TLB until we dropped all (exclusive) locks.
> In particular, MADV_DONTNEED does that in some cases, essentially deferring the flush
> to the tlb_finish_mmu().
>
> free_pgtables() will also defer the flush, performing the TLB flush during tlb_finish_mmu(),
> before
>
> The point is (as I tried to make clear in the comment), for unsharing we have no control
> whenn the page table gets freed after we drop the lock.
>
> So we must flush the TLB now and cannot defer it like we do in the other cases.
Yeah I guess because of the above - that is - other users may unshare for their
CPUs but not unshare for ours?
>
> >
> > Or maybe the right thing is 'we must always flush the TLB because <blahdyblah>,
> > and if we are fully unsharing tables we must avoid reuse of previously-shared
> > tables when the caller drops the locks' or something?
>
> I hope the description above made it clearer why I spell out that the TLB must be flushed
> now.
Yeah I think so thanks :)
>
> >
> > >
> > > >
> > > > Surely here this is about flushing TLBs for the unsharer only as it no longer
> > > > uses it?
> > > >
> > > > > + *
> > > > > + * Note that we cannot defer the flush to a later point even if we are
> > > > > + * not the last sharer of the page table.
> > > > > + */
> > > >
> > > > Not hugely clear, some double negative here. Maybe worth saying something like:
> > > >
> > > > 'Even if we are not fully unsharing a PMD table, we must flush the TLB for the
> > > > unsharer who no longer has access to this memory'
> > > >
> > > > Or something? Assuming this is accurate :)
> > >
> > > I'll adjust it to "Not that even if we are not fully unsharing a PMD table,
> > > we must flush the TLB for the unsharer now.".
> >
> > I guess you mean Note or that's even more confusing :P
>
> :) Yeah, I did that in v2.
Thanks :)
>
> --
> Cheers
>
> David
Cheers, Lorenzo
next prev parent reply other threads:[~2025-12-16 10:46 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-05 21:35 [PATCH v1 0/4] mm/hugetlb: fixes for PMD table sharing (incl. using mmu_gather) David Hildenbrand (Red Hat)
2025-12-05 21:35 ` [PATCH v1 1/4] mm/hugetlb: fix hugetlb_pmd_shared() David Hildenbrand (Red Hat)
2025-12-06 2:18 ` Rik van Riel
2025-12-06 5:55 ` Lance Yang
2025-12-06 6:24 ` Lance Yang
2025-12-08 2:32 ` Lance Yang
2025-12-08 11:01 ` David Hildenbrand (Red Hat)
2025-12-10 11:15 ` Lorenzo Stoakes
2025-12-08 9:08 ` Harry Yoo
2025-12-10 11:16 ` Lorenzo Stoakes
2025-12-11 5:38 ` Oscar Salvador
2025-12-05 21:35 ` [PATCH v1 2/4] mm/hugetlb: fix two comments related to huge_pmd_unshare() David Hildenbrand (Red Hat)
2025-12-06 2:26 ` Rik van Riel
2025-12-10 11:22 ` Lorenzo Stoakes
2025-12-11 1:58 ` David Hildenbrand (Red Hat)
2025-12-15 9:54 ` Lorenzo Stoakes
2025-12-11 5:41 ` Oscar Salvador
2025-12-05 21:35 ` [PATCH v1 3/4] mm/rmap: " David Hildenbrand (Red Hat)
2025-12-06 2:50 ` Rik van Riel
2025-12-10 11:24 ` Lorenzo Stoakes
2025-12-11 5:42 ` Oscar Salvador
2025-12-05 21:35 ` [PATCH v1 4/4] mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather David Hildenbrand (Red Hat)
2025-12-07 12:15 ` Nadav Amit
2025-12-07 12:24 ` Nadav Amit
2025-12-07 12:39 ` David Hildenbrand (Red Hat)
2025-12-10 15:06 ` Lorenzo Stoakes
2025-12-11 2:27 ` David Hildenbrand (Red Hat)
2025-12-15 10:06 ` Lorenzo Stoakes
2025-12-15 14:47 ` David Hildenbrand (Red Hat)
2025-12-16 10:45 ` Lorenzo Stoakes [this message]
2025-12-18 12:52 ` David Hildenbrand (Red Hat)
2025-12-06 19:53 ` [PATCH v1 0/4] mm/hugetlb: fixes for PMD table sharing (incl. using mmu_gather) Laurence Oberman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4037214b-d1c6-4e0b-ad9d-6722aea7aba9@lucifer.local \
--to=lorenzo.stoakes@oracle.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@kernel.org \
--cc=arnd@arndb.de \
--cc=david@kernel.org \
--cc=harry.yoo@oracle.com \
--cc=jannh@google.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=loberman@redhat.com \
--cc=muchun.song@linux.dev \
--cc=nadav.amit@gmail.com \
--cc=npiggin@gmail.com \
--cc=osalvador@suse.de \
--cc=peterz@infradead.org \
--cc=pfalcato@suse.de \
--cc=prakash.sangappa@oracle.com \
--cc=riel@surriel.com \
--cc=stable@vger.kernel.org \
--cc=vbabka@suse.cz \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox