Re: [PATCH v1 4/4] mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-mm@kvack.org, Will Deacon <will@kernel.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Nick Piggin <npiggin@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Muchun Song <muchun.song@linux.dev>,
	Oscar Salvador <osalvador@suse.de>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Pedro Falcato <pfalcato@suse.de>, Rik van Riel <riel@surriel.com>,
	Harry Yoo <harry.yoo@oracle.com>,
	Laurence Oberman <loberman@redhat.com>,
	Prakash Sangappa <prakash.sangappa@oracle.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v1 4/4] mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather
Date: Tue, 16 Dec 2025 10:45:43 +0000	[thread overview]
Message-ID: <4037214b-d1c6-4e0b-ad9d-6722aea7aba9@lucifer.local> (raw)
In-Reply-To: <937a4525-910d-4596-a9c4-29e47ca53667@kernel.org>

On Mon, Dec 15, 2025 at 03:47:57PM +0100, David Hildenbrand (Red Hat) wrote:
>
> > > >
> > > > As Nadav points out, should also initialise fully_unshared_tables.
> > >
> > > Right, but on an earlier init path, not on the range reset path here.
> >
> > Shouldn't we reset it also?
> >
> > I mean __tlb_reset_range() is also called by __tlb_gather_mmu() (invoked by
> > tlb_gather_mmu[_fullmm]()).
> >
>
> __tlb_reset_range() is all about flushing the TLB.
>
> In v2 I have:

(Was waiting for reply here before looking there of course :)

>
> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> index 247e3f9db6c7a..030a162a263ba 100644
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -426,6 +426,7 @@ static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
>  #endif
>         tlb->vma_pfn = 0;
> +       tlb->fully_unshared_tables = 0;
>         __tlb_reset_range(tlb);
>         inc_tlb_flush_pending(tlb->mm);
>  }
>

OK so I guess since this isn't tied to flushing the TLB, but rather only to the
IPI we're good.

>
> > >
> > > >
> > > > >    	/*
> > > > >    	 * Do not reset mmu_gather::vma_* fields here, we do not
> > > > >    	 * call into tlb_start_vma() again to set them if there is an
> > > > > @@ -484,7 +496,7 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
> > > > >    	 * these bits.
> > > > >    	 */
> > > > >    	if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
> > > > > -	      tlb->cleared_puds || tlb->cleared_p4ds))
> > > > > +	      tlb->cleared_puds || tlb->cleared_p4ds || tlb->unshared_tables))
> > > >
> > > > What about fully_unshared_tables? I guess though unshared_tables implies
> > > > fully_unshared_tables.
> > >
> > > fully_unshared_tables is only for triggering IPIs and consequently not about
> > > flushing TLBs.
> > >
> > > The TLB part is taken care of by unshared_tables, and we will always set
> > > unshared_tables when unsharing any page tables (incl. fully unshared ones).
> >
> > OK, so is there ever a situation where fully_unshared_tables would be set
> > without unshared_tables? Presumably not.
>
> tlb_unshare_pmd_ptdesc() will always set "unshared_tables" but only conditionally
> sets "fully_unshared_tables".

Of course.

>
> "unshared_tables" might get handled by a prior TLB flush,
> leaving only fully_unshared_tables set to perform the IPI in tlb_flush_unshared_tables().

I see, right, so we can in fact have one set without the other.

But this logic pertains to the TLB flush so we're ok.

>
> So the important part is that whenever we unshare, we set "unshared_tables".

Yes.

>
> [...]
>
> > > > > +{
> > > > > +	/*
> > > > > +	 * As soon as the caller drops locks to allow for reuse of
> > > > > +	 * previously-shared tables, these tables could get modified and
> > > > > +	 * even reused outside of hugetlb context. So flush the TLB now.
> > > >
> > > > Hmm but you're doing this in both the case of unshare and fully unsharing, so is
> > > > this the right place to make this comment?
> > >
> > > That's why I start the comment below with "Similarly", to make it clear that
> > > the comments build up on each other.
> > >
> > > But I'm afraid I might not be getting your point fully here :/
> >
> > what I mean is, if we are not at the point of the table being fully unshared,
> > nobody else can come in and reuse it right? Because we're still using it, just
> > dropped a ref + flushed tlb?
>
> After we drop the lock, someone else could fully unshare it. And that other (MM) would
> not be able to flush the TLB for us -- in contrast to the IPI that would affect all
> CPUs.

Ah so it's about TLB flushing for _us_ whereas the other user who fully unshares
will flush for _them_ which may not affect the CPUs we're using.

Yeah what fun this is :)

>
> >
> > Isn't really the correct comment here that ranges that previously mapped the
> > shared pages might no longer, so we must clear the TLB? I may be missing
> > something :)
>
> There are cases where we defer flushing the TLB until we dropped all (exclusive) locks.
> In particular, MADV_DONTNEED does that in some cases, essentially deferring the flush
> to the tlb_finish_mmu().
>
> free_pgtables() will also defer the flush, performing the TLB flush during tlb_finish_mmu(),
> before
>
> The point is (as I tried to make clear in the comment), for unsharing we have no control
> whenn the page table gets freed after we drop the lock.
>
> So we must flush the TLB now and cannot defer it like we do in the other cases.

Yeah I guess because of the above - that is - other users may unshare for their
CPUs but not unshare for ours?

>
> >
> > Or maybe the right thing is 'we must always flush the TLB because <blahdyblah>,
> > and if we are fully unsharing tables we must avoid reuse of previously-shared
> > tables when the caller drops the locks' or something?
>
> I hope the description above made it clearer why I spell out that the TLB must be flushed
> now.

Yeah I think so thanks :)

>
> >
> > >
> > > >
> > > > Surely here this is about flushing TLBs for the unsharer only as it no longer
> > > > uses it?
> > > >
> > > > > +	 *
> > > > > +	 * Note that we cannot defer the flush to a later point even if we are
> > > > > +	 * not the last sharer of the page table.
> > > > > +	 */
> > > >
> > > > Not hugely clear, some double negative here. Maybe worth saying something like:
> > > >
> > > > 'Even if we are not fully unsharing a PMD table, we must flush the TLB for the
> > > > unsharer who no longer has access to this memory'
> > > >
> > > > Or something? Assuming this is accurate :)
> > >
> > > I'll adjust it to "Not that even if we are not fully unsharing a PMD table,
> > > we must flush the TLB for the unsharer now.".
> >
> > I guess you mean Note or that's even more confusing :P
>
> :) Yeah, I did that in v2.

Thanks :)

>
> --
> Cheers
>
> David

Cheers, Lorenzo

next prev parent reply	other threads:[~2025-12-16 10:46 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-05 21:35 [PATCH v1 0/4] mm/hugetlb: fixes for PMD table sharing (incl. using mmu_gather) David Hildenbrand (Red Hat)
2025-12-05 21:35 ` [PATCH v1 1/4] mm/hugetlb: fix hugetlb_pmd_shared() David Hildenbrand (Red Hat)
2025-12-06  2:18   ` Rik van Riel
2025-12-06  5:55   ` Lance Yang
2025-12-06  6:24     ` Lance Yang
2025-12-08  2:32   ` Lance Yang
2025-12-08 11:01     ` David Hildenbrand (Red Hat)
2025-12-10 11:15     ` Lorenzo Stoakes
2025-12-08  9:08   ` Harry Yoo
2025-12-10 11:16   ` Lorenzo Stoakes
2025-12-11  5:38   ` Oscar Salvador
2025-12-05 21:35 ` [PATCH v1 2/4] mm/hugetlb: fix two comments related to huge_pmd_unshare() David Hildenbrand (Red Hat)
2025-12-06  2:26   ` Rik van Riel
2025-12-10 11:22   ` Lorenzo Stoakes
2025-12-11  1:58     ` David Hildenbrand (Red Hat)
2025-12-15  9:54       ` Lorenzo Stoakes
2025-12-11  5:41   ` Oscar Salvador
2025-12-05 21:35 ` [PATCH v1 3/4] mm/rmap: " David Hildenbrand (Red Hat)
2025-12-06  2:50   ` Rik van Riel
2025-12-10 11:24   ` Lorenzo Stoakes
2025-12-11  5:42   ` Oscar Salvador
2025-12-05 21:35 ` [PATCH v1 4/4] mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather David Hildenbrand (Red Hat)
2025-12-07 12:15   ` Nadav Amit
2025-12-07 12:24     ` Nadav Amit
2025-12-07 12:39       ` David Hildenbrand (Red Hat)
2025-12-10 15:06   ` Lorenzo Stoakes
2025-12-11  2:27     ` David Hildenbrand (Red Hat)
2025-12-15 10:06       ` Lorenzo Stoakes
2025-12-15 14:47         ` David Hildenbrand (Red Hat)
2025-12-16 10:45           ` Lorenzo Stoakes [this message]
2025-12-18 12:52             ` David Hildenbrand (Red Hat)
2025-12-06 19:53 ` [PATCH v1 0/4] mm/hugetlb: fixes for PMD table sharing (incl. using mmu_gather) Laurence Oberman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4037214b-d1c6-4e0b-ad9d-6722aea7aba9@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@kernel.org \
    --cc=arnd@arndb.de \
    --cc=david@kernel.org \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=loberman@redhat.com \
    --cc=muchun.song@linux.dev \
    --cc=nadav.amit@gmail.com \
    --cc=npiggin@gmail.com \
    --cc=osalvador@suse.de \
    --cc=peterz@infradead.org \
    --cc=pfalcato@suse.de \
    --cc=prakash.sangappa@oracle.com \
    --cc=riel@surriel.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox