linux-mm.kvack.org archive mirror
From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: Ackerley Tng <ackerleytng@google.com>
Cc: akpm@linux-foundation.org, dan.j.williams@intel.com,
	david@kernel.org, fvdl@google.com, hannes@cmpxchg.org,
	jgg@nvidia.com, jiaqiyan@google.com, jthoughton@google.com,
	kalyazin@amazon.com, mhocko@kernel.org, michael.roth@amd.com,
	muchun.song@linux.dev, osalvador@suse.de,
	pasha.tatashin@soleen.com, pbonzini@redhat.com,
	peterx@redhat.com, pratyush@kernel.org,
	rick.p.edgecombe@intel.com, rientjes@google.com,
	roman.gushchin@linux.dev, seanjc@google.com,
	shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com,
	yan.y.zhao@intel.com, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v1 0/7] Open HugeTLB allocation routine for more generic use
Date: Thu, 26 Feb 2026 10:08:21 -0800	[thread overview]
Message-ID: <20260226180821.2218448-1-joshua.hahnjy@gmail.com> (raw)
In-Reply-To: <CAEvNRgGaJXbOGPQSgvo3rVDfis22DC4hYy=2Rczas0Vm3o66kQ@mail.gmail.com>

On Wed, 25 Feb 2026 19:37:04 -0800 Ackerley Tng <ackerleytng@google.com> wrote:

> Joshua Hahn <joshua.hahnjy@gmail.com> writes:
> 
> > On Wed, 11 Feb 2026 16:37:11 -0800 Ackerley Tng <ackerleytng@google.com> wrote:
> >
> > Hi Ackerley, I hope you're doing well!
> >
> > [...snip...]
> >
> >> I would like to get feedback on:
> >>
> >> 1. Opening up HugeTLB's allocation for more generic use
> >
> > I'm not entirely familiar with guest_memfd, so please excuse my ignorance
> > if I'm missing anything obvious.
> 
> Happy to take questions! Thank you for your thoughts and reviews!

Of course, thank you for your work, Ackerley!

> > But I'm wondering what hugeTLB offers
> > that other hugepage solutions cannot offer for guest_memfd, if the
> > goal of this series is to decouple it from hugeTLBfs.
> >
> 
> The one other huge page source that we've explored is THP pages from the
> buddy allocator. Compared to HugeTLB, huge pages from the buddy
> allocator:
> 
> + Have a maximum size of 2M
> + Do not guarantee huge pages the way HugeTLB does - HugeTLB pages are
>   allocated at boot, and guest_memfd can reserve pages at guest_memfd
>   creation time.
> + Are slower to allocate - allocating a HugeTLB page is just dequeuing
>   from a preallocated pool, which is really fast.

All of these make sense. I just wanted to know if guest_memfd had any
unique use cases for HugeTLB that normal hugetlbfs didn't have.

> The last reason to use HugeTLB is not because of any inherent advantage
> of using HugeTLB over other sources of huge pages, but for
> administrative/scheduling purposes:
> 
>   Given that existing non-guest_memfd workloads are already using
>   HugeTLB, for optimal scheduling, machine memory is already carved up
>   in HugeTLB pages for these workloads. Workloads that require using
>   guest_memfd (like Confidential VMs) must also use HugeTLB to
>   participate in optimal workload scheduling across machines.
> 
> >> 2. Reverting and re-adopting the try-commit-cancel protocol for memory
> >>    charging
> >
> > On the second point, I am wondering if reintroducing the try-commit-cancel
> > protocol is tied to factoring out hugetlb_alloc_folio. When I removed
> > the protocol a while back, the justification was that for the most part,
> > grabbing a hugetlb folio was a relatively cheap & fast operation, since
> > hugetlb mostly operates out of a preallocated pool.
> >
> > So the cost of being wrong, going above the limit, and having to return
> > the hugetlb folio was also relatively low.
> >
> 
> Thanks for this! I saw your patch to just optimistically grab a HugeTLB
> page :) For that patch, the primary reason was to simplify the logic,
> and the simplification was justifiable because grabbing a folio is
> cheap, right? (And so grabbing a folio being cheap wasn't a reason in
> itself?)

Yes, exactly!

> > It seems like this patch series introduces some new paths for hugetlb
> > pages to be consumed (specifically, without a reservation or vma).
> > I imagine that these new paths make the slowpath for hugetlb more frequent,
> > which makes the cost of assuming that the memcg limit is OK higher?
> > I think explicitly spelling this out in the justification for reintroducing
> > the charging protocol could be helpful.
> >
> 
> Yes, I should have done that. Will copy the following to the next
> revision.

Thank you for considering!

> The main reason is that reintroducing the charging protocol is the
> clearest way (for me) to cleanly refactor out hugetlb_alloc_folio()
> without worrying about the edge cases around HugeTLB reservations and
> charging.
> 
> If I didn't reintroduce the charging protocol, I would have to depend on
> freeing the new hugetlb folio on memcg charging failure, and the freeing
> in turn depends on the subpool correctly being set in the folio, and the
> presence of the subpool influences (in free_huge_folio()) whether the
> reservation was returned to the global hstate. Aaannnd... there's also a
> hugetlb_restore_reserve flag that controls whether to return the folio
> to the subpool (and the hstate). I find folio_clear_hugetlb_restore_reserve()
> on certain code paths kind of magical/unexplained too.

If reintroducing the protocol makes the code simpler, I see no reason
why we shouldn't revert the patch :-)

> I would rather iron out those charging and reservation details
> separately from this series (with more testing support).
> 
> On the other hand, reintroducing the charging protocol has the benefit
> of avoiding allocations (not just dequeuing, if surplus HugeTLB pages
> are required) if the memcg limit is hit. Also, if the original reason
> for removing the protocol was to simplify the code, refactoring out
> hugetlb_alloc_folio() also simplifies the code, and I think it's
> actually nice that memcg charging is done the same way as the other two
> (h_cg and h_cg_rsvd charging). After hugetlb_alloc_folio() is refactored
> out, the gotos make all three charging systems consistent and symmetric,
> which I think is nice to have :)
> 
> I hope the consistent/symmetric charging among all 3 systems is welcome,
> what do you think?

For the hugetlbfs case, the path to allocate a HugeTLB page on demand
makes sense, so I definitely see the argument for avoiding allocations.
Does guest_memfd also have a path to allocate a HugeTLB page outside of
the boot-time reservations? If so, I think it would be nice to clarify
that the optimization for the allocation failure case also applies to
guest_memfd, not only to hugetlbfs.

Symmetric charging is definitely welcome :-) All of your reasons make
sense to me, I just wanted to ask and make sure.

Thanks for your thoughts! I hope you have a great day!!
Joshua



Thread overview: 14+ messages
2026-02-12  0:37 Ackerley Tng
2026-02-12  0:37 ` [RFC PATCH v1 1/7] mm: hugetlb: Consolidate interpretation of gbl_chg within alloc_hugetlb_folio() Ackerley Tng
2026-02-25 20:27   ` Joshua Hahn
2026-02-12  0:37 ` [RFC PATCH v1 2/7] mm: hugetlb: Move mpol interpretation out of alloc_buddy_hugetlb_folio_with_mpol() Ackerley Tng
2026-02-25 18:51   ` James Houghton
2026-02-12  0:37 ` [RFC PATCH v1 3/7] mm: hugetlb: Move mpol interpretation out of dequeue_hugetlb_folio_vma() Ackerley Tng
2026-02-25 19:57   ` James Houghton
2026-02-12  0:37 ` [RFC PATCH v1 4/7] Revert "memcg/hugetlb: remove memcg hugetlb try-commit-cancel protocol" Ackerley Tng
2026-02-12  0:37 ` [RFC PATCH v1 5/7] mm: hugetlb: Adopt memcg try-commit-cancel protocol Ackerley Tng
2026-02-12  0:37 ` [RFC PATCH v1 6/7] mm: memcontrol: Remove now-unused function mem_cgroup_charge_hugetlb Ackerley Tng
2026-02-12  0:37 ` [RFC PATCH v1 7/7] mm: hugetlb: Refactor out hugetlb_alloc_folio() Ackerley Tng
2026-02-25 20:24 ` [RFC PATCH v1 0/7] Open HugeTLB allocation routine for more generic use Joshua Hahn
2026-02-26  3:37   ` Ackerley Tng
2026-02-26 18:08     ` Joshua Hahn [this message]
