From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: Ackerley Tng <ackerleytng@google.com>
Cc: akpm@linux-foundation.org, dan.j.williams@intel.com,
david@kernel.org, fvdl@google.com, hannes@cmpxchg.org,
jgg@nvidia.com, jiaqiyan@google.com, jthoughton@google.com,
kalyazin@amazon.com, mhocko@kernel.org, michael.roth@amd.com,
muchun.song@linux.dev, osalvador@suse.de,
pasha.tatashin@soleen.com, pbonzini@redhat.com,
peterx@redhat.com, pratyush@kernel.org,
rick.p.edgecombe@intel.com, rientjes@google.com,
roman.gushchin@linux.dev, seanjc@google.com,
shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com,
yan.y.zhao@intel.com, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v1 0/7] Open HugeTLB allocation routine for more generic use
Date: Thu, 26 Feb 2026 10:08:21 -0800
Message-ID: <20260226180821.2218448-1-joshua.hahnjy@gmail.com>
In-Reply-To: <CAEvNRgGaJXbOGPQSgvo3rVDfis22DC4hYy=2Rczas0Vm3o66kQ@mail.gmail.com>

On Wed, 25 Feb 2026 19:37:04 -0800 Ackerley Tng <ackerleytng@google.com> wrote:
> Joshua Hahn <joshua.hahnjy@gmail.com> writes:
>
> > On Wed, 11 Feb 2026 16:37:11 -0800 Ackerley Tng <ackerleytng@google.com> wrote:
> >
> > Hi Ackerley, I hope you're doing well!
> >
> > [...snip...]
> >
> >> I would like to get feedback on:
> >>
> >> 1. Opening up HugeTLB's allocation for more generic use
> >
> > I'm not entirely familiar with guest_memfd, so please excuse my ignorance
> > if I'm missing anything obvious.
>
> Happy to take questions! Thank you for your thoughts and reviews!

Of course, thank you for your work, Ackerley!

> > But I'm wondering what hugeTLB offers
> > that other hugepage solutions cannot offer for guest_memfd, if the
> > goal of this series is to decouple it from hugeTLBfs.
> >
>
> The one other huge page source that we've explored is THP pages from the
> buddy allocator. Compared to HugeTLB, huge pages from the buddy
> allocator
>
> + Have a maximum size of 2M
> + Do not guarantee huge pages the way HugeTLB does: HugeTLB pages are
> allocated at boot, and guest_memfd can reserve pages at guest_memfd
> creation time.
> + Allocation of HugeTLB pages is also really fast: it's just dequeuing
> from a preallocated pool

All of these make sense. I just wanted to know whether guest_memfd has any
unique use cases for hugeTLB that normal hugetlbfs doesn't have.

> The last reason to use HugeTLB is not because of any inherent advantage
> of using HugeTLB over other sources of huge pages, but for
> administrative/scheduling purposes:
>
> Given that existing non-guest_memfd workloads are already using
> HugeTLB, for optimal scheduling, machine memory is already carved up
> in HugeTLB pages for these workloads. Workloads that require using
> guest_memfd (like Confidential VMs) must also use HugeTLB to
> participate in optimal workload scheduling across machines.
>
> >> 2. Reverting and re-adopting the try-commit-cancel protocol for memory
> >> charging
> >
> > On the second point, I am wondering if reintroducing the try-commit-cancel
> > protocol is tied to factoring out hugetlb_alloc_folio. When I removed
> > the protocol a while back, the justification was that for the most part,
> > grabbing a hugetlb folio was a relatively cheap & fast operation, since
> > hugetlb mostly operates out of a preallocated pool.
> >
> > So the cost of being wrong, going above the limit, and having to return
> > the hugetlb folio was also relatively low.
> >
>
> Thanks for this! I saw your patch to just optimistically grab a HugeTLB
> page :) For that patch, the primary reason was to simplify the logic,
> and the simplification was justifiable because grabbing a folio is
> cheap, right? (And so grabbing a folio being cheap wasn't a reason in
> itself?)

Yes, exactly!

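To make the trade-off concrete, here is a minimal userspace sketch of the optimistic approach discussed above: take a folio from the pool first, charge afterwards, and hand the folio back if the charge fails. It is a toy model, not kernel code; toy_pool, toy_memcg and every function name below are hypothetical, and the only assumption carried over from the discussion is that dequeuing from a preallocated pool is cheap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a memcg counter: usage must stay <= limit. */
struct toy_memcg {
	long usage;
	long limit;
};

/* Hypothetical stand-in for the preallocated HugeTLB pool. */
struct toy_pool {
	long free_folios;
	long dequeues;	/* how many times a folio was taken */
	long returns;	/* how many times a folio was handed back */
};

static bool toy_dequeue(struct toy_pool *p)
{
	if (p->free_folios == 0)
		return false;
	p->free_folios--;
	p->dequeues++;
	return true;
}

static void toy_return(struct toy_pool *p)
{
	p->free_folios++;
	p->returns++;
}

static bool toy_charge(struct toy_memcg *m, long nr)
{
	if (m->usage + nr > m->limit)
		return false;
	m->usage += nr;
	return true;
}

/*
 * Optimistic order: grab the folio first, charge second.
 * Being wrong about the limit costs one dequeue plus one return.
 */
static int toy_alloc_optimistic(struct toy_pool *p, struct toy_memcg *m)
{
	if (!toy_dequeue(p))
		return -1;
	if (!toy_charge(m, 1)) {
		toy_return(p);
		return -1;
	}
	return 0;
}
```

With a pool of 2 folios and a memcg limit of 1, the second call dequeues a folio, fails the charge, and returns the folio, so the cost of being wrong is one extra dequeue/return pair.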
> > It seems like this patch series introduces some new paths for hugetlb
> > pages to be consumed (specifically, without a reservation or vma).
> > I imagine that these new paths make the slowpath for hugetlb more frequent,
> > which makes the cost of assuming that the memcg limit is OK higher?
> > I think explicitly spelling this out in the justification for reintroducing
> > the charging protocol could be helpful.
> >
>
> Yes, I should have done that. Will copy the following to the next
> revision.

Thank you for considering!

> The main reason is that reintroducing the charging protocol is the
> clearest way (for me) to cleanly refactor out hugetlb_alloc_folio()
> without worrying about the edge cases around HugeTLB reservations and
> charging.
>
> If I didn't reintroduce the charging protocol, I would have to depend on
> freeing the new hugetlb folio on memcg charging failure, and the freeing
> in turn depends on the subpool correctly being set in the folio, and the
> presence of the subpool influences (in free_huge_folio()) whether the
> reservation was returned to the global hstate. Aaannnd... there's also a
> hugetlb_restore_reserve flag that controls whether to return the folio
> to the subpool (and the hstate). I find folio_clear_hugetlb_restore_reserve()
> on certain code paths kind of magical/unexplained too.

I see. If reintroducing the protocol makes the code simpler, I see
no reason why we shouldn't revert the patch :-)

> I would rather iron out those charging and reservation details
> separately from this series (with more testing support).
>
> On the other hand, reintroducing the charging protocol has the benefit
> of avoiding allocations (not just dequeuing, if surplus HugeTLB pages
> are required) if the memcg limit is hit. Also, if the original reason
> for removing the protocol was to simplify the code, refactoring out
> hugetlb_alloc_folio() also simplifies the code, and I think it's
> actually nice that memcg charging is done the same way as the other two
> (h_cg and h_cg_rsvd charging). After hugetlb_alloc_folio() is refactored
> out, the gotos make all three charging systems consistent and symmetric,
> which I think is nice to have :)
>
> I hope the consistent/symmetric charging among all 3 systems is welcome,
> what do you think?

For the hugetlbfs case, the path to allocate a hugeTLB page on demand
makes sense, so I definitely see the argument for avoiding allocations.
Does guest_memfd also have a path to allocate a hugeTLB page outside of
the boot-time reservations? If so, I think it would be nice to clarify
that the allocation-failure optimization also applies to guest_memfd,
not only to hugetlbfs.

Symmetric charging is definitely welcome :-) All of your reasons make
sense to me, I just wanted to ask and make sure.
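
For comparison, here is a minimal sketch of the try-commit-cancel shape with goto-based unwinding, which is what makes the three charging systems symmetric. It is a userspace toy under stated assumptions: struct counter, struct folio_pool, the helper names, and the order in which the three counters are charged are all hypothetical and are not taken from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter standing in for h_cg_rsvd, h_cg, or memcg. */
struct counter {
	long usage;
	long limit;
};

/* try: reserve nr units up front; no side effects on failure. */
static bool counter_try_charge(struct counter *c, long nr)
{
	if (c->usage + nr > c->limit)
		return false;
	c->usage += nr;
	return true;
}

/* cancel: undo a reservation made by a successful try. */
static void counter_cancel_charge(struct counter *c, long nr)
{
	assert(c->usage >= nr);
	c->usage -= nr;
}

/* Hypothetical stand-in for the HugeTLB pool. */
struct folio_pool {
	long free_folios;
	long allocations;	/* how many times a folio was actually taken */
};

static bool pool_take_folio(struct folio_pool *p)
{
	if (p->free_folios == 0)
		return false;
	p->free_folios--;
	p->allocations++;
	return true;
}

/*
 * Try all three charges before touching the pool. Reaching the
 * "return 0" is the commit point in this toy: the charges stay in
 * place and belong to the folio. Any failure unwinds in reverse order.
 */
static int alloc_try_commit_cancel(struct folio_pool *p,
				   struct counter *h_cg_rsvd,
				   struct counter *h_cg,
				   struct counter *memcg, long nr)
{
	if (!counter_try_charge(h_cg_rsvd, nr))
		goto out;
	if (!counter_try_charge(h_cg, nr))
		goto cancel_h_cg_rsvd;
	if (!counter_try_charge(memcg, nr))
		goto cancel_h_cg;
	if (!pool_take_folio(p))
		goto cancel_memcg;

	return 0;	/* commit */

cancel_memcg:
	counter_cancel_charge(memcg, nr);
cancel_h_cg:
	counter_cancel_charge(h_cg, nr);
cancel_h_cg_rsvd:
	counter_cancel_charge(h_cg_rsvd, nr);
out:
	return -1;
}
```

If memcg is already at its limit, the function bails out before touching the pool, so no folio is taken just to be given back, and the two earlier charges are cancelled by the same unwinding path that handles every other failure.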

Thanks for your thoughts! I hope you have a great day!!

Joshua