From: Ackerley Tng <ackerleytng@google.com>
To: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: akpm@linux-foundation.org, dan.j.williams@intel.com,
david@kernel.org, fvdl@google.com, hannes@cmpxchg.org,
jgg@nvidia.com, jiaqiyan@google.com, jthoughton@google.com,
kalyazin@amazon.com, mhocko@kernel.org, michael.roth@amd.com,
muchun.song@linux.dev, osalvador@suse.de,
pasha.tatashin@soleen.com, pbonzini@redhat.com,
peterx@redhat.com, pratyush@kernel.org,
rick.p.edgecombe@intel.com, rientjes@google.com,
roman.gushchin@linux.dev, seanjc@google.com,
shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com,
yan.y.zhao@intel.com, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v1 0/7] Open HugeTLB allocation routine for more generic use
Date: Wed, 25 Feb 2026 19:37:04 -0800
Message-ID: <CAEvNRgGaJXbOGPQSgvo3rVDfis22DC4hYy=2Rczas0Vm3o66kQ@mail.gmail.com>
In-Reply-To: <20260225202437.4077364-1-joshua.hahnjy@gmail.com>
Joshua Hahn <joshua.hahnjy@gmail.com> writes:
> On Wed, 11 Feb 2026 16:37:11 -0800 Ackerley Tng <ackerleytng@google.com> wrote:
>
> Hi Ackerley, I hope you're doing well!
>
> [...snip...]
>
>> I would like to get feedback on:
>>
>> 1. Opening up HugeTLB's allocation for more generic use
>
> I'm not entirely familiar with guest_memfd, so please excuse my ignorance
> if I'm missing anything obvious.
Happy to take questions! Thank you for your thoughts and reviews!
> But I'm wondering what hugeTLB offers
> that other hugepage solutions cannot offer for guest_memfd, if the
> goal of this series is to decouple it from hugeTLBfs.
>
The one other huge page source that we've explored is THP pages from the
buddy allocator. Compared to HugeTLB, huge pages from the buddy
allocator:
+ Have a maximum size of 2M
+ Are not guaranteed the way HugeTLB pages are: HugeTLB pages are
allocated at boot, and guest_memfd can reserve pages at guest_memfd
creation time.
+ Lack HugeTLB's fast allocation path: allocating a HugeTLB page is
just a dequeue from a preallocated pool.
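The reservation and speed points above can be illustrated with a toy
userspace model. All names here (toy_pool, toy_pool_reserve(),
toy_pool_dequeue()) are hypothetical, not the kernel's hstate code, and
the model leaves out surplus pages, NUMA nodes, subpools and cgroups. It
only shows why a preallocated pool makes both properties easy: pages are
set aside once up front, a reservation is just a counter checked at
creation time, and allocation is a list pop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model, not the kernel's hstate/free-list code. */
struct toy_page {
	struct toy_page *next;
};

struct toy_pool {
	struct toy_page *free_list;	/* pages set aside up front */
	size_t nr_free;
	size_t nr_reserved;		/* free pages already promised to callers */
};

/* Done once (e.g. at boot): fill the pool. */
static bool toy_pool_init(struct toy_pool *pool, size_t nr_pages)
{
	pool->free_list = NULL;
	pool->nr_free = 0;
	pool->nr_reserved = 0;
	for (size_t i = 0; i < nr_pages; i++) {
		struct toy_page *p = malloc(sizeof(*p));

		if (!p)
			return false;	/* pages added so far stay in the pool */
		p->next = pool->free_list;
		pool->free_list = p;
		pool->nr_free++;
	}
	return true;
}

/* Reserve at creation time, so reserved allocations cannot fail later. */
static bool toy_pool_reserve(struct toy_pool *pool, size_t nr)
{
	if (pool->nr_free - pool->nr_reserved < nr)
		return false;
	pool->nr_reserved += nr;
	return true;
}

/* Allocation is an O(1) pop from the free list. */
static struct toy_page *toy_pool_dequeue(struct toy_pool *pool, bool reserved)
{
	struct toy_page *p = pool->free_list;

	assert(!reserved || pool->nr_reserved > 0);
	if (!p)
		return NULL;
	/* Unreserved callers must not take pages promised to someone else. */
	if (!reserved && pool->nr_free <= pool->nr_reserved)
		return NULL;
	pool->free_list = p->next;
	pool->nr_free--;
	if (reserved)
		pool->nr_reserved--;
	return p;
}
```

In this model, a caller that reserved at creation time always gets a
page from toy_pool_dequeue(), while an unreserved caller is refused once
only promised pages remain.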
The last reason to use HugeTLB is not any inherent advantage it has over
other sources of huge pages, but administrative/scheduling:
Given that existing non-guest_memfd workloads already use HugeTLB,
machine memory is already carved up into HugeTLB pages for these
workloads for optimal scheduling. Workloads that require guest_memfd
(like Confidential VMs) must also use HugeTLB to participate in optimal
workload scheduling across machines.
>> 2. Reverting and re-adopting the try-commit-cancel protocol for memory
>> charging
>
> On the second point, I am wondering if reintroducing the try-commit-cancel
> protocol is tied to factoring out hugetlb_alloc_folio. When I removed
> the protocol a while back, the justification was that for the most part,
> grabbing a hugetlb folio was a relatively cheap & fast operation, since
> hugetlb mostly operates out of a preallocated pool.
>
> So the cost of being wrong, going above the limit, and having to return
> the hugetlb folio was also relatively low.
>
Thanks for this! I saw your patch to just optimistically grab a HugeTLB
page :) For that patch, the primary reason was to simplify the logic,
and the simplification was justifiable because grabbing a folio is
cheap, right? (So cheapness wasn't a reason in itself, just what made
the simplification acceptable?)
> It seems like this patch series introduces some new paths for hugetlb
> pages to be consumed (specifically, without a reservation or vma).
> I imagine that these new paths make the slowpath for hugetlb more frequent,
> which makes the cost of assuming that the memcg limit is OK higher?
> I think explicitly spelling this out in the justification for reintroducing
> the charging protocol could be helpful.
>
Yes, I should have done that. Will copy the following to the next
revision.
The main reason is that reintroducing the charging protocol is the
clearest way (for me) to cleanly refactor out hugetlb_alloc_folio()
without worrying about the edge cases around HugeTLB reservations and
charging.
If I didn't reintroduce the charging protocol, I would have to rely on
freeing the new HugeTLB folio when memcg charging fails. That freeing in
turn depends on the subpool being correctly set in the folio, and in
free_huge_folio() the presence of the subpool influences whether the
reservation is returned to the global hstate. Aaannnd... there's also a
hugetlb_restore_reserve flag that controls whether to return the folio
to the subpool (and the hstate). I also find the calls to
folio_clear_hugetlb_restore_reserve() on certain code paths somewhat
magical/unexplained.
I would rather iron out those charging and reservation details
separately from this series (with more testing support).
Beyond that, reintroducing the charging protocol has the benefit of
avoiding allocations when the memcg limit is hit (real allocations, not
just dequeues, if surplus HugeTLB pages are required). Also, if the
original reason for removing the protocol was to simplify the code,
refactoring out hugetlb_alloc_folio() simplifies the code too, and I
like that memcg charging is then done the same way as the other two
(h_cg and h_cg_rsvd charging). After hugetlb_alloc_folio() is refactored
out, the gotos make all three charging systems consistent and symmetric,
which I think is nice to have :)
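A minimal userspace sketch of that symmetric try-commit-cancel shape,
for illustration only: toy_counter, toy_try_charge() and the other names
are hypothetical stand-ins, not the memcg or hugetlb_cgroup API, and the
sketch charges all three counters unconditionally, whereas the real code
only takes some of these charges in certain cases. Every charge is tried
before a page is taken, so hitting the memcg limit consumes no page, and
the error labels cancel charges in the reverse order they were taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one charging system (h_cg, h_cg_rsvd or memcg). */
struct toy_counter {
	long usage;
	long limit;
};

/* Hypothetical folio: remembers which counters it is charged to. */
struct toy_folio {
	struct toy_counter *charged_to[3];
};

/* try: take the charge before any page is obtained. */
static bool toy_try_charge(struct toy_counter *c, long nr)
{
	if (c->usage + nr > c->limit)
		return false;
	c->usage += nr;
	return true;
}

/* cancel: undo a successful try when a later step fails. */
static void toy_cancel_charge(struct toy_counter *c, long nr)
{
	assert(c->usage >= nr);
	c->usage -= nr;
}

/* commit: bind the charge to the folio once the page is secured. */
static void toy_commit_charge(struct toy_folio *folio, int idx,
			      struct toy_counter *c)
{
	folio->charged_to[idx] = c;
}

/* Stand-in for dequeuing a page from the pool (or allocating a surplus one). */
static bool toy_get_page(long *nr_free)
{
	if (*nr_free == 0)
		return false;
	(*nr_free)--;
	return true;
}

static bool toy_alloc_folio(struct toy_folio *folio, struct toy_counter *h_cg,
			    struct toy_counter *h_cg_rsvd,
			    struct toy_counter *memcg, long *nr_free)
{
	const long nr = 1;

	if (!toy_try_charge(h_cg, nr))
		return false;
	if (!toy_try_charge(h_cg_rsvd, nr))
		goto out_cancel_h_cg;
	if (!toy_try_charge(memcg, nr))
		goto out_cancel_h_cg_rsvd;	/* memcg limit hit: no page taken */

	if (!toy_get_page(nr_free))
		goto out_cancel_memcg;

	toy_commit_charge(folio, 0, h_cg);
	toy_commit_charge(folio, 1, h_cg_rsvd);
	toy_commit_charge(folio, 2, memcg);
	return true;

	/* Unwind in the reverse order of the tries. */
out_cancel_memcg:
	toy_cancel_charge(memcg, nr);
out_cancel_h_cg_rsvd:
	toy_cancel_charge(h_cg_rsvd, nr);
out_cancel_h_cg:
	toy_cancel_charge(h_cg, nr);
	return false;
}
```

With an optimistic charge-after-allocate scheme, the failure path would
instead have to free the page it just obtained, which is where the
subpool and restore_reserve details above come in.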
I hope consistent, symmetric charging across all three systems is
welcome. What do you think?
> Thank you for the series, again. I hope you have a great day!
> Joshua
>
>> To see how hugetlb_alloc_folio() is used by guest_memfd, the most
>> recent patch series that uses this more generic HugeTLB allocation
>> routine is at [1], and a newer revision of that patch series is at
>> [2].
>>
>> Independently of guest_memfd, I believe this change is useful in
>> simplifying alloc_hugetlb_folio(). alloc_hugetlb_folio() was so
>> coupled to a VMA that even HugeTLBfs allocates HugeTLB folios using a
>> pseudo-VMA.
>>
>> [1] https://lore.kernel.org/all/cover.1747264138.git.ackerleytng@google.com/T/
>> [2] https://github.com/googleprodkernel/linux-cc/tree/wip-gmem-conversions-hugetlb-restructuring-12-08-25
>>
>> Ackerley Tng (7):
>> mm: hugetlb: Consolidate interpretation of gbl_chg within
>> alloc_hugetlb_folio()
>> mm: hugetlb: Move mpol interpretation out of
>> alloc_buddy_hugetlb_folio_with_mpol()
>> mm: hugetlb: Move mpol interpretation out of
>> dequeue_hugetlb_folio_vma()
>> Revert "memcg/hugetlb: remove memcg hugetlb try-commit-cancel
>> protocol"
>> mm: hugetlb: Adopt memcg try-commit-cancel protocol
>> mm: memcontrol: Remove now-unused function mem_cgroup_charge_hugetlb
>> mm: hugetlb: Refactor out hugetlb_alloc_folio()
>>
>> include/linux/hugetlb.h | 11 ++
>> include/linux/memcontrol.h | 21 +++-
>> mm/hugetlb.c | 228 +++++++++++++++++++++----------------
>> mm/memcontrol.c | 77 ++++++++-----
>> 4 files changed, 212 insertions(+), 125 deletions(-)
>>
>>
>> base-commit: db9571a66156bfbc0273e66e5c77923869bda547
>> --
>> 2.53.0.310.g728cabbaf7-goog
>>