From: Matthew Wilcox <willy@infradead.org>
To: Zi Yan <ziy@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Usama Arif <usama.arif@linux.dev>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
riel@surriel.com, Shakeel Butt <shakeel.butt@linux.dev>,
Kiryl Shutsemau <kas@kernel.org>, Barry Song <baohua@kernel.org>,
Dev Jain <dev.jain@arm.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Nico Pache <npache@redhat.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Ryan Roberts <ryan.roberts@arm.com>,
Vlastimil Babka <vbabka@suse.cz>,
Lance Yang <lance.yang@linux.dev>,
Frank van der Linden <fvdl@google.com>
Subject: Re: [LSF/MM/BPF TOPIC] Beyond 2MB: Why Terabyte-Scale Machines Need 1GB Transparent Huge Pages
Date: Tue, 24 Feb 2026 20:35:15 +0000
Message-ID: <aZ4Lg51BVmGE5MLn@casper.infradead.org>
In-Reply-To: <42CCC4AB-EE32-4279-BB50-EE72756B5137@nvidia.com>

On Tue, Feb 24, 2026 at 02:08:26PM -0500, Zi Yan wrote:
> On 24 Feb 2026, at 14:03, Johannes Weiner wrote:
> > On Thu, Feb 19, 2026 at 03:53:35PM +0000, Usama Arif wrote:
> >> Why 1G THP over hugetlbfs?
> >> ==========================
> >
> > I know this isn't your intention, but one interesting aspect of
> > supporting PUD mapped folios natively is that it could open the door
> > to simplifying hugetlb as well.
> >
> > We currently have all kinds of huge_vma checks scattered over the page
> > table code, and entirely parallel paths for unmapping etc. With native
> > PUD mappings, this could allow pushing the special casing out of the
> > virtual memory layer and into where we deal with the page objects.
> >
> > You might be able to take it as far as the only thing left of hugetlb
> > is the reservation pool. Such that a naive application does mmap() as
> > per usual, and it comes down to a separate allocation policy how the
> > backing pages are served (buddy, CMA, boot-time reservations, ...)
> >
> > Approaching it this way could help separate out the discussion on code
> > impact and tech debt of PUD mappings, from the allocation technique
> > question, which in itself is a fairly large topic.
>
> I agree with this 100%. Adding 1GB folio support first, we then can think
> about what other THP features, e.g., split, migration, PMD/PTE mapping, are
> really needed and add them one by one. It is also going to be a good way
> of retiring hugetlb special code.

But this hasn't happened yet for PMD-sized hugetlb, and there's no need
to wait for PUD-sized THP to start this process. I don't think that
introducing PUD-sized THP will actually motivate anyone to do this work.

I think we have four main things that hugetlb still offers:

 - Reserved pool (mentioned above) which we don't yet have a THP
   replacement for
 - shared page tables. mshare() is the replacement here, and that
   project is moving along nicely.
 - Being able to allocate gigantic folios. This is also progressing.
 - Guaranteeing that you don't get a fallback; you either get memory in
   the size you asked for, or you fail.

Every time this comes up, I offer the pagewalk code as an egregious
example of where we force every user to know "oh, hugetlb is special".
Getting rid of mm_walk_ops->hugetlb_entry() would be a great improvement.

People always look at the fault handler first and say "Ah, this is
an obvious hugetlb-is-special case I can get rid of", but honestly
it's not that painful to keep around and doesn't affect anyone else.
mm_walk_ops affects everybody who walks page tables.
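
A minimal userspace sketch of the pagewalk point above, not kernel code.
Every name in it (toy_vma, toy_walk_ops, toy_walk, count_bytes, toy_count)
is hypothetical, and the dispatch rule is an assumption meant only to
approximate the in-kernel behaviour: a hugetlb range is visited solely
through a separate hugetlb_entry callback, so a walker that supplies only
the ordinary callback never sees it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: an address range plus a hugetlb flag. */
struct toy_vma {
	unsigned long start, end;
	bool is_hugetlb;
};

/* Toy analogue of mm_walk_ops, reduced to the two callbacks of interest. */
struct toy_walk_ops {
	int (*pmd_entry)(const struct toy_vma *vma, void *priv);
	int (*hugetlb_entry)(const struct toy_vma *vma, void *priv);
};

/* Walk every range; hugetlb ranges are only reachable via hugetlb_entry. */
static int toy_walk(const struct toy_vma *vmas, size_t n,
		    const struct toy_walk_ops *ops, void *priv)
{
	for (size_t i = 0; i < n; i++) {
		int err = 0;

		if (vmas[i].is_hugetlb) {
			/* No hugetlb_entry: the range is silently skipped. */
			if (ops->hugetlb_entry)
				err = ops->hugetlb_entry(&vmas[i], priv);
		} else if (ops->pmd_entry) {
			err = ops->pmd_entry(&vmas[i], priv);
		}
		if (err)
			return err;
	}
	return 0;
}

/* Callback: add the size of the visited range to a running byte count. */
static int count_bytes(const struct toy_vma *vma, void *priv)
{
	*(unsigned long *)priv += vma->end - vma->start;
	return 0;
}

/* Bytes seen by a walker that does, or does not, know about hugetlb. */
unsigned long toy_count(bool handle_hugetlb)
{
	static const struct toy_vma vmas[] = {
		{ 0x00000000UL, 0x00200000UL, false },	/* 2 MiB, ordinary */
		{ 0x40000000UL, 0x80000000UL, true },	/* 1 GiB, hugetlb */
	};
	struct toy_walk_ops ops = {
		.pmd_entry = count_bytes,
		.hugetlb_entry = handle_hugetlb ? count_bytes : NULL,
	};
	unsigned long total = 0;

	toy_walk(vmas, sizeof(vmas) / sizeof(vmas[0]), &ops, &total);
	return total;
}
```

In this toy, toy_count(false) accounts only for the 2 MiB range, while
toy_count(true) also sees the 1 GiB one. The callback body is identical in
both cases; the only difference is whether the caller knew to register it
twice, which is the burden that a single unified entry point would remove.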