linux-mm.kvack.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Zi Yan <ziy@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Usama Arif <usama.arif@linux.dev>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	riel@surriel.com, Shakeel Butt <shakeel.butt@linux.dev>,
	Kiryl Shutsemau <kas@kernel.org>, Barry Song <baohua@kernel.org>,
	Dev Jain <dev.jain@arm.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Nico Pache <npache@redhat.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Lance Yang <lance.yang@linux.dev>,
	Frank van der Linden <fvdl@google.com>
Subject: Re: [LSF/MM/BPF TOPIC] Beyond 2MB: Why Terabyte-Scale Machines Need 1GB Transparent Huge Pages
Date: Tue, 24 Feb 2026 20:35:15 +0000
Message-ID: <aZ4Lg51BVmGE5MLn@casper.infradead.org>
In-Reply-To: <42CCC4AB-EE32-4279-BB50-EE72756B5137@nvidia.com>

On Tue, Feb 24, 2026 at 02:08:26PM -0500, Zi Yan wrote:
> On 24 Feb 2026, at 14:03, Johannes Weiner wrote:
> > On Thu, Feb 19, 2026 at 03:53:35PM +0000, Usama Arif wrote:
> >> Why 1G THP over hugetlbfs?
> >> ==========================
> >
> > I know this isn't your intention, but one interesting aspect of
> > supporting PUD mapped folios natively is that it could open the door
> > to simplifying hugetlb as well.
> >
> > We currently have all kinds of huge_vma checks scattered over the page
> > table code, and entirely parallel paths for unmapping etc. With native
> > PUD mappings, this could allow pushing the special casing out of the
> > virtual memory layer and into where we deal with the page objects.
> >
> > You might be able to take it as far as the only thing left of hugetlb
> > is the reservation pool. Such that a naive application does mmap() as
> > per usual, and it comes down to a separate allocation policy how the
> > backing pages are served (buddy, CMA, boot-time reservations, ...)
> >
> > Approaching it this way could help separate out the discussion on code
> > impact and tech debt of PUD mappings, from the allocation technique
> > question, which in itself is a fairly large topic.
> 
> I agree with this 100%. Adding 1GB folio support first, we then can think
> about what other THP features, e.g., split, migration, PMD/PTE mapping, are
> really needed and add them one by one. It is also going to be a good way
> of retiring hugetlb special code.

But this hasn't happened yet for PMD-sized hugetlb, and there's no need
to wait for PUD-sized THP to start this process.  I don't think that
introducing PUD-sized THP will actually motivate anyone to do this work.

I think we have four main things that hugetlb still offers:

 - Reserved pool (mentioned above), for which we don't yet have a THP
   replacement.
 - Shared page tables.  mshare() is the replacement here, and that
   project is moving along nicely.
 - Being able to allocate gigantic folios.  This is also progressing.
 - Guaranteeing that you don't get a fallback; you either get memory in
   the size you asked for, or you fail.
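
To make the last point concrete from the userspace side, here is a
minimal sketch, assuming Linux and the mmap(2) MAP_HUGETLB and
MAP_HUGE_1GB flags.  The helper names (map_1g_hugetlb, map_1g_thp_hint)
are made up for illustration.  The hugetlb request either yields a
1GB-backed mapping or fails outright (typically with ENOMEM or EINVAL),
whereas madvise(MADV_HUGEPAGE) is only a hint and the kernel may back
the range with smaller pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Fallback definitions for older libc headers. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)
#endif

#define SZ_1G_BYTES ((size_t)1 << 30)

/*
 * hugetlb: all or nothing.  Returns a 1GB-backed mapping, or NULL with
 * errno set if the kernel cannot provide a 1GB huge page for it.
 */
static void *map_1g_hugetlb(void)
{
	void *p = mmap(NULL, SZ_1G_BYTES, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
		       -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	/* A hugetlb mapping is aligned to its huge page size. */
	assert(((uintptr_t)p & (SZ_1G_BYTES - 1)) == 0);
	return p;
}

/*
 * THP: best effort.  The mapping can succeed even when no huge page is
 * available; MADV_HUGEPAGE only asks the kernel to try, so its result
 * is deliberately ignored here.
 */
static void *map_1g_thp_hint(void)
{
	void *p = mmap(NULL, SZ_1G_BYTES, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	(void)madvise(p, SZ_1G_BYTES, MADV_HUGEPAGE);
	return p;
}
```

A caller that wants the guarantee checks map_1g_hugetlb() for NULL and
reports the error; a caller of map_1g_thp_hint() cannot tell from the
return value what page size it ended up with.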

Every time this comes up, I offer the pagewalk code as an egregious
example of where we force every user to know "oh, hugetlb is special".
Getting rid of mm_walk_ops->hugetlb_entry() would be a great improvement.

People always look at the fault handler first and say "Ah, this is
an obvious hugetlb-is-special case I can get rid of", but honestly
it's not that painful to keep around and doesn't affect anyone else.
mm_walk_ops affects everybody who walks page tables.
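
For readers who haven't hit this, a userspace toy model (not kernel
code) of the pattern.  Everything prefixed toy_ is hypothetical, and
the callback signatures are simplified compared with the real struct
mm_walk_ops.  The assumption modelled here is that hugetlb ranges are
only ever delivered through a separate hugetlb_entry callback, so a
walker that doesn't supply one never sees them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a VMA: an address range plus an "is hugetlb" flag. */
struct toy_vma {
	unsigned long start, end;
	bool is_hugetlb;
};

/* Toy stand-in for mm_walk_ops: one normal callback, one hugetlb one. */
struct toy_walk_ops {
	int (*range_entry)(unsigned long addr, unsigned long next, void *priv);
	int (*hugetlb_entry)(unsigned long addr, unsigned long next, void *priv);
};

/* Dispatch each range to the callback that matches its type. */
static int toy_walk(const struct toy_vma *vmas, size_t n,
		    const struct toy_walk_ops *ops, void *priv)
{
	for (size_t i = 0; i < n; i++) {
		int (*cb)(unsigned long, unsigned long, void *) =
			vmas[i].is_hugetlb ? ops->hugetlb_entry
					   : ops->range_entry;
		int err;

		if (!cb)
			continue;	/* no callback: range is skipped */
		err = cb(vmas[i].start, vmas[i].end, priv);
		if (err)
			return err;
	}
	return 0;
}

/* The walker's actual work: add up the bytes it is shown. */
static int toy_count(unsigned long addr, unsigned long next, void *priv)
{
	*(unsigned long *)priv += next - addr;
	return 0;
}

/* A walker written without knowing that hugetlb is special ... */
static const struct toy_walk_ops toy_naive_ops = {
	.range_entry	= toy_count,
};

/* ... and one that registers the same logic a second time. */
static const struct toy_walk_ops toy_aware_ops = {
	.range_entry	= toy_count,
	.hugetlb_entry	= toy_count,
};

static unsigned long toy_count_bytes(const struct toy_walk_ops *ops)
{
	static const struct toy_vma vmas[] = {
		{ 0x1000UL,     0x3000UL,     false },	/* 8KB, normal */
		{ 0x40000000UL, 0x80000000UL, true  },	/* 1GB, hugetlb */
	};
	unsigned long total = 0;
	int err = toy_walk(vmas, sizeof(vmas) / sizeof(vmas[0]), ops, &total);

	assert(err == 0);
	return total;
}
```

In this toy, toy_count_bytes(&toy_naive_ops) only accounts for the 8KB
range, while toy_count_bytes(&toy_aware_ops) also sees the 1GB one.
The point is that every walker has to opt in separately, which is the
cost that goes away if the hugetlb callback does.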


