Re: [LSF/MM/BPF TOPIC] Beyond 2MB: Why Terabyte-Scale Machines Need 1GB Transparent Huge Pages

linux-mm.kvack.org archive mirror

From: Zi Yan <ziy@nvidia.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	Usama Arif <usama.arif@linux.dev>,
	willy@infradead.org, Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	riel@surriel.com, Shakeel Butt <shakeel.butt@linux.dev>,
	Kiryl Shutsemau <kas@kernel.org>, Barry Song <baohua@kernel.org>,
	Dev Jain <dev.jain@arm.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Nico Pache <npache@redhat.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Lance Yang <lance.yang@linux.dev>,
	Frank van der Linden <fvdl@google.com>
Subject: Re: [LSF/MM/BPF TOPIC] Beyond 2MB: Why Terabyte-Scale Machines Need 1GB Transparent Huge Pages
Date: Thu, 19 Feb 2026 11:52:57 -0500
Message-ID: <A4DDD3A2-104A-42BD-B174-676012166BEC@nvidia.com>
In-Reply-To: <aZc-8dMBz1XCJI3n@cmpxchg.org>

On 19 Feb 2026, at 11:48, Johannes Weiner wrote:

> On Thu, Feb 19, 2026 at 05:00:19PM +0100, David Hildenbrand (Arm) wrote:
>>
>>>
>>> I see 1G THPs being used opportunistically, ideally at the start of the application
>>> or by the allocator (jemalloc/tcmalloc), when there is plenty of free memory
>>> available and a greater chance of getting 1G THPs.
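As a sketch of what opportunistic use by an allocator could look like from user space, assuming x86-64 with 4KB base pages (so one PUD maps 1GB): reserve a 1GB-aligned region early and mark it with madvise(MADV_HUGEPAGE). Mainline anonymous THP currently tops out at PMD size (2MB), so whether such a range would get 1GB THPs depends on the PUD THP support proposed in this thread. The helper name reserve_pud_aligned() is hypothetical and is not a jemalloc or tcmalloc API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: 4KB base pages, so one PUD entry maps 1GB. */
#define PUD_SIZE_BYTES (1UL << 30)

/*
 * Hypothetical helper: reserve `len` bytes (a multiple of 1GB) at a
 * 1GB-aligned address and hint that the range should use THPs.
 * Returns NULL if the mapping cannot be created.
 */
static void *reserve_pud_aligned(size_t len)
{
    assert(len > 0 && len % PUD_SIZE_BYTES == 0);

    /* Over-reserve by one PUD so an aligned start must exist inside. */
    size_t span = len + PUD_SIZE_BYTES;
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    /* Round the start up to the next 1GB boundary. */
    char *aligned = (char *)(((uintptr_t)raw + PUD_SIZE_BYTES - 1) &
                             ~(PUD_SIZE_BYTES - 1));

    /* Give back the unaligned head and the unused tail. */
    size_t head = (size_t)(aligned - raw);
    size_t tail = span - head - len;
    if (head)
        munmap(raw, head);
    if (tail)
        munmap(aligned + len, tail);

    /* Advisory only: failure (e.g. THP compiled out) is not fatal. */
    (void)madvise(aligned, len, MADV_HUGEPAGE);
    return aligned;
}
```

Over-reserving by one extra PUD and trimming both ends is a common way to get large alignment out of mmap(), which itself only guarantees page alignment.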
>>>
>>> Splitting strategy
>>> ==================
>>>
>>> When a PUD THP must be broken (for COW after fork, partial munmap, mprotect on
>>> a subregion, or reclaim), it is split directly from PUD to PTE level, converting
>>> 1 PUD entry into 262,144 PTE entries. The ideal solution would be to split to
>>> PMDs and only the necessary PMDs to PTEs. This is something that would hopefully
>>> be possible with David's proposal [3].
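The entry counts above follow from simple arithmetic. This sketch assumes x86-64-style 4-level paging with 4KB base pages (512 entries per page table); the function names are illustrative only. It compares the direct PUD-to-PTE split with a hierarchical split in which only the PMD ranges touched by the operation are split further.

```c
#include <assert.h>

/* Assumption: 4KB base pages, 512 entries per page-table level. */
#define PTES_PER_PMD 512UL /* one PMD entry maps 2MB = 512 x 4KB */
#define PMDS_PER_PUD 512UL /* one PUD entry maps 1GB = 512 x 2MB */

/* Direct split: every 4KB page in the 1GB range gets its own PTE. */
static unsigned long direct_split_entries(void)
{
    return PMDS_PER_PUD * PTES_PER_PMD;
}

/*
 * Hierarchical split: the PUD becomes 512 PMD mappings, and only the
 * `touched_pmds` 2MB ranges affected by COW/munmap/mprotect are split
 * further into full PTE tables. Returns PMD mappings plus PTE slots.
 */
static unsigned long hierarchical_split_entries(unsigned long touched_pmds)
{
    assert(touched_pmds <= PMDS_PER_PUD);
    return (PMDS_PER_PUD - touched_pmds) + touched_pmds * PTES_PER_PMD;
}
```

For an operation that touches a single 4KB page, the hierarchical split leaves 511 PMD mappings plus one 512-entry PTE table (1,023 entries) instead of 262,144 PTEs; only when all 512 PMD ranges are touched do the two approaches cost the same.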
>>
>> There was once a proposal where, instead of splitting a THP, we would
>> migrate all of its memory away. That means that instead of splitting the 1
>> GiB THP, you would return it to the page allocator, where
>> somebody else could use it.
>
> With TLB coalescing, there is benefit in preserving contiguity. If you
> lop off the last 4k of a 2M-backed range, a split still gives you 511
> contiguously mapped pfns that can be coalesced.

Which CPU are you referring to? AMD’s PTE coalescing works for up to 32KB,
and ARM’s contiguous PTE bit supports larger sizes. BTW, do we have PMD-level
ARM contiguous bit support?
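To put numbers on the coalescing point, here is a simplified model (an assumption, not a description of either vendor's hardware): a group of base pages can share one TLB entry only if it is naturally aligned, fully mapped, and physically contiguous. With 4KB pages, the 32KB AMD figure corresponds to 8-page groups, and ARM's contiguous bit with a 4KB granule covers 16 PTEs (64KB). The function name is illustrative only.

```c
#include <assert.h>

/*
 * Simplified model (assumption): count the naturally aligned groups of
 * `group` base pages that lie entirely inside the mapped page range
 * [first, last). Indices are relative to a 2MB-aligned start, and the
 * range is assumed physically contiguous with matching alignment.
 */
static unsigned long coalescible_groups(unsigned long first,
                                        unsigned long last,
                                        unsigned long group)
{
    assert(group > 0 && first <= last);
    /* Round `first` up to the next group boundary. */
    unsigned long start = (first + group - 1) / group * group;
    if (start >= last)
        return 0;
    return (last - start) / group;
}
```

For the 511 pfns left after lopping off the last 4KB of a 2MB range, coalescible_groups(0, 511, 8) is 63 and coalescible_groups(0, 511, 16) is 31, so under this model only the final partial group (7 or 15 pages) loses coalescing.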

>
> It would be unfortunate to lose that for pure virtual memory splits
> when there is neither demand for nor a shortage of huge pages. But it might be
> possible to do this lazily, e.g., when somebody has trouble getting a
> larger page, scan the deferred split lists for candidates to migrate.
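One way the lazy approach could pick victims, sketched under stated assumptions: the struct below is a hypothetical, simplified stand-in for an entry on a deferred split list (it is not the kernel's folio or deferred-split API), and preferring the folio with the fewest still-mapped pages, i.e. the cheapest one to migrate away, is only one plausible heuristic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a partially mapped large folio. */
struct deferred_folio {
    unsigned int order;   /* folio spans 1 << order base pages */
    unsigned long mapped; /* base pages still mapped by users */
};

/*
 * When an allocation of `want_order` fails, pick the deferred folio that
 * is at least that large and needs the fewest pages migrated away before
 * it can be freed whole. Returns its index, or -1 if none qualifies.
 */
static long pick_migration_candidate(const struct deferred_folio *list,
                                     size_t n, unsigned int want_order)
{
    long best = -1;

    for (size_t i = 0; i < n; i++) {
        assert(list[i].mapped <= (1UL << list[i].order));
        if (list[i].order < want_order)
            continue; /* too small to satisfy the request */
        if (best < 0 || list[i].mapped < list[(size_t)best].mapped)
            best = (long)i;
    }
    return best;
}
```

Only when a larger allocation actually fails would the chosen folio's remaining pages be migrated and the freed block handed to the requester; splits that happen while huge pages are plentiful keep their contiguity.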


Best Regards,
Yan, Zi


Thread overview: 12+ messages
2026-02-19 15:53 Usama Arif
2026-02-19 16:00 ` David Hildenbrand (Arm)
2026-02-19 16:48   ` Johannes Weiner
2026-02-19 16:52     ` Zi Yan [this message]
2026-02-19 17:08       ` Johannes Weiner
2026-02-19 17:09         ` David Hildenbrand (Arm)
2026-02-19 17:09       ` David Hildenbrand (Arm)
2026-02-19 16:49   ` Zi Yan
2026-02-19 17:13     ` Matthew Wilcox
2026-02-19 17:28       ` Zi Yan
2026-02-19 19:02 ` Rik van Riel
2026-02-20 10:00   ` David Hildenbrand (Arm)
