linux-mm.kvack.org archive mirror
From: Zi Yan <ziy@nvidia.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Usama Arif <usama.arif@linux.dev>,
	willy@infradead.org, Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	riel@surriel.com, Shakeel Butt <shakeel.butt@linux.dev>,
	Kiryl Shutsemau <kas@kernel.org>, Barry Song <baohua@kernel.org>,
	Dev Jain <dev.jain@arm.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Nico Pache <npache@redhat.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Lance Yang <lance.yang@linux.dev>,
	Frank van der Linden <fvdl@google.com>
Subject: Re: [LSF/MM/BPF TOPIC] Beyond 2MB: Why Terabyte-Scale Machines Need 1GB Transparent Huge Pages
Date: Thu, 19 Feb 2026 11:49:27 -0500
Message-ID: <87DAD8A6-85E7-4BC9-B81A-4A842DC546E3@nvidia.com>
In-Reply-To: <3485c8c8-9bfc-4725-885a-626e79d0aebb@kernel.org>

On 19 Feb 2026, at 11:00, David Hildenbrand (Arm) wrote:

>>
>> I see 1G THPs being opportunistically used ideally at the start of the application
>> or by the allocator (jemalloc/tcmalloc) when there is plenty of free memory
>> available and a greater chance of getting 1G THPs.
>>
>> Splitting strategy
>> ==================
>>
>> When a PUD THP must be broken -- for COW after fork, partial munmap, mprotect on
>> a subregion, or reclaim -- it splits directly from PUD to PTE level, converting
>> 1 PUD entry into 262,144 PTE entries. The ideal solution would be to split to
>> PMDs and only the necessary PMDs to PTEs. This is something that would hopefully
>> be possible with David's proposal [3].

Once folios larger than PMD size can be mapped with PMDs, you can use a non-uniform
split to keep the after-split folios as large as possible.
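
To put rough numbers on that, here is a toy model (plain Python, not kernel
code) comparing a uniform split of a 1GB folio down to order 0 with a
buddy-style non-uniform split that isolates a single 4KB page. The constants
assume 4KB base pages with 2MB PMDs and 1GB PUDs; the function names are made
up for illustration, and the non-uniform model is only my reading of the
"keep the untouched half intact" idea.

```python
# Toy arithmetic only. Assumes 4 KiB base pages, 2 MiB PMD, 1 GiB PUD.
PMD_ORDER = 9    # 2 MiB / 4 KiB = 2^9 pages
PUD_ORDER = 18   # 1 GiB / 4 KiB = 2^18 pages


def uniform_split_orders(order, new_order=0):
    """Uniform split: every after-split folio has order new_order."""
    return [new_order] * (1 << (order - new_order))


def non_uniform_split_orders(order, new_order=0):
    """Buddy-style split: halve repeatedly, leaving the half that does
    not contain the target page as one intact folio each time."""
    orders = []
    while order > new_order:
        order -= 1
        orders.append(order)   # untouched half stays a single folio
    orders.append(new_order)   # the piece that contains the target page
    return orders


def leaf_entries_pmd_remap():
    """Leaf page table entries if a PUD mapping is remapped as PMDs and
    only one of those PMDs is further remapped as PTEs."""
    pmds = 1 << (PUD_ORDER - PMD_ORDER)   # 512 PMD entries per PUD
    ptes = 1 << PMD_ORDER                 # 512 PTE entries per PMD
    return (pmds - 1) + ptes


print(len(uniform_split_orders(PUD_ORDER)))      # number of order-0 folios
print(non_uniform_split_orders(PUD_ORDER))       # orders 17, 16, ..., 1, 0, 0
print(leaf_entries_pmd_remap())                  # PMD + PTE leaf entries
```

In this model the uniform split yields 262,144 order-0 folios, while the
non-uniform split yields 19 folios that still cover the same 1GB.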

>
> There once was this proposal where we would, instead of splitting a THP, migrate all memory away instead. That means, instead of splitting the 1 GiB THP, you would instead return it to the page allocator where somebody else could use it.

This sounds more reasonable than splitting the 1GB folio itself.

>
> However, we cannot easily do the same when remapping a 1 GiB THP to be mapped by PMDs etc. I think there are examples where that just doesn't work or is not desired.
>
> But I considered that in general (avoid folio_split()) an interesting approach. The remapping part is a bit different though.

If HW supported multiple TLB entries translating to the same physical frame
and allowed a translation priority among TLB entries, this remapping would be
easy and we could still keep the 1GB PUD mapping. Basically, we could have a 1GB
TLB entry pointing to the 1GB folio and another 4KB TLB entry pointing to the
remapped region, overriding that part of the original 1GB vaddr range.

Without that, SW will need to split the PUD into PMDs and PTEs.
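
As a sketch of what such HW behavior could look like, here is a toy lookup
model in Python. It is purely hypothetical: I am not aware of any existing
architecture that defines this, and the "smallest matching entry wins" priority
rule, the TlbEntry type and the translate() helper are all assumptions made up
for illustration.

```python
# Toy model of a hypothetical TLB with overlapping entries, where the
# entry with the smallest page size takes priority. Not real HW behavior.
from dataclasses import dataclass

GIB = 1 << 30
KIB4 = 1 << 12


@dataclass(frozen=True)
class TlbEntry:
    vbase: int   # virtual base address, aligned to size
    pbase: int   # physical base address, aligned to size
    size: int    # bytes covered by this entry


def translate(entries, vaddr):
    """Return the physical address for vaddr, or None on a miss.
    Assumed priority rule: the smallest matching entry overrides."""
    matches = [e for e in entries if e.vbase <= vaddr < e.vbase + e.size]
    if not matches:
        return None
    best = min(matches, key=lambda e: e.size)
    return best.pbase + (vaddr - best.vbase)


# One 1GB entry for the folio plus one 4KB override for a remapped page.
entries = [
    TlbEntry(vbase=0x40000000, pbase=0x100000000, size=GIB),
    TlbEntry(vbase=0x40000000 + 5 * KIB4, pbase=0x200000000, size=KIB4),
]

print(hex(translate(entries, 0x40000000 + 0x10)))             # via 1GB entry
print(hex(translate(entries, 0x40000000 + 5 * KIB4 + 0x10)))  # via 4KB override
```

Addresses outside the overridden 4KB page still go through the single 1GB
entry, which is the property that would let the PUD mapping stay in place.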


Best Regards,
Yan, Zi


