Re: [RFC 00/12] mm: PUD (1GB) THP implementation

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Zi Yan <ziy@nvidia.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	David Hildenbrand <david@kernel.org>
Cc: Rik van Riel <riel@surriel.com>,
	Usama Arif <usamaarif642@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, hannes@cmpxchg.org, shakeel.butt@linux.dev,
	kas@kernel.org, baohua@kernel.org, dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	Liam.Howlett@oracle.com, ryan.roberts@arm.com, vbabka@suse.cz,
	lance.yang@linux.dev, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, Frank van der Linden <fvdl@google.com>
Subject: Re: [RFC 00/12] mm: PUD (1GB) THP implementation
Date: Mon, 02 Feb 2026 10:50:35 -0500	[thread overview]
Message-ID: <2B979847-CED0-41AE-AEB1-BEFB267B1E14@nvidia.com> (raw)
In-Reply-To: <7d246914-6077-4dc1-bbcf-8dabbe6183e7@lucifer.local>

On 2 Feb 2026, at 6:30, Lorenzo Stoakes wrote:

> On Sun, Feb 01, 2026 at 09:44:12PM -0500, Rik van Riel wrote:
>> On Sun, 2026-02-01 at 16:50 -0800, Usama Arif wrote:
>>>
>>> 1. Static Reservation: hugetlbfs requires pre-allocating huge pages
>>> at boot
>>>    or runtime, taking memory away. This requires capacity planning,
>>>    administrative overhead, and makes workload orchastration much
>>> much more
>>>    complex, especially colocating with workloads that don't use
>>> hugetlbfs.
>>>
>> To address the obvious objection "but how could we
>> possibly allocate 1GB huge pages while the workload
>> is running?", I am planning to pick up the CMA balancing
>> patch series (thank you, Frank) and get that in an
>> upstream ready shape soon.
>>
>> https://lkml.org/2025/9/15/1735
>
> That link doesn't work?
>
> Did a quick search for CMA balancing on lore, couldn't find anything, could you
> provide a lore link?

https://lwn.net/Articles/1038263/

>
>>
>> That patch set looks like another case where no
>> amount of internal testing will find every single
>> corner case, and we'll probably just want to
>> merge it upstream, deploy it experimentally, and
>> aggressively deal with anything that might pop up.
>
> I'm not really in favour of this kind of approach. There's plenty of things that
> were considered 'temporary' upstream that became rather permanent :)
>
> Maybe we can't cover all corner-cases, but we need to make sure whatever we do
> send upstream is maintainable, conceptually sensible and doesn't paint us into
> any corners, etc.
>
>>
>> With CMA balancing, it would be possibly to just
>> have half (or even more) of system memory for
>> movable allocations only, which would make it possible
>> to allocate 1GB huge pages dynamically.
>
> Could you expand on that?

I also would like to hear David’s opinion on using CMA for 1GB THP.
He did not like it[1] when I posted my patch back in 2020, but it has
been more than 5 years. :)

The other direction I explored is to get 1GB THP from buddy allocator.
That means we need to:
1. bump MAX_PAGE_ORDER to 18 or make it a runtime variable so that only 1GB
   THP users need to bump it,
2. handle cross memory section PFN merge in buddy allocator,
3. improve anti-fragmentation mechanism for 1GB range compaction.

1 is easier-ish[2]. I have not looked into 2 and 3 much yet.

[1] https://lore.kernel.org/all/52bc2d5d-eb8a-83de-1c93-abd329132d58@redhat.com/
[2] https://lore.kernel.org/all/20210805190253.2795604-1-zi.yan@sent.com/


Best Regards,
Yan, Zi

next prev parent reply	other threads:[~2026-02-02 15:50 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-02  0:50 Usama Arif
2026-02-02  0:50 ` [RFC 01/12] mm: add PUD THP ptdesc and rmap support Usama Arif
2026-02-02 10:44   ` Kiryl Shutsemau
2026-02-02 16:01     ` Zi Yan
2026-02-03 22:07       ` Usama Arif
2026-02-05  4:17         ` Matthew Wilcox
2026-02-05  4:21           ` Matthew Wilcox
2026-02-05  5:13             ` Usama Arif
2026-02-05 17:40               ` David Hildenbrand (Arm)
2026-02-05 18:05                 ` Usama Arif
2026-02-05 18:11                   ` Usama Arif
2026-02-02 12:15   ` Lorenzo Stoakes
2026-02-04  7:38     ` Usama Arif
2026-02-04 12:55       ` Lorenzo Stoakes
2026-02-05  6:40         ` Usama Arif
2026-02-02  0:50 ` [RFC 02/12] mm/thp: add mTHP stats infrastructure for PUD THP Usama Arif
2026-02-02 11:56   ` Lorenzo Stoakes
2026-02-05  5:53     ` Usama Arif
2026-02-02  0:50 ` [RFC 03/12] mm: thp: add PUD THP allocation and fault handling Usama Arif
2026-02-02  0:50 ` [RFC 04/12] mm: thp: implement PUD THP split to PTE level Usama Arif
2026-02-02  0:50 ` [RFC 05/12] mm: thp: add reclaim and migration support for PUD THP Usama Arif
2026-02-02  0:50 ` [RFC 06/12] selftests/mm: add PUD THP basic allocation test Usama Arif
2026-02-02  0:50 ` [RFC 07/12] selftests/mm: add PUD THP read/write access test Usama Arif
2026-02-02  0:50 ` [RFC 08/12] selftests/mm: add PUD THP fork COW test Usama Arif
2026-02-02  0:50 ` [RFC 09/12] selftests/mm: add PUD THP partial munmap test Usama Arif
2026-02-02  0:50 ` [RFC 10/12] selftests/mm: add PUD THP mprotect split test Usama Arif
2026-02-02  0:50 ` [RFC 11/12] selftests/mm: add PUD THP reclaim test Usama Arif
2026-02-02  0:50 ` [RFC 12/12] selftests/mm: add PUD THP migration test Usama Arif
2026-02-02  2:44 ` [RFC 00/12] mm: PUD (1GB) THP implementation Rik van Riel
2026-02-02 11:30   ` Lorenzo Stoakes
2026-02-02 15:50     ` Zi Yan [this message]
2026-02-04 10:56       ` Lorenzo Stoakes
2026-02-05 11:29         ` David Hildenbrand (arm)
2026-02-05 11:22       ` David Hildenbrand (arm)
2026-02-02  4:00 ` Matthew Wilcox
2026-02-02  9:06   ` David Hildenbrand (arm)
2026-02-03 21:11     ` Usama Arif
2026-02-02 11:20 ` Lorenzo Stoakes
2026-02-04  1:00   ` Usama Arif
2026-02-04 11:08     ` Lorenzo Stoakes
2026-02-04 11:50       ` Dev Jain
2026-02-04 12:01         ` Dev Jain
2026-02-05  6:08       ` Usama Arif
2026-02-02 16:24 ` Zi Yan
2026-02-03 23:29   ` Usama Arif
2026-02-04  0:08     ` Frank van der Linden
2026-02-05  5:46       ` Usama Arif
2026-02-05 18:07     ` Zi Yan
2026-02-07 23:22       ` Usama Arif

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2B979847-CED0-41AE-AEB1-BEFB267B1E14@nvidia.com \
    --to=ziy@nvidia.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=fvdl@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=kas@kernel.org \
    --cc=kernel-team@meta.com \
    --cc=lance.yang@linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=npache@redhat.com \
    --cc=riel@surriel.com \
    --cc=ryan.roberts@arm.com \
    --cc=shakeel.butt@linux.dev \
    --cc=usamaarif642@gmail.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox