From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>,
Rik van Riel <riel@surriel.com>,
Usama Arif <usamaarif642@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, hannes@cmpxchg.org, shakeel.butt@linux.dev,
kas@kernel.org, baohua@kernel.org, dev.jain@arm.com,
baolin.wang@linux.alibaba.com, npache@redhat.com,
Liam.Howlett@oracle.com, ryan.roberts@arm.com, vbabka@suse.cz,
lance.yang@linux.dev, linux-kernel@vger.kernel.org,
kernel-team@meta.com, Frank van der Linden <fvdl@google.com>
Subject: Re: [RFC 00/12] mm: PUD (1GB) THP implementation
Date: Wed, 4 Feb 2026 10:56:14 +0000
Message-ID: <9c1809f0-8c8b-4eb1-9ec8-6c00fe3097f1@lucifer.local>
In-Reply-To: <2B979847-CED0-41AE-AEB1-BEFB267B1E14@nvidia.com>

On Mon, Feb 02, 2026 at 10:50:35AM -0500, Zi Yan wrote:
> On 2 Feb 2026, at 6:30, Lorenzo Stoakes wrote:
>
> > On Sun, Feb 01, 2026 at 09:44:12PM -0500, Rik van Riel wrote:
> >> On Sun, 2026-02-01 at 16:50 -0800, Usama Arif wrote:
> >>>
> >>> 1. Static Reservation: hugetlbfs requires pre-allocating huge pages
> >>>    at boot or runtime, taking memory away. This requires capacity
> >>>    planning, administrative overhead, and makes workload
> >>>    orchestration much, much more complex, especially colocating
> >>>    with workloads that don't use hugetlbfs.
> >>>
> >> To address the obvious objection "but how could we
> >> possibly allocate 1GB huge pages while the workload
> >> is running?", I am planning to pick up the CMA balancing
> >> patch series (thank you, Frank) and get that in an
> >> upstream ready shape soon.
> >>
> >> https://lkml.org/2025/9/15/1735
> >
> > That link doesn't work?
> >
> > Did a quick search for CMA balancing on lore, couldn't find anything, could you
> > provide a lore link?
>
> https://lwn.net/Articles/1038263/
>
> >
> >>
> >> That patch set looks like another case where no
> >> amount of internal testing will find every single
> >> corner case, and we'll probably just want to
> >> merge it upstream, deploy it experimentally, and
> >> aggressively deal with anything that might pop up.
> >
> > I'm not really in favour of this kind of approach. There's plenty of things that
> > were considered 'temporary' upstream that became rather permanent :)
> >
> > Maybe we can't cover all corner-cases, but we need to make sure whatever we do
> > send upstream is maintainable, conceptually sensible and doesn't paint us into
> > any corners, etc.
> >
> >>
> >> With CMA balancing, it would be possible to just
> >> have half (or even more) of system memory for
> >> movable allocations only, which would make it possible
> >> to allocate 1GB huge pages dynamically.
> >
> > Could you expand on that?
>
> I also would like to hear David’s opinion on using CMA for 1GB THP.
> He did not like it[1] when I posted my patch back in 2020, but it has
> been more than 5 years. :)

Yes please David :)

I find the idea of using CMA for this a bit gross, and I fear we're
essentially expanding the hacks for DAX to everyone.

Again, I really feel that we should be tackling technical debt here,
rather than adding features on shaky foundations and just making things
worse.

We are inundated with series after series for THP trying to add
features, but really not very many tackling this debt, and I think it's
time to get firmer about that.
>
> The other direction I explored is to get 1GB THP from buddy allocator.
> That means we need to:
> 1. bump MAX_PAGE_ORDER to 18 or make it a runtime variable so that only 1GB
> THP users need to bump it,

Would we need to bump the pageblock size too, to stand more of a chance
of avoiding fragmentation?

Doing that, though, would result in the reserves being way higher and
thus more memory used, and we'd be in the territory of the unresolved
issues with 64KB page size kernels :)
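
For anyone following along, the arithmetic behind "order 18" and the
current limits (a standalone userspace sketch just to show the numbers;
not kernel code):

#include <stdio.h>

int main(void)
{
	/* x86-64 with 4KB base pages: a PUD entry maps 1GB. */
	unsigned int page_shift = 12;			/* 4KB base page */
	unsigned int pud_shift = 30;			/* 1GB PUD */
	unsigned int pud_order = pud_shift - page_shift; /* 18 */

	/*
	 * MAX_PAGE_ORDER is 10 today (4MB max buddy block) and
	 * pageblock_order is 9 (2MB, matching PMD THP), so both would
	 * have to grow enormously for order-18 allocations to work.
	 */
	printf("PUD order = %u -> %lu pages = %lu MB\n",
	       pud_order, 1UL << pud_order,
	       (1UL << (pud_order + page_shift)) >> 20);
	return 0;
}
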
> 2. handle cross memory section PFN merge in buddy allocator,

Ugh god...
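
(For context, and assuming I have x86-64's SECTION_SIZE_BITS of 27
right, i.e. 128MB sections: an order-18 block spans 2^(30 - 27) = 8
memory sections, so buddy merging could no longer assume a page and its
buddy live within a single section's memmap.)
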
> 3. improve anti-fragmentation mechanism for 1GB range compaction.

I think we'd really need something like this. Obviously there's the
series Rik refers to.

I mean, CMA itself feels like a hack, though efforts are being made to
at least make it more robust (the series mentioned, also the guaranteed
CMA stuff from Suren).
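
For reference, the CMA route under discussion amounts to something like
the below (a rough sketch only; "thp_cma" is hypothetical, standing in
for an area declared at boot via cma_declare_contiguous(), and error
handling/accounting are omitted):

#define PUD_THP_ORDER	18	/* 1GB with 4KB base pages */

/* Sketch: carve a 1GB folio out of a reserved CMA area. */
static struct folio *alloc_pud_thp_from_cma(struct cma *thp_cma)
{
	struct page *page;

	/* cma_alloc() takes a page count and an alignment order. */
	page = cma_alloc(thp_cma, 1UL << PUD_THP_ORDER, PUD_THP_ORDER,
			 false);

	return page ? page_folio(page) : NULL;
}
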
>
> 1 is easier-ish[2]. I have not looked into 2 and 3 much yet.
>
> [1] https://lore.kernel.org/all/52bc2d5d-eb8a-83de-1c93-abd329132d58@redhat.com/
> [2] https://lore.kernel.org/all/20210805190253.2795604-1-zi.yan@sent.com/
>
>
> Best Regards,
> Yan, Zi

Cheers, Lorenzo