From: Usama Arif <usamaarif642@gmail.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: ziy@nvidia.com, Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
linux-mm@kvack.org, hannes@cmpxchg.org, riel@surriel.com,
shakeel.butt@linux.dev, kas@kernel.org, baohua@kernel.org,
dev.jain@arm.com, baolin.wang@linux.alibaba.com,
npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com,
vbabka@suse.cz, lance.yang@linux.dev,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [RFC 00/12] mm: PUD (1GB) THP implementation
Date: Wed, 4 Feb 2026 22:08:10 -0800
Message-ID: <075feda4-86ee-4521-8b92-914d24aee582@gmail.com>
In-Reply-To: <b07a8d54-75f2-4d77-838a-7454ad559cd2@lucifer.local>
On 04/02/2026 03:08, Lorenzo Stoakes wrote:
> On Tue, Feb 03, 2026 at 05:00:10PM -0800, Usama Arif wrote:
>>
>>
>> On 02/02/2026 03:20, Lorenzo Stoakes wrote:
>>> OK so this is somewhat unexpected :)
>>>
>>> It would have been nice to discuss it in the THP cabal or at a conference
>>> etc. so we could discuss approaches ahead of time. Communication is important,
>>> especially with major changes like this.
>>
>> Makes sense!
>>
>>>
>>> And PUD THP is especially problematic in that it requires pages that the page
>>> allocator can't give us, presumably you're doing something with CMA and... it's
>>> a whole kettle of fish.
>>
>> So we don't need CMA. It helps, of course, but we don't *need* it.
>> It's summarized in the first reply I gave to Zi in [1]:
>>
>>>
>>> It's also complicated by the fact we _already_ support it in the DAX, VFIO cases
>>> but it's kinda a weird sorta special case that we need to keep supporting.
>>>
>>> There's questions about how this will interact with khugepaged, MADV_COLLAPSE,
>>> mTHP (and really I want to see Nico's series land before we really consider
>>> this).
>>
>>
>> So I have numbers and experiments for page faults which are in the cover letter,
>> but not for khugepaged. I would be very surprised (although pleasantly :)) if
>> khugepaged by some magic finds 262144 pages that meet all the khugepaged requirements
>> to collapse the page. In the basic infrastructure support which this series is adding,
>> I want to keep khugepaged collapse disabled for 1G pages. This is also the initial
>> approach that was taken in other mTHP sizes. We should go slow with 1G THPs.
>
> Yes we definitely want to limit to page faults for now.
>
> But keep in mind for that to be viable you'd surely need to update who gets
> appropriate alignment in __get_unmapped_area()... not read through series far
> enough to see so not sure if you update that though!
>
> I guess that'd be the sanest place to start, if an allocation _size_ is aligned
> 1 GB, then align the unmapped area _address_ to 1 GB for maximum chance of 1 GB
> fault-in.
Yeah, this was definitely missing. I was manually aligning the fault address in the selftests
and benchmarks with the trick used in other selftests:
(((unsigned long)addr + PUD_SIZE - 1) & ~(PUD_SIZE - 1))
Thanks for pointing this out! This is basically what I wanted from the RFC: to find out
what I am missing and not testing. I will look into VFIO and DAX as you mentioned as well.
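For reference, the manual alignment trick can be sketched as a small userspace helper. This is only an illustration and not code from the series: `PUD_SIZE_BYTES`, `align_up()` and `map_pud_aligned()` are hypothetical names, and the 1 GB value assumes x86-64 with 4 KB base pages. It over-allocates by one PUD size so that a 1 GB-aligned address always exists inside the mapping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: PUD size is 1 GB (x86-64 with 4 KB base pages). */
#define PUD_SIZE_BYTES (1UL << 30)

/* Round addr up to the next multiple of align; align must be a power of two. */
static inline uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Map len + PUD_SIZE_BYTES of anonymous memory and return the first
 * 1 GB-aligned address inside it. *raw and *raw_len receive the full
 * mapping so the caller can munmap() it later. Returns NULL on failure.
 */
static inline void *map_pud_aligned(size_t len, void **raw, size_t *raw_len)
{
	size_t total = len + PUD_SIZE_BYTES;
	void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	*raw = p;
	*raw_len = total;
	return (void *)align_up((uintptr_t)p, PUD_SIZE_BYTES);
}
```

A caller would touch the returned address to trigger the fault and later pass `raw` and `raw_len` to munmap(). Doing the alignment in __get_unmapped_area() as suggested would make this over-allocation unnecessary for size-aligned mappings.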
>
> Oh by the way I made some rough THP notes at
> https://publish.obsidian.md/mm/Transparent+Huge+Pages+(THP) which are helpful
> for reminding me about what does what where, useful for a top-down view of how
> things are now.
>
Thanks!
>>
>>>
>>> So overall, I want to be very cautious and SLOW here. So let's please not drop
>>> the RFC tag until David and I are ok with that?
>>>
>>> Also the THP code base is in _dire_ need of rework, and I don't really want to
>>> add major new features without us paying down some technical debt, to be honest.
>>>
>>> So let's proceed with caution, and treat this as a very early bit of
>>> experimental code.
>>>
>>> Thanks, Lorenzo
>>
>> Ack, yeah so this is mainly an RFC to discuss what the major design choices will be.
>> I got a kernel with selftests for allocation, memory integrity, fork, partial munmap,
>> mprotect, reclaim and migration passing and am running them with DEBUG_VM to make sure
>> we don't get any VM bugs/warnings and the numbers are good, so just wanted to share it
>> upstream and get your opinions! Basically try and trigger a discussion similar to what
>> Zi asked in [2]! And also if someone could point out if there is something fundamental
>> we are missing in this series.
>
> Well that's fair enough :)
>
> But do come to a THP cabal so we can chat, face-to-face (ok, digital face to
> digital face ;). It's usually a force-multiplier I find, esp. if multiple people
> have input which I think is the case here. We're friendly :)
Yes, thanks for this! It would be really helpful to discuss this in a call. I didn't
know there was a meeting, but I have requested details (date/time) in another thread.
>
> In any case, conversations are already kicking off so that's definitely positive!
>
> I think we will definitely get there with this at _some point_ but I would urge
> patience and also I really want to underline my desire for us in THP to start
> paying down some of this technical debt.
>
> I know people are already making efforts (Vernon, Luiz), and sorry that I've not
> been great at review recently (should be gradually increasing over time), but I
> feel that for large features to be added like this now we really do require some
> refactoring work before we take it.
>
Yes, agreed! I will definitely need guidance from you and others on what needs to be
properly refactored so that this fits well with the current code.
> We definitely need to rebase this once Nico's series lands (should do next
> cycle) and think about how it plays with this, I'm not sure if arm64 supports
> mTHP between PMD and PUD size (Dev? Do you know?) so maybe that one is moot, but
> in general want to make sure it plays nice.
>
Will do!
>>
>> Thanks for the reviews! Really do appreciate it!
>
> No worries! :)
>
>>
>> [1] https://lore.kernel.org/all/20f92576-e932-435f-bb7b-de49eb84b012@gmail.com/#t
>> [2] https://lore.kernel.org/all/3561FD10-664D-42AA-8351-DE7D8D49D42E@nvidia.com/
>
> Cheers, Lorenzo