From: Muchun Song <muchun.song@linux.dev>
To: Kiryl Shutsemau <kas@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Oscar Salvador <osalvador@suse.de>,
Mike Rapoport <rppt@kernel.org>, Vlastimil Babka <vbabka@suse.cz>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Matthew Wilcox <willy@infradead.org>, Zi Yan <ziy@nvidia.com>,
Baoquan He <bhe@redhat.com>, Michal Hocko <mhocko@suse.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Jonathan Corbet <corbet@lwn.net>,
Usama Arif <usamaarif642@gmail.com>,
kernel-team@meta.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH 00/11] mm/hugetlb: Eliminate fake head pages from vmemmap optimization
Date: Fri, 12 Dec 2025 14:45:42 +0800
Message-ID: <4707A7AE-B8A9-4A56-B292-E590E91A9980@linux.dev>
In-Reply-To: <5twlonzi3rooao7gyp5g4tyaeevemcx6qhuf4xvdtsi2cykuo4@wrhxmxz63wvn>
> On Dec 11, 2025, at 23:08, Kiryl Shutsemau <kas@kernel.org> wrote:
>
> On Wed, Dec 10, 2025 at 11:39:24AM +0800, Muchun Song wrote:
>>
>>
>>> On Dec 9, 2025, at 22:44, Kiryl Shutsemau <kas@kernel.org> wrote:
>>>
>>> On Tue, Dec 09, 2025 at 02:22:28PM +0800, Muchun Song wrote:
>>>> The prerequisite is that the starting address of vmemmap must be aligned to
>>>> 16MB boundaries (for 1GB huge pages). Right? We should add some checks
>>>> somewhere to guarantee this (not compile time but at runtime like for KASLR).
>>>
>>> I have hard time finding the right spot to put the check.
>>>
>>> I considered something like the patch below, but it is probably too late
>>> if we boot preallocating huge pages.
>>>
>>> I will dig more later, but if you have any suggestions, I would
>>> appreciate.
>>
>> If you opt to record the mask information, then even when HVO is
>> disabled compound_head will still compute the head-page address
>> by means of the mask. Consequently this constraint must hold for
>> **every** compound page.
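
(For context, the lookup being discussed boils down to roughly the line
below; the exact encoding of the mask in compound_info is whatever the
series settles on, this is only an illustration:)

	/*
	 * A tail's compound_info yields a mask covering the folio's
	 * struct pages; masking any tail address gives the head.
	 */
	head = (struct page *)((unsigned long)page & mask);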
>>
>> Therefore adding your code in hugetlb_vmemmap.c is not appropriate:
>> that file only turns HVO off, yet the calculation remains broken
>> for all other large compound pages.
>>
>> From MAX_FOLIO_ORDER we know that folio_alloc_gigantic() can allocate
>> at most 16 GB of physically contiguous memory. We must therefore
>> guarantee that the vmemmap area starts on an address aligned to at
>> least 256 MB.
>>
>> When KASLR is disabled the vmemmap base is normally fixed by a
>> macro, so the check can be done at compile time; when KASLR is enabled
>> we have to ensure that the randomly chosen offset is a multiple
>> of 256 MB. These two spots are, in my view, the places that need
>> to be changed.
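
To make that concrete, something like the sketch below is what I have
in mind (purely illustrative; CONFIG_RANDOMIZE_MEMORY is the x86 option
for memory-region KASLR, and the runtime half would sit in an early
init path rather than literally like this):

#ifndef CONFIG_RANDOMIZE_MEMORY
	/* Fixed layout: the vmemmap base is a compile-time constant. */
	BUILD_BUG_ON(!IS_ALIGNED(VMEMMAP_START, SZ_256M));
#else
	/* KASLR: verify the randomly chosen base once at boot. */
	WARN_ON(!IS_ALIGNED((unsigned long)VMEMMAP_START, SZ_256M));
#endif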
>>
>> Moreover, this approach requires the virtual addresses of struct
>> page (possibly spanning sections) to be contiguous, so the method is
>> valid **only** under CONFIG_SPARSEMEM_VMEMMAP.
>>
>> Also, when I skimmed through the overall patch yesterday, one detail
>> caught my eye: the shared tail page is **not** "per hstate"; it is
>> "per hstate, per zone, per node", because the zone and node
>> information is encoded in the tail page’s flags field. We should make
>> sure both page_to_nid() and page_zone() work properly.
>
> Right. Or we can slap compound_head() inside them.
At the same time, to keep callers from accidentally passing
compound_head() a struct page that was hand-crafted on the stack (as
snapshot_page() does), shall we add a VM_BUG_ON() in compound_head()
to validate that the page address falls within the vmemmap range?
Otherwise compound_head() would return an invalid head page pointer
(an address on the stack holding arbitrary data).
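
Something along these lines, where page_in_vmemmap() is a made-up
helper name used only to show where the check would sit:

	static inline struct page *compound_head(const struct page *page)
	{
		/*
		 * Catch struct pages copied onto the stack (e.g. via
		 * snapshot_page()); the mask trick only yields a valid
		 * head for pages that really live in the vmemmap.
		 */
		VM_BUG_ON_PAGE(!page_in_vmemmap(page), page);
		...
	}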
>
> I stepped onto VM_BUG_ON_PAGE() in get_pfnblock_bitmap_bitidx().
> Workarounded with compound_head() for now.
I don't see why you singled out get_pfnblock_bitmap_bitidx(); what is
special about that spot?
>
> I am not sure if we want to allocate them per-zone. Seems excessive.
Yes. If we can make page_to_nid() and page_zonenum() work correctly,
the shared tail page does not need to be per-zone.
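
For reference, the compound_head() approach you mention would look
roughly like this (simplified; the real helper also goes through
PF_POISONED_CHECK()):

	static inline int page_to_nid(const struct page *page)
	{
		/* Always read nid (and zone) from the head page's flags. */
		page = compound_head(page);
		return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
	}
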
> But per-node is reasonable.
Agree.
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov