From: Muchun Song <muchun.song@linux.dev>
To: Kiryl Shutsemau <kas@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Oscar Salvador <osalvador@suse.de>,
	Mike Rapoport <rppt@kernel.org>, Vlastimil Babka <vbabka@suse.cz>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Matthew Wilcox <willy@infradead.org>, Zi Yan <ziy@nvidia.com>,
	Baoquan He <bhe@redhat.com>, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Usama Arif <usamaarif642@gmail.com>,
	kernel-team@meta.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH 00/11] mm/hugetlb: Eliminate fake head pages from vmemmap optimization
Date: Wed, 10 Dec 2025 11:39:24 +0800
Message-ID: <6396CF70-E10F-4939-8E38-C58BE5BF6F91@linux.dev>
In-Reply-To: <m63ub6lxljw7m2mmc3ovbsyfurl7hp4cvx27tmwelcxxrra5m3@eva5tqcdjxtn>



> On Dec 9, 2025, at 22:44, Kiryl Shutsemau <kas@kernel.org> wrote:
> 
> On Tue, Dec 09, 2025 at 02:22:28PM +0800, Muchun Song wrote:
>> The prerequisite is that the starting address of vmemmap must be aligned to
>> 16MB boundaries (for 1GB huge pages). Right? We should add some checks
>> somewhere to guarantee this (not compile time but at runtime like for KASLR).
> 
> I have a hard time finding the right spot to put the check.
> 
> I considered something like the patch below, but it is probably too late
> if we boot preallocating huge pages.
> 
> I will dig more later, but if you have any suggestions, I would
> appreciate them.

If you opt to record the mask information, then compound_head() will
compute the head-page address from the mask even when HVO is disabled.
Consequently, the alignment constraint must hold for every compound
page, not only for HugeTLB ones.
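
For reference, my mental model of the mask-based lookup is roughly the
sketch below (the names are mine, not the ones used in your series); it
only recovers the correct head page when the folio's chunk of vmemmap
is naturally aligned, which is where the constraint comes from:

#include <linux/mm_types.h>	/* struct page */

/* Illustrative only: align a tail's vmemmap address down to the size
 * of the folio's block of struct pages to find the head. */
static inline struct page *mask_compound_head(const struct page *page,
					      unsigned int order)
{
	unsigned long mask = (sizeof(struct page) << order) - 1;

	return (struct page *)((unsigned long)page & ~mask);
}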

Therefore hugetlb_vmemmap.c is not the right place for your check:
disabling HVO there only covers HugeTLB, while the compound_head()
calculation would remain broken for all other large compound pages.

From MAX_FOLIO_ORDER we know that folio_alloc_gigantic() can allocate
at most 16GB of physically contiguous memory, so we must guarantee that
the vmemmap area starts at an address aligned to at least 256MB.
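
For reference, the arithmetic behind that number, assuming 4KiB base
pages and a 64-byte struct page (the common values; other combinations
scale accordingly):

	16GB / 4KiB = 4M base pages in the largest folio
	4M * 64B    = 256MB of vmemmap for that folio

and the same calculation gives the 16MB figure for 1GB huge pages
mentioned above.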

When KASLR is disabled, the vmemmap base is normally a fixed macro, so
the check can be done at compile time; when KASLR is enabled, we have
to make sure the randomly chosen base stays a multiple of 256MB. Those
two spots are, in my view, the places that need to change.
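
Purely as an illustration (not a patch, and using the x86-64 names
VMEMMAP_START and CONFIG_DYNAMIC_MEMORY_LAYOUT; other architectures
have their own definitions), something along these lines could assert
the alignment, with the open question still being where to call the
dynamic-layout case early enough:

#include <linux/align.h>	/* IS_ALIGNED() */
#include <linux/build_bug.h>	/* BUILD_BUG_ON() */
#include <linux/bug.h>		/* WARN_ON_ONCE() */
#include <linux/init.h>		/* __init */
#include <linux/sizes.h>	/* SZ_256M */
#include <asm/pgtable.h>	/* VMEMMAP_START */

static void __init check_vmemmap_base_alignment(void)
{
#ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
	/* KASLR/5-level: the base is only known at boot, check it then. */
	WARN_ON_ONCE(!IS_ALIGNED((unsigned long)VMEMMAP_START, SZ_256M));
#else
	/* Fixed layout: VMEMMAP_START is a constant, assert at build time. */
	BUILD_BUG_ON(!IS_ALIGNED(VMEMMAP_START, SZ_256M));
#endif
}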

Moreover, this approach requires the virtual addresses of struct page
(possibly spanning sections) to be contiguous, so the method is only
valid under CONFIG_SPARSEMEM_VMEMMAP.

Also, when I skimmed through the series yesterday, one detail caught
my eye: the shared tail page cannot just be "per hstate"; it needs to
be "per hstate, per zone, per node", because the zone and node
information is encoded in the tail page's flags field. We should make
sure both page_to_nid() and page_zone() still work properly.
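
For context, both helpers decode those bits straight out of
page->flags; the sketch below is simplified from include/linux/mm.h
(it drops the poison check and the configs that keep the node id in
the section), but it shows why a tail page shared across zones or
nodes would return the wrong answer for some folios:

/* Simplified excerpts of the real helpers in include/linux/mm.h;
 * not meant to be compiled alongside that header. */
static inline int page_to_nid(const struct page *page)
{
	/* Node id lives in the high bits of page->flags. */
	return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
}

static inline enum zone_type page_zonenum(const struct page *page)
{
	/* Zone index is likewise encoded in page->flags. */
	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
}

static inline struct zone *page_zone(const struct page *page)
{
	/* page_zone() combines the two to pick the zone on that node. */
	return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
}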

Muchun,
Thanks.

> 
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 04a211a146a0..971558184587 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -886,6 +886,14 @@ static int __init hugetlb_vmemmap_init(void)
> BUILD_BUG_ON(__NR_USED_SUBPAGE > HUGETLB_VMEMMAP_RESERVE_PAGES);
> 
> 	for_each_hstate(h) {
> + 		unsigned long size = huge_page_size(h) / sizeof(struct page);
> +
> + 		/* vmemmap is expected to be naturally aligned to page size */
> + 		if (WARN_ON_ONCE(!IS_ALIGNED((unsigned long)vmemmap, size))) {
> + 			vmemmap_optimize_enabled = false;
> + 			continue;
> + 		}
> +
> 		if (hugetlb_vmemmap_optimizable(h)) {
> 			register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
> 			break;
> -- 
>  Kiryl Shutsemau / Kirill A. Shutemov


