From: Muchun Song <muchun.song@linux.dev>
To: Kiryl Shutsemau <kas@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Oscar Salvador <osalvador@suse.de>,
	Mike Rapoport <rppt@kernel.org>, Vlastimil Babka <vbabka@suse.cz>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Matthew Wilcox <willy@infradead.org>, Zi Yan <ziy@nvidia.com>,
	Baoquan He <bhe@redhat.com>, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Usama Arif <usamaarif642@gmail.com>,
	kernel-team@meta.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH 00/11] mm/hugetlb: Eliminate fake head pages from vmemmap optimization
Date: Thu, 11 Dec 2025 11:45:13 +0800	[thread overview]
Message-ID: <BAF36B4D-0047-48C4-9CB8-C8566722A79B@linux.dev> (raw)
In-Reply-To: <6396CF70-E10F-4939-8E38-C58BE5BF6F91@linux.dev>



> On Dec 10, 2025, at 11:39, Muchun Song <muchun.song@linux.dev> wrote:
> 
>> On Dec 9, 2025, at 22:44, Kiryl Shutsemau <kas@kernel.org> wrote:
>> 
>> On Tue, Dec 09, 2025 at 02:22:28PM +0800, Muchun Song wrote:
>>> The prerequisite is that the starting address of vmemmap must be aligned to
>>> 16MB boundaries (for 1GB huge pages). Right? We should add some checks
>>> somewhere to guarantee this (not compile time but at runtime like for KASLR).
>> 
>> I have a hard time finding the right spot to put the check.
>> 
>> I considered something like the patch below, but it is probably too late
>> if we boot preallocating huge pages.
>> 
>> I will dig more later, but if you have any suggestions, I would
>> appreciate them.
> 
> If you opt to record the mask information, then even when HVO is
> disabled, compound_head() will still compute the head-page address
> from the mask. Consequently, this alignment constraint must hold
> for **every** compound page.
> 
> Therefore, hugetlb_vmemmap.c is not the right place for your check:
> that file only turns HVO off, while the calculation remains broken
> for all other large compound pages.
> 
> From MAX_FOLIO_ORDER we know that folio_alloc_gigantic() can allocate
> at most 16 GB of physically contiguous memory. We must therefore
> guarantee that the vmemmap area starts on an address aligned to at
> least 256 MB.
> 
> When KASLR is disabled the vmemmap base is normally fixed by a
> macro, so the check can be done at compile time; when KASLR is enabled
> we have to ensure that the randomly chosen offset is a multiple
> of 256 MB. These two spots are, in my view, the places that need
> to be changed.
> 
> Moreover, this approach requires the virtual addresses of struct
> page (possibly spanning sections) to be contiguous, so the method is
> valid **only** under CONFIG_SPARSEMEM_VMEMMAP.

This is no longer an issue: with nth_page() removed (which I only
just learned), a folio can no longer span multiple sections, even
when !CONFIG_SPARSEMEM_VMEMMAP.

> 
> Also, when I skimmed through the overall patch yesterday, one detail
> caught my eye: the shared tail page is **not** "per hstate"; it is
> "per hstate, per zone, per node", because the zone and node
> information is encoded in the tail page’s flags field. We should make
> sure both page_to_nid() and page_zone() work properly.
> 
> Muchun,
> Thanks.
> 
>> 
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index 04a211a146a0..971558184587 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -886,6 +886,14 @@ static int __init hugetlb_vmemmap_init(void)
>> BUILD_BUG_ON(__NR_USED_SUBPAGE > HUGETLB_VMEMMAP_RESERVE_PAGES);
>> 
>> 	for_each_hstate(h) {
>> +  		unsigned long size = huge_page_size(h) / sizeof(struct page);
>> +
>> +  		/* vmemmap is expected to be naturally aligned to page size */
>> +  		if (WARN_ON_ONCE(!IS_ALIGNED((unsigned long)vmemmap, size))) {
>> +  			vmemmap_optimize_enabled = false;
>> +  			continue;
>> +  		}
>> +
>> 		if (hugetlb_vmemmap_optimizable(h)) {
>> 			register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
>> 			break;
>> -- 
>> Kiryl Shutsemau / Kirill A. Shutemov




