From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Kiryl Shutsemau <kas@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <muchun.song@linux.dev>,
	Matthew Wilcox <willy@infradead.org>,
	Usama Arif <usamaarif642@gmail.com>,
	Frank van der Linden <fvdl@google.com>
Cc: Oscar Salvador <osalvador@suse.de>,
	Mike Rapoport <rppt@kernel.org>, Vlastimil Babka <vbabka@suse.cz>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Zi Yan <ziy@nvidia.com>, Baoquan He <bhe@redhat.com>,
	Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Huacai Chen <chenhuacai@kernel.org>,
	WANG Xuerui <kernel@xen0n.name>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Alexandre Ghiti <alex@ghiti.fr>,
	kernel-team@meta.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	loongarch@lists.linux.dev, linux-riscv@lists.infradead.org
Subject: Re: [PATCHv6 11/17] mm/hugetlb: Remove fake head pages
Date: Fri, 6 Feb 2026 10:36:24 +0100
Message-ID: <3fcbad05-bef2-486a-8d9b-7010a91c85b8@kernel.org>
In-Reply-To: <20260202155634.650837-12-kas@kernel.org>

On 2/2/26 16:56, Kiryl Shutsemau wrote:
> HugeTLB Vmemmap Optimization (HVO) reduces memory usage by freeing most
> vmemmap pages for huge pages and remapping the freed range to a single
> page containing the struct page metadata.
> 
> With the new mask-based compound_info encoding (for power-of-2 struct
> page sizes), all tail pages of the same order are now identical
> regardless of which compound page they belong to. This means the tail
> pages can be truly shared without fake heads.
> 
> Allocate a single page of initialized tail struct pages per NUMA node
> per order in the vmemmap_tails[] array in pglist_data. All huge pages of
> that order on the node share this tail page, mapped read-only into their
> vmemmap. The head page remains unique per huge page.
> 
> Redefine MAX_FOLIO_ORDER using ilog2(). The define has to produce a
> compile-constant as it is used to specify vmemmap_tail array size.
> For some reason, compiler is not able to solve get_order() at
> compile-time, but ilog2() works.
> 
> Avoid PUD_ORDER to define MAX_FOLIO_ORDER as it adds dependency to
> <linux/pgtable.h> which generates hard-to-break include loop.
> 
> This eliminates fake heads while maintaining the same memory savings,
> and simplifies compound_head() by removing fake head detection.
> 
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> ---

[...]

>   #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index a39a301e08b9..688764c52c72 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -19,6 +19,7 @@
>   
>   #include <asm/tlbflush.h>
>   #include "hugetlb_vmemmap.h"
> +#include "internal.h"
>   
>   /**
>    * struct vmemmap_remap_walk - walk vmemmap page table
> @@ -505,6 +506,32 @@ static bool vmemmap_should_optimize_folio(const struct hstate *h, struct folio *
>   	return true;
>   }
>   
> +static struct page *vmemmap_get_tail(unsigned int order, int node)
> +{
> +	struct page *tail, *p;
> +	unsigned int idx;
> +
> +	idx = 

Could do

const unsigned int idx = order - VMEMMAP_TAIL_MIN_ORDER;

above.

> +	tail = READ_ONCE(NODE_DATA(node)->vmemmap_tails[idx]);
> +	if (tail)

Wondering if a likely() would be a good idea here. I guess we'll usually 
go through that fast path on a system that has been running for a bit.

> +		return tail;
> +
> +	tail = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
> +	if (!tail)
> +		return NULL;
> +
> +	p = page_to_virt(tail);
> +	for (int i = 0; i < PAGE_SIZE / sizeof(struct page); i++)
> +		prep_compound_tail(p + i, NULL, order);

This leaves all pageflags, refcount etc. set to 0, which is mostly 
expected for tail pages.

But I would have expected something closer to __init_single_page(),
which initializes the page properly.

In particular:
* set_page_node(page, node), or how is page_to_nid() handled?
* atomic_set(&page->_mapcount, -1), to not indicate something odd to
   core-mm where we would suddenly have a page mapping for a hugetlb
   folio.
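
To make the two points above concrete, here is a rough userspace sketch,
not kernel code. struct mock_page, the MOCK_* constants and the mock_*
helpers are all hypothetical stand-ins; in particular, the position and
width of the node field in the flags word are assumptions chosen only for
illustration. It shows each shared tail entry getting a node id and a
mapcount of -1 in addition to zeroed flags and refcount.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the fields discussed here. */
struct mock_page {
	unsigned long flags;	/* node id kept in the top bits (assumed layout) */
	int mapcount;		/* -1 means "no mapping", like page->_mapcount */
	int refcount;		/* stays 0 for tail pages */
};

#define MOCK_NODES_SHIFT	6	/* assumed width of the node field */
#define MOCK_NODES_PGSHIFT	(sizeof(unsigned long) * 8 - MOCK_NODES_SHIFT)
#define MOCK_NODES_MASK		((1UL << MOCK_NODES_SHIFT) - 1)

/* Store the node id in the top bits of the flags word. */
static void mock_set_page_node(struct mock_page *page, unsigned long node)
{
	page->flags &= ~(MOCK_NODES_MASK << MOCK_NODES_PGSHIFT);
	page->flags |= (node & MOCK_NODES_MASK) << MOCK_NODES_PGSHIFT;
}

/* Read the node id back from the flags word. */
static unsigned long mock_page_to_nid(const struct mock_page *page)
{
	return (page->flags >> MOCK_NODES_PGSHIFT) & MOCK_NODES_MASK;
}

/* Initialize every entry of the shared tail page for one node. */
static void mock_init_shared_tails(struct mock_page *p, size_t nr,
				   unsigned long node)
{
	for (size_t i = 0; i < nr; i++) {
		p[i].flags = 0;
		p[i].refcount = 0;
		p[i].mapcount = -1;	/* not mapped */
		mock_set_page_node(&p[i], node);
	}
}
```

With 64 entries initialized for node 3, mock_page_to_nid() returns 3 for
each entry and every mapcount reads -1.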

> +
> +	if (cmpxchg(&NODE_DATA(node)->vmemmap_tails[idx], NULL, tail)) {
> +		__free_page(tail);
> +		tail = READ_ONCE(NODE_DATA(node)->vmemmap_tails[idx]);
> +	}
> +
> +	return tail;
> +}
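
For readers following along, the lookup/allocate/publish pattern in the
quoted function can be sketched in userspace with C11 atomics. This is
only an illustration under assumptions: slots[], NR_SLOTS and
get_shared() are hypothetical names, calloc() stands in for the page
allocation, and likely() is defined locally via the GCC/Clang builtin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Branch hint; relies on the GCC/Clang __builtin_expect() builtin. */
#define likely(x)	__builtin_expect(!!(x), 1)

#define NR_SLOTS	4	/* hypothetical; stands in for the array size */

static _Atomic(void *) slots[NR_SLOTS];

/* Return the shared object for idx, allocating and publishing it once. */
static void *get_shared(unsigned int idx)
{
	void *cur, *fresh, *expected = NULL;

	assert(idx < NR_SLOTS);

	/* Fast path: somebody already published the object. */
	cur = atomic_load_explicit(&slots[idx], memory_order_acquire);
	if (likely(cur))
		return cur;

	fresh = calloc(1, 4096);
	if (!fresh)
		return NULL;

	/*
	 * Publish only if the slot is still NULL. On failure, the
	 * compare-exchange stores the winner's pointer in 'expected',
	 * so the loser frees its copy and returns the winner's.
	 */
	if (!atomic_compare_exchange_strong(&slots[idx], &expected, fresh)) {
		free(fresh);
		return expected;
	}

	return fresh;
}
```

Two calls with the same index return the same pointer; the second call
takes the hinted fast path.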

[...]

> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -378,16 +378,44 @@ void vmemmap_wrprotect_hvo(unsigned long addr, unsigned long end,
>   	}
>   }
>   
> -/*
> - * Populate vmemmap pages HVO-style. The first page contains the head
> - * page and needed tail pages, the other ones are mirrors of the first
> - * page.
> - */
> +static __meminit unsigned long vmemmap_get_tail(unsigned int order, int node)
> +{
> +	struct page *p, *tail;
> +	unsigned int idx;
> +
> +	BUG_ON(order < VMEMMAP_TAIL_MIN_ORDER);
> +	BUG_ON(order > MAX_FOLIO_ORDER);
> +
> +	idx = order - VMEMMAP_TAIL_MIN_ORDER;
> +	tail = NODE_DATA(node)->vmemmap_tails[idx];
> +	if (tail)
> +		return page_to_pfn(tail);
> +
> +	p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
> +	if (!p)
> +		return 0;
> +
> +	for (int i = 0; i < PAGE_SIZE / sizeof(struct page); i++)
> +		prep_compound_tail(p + i, NULL, order);
> +
> +	tail = virt_to_page(p);
> +	NODE_DATA(node)->vmemmap_tails[idx] = tail;
> +
> +	return page_to_pfn(tail);
> +}
> +
>   int __meminit vmemmap_populate_hvo(unsigned long addr, unsigned long end,
>   				       int node, unsigned long headsize)
>   {
> +	unsigned long maddr, len, tail_pfn;
> +	unsigned int order;
>   	pte_t *pte;
> -	unsigned long maddr;
> +
> +	len = end - addr;
> +	order = ilog2(len * sizeof(struct page) / PAGE_SIZE);


Could initialize them as const above.

But I am wondering whether it shouldn't be the caller that provides this 
to us? After all, it's all hugetlb code that allocates and prepares that.

Then we could maybe change

#ifdef CONFIG_SPARSEMEM_VMEMMAP
	struct page *vmemmap_tails[NR_VMEMMAP_TAILS];
#endif

to be HVO-only.
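
As a sanity check of the quoted order computation, here is a small
userspace sketch. It assumes 4 KiB base pages and a 64-byte struct page;
the SKETCH_* constants and sketch_* helpers are hypothetical names, and
other page or struct page sizes are not covered.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE	4096UL	/* assumed base page size */
#define SKETCH_STRUCT_PAGE_SIZE	64UL	/* assumed sizeof(struct page) */

/* Floor of log2(v) for v > 0; local stand-in for the kernel's ilog2(). */
static unsigned int sketch_ilog2(unsigned long v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* len is the size in bytes of the vmemmap range [addr, end). */
static unsigned int sketch_order_from_vmemmap_len(unsigned long len)
{
	return sketch_ilog2(len * SKETCH_STRUCT_PAGE_SIZE / SKETCH_PAGE_SIZE);
}
```

Under these assumptions, a 2 MiB huge page has 512 struct pages, so
len = 512 * 64 bytes and the result is order 9. A 1 GiB huge page has
262144 struct pages and yields order 18.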

-- 
Cheers,

David

