* [PATCHv6 01/17] mm: Move MAX_FOLIO_ORDER definition to mmzone.h
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-07 20:20 ` Usama Arif
2026-02-10 15:01 ` Vlastimil Babka
2026-02-02 15:56 ` [PATCHv6 02/17] mm: Change the interface of prep_compound_tail() Kiryl Shutsemau
` (15 subsequent siblings)
16 siblings, 2 replies; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau,
David Hildenbrand (Red Hat)
Move MAX_FOLIO_ORDER definition from mm.h to mmzone.h.
This is preparation for adding the vmemmap_tails array to struct
pglist_data, which requires MAX_FOLIO_ORDER to be available in mmzone.h.
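For illustration only (not part of this patch): the kind of per-node array this enables
might look roughly like the sketch below. The real field is introduced later in the
series; the element type and array bound shown here are assumptions.
	/* Hypothetical sketch only -- not the definition used by this series */
	typedef struct pglist_data {
		/* ... existing fields ... */
		/* one shared tail descriptor per supported folio order (assumed shape) */
		struct page *vmemmap_tails[MAX_FOLIO_ORDER + 1];
		/* ... */
	} pg_data_t;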
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
---
include/linux/mm.h | 31 -------------------------------
include/linux/mmzone.h | 31 +++++++++++++++++++++++++++++++
2 files changed, 31 insertions(+), 31 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f8a8fd47399c..8d5fa655fea4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -27,7 +27,6 @@
#include <linux/page-flags.h>
#include <linux/page_ref.h>
#include <linux/overflow.h>
-#include <linux/sizes.h>
#include <linux/sched.h>
#include <linux/pgtable.h>
#include <linux/kasan.h>
@@ -2477,36 +2476,6 @@ static inline unsigned long folio_nr_pages(const struct folio *folio)
return folio_large_nr_pages(folio);
}
-#if !defined(CONFIG_HAVE_GIGANTIC_FOLIOS)
-/*
- * We don't expect any folios that exceed buddy sizes (and consequently
- * memory sections).
- */
-#define MAX_FOLIO_ORDER MAX_PAGE_ORDER
-#elif defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
-/*
- * Only pages within a single memory section are guaranteed to be
- * contiguous. By limiting folios to a single memory section, all folio
- * pages are guaranteed to be contiguous.
- */
-#define MAX_FOLIO_ORDER PFN_SECTION_SHIFT
-#elif defined(CONFIG_HUGETLB_PAGE)
-/*
- * There is no real limit on the folio size. We limit them to the maximum we
- * currently expect (see CONFIG_HAVE_GIGANTIC_FOLIOS): with hugetlb, we expect
- * no folios larger than 16 GiB on 64bit and 1 GiB on 32bit.
- */
-#define MAX_FOLIO_ORDER get_order(IS_ENABLED(CONFIG_64BIT) ? SZ_16G : SZ_1G)
-#else
-/*
- * Without hugetlb, gigantic folios that are bigger than a single PUD are
- * currently impossible.
- */
-#define MAX_FOLIO_ORDER PUD_ORDER
-#endif
-
-#define MAX_FOLIO_NR_PAGES (1UL << MAX_FOLIO_ORDER)
-
/*
* compound_nr() returns the number of pages in this potentially compound
* page. compound_nr() can be called on a tail page, and is defined to
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3e51190a55e4..be8ce40b5638 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -23,6 +23,7 @@
#include <linux/page-flags.h>
#include <linux/local_lock.h>
#include <linux/zswap.h>
+#include <linux/sizes.h>
#include <asm/page.h>
/* Free memory management - zoned buddy allocator. */
@@ -61,6 +62,36 @@
*/
#define PAGE_ALLOC_COSTLY_ORDER 3
+#if !defined(CONFIG_HAVE_GIGANTIC_FOLIOS)
+/*
+ * We don't expect any folios that exceed buddy sizes (and consequently
+ * memory sections).
+ */
+#define MAX_FOLIO_ORDER MAX_PAGE_ORDER
+#elif defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
+/*
+ * Only pages within a single memory section are guaranteed to be
+ * contiguous. By limiting folios to a single memory section, all folio
+ * pages are guaranteed to be contiguous.
+ */
+#define MAX_FOLIO_ORDER PFN_SECTION_SHIFT
+#elif defined(CONFIG_HUGETLB_PAGE)
+/*
+ * There is no real limit on the folio size. We limit them to the maximum we
+ * currently expect (see CONFIG_HAVE_GIGANTIC_FOLIOS): with hugetlb, we expect
+ * no folios larger than 16 GiB on 64bit and 1 GiB on 32bit.
+ */
+#define MAX_FOLIO_ORDER get_order(IS_ENABLED(CONFIG_64BIT) ? SZ_16G : SZ_1G)
+#else
+/*
+ * Without hugetlb, gigantic folios that are bigger than a single PUD are
+ * currently impossible.
+ */
+#define MAX_FOLIO_ORDER PUD_ORDER
+#endif
+
+#define MAX_FOLIO_NR_PAGES (1UL << MAX_FOLIO_ORDER)
+
enum migratetype {
MIGRATE_UNMOVABLE,
MIGRATE_MOVABLE,
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 01/17] mm: Move MAX_FOLIO_ORDER definition to mmzone.h
2026-02-02 15:56 ` [PATCHv6 01/17] mm: Move MAX_FOLIO_ORDER definition to mmzone.h Kiryl Shutsemau
@ 2026-02-07 20:20 ` Usama Arif
2026-02-10 15:01 ` Vlastimil Babka
1 sibling, 0 replies; 67+ messages in thread
From: Usama Arif @ 2026-02-07 20:20 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, David Hildenbrand,
Matthew Wilcox, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv,
David Hildenbrand (Red Hat)
On 02/02/2026 15:56, Kiryl Shutsemau wrote:
> Move MAX_FOLIO_ORDER definition from mm.h to mmzone.h.
>
> This is preparation for adding the vmemmap_tails array to struct
> pglist_data, which requires MAX_FOLIO_ORDER to be available in mmzone.h.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Acked-by: Muchun Song <muchun.song@linux.dev>
Acked-by: Usama Arif <usamaarif642@gmail.com>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 01/17] mm: Move MAX_FOLIO_ORDER definition to mmzone.h
2026-02-02 15:56 ` [PATCHv6 01/17] mm: Move MAX_FOLIO_ORDER definition to mmzone.h Kiryl Shutsemau
2026-02-07 20:20 ` Usama Arif
@ 2026-02-10 15:01 ` Vlastimil Babka
1 sibling, 0 replies; 67+ messages in thread
From: Vlastimil Babka @ 2026-02-10 15:01 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, David Hildenbrand,
Matthew Wilcox, Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Lorenzo Stoakes, Zi Yan,
Baoquan He, Michal Hocko, Johannes Weiner, Jonathan Corbet,
Huacai Chen, WANG Xuerui, Palmer Dabbelt, Paul Walmsley,
Albert Ou, Alexandre Ghiti, kernel-team, linux-mm, linux-kernel,
linux-doc, loongarch, linux-riscv, David Hildenbrand (Red Hat)
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> Move MAX_FOLIO_ORDER definition from mm.h to mmzone.h.
>
> This is preparation for adding the vmemmap_tails array to struct
> pglist_data, which requires MAX_FOLIO_ORDER to be available in mmzone.h.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Acked-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 02/17] mm: Change the interface of prep_compound_tail()
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
2026-02-02 15:56 ` [PATCHv6 01/17] mm: Move MAX_FOLIO_ORDER definition to mmzone.h Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-04 16:14 ` David Hildenbrand (arm)
2026-02-10 15:06 ` Vlastimil Babka
2026-02-02 15:56 ` [PATCHv6 03/17] mm: Rename the 'compound_head' field in the 'struct page' to 'compound_info' Kiryl Shutsemau
` (14 subsequent siblings)
16 siblings, 2 replies; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
Instead of passing down the head page and tail page index, pass the tail
and head pages directly, as well as the order of the compound page.
This is a preparation for changing how the head position is encoded in
the tail page.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Zi Yan <ziy@nvidia.com>
---
include/linux/page-flags.h | 4 +++-
mm/hugetlb.c | 8 +++++---
mm/internal.h | 12 ++++++------
mm/mm_init.c | 2 +-
mm/page_alloc.c | 2 +-
5 files changed, 16 insertions(+), 12 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f7a0e4af0c73..8a3694369e15 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -865,7 +865,9 @@ static inline bool folio_test_large(const struct folio *folio)
return folio_test_head(folio);
}
-static __always_inline void set_compound_head(struct page *page, struct page *head)
+static __always_inline void set_compound_head(struct page *page,
+ const struct page *head,
+ unsigned int order)
{
WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6e855a32de3d..54ba7cd05a86 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3168,6 +3168,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
/* Initialize [start_page:end_page_number] tail struct pages of a hugepage */
static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
+ struct hstate *h,
unsigned long start_page_number,
unsigned long end_page_number)
{
@@ -3176,6 +3177,7 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
struct page *page = folio_page(folio, start_page_number);
unsigned long head_pfn = folio_pfn(folio);
unsigned long pfn, end_pfn = head_pfn + end_page_number;
+ unsigned int order = huge_page_order(h);
/*
* As we marked all tail pages with memblock_reserved_mark_noinit(),
@@ -3183,7 +3185,7 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
*/
for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
__init_single_page(page, pfn, zone, nid);
- prep_compound_tail((struct page *)folio, pfn - head_pfn);
+ prep_compound_tail(page, &folio->page, order);
set_page_count(page, 0);
}
}
@@ -3203,7 +3205,7 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
__folio_set_head(folio);
ret = folio_ref_freeze(folio, 1);
VM_BUG_ON(!ret);
- hugetlb_folio_init_tail_vmemmap(folio, 1, nr_pages);
+ hugetlb_folio_init_tail_vmemmap(folio, h, 1, nr_pages);
prep_compound_head(&folio->page, huge_page_order(h));
}
@@ -3260,7 +3262,7 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
* time as this is early in boot and there should
* be no contention.
*/
- hugetlb_folio_init_tail_vmemmap(folio,
+ hugetlb_folio_init_tail_vmemmap(folio, h,
HUGETLB_VMEMMAP_RESERVE_PAGES,
pages_per_huge_page(h));
}
diff --git a/mm/internal.h b/mm/internal.h
index d67e8bb75734..037ddcda25ff 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -879,13 +879,13 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
INIT_LIST_HEAD(&folio->_deferred_list);
}
-static inline void prep_compound_tail(struct page *head, int tail_idx)
+static inline void prep_compound_tail(struct page *tail,
+ const struct page *head,
+ unsigned int order)
{
- struct page *p = head + tail_idx;
-
- p->mapping = TAIL_MAPPING;
- set_compound_head(p, head);
- set_page_private(p, 0);
+ tail->mapping = TAIL_MAPPING;
+ set_compound_head(tail, head, order);
+ set_page_private(tail, 0);
}
void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 1a29a719af58..ba50f4c4337b 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1099,7 +1099,7 @@ static void __ref memmap_init_compound(struct page *head,
struct page *page = pfn_to_page(pfn);
__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
- prep_compound_tail(head, pfn - head_pfn);
+ prep_compound_tail(page, head, order);
set_page_count(page, 0);
}
prep_compound_head(head, order);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e4104973e22f..00c7ea958767 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -744,7 +744,7 @@ void prep_compound_page(struct page *page, unsigned int order)
__SetPageHead(page);
for (i = 1; i < nr_pages; i++)
- prep_compound_tail(page, i);
+ prep_compound_tail(page + i, page, order);
prep_compound_head(page, order);
}
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 02/17] mm: Change the interface of prep_compound_tail()
2026-02-02 15:56 ` [PATCHv6 02/17] mm: Change the interface of prep_compound_tail() Kiryl Shutsemau
@ 2026-02-04 16:14 ` David Hildenbrand (arm)
2026-02-05 11:35 ` Kiryl Shutsemau
2026-02-10 15:06 ` Vlastimil Babka
1 sibling, 1 reply; 67+ messages in thread
From: David Hildenbrand (arm) @ 2026-02-04 16:14 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, David Hildenbrand,
Matthew Wilcox, Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> Instead of passing down the head page and tail page index, pass the tail
> and head pages directly, as well as the order of the compound page.
>
> This is a preparation for changing how the head position is encoded in
> the tail page.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> ---
> include/linux/page-flags.h | 4 +++-
> mm/hugetlb.c | 8 +++++---
> mm/internal.h | 12 ++++++------
> mm/mm_init.c | 2 +-
> mm/page_alloc.c | 2 +-
> 5 files changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index f7a0e4af0c73..8a3694369e15 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -865,7 +865,9 @@ static inline bool folio_test_large(const struct folio *folio)
> return folio_test_head(folio);
> }
>
> -static __always_inline void set_compound_head(struct page *page, struct page *head)
> +static __always_inline void set_compound_head(struct page *page,
> + const struct page *head,
> + unsigned int order)
Two tab indents please on second+ parameter list whenever you touch code.
> {
> WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
> }
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6e855a32de3d..54ba7cd05a86 100644
[...]
> diff --git a/mm/internal.h b/mm/internal.h
> index d67e8bb75734..037ddcda25ff 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -879,13 +879,13 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
> INIT_LIST_HEAD(&folio->_deferred_list);
> }
>
> -static inline void prep_compound_tail(struct page *head, int tail_idx)
> +static inline void prep_compound_tail(struct page *tail,
Just wondering whether we should call this "struct page *page" for
consistency with set_compound_head().
Or alternatively, call it also "tail" in set_compound_head().
> + const struct page *head,
> + unsigned int order)
Two tab indent, then this fits into two lines in total.
> {
> - struct page *p = head + tail_idx;
> -
> - p->mapping = TAIL_MAPPING;
> - set_compound_head(p, head);
> - set_page_private(p, 0);
> + tail->mapping = TAIL_MAPPING;
> + set_compound_head(tail, head, order);
> + set_page_private(tail, 0);
> }
Only nits, in general LGTM
Acked-by: David Hildenbrand (arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread* Re: [PATCHv6 02/17] mm: Change the interface of prep_compound_tail()
2026-02-04 16:14 ` David Hildenbrand (arm)
@ 2026-02-05 11:35 ` Kiryl Shutsemau
2026-02-05 11:58 ` David Hildenbrand (arm)
0 siblings, 1 reply; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-05 11:35 UTC (permalink / raw)
To: David Hildenbrand (arm)
Cc: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden, Oscar Salvador, Mike Rapoport,
Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He,
Michal Hocko, Johannes Weiner, Jonathan Corbet, Huacai Chen,
WANG Xuerui, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Alexandre Ghiti, kernel-team, linux-mm, linux-kernel, linux-doc,
loongarch, linux-riscv
On Wed, Feb 04, 2026 at 05:14:12PM +0100, David Hildenbrand (arm) wrote:
> On 2/2/26 16:56, Kiryl Shutsemau wrote:
> > Instead of passing down the head page and tail page index, pass the tail
> > and head pages directly, as well as the order of the compound page.
> >
> > This is a preparation for changing how the head position is encoded in
> > the tail page.
> >
> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > Reviewed-by: Muchun Song <muchun.song@linux.dev>
> > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > ---
> > include/linux/page-flags.h | 4 +++-
> > mm/hugetlb.c | 8 +++++---
> > mm/internal.h | 12 ++++++------
> > mm/mm_init.c | 2 +-
> > mm/page_alloc.c | 2 +-
> > 5 files changed, 16 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index f7a0e4af0c73..8a3694369e15 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -865,7 +865,9 @@ static inline bool folio_test_large(const struct folio *folio)
> > return folio_test_head(folio);
> > }
> > -static __always_inline void set_compound_head(struct page *page, struct page *head)
> > +static __always_inline void set_compound_head(struct page *page,
> > + const struct page *head,
> > + unsigned int order)
>
> Two tab indents please on second+ parameter list whenever you touch code.
Do we have this coding style preference written down somewhere?
-tip tree wants the opposite. Documentation/process/maintainer-tip.rst:
When splitting function declarations or function calls, then please align
the first argument in the second line with the first argument in the first
line::
I want the editor to do The Right Thing™ without my brain involvement.
Having different coding styles in different corners of the kernel makes
it hard.
>
> > {
> > WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
> > }
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 6e855a32de3d..54ba7cd05a86 100644
>
>
> [...]
>
> > diff --git a/mm/internal.h b/mm/internal.h
> > index d67e8bb75734..037ddcda25ff 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -879,13 +879,13 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
> > INIT_LIST_HEAD(&folio->_deferred_list);
> > }
> > -static inline void prep_compound_tail(struct page *head, int tail_idx)
> > +static inline void prep_compound_tail(struct page *tail,
>
> Just wondering whether we should call this "struct page *page" for
> consistency with set_compound_head().
>
> Or alternatively, call it also "tail" in set_compound_head().
I will take the alternative path :)
>
> > + const struct page *head,
> > + unsigned int order)
>
> Two tab indent, then this fits into two lines in total.
>
> > {
> > - struct page *p = head + tail_idx;
> > -
> > - p->mapping = TAIL_MAPPING;
> > - set_compound_head(p, head);
> > - set_page_private(p, 0);
> > + tail->mapping = TAIL_MAPPING;
> > + set_compound_head(tail, head, order);
> > + set_page_private(tail, 0);
> > }
> Only nits, in general LGTM
>
> Acked-by: David Hildenbrand (arm) <david@kernel.org>
Thanks!
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 02/17] mm: Change the interface of prep_compound_tail()
2026-02-05 11:35 ` Kiryl Shutsemau
@ 2026-02-05 11:58 ` David Hildenbrand (arm)
0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand (arm) @ 2026-02-05 11:58 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Andrew Morton, Muchun Song, Matthew Wilcox, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He,
Michal Hocko, Johannes Weiner, Jonathan Corbet, Huacai Chen,
WANG Xuerui, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Alexandre Ghiti, kernel-team, linux-mm, linux-kernel, linux-doc,
loongarch, linux-riscv
On 2/5/26 12:35, Kiryl Shutsemau wrote:
> On Wed, Feb 04, 2026 at 05:14:12PM +0100, David Hildenbrand (arm) wrote:
>> On 2/2/26 16:56, Kiryl Shutsemau wrote:
>>> Instead of passing down the head page and tail page index, pass the tail
>>> and head pages directly, as well as the order of the compound page.
>>>
>>> This is a preparation for changing how the head position is encoded in
>>> the tail page.
>>>
>>> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
>>> Reviewed-by: Muchun Song <muchun.song@linux.dev>
>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>> ---
>>> include/linux/page-flags.h | 4 +++-
>>> mm/hugetlb.c | 8 +++++---
>>> mm/internal.h | 12 ++++++------
>>> mm/mm_init.c | 2 +-
>>> mm/page_alloc.c | 2 +-
>>> 5 files changed, 16 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>>> index f7a0e4af0c73..8a3694369e15 100644
>>> --- a/include/linux/page-flags.h
>>> +++ b/include/linux/page-flags.h
>>> @@ -865,7 +865,9 @@ static inline bool folio_test_large(const struct folio *folio)
>>> return folio_test_head(folio);
>>> }
>>> -static __always_inline void set_compound_head(struct page *page, struct page *head)
>>> +static __always_inline void set_compound_head(struct page *page,
>>> + const struct page *head,
>>> + unsigned int order)
>>
>> Two tab indents please on second+ parameter list whenever you touch code.
>
> Do we have this coding style preference written down somewhere?
Good question. I assume not. But it's what we do in MM :)
>
> -tip tree wants the opposite. Documentation/process/maintainer-tip.rst:
>
> When splitting function declarations or function calls, then please align
> the first argument in the second line with the first argument in the first
> line::
>
> I want the editor to do The Right Thing™ without my brain involvement.
> Having different coding styles in different corners of the kernel makes
> it hard.
Yeah, but unavoidable. :)
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 02/17] mm: Change the interface of prep_compound_tail()
2026-02-02 15:56 ` [PATCHv6 02/17] mm: Change the interface of prep_compound_tail() Kiryl Shutsemau
2026-02-04 16:14 ` David Hildenbrand (arm)
@ 2026-02-10 15:06 ` Vlastimil Babka
1 sibling, 0 replies; 67+ messages in thread
From: Vlastimil Babka @ 2026-02-10 15:06 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, David Hildenbrand,
Matthew Wilcox, Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Lorenzo Stoakes, Zi Yan,
Baoquan He, Michal Hocko, Johannes Weiner, Jonathan Corbet,
Huacai Chen, WANG Xuerui, Palmer Dabbelt, Paul Walmsley,
Albert Ou, Alexandre Ghiti, kernel-team, linux-mm, linux-kernel,
linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> Instead of passing down the head page and tail page index, pass the tail
> and head pages directly, as well as the order of the compound page.
>
> This is a preparation for changing how the head position is encoded in
> the tail page.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 03/17] mm: Rename the 'compound_head' field in the 'struct page' to 'compound_info'
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
2026-02-02 15:56 ` [PATCHv6 01/17] mm: Move MAX_FOLIO_ORDER definition to mmzone.h Kiryl Shutsemau
2026-02-02 15:56 ` [PATCHv6 02/17] mm: Change the interface of prep_compound_tail() Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-04 16:14 ` David Hildenbrand (arm)
2026-02-10 15:09 ` Vlastimil Babka
2026-02-02 15:56 ` [PATCHv6 04/17] mm: Move set/clear_compound_head() next to compound_head() Kiryl Shutsemau
` (13 subsequent siblings)
16 siblings, 2 replies; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
The 'compound_head' field in the 'struct page' encodes whether the page
is a tail and where to locate the head page. Bit 0 is set if the page is
a tail, and the remaining bits in the field point to the head page.
As preparation for changing how the field encodes information about the
head page, rename the field to 'compound_info'.
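For illustration only (not part of this patch), the encoding described above can be read
back as in this minimal sketch; the helper names are made up for the example:
	/* Illustrative sketch of the bit-0 encoding in page->compound_info */
	static inline bool page_is_tail_encoded(const struct page *page)
	{
		/* bit 0 set means "this is a tail page" */
		return READ_ONCE(page->compound_info) & 1;
	}
	static inline const struct page *tail_to_head(const struct page *tail)
	{
		/* only valid for tail pages: clearing bit 0 recovers the head pointer */
		return (const struct page *)(READ_ONCE(tail->compound_info) - 1);
	}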
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Zi Yan <ziy@nvidia.com>
---
.../admin-guide/kdump/vmcoreinfo.rst | 2 +-
Documentation/mm/vmemmap_dedup.rst | 6 +++---
include/linux/mm_types.h | 20 +++++++++----------
include/linux/page-flags.h | 18 ++++++++---------
include/linux/types.h | 2 +-
kernel/vmcore_info.c | 2 +-
mm/page_alloc.c | 2 +-
mm/slab.h | 2 +-
mm/util.c | 2 +-
9 files changed, 28 insertions(+), 28 deletions(-)
diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index 404a15f6782c..7663c610fe90 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -141,7 +141,7 @@ nodemask_t
The size of a nodemask_t type. Used to compute the number of online
nodes.
-(page, flags|_refcount|mapping|lru|_mapcount|private|compound_order|compound_head)
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_order|compound_info)
----------------------------------------------------------------------------------
User-space tools compute their values based on the offset of these
diff --git a/Documentation/mm/vmemmap_dedup.rst b/Documentation/mm/vmemmap_dedup.rst
index b4a55b6569fa..1863d88d2dcb 100644
--- a/Documentation/mm/vmemmap_dedup.rst
+++ b/Documentation/mm/vmemmap_dedup.rst
@@ -24,7 +24,7 @@ For each base page, there is a corresponding ``struct page``.
Within the HugeTLB subsystem, only the first 4 ``struct page`` are used to
contain unique information about a HugeTLB page. ``__NR_USED_SUBPAGE`` provides
this upper limit. The only 'useful' information in the remaining ``struct page``
-is the compound_head field, and this field is the same for all tail pages.
+is the compound_info field, and this field is the same for all tail pages.
By removing redundant ``struct page`` for HugeTLB pages, memory can be returned
to the buddy allocator for other uses.
@@ -124,10 +124,10 @@ Here is how things look before optimization::
| |
+-----------+
-The value of page->compound_head is the same for all tail pages. The first
+The value of page->compound_info is the same for all tail pages. The first
page of ``struct page`` (page 0) associated with the HugeTLB page contains the 4
``struct page`` necessary to describe the HugeTLB. The only use of the remaining
-pages of ``struct page`` (page 1 to page 7) is to point to page->compound_head.
+pages of ``struct page`` (page 1 to page 7) is to point to page->compound_info.
Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of ``struct page``
will be used for each HugeTLB page. This will allow us to free the remaining
7 pages to the buddy allocator.
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3cc8ae722886..7bc82a2b889f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -126,14 +126,14 @@ struct page {
atomic_long_t pp_ref_count;
};
struct { /* Tail pages of compound page */
- unsigned long compound_head; /* Bit zero is set */
+ unsigned long compound_info; /* Bit zero is set */
};
struct { /* ZONE_DEVICE pages */
/*
- * The first word is used for compound_head or folio
+ * The first word is used for compound_info or folio
* pgmap
*/
- void *_unused_pgmap_compound_head;
+ void *_unused_pgmap_compound_info;
void *zone_device_data;
/*
* ZONE_DEVICE private pages are counted as being
@@ -409,7 +409,7 @@ struct folio {
/* private: avoid cluttering the output */
/* For the Unevictable "LRU list" slot */
struct {
- /* Avoid compound_head */
+ /* Avoid compound_info */
void *__filler;
/* public: */
unsigned int mlock_count;
@@ -510,7 +510,7 @@ struct folio {
FOLIO_MATCH(flags, flags);
FOLIO_MATCH(lru, lru);
FOLIO_MATCH(mapping, mapping);
-FOLIO_MATCH(compound_head, lru);
+FOLIO_MATCH(compound_info, lru);
FOLIO_MATCH(__folio_index, index);
FOLIO_MATCH(private, private);
FOLIO_MATCH(_mapcount, _mapcount);
@@ -529,7 +529,7 @@ FOLIO_MATCH(_last_cpupid, _last_cpupid);
static_assert(offsetof(struct folio, fl) == \
offsetof(struct page, pg) + sizeof(struct page))
FOLIO_MATCH(flags, _flags_1);
-FOLIO_MATCH(compound_head, _head_1);
+FOLIO_MATCH(compound_info, _head_1);
FOLIO_MATCH(_mapcount, _mapcount_1);
FOLIO_MATCH(_refcount, _refcount_1);
#undef FOLIO_MATCH
@@ -537,13 +537,13 @@ FOLIO_MATCH(_refcount, _refcount_1);
static_assert(offsetof(struct folio, fl) == \
offsetof(struct page, pg) + 2 * sizeof(struct page))
FOLIO_MATCH(flags, _flags_2);
-FOLIO_MATCH(compound_head, _head_2);
+FOLIO_MATCH(compound_info, _head_2);
#undef FOLIO_MATCH
#define FOLIO_MATCH(pg, fl) \
static_assert(offsetof(struct folio, fl) == \
offsetof(struct page, pg) + 3 * sizeof(struct page))
FOLIO_MATCH(flags, _flags_3);
-FOLIO_MATCH(compound_head, _head_3);
+FOLIO_MATCH(compound_info, _head_3);
#undef FOLIO_MATCH
/**
@@ -609,8 +609,8 @@ struct ptdesc {
#define TABLE_MATCH(pg, pt) \
static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt))
TABLE_MATCH(flags, pt_flags);
-TABLE_MATCH(compound_head, pt_list);
-TABLE_MATCH(compound_head, _pt_pad_1);
+TABLE_MATCH(compound_info, pt_list);
+TABLE_MATCH(compound_info, _pt_pad_1);
TABLE_MATCH(mapping, __page_mapping);
TABLE_MATCH(__folio_index, pt_index);
TABLE_MATCH(rcu_head, pt_rcu_head);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8a3694369e15..aa46d49e82f7 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -213,7 +213,7 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
/*
* Only addresses aligned with PAGE_SIZE of struct page may be fake head
* struct page. The alignment check aims to avoid access the fields (
- * e.g. compound_head) of the @page[1]. It can avoid touch a (possibly)
+ * e.g. compound_info) of the @page[1]. It can avoid touch a (possibly)
* cold cacheline in some cases.
*/
if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
@@ -223,7 +223,7 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
* because the @page is a compound page composed with at least
* two contiguous pages.
*/
- unsigned long head = READ_ONCE(page[1].compound_head);
+ unsigned long head = READ_ONCE(page[1].compound_info);
if (likely(head & 1))
return (const struct page *)(head - 1);
@@ -281,7 +281,7 @@ static __always_inline int page_is_fake_head(const struct page *page)
static __always_inline unsigned long _compound_head(const struct page *page)
{
- unsigned long head = READ_ONCE(page->compound_head);
+ unsigned long head = READ_ONCE(page->compound_info);
if (unlikely(head & 1))
return head - 1;
@@ -320,13 +320,13 @@ static __always_inline unsigned long _compound_head(const struct page *page)
static __always_inline int PageTail(const struct page *page)
{
- return READ_ONCE(page->compound_head) & 1 || page_is_fake_head(page);
+ return READ_ONCE(page->compound_info) & 1 || page_is_fake_head(page);
}
static __always_inline int PageCompound(const struct page *page)
{
return test_bit(PG_head, &page->flags.f) ||
- READ_ONCE(page->compound_head) & 1;
+ READ_ONCE(page->compound_info) & 1;
}
#define PAGE_POISON_PATTERN -1l
@@ -348,7 +348,7 @@ static const unsigned long *const_folio_flags(const struct folio *folio,
{
const struct page *page = &folio->page;
- VM_BUG_ON_PGFLAGS(page->compound_head & 1, page);
+ VM_BUG_ON_PGFLAGS(page->compound_info & 1, page);
VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags.f), page);
return &page[n].flags.f;
}
@@ -357,7 +357,7 @@ static unsigned long *folio_flags(struct folio *folio, unsigned n)
{
struct page *page = &folio->page;
- VM_BUG_ON_PGFLAGS(page->compound_head & 1, page);
+ VM_BUG_ON_PGFLAGS(page->compound_info & 1, page);
VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags.f), page);
return &page[n].flags.f;
}
@@ -869,12 +869,12 @@ static __always_inline void set_compound_head(struct page *page,
const struct page *head,
unsigned int order)
{
- WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
+ WRITE_ONCE(page->compound_info, (unsigned long)head + 1);
}
static __always_inline void clear_compound_head(struct page *page)
{
- WRITE_ONCE(page->compound_head, 0);
+ WRITE_ONCE(page->compound_info, 0);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/include/linux/types.h b/include/linux/types.h
index f69be881369f..604697abf151 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -234,7 +234,7 @@ struct ustat {
*
* This guarantee is important for few reasons:
* - future call_rcu_lazy() will make use of lower bits in the pointer;
- * - the structure shares storage space in struct page with @compound_head,
+ * - the structure shares storage space in struct page with @compound_info,
* which encode PageTail() in bit 0. The guarantee is needed to avoid
* false-positive PageTail().
*/
diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
index 46198580373a..0a46df3e3db9 100644
--- a/kernel/vmcore_info.c
+++ b/kernel/vmcore_info.c
@@ -198,7 +198,7 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_OFFSET(page, lru);
VMCOREINFO_OFFSET(page, _mapcount);
VMCOREINFO_OFFSET(page, private);
- VMCOREINFO_OFFSET(page, compound_head);
+ VMCOREINFO_OFFSET(page, compound_info);
VMCOREINFO_OFFSET(pglist_data, node_zones);
VMCOREINFO_OFFSET(pglist_data, nr_zones);
#ifdef CONFIG_FLATMEM
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 00c7ea958767..cb7375eb1713 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -731,7 +731,7 @@ static inline bool pcp_allowed_order(unsigned int order)
* The first PAGE_SIZE page is called the "head page" and have PG_head set.
*
* The remaining PAGE_SIZE pages are called "tail pages". PageTail() is encoded
- * in bit 0 of page->compound_head. The rest of bits is pointer to head page.
+ * in bit 0 of page->compound_info. The rest of bits is pointer to head page.
*
* The first tail page's ->compound_order holds the order of allocation.
* This usage means that zero-order pages may not be compound.
diff --git a/mm/slab.h b/mm/slab.h
index e767aa7e91b0..8a2a9c6c697b 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -100,7 +100,7 @@ struct slab {
#define SLAB_MATCH(pg, sl) \
static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
SLAB_MATCH(flags, flags);
-SLAB_MATCH(compound_head, slab_cache); /* Ensure bit 0 is clear */
+SLAB_MATCH(compound_info, slab_cache); /* Ensure bit 0 is clear */
SLAB_MATCH(_refcount, __page_refcount);
#ifdef CONFIG_MEMCG
SLAB_MATCH(memcg_data, obj_exts);
diff --git a/mm/util.c b/mm/util.c
index b05ab6f97e11..3ebcb9e6035c 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1247,7 +1247,7 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
again:
memset(&ps->folio_snapshot, 0, sizeof(struct folio));
memcpy(&ps->page_snapshot, page, sizeof(*page));
- head = ps->page_snapshot.compound_head;
+ head = ps->page_snapshot.compound_info;
if ((head & 1) == 0) {
ps->idx = 0;
foliop = (struct folio *)&ps->page_snapshot;
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 03/17] mm: Rename the 'compound_head' field in the 'struct page' to 'compound_info'
2026-02-02 15:56 ` [PATCHv6 03/17] mm: Rename the 'compound_head' field in the 'struct page' to 'compound_info' Kiryl Shutsemau
@ 2026-02-04 16:14 ` David Hildenbrand (arm)
2026-02-10 15:09 ` Vlastimil Babka
1 sibling, 0 replies; 67+ messages in thread
From: David Hildenbrand (arm) @ 2026-02-04 16:14 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, David Hildenbrand,
Matthew Wilcox, Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> The 'compound_head' field in the 'struct page' encodes whether the page
> is a tail and where to locate the head page. Bit 0 is set if the page is
> a tail, and the remaining bits in the field point to the head page.
>
> As preparation for changing how the field encodes information about the
> head page, rename the field to 'compound_info'.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> ---
Acked-by: David Hildenbrand (arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 03/17] mm: Rename the 'compound_head' field in the 'struct page' to 'compound_info'
2026-02-02 15:56 ` [PATCHv6 03/17] mm: Rename the 'compound_head' field in the 'struct page' to 'compound_info' Kiryl Shutsemau
2026-02-04 16:14 ` David Hildenbrand (arm)
@ 2026-02-10 15:09 ` Vlastimil Babka
1 sibling, 0 replies; 67+ messages in thread
From: Vlastimil Babka @ 2026-02-10 15:09 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, David Hildenbrand,
Matthew Wilcox, Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Lorenzo Stoakes, Zi Yan,
Baoquan He, Michal Hocko, Johannes Weiner, Jonathan Corbet,
Huacai Chen, WANG Xuerui, Palmer Dabbelt, Paul Walmsley,
Albert Ou, Alexandre Ghiti, kernel-team, linux-mm, linux-kernel,
linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> The 'compound_head' field in the 'struct page' encodes whether the page
> is a tail and where to locate the head page. Bit 0 is set if the page is
> a tail, and the remaining bits in the field point to the head page.
>
> As preparation for changing how the field encodes information about the
> head page, rename the field to 'compound_info'.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 04/17] mm: Move set/clear_compound_head() next to compound_head()
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
` (2 preceding siblings ...)
2026-02-02 15:56 ` [PATCHv6 03/17] mm: Rename the 'compound_head' field in the 'struct page' to 'compound_info' Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-04 16:35 ` David Hildenbrand (arm)
2026-02-10 15:10 ` Vlastimil Babka
2026-02-02 15:56 ` [PATCHv6 05/17] riscv/mm: Align vmemmap to maximal folio size Kiryl Shutsemau
` (12 subsequent siblings)
16 siblings, 2 replies; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
Move set_compound_head() and clear_compound_head() to be adjacent to the
compound_head() function in page-flags.h.
These functions encode and decode the same compound_info field, so
keeping them together makes it easier to verify their logic is
consistent, especially when the encoding changes.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Zi Yan <ziy@nvidia.com>
---
include/linux/page-flags.h | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index aa46d49e82f7..d14a17ffb55b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -290,6 +290,18 @@ static __always_inline unsigned long _compound_head(const struct page *page)
#define compound_head(page) ((typeof(page))_compound_head(page))
+static __always_inline void set_compound_head(struct page *page,
+ const struct page *head,
+ unsigned int order)
+{
+ WRITE_ONCE(page->compound_info, (unsigned long)head + 1);
+}
+
+static __always_inline void clear_compound_head(struct page *page)
+{
+ WRITE_ONCE(page->compound_info, 0);
+}
+
/**
* page_folio - Converts from page to folio.
* @p: The page.
@@ -865,18 +877,6 @@ static inline bool folio_test_large(const struct folio *folio)
return folio_test_head(folio);
}
-static __always_inline void set_compound_head(struct page *page,
- const struct page *head,
- unsigned int order)
-{
- WRITE_ONCE(page->compound_info, (unsigned long)head + 1);
-}
-
-static __always_inline void clear_compound_head(struct page *page)
-{
- WRITE_ONCE(page->compound_info, 0);
-}
-
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline void ClearPageCompound(struct page *page)
{
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 04/17] mm: Move set/clear_compound_head() next to compound_head()
2026-02-02 15:56 ` [PATCHv6 04/17] mm: Move set/clear_compound_head() next to compound_head() Kiryl Shutsemau
@ 2026-02-04 16:35 ` David Hildenbrand (arm)
2026-02-10 15:10 ` Vlastimil Babka
1 sibling, 0 replies; 67+ messages in thread
From: David Hildenbrand (arm) @ 2026-02-04 16:35 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> Move set_compound_head() and clear_compound_head() to be adjacent to the
> compound_head() function in page-flags.h.
>
> These functions encode and decode the same compound_info field, so
> keeping them together makes it easier to verify their logic is
> consistent, especially when the encoding changes.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> ---
> include/linux/page-flags.h | 24 ++++++++++++------------
> 1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index aa46d49e82f7..d14a17ffb55b 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -290,6 +290,18 @@ static __always_inline unsigned long _compound_head(const struct page *page)
>
> #define compound_head(page) ((typeof(page))_compound_head(page))
>
> +static __always_inline void set_compound_head(struct page *page,
> + const struct page *head,
> + unsigned int order)
^ :)
Acked-by: David Hildenbrand (arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 04/17] mm: Move set/clear_compound_head() next to compound_head()
2026-02-02 15:56 ` [PATCHv6 04/17] mm: Move set/clear_compound_head() next to compound_head() Kiryl Shutsemau
2026-02-04 16:35 ` David Hildenbrand (arm)
@ 2026-02-10 15:10 ` Vlastimil Babka
1 sibling, 0 replies; 67+ messages in thread
From: Vlastimil Babka @ 2026-02-10 15:10 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, David Hildenbrand,
Matthew Wilcox, Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Lorenzo Stoakes, Zi Yan,
Baoquan He, Michal Hocko, Johannes Weiner, Jonathan Corbet,
Huacai Chen, WANG Xuerui, Palmer Dabbelt, Paul Walmsley,
Albert Ou, Alexandre Ghiti, kernel-team, linux-mm, linux-kernel,
linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> Move set_compound_head() and clear_compound_head() to be adjacent to the
> compound_head() function in page-flags.h.
>
> These functions encode and decode the same compound_info field, so
> keeping them together makes it easier to verify their logic is
> consistent, especially when the encoding changes.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 05/17] riscv/mm: Align vmemmap to maximal folio size
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
` (3 preceding siblings ...)
2026-02-02 15:56 ` [PATCHv6 04/17] mm: Move set/clear_compound_head() next to compound_head() Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-04 16:50 ` David Hildenbrand (arm)
2026-02-02 15:56 ` [PATCHv6 06/17] LoongArch/mm: " Kiryl Shutsemau
` (11 subsequent siblings)
16 siblings, 1 reply; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
The upcoming change to the HugeTLB vmemmap optimization (HVO) requires
struct pages of the head page to be naturally aligned with regard to the
folio size.
Align vmemmap to MAX_FOLIO_NR_PAGES.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
---
arch/riscv/mm/init.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 21d534824624..c555b9a4fdce 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -63,7 +63,8 @@ phys_addr_t phys_ram_base __ro_after_init;
EXPORT_SYMBOL(phys_ram_base);
#ifdef CONFIG_SPARSEMEM_VMEMMAP
-#define VMEMMAP_ADDR_ALIGN (1ULL << SECTION_SIZE_BITS)
+#define VMEMMAP_ADDR_ALIGN max(1ULL << SECTION_SIZE_BITS, \
+ MAX_FOLIO_NR_PAGES * sizeof(struct page))
unsigned long vmemmap_start_pfn __ro_after_init;
EXPORT_SYMBOL(vmemmap_start_pfn);
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 05/17] riscv/mm: Align vmemmap to maximal folio size
2026-02-02 15:56 ` [PATCHv6 05/17] riscv/mm: Align vmemmap to maximal folio size Kiryl Shutsemau
@ 2026-02-04 16:50 ` David Hildenbrand (arm)
2026-02-05 13:50 ` Kiryl Shutsemau
0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand (arm) @ 2026-02-04 16:50 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> The upcoming change to the HugeTLB vmemmap optimization (HVO) requires
> struct pages of the head page to be naturally aligned with regard to the
> folio size.
>
> Align vmemmap to MAX_FOLIO_NR_PAGES.
I think neither that statement nor the one in the patch description is
correct?
"MAX_FOLIO_NR_PAGES * sizeof(struct page)" is neither the maximum folio
size nor MAX_FOLIO_NR_PAGES.
It's the size of the memmap that a large folio could span at maximum.
Assuming we have a 16 GiB folio, the calculation would give us
4194304 * sizeof(struct page)
Which could be something like (assuming 80 bytes)
335544320
-> not even a power of 2, weird? (for HVO you wouldn't care as HVO would
be disabled, but that alignment is super weird?)
Assuming 64 bytes, it would be a power of two (as 64 is a power of two).
268435456 (1<< 28)
Which makes me wonder whether there is a way to avoid sizeof(struct
page) here completely.
Or limit the alignment to the case where HVO is actually active and
sizeof(struct page) makes any sense?
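Spelling out the arithmetic, assuming 4 KiB base pages (the two struct page sizes are
just the examples above):
	16 GiB / 4 KiB           = 4194304 pages = 1 << 22
	4194304 * 80 bytes/page  = 335544320     (not a power of two)
	4194304 * 64 bytes/page  = 268435456     = 1 << 28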
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> ---
> arch/riscv/mm/init.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 21d534824624..c555b9a4fdce 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -63,7 +63,8 @@ phys_addr_t phys_ram_base __ro_after_init;
> EXPORT_SYMBOL(phys_ram_base);
>
> #ifdef CONFIG_SPARSEMEM_VMEMMAP
> -#define VMEMMAP_ADDR_ALIGN (1ULL << SECTION_SIZE_BITS)
> +#define VMEMMAP_ADDR_ALIGN max(1ULL << SECTION_SIZE_BITS, \
> + MAX_FOLIO_NR_PAGES * sizeof(struct page))
>
> unsigned long vmemmap_start_pfn __ro_after_init;
> EXPORT_SYMBOL(vmemmap_start_pfn);
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 05/17] riscv/mm: Align vmemmap to maximal folio size
2026-02-04 16:50 ` David Hildenbrand (arm)
@ 2026-02-05 13:50 ` Kiryl Shutsemau
2026-02-05 13:54 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-05 13:50 UTC (permalink / raw)
To: David Hildenbrand (arm)
Cc: Andrew Morton, Muchun Song, Matthew Wilcox, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He,
Michal Hocko, Johannes Weiner, Jonathan Corbet, Huacai Chen,
WANG Xuerui, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Alexandre Ghiti, kernel-team, linux-mm, linux-kernel, linux-doc,
loongarch, linux-riscv
On Wed, Feb 04, 2026 at 05:50:23PM +0100, David Hildenbrand (arm) wrote:
> On 2/2/26 16:56, Kiryl Shutsemau wrote:
> > The upcoming change to the HugeTLB vmemmap optimization (HVO) requires
> > struct pages of the head page to be naturally aligned with regard to the
> > folio size.
> >
> > Align vmemmap to MAX_FOLIO_NR_PAGES.
>
> I think neither that statement nor the one in the patch description is
> correct?
>
> "MAX_FOLIO_NR_PAGES * sizeof(struct page)" is neither the maximum folio size
> nor MAX_FOLIO_NR_PAGES.
>
> It's the size of the memmap that a large folio could span at maximum.
>
>
> Assuming we have a 16 GiB folio, the calculation would give us
>
> 4194304 * sizeof(struct page)
>
> Which could be something like (assuming 80 bytes)
>
> 335544320
>
> -> not even a power of 2, weird? (for HVO you wouldn't care as HVO would be
> disabled, but that alignment is super weird?)
>
>
> Assuming 64 bytes, it would be a power of two (as 64 is a power of two).
>
> 268435456 (1<< 28)
>
>
> Which makes me wonder whether there is a way to avoid sizeof(struct page)
> here completely.
I don't think we can. See the other thread.
What about using roundup_pow_of_two(sizeof(struct page)) here.
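Something along these lines for the riscv define (sketch only, assuming
roundup_pow_of_two() from <linux/log2.h> is usable in this expression):
	#define VMEMMAP_ADDR_ALIGN	max(1ULL << SECTION_SIZE_BITS, \
					    MAX_FOLIO_NR_PAGES * \
					    roundup_pow_of_two(sizeof(struct page)))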
> Or limit the alignment to the case where HVO is actually active and
> sizeof(struct page) makes any sense?
The annoying part of HVO is that it is unknown at compile-time if it
will be used. You can compile a kernel with HVO that will not be activated
due to non-power-of-2 sizeof(struct page) because of a debug config option.
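Roughly the kind of runtime guard I have in mind, as a sketch only (not quoting the
actual check in hugetlb_vmemmap):
	if (!is_power_of_2(sizeof(struct page))) {
		/* e.g. a debug option bloated struct page: keep HVO disabled */
		return;
	}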
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 05/17] riscv/mm: Align vmemmap to maximal folio size
2026-02-05 13:50 ` Kiryl Shutsemau
@ 2026-02-05 13:54 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-05 13:54 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Andrew Morton, Muchun Song, Matthew Wilcox, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He,
Michal Hocko, Johannes Weiner, Jonathan Corbet, Huacai Chen,
WANG Xuerui, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Alexandre Ghiti, kernel-team, linux-mm, linux-kernel, linux-doc,
loongarch, linux-riscv
On 2/5/26 14:50, Kiryl Shutsemau wrote:
> On Wed, Feb 04, 2026 at 05:50:23PM +0100, David Hildenbrand (arm) wrote:
>> On 2/2/26 16:56, Kiryl Shutsemau wrote:
>>> The upcoming change to the HugeTLB vmemmap optimization (HVO) requires
>>> struct pages of the head page to be naturally aligned with regard to the
>>> folio size.
>>>
>>> Align vmemmap to MAX_FOLIO_NR_PAGES.
>>
>> I think neither that statement nor the one in the patch description is
>> correct?
>>
>> "MAX_FOLIO_NR_PAGES * sizeof(struct page)" is neither the maximum folio size
>> nor MAX_FOLIO_NR_PAGES.
>>
>> It's the size of the memmap that a large folio could span at maximum.
>>
>>
>> Assuming we have a 16 GiB folio, the calculation would give us
>>
>> 4194304 * sizeof(struct page)
>>
>> Which could be something like (assuming 80 bytes)
>>
>> 335544320
>>
>> -> not even a power of 2, weird? (for HVO you wouldn't care as HVO would be
>> disabled, but that alignment is super weird?)
>>
>>
>> Assuming 64 bytes, it would be a power of two (as 64 is a power of two).
>>
>> 268435456 (1<< 28)
>>
>>
>> Which makes me wonder whether there is a way to avoid sizeof(struct page)
>> here completely.
>
> I don't think we can. See the other thread.
Agreed. You could only go for something larger (like PAGE_SIZE).
>
> What about using roundup_pow_of_two(sizeof(struct page)) here.
Better I think.
>
>> Or limit the alignment to the case where HVO is actually active and
>> sizeof(struct page) makes any sense?
>
> The annoying part of HVO is that it is unknown at compile-time if it
> will be used. You can compile a kernel with HVO that will not be activated
> due to non-power-of-2 sizeof(struct page) because of a debug config option.
Ah, and now I remember that sizeof cannot be used in macros, damnit.
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 06/17] LoongArch/mm: Align vmemmap to maximal folio size
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
` (4 preceding siblings ...)
2026-02-02 15:56 ` [PATCHv6 05/17] riscv/mm: Align vmemmap to maximal folio size Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-04 16:56 ` David Hildenbrand (arm)
2026-02-02 15:56 ` [PATCHv6 07/17] mm: Rework compound_head() for power-of-2 sizeof(struct page) Kiryl Shutsemau
` (10 subsequent siblings)
16 siblings, 1 reply; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
The upcoming change to the HugeTLB vmemmap optimization (HVO) requires
struct pages of the head page to be naturally aligned with regard to the
folio size.
Align vmemmap to MAX_FOLIO_NR_PAGES.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
---
arch/loongarch/include/asm/pgtable.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index c33b3bcb733e..f9416acb9156 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -113,7 +113,8 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
min(PTRS_PER_PGD * PTRS_PER_PUD * PTRS_PER_PMD * PTRS_PER_PTE * PAGE_SIZE, (1UL << cpu_vabits) / 2) - PMD_SIZE - VMEMMAP_SIZE - KFENCE_AREA_SIZE)
#endif
-#define vmemmap ((struct page *)((VMALLOC_END + PMD_SIZE) & PMD_MASK))
+#define VMEMMAP_ALIGN max(PMD_SIZE, MAX_FOLIO_NR_PAGES * sizeof(struct page))
+#define vmemmap ((struct page *)(ALIGN(VMALLOC_END, VMEMMAP_ALIGN)))
#define VMEMMAP_END ((unsigned long)vmemmap + VMEMMAP_SIZE - 1)
#define KFENCE_AREA_START (VMEMMAP_END + 1)
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 06/17] LoongArch/mm: Align vmemmap to maximal folio size
2026-02-02 15:56 ` [PATCHv6 06/17] LoongArch/mm: " Kiryl Shutsemau
@ 2026-02-04 16:56 ` David Hildenbrand (arm)
2026-02-05 12:56 ` David Hildenbrand (Arm)
2026-02-05 13:52 ` Kiryl Shutsemau
0 siblings, 2 replies; 67+ messages in thread
From: David Hildenbrand (arm) @ 2026-02-04 16:56 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> The upcoming change to the HugeTLB vmemmap optimization (HVO) requires
> struct pages of the head page to be naturally aligned with regard to the
> folio size.
>
> Align vmemmap to MAX_FOLIO_NR_PAGES.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> ---
> arch/loongarch/include/asm/pgtable.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
> index c33b3bcb733e..f9416acb9156 100644
> --- a/arch/loongarch/include/asm/pgtable.h
> +++ b/arch/loongarch/include/asm/pgtable.h
> @@ -113,7 +113,8 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
> min(PTRS_PER_PGD * PTRS_PER_PUD * PTRS_PER_PMD * PTRS_PER_PTE * PAGE_SIZE, (1UL << cpu_vabits) / 2) - PMD_SIZE - VMEMMAP_SIZE - KFENCE_AREA_SIZE)
> #endif
>
> -#define vmemmap ((struct page *)((VMALLOC_END + PMD_SIZE) & PMD_MASK))
> +#define VMEMMAP_ALIGN max(PMD_SIZE, MAX_FOLIO_NR_PAGES * sizeof(struct page))
> +#define vmemmap ((struct page *)(ALIGN(VMALLOC_END, VMEMMAP_ALIGN)))
Same comment, the "MAX_FOLIO_NR_PAGES * sizeof(struct page)" is just black magic here
and the description of the situation is wrong.
Maybe you want to pull the magic "MAX_FOLIO_NR_PAGES * sizeof(struct page)" into the core and call it
#define MAX_FOLIO_VMEMMAP_ALIGN (MAX_FOLIO_NR_PAGES * sizeof(struct page))
But then special-case it based on (a) HVO being configured in and (b) HVO being possible
#ifdef HUGETLB_PAGE_OPTIMIZE_VMEMMAP && is_power_of_2(sizeof(struct page)
/* A very helpful comment explaining the situation. */
#define MAX_FOLIO_VMEMMAP_ALIGN (MAX_FOLIO_NR_PAGES * sizeof(struct page))
#else
#define MAX_FOLIO_VMEMMAP_ALIGN 0
#endif
Something like that.
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 06/17] LoongArch/mm: Align vmemmap to maximal folio size
2026-02-04 16:56 ` David Hildenbrand (arm)
@ 2026-02-05 12:56 ` David Hildenbrand (Arm)
2026-02-05 13:43 ` Kiryl Shutsemau
2026-02-05 13:52 ` Kiryl Shutsemau
1 sibling, 1 reply; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-05 12:56 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/4/26 17:56, David Hildenbrand (arm) wrote:
> On 2/2/26 16:56, Kiryl Shutsemau wrote:
>> The upcoming change to the HugeTLB vmemmap optimization (HVO) requires
>> struct pages of the head page to be naturally aligned with regard to the
>> folio size.
>>
>> Align vmemmap to MAX_FOLIO_NR_PAGES.
>>
>> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
>> ---
>> arch/loongarch/include/asm/pgtable.h | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/
>> include/asm/pgtable.h
>> index c33b3bcb733e..f9416acb9156 100644
>> --- a/arch/loongarch/include/asm/pgtable.h
>> +++ b/arch/loongarch/include/asm/pgtable.h
>> @@ -113,7 +113,8 @@ extern unsigned long empty_zero_page[PAGE_SIZE /
>> sizeof(unsigned long)];
>> min(PTRS_PER_PGD * PTRS_PER_PUD * PTRS_PER_PMD * PTRS_PER_PTE *
>> PAGE_SIZE, (1UL << cpu_vabits) / 2) - PMD_SIZE - VMEMMAP_SIZE -
>> KFENCE_AREA_SIZE)
>> #endif
>> -#define vmemmap ((struct page *)((VMALLOC_END + PMD_SIZE) &
>> PMD_MASK))
>> +#define VMEMMAP_ALIGN max(PMD_SIZE, MAX_FOLIO_NR_PAGES *
>> sizeof(struct page))
>> +#define vmemmap ((struct page *)(ALIGN(VMALLOC_END,
>> VMEMMAP_ALIGN)))
>
>
> Same comment, the "MAX_FOLIO_NR_PAGES * sizeof(struct page)" is just
> black magic here
> and the description of the situation is wrong.
>
> Maybe you want to pull the magic "MAX_FOLIO_NR_PAGES * sizeof(struct
> page)" into the core and call it
>
> #define MAX_FOLIO_VMEMMAP_ALIGN (MAX_FOLIO_NR_PAGES * sizeof(struct
> page))
>
> But then special case it base on (a) HVO being configured in an (b) HVO
> being possible
>
> #ifdef HUGETLB_PAGE_OPTIMIZE_VMEMMAP && is_power_of_2(sizeof(struct page)
> /* A very helpful comment explaining the situation. */
> #define MAX_FOLIO_VMEMMAP_ALIGN (MAX_FOLIO_NR_PAGES * sizeof(struct
> page))
> #else
> #define MAX_FOLIO_VMEMMAP_ALIGN 0
> #endif
>
> Something like that.
>
Thinking about this ...
the vmemmap start is always struct-page-aligned. Otherwise we'd be in
trouble already.
Isn't it then sufficient to just align the start to MAX_FOLIO_NR_PAGES?
Let's assume sizeof(struct page) == 64 and MAX_FOLIO_NR_PAGES = 512 for
simplicity.
vmemmap start would be multiples of 512 (0x0010000000).
512, 1024, 1536, 2048 ...
Assume we have a 256-page folio at 1536+256 = 0x111000000
Assume we have the last page of that folio (0x011111111111), we would
just get to the start of that folio by AND-ing with ~(256-1).
Which case am I ignoring?
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 06/17] LoongArch/mm: Align vmemmap to maximal folio size
2026-02-05 12:56 ` David Hildenbrand (Arm)
@ 2026-02-05 13:43 ` Kiryl Shutsemau
2026-02-05 13:52 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-05 13:43 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Andrew Morton, Muchun Song, Matthew Wilcox, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He,
Michal Hocko, Johannes Weiner, Jonathan Corbet, Huacai Chen,
WANG Xuerui, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Alexandre Ghiti, kernel-team, linux-mm, linux-kernel, linux-doc,
loongarch, linux-riscv
On Thu, Feb 05, 2026 at 01:56:36PM +0100, David Hildenbrand (Arm) wrote:
> On 2/4/26 17:56, David Hildenbrand (arm) wrote:
> > On 2/2/26 16:56, Kiryl Shutsemau wrote:
> > > The upcoming change to the HugeTLB vmemmap optimization (HVO) requires
> > > struct pages of the head page to be naturally aligned with regard to the
> > > folio size.
> > >
> > > Align vmemmap to MAX_FOLIO_NR_PAGES.
> > >
> > > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > > ---
> > > arch/loongarch/include/asm/pgtable.h | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/
> > > include/asm/pgtable.h
> > > index c33b3bcb733e..f9416acb9156 100644
> > > --- a/arch/loongarch/include/asm/pgtable.h
> > > +++ b/arch/loongarch/include/asm/pgtable.h
> > > @@ -113,7 +113,8 @@ extern unsigned long empty_zero_page[PAGE_SIZE /
> > > sizeof(unsigned long)];
> > > min(PTRS_PER_PGD * PTRS_PER_PUD * PTRS_PER_PMD * PTRS_PER_PTE
> > > * PAGE_SIZE, (1UL << cpu_vabits) / 2) - PMD_SIZE - VMEMMAP_SIZE -
> > > KFENCE_AREA_SIZE)
> > > #endif
> > > -#define vmemmap ((struct page *)((VMALLOC_END + PMD_SIZE) &
> > > PMD_MASK))
> > > +#define VMEMMAP_ALIGN max(PMD_SIZE, MAX_FOLIO_NR_PAGES *
> > > sizeof(struct page))
> > > +#define vmemmap ((struct page *)(ALIGN(VMALLOC_END,
> > > VMEMMAP_ALIGN)))
> >
> >
> > Same comment, the "MAX_FOLIO_NR_PAGES * sizeof(struct page)" is just
> > black magic here
> > and the description of the situation is wrong.
> >
> > Maybe you want to pull the magic "MAX_FOLIO_NR_PAGES * sizeof(struct
> > page)" into the core and call it
> >
> > #define MAX_FOLIO_VMEMMAP_ALIGN (MAX_FOLIO_NR_PAGES * sizeof(struct
> > page))
> >
> > But then special case it base on (a) HVO being configured in an (b) HVO
> > being possible
> >
> > #ifdef HUGETLB_PAGE_OPTIMIZE_VMEMMAP && is_power_of_2(sizeof(struct page)
> > /* A very helpful comment explaining the situation. */
> > #define MAX_FOLIO_VMEMMAP_ALIGN (MAX_FOLIO_NR_PAGES * sizeof(struct
> > page))
> > #else
> > #define MAX_FOLIO_VMEMMAP_ALIGN 0
> > #endif
> >
> > Something like that.
> >
>
> Thinking about this ...
>
> the vmemmap start is always struct-page-aligned. Otherwise we'd be in
> trouble already.
>
> Isn't it then sufficient to just align the start to MAX_FOLIO_NR_PAGES?
>
> Let's assume sizeof(struct page) == 64 and MAX_FOLIO_NR_PAGES = 512 for
> simplicity.
>
> vmemmap start would be multiples of 512 (0x0010000000).
>
> 512, 1024, 1536, 2048 ...
>
> Assume we have an 256-pages folio at 1536+256 = 0x111000000
s/0x/0b/, but okay.
> Assume we have the last page of that folio (0x011111111111), we would just
> get to the start of that folio by AND-ing with ~(256-1).
>
> Which case am I ignoring?
IIUC, you are ignoring the actual size of struct page. It is not 1 byte :P
The last page of this 256-page folio is at 1536+256 + (64 * 255) which
is 0b100011011000000. There's no mask that you can AND that gets you to
0b11100000000.
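To spell the same thing out in byte addresses (same assumptions: 64-byte
struct page, 256-page folio at page index 1792; illustration only): if the
vmemmap base is aligned to 256 * 64 bytes, the head's struct page sits at
byte offset 1792 * 64 = 0x1c000, a multiple of 256 * 64 = 0x4000. The last
tail's struct page is at 0x1c000 + 255 * 64 = 0x1ffc0, and
0x1ffc0 & ~(256 * 64 - 1) = 0x1c000 gets back to the head. So the mask has
to cover sizeof(struct page) as well, i.e. the vmemmap must be aligned to
MAX_FOLIO_NR_PAGES * sizeof(struct page) bytes, not just to
MAX_FOLIO_NR_PAGES.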
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 06/17] LoongArch/mm: Align vmemmap to maximal folio size
2026-02-05 13:43 ` Kiryl Shutsemau
@ 2026-02-05 13:52 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-05 13:52 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Andrew Morton, Muchun Song, Matthew Wilcox, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He,
Michal Hocko, Johannes Weiner, Jonathan Corbet, Huacai Chen,
WANG Xuerui, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Alexandre Ghiti, kernel-team, linux-mm, linux-kernel, linux-doc,
loongarch, linux-riscv
On 2/5/26 14:43, Kiryl Shutsemau wrote:
> On Thu, Feb 05, 2026 at 01:56:36PM +0100, David Hildenbrand (Arm) wrote:
>> On 2/4/26 17:56, David Hildenbrand (arm) wrote:
>>>
>>>
>>> Same comment, the "MAX_FOLIO_NR_PAGES * sizeof(struct page)" is just
>>> black magic here
>>> and the description of the situation is wrong.
>>>
>>> Maybe you want to pull the magic "MAX_FOLIO_NR_PAGES * sizeof(struct
>>> page)" into the core and call it
>>>
>>> #define MAX_FOLIO_VMEMMAP_ALIGN (MAX_FOLIO_NR_PAGES * sizeof(struct
>>> page))
>>>
>>> But then special case it base on (a) HVO being configured in an (b) HVO
>>> being possible
>>>
>>> #ifdef HUGETLB_PAGE_OPTIMIZE_VMEMMAP && is_power_of_2(sizeof(struct page)
>>> /* A very helpful comment explaining the situation. */
>>> #define MAX_FOLIO_VMEMMAP_ALIGN (MAX_FOLIO_NR_PAGES * sizeof(struct
>>> page))
>>> #else
>>> #define MAX_FOLIO_VMEMMAP_ALIGN 0
>>> #endif
>>>
>>> Something like that.
>>>
>>
>> Thinking about this ...
>>
>> the vmemmap start is always struct-page-aligned. Otherwise we'd be in
>> trouble already.
>>
>> Isn't it then sufficient to just align the start to MAX_FOLIO_NR_PAGES?
>>
>> Let's assume sizeof(struct page) == 64 and MAX_FOLIO_NR_PAGES = 512 for
>> simplicity.
>>
>> vmemmap start would be multiples of 512 (0x0010000000).
>>
>> 512, 1024, 1536, 2048 ...
>>
>> Assume we have an 256-pages folio at 1536+256 = 0x111000000
>
> s/0x/0b/, but okay.
:)
>
>> Assume we have the last page of that folio (0x011111111111), we would just
>> get to the start of that folio by AND-ing with ~(256-1).
>>
>> Which case am I ignoring?
>
> IIUC, you are ignoring the actual size of struct page. It is not 1 byte :P
I thought it wouldn't matter but, yeah, that's it.
"Align the vmemmap to the maximum folio metadata size" it is.
Then you can explain the situation also alongside
MAX_FOLIO_VMEMMAP_ALIGN, and that we expect this to be a power of 2.
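For instance (rough sketch, comment wording to be improved):

/*
 * HVO shares tail struct pages by masking a tail's virtual address down to
 * the head's. That requires the head's struct pages to be naturally aligned
 * in the vmemmap, so this alignment must be a power of 2, which it is
 * whenever sizeof(struct page) is a power of 2.
 */
#define MAX_FOLIO_VMEMMAP_ALIGN	(MAX_FOLIO_NR_PAGES * sizeof(struct page))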
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 06/17] LoongArch/mm: Align vmemmap to maximal folio size
2026-02-04 16:56 ` David Hildenbrand (arm)
2026-02-05 12:56 ` David Hildenbrand (Arm)
@ 2026-02-05 13:52 ` Kiryl Shutsemau
2026-02-05 13:57 ` David Hildenbrand (Arm)
1 sibling, 1 reply; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-05 13:52 UTC (permalink / raw)
To: David Hildenbrand (arm)
Cc: Andrew Morton, Muchun Song, Matthew Wilcox, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He,
Michal Hocko, Johannes Weiner, Jonathan Corbet, Huacai Chen,
WANG Xuerui, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Alexandre Ghiti, kernel-team, linux-mm, linux-kernel, linux-doc,
loongarch, linux-riscv
On Wed, Feb 04, 2026 at 05:56:45PM +0100, David Hildenbrand (arm) wrote:
> On 2/2/26 16:56, Kiryl Shutsemau wrote:
> > The upcoming change to the HugeTLB vmemmap optimization (HVO) requires
> > struct pages of the head page to be naturally aligned with regard to the
> > folio size.
> >
> > Align vmemmap to MAX_FOLIO_NR_PAGES.
> >
> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > ---
> > arch/loongarch/include/asm/pgtable.h | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
> > index c33b3bcb733e..f9416acb9156 100644
> > --- a/arch/loongarch/include/asm/pgtable.h
> > +++ b/arch/loongarch/include/asm/pgtable.h
> > @@ -113,7 +113,8 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
> > min(PTRS_PER_PGD * PTRS_PER_PUD * PTRS_PER_PMD * PTRS_PER_PTE * PAGE_SIZE, (1UL << cpu_vabits) / 2) - PMD_SIZE - VMEMMAP_SIZE - KFENCE_AREA_SIZE)
> > #endif
> > -#define vmemmap ((struct page *)((VMALLOC_END + PMD_SIZE) & PMD_MASK))
> > +#define VMEMMAP_ALIGN max(PMD_SIZE, MAX_FOLIO_NR_PAGES * sizeof(struct page))
> > +#define vmemmap ((struct page *)(ALIGN(VMALLOC_END, VMEMMAP_ALIGN)))
>
>
> Same comment, the "MAX_FOLIO_NR_PAGES * sizeof(struct page)" is just black magic here
> and the description of the situation is wrong.
>
> Maybe you want to pull the magic "MAX_FOLIO_NR_PAGES * sizeof(struct page)" into the core and call it
>
> #define MAX_FOLIO_VMEMMAP_ALIGN (MAX_FOLIO_NR_PAGES * sizeof(struct page))
>
> But then special case it base on (a) HVO being configured in an (b) HVO being possible
>
> #ifdef HUGETLB_PAGE_OPTIMIZE_VMEMMAP && is_power_of_2(sizeof(struct page)
This would require some kind of asm-offsets.c/bounds.c magic to pull the
struct page size condition to the preprocessor level.
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 06/17] LoongArch/mm: Align vmemmap to maximal folio size
2026-02-05 13:52 ` Kiryl Shutsemau
@ 2026-02-05 13:57 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-05 13:57 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Andrew Morton, Muchun Song, Matthew Wilcox, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He,
Michal Hocko, Johannes Weiner, Jonathan Corbet, Huacai Chen,
WANG Xuerui, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Alexandre Ghiti, kernel-team, linux-mm, linux-kernel, linux-doc,
loongarch, linux-riscv
On 2/5/26 14:52, Kiryl Shutsemau wrote:
> On Wed, Feb 04, 2026 at 05:56:45PM +0100, David Hildenbrand (arm) wrote:
>> On 2/2/26 16:56, Kiryl Shutsemau wrote:
>>> The upcoming change to the HugeTLB vmemmap optimization (HVO) requires
>>> struct pages of the head page to be naturally aligned with regard to the
>>> folio size.
>>>
>>> Align vmemmap to MAX_FOLIO_NR_PAGES.
>>>
>>> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
>>> ---
>>> arch/loongarch/include/asm/pgtable.h | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
>>> index c33b3bcb733e..f9416acb9156 100644
>>> --- a/arch/loongarch/include/asm/pgtable.h
>>> +++ b/arch/loongarch/include/asm/pgtable.h
>>> @@ -113,7 +113,8 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
>>> min(PTRS_PER_PGD * PTRS_PER_PUD * PTRS_PER_PMD * PTRS_PER_PTE * PAGE_SIZE, (1UL << cpu_vabits) / 2) - PMD_SIZE - VMEMMAP_SIZE - KFENCE_AREA_SIZE)
>>> #endif
>>> -#define vmemmap ((struct page *)((VMALLOC_END + PMD_SIZE) & PMD_MASK))
>>> +#define VMEMMAP_ALIGN max(PMD_SIZE, MAX_FOLIO_NR_PAGES * sizeof(struct page))
>>> +#define vmemmap ((struct page *)(ALIGN(VMALLOC_END, VMEMMAP_ALIGN)))
>>
>>
>> Same comment, the "MAX_FOLIO_NR_PAGES * sizeof(struct page)" is just black magic here
>> and the description of the situation is wrong.
>>
>> Maybe you want to pull the magic "MAX_FOLIO_NR_PAGES * sizeof(struct page)" into the core and call it
>>
>> #define MAX_FOLIO_VMEMMAP_ALIGN (MAX_FOLIO_NR_PAGES * sizeof(struct page))
>>
>> But then special case it base on (a) HVO being configured in an (b) HVO being possible
>>
>> #ifdef HUGETLB_PAGE_OPTIMIZE_VMEMMAP && is_power_of_2(sizeof(struct page)
>
> This would require some kind of asm-offsets.c/bounds.c magic to pull the
> struct page size condition to the preprocessor level.
>
Right.
I guess you could move that into the macro and let the compiler handle it.
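E.g. (untested sketch; relies on the compiler constant-folding
is_power_of_2() rather than on the preprocessor):

#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
#define MAX_FOLIO_VMEMMAP_ALIGN \
	(is_power_of_2(sizeof(struct page)) ? \
	 MAX_FOLIO_NR_PAGES * sizeof(struct page) : 0UL)
#else
#define MAX_FOLIO_VMEMMAP_ALIGN	0UL
#endif

with the arch side then doing something like

#define VMEMMAP_ALIGN	max(PMD_SIZE, MAX_FOLIO_VMEMMAP_ALIGN)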
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 07/17] mm: Rework compound_head() for power-of-2 sizeof(struct page)
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
` (5 preceding siblings ...)
2026-02-02 15:56 ` [PATCHv6 06/17] LoongArch/mm: " Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-05 14:09 ` David Hildenbrand (Arm)
` (2 more replies)
2026-02-02 15:56 ` [PATCHv6 08/17] mm: Make page_zonenum() use head page Kiryl Shutsemau
` (9 subsequent siblings)
16 siblings, 3 replies; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
For tail pages, the kernel uses the 'compound_info' field to get to the
head page. The bit 0 of the field indicates whether the page is a
tail page, and if set, the remaining bits represent a pointer to the
head page.
For cases when size of struct page is power-of-2, change the encoding of
compound_info to store a mask that can be applied to the virtual address
of the tail page in order to access the head page. It is possible
because struct page of the head page is naturally aligned with regard
to the order of the page.
The significant impact of this modification is that all tail pages of
the same order will now have identical 'compound_info', regardless of
the compound page they are associated with. This paves the way for
eliminating fake heads.
The HugeTLB Vmemmap Optimization (HVO) creates fake heads and it is only
applied when the sizeof(struct page) is power-of-2. Having identical
tail pages allows the same page to be mapped into the vmemmap of all
pages, maintaining memory savings without fake heads.
If sizeof(struct page) is not power-of-2, there are no functional
changes.
Limit mask usage to HugeTLB vmemmap optimization (HVO) where it makes
a difference. The approach with mask would work in the wider set of
conditions, but it requires validating that struct pages are naturally
aligned for all orders up to the MAX_FOLIO_ORDER, which can be tricky.
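As an illustration (assuming a 64-byte struct page and an order-9 folio;
the numbers are not part of the patch):

  shift = order + order_base_2(sizeof(struct page)) = 9 + 6 = 15
  mask  = GENMASK(BITS_PER_LONG - 1, 15)

The folio's struct pages span 512 * 64 bytes = 32 KiB of vmemmap and the
head is naturally aligned to that, so for any of its tails

  (unsigned long)tail_page & mask == (unsigned long)head_page

and the stored compound_info (mask | 1) is identical for every order-9
tail page.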
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Zi Yan <ziy@nvidia.com>
---
include/linux/page-flags.h | 81 ++++++++++++++++++++++++++++++++++----
mm/slab.h | 16 ++++++--
mm/util.c | 16 ++++++--
3 files changed, 97 insertions(+), 16 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d14a17ffb55b..8f2c7fbc739b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -198,6 +198,29 @@ enum pageflags {
#ifndef __GENERATING_BOUNDS_H
+/*
+ * For tail pages, if the size of struct page is power-of-2 ->compound_info
+ * encodes the mask that converts the address of the tail page address to
+ * the head page address.
+ *
+ * Otherwise, ->compound_info has direct pointer to head pages.
+ */
+static __always_inline bool compound_info_has_mask(void)
+{
+ /*
+ * Limit mask usage to HugeTLB vmemmap optimization (HVO) where it
+ * makes a difference.
+ *
+ * The approach with mask would work in the wider set of conditions,
+ * but it requires validating that struct pages are naturally aligned
+ * for all orders up to the MAX_FOLIO_ORDER, which can be tricky.
+ */
+ if (!IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP))
+ return false;
+
+ return is_power_of_2(sizeof(struct page));
+}
+
#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
@@ -210,6 +233,10 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key))
return page;
+ /* Fake heads only exists if compound_info_has_mask() is true */
+ if (!compound_info_has_mask())
+ return page;
+
/*
* Only addresses aligned with PAGE_SIZE of struct page may be fake head
* struct page. The alignment check aims to avoid access the fields (
@@ -223,10 +250,14 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
* because the @page is a compound page composed with at least
* two contiguous pages.
*/
- unsigned long head = READ_ONCE(page[1].compound_info);
+ unsigned long info = READ_ONCE(page[1].compound_info);
- if (likely(head & 1))
- return (const struct page *)(head - 1);
+ /* See set_compound_head() */
+ if (likely(info & 1)) {
+ unsigned long p = (unsigned long)page;
+
+ return (const struct page *)(p & info);
+ }
}
return page;
}
@@ -281,11 +312,26 @@ static __always_inline int page_is_fake_head(const struct page *page)
static __always_inline unsigned long _compound_head(const struct page *page)
{
- unsigned long head = READ_ONCE(page->compound_info);
+ unsigned long info = READ_ONCE(page->compound_info);
- if (unlikely(head & 1))
- return head - 1;
- return (unsigned long)page_fixed_fake_head(page);
+ /* Bit 0 encodes PageTail() */
+ if (!(info & 1))
+ return (unsigned long)page_fixed_fake_head(page);
+
+ /*
+ * If compound_info_has_mask() is false, the rest of compound_info is
+ * the pointer to the head page.
+ */
+ if (!compound_info_has_mask())
+ return info - 1;
+
+ /*
+ * If compoun_info_has_mask() is true the rest of the info encodes
+ * the mask that converts the address of the tail page to the head page.
+ *
+ * No need to clear bit 0 in the mask as 'page' always has it clear.
+ */
+ return (unsigned long)page & info;
}
#define compound_head(page) ((typeof(page))_compound_head(page))
@@ -294,7 +340,26 @@ static __always_inline void set_compound_head(struct page *page,
const struct page *head,
unsigned int order)
{
- WRITE_ONCE(page->compound_info, (unsigned long)head + 1);
+ unsigned int shift;
+ unsigned long mask;
+
+ if (!compound_info_has_mask()) {
+ WRITE_ONCE(page->compound_info, (unsigned long)head | 1);
+ return;
+ }
+
+ /*
+ * If the size of struct page is power-of-2, bits [shift:0] of the
+ * virtual address of compound head are zero.
+ *
+ * Calculate mask that can be applied to the virtual address of
+ * the tail page to get address of the head page.
+ */
+ shift = order + order_base_2(sizeof(struct page));
+ mask = GENMASK(BITS_PER_LONG - 1, shift);
+
+ /* Bit 0 encodes PageTail() */
+ WRITE_ONCE(page->compound_info, mask | 1);
}
static __always_inline void clear_compound_head(struct page *page)
diff --git a/mm/slab.h b/mm/slab.h
index 8a2a9c6c697b..f68c3ac8126f 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -137,11 +137,19 @@ static_assert(IS_ALIGNED(offsetof(struct slab, freelist), sizeof(struct freelist
*/
static inline struct slab *page_slab(const struct page *page)
{
- unsigned long head;
+ unsigned long info;
+
+ info = READ_ONCE(page->compound_info);
+ if (info & 1) {
+ /* See compound_head() */
+ if (compound_info_has_mask()) {
+ unsigned long p = (unsigned long)page;
+ page = (struct page *)(p & info);
+ } else {
+ page = (struct page *)(info - 1);
+ }
+ }
- head = READ_ONCE(page->compound_head);
- if (head & 1)
- page = (struct page *)(head - 1);
if (data_race(page->page_type >> 24) != PGTY_slab)
page = NULL;
diff --git a/mm/util.c b/mm/util.c
index 3ebcb9e6035c..20dccf2881d7 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1237,7 +1237,7 @@ static void set_ps_flags(struct page_snapshot *ps, const struct folio *folio,
*/
void snapshot_page(struct page_snapshot *ps, const struct page *page)
{
- unsigned long head, nr_pages = 1;
+ unsigned long info, nr_pages = 1;
struct folio *foliop;
int loops = 5;
@@ -1247,8 +1247,8 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
again:
memset(&ps->folio_snapshot, 0, sizeof(struct folio));
memcpy(&ps->page_snapshot, page, sizeof(*page));
- head = ps->page_snapshot.compound_info;
- if ((head & 1) == 0) {
+ info = ps->page_snapshot.compound_info;
+ if (!(info & 1)) {
ps->idx = 0;
foliop = (struct folio *)&ps->page_snapshot;
if (!folio_test_large(foliop)) {
@@ -1259,7 +1259,15 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
}
foliop = (struct folio *)page;
} else {
- foliop = (struct folio *)(head - 1);
+ /* See compound_head() */
+ if (compound_info_has_mask()) {
+ unsigned long p = (unsigned long)page;
+
+ foliop = (struct folio *)(p & info);
+ } else {
+ foliop = (struct folio *)(info - 1);
+ }
+
ps->idx = folio_page_idx(foliop, page);
}
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 07/17] mm: Rework compound_head() for power-of-2 sizeof(struct page)
2026-02-02 15:56 ` [PATCHv6 07/17] mm: Rework compound_head() for power-of-2 sizeof(struct page) Kiryl Shutsemau
@ 2026-02-05 14:09 ` David Hildenbrand (Arm)
2026-02-07 20:19 ` Usama Arif
2026-02-10 15:40 ` Vlastimil Babka
2 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-05 14:09 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> For tail pages, the kernel uses the 'compound_info' field to get to the
> head page. The bit 0 of the field indicates whether the page is a
> tail page, and if set, the remaining bits represent a pointer to the
> head page.
>
> For cases when size of struct page is power-of-2, change the encoding of
> compound_info to store a mask that can be applied to the virtual address
> of the tail page in order to access the head page. It is possible
> because struct page of the head page is naturally aligned with regards
> to order of the page.
>
> The significant impact of this modification is that all tail pages of
> the same order will now have identical 'compound_info', regardless of
> the compound page they are associated with. This paves the way for
> eliminating fake heads.
>
> The HugeTLB Vmemmap Optimization (HVO) creates fake heads and it is only
> applied when the sizeof(struct page) is power-of-2. Having identical
> tail pages allows the same page to be mapped into the vmemmap of all
> pages, maintaining memory savings without fake heads.
>
> If sizeof(struct page) is not power-of-2, there is no functional
> changes.
>
> Limit mask usage to HugeTLB vmemmap optimization (HVO) where it makes
> a difference. The approach with mask would work in the wider set of
> conditions, but it requires validating that struct pages are naturally
> aligned for all orders up to the MAX_FOLIO_ORDER, which can be tricky.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
[...]
> struct folio *foliop;
> int loops = 5;
>
> @@ -1247,8 +1247,8 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
> again:
> memset(&ps->folio_snapshot, 0, sizeof(struct folio));
> memcpy(&ps->page_snapshot, page, sizeof(*page));
> - head = ps->page_snapshot.compound_info;
> - if ((head & 1) == 0) {
> + info = ps->page_snapshot.compound_info;
> + if (!(info & 1)) {
> ps->idx = 0;
> foliop = (struct folio *)&ps->page_snapshot;
> if (!folio_test_large(foliop)) {
> @@ -1259,7 +1259,15 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
> }
> foliop = (struct folio *)page;
> } else {
> - foliop = (struct folio *)(head - 1);
> + /* See compound_head() */
> + if (compound_info_has_mask()) {
> + unsigned long p = (unsigned long)page;
> +
> + foliop = (struct folio *)(p & info);
IIUC, we don't care about clearing bit0 before the & as the page pointer
shouldn't have set it in the first place.
Pretty neat
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 07/17] mm: Rework compound_head() for power-of-2 sizeof(struct page)
2026-02-02 15:56 ` [PATCHv6 07/17] mm: Rework compound_head() for power-of-2 sizeof(struct page) Kiryl Shutsemau
2026-02-05 14:09 ` David Hildenbrand (Arm)
@ 2026-02-07 20:19 ` Usama Arif
2026-02-10 15:40 ` Vlastimil Babka
2 siblings, 0 replies; 67+ messages in thread
From: Usama Arif @ 2026-02-07 20:19 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, David Hildenbrand,
Matthew Wilcox, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 02/02/2026 15:56, Kiryl Shutsemau wrote:
> For tail pages, the kernel uses the 'compound_info' field to get to the
> head page. The bit 0 of the field indicates whether the page is a
> tail page, and if set, the remaining bits represent a pointer to the
> head page.
>
> For cases when size of struct page is power-of-2, change the encoding of
> compound_info to store a mask that can be applied to the virtual address
> of the tail page in order to access the head page. It is possible
> because struct page of the head page is naturally aligned with regards
> to order of the page.
>
> The significant impact of this modification is that all tail pages of
> the same order will now have identical 'compound_info', regardless of
> the compound page they are associated with. This paves the way for
> eliminating fake heads.
>
> The HugeTLB Vmemmap Optimization (HVO) creates fake heads and it is only
> applied when the sizeof(struct page) is power-of-2. Having identical
> tail pages allows the same page to be mapped into the vmemmap of all
> pages, maintaining memory savings without fake heads.
>
> If sizeof(struct page) is not power-of-2, there is no functional
> changes.
>
> Limit mask usage to HugeTLB vmemmap optimization (HVO) where it makes
> a difference. The approach with mask would work in the wider set of
> conditions, but it requires validating that struct pages are naturally
> aligned for all orders up to the MAX_FOLIO_ORDER, which can be tricky.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> ---
Acked-by: Usama Arif <usamaarif642@gmail.com>
Small nit below:
> include/linux/page-flags.h | 81 ++++++++++++++++++++++++++++++++++----
> mm/slab.h | 16 ++++++--
> mm/util.c | 16 ++++++--
> 3 files changed, 97 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index d14a17ffb55b..8f2c7fbc739b 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -198,6 +198,29 @@ enum pageflags {
>
> #ifndef __GENERATING_BOUNDS_H
>
> +/*
> + * For tail pages, if the size of struct page is power-of-2 ->compound_info
> + * encodes the mask that converts the address of the tail page address to
> + * the head page address.
> + *
> + * Otherwise, ->compound_info has direct pointer to head pages.
> + */
> +static __always_inline bool compound_info_has_mask(void)
> +{
> + /*
> + * Limit mask usage to HugeTLB vmemmap optimization (HVO) where it
> + * makes a difference.
> + *
> + * The approach with mask would work in the wider set of conditions,
> + * but it requires validating that struct pages are naturally aligned
> + * for all orders up to the MAX_FOLIO_ORDER, which can be tricky.
> + */
> + if (!IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP))
> + return false;
> +
> + return is_power_of_2(sizeof(struct page));
> +}
> +
> #ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
> DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
>
> @@ -210,6 +233,10 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
> if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key))
> return page;
>
> + /* Fake heads only exists if compound_info_has_mask() is true */
> + if (!compound_info_has_mask())
> + return page;
> +
> /*
> * Only addresses aligned with PAGE_SIZE of struct page may be fake head
> * struct page. The alignment check aims to avoid access the fields (
> @@ -223,10 +250,14 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
> * because the @page is a compound page composed with at least
> * two contiguous pages.
> */
> - unsigned long head = READ_ONCE(page[1].compound_info);
> + unsigned long info = READ_ONCE(page[1].compound_info);
>
> - if (likely(head & 1))
> - return (const struct page *)(head - 1);
> + /* See set_compound_head() */
> + if (likely(info & 1)) {
> + unsigned long p = (unsigned long)page;
> +
> + return (const struct page *)(p & info);
> + }
> }
> return page;
> }
> @@ -281,11 +312,26 @@ static __always_inline int page_is_fake_head(const struct page *page)
>
> static __always_inline unsigned long _compound_head(const struct page *page)
> {
> - unsigned long head = READ_ONCE(page->compound_info);
> + unsigned long info = READ_ONCE(page->compound_info);
>
> - if (unlikely(head & 1))
> - return head - 1;
> - return (unsigned long)page_fixed_fake_head(page);
> + /* Bit 0 encodes PageTail() */
> + if (!(info & 1))
> + return (unsigned long)page_fixed_fake_head(page);
> +
> + /*
> + * If compound_info_has_mask() is false, the rest of compound_info is
> + * the pointer to the head page.
> + */
> + if (!compound_info_has_mask())
> + return info - 1;
> +
> + /*
> + * If compoun_info_has_mask() is true the rest of the info encodes
s/compoun_info_has_mask/compound_info_has_mask/
^ permalink raw reply [flat|nested] 67+ messages in thread* Re: [PATCHv6 07/17] mm: Rework compound_head() for power-of-2 sizeof(struct page)
2026-02-02 15:56 ` [PATCHv6 07/17] mm: Rework compound_head() for power-of-2 sizeof(struct page) Kiryl Shutsemau
2026-02-05 14:09 ` David Hildenbrand (Arm)
2026-02-07 20:19 ` Usama Arif
@ 2026-02-10 15:40 ` Vlastimil Babka
2 siblings, 0 replies; 67+ messages in thread
From: Vlastimil Babka @ 2026-02-10 15:40 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, David Hildenbrand,
Matthew Wilcox, Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Lorenzo Stoakes, Zi Yan,
Baoquan He, Michal Hocko, Johannes Weiner, Jonathan Corbet,
Huacai Chen, WANG Xuerui, Palmer Dabbelt, Paul Walmsley,
Albert Ou, Alexandre Ghiti, kernel-team, linux-mm, linux-kernel,
linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> For tail pages, the kernel uses the 'compound_info' field to get to the
> head page. The bit 0 of the field indicates whether the page is a
> tail page, and if set, the remaining bits represent a pointer to the
> head page.
>
> For cases when size of struct page is power-of-2, change the encoding of
> compound_info to store a mask that can be applied to the virtual address
> of the tail page in order to access the head page. It is possible
> because struct page of the head page is naturally aligned with regards
> to order of the page.
>
> The significant impact of this modification is that all tail pages of
> the same order will now have identical 'compound_info', regardless of
> the compound page they are associated with. This paves the way for
> eliminating fake heads.
>
> The HugeTLB Vmemmap Optimization (HVO) creates fake heads and it is only
> applied when the sizeof(struct page) is power-of-2. Having identical
> tail pages allows the same page to be mapped into the vmemmap of all
> pages, maintaining memory savings without fake heads.
>
> If sizeof(struct page) is not power-of-2, there is no functional
> changes.
>
> Limit mask usage to HugeTLB vmemmap optimization (HVO) where it makes
> a difference. The approach with mask would work in the wider set of
> conditions, but it requires validating that struct pages are naturally
> aligned for all orders up to the MAX_FOLIO_ORDER, which can be tricky.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
nit:
> ---
> include/linux/page-flags.h | 81 ++++++++++++++++++++++++++++++++++----
> mm/slab.h | 16 ++++++--
> mm/util.c | 16 ++++++--
> 3 files changed, 97 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index d14a17ffb55b..8f2c7fbc739b 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -198,6 +198,29 @@ enum pageflags {
>
> #ifndef __GENERATING_BOUNDS_H
>
> +/*
> + * For tail pages, if the size of struct page is power-of-2 ->compound_info
> + * encodes the mask that converts the address of the tail page address to
> + * the head page address.
> + *
> + * Otherwise, ->compound_info has direct pointer to head pages.
> + */
> +static __always_inline bool compound_info_has_mask(void)
> +{
> + /*
> + * Limit mask usage to HugeTLB vmemmap optimization (HVO) where it
> + * makes a difference.
> + *
> + * The approach with mask would work in the wider set of conditions,
> + * but it requires validating that struct pages are naturally aligned
> + * for all orders up to the MAX_FOLIO_ORDER, which can be tricky.
> + */
> + if (!IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP))
> + return false;
> +
> + return is_power_of_2(sizeof(struct page));
> +}
> +
> #ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
> DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
>
> @@ -210,6 +233,10 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
> if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key))
> return page;
>
> + /* Fake heads only exists if compound_info_has_mask() is true */
> + if (!compound_info_has_mask())
> + return page;
> +
Could we move this compile-time-constant test above the static branch test?
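I.e. checking in this order (fragment of page_fixed_fake_head(), for
illustration):

	/* Compile-time constant: lets the compiler drop the whole body. */
	if (!compound_info_has_mask())
		return page;

	if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key))
		return page;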
> /*
> * Only addresses aligned with PAGE_SIZE of struct page may be fake head
> * struct page. The alignment check aims to avoid access the fields (
> @@ -223,10 +250,14 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
> * because the @page is a compound page composed with at least
> * two contiguous pages.
> */
> - unsigned long head = READ_ONCE(page[1].compound_info);
> + unsigned long info = READ_ONCE(page[1].compound_info);
>
> - if (likely(head & 1))
> - return (const struct page *)(head - 1);
> + /* See set_compound_head() */
> + if (likely(info & 1)) {
> + unsigned long p = (unsigned long)page;
> +
> + return (const struct page *)(p & info);
> + }
> }
> return page;
> }
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 08/17] mm: Make page_zonenum() use head page
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
` (6 preceding siblings ...)
2026-02-02 15:56 ` [PATCHv6 07/17] mm: Rework compound_head() for power-of-2 sizeof(struct page) Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-04 3:40 ` Muchun Song
` (2 more replies)
2026-02-02 15:56 ` [PATCHv6 09/17] mm/sparse: Check memmap alignment for compound_info_has_mask() Kiryl Shutsemau
` (8 subsequent siblings)
16 siblings, 3 replies; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
With the upcoming changes to HVO, a single page of tail struct pages
will be shared across all huge pages of the same order on a node. Since
huge pages on the same node may belong to different zones, the zone
information stored in shared tail page flags would be incorrect.
Always fetch zone information from the head page, which has unique and
correct zone flags for each compound page.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
---
include/linux/mmzone.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index be8ce40b5638..192143b5cdc0 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1219,6 +1219,7 @@ static inline enum zone_type memdesc_zonenum(memdesc_flags_t flags)
static inline enum zone_type page_zonenum(const struct page *page)
{
+ page = compound_head(page);
return memdesc_zonenum(page->flags);
}
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 08/17] mm: Make page_zonenum() use head page
2026-02-02 15:56 ` [PATCHv6 08/17] mm: Make page_zonenum() use head page Kiryl Shutsemau
@ 2026-02-04 3:40 ` Muchun Song
2026-02-05 13:10 ` David Hildenbrand (Arm)
2026-02-15 23:13 ` Matthew Wilcox
2 siblings, 0 replies; 67+ messages in thread
From: Muchun Song @ 2026-02-04 3:40 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Andrew Morton, David Hildenbrand, Matthew Wilcox, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He,
Michal Hocko, Johannes Weiner, Jonathan Corbet, Huacai Chen,
WANG Xuerui, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Alexandre Ghiti, kernel-team, linux-mm, linux-kernel, linux-doc,
loongarch, linux-riscv
> On Feb 2, 2026, at 23:56, Kiryl Shutsemau <kas@kernel.org> wrote:
>
> With the upcoming changes to HVO, a single page of tail struct pages
> will be shared across all huge pages of the same order on a node. Since
> huge pages on the same node may belong to different zones, the zone
> information stored in shared tail page flags would be incorrect.
>
> Always fetch zone information from the head page, which has unique and
> correct zone flags for each compound page.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 08/17] mm: Make page_zonenum() use head page
2026-02-02 15:56 ` [PATCHv6 08/17] mm: Make page_zonenum() use head page Kiryl Shutsemau
2026-02-04 3:40 ` Muchun Song
@ 2026-02-05 13:10 ` David Hildenbrand (Arm)
2026-02-09 11:52 ` Kiryl Shutsemau
2026-02-15 23:13 ` Matthew Wilcox
2 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-05 13:10 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> With the upcoming changes to HVO, a single page of tail struct pages
> will be shared across all huge pages of the same order on a node. Since
> huge pages on the same node may belong to different zones, the zone
> information stored in shared tail page flags would be incorrect.
>
> Always fetch zone information from the head page, which has unique and
> correct zone flags for each compound page.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Acked-by: Zi Yan <ziy@nvidia.com>
> ---
> include/linux/mmzone.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index be8ce40b5638..192143b5cdc0 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1219,6 +1219,7 @@ static inline enum zone_type memdesc_zonenum(memdesc_flags_t flags)
>
> static inline enum zone_type page_zonenum(const struct page *page)
> {
> + page = compound_head(page);
> return memdesc_zonenum(page->flags);
We end up calling page_zonenum() without holding a reference.
Given that _compound_head() does a READ_ONCE(), this should work even if
we see concurrent page freeing etc.
However, this change implies that we now perform a compound page lookup
for every PageHighMem() [meh], page_zone() [quite some users in the
buddy, including for pageblock access and page freeing].
That's a nasty compromise for making HVO better? :)
We should likely limit that special casing to kernels that really require
it (HVO).
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 08/17] mm: Make page_zonenum() use head page
2026-02-05 13:10 ` David Hildenbrand (Arm)
@ 2026-02-09 11:52 ` Kiryl Shutsemau
2026-02-10 15:57 ` Vlastimil Babka
0 siblings, 1 reply; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-09 11:52 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Andrew Morton, Muchun Song, Matthew Wilcox, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He,
Michal Hocko, Johannes Weiner, Jonathan Corbet, Huacai Chen,
WANG Xuerui, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Alexandre Ghiti, kernel-team, linux-mm, linux-kernel, linux-doc,
loongarch, linux-riscv
On Thu, Feb 05, 2026 at 02:10:40PM +0100, David Hildenbrand (Arm) wrote:
> On 2/2/26 16:56, Kiryl Shutsemau wrote:
> > With the upcoming changes to HVO, a single page of tail struct pages
> > will be shared across all huge pages of the same order on a node. Since
> > huge pages on the same node may belong to different zones, the zone
> > information stored in shared tail page flags would be incorrect.
> >
> > Always fetch zone information from the head page, which has unique and
> > correct zone flags for each compound page.
> >
> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > Acked-by: Zi Yan <ziy@nvidia.com>
> > ---
> > include/linux/mmzone.h | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index be8ce40b5638..192143b5cdc0 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -1219,6 +1219,7 @@ static inline enum zone_type memdesc_zonenum(memdesc_flags_t flags)
> > static inline enum zone_type page_zonenum(const struct page *page)
> > {
> > + page = compound_head(page);
> > return memdesc_zonenum(page->flags);
>
> We end up calling page_zonenum() without holding a reference.
>
> Given that _compound_head() does a READ_ONCE(), this should work even if we
> see concurrent page freeing etc.
>
> However, this change implies that we now perform a compound page lookup for
> every PageHighMem() [meh], page_zone() [quite some users in the buddy,
> including for pageblock access and page freeing].
>
> That's a nasty compromise for making HVO better? :)
>
> We should likely limit that special casing to kernels that really rquire it
> (HVO).
I will add compound_info_has_mask() check.
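Roughly (sketch only, naming as in this series):

static inline enum zone_type page_zonenum(const struct page *page)
{
	/*
	 * Tail struct pages are only shared (and can only carry stale zone
	 * bits) with the mask-based compound_info encoding; otherwise skip
	 * the compound_head() lookup.
	 */
	if (compound_info_has_mask())
		page = compound_head(page);
	return memdesc_zonenum(page->flags);
}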
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 08/17] mm: Make page_zonenum() use head page
2026-02-09 11:52 ` Kiryl Shutsemau
@ 2026-02-10 15:57 ` Vlastimil Babka
2026-02-16 11:30 ` Kiryl Shutsemau
0 siblings, 1 reply; 67+ messages in thread
From: Vlastimil Babka @ 2026-02-10 15:57 UTC (permalink / raw)
To: Kiryl Shutsemau, David Hildenbrand (Arm)
Cc: Andrew Morton, Muchun Song, Matthew Wilcox, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Lorenzo Stoakes, Zi Yan, Baoquan He, Michal Hocko,
Johannes Weiner, Jonathan Corbet, Huacai Chen, WANG Xuerui,
Palmer Dabbelt, Paul Walmsley, Albert Ou, Alexandre Ghiti,
kernel-team, linux-mm, linux-kernel, linux-doc, loongarch,
linux-riscv
On 2/9/26 12:52, Kiryl Shutsemau wrote:
> On Thu, Feb 05, 2026 at 02:10:40PM +0100, David Hildenbrand (Arm) wrote:
>> On 2/2/26 16:56, Kiryl Shutsemau wrote:
>> > With the upcoming changes to HVO, a single page of tail struct pages
>> > will be shared across all huge pages of the same order on a node. Since
>> > huge pages on the same node may belong to different zones, the zone
>> > information stored in shared tail page flags would be incorrect.
>> >
>> > Always fetch zone information from the head page, which has unique and
>> > correct zone flags for each compound page.
>> >
>> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
>> > Acked-by: Zi Yan <ziy@nvidia.com>
>> > ---
>> > include/linux/mmzone.h | 1 +
>> > 1 file changed, 1 insertion(+)
>> >
>> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> > index be8ce40b5638..192143b5cdc0 100644
>> > --- a/include/linux/mmzone.h
>> > +++ b/include/linux/mmzone.h
>> > @@ -1219,6 +1219,7 @@ static inline enum zone_type memdesc_zonenum(memdesc_flags_t flags)
>> > static inline enum zone_type page_zonenum(const struct page *page)
>> > {
>> > + page = compound_head(page);
>> > return memdesc_zonenum(page->flags);
>>
>> We end up calling page_zonenum() without holding a reference.
>>
>> Given that _compound_head() does a READ_ONCE(), this should work even if we
>> see concurrent page freeing etc.
>>
>> However, this change implies that we now perform a compound page lookup for
>> every PageHighMem() [meh], page_zone() [quite some users in the buddy,
>> including for pageblock access and page freeing].
>>
>> That's a nasty compromise for making HVO better? :)
>>
>> We should likely limit that special casing to kernels that really rquire it
>> (HVO).
>
> I will add compound_info_has_mask() check.
Not thrilled by this indeed. Would it be a problem to have the shared tail
pages per node+zone instead of just per node?
^ permalink raw reply [flat|nested] 67+ messages in thread* Re: [PATCHv6 08/17] mm: Make page_zonenum() use head page
2026-02-10 15:57 ` Vlastimil Babka
@ 2026-02-16 11:30 ` Kiryl Shutsemau
0 siblings, 0 replies; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-16 11:30 UTC (permalink / raw)
To: Vlastimil Babka
Cc: David Hildenbrand (Arm),
Andrew Morton, Muchun Song, Matthew Wilcox, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Lorenzo Stoakes, Zi Yan, Baoquan He, Michal Hocko,
Johannes Weiner, Jonathan Corbet, Huacai Chen, WANG Xuerui,
Palmer Dabbelt, Paul Walmsley, Albert Ou, Alexandre Ghiti,
kernel-team, linux-mm, linux-kernel, linux-doc, loongarch,
linux-riscv
On Tue, Feb 10, 2026 at 04:57:55PM +0100, Vlastimil Babka wrote:
> On 2/9/26 12:52, Kiryl Shutsemau wrote:
> > On Thu, Feb 05, 2026 at 02:10:40PM +0100, David Hildenbrand (Arm) wrote:
> >> On 2/2/26 16:56, Kiryl Shutsemau wrote:
> >> > With the upcoming changes to HVO, a single page of tail struct pages
> >> > will be shared across all huge pages of the same order on a node. Since
> >> > huge pages on the same node may belong to different zones, the zone
> >> > information stored in shared tail page flags would be incorrect.
> >> >
> >> > Always fetch zone information from the head page, which has unique and
> >> > correct zone flags for each compound page.
> >> >
> >> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> >> > Acked-by: Zi Yan <ziy@nvidia.com>
> >> > ---
> >> > include/linux/mmzone.h | 1 +
> >> > 1 file changed, 1 insertion(+)
> >> >
> >> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >> > index be8ce40b5638..192143b5cdc0 100644
> >> > --- a/include/linux/mmzone.h
> >> > +++ b/include/linux/mmzone.h
> >> > @@ -1219,6 +1219,7 @@ static inline enum zone_type memdesc_zonenum(memdesc_flags_t flags)
> >> > static inline enum zone_type page_zonenum(const struct page *page)
> >> > {
> >> > + page = compound_head(page);
> >> > return memdesc_zonenum(page->flags);
> >>
> >> We end up calling page_zonenum() without holding a reference.
> >>
> >> Given that _compound_head() does a READ_ONCE(), this should work even if we
> >> see concurrent page freeing etc.
> >>
> >> However, this change implies that we now perform a compound page lookup for
> >> every PageHighMem() [meh], page_zone() [quite some users in the buddy,
> >> including for pageblock access and page freeing].
> >>
> >> That's a nasty compromise for making HVO better? :)
> >>
> >> We should likely limit that special casing to kernels that really rquire it
> >> (HVO).
> >
> > I will add compound_info_has_mask() check.
>
> Not thrilled by this indeed. Would it be a problem to have the shared tail
> pages per node+zone instead of just per node?
I thought it would be overkill. It likely is going to be unused for most
nodes. But sure, move it to per-zone.
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 08/17] mm: Make page_zonenum() use head page
2026-02-02 15:56 ` [PATCHv6 08/17] mm: Make page_zonenum() use head page Kiryl Shutsemau
2026-02-04 3:40 ` Muchun Song
2026-02-05 13:10 ` David Hildenbrand (Arm)
@ 2026-02-15 23:13 ` Matthew Wilcox
2026-02-16 9:06 ` David Hildenbrand (Arm)
2 siblings, 1 reply; 67+ messages in thread
From: Matthew Wilcox @ 2026-02-15 23:13 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Andrew Morton, Muchun Song, David Hildenbrand, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He,
Michal Hocko, Johannes Weiner, Jonathan Corbet, Huacai Chen,
WANG Xuerui, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Alexandre Ghiti, kernel-team, linux-mm, linux-kernel, linux-doc,
loongarch, linux-riscv
On Mon, Feb 02, 2026 at 03:56:24PM +0000, Kiryl Shutsemau wrote:
> With the upcoming changes to HVO, a single page of tail struct pages
> will be shared across all huge pages of the same order on a node. Since
> huge pages on the same node may belong to different zones, the zone
> information stored in shared tail page flags would be incorrect.
>
> Always fetch zone information from the head page, which has unique and
> correct zone flags for each compound page.
You're right that different pages in the same folio can have different
zone number. But does it matter ... or to put it another way, why is
returning the zone number of the head page the correct way to resolve
this?
Arguably, the caller is asking for the zone number of _this page_, and
does not care about the zone number of the head page. It would be good
to have a short discussion of this in the commit message (but probably
not worth putting this in a comment).
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 08/17] mm: Make page_zonenum() use head page
2026-02-15 23:13 ` Matthew Wilcox
@ 2026-02-16 9:06 ` David Hildenbrand (Arm)
2026-02-16 11:20 ` Vlastimil Babka
0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-16 9:06 UTC (permalink / raw)
To: Matthew Wilcox, Kiryl Shutsemau
Cc: Andrew Morton, Muchun Song, Usama Arif, Frank van der Linden,
Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/16/26 00:13, Matthew Wilcox wrote:
> On Mon, Feb 02, 2026 at 03:56:24PM +0000, Kiryl Shutsemau wrote:
>> With the upcoming changes to HVO, a single page of tail struct pages
>> will be shared across all huge pages of the same order on a node. Since
>> huge pages on the same node may belong to different zones, the zone
>> information stored in shared tail page flags would be incorrect.
>>
>> Always fetch zone information from the head page, which has unique and
>> correct zone flags for each compound page.
>
> You're right that different pages in the same folio can have different
> zone numbers. But does it matter ... or to put it another way, why is
> returning the zone number of the head page the correct way to resolve
> this?
How can a folio cross zones?
Runtime allocated hugetlb folios from the CMA/buddy (alloc_contig_range)
definitely fall into a single zone.
So is it about ones allocated early during boot, where, by chance, we
manage to cross ZONE_NORMAL + ZONE_MOVABLE etc?
I thought that it's also not allowed there, and I wonder whether we
should disallow it if it's possible.
>
> Arguably, the caller is asking for the zone number of _this page_, and
> does not care about the zone number of the head page. It would be good
> to have a short discussion of this in the commit message (but probably
> not worth putting this in a comment).
Agreed, in particular if there were a functional change. So far I
assumed there would be no such change.
Things like shrink_zone_span() really need to know the zone of that
page, not the one of the head; unless both fall into the same zone.
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 08/17] mm: Make page_zonenum() use head page
2026-02-16 9:06 ` David Hildenbrand (Arm)
@ 2026-02-16 11:20 ` Vlastimil Babka
0 siblings, 0 replies; 67+ messages in thread
From: Vlastimil Babka @ 2026-02-16 11:20 UTC (permalink / raw)
To: David Hildenbrand (Arm), Matthew Wilcox, Kiryl Shutsemau
Cc: Andrew Morton, Muchun Song, Usama Arif, Frank van der Linden,
Oscar Salvador, Mike Rapoport, Lorenzo Stoakes, Zi Yan,
Baoquan He, Michal Hocko, Johannes Weiner, Jonathan Corbet,
Huacai Chen, WANG Xuerui, Palmer Dabbelt, Paul Walmsley,
Albert Ou, Alexandre Ghiti, kernel-team, linux-mm, linux-kernel,
linux-doc, loongarch, linux-riscv
On 2/16/26 10:06, David Hildenbrand (Arm) wrote:
> On 2/16/26 00:13, Matthew Wilcox wrote:
>> On Mon, Feb 02, 2026 at 03:56:24PM +0000, Kiryl Shutsemau wrote:
>>> With the upcoming changes to HVO, a single page of tail struct pages
>>> will be shared across all huge pages of the same order on a node. Since
>>> huge pages on the same node may belong to different zones, the zone
>>> information stored in shared tail page flags would be incorrect.
>>>
>>> Always fetch zone information from the head page, which has unique and
>>> correct zone flags for each compound page.
>>
>> You're right that different pages in the same folio can have different
>> zone numbers. But does it matter ... or to put it another way, why is
>> returning the zone number of the head page the correct way to resolve
>> this?
>
> How can a folio cross zones?
>
> Runtime allocated hugetlb folios from the CMA/buddy (alloc_contig_range)
> definitely fall into a single zone.
>
> So is it about ones allocated early during boot, where, by chance, we
> manage to cross ZONE_NORMAL + ZONE_MOVABLE etc?
>
> I thought that it's also not allowed there, and I wonder whether we
> should disallow it if it's possible.
I would be surprised if things didn't break horribly if we allowed crossing
zones in a single folio. I'd rather not allow it.
(And I still don't like how this patch solves the issue)
>>
>> Arguably, the caller is asking for the zone number of _this page_, and
>> does not care about the zone number of the head page. It would be good
>> to have a short discussion of this in the commit message (but probably
>> not worth putting this in a comment).
>
> Agreed, in particular, if there would be a functional change. So far I
> assumed there would be no such change.
>
> Things like shrink_zone_span() really need to know the zone of that
> page, not the one of the head; unless both fall into the same zone.
>
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 09/17] mm/sparse: Check memmap alignment for compound_info_has_mask()
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
` (7 preceding siblings ...)
2026-02-02 15:56 ` [PATCHv6 08/17] mm: Make page_zonenum() use head page Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-03 3:35 ` Muchun Song
2026-02-05 13:31 ` David Hildenbrand (Arm)
2026-02-02 15:56 ` [PATCHv6 10/17] mm/hugetlb: Refactor code around vmemmap_walk Kiryl Shutsemau
` (7 subsequent siblings)
16 siblings, 2 replies; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
If page->compound_info encodes a mask, the vmemmap is expected to be
naturally aligned to the maximum folio size.
Add a VM_BUG_ON() to check the alignment.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
---
mm/sparse.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/mm/sparse.c b/mm/sparse.c
index b5b2b6f7041b..6c9b62607f3f 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -600,6 +600,13 @@ void __init sparse_init(void)
BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));
memblocks_present();
+ if (compound_info_has_mask()) {
+ unsigned long alignment;
+
+ alignment = MAX_FOLIO_NR_PAGES * sizeof(struct page);
+ VM_BUG_ON(!IS_ALIGNED((unsigned long) pfn_to_page(0), alignment));
+ }
+
pnum_begin = first_present_section_nr();
nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 09/17] mm/sparse: Check memmap alignment for compound_info_has_mask()
2026-02-02 15:56 ` [PATCHv6 09/17] mm/sparse: Check memmap alignment for compound_info_has_mask() Kiryl Shutsemau
@ 2026-02-03 3:35 ` Muchun Song
2026-02-05 13:31 ` David Hildenbrand (Arm)
1 sibling, 0 replies; 67+ messages in thread
From: Muchun Song @ 2026-02-03 3:35 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Andrew Morton, David Hildenbrand, Matthew Wilcox, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He,
Michal Hocko, Johannes Weiner, Jonathan Corbet, Huacai Chen,
WANG Xuerui, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Alexandre Ghiti, kernel-team, linux-mm, linux-kernel, linux-doc,
loongarch, linux-riscv
> On Feb 2, 2026, at 23:56, Kiryl Shutsemau <kas@kernel.org> wrote:
>
> If page->compound_info encodes a mask, the vmemmap is expected to be
> naturally aligned to the maximum folio size.
>
> Add a VM_BUG_ON() to check the alignment.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 09/17] mm/sparse: Check memmap alignment for compound_info_has_mask()
2026-02-02 15:56 ` [PATCHv6 09/17] mm/sparse: Check memmap alignment for compound_info_has_mask() Kiryl Shutsemau
2026-02-03 3:35 ` Muchun Song
@ 2026-02-05 13:31 ` David Hildenbrand (Arm)
2026-02-05 13:58 ` David Hildenbrand (Arm)
1 sibling, 1 reply; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-05 13:31 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> If page->compound_info encodes a mask, the vmemmap is expected to be
> naturally aligned to the maximum folio size.
>
> Add a VM_BUG_ON() to check the alignment.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Acked-by: Zi Yan <ziy@nvidia.com>
> ---
> mm/sparse.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index b5b2b6f7041b..6c9b62607f3f 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -600,6 +600,13 @@ void __init sparse_init(void)
> BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));
> memblocks_present();
>
> + if (compound_info_has_mask()) {
> + unsigned long alignment;
> +
> + alignment = MAX_FOLIO_NR_PAGES * sizeof(struct page);
> + VM_BUG_ON(!IS_ALIGNED((unsigned long) pfn_to_page(0), alignment));
No VM_BUG_ON. VM_WARN_ON_ONCE() should be good enough, no?
As discussed in the other thread, is checking for MAX_FOLIO_NR_PAGES
alignment sufficient?
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 09/17] mm/sparse: Check memmap alignment for compound_info_has_mask()
2026-02-05 13:31 ` David Hildenbrand (Arm)
@ 2026-02-05 13:58 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-05 13:58 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/5/26 14:31, David Hildenbrand (Arm) wrote:
> On 2/2/26 16:56, Kiryl Shutsemau wrote:
>> If page->compound_info encodes a mask, the vmemmap is expected to be
>> naturally aligned to the maximum folio size.
>>
>> Add a VM_BUG_ON() to check the alignment.
>>
>> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
>> Acked-by: Zi Yan <ziy@nvidia.com>
>> ---
>> mm/sparse.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index b5b2b6f7041b..6c9b62607f3f 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -600,6 +600,13 @@ void __init sparse_init(void)
>> BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));
>> memblocks_present();
>> + if (compound_info_has_mask()) {
>> + unsigned long alignment;
>> +
>> + alignment = MAX_FOLIO_NR_PAGES * sizeof(struct page);
>> + VM_BUG_ON(!IS_ALIGNED((unsigned long) pfn_to_page(0),
>> alignment));
>
> No VM_BUG_ON. VM_WARN_ON_ONCE() should be good enough, no?
>
> As discussed in the other thread, is checking for MAX_FOLIO_NR_PAGES
> alignment sufficient?
And after further discussions, we could use MAX_FOLIO_VMEMMAP_ALIGN
macro once we have that.
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 10/17] mm/hugetlb: Refactor code around vmemmap_walk
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
` (8 preceding siblings ...)
2026-02-02 15:56 ` [PATCHv6 09/17] mm/sparse: Check memmap alignment for compound_info_has_mask() Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-02 15:56 ` [PATCHv6 11/17] mm/hugetlb: Remove fake head pages Kiryl Shutsemau
` (6 subsequent siblings)
16 siblings, 0 replies; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
To prepare for removing fake head pages, the vmemmap_walk code is being
reworked.
The reuse_page and reuse_addr variables are being eliminated. There will
no longer be an expectation regarding the reuse address in relation to
the operated range. Instead, the caller will provide head and tail
vmemmap pages.
Currently, vmemmap_head and vmemmap_tail are set to the same page, but
this will change in the future.
The only functional change is that __hugetlb_vmemmap_optimize_folio()
will abandon optimization if memory allocation fails.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
---
mm/hugetlb_vmemmap.c | 226 +++++++++++++++++--------------------------
1 file changed, 90 insertions(+), 136 deletions(-)
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index a9280259e12a..a39a301e08b9 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -25,8 +25,8 @@
*
* @remap_pte: called for each lowest-level entry (PTE).
* @nr_walked: the number of walked pte.
- * @reuse_page: the page which is reused for the tail vmemmap pages.
- * @reuse_addr: the virtual address of the @reuse_page page.
+ * @vmemmap_head: the page to be installed as first in the vmemmap range
+ * @vmemmap_tail: the page to be installed as non-first in the vmemmap range
* @vmemmap_pages: the list head of the vmemmap pages that can be freed
* or is mapped from.
* @flags: used to modify behavior in vmemmap page table walking
@@ -35,11 +35,13 @@
struct vmemmap_remap_walk {
void (*remap_pte)(pte_t *pte, unsigned long addr,
struct vmemmap_remap_walk *walk);
+
unsigned long nr_walked;
- struct page *reuse_page;
- unsigned long reuse_addr;
+ struct page *vmemmap_head;
+ struct page *vmemmap_tail;
struct list_head *vmemmap_pages;
+
/* Skip the TLB flush when we split the PMD */
#define VMEMMAP_SPLIT_NO_TLB_FLUSH BIT(0)
/* Skip the TLB flush when we remap the PTE */
@@ -141,14 +143,7 @@ static int vmemmap_pte_entry(pte_t *pte, unsigned long addr,
{
struct vmemmap_remap_walk *vmemmap_walk = walk->private;
- /*
- * The reuse_page is found 'first' in page table walking before
- * starting remapping.
- */
- if (!vmemmap_walk->reuse_page)
- vmemmap_walk->reuse_page = pte_page(ptep_get(pte));
- else
- vmemmap_walk->remap_pte(pte, addr, vmemmap_walk);
+ vmemmap_walk->remap_pte(pte, addr, vmemmap_walk);
vmemmap_walk->nr_walked++;
return 0;
@@ -208,18 +203,12 @@ static void free_vmemmap_page_list(struct list_head *list)
static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
struct vmemmap_remap_walk *walk)
{
- /*
- * Remap the tail pages as read-only to catch illegal write operation
- * to the tail pages.
- */
- pgprot_t pgprot = PAGE_KERNEL_RO;
struct page *page = pte_page(ptep_get(pte));
pte_t entry;
/* Remapping the head page requires r/w */
- if (unlikely(addr == walk->reuse_addr)) {
- pgprot = PAGE_KERNEL;
- list_del(&walk->reuse_page->lru);
+ if (unlikely(walk->nr_walked == 0 && walk->vmemmap_head)) {
+ list_del(&walk->vmemmap_head->lru);
/*
* Makes sure that preceding stores to the page contents from
@@ -227,53 +216,50 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
* write.
*/
smp_wmb();
+
+ entry = mk_pte(walk->vmemmap_head, PAGE_KERNEL);
+ } else {
+ /*
+ * Remap the tail pages as read-only to catch illegal write
+ * operation to the tail pages.
+ */
+ entry = mk_pte(walk->vmemmap_tail, PAGE_KERNEL_RO);
}
- entry = mk_pte(walk->reuse_page, pgprot);
list_add(&page->lru, walk->vmemmap_pages);
set_pte_at(&init_mm, addr, pte, entry);
}
-/*
- * How many struct page structs need to be reset. When we reuse the head
- * struct page, the special metadata (e.g. page->flags or page->mapping)
- * cannot copy to the tail struct page structs. The invalid value will be
- * checked in the free_tail_page_prepare(). In order to avoid the message
- * of "corrupted mapping in tail page". We need to reset at least 4 (one
- * head struct page struct and three tail struct page structs) struct page
- * structs.
- */
-#define NR_RESET_STRUCT_PAGE 4
-
-static inline void reset_struct_pages(struct page *start)
-{
- struct page *from = start + NR_RESET_STRUCT_PAGE;
-
- BUILD_BUG_ON(NR_RESET_STRUCT_PAGE * 2 > PAGE_SIZE / sizeof(struct page));
- memcpy(start, from, sizeof(*from) * NR_RESET_STRUCT_PAGE);
-}
-
static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
struct vmemmap_remap_walk *walk)
{
- pgprot_t pgprot = PAGE_KERNEL;
struct page *page;
- void *to;
-
- BUG_ON(pte_page(ptep_get(pte)) != walk->reuse_page);
+ struct page *from, *to;
page = list_first_entry(walk->vmemmap_pages, struct page, lru);
list_del(&page->lru);
+
+ /*
+ * Initialize tail pages in the newly allocated vmemmap page.
+ *
+ * There is folio-scope metadata that is encoded in the first few
+ * tail pages.
+ *
+ * Use the value of the last tail page in the page that contains the
+ * head page to initialize the rest of the tail pages.
+ */
+ from = compound_head((struct page *)addr) +
+ PAGE_SIZE / sizeof(struct page) - 1;
to = page_to_virt(page);
- copy_page(to, (void *)walk->reuse_addr);
- reset_struct_pages(to);
+ for (int i = 0; i < PAGE_SIZE / sizeof(struct page); i++, to++)
+ *to = *from;
/*
* Makes sure that preceding stores to the page contents become visible
* before the set_pte_at() write.
*/
smp_wmb();
- set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
+ set_pte_at(&init_mm, addr, pte, mk_pte(page, PAGE_KERNEL));
}
/**
@@ -283,33 +269,28 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
* to remap.
* @end: end address of the vmemmap virtual address range that we want to
* remap.
- * @reuse: reuse address.
- *
* Return: %0 on success, negative error code otherwise.
*/
-static int vmemmap_remap_split(unsigned long start, unsigned long end,
- unsigned long reuse)
+static int vmemmap_remap_split(unsigned long start, unsigned long end)
{
struct vmemmap_remap_walk walk = {
.remap_pte = NULL,
.flags = VMEMMAP_SPLIT_NO_TLB_FLUSH,
};
- /* See the comment in the vmemmap_remap_free(). */
- BUG_ON(start - reuse != PAGE_SIZE);
-
- return vmemmap_remap_range(reuse, end, &walk);
+ return vmemmap_remap_range(start, end, &walk);
}
/**
* vmemmap_remap_free - remap the vmemmap virtual address range [@start, @end)
- * to the page which @reuse is mapped to, then free vmemmap
- * which the range are mapped to.
+ * to use @vmemmap_head/tail, then free vmemmap which
+ * the range are mapped to.
* @start: start address of the vmemmap virtual address range that we want
* to remap.
* @end: end address of the vmemmap virtual address range that we want to
* remap.
- * @reuse: reuse address.
+ * @vmemmap_head: the page to be installed as first in the vmemmap range
+ * @vmemmap_tail: the page to be installed as non-first in the vmemmap range
* @vmemmap_pages: list to deposit vmemmap pages to be freed. It is callers
* responsibility to free pages.
* @flags: modifications to vmemmap_remap_walk flags
@@ -317,69 +298,38 @@ static int vmemmap_remap_split(unsigned long start, unsigned long end,
* Return: %0 on success, negative error code otherwise.
*/
static int vmemmap_remap_free(unsigned long start, unsigned long end,
- unsigned long reuse,
+ struct page *vmemmap_head,
+ struct page *vmemmap_tail,
struct list_head *vmemmap_pages,
unsigned long flags)
{
int ret;
struct vmemmap_remap_walk walk = {
.remap_pte = vmemmap_remap_pte,
- .reuse_addr = reuse,
+ .vmemmap_head = vmemmap_head,
+ .vmemmap_tail = vmemmap_tail,
.vmemmap_pages = vmemmap_pages,
.flags = flags,
};
- int nid = page_to_nid((struct page *)reuse);
- gfp_t gfp_mask = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
+
+ ret = vmemmap_remap_range(start, end, &walk);
+ if (!ret || !walk.nr_walked)
+ return ret;
+
+ end = start + walk.nr_walked * PAGE_SIZE;
/*
- * Allocate a new head vmemmap page to avoid breaking a contiguous
- * block of struct page memory when freeing it back to page allocator
- * in free_vmemmap_page_list(). This will allow the likely contiguous
- * struct page backing memory to be kept contiguous and allowing for
- * more allocations of hugepages. Fallback to the currently
- * mapped head page in case should it fail to allocate.
+ * vmemmap_pages contains pages from the previous vmemmap_remap_range()
+ * call which failed. These are pages which were removed from
+ * the vmemmap. They will be restored in the following call.
*/
- walk.reuse_page = alloc_pages_node(nid, gfp_mask, 0);
- if (walk.reuse_page) {
- copy_page(page_to_virt(walk.reuse_page),
- (void *)walk.reuse_addr);
- list_add(&walk.reuse_page->lru, vmemmap_pages);
- memmap_pages_add(1);
- }
+ walk = (struct vmemmap_remap_walk) {
+ .remap_pte = vmemmap_restore_pte,
+ .vmemmap_pages = vmemmap_pages,
+ .flags = 0,
+ };
- /*
- * In order to make remapping routine most efficient for the huge pages,
- * the routine of vmemmap page table walking has the following rules
- * (see more details from the vmemmap_pte_range()):
- *
- * - The range [@start, @end) and the range [@reuse, @reuse + PAGE_SIZE)
- * should be continuous.
- * - The @reuse address is part of the range [@reuse, @end) that we are
- * walking which is passed to vmemmap_remap_range().
- * - The @reuse address is the first in the complete range.
- *
- * So we need to make sure that @start and @reuse meet the above rules.
- */
- BUG_ON(start - reuse != PAGE_SIZE);
-
- ret = vmemmap_remap_range(reuse, end, &walk);
- if (ret && walk.nr_walked) {
- end = reuse + walk.nr_walked * PAGE_SIZE;
- /*
- * vmemmap_pages contains pages from the previous
- * vmemmap_remap_range call which failed. These
- * are pages which were removed from the vmemmap.
- * They will be restored in the following call.
- */
- walk = (struct vmemmap_remap_walk) {
- .remap_pte = vmemmap_restore_pte,
- .reuse_addr = reuse,
- .vmemmap_pages = vmemmap_pages,
- .flags = 0,
- };
-
- vmemmap_remap_range(reuse, end, &walk);
- }
+ vmemmap_remap_range(start, end, &walk);
return ret;
}
@@ -416,29 +366,24 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
* to remap.
* @end: end address of the vmemmap virtual address range that we want to
* remap.
- * @reuse: reuse address.
* @flags: modifications to vmemmap_remap_walk flags
*
* Return: %0 on success, negative error code otherwise.
*/
static int vmemmap_remap_alloc(unsigned long start, unsigned long end,
- unsigned long reuse, unsigned long flags)
+ unsigned long flags)
{
LIST_HEAD(vmemmap_pages);
struct vmemmap_remap_walk walk = {
.remap_pte = vmemmap_restore_pte,
- .reuse_addr = reuse,
.vmemmap_pages = &vmemmap_pages,
.flags = flags,
};
- /* See the comment in the vmemmap_remap_free(). */
- BUG_ON(start - reuse != PAGE_SIZE);
-
if (alloc_vmemmap_page_list(start, end, &vmemmap_pages))
return -ENOMEM;
- return vmemmap_remap_range(reuse, end, &walk);
+ return vmemmap_remap_range(start, end, &walk);
}
DEFINE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
@@ -455,8 +400,7 @@ static int __hugetlb_vmemmap_restore_folio(const struct hstate *h,
struct folio *folio, unsigned long flags)
{
int ret;
- unsigned long vmemmap_start = (unsigned long)&folio->page, vmemmap_end;
- unsigned long vmemmap_reuse;
+ unsigned long vmemmap_start, vmemmap_end;
VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_ONCE_FOLIO(folio_ref_count(folio), folio);
@@ -467,18 +411,18 @@ static int __hugetlb_vmemmap_restore_folio(const struct hstate *h,
if (flags & VMEMMAP_SYNCHRONIZE_RCU)
synchronize_rcu();
+ vmemmap_start = (unsigned long)&folio->page;
vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
- vmemmap_reuse = vmemmap_start;
+
vmemmap_start += HUGETLB_VMEMMAP_RESERVE_SIZE;
/*
* The pages which the vmemmap virtual address range [@vmemmap_start,
- * @vmemmap_end) are mapped to are freed to the buddy allocator, and
- * the range is mapped to the page which @vmemmap_reuse is mapped to.
+ * @vmemmap_end) are mapped to are freed to the buddy allocator.
* When a HugeTLB page is freed to the buddy allocator, previously
* discarded vmemmap pages must be allocated and remapping.
*/
- ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse, flags);
+ ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, flags);
if (!ret) {
folio_clear_hugetlb_vmemmap_optimized(folio);
static_branch_dec(&hugetlb_optimize_vmemmap_key);
@@ -566,9 +510,9 @@ static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h,
struct list_head *vmemmap_pages,
unsigned long flags)
{
- int ret = 0;
- unsigned long vmemmap_start = (unsigned long)&folio->page, vmemmap_end;
- unsigned long vmemmap_reuse;
+ unsigned long vmemmap_start, vmemmap_end;
+ struct page *vmemmap_head, *vmemmap_tail;
+ int nid, ret = 0;
VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_ONCE_FOLIO(folio_ref_count(folio), folio);
@@ -593,18 +537,30 @@ static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h,
*/
folio_set_hugetlb_vmemmap_optimized(folio);
+ nid = folio_nid(folio);
+ vmemmap_head = alloc_pages_node(nid, GFP_KERNEL, 0);
+ if (!vmemmap_head) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ copy_page(page_to_virt(vmemmap_head), folio);
+ list_add(&vmemmap_head->lru, vmemmap_pages);
+ memmap_pages_add(1);
+
+ vmemmap_tail = vmemmap_head;
+ vmemmap_start = (unsigned long)&folio->page;
vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
- vmemmap_reuse = vmemmap_start;
- vmemmap_start += HUGETLB_VMEMMAP_RESERVE_SIZE;
/*
- * Remap the vmemmap virtual address range [@vmemmap_start, @vmemmap_end)
- * to the page which @vmemmap_reuse is mapped to. Add pages previously
- * mapping the range to vmemmap_pages list so that they can be freed by
- * the caller.
+ * Remap the vmemmap virtual address range [@vmemmap_start, @vmemmap_end).
+ * Add pages previously mapping the range to vmemmap_pages list so that
+ * they can be freed by the caller.
*/
- ret = vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse,
+ ret = vmemmap_remap_free(vmemmap_start, vmemmap_end,
+ vmemmap_head, vmemmap_tail,
vmemmap_pages, flags);
+out:
if (ret) {
static_branch_dec(&hugetlb_optimize_vmemmap_key);
folio_clear_hugetlb_vmemmap_optimized(folio);
@@ -633,21 +589,19 @@ void hugetlb_vmemmap_optimize_folio(const struct hstate *h, struct folio *folio)
static int hugetlb_vmemmap_split_folio(const struct hstate *h, struct folio *folio)
{
- unsigned long vmemmap_start = (unsigned long)&folio->page, vmemmap_end;
- unsigned long vmemmap_reuse;
+ unsigned long vmemmap_start, vmemmap_end;
if (!vmemmap_should_optimize_folio(h, folio))
return 0;
+ vmemmap_start = (unsigned long)&folio->page;
vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
- vmemmap_reuse = vmemmap_start;
- vmemmap_start += HUGETLB_VMEMMAP_RESERVE_SIZE;
/*
* Split PMDs on the vmemmap virtual address range [@vmemmap_start,
* @vmemmap_end]
*/
- return vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
+ return vmemmap_remap_split(vmemmap_start, vmemmap_end);
}
static void __hugetlb_vmemmap_optimize_folios(struct hstate *h,
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 11/17] mm/hugetlb: Remove fake head pages
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
` (9 preceding siblings ...)
2026-02-02 15:56 ` [PATCHv6 10/17] mm/hugetlb: Refactor code around vmemmap_walk Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-03 9:50 ` Muchun Song
` (3 more replies)
2026-02-02 15:56 ` [PATCHv6 12/17] mm: Drop fake head checks Kiryl Shutsemau
` (5 subsequent siblings)
16 siblings, 4 replies; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
HugeTLB Vmemmap Optimization (HVO) reduces memory usage by freeing most
vmemmap pages for huge pages and remapping the freed range to a single
page containing the struct page metadata.
With the new mask-based compound_info encoding (for power-of-2 struct
page sizes), all tail pages of the same order are now identical
regardless of which compound page they belong to. This means the tail
pages can be truly shared without fake heads.
Allocate a single page of initialized tail struct pages per NUMA node
per order in the vmemmap_tails[] array in pglist_data. All huge pages of
that order on the node share this tail page, mapped read-only into their
vmemmap. The head page remains unique per huge page.
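(As a concrete example, with 4 KiB pages and the usual 64-byte struct page,
VMEMMAP_TAIL_MIN_ORDER below works out to ilog2(2 * 4096 / 64) = 7, so the
array has one slot per order from 7 up to MAX_FOLIO_ORDER, per node.)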
Redefine MAX_FOLIO_ORDER using ilog2(). The define has to produce a
compile-time constant because it is used to size the vmemmap_tails array.
For some reason, the compiler is not able to resolve get_order() at
compile time, but ilog2() works.
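(With 4 KiB pages, for example, this evaluates to ilog2(SZ_16G) - PAGE_SHIFT =
34 - 12 = 22, so MAX_FOLIO_NR_PAGES covers folios of up to 16 GiB.)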
Avoid using PUD_ORDER to define MAX_FOLIO_ORDER, as it adds a dependency on
<linux/pgtable.h>, which creates a hard-to-break include loop.
This eliminates fake heads while maintaining the same memory savings,
and simplifies compound_head() by removing fake head detection.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
---
include/linux/mmzone.h | 19 +++++++++++++++++--
mm/hugetlb_vmemmap.c | 34 +++++++++++++++++++++++++++++++--
mm/sparse-vmemmap.c | 43 ++++++++++++++++++++++++++++++++++--------
3 files changed, 84 insertions(+), 12 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 192143b5cdc0..c01f8235743b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -81,13 +81,17 @@
* currently expect (see CONFIG_HAVE_GIGANTIC_FOLIOS): with hugetlb, we expect
* no folios larger than 16 GiB on 64bit and 1 GiB on 32bit.
*/
-#define MAX_FOLIO_ORDER get_order(IS_ENABLED(CONFIG_64BIT) ? SZ_16G : SZ_1G)
+#ifdef CONFIG_64BIT
+#define MAX_FOLIO_ORDER (ilog2(SZ_16G) - PAGE_SHIFT)
+#else
+#define MAX_FOLIO_ORDER (ilog2(SZ_1G) - PAGE_SHIFT)
+#endif
#else
/*
* Without hugetlb, gigantic folios that are bigger than a single PUD are
* currently impossible.
*/
-#define MAX_FOLIO_ORDER PUD_ORDER
+#define MAX_FOLIO_ORDER (PUD_SHIFT - PAGE_SHIFT)
#endif
#define MAX_FOLIO_NR_PAGES (1UL << MAX_FOLIO_ORDER)
@@ -1402,6 +1406,14 @@ struct memory_failure_stats {
};
#endif
+/*
+ * vmemmap optimization (like HVO) is only possible for page orders that fill
+ * two or more pages with struct pages.
+ */
+#define VMEMMAP_TAIL_MIN_ORDER (ilog2(2 * PAGE_SIZE / sizeof(struct page)))
+#define __NR_VMEMMAP_TAILS (MAX_FOLIO_ORDER - VMEMMAP_TAIL_MIN_ORDER + 1)
+#define NR_VMEMMAP_TAILS (__NR_VMEMMAP_TAILS > 0 ? __NR_VMEMMAP_TAILS : 0)
+
/*
* On NUMA machines, each NUMA node would have a pg_data_t to describe
* it's memory layout. On UMA machines there is a single pglist_data which
@@ -1550,6 +1562,9 @@ typedef struct pglist_data {
#ifdef CONFIG_MEMORY_FAILURE
struct memory_failure_stats mf_stats;
#endif
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+ struct page *vmemmap_tails[NR_VMEMMAP_TAILS];
+#endif
} pg_data_t;
#define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index a39a301e08b9..688764c52c72 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -19,6 +19,7 @@
#include <asm/tlbflush.h>
#include "hugetlb_vmemmap.h"
+#include "internal.h"
/**
* struct vmemmap_remap_walk - walk vmemmap page table
@@ -505,6 +506,32 @@ static bool vmemmap_should_optimize_folio(const struct hstate *h, struct folio *
return true;
}
+static struct page *vmemmap_get_tail(unsigned int order, int node)
+{
+ struct page *tail, *p;
+ unsigned int idx;
+
+ idx = order - VMEMMAP_TAIL_MIN_ORDER;
+ tail = READ_ONCE(NODE_DATA(node)->vmemmap_tails[idx]);
+ if (tail)
+ return tail;
+
+ tail = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
+ if (!tail)
+ return NULL;
+
+ p = page_to_virt(tail);
+ for (int i = 0; i < PAGE_SIZE / sizeof(struct page); i++)
+ prep_compound_tail(p + i, NULL, order);
+
+ if (cmpxchg(&NODE_DATA(node)->vmemmap_tails[idx], NULL, tail)) {
+ __free_page(tail);
+ tail = READ_ONCE(NODE_DATA(node)->vmemmap_tails[idx]);
+ }
+
+ return tail;
+}
+
static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h,
struct folio *folio,
struct list_head *vmemmap_pages,
@@ -520,6 +547,11 @@ static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h,
if (!vmemmap_should_optimize_folio(h, folio))
return ret;
+ nid = folio_nid(folio);
+ vmemmap_tail = vmemmap_get_tail(h->order, nid);
+ if (!vmemmap_tail)
+ return -ENOMEM;
+
static_branch_inc(&hugetlb_optimize_vmemmap_key);
if (flags & VMEMMAP_SYNCHRONIZE_RCU)
@@ -537,7 +569,6 @@ static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h,
*/
folio_set_hugetlb_vmemmap_optimized(folio);
- nid = folio_nid(folio);
vmemmap_head = alloc_pages_node(nid, GFP_KERNEL, 0);
if (!vmemmap_head) {
ret = -ENOMEM;
@@ -548,7 +579,6 @@ static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h,
list_add(&vmemmap_head->lru, vmemmap_pages);
memmap_pages_add(1);
- vmemmap_tail = vmemmap_head;
vmemmap_start = (unsigned long)&folio->page;
vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 37522d6cb398..13bcf5562f1b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -378,16 +378,44 @@ void vmemmap_wrprotect_hvo(unsigned long addr, unsigned long end,
}
}
-/*
- * Populate vmemmap pages HVO-style. The first page contains the head
- * page and needed tail pages, the other ones are mirrors of the first
- * page.
- */
+static __meminit unsigned long vmemmap_get_tail(unsigned int order, int node)
+{
+ struct page *p, *tail;
+ unsigned int idx;
+
+ BUG_ON(order < VMEMMAP_TAIL_MIN_ORDER);
+ BUG_ON(order > MAX_FOLIO_ORDER);
+
+ idx = order - VMEMMAP_TAIL_MIN_ORDER;
+ tail = NODE_DATA(node)->vmemmap_tails[idx];
+ if (tail)
+ return page_to_pfn(tail);
+
+ p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
+ if (!p)
+ return 0;
+
+ for (int i = 0; i < PAGE_SIZE / sizeof(struct page); i++)
+ prep_compound_tail(p + i, NULL, order);
+
+ tail = virt_to_page(p);
+ NODE_DATA(node)->vmemmap_tails[idx] = tail;
+
+ return page_to_pfn(tail);
+}
+
int __meminit vmemmap_populate_hvo(unsigned long addr, unsigned long end,
int node, unsigned long headsize)
{
+ unsigned long maddr, len, tail_pfn;
+ unsigned int order;
pte_t *pte;
- unsigned long maddr;
+
+ len = end - addr;
+ order = ilog2(len * sizeof(struct page) / PAGE_SIZE);
+ tail_pfn = vmemmap_get_tail(order, node);
+ if (!tail_pfn)
+ return -ENOMEM;
for (maddr = addr; maddr < addr + headsize; maddr += PAGE_SIZE) {
pte = vmemmap_populate_address(maddr, node, NULL, -1, 0);
@@ -398,8 +426,7 @@ int __meminit vmemmap_populate_hvo(unsigned long addr, unsigned long end,
/*
* Reuse the last page struct page mapped above for the rest.
*/
- return vmemmap_populate_range(maddr, end, node, NULL,
- pte_pfn(ptep_get(pte)), 0);
+ return vmemmap_populate_range(maddr, end, node, NULL, tail_pfn, 0);
}
void __weak __meminit vmemmap_set_pmd(pmd_t *pmd, void *p, int node,
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 11/17] mm/hugetlb: Remove fake head pages
2026-02-02 15:56 ` [PATCHv6 11/17] mm/hugetlb: Remove fake head pages Kiryl Shutsemau
@ 2026-02-03 9:50 ` Muchun Song
2026-02-06 9:14 ` David Hildenbrand (Arm)
` (2 subsequent siblings)
3 siblings, 0 replies; 67+ messages in thread
From: Muchun Song @ 2026-02-03 9:50 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Andrew Morton, David Hildenbrand, Matthew Wilcox, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He,
Michal Hocko, Johannes Weiner, Jonathan Corbet, Huacai Chen,
WANG Xuerui, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Alexandre Ghiti, kernel-team, linux-mm, linux-kernel, linux-doc,
loongarch, linux-riscv
> On Feb 2, 2026, at 23:56, Kiryl Shutsemau <kas@kernel.org> wrote:
>
> HugeTLB Vmemmap Optimization (HVO) reduces memory usage by freeing most
> vmemmap pages for huge pages and remapping the freed range to a single
> page containing the struct page metadata.
>
> With the new mask-based compound_info encoding (for power-of-2 struct
> page sizes), all tail pages of the same order are now identical
> regardless of which compound page they belong to. This means the tail
> pages can be truly shared without fake heads.
>
> Allocate a single page of initialized tail struct pages per NUMA node
> per order in the vmemmap_tails[] array in pglist_data. All huge pages of
> that order on the node share this tail page, mapped read-only into their
> vmemmap. The head page remains unique per huge page.
>
> Redefine MAX_FOLIO_ORDER using ilog2(). The define has to produce a
> compile-time constant because it is used to size the vmemmap_tails array.
> For some reason, the compiler is not able to resolve get_order() at
> compile time, but ilog2() works.
>
> Avoid using PUD_ORDER to define MAX_FOLIO_ORDER, as it adds a dependency on
> <linux/pgtable.h>, which creates a hard-to-break include loop.
>
> This eliminates fake heads while maintaining the same memory savings,
> and simplifies compound_head() by removing fake head detection.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 11/17] mm/hugetlb: Remove fake head pages
2026-02-02 15:56 ` [PATCHv6 11/17] mm/hugetlb: Remove fake head pages Kiryl Shutsemau
2026-02-03 9:50 ` Muchun Song
@ 2026-02-06 9:14 ` David Hildenbrand (Arm)
2026-02-06 9:36 ` David Hildenbrand (Arm)
2026-02-07 20:16 ` Usama Arif
3 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-06 9:14 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
> -/*
> - * Populate vmemmap pages HVO-style. The first page contains the head
> - * page and needed tail pages, the other ones are mirrors of the first
> - * page.
> - */
> +static __meminit unsigned long vmemmap_get_tail(unsigned int order, int node)
> +{
> + struct page *p, *tail;
> + unsigned int idx;
> +
> + BUG_ON(order < VMEMMAP_TAIL_MIN_ORDER);
> + BUG_ON(order > MAX_FOLIO_ORDER);
No BUG_ON.
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 11/17] mm/hugetlb: Remove fake head pages
2026-02-02 15:56 ` [PATCHv6 11/17] mm/hugetlb: Remove fake head pages Kiryl Shutsemau
2026-02-03 9:50 ` Muchun Song
2026-02-06 9:14 ` David Hildenbrand (Arm)
@ 2026-02-06 9:36 ` David Hildenbrand (Arm)
2026-02-07 20:16 ` Usama Arif
3 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-06 9:36 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> HugeTLB Vmemmap Optimization (HVO) reduces memory usage by freeing most
> vmemmap pages for huge pages and remapping the freed range to a single
> page containing the struct page metadata.
>
> With the new mask-based compound_info encoding (for power-of-2 struct
> page sizes), all tail pages of the same order are now identical
> regardless of which compound page they belong to. This means the tail
> pages can be truly shared without fake heads.
>
> Allocate a single page of initialized tail struct pages per NUMA node
> per order in the vmemmap_tails[] array in pglist_data. All huge pages of
> that order on the node share this tail page, mapped read-only into their
> vmemmap. The head page remains unique per huge page.
>
> Redefine MAX_FOLIO_ORDER using ilog2(). The define has to produce a
> compile-time constant because it is used to size the vmemmap_tails array.
> For some reason, the compiler is not able to resolve get_order() at
> compile time, but ilog2() works.
>
> Avoid using PUD_ORDER to define MAX_FOLIO_ORDER, as it adds a dependency on
> <linux/pgtable.h>, which creates a hard-to-break include loop.
>
> This eliminates fake heads while maintaining the same memory savings,
> and simplifies compound_head() by removing fake head detection.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> ---
[...]
> #define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index a39a301e08b9..688764c52c72 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -19,6 +19,7 @@
>
> #include <asm/tlbflush.h>
> #include "hugetlb_vmemmap.h"
> +#include "internal.h"
>
> /**
> * struct vmemmap_remap_walk - walk vmemmap page table
> @@ -505,6 +506,32 @@ static bool vmemmap_should_optimize_folio(const struct hstate *h, struct folio *
> return true;
> }
>
> +static struct page *vmemmap_get_tail(unsigned int order, int node)
> +{
> + struct page *tail, *p;
> + unsigned int idx;
> +
> + idx =
Could do
const unsigned int idx = order - VMEMMAP_TAIL_MIN_ORDER;
above.
> + tail = READ_ONCE(NODE_DATA(node)->vmemmap_tails[idx]);
> + if (tail)
Wondering if a likely() would be a good idea here. I guess we'll usually
go through that fast path on a system that has been running for a bit.
> + return tail;
> +
> + tail = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
> + if (!tail)
> + return NULL;
> +
> + p = page_to_virt(tail);
> + for (int i = 0; i < PAGE_SIZE / sizeof(struct page); i++)
> + prep_compound_tail(p + i, NULL, order);
This leaves all pageflags, refcount etc. set to 0, which is mostly
expected for tail pages.
But I would have expected something a bit more along the lines of
__init_single_page(), which initializes the page properly.
In particular:
* set_page_node(page, node), or how is page_to_nid() handled?
* atomic_set(&page->_mapcount, -1), to not indicate something odd to
core-mm where we would suddenly have a page mapping for a hugetlb
folio.
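Something along these lines maybe (rough sketch, not tested):

	p = page_to_virt(tail);
	for (int i = 0; i < PAGE_SIZE / sizeof(struct page); i++) {
		/* hypothetical: give the shared tails a node and a sane mapcount */
		set_page_node(p + i, node);
		atomic_set(&(p + i)->_mapcount, -1);
		prep_compound_tail(p + i, NULL, order);
	}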
> +
> + if (cmpxchg(&NODE_DATA(node)->vmemmap_tails[idx], NULL, tail)) {
> + __free_page(tail);
> + tail = READ_ONCE(NODE_DATA(node)->vmemmap_tails[idx]);
> + }
> +
> + return tail;
> +}
[...]
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -378,16 +378,44 @@ void vmemmap_wrprotect_hvo(unsigned long addr, unsigned long end,
> }
> }
>
> -/*
> - * Populate vmemmap pages HVO-style. The first page contains the head
> - * page and needed tail pages, the other ones are mirrors of the first
> - * page.
> - */
> +static __meminit unsigned long vmemmap_get_tail(unsigned int order, int node)
> +{
> + struct page *p, *tail;
> + unsigned int idx;
> +
> + BUG_ON(order < VMEMMAP_TAIL_MIN_ORDER);
> + BUG_ON(order > MAX_FOLIO_ORDER);
> +
> + idx = order - VMEMMAP_TAIL_MIN_ORDER;
> + tail = NODE_DATA(node)->vmemmap_tails[idx];
> + if (tail)
> + return page_to_pfn(tail);
> +
> + p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
> + if (!p)
> + return 0;
> +
> + for (int i = 0; i < PAGE_SIZE / sizeof(struct page); i++)
> + prep_compound_tail(p + i, NULL, order);
> +
> + tail = virt_to_page(p);
> + NODE_DATA(node)->vmemmap_tails[idx] = tail;
> +
> + return page_to_pfn(tail);
> +}
> +
> int __meminit vmemmap_populate_hvo(unsigned long addr, unsigned long end,
> int node, unsigned long headsize)
> {
> + unsigned long maddr, len, tail_pfn;
> + unsigned int order;
> pte_t *pte;
> - unsigned long maddr;
> +
> + len = end - addr;
> + order = ilog2(len * sizeof(struct page) / PAGE_SIZE);
Could initialize them as const above.
But I am wondering whether it shouldn't be the caller that provides this
to us? After all, it's all hugetlb code that allocates and prepares that.
Then we could maybe change
#ifdef CONFIG_SPARSEMEM_VMEMMAP
struct page *vmemmap_tails[NR_VMEMMAP_TAILS];
#endif
to be HVO-only.
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 11/17] mm/hugetlb: Remove fake head pages
2026-02-02 15:56 ` [PATCHv6 11/17] mm/hugetlb: Remove fake head pages Kiryl Shutsemau
` (2 preceding siblings ...)
2026-02-06 9:36 ` David Hildenbrand (Arm)
@ 2026-02-07 20:16 ` Usama Arif
2026-02-07 21:25 ` David Hildenbrand (Arm)
3 siblings, 1 reply; 67+ messages in thread
From: Usama Arif @ 2026-02-07 20:16 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, David Hildenbrand,
Matthew Wilcox, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
> +
> int __meminit vmemmap_populate_hvo(unsigned long addr, unsigned long end,
> int node, unsigned long headsize)
> {
> + unsigned long maddr, len, tail_pfn;
> + unsigned int order;
> pte_t *pte;
> - unsigned long maddr;
> +
> + len = end - addr;
> + order = ilog2(len * sizeof(struct page) / PAGE_SIZE);
This doesn't work for ARM. For len = 32 (2MB contiguous-PTE hugetlb on arm64):
ilog2(32 * 64 / 65536) = ilog2(2048 / 65536) = ilog2(0), which is undefined.
Is order = ilog2(len / sizeof(struct page)) better?
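(With 4 KiB pages the two expressions happen to agree, since there
sizeof(struct page) squared equals PAGE_SIZE (64 * 64 == 4096), which is
presumably why this wasn't noticed on x86.)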
> + tail_pfn = vmemmap_get_tail(order, node);
> + if (!tail_pfn)
> + return -ENOMEM;
>
> for (maddr = addr; maddr < addr + headsize; maddr += PAGE_SIZE) {
> pte = vmemmap_populate_address(maddr, node, NULL, -1, 0);
> @@ -398,8 +426,7 @@ int __meminit vmemmap_populate_hvo(unsigned long addr, unsigned long end,
> /*
> * Reuse the last page struct page mapped above for the rest.
> */
> - return vmemmap_populate_range(maddr, end, node, NULL,
> - pte_pfn(ptep_get(pte)), 0);
> + return vmemmap_populate_range(maddr, end, node, NULL, tail_pfn, 0);
> }
>
> void __weak __meminit vmemmap_set_pmd(pmd_t *pmd, void *p, int node,
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 11/17] mm/hugetlb: Remove fake head pages
2026-02-07 20:16 ` Usama Arif
@ 2026-02-07 21:25 ` David Hildenbrand (Arm)
2026-02-07 22:50 ` Usama Arif
0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-07 21:25 UTC (permalink / raw)
To: Usama Arif, Kiryl Shutsemau, Andrew Morton, Muchun Song,
Matthew Wilcox, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/7/26 21:16, Usama Arif wrote:
>
>> +
>> int __meminit vmemmap_populate_hvo(unsigned long addr, unsigned long end,
>> int node, unsigned long headsize)
>> {
>> + unsigned long maddr, len, tail_pfn;
>> + unsigned int order;
>> pte_t *pte;
>> - unsigned long maddr;
>> +
>> + len = end - addr;
>> + order = ilog2(len * sizeof(struct page) / PAGE_SIZE);
>
>
> This doesn't work for ARM. For len = 32 (2MB contiguous-PTE hugetlb on arm64):
> ilog2(32 * 64 / 65536) = ilog2(2048 / 65536) = ilog2(0), which is undefined.
HVO should not be possible for that size, and we should never reach that
point, no?
Remember that for HVO, the metadata must span at least two pages.
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 11/17] mm/hugetlb: Remove fake head pages
2026-02-07 21:25 ` David Hildenbrand (Arm)
@ 2026-02-07 22:50 ` Usama Arif
0 siblings, 0 replies; 67+ messages in thread
From: Usama Arif @ 2026-02-07 22:50 UTC (permalink / raw)
To: David Hildenbrand (Arm),
Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 07/02/2026 21:25, David Hildenbrand (Arm) wrote:
> On 2/7/26 21:16, Usama Arif wrote:
>>
>>> +
>>> int __meminit vmemmap_populate_hvo(unsigned long addr, unsigned long end,
>>> int node, unsigned long headsize)
>>> {
>>> + unsigned long maddr, len, tail_pfn;
>>> + unsigned int order;
>>> pte_t *pte;
>>> - unsigned long maddr;
>>> +
>>> + len = end - addr;
>>> + order = ilog2(len * sizeof(struct page) / PAGE_SIZE);
>>
>>
>> This doesn't work for ARM. For len = 32 (2MB contiguous-PTE hugetlb on arm64):
>> ilog2(32 * 64 / 65536) = ilog2(2048 / 65536) = ilog2(0), which is undefined.
>
> HVO should not be possible for that size, and we should never reach that point, no?
>
> Remember that for HVO, the metadata must span at least two pages.
>
Ah yeah, that's right, ignore me. It's also checked in hugetlb_vmemmap_optimizable_size,
so it's all good.
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 12/17] mm: Drop fake head checks
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
` (10 preceding siblings ...)
2026-02-02 15:56 ` [PATCHv6 11/17] mm/hugetlb: Remove fake head pages Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-06 9:41 ` David Hildenbrand (Arm)
2026-02-10 16:18 ` Vlastimil Babka
2026-02-02 15:56 ` [PATCHv6 13/17] hugetlb: Remove VMEMMAP_SYNCHRONIZE_RCU Kiryl Shutsemau
` (4 subsequent siblings)
16 siblings, 2 replies; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
With fake head pages eliminated in the previous commit, remove the
supporting infrastructure:
- page_fixed_fake_head(): no longer needed to detect fake heads;
- page_is_fake_head(): no longer needed;
- page_count_writable(): no longer needed for RCU protection;
- RCU read_lock in page_ref_add_unless(): no longer needed;
This substantially simplifies compound_head() and page_ref_add_unless(),
removing both branches and RCU overhead from these hot paths.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
---
include/linux/page-flags.h | 93 ++------------------------------------
include/linux/page_ref.h | 8 +---
2 files changed, 4 insertions(+), 97 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8f2c7fbc739b..5a8f6fab2255 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -221,102 +221,15 @@ static __always_inline bool compound_info_has_mask(void)
return is_power_of_2(sizeof(struct page));
}
-#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
-/*
- * Return the real head page struct iff the @page is a fake head page, otherwise
- * return the @page itself. See Documentation/mm/vmemmap_dedup.rst.
- */
-static __always_inline const struct page *page_fixed_fake_head(const struct page *page)
-{
- if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key))
- return page;
-
- /* Fake heads only exists if compound_info_has_mask() is true */
- if (!compound_info_has_mask())
- return page;
-
- /*
- * Only addresses aligned with PAGE_SIZE of struct page may be fake head
- * struct page. The alignment check aims to avoid access the fields (
- * e.g. compound_info) of the @page[1]. It can avoid touch a (possibly)
- * cold cacheline in some cases.
- */
- if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
- test_bit(PG_head, &page->flags.f)) {
- /*
- * We can safely access the field of the @page[1] with PG_head
- * because the @page is a compound page composed with at least
- * two contiguous pages.
- */
- unsigned long info = READ_ONCE(page[1].compound_info);
-
- /* See set_compound_head() */
- if (likely(info & 1)) {
- unsigned long p = (unsigned long)page;
-
- return (const struct page *)(p & info);
- }
- }
- return page;
-}
-
-static __always_inline bool page_count_writable(const struct page *page, int u)
-{
- if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key))
- return true;
-
- /*
- * The refcount check is ordered before the fake-head check to prevent
- * the following race:
- * CPU 1 (HVO) CPU 2 (speculative PFN walker)
- *
- * page_ref_freeze()
- * synchronize_rcu()
- * rcu_read_lock()
- * page_is_fake_head() is false
- * vmemmap_remap_pte()
- * XXX: struct page[] becomes r/o
- *
- * page_ref_unfreeze()
- * page_ref_count() is not zero
- *
- * atomic_add_unless(&page->_refcount)
- * XXX: try to modify r/o struct page[]
- *
- * The refcount check also prevents modification attempts to other (r/o)
- * tail pages that are not fake heads.
- */
- if (atomic_read_acquire(&page->_refcount) == u)
- return false;
-
- return page_fixed_fake_head(page) == page;
-}
-#else
-static inline const struct page *page_fixed_fake_head(const struct page *page)
-{
- return page;
-}
-
-static inline bool page_count_writable(const struct page *page, int u)
-{
- return true;
-}
-#endif
-
-static __always_inline int page_is_fake_head(const struct page *page)
-{
- return page_fixed_fake_head(page) != page;
-}
-
static __always_inline unsigned long _compound_head(const struct page *page)
{
unsigned long info = READ_ONCE(page->compound_info);
/* Bit 0 encodes PageTail() */
if (!(info & 1))
- return (unsigned long)page_fixed_fake_head(page);
+ return (unsigned long)page;
/*
* If compound_info_has_mask() is false, the rest of compound_info is
@@ -397,7 +310,7 @@ static __always_inline void clear_compound_head(struct page *page)
static __always_inline int PageTail(const struct page *page)
{
- return READ_ONCE(page->compound_info) & 1 || page_is_fake_head(page);
+ return READ_ONCE(page->compound_info) & 1;
}
static __always_inline int PageCompound(const struct page *page)
@@ -924,7 +837,7 @@ static __always_inline bool folio_test_head(const struct folio *folio)
static __always_inline int PageHead(const struct page *page)
{
PF_POISONED_CHECK(page);
- return test_bit(PG_head, &page->flags.f) && !page_is_fake_head(page);
+ return test_bit(PG_head, &page->flags.f);
}
__SETPAGEFLAG(Head, head, PF_ANY)
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 544150d1d5fd..490d0ad6e56d 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -230,13 +230,7 @@ static inline int folio_ref_dec_return(struct folio *folio)
static inline bool page_ref_add_unless(struct page *page, int nr, int u)
{
- bool ret = false;
-
- rcu_read_lock();
- /* avoid writing to the vmemmap area being remapped */
- if (page_count_writable(page, u))
- ret = atomic_add_unless(&page->_refcount, nr, u);
- rcu_read_unlock();
+ bool ret = atomic_add_unless(&page->_refcount, nr, u);
if (page_ref_tracepoint_active(page_ref_mod_unless))
__page_ref_mod_unless(page, nr, ret);
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 12/17] mm: Drop fake head checks
2026-02-02 15:56 ` [PATCHv6 12/17] mm: Drop fake head checks Kiryl Shutsemau
@ 2026-02-06 9:41 ` David Hildenbrand (Arm)
2026-02-10 16:18 ` Vlastimil Babka
1 sibling, 0 replies; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-06 9:41 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> With fake head pages eliminated in the previous commit, remove the
> supporting infrastructure:
>
> - page_fixed_fake_head(): no longer needed to detect fake heads;
> - page_is_fake_head(): no longer needed;
> - page_count_writable(): no longer needed for RCU protection;
> - RCU read_lock in page_ref_add_unless(): no longer needed;
>
> This substantially simplifies compound_head() and page_ref_add_unless(),
> removing both branches and RCU overhead from these hot paths.
Can you say a few more words on why RCU was required and is now no longer
needed?
IIRC, it's because we now no longer reuse the real head page (page 0)
for a tail, and there could have been a race where we could have
attempted to write to that page0 while already mapped (r/o) to page1.
Also good to mention that the corresponding RCU sync will be removed
separately in a following commit.
Nothing jumped out at me, and it's a great simplification for core-mm.
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 12/17] mm: Drop fake head checks
2026-02-02 15:56 ` [PATCHv6 12/17] mm: Drop fake head checks Kiryl Shutsemau
2026-02-06 9:41 ` David Hildenbrand (Arm)
@ 2026-02-10 16:18 ` Vlastimil Babka
1 sibling, 0 replies; 67+ messages in thread
From: Vlastimil Babka @ 2026-02-10 16:18 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, David Hildenbrand,
Matthew Wilcox, Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Lorenzo Stoakes, Zi Yan,
Baoquan He, Michal Hocko, Johannes Weiner, Jonathan Corbet,
Huacai Chen, WANG Xuerui, Palmer Dabbelt, Paul Walmsley,
Albert Ou, Alexandre Ghiti, kernel-team, linux-mm, linux-kernel,
linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> With fake head pages eliminated in the previous commit, remove the
> supporting infrastructure:
>
> - page_fixed_fake_head(): no longer needed to detect fake heads;
> - page_is_fake_head(): no longer needed;
> - page_count_writable(): no longer needed for RCU protection;
> - RCU read_lock in page_ref_add_unless(): no longer needed;
>
> This substantially simplifies compound_head() and page_ref_add_unless(),
> removing both branches and RCU overhead from these hot paths.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 13/17] hugetlb: Remove VMEMMAP_SYNCHRONIZE_RCU
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
` (11 preceding siblings ...)
2026-02-02 15:56 ` [PATCHv6 12/17] mm: Drop fake head checks Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-06 9:42 ` David Hildenbrand (Arm)
2026-02-02 15:56 ` [PATCHv6 14/17] mm/hugetlb: Remove hugetlb_optimize_vmemmap_key static key Kiryl Shutsemau
` (3 subsequent siblings)
16 siblings, 1 reply; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
The VMEMMAP_SYNCHRONIZE_RCU flag triggered synchronize_rcu() calls to
prevent a race between HVO remapping and page_ref_add_unless(). The
race could occur when a speculative PFN walker tried to modify the
refcount on a struct page that was in the process of being remapped
to a fake head.
With fake heads eliminated, page_ref_add_unless() no longer needs RCU
protection.
Remove the flag and synchronize_rcu() calls.
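For illustration only: the guard that remains after this series is the refcount
value itself. A frozen page sits at a refcount of zero, so a plain add-unless
(which is all page_ref_add_unless() does once the RCU section is gone) leaves
it untouched. Below is a minimal userspace model of that semantic using C11
atomics; the function and variable names are made up for the example and none
of this is kernel code.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Model of atomic_add_unless(): add @nr to @v unless it equals @u. */
static bool add_unless(atomic_int *v, int nr, int u)
{
        int old = atomic_load(v);

        do {
                if (old == u)
                        return false;   /* e.g. refcount frozen at 0: leave it alone */
        } while (!atomic_compare_exchange_weak(v, &old, old + nr));

        return true;
}

int main(void)
{
        atomic_int refcount = 2;

        printf("%d\n", add_unless(&refcount, 1, 0));    /* 1: reference taken */
        atomic_store(&refcount, 0);                     /* "frozen" page */
        printf("%d\n", add_unless(&refcount, 1, 0));    /* 0: left untouched */
        return 0;
}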
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
---
mm/hugetlb_vmemmap.c | 20 ++++----------------
1 file changed, 4 insertions(+), 16 deletions(-)
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 688764c52c72..6088fc77865c 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -47,8 +47,6 @@ struct vmemmap_remap_walk {
#define VMEMMAP_SPLIT_NO_TLB_FLUSH BIT(0)
/* Skip the TLB flush when we remap the PTE */
#define VMEMMAP_REMAP_NO_TLB_FLUSH BIT(1)
-/* synchronize_rcu() to avoid writes from page_ref_add_unless() */
-#define VMEMMAP_SYNCHRONIZE_RCU BIT(2)
unsigned long flags;
};
@@ -409,9 +407,6 @@ static int __hugetlb_vmemmap_restore_folio(const struct hstate *h,
if (!folio_test_hugetlb_vmemmap_optimized(folio))
return 0;
- if (flags & VMEMMAP_SYNCHRONIZE_RCU)
- synchronize_rcu();
-
vmemmap_start = (unsigned long)&folio->page;
vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
@@ -444,7 +439,7 @@ static int __hugetlb_vmemmap_restore_folio(const struct hstate *h,
*/
int hugetlb_vmemmap_restore_folio(const struct hstate *h, struct folio *folio)
{
- return __hugetlb_vmemmap_restore_folio(h, folio, VMEMMAP_SYNCHRONIZE_RCU);
+ return __hugetlb_vmemmap_restore_folio(h, folio, 0);
}
/**
@@ -467,14 +462,11 @@ long hugetlb_vmemmap_restore_folios(const struct hstate *h,
struct folio *folio, *t_folio;
long restored = 0;
long ret = 0;
- unsigned long flags = VMEMMAP_REMAP_NO_TLB_FLUSH | VMEMMAP_SYNCHRONIZE_RCU;
+ unsigned long flags = VMEMMAP_REMAP_NO_TLB_FLUSH;
list_for_each_entry_safe(folio, t_folio, folio_list, lru) {
if (folio_test_hugetlb_vmemmap_optimized(folio)) {
ret = __hugetlb_vmemmap_restore_folio(h, folio, flags);
- /* only need to synchronize_rcu() once for each batch */
- flags &= ~VMEMMAP_SYNCHRONIZE_RCU;
-
if (ret)
break;
restored++;
@@ -554,8 +546,6 @@ static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h,
static_branch_inc(&hugetlb_optimize_vmemmap_key);
- if (flags & VMEMMAP_SYNCHRONIZE_RCU)
- synchronize_rcu();
/*
* Very Subtle
* If VMEMMAP_REMAP_NO_TLB_FLUSH is set, TLB flushing is not performed
@@ -613,7 +603,7 @@ void hugetlb_vmemmap_optimize_folio(const struct hstate *h, struct folio *folio)
{
LIST_HEAD(vmemmap_pages);
- __hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, VMEMMAP_SYNCHRONIZE_RCU);
+ __hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, 0);
free_vmemmap_page_list(&vmemmap_pages);
}
@@ -641,7 +631,7 @@ static void __hugetlb_vmemmap_optimize_folios(struct hstate *h,
struct folio *folio;
int nr_to_optimize;
LIST_HEAD(vmemmap_pages);
- unsigned long flags = VMEMMAP_REMAP_NO_TLB_FLUSH | VMEMMAP_SYNCHRONIZE_RCU;
+ unsigned long flags = VMEMMAP_REMAP_NO_TLB_FLUSH;
nr_to_optimize = 0;
list_for_each_entry(folio, folio_list, lru) {
@@ -694,8 +684,6 @@ static void __hugetlb_vmemmap_optimize_folios(struct hstate *h,
int ret;
ret = __hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, flags);
- /* only need to synchronize_rcu() once for each batch */
- flags &= ~VMEMMAP_SYNCHRONIZE_RCU;
/*
* Pages to be freed may have been accumulated. If we
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 13/17] hugetlb: Remove VMEMMAP_SYNCHRONIZE_RCU
2026-02-02 15:56 ` [PATCHv6 13/17] hugetlb: Remove VMEMMAP_SYNCHRONIZE_RCU Kiryl Shutsemau
@ 2026-02-06 9:42 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-06 9:42 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> The VMEMMAP_SYNCHRONIZE_RCU flag triggered synchronize_rcu() calls to
> prevent a race between HVO remapping and page_ref_add_unless(). The
> race could occur when a speculative PFN walker tried to modify the
> refcount on a struct page that was in the process of being remapped
> to a fake head.
>
> With fake heads eliminated, page_ref_add_unless() no longer needs RCU
> protection.
>
> Remove the flag and synchronize_rcu() calls.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
> ---
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 14/17] mm/hugetlb: Remove hugetlb_optimize_vmemmap_key static key
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
` (12 preceding siblings ...)
2026-02-02 15:56 ` [PATCHv6 13/17] hugetlb: Remove VMEMMAP_SYNCHRONIZE_RCU Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-06 9:42 ` David Hildenbrand (Arm)
2026-02-02 15:56 ` [PATCHv6 15/17] mm: Remove the branch from compound_head() Kiryl Shutsemau
` (2 subsequent siblings)
16 siblings, 1 reply; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
The hugetlb_optimize_vmemmap_key static key was used to guard fake head
detection in compound_head() and related functions. It allowed skipping
the fake head checks entirely when HVO was not in use.
With fake heads eliminated and the detection code removed, the static
key serves no purpose. Remove its definition and all increment/decrement
calls.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
---
include/linux/page-flags.h | 2 --
mm/hugetlb_vmemmap.c | 14 ++------------
2 files changed, 2 insertions(+), 14 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5a8f6fab2255..1aaa604f4b9b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -221,8 +221,6 @@ static __always_inline bool compound_info_has_mask(void)
return is_power_of_2(sizeof(struct page));
}
-DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
-
static __always_inline unsigned long _compound_head(const struct page *page)
{
unsigned long info = READ_ONCE(page->compound_info);
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 6088fc77865c..bdb68779d7b2 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -385,9 +385,6 @@ static int vmemmap_remap_alloc(unsigned long start, unsigned long end,
return vmemmap_remap_range(start, end, &walk);
}
-DEFINE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
-EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key);
-
static bool vmemmap_optimize_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON);
static int __init hugetlb_vmemmap_optimize_param(char *buf)
{
@@ -419,10 +416,8 @@ static int __hugetlb_vmemmap_restore_folio(const struct hstate *h,
* discarded vmemmap pages must be allocated and remapping.
*/
ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, flags);
- if (!ret) {
+ if (!ret)
folio_clear_hugetlb_vmemmap_optimized(folio);
- static_branch_dec(&hugetlb_optimize_vmemmap_key);
- }
return ret;
}
@@ -544,8 +539,6 @@ static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h,
if (!vmemmap_tail)
return -ENOMEM;
- static_branch_inc(&hugetlb_optimize_vmemmap_key);
-
/*
* Very Subtle
* If VMEMMAP_REMAP_NO_TLB_FLUSH is set, TLB flushing is not performed
@@ -581,10 +574,8 @@ static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h,
vmemmap_head, vmemmap_tail,
vmemmap_pages, flags);
out:
- if (ret) {
- static_branch_dec(&hugetlb_optimize_vmemmap_key);
+ if (ret)
folio_clear_hugetlb_vmemmap_optimized(folio);
- }
return ret;
}
@@ -650,7 +641,6 @@ static void __hugetlb_vmemmap_optimize_folios(struct hstate *h,
register_page_bootmem_memmap(pfn_to_section_nr(spfn),
&folio->page,
HUGETLB_VMEMMAP_RESERVE_SIZE);
- static_branch_inc(&hugetlb_optimize_vmemmap_key);
continue;
}
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 14/17] mm/hugetlb: Remove hugetlb_optimize_vmemmap_key static key
2026-02-02 15:56 ` [PATCHv6 14/17] mm/hugetlb: Remove hugetlb_optimize_vmemmap_key static key Kiryl Shutsemau
@ 2026-02-06 9:42 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-06 9:42 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> The hugetlb_optimize_vmemmap_key static key was used to guard fake head
> detection in compound_head() and related functions. It allowed skipping
> the fake head checks entirely when HVO was not in use.
>
> With fake heads eliminated and the detection code removed, the static
> key serves no purpose. Remove its definition and all increment/decrement
> calls.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 15/17] mm: Remove the branch from compound_head()
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
` (13 preceding siblings ...)
2026-02-02 15:56 ` [PATCHv6 14/17] mm/hugetlb: Remove hugetlb_optimize_vmemmap_key static key Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-06 10:23 ` David Hildenbrand (Arm)
2026-02-10 16:42 ` Vlastimil Babka
2026-02-02 15:56 ` [PATCHv6 16/17] hugetlb: Update vmemmap_dedup.rst Kiryl Shutsemau
2026-02-02 15:56 ` [PATCHv6 17/17] mm/slab: Use compound_head() in page_slab() Kiryl Shutsemau
16 siblings, 2 replies; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
The compound_head() function is a hot path. For example, the zap path
calls it for every leaf page table entry.
Rewrite the helper function in a branchless manner to eliminate the risk
of CPU branch misprediction.
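As an illustration of the bit trick, here is a minimal userspace model of the
mask-based lookup. The addresses, the 64-byte struct page size and the helper
names are made up for the example; only the two-step mask computation mirrors
the kernel code below.

#include <stdint.h>
#include <stdio.h>

/*
 * Bit 0 of @info marks a tail page; the remaining bits hold a mask that
 * aligns a tail address back down to the head. For non-tail pages the
 * computed mask ends up as all ones, so the address passes through.
 */
static uintptr_t head_of(uintptr_t page, uintptr_t info)
{
        uintptr_t mask = (info & 1) - 1;        /* non-tail: ~0UL, tail: 0 */

        mask |= info;                           /* non-tail: ~0UL, tail: info */
        return page & mask;
}

int main(void)
{
        uintptr_t head = 0x100000;                      /* head struct page (example address) */
        uintptr_t tail = head + 5 * 64;                 /* 6th struct page, 64 bytes each */
        uintptr_t info = ~(uintptr_t)(512 * 64 - 1) | 1; /* order-9 mask plus the tail bit */

        printf("%#lx\n", (unsigned long)head_of(tail, info));  /* prints 0x100000 */
        printf("%#lx\n", (unsigned long)head_of(head, 0));     /* prints 0x100000 */
        return 0;
}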
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Zi Yan <ziy@nvidia.com>
---
include/linux/page-flags.h | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 1aaa604f4b9b..16384cb6f962 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -224,25 +224,32 @@ static __always_inline bool compound_info_has_mask(void)
static __always_inline unsigned long _compound_head(const struct page *page)
{
unsigned long info = READ_ONCE(page->compound_info);
+ unsigned long mask;
+
+ if (!compound_info_has_mask()) {
+ /* Bit 0 encodes PageTail() */
+ if (info & 1)
+ return info - 1;
- /* Bit 0 encodes PageTail() */
- if (!(info & 1))
return (unsigned long)page;
-
- /*
- * If compound_info_has_mask() is false, the rest of compound_info is
- * the pointer to the head page.
- */
- if (!compound_info_has_mask())
- return info - 1;
+ }
/*
* If compound_info_has_mask() is true the rest of the info encodes
* the mask that converts the address of the tail page to the head page.
*
* No need to clear bit 0 in the mask as 'page' always has it clear.
+ *
+ * Let's do it in a branchless manner.
*/
- return (unsigned long)page & info;
+
+ /* Non-tail: -1UL, Tail: 0 */
+ mask = (info & 1) - 1;
+
+ /* Non-tail: -1UL, Tail: info */
+ mask |= info;
+
+ return (unsigned long)page & mask;
}
#define compound_head(page) ((typeof(page))_compound_head(page))
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 15/17] mm: Remove the branch from compound_head()
2026-02-02 15:56 ` [PATCHv6 15/17] mm: Remove the branch from compound_head() Kiryl Shutsemau
@ 2026-02-06 10:23 ` David Hildenbrand (Arm)
2026-02-10 16:42 ` Vlastimil Babka
1 sibling, 0 replies; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-06 10:23 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> The compound_head() function is a hot path. For example, the zap path
> calls it for every leaf page table entry.
>
> Rewrite the helper function in a branchless manner to eliminate the risk
> of CPU branch misprediction.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> ---
> include/linux/page-flags.h | 27 +++++++++++++++++----------
> 1 file changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 1aaa604f4b9b..16384cb6f962 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -224,25 +224,32 @@ static __always_inline bool compound_info_has_mask(void)
> static __always_inline unsigned long _compound_head(const struct page *page)
> {
> unsigned long info = READ_ONCE(page->compound_info);
> + unsigned long mask;
> +
> + if (!compound_info_has_mask()) {
> + /* Bit 0 encodes PageTail() */
> + if (info & 1)
> + return info - 1;
>
> - /* Bit 0 encodes PageTail() */
> - if (!(info & 1))
> return (unsigned long)page;
> -
> - /*
> - * If compound_info_has_mask() is false, the rest of compound_info is
> - * the pointer to the head page.
> - */
> - if (!compound_info_has_mask())
> - return info - 1;
> + }
>
> /*
> * If compound_info_has_mask() is true the rest of the info encodes
> * the mask that converts the address of the tail page to the head page.
> *
> * No need to clear bit 0 in the mask as 'page' always has it clear.
> + *
> + * Let's do it in a branchless manner.
> */
> - return (unsigned long)page & info;
> +
> + /* Non-tail: -1UL, Tail: 0 */
> + mask = (info & 1) - 1;
> +
> + /* Non-tail: -1UL, Tail: info */
> + mask |= info;
> +
> + return (unsigned long)page & mask;
> }
>
> #define compound_head(page) ((typeof(page))_compound_head(page))
Nice!
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 15/17] mm: Remove the branch from compound_head()
2026-02-02 15:56 ` [PATCHv6 15/17] mm: Remove the branch from compound_head() Kiryl Shutsemau
2026-02-06 10:23 ` David Hildenbrand (Arm)
@ 2026-02-10 16:42 ` Vlastimil Babka
1 sibling, 0 replies; 67+ messages in thread
From: Vlastimil Babka @ 2026-02-10 16:42 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, David Hildenbrand,
Matthew Wilcox, Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Lorenzo Stoakes, Zi Yan,
Baoquan He, Michal Hocko, Johannes Weiner, Jonathan Corbet,
Huacai Chen, WANG Xuerui, Palmer Dabbelt, Paul Walmsley,
Albert Ou, Alexandre Ghiti, kernel-team, linux-mm, linux-kernel,
linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> The compound_head() function is a hot path. For example, the zap path
> calls it for every leaf page table entry.
>
> Rewrite the helper function in a branchless manner to eliminate the risk
> of CPU branch misprediction.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 16/17] hugetlb: Update vmemmap_dedup.rst
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
` (14 preceding siblings ...)
2026-02-02 15:56 ` [PATCHv6 15/17] mm: Remove the branch from compound_head() Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-06 10:35 ` David Hildenbrand (Arm)
2026-02-02 15:56 ` [PATCHv6 17/17] mm/slab: Use compound_head() in page_slab() Kiryl Shutsemau
16 siblings, 1 reply; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
Update the documentation regarding vmemmap optimization for hugetlb to
reflect the changes in how the kernel maps the tail pages.
Fake heads no longer exist. Remove their description.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
---
Documentation/mm/vmemmap_dedup.rst | 60 +++++++++++++-----------------
1 file changed, 26 insertions(+), 34 deletions(-)
diff --git a/Documentation/mm/vmemmap_dedup.rst b/Documentation/mm/vmemmap_dedup.rst
index 1863d88d2dcb..fca9d0ce282a 100644
--- a/Documentation/mm/vmemmap_dedup.rst
+++ b/Documentation/mm/vmemmap_dedup.rst
@@ -124,33 +124,35 @@ Here is how things look before optimization::
| |
+-----------+
-The value of page->compound_info is the same for all tail pages. The first
-page of ``struct page`` (page 0) associated with the HugeTLB page contains the 4
-``struct page`` necessary to describe the HugeTLB. The only use of the remaining
-pages of ``struct page`` (page 1 to page 7) is to point to page->compound_info.
-Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of ``struct page``
-will be used for each HugeTLB page. This will allow us to free the remaining
-7 pages to the buddy allocator.
+The first page of ``struct page`` (page 0) associated with the HugeTLB page
+contains the 4 ``struct page`` necessary to describe the HugeTLB. The remaining
+pages of ``struct page`` (page 1 to page 7) are tail pages.
+
+The optimization is only applied when the size of ``struct page`` is a power-of-2.
+In this case, all tail pages of the same order are identical. See
+compound_head(). This allows us to remap the tail pages of the vmemmap to a
+shared, read-only page. The head page is also remapped to a new page. This
+allows the original vmemmap pages to be freed.
Here is how things look after remapping::
- HugeTLB struct pages(8 pages) page frame(8 pages)
- +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+
- | | | 0 | -------------> | 0 |
- | | +-----------+ +-----------+
- | | | 1 | ---------------^ ^ ^ ^ ^ ^ ^
- | | +-----------+ | | | | | |
- | | | 2 | -----------------+ | | | | |
- | | +-----------+ | | | | |
- | | | 3 | -------------------+ | | | |
- | | +-----------+ | | | |
- | | | 4 | ---------------------+ | | |
- | PMD | +-----------+ | | |
- | level | | 5 | -----------------------+ | |
- | mapping | +-----------+ | |
- | | | 6 | -------------------------+ |
- | | +-----------+ |
- | | | 7 | ---------------------------+
+ HugeTLB struct pages(8 pages) page frame
+ +-----------+ ---virt_to_page---> +-----------+ mapping to +----------------+
+ | | | 0 | -------------> | 0 |
+ | | +-----------+ +----------------+
+ | | | 1 | ------┐
+ | | +-----------+ |
+ | | | 2 | ------┼ +----------------------------+
+ | | +-----------+ | | A single, per-node page |
+ | | | 3 | ------┼------> | frame shared among all |
+ | | +-----------+ | | hugepages of the same size |
+ | | | 4 | ------┼ +----------------------------+
+ | | +-----------+ |
+ | | | 5 | ------┼
+ | PMD | +-----------+ |
+ | level | | 6 | ------┼
+ | mapping | +-----------+ |
+ | | | 7 | ------┘
| | +-----------+
| |
| |
@@ -172,16 +174,6 @@ The contiguous bit is used to increase the mapping size at the pmd and pte
(last) level. So this type of HugeTLB page can be optimized only when its
size of the ``struct page`` structs is greater than **1** page.
-Notice: The head vmemmap page is not freed to the buddy allocator and all
-tail vmemmap pages are mapped to the head vmemmap page frame. So we can see
-more than one ``struct page`` struct with ``PG_head`` (e.g. 8 per 2 MB HugeTLB
-page) associated with each HugeTLB page. The ``compound_head()`` can handle
-this correctly. There is only **one** head ``struct page``, the tail
-``struct page`` with ``PG_head`` are fake head ``struct page``. We need an
-approach to distinguish between those two different types of ``struct page`` so
-that ``compound_head()`` can return the real head ``struct page`` when the
-parameter is the tail ``struct page`` but with ``PG_head``.
-
Device DAX
==========
--
2.51.2
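To put the savings described in the updated document into numbers: with 4 KiB
base pages and a 64-byte struct page (both assumptions for the example, not
values taken from this patch), a 2 MiB HugeTLB page needs 8 vmemmap pages
before the optimization and keeps only the one newly allocated head page
afterwards, with the tails sharing a per-node page. A self-contained
back-of-the-envelope calculation:

#include <stdio.h>

int main(void)
{
        const long page_size = 4096;            /* assumed base page size */
        const long struct_page_size = 64;       /* assumed sizeof(struct page) */
        const long hugepage_size = 2L << 20;    /* 2 MiB HugeTLB page */

        long nr_struct_pages = hugepage_size / page_size;                       /* 512 */
        long vmemmap_pages = nr_struct_pages * struct_page_size / page_size;    /* 8 */

        /*
         * After the optimization the original vmemmap pages are freed,
         * one new page is allocated for the head, and the tails map to
         * a shared per-node page amortized across all hugepages of that
         * size.
         */
        printf("vmemmap pages: %ld before, 1 after, %ld freed per hugepage\n",
               vmemmap_pages, vmemmap_pages - 1);
        return 0;
}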
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 16/17] hugetlb: Update vmemmap_dedup.rst
2026-02-02 15:56 ` [PATCHv6 16/17] hugetlb: Update vmemmap_dedup.rst Kiryl Shutsemau
@ 2026-02-06 10:35 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-06 10:35 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> Update the documentation regarding vmemmap optimization for hugetlb to
> reflect the changes in how the kernel maps the tail pages.
>
> Fake heads no longer exist. Remove their description.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Muchun Song <muchun.song@linux.dev>
> ---
> Documentation/mm/vmemmap_dedup.rst | 60 +++++++++++++-----------------
> 1 file changed, 26 insertions(+), 34 deletions(-)
>
> diff --git a/Documentation/mm/vmemmap_dedup.rst b/Documentation/mm/vmemmap_dedup.rst
> index 1863d88d2dcb..fca9d0ce282a 100644
> --- a/Documentation/mm/vmemmap_dedup.rst
> +++ b/Documentation/mm/vmemmap_dedup.rst
> @@ -124,33 +124,35 @@ Here is how things look before optimization::
> | |
> +-----------+
>
> -The value of page->compound_info is the same for all tail pages. The first
> -page of ``struct page`` (page 0) associated with the HugeTLB page contains the 4
> -``struct page`` necessary to describe the HugeTLB. The only use of the remaining
> -pages of ``struct page`` (page 1 to page 7) is to point to page->compound_info.
> -Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of ``struct page``
> -will be used for each HugeTLB page. This will allow us to free the remaining
> -7 pages to the buddy allocator.
> +The first page of ``struct page`` (page 0) associated with the HugeTLB page
> +contains the 4 ``struct page`` necessary to describe the HugeTLB. The remaining
> +pages of ``struct page`` (page 1 to page 7) are tail pages.
> +
> +The optimization is only applied when the size of ``struct page`` is a power-of-2.
> +In this case, all tail pages of the same order are identical. See
> +compound_head(). This allows us to remap the tail pages of the vmemmap to a
> +shared, read-only page. The head page is also remapped to a new page. This
> +allows the original vmemmap pages to be freed.
>
> Here is how things look after remapping::
>
> - HugeTLB struct pages(8 pages) page frame(8 pages)
> - +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+
> - | | | 0 | -------------> | 0 |
> - | | +-----------+ +-----------+
> - | | | 1 | ---------------^ ^ ^ ^ ^ ^ ^
> - | | +-----------+ | | | | | |
> - | | | 2 | -----------------+ | | | | |
> - | | +-----------+ | | | | |
> - | | | 3 | -------------------+ | | | |
> - | | +-----------+ | | | |
> - | | | 4 | ---------------------+ | | |
> - | PMD | +-----------+ | | |
> - | level | | 5 | -----------------------+ | |
> - | mapping | +-----------+ | |
> - | | | 6 | -------------------------+ |
> - | | +-----------+ |
> - | | | 7 | ---------------------------+
> + HugeTLB struct pages(8 pages) page frame
You could highlight that we allocate a new head page like "page frame
(new)".
Wasn't aware of that detail before reading your change above.
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCHv6 17/17] mm/slab: Use compound_head() in page_slab()
2026-02-02 15:56 [PATCHv6 00/17] mm: Eliminate fake head pages from vmemmap optimization Kiryl Shutsemau
` (15 preceding siblings ...)
2026-02-02 15:56 ` [PATCHv6 16/17] hugetlb: Update vmemmap_dedup.rst Kiryl Shutsemau
@ 2026-02-02 15:56 ` Kiryl Shutsemau
2026-02-04 3:39 ` Muchun Song
` (2 more replies)
16 siblings, 3 replies; 67+ messages in thread
From: Kiryl Shutsemau @ 2026-02-02 15:56 UTC (permalink / raw)
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv, Kiryl Shutsemau
page_slab() contained an open-coded implementation of compound_head().
Replace the duplicated code with a direct call to compound_head().
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
---
I am not sure if this open-coded version is intentional and required for
memdesc transition. Drop the patch if it is.
---
mm/slab.h | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/mm/slab.h b/mm/slab.h
index f68c3ac8126f..970a13ac5b8e 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -137,19 +137,7 @@ static_assert(IS_ALIGNED(offsetof(struct slab, freelist), sizeof(struct freelist
*/
static inline struct slab *page_slab(const struct page *page)
{
- unsigned long info;
-
- info = READ_ONCE(page->compound_info);
- if (info & 1) {
- /* See compound_head() */
- if (compound_info_has_mask()) {
- unsigned long p = (unsigned long)page;
- page = (struct page *)(p & info);
- } else {
- page = (struct page *)(info - 1);
- }
- }
-
+ page = compound_head(page);
if (data_race(page->page_type >> 24) != PGTY_slab)
page = NULL;
--
2.51.2
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 17/17] mm/slab: Use compound_head() in page_slab()
2026-02-02 15:56 ` [PATCHv6 17/17] mm/slab: Use compound_head() in page_slab() Kiryl Shutsemau
@ 2026-02-04 3:39 ` Muchun Song
2026-02-06 10:42 ` David Hildenbrand (Arm)
2026-02-10 16:45 ` Vlastimil Babka
2 siblings, 0 replies; 67+ messages in thread
From: Muchun Song @ 2026-02-04 3:39 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Andrew Morton, David Hildenbrand, Matthew Wilcox, Usama Arif,
Frank van der Linden, Oscar Salvador, Mike Rapoport,
Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He,
Michal Hocko, Johannes Weiner, Jonathan Corbet, Huacai Chen,
WANG Xuerui, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Alexandre Ghiti, kernel-team, linux-mm, linux-kernel, linux-doc,
loongarch, linux-riscv
> On Feb 2, 2026, at 23:56, Kiryl Shutsemau <kas@kernel.org> wrote:
>
> page_slab() contained an open-coded implementation of compound_head().
>
> Replace the duplicated code with a direct call to compound_head().
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Acked-by: Muchun Song <muchun.song@linux.dev>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 17/17] mm/slab: Use compound_head() in page_slab()
2026-02-02 15:56 ` [PATCHv6 17/17] mm/slab: Use compound_head() in page_slab() Kiryl Shutsemau
2026-02-04 3:39 ` Muchun Song
@ 2026-02-06 10:42 ` David Hildenbrand (Arm)
2026-02-10 16:45 ` Vlastimil Babka
2 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-06 10:42 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, Matthew Wilcox,
Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner,
Jonathan Corbet, Huacai Chen, WANG Xuerui, Palmer Dabbelt,
Paul Walmsley, Albert Ou, Alexandre Ghiti, kernel-team, linux-mm,
linux-kernel, linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> page_slab() contained an open-coded implementation of compound_head().
>
> Replace the duplicated code with a direct call to compound_head().
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
>
> ---
>
> I am not sure if this open-coded version is intentional and required for
> memdesc transition. Drop the patch if it is.
commit 2bcd3800f2da1be13b972858f63c66d035b1ec6d
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Thu Nov 13 00:09:15 2025 +0000
slab: Reimplement page_slab()
In order to separate slabs from folios, we need to convert from any page
in a slab to the slab directly without going through a page to folio
conversion first.
Up to this point, page_slab() has followed the example of other memdesc
converters (page_folio(), page_ptdesc() etc) and just cast the pointer
to the requested type, regardless of whether the pointer is actually a
pointer to the correct type or not.
That changes with this commit; we check that the page actually belongs
to a slab and return NULL if it does not. Other memdesc converters will
adopt this convention in future.
I think using compound_head() is fine. For memdescs the function has to be changed to
look up the memdesc either way, and not go through the head page.
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCHv6 17/17] mm/slab: Use compound_head() in page_slab()
2026-02-02 15:56 ` [PATCHv6 17/17] mm/slab: Use compound_head() in page_slab() Kiryl Shutsemau
2026-02-04 3:39 ` Muchun Song
2026-02-06 10:42 ` David Hildenbrand (Arm)
@ 2026-02-10 16:45 ` Vlastimil Babka
2 siblings, 0 replies; 67+ messages in thread
From: Vlastimil Babka @ 2026-02-10 16:45 UTC (permalink / raw)
To: Kiryl Shutsemau, Andrew Morton, Muchun Song, David Hildenbrand,
Matthew Wilcox, Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Lorenzo Stoakes, Zi Yan,
Baoquan He, Michal Hocko, Johannes Weiner, Jonathan Corbet,
Huacai Chen, WANG Xuerui, Palmer Dabbelt, Paul Walmsley,
Albert Ou, Alexandre Ghiti, kernel-team, linux-mm, linux-kernel,
linux-doc, loongarch, linux-riscv
On 2/2/26 16:56, Kiryl Shutsemau wrote:
> page_slab() contained an open-coded implementation of compound_head().
>
> Replace the duplicated code with a direct call to compound_head().
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>
> ---
>
> I am not sure if this open-coded version is intentional and required for
> memdesc transition. Drop the patch if it is.
> ---
> mm/slab.h | 14 +-------------
> 1 file changed, 1 insertion(+), 13 deletions(-)
>
> diff --git a/mm/slab.h b/mm/slab.h
> index f68c3ac8126f..970a13ac5b8e 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -137,19 +137,7 @@ static_assert(IS_ALIGNED(offsetof(struct slab, freelist), sizeof(struct freelist
> */
> static inline struct slab *page_slab(const struct page *page)
> {
> - unsigned long info;
> -
> - info = READ_ONCE(page->compound_info);
> - if (info & 1) {
> - /* See compound_head() */
> - if (compound_info_has_mask()) {
> - unsigned long p = (unsigned long)page;
> - page = (struct page *)(p & info);
> - } else {
> - page = (struct page *)(info - 1);
> - }
> - }
> -
> + page = compound_head(page);
> if (data_race(page->page_type >> 24) != PGTY_slab)
> page = NULL;
>
^ permalink raw reply [flat|nested] 67+ messages in thread