From: Muchun Song <songmuchun@bytedance.com>
To: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Muchun Song <muchun.song@linux.dev>,
Oscar Salvador <osalvador@suse.de>,
Michael Ellerman <mpe@ellerman.id.au>,
Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Nicholas Piggin <npiggin@gmail.com>,
Christophe Leroy <chleroy@kernel.org>,
aneesh.kumar@linux.ibm.com, joao.m.martins@oracle.com,
linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org,
Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH 44/49] mm/sparse-vmemmap: drop ARCH_WANT_OPTIMIZE_DAX_VMEMMAP and simplify checks
Date: Sun, 5 Apr 2026 20:52:35 +0800
Message-ID: <20260405125240.2558577-45-songmuchun@bytedance.com>
In-Reply-To: <20260405125240.2558577-1-songmuchun@bytedance.com>

When device DAX vmemmap optimization was first introduced, it was
implemented as a generic feature within sparse-vmemmap.c. It later turned
out that architectures with specific page table formats (such as PowerPC
with hash translation) would crash, because the generic
vmemmap_populate_compound_pages() was unaware of their page table setup
(e.g., bolted table entries).

To address this, commit 87a7ae75d738 ("mm/vmemmap/devdax: fix kernel crash
when probing devdax devices") introduced a restrictive config option,
which eventually evolved into ARCH_WANT_OPTIMIZE_DAX_VMEMMAP (via commits
0b376f1e0ff5 and 0b6f15824cc7). This effectively turned a generic
optimization into an opt-in architectural feature.

Since then, the situation has changed. The decision of whether to apply
DAX vmemmap optimization for a specific page table format is now fully
delegated to the architecture-specific implementations (e.g., within
vmemmap_populate()). The upper-level Kconfig restriction and the rigid
generic wrapper functions are no longer needed to prevent crashes, because
the architectures themselves decide whether the mappings are viable. An
architecture that does not support DAX vmemmap optimization can simply
implement fallback logic similar to what PowerPC does in its
vmemmap_populate() routines.
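
As a rough user-space illustration of the fallback idea described above
(not the PowerPC code itself), the decision can be modeled as a function
that chooses between ordinary base-page population and shared-tail
population. Every name below (model_arch, model_arch_vmemmap_populate, the
enum values) is hypothetical and exists only for this sketch.

```c
#include <stdbool.h>

/* Hypothetical outcome of an architecture's populate decision. */
enum model_populate_kind {
	MODEL_POPULATE_BASE_PAGES,	/* every vmemmap page gets its own backing page */
	MODEL_POPULATE_SHARED_TAIL,	/* tail vmemmap pages share one physical page */
};

/* Hypothetical capability flag, e.g. radix vs. hash translation on PowerPC. */
struct model_arch {
	bool page_tables_support_sharing;
};

/*
 * Model of an arch-level fallback: if the page table format cannot handle
 * the optimized layout, or the section holds only order-0 pages, populate
 * the vmemmap the ordinary way instead of failing.
 */
static enum model_populate_kind
model_arch_vmemmap_populate(const struct model_arch *arch, unsigned int order)
{
	if (order == 0 || !arch->page_tables_support_sharing)
		return MODEL_POPULATE_BASE_PAGES;
	return MODEL_POPULATE_SHARED_TAIL;
}
```

In this model the generic caller never needs to ask whether optimization
is possible; it passes the order down and the architecture picks a safe
path.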

If an architecture supports neither HugeTLB vmemmap optimization nor DAX
vmemmap optimization and wants to reduce code size by disabling the feature
entirely, SPARSEMEM_VMEMMAP_OPTIMIZATION can now be turned off. It is no
longer a hidden option, but a user-configurable boolean under
SPARSEMEM_VMEMMAP.

Therefore, remove the redundant ARCH_WANT_OPTIMIZE_DAX_VMEMMAP and drop
the complicated vmemmap_can_optimize() helper. Instead, make
SPARSEMEM_VMEMMAP_OPTIMIZATION a core capability that is enabled by
default whenever SPARSEMEM_VMEMMAP is selected.

The check in sparse_add_section() is safely simplified to:

	if (!altmap && pgmap && nr_pages == PAGES_PER_SECTION)

which succinctly reflects the prerequisites for the optimization without
unnecessary boilerplate.
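
To make the difference between the two predicates concrete, here is a
small user-space model of the removed __vmemmap_can_optimize() condition
next to the simplified one. It is only a sketch: the model_* types are
hypothetical stand-ins for struct vmem_altmap and struct dev_pagemap, and
the constants (4 KiB pages, a 64-byte struct page, 32768 pages per section,
a reserve of 2 vmemmap pages) are assumed values chosen for illustration,
not taken from this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only; the real ones are per-config. */
#define MODEL_PAGE_SHIFT	12ULL		/* 4 KiB base pages */
#define MODEL_STRUCT_PAGE_SIZE	64ULL		/* stand-in for sizeof(struct page) */
#define MODEL_PAGES_PER_SECTION	(1ULL << 15)	/* stand-in for PAGES_PER_SECTION */
#define MODEL_RESERVE_NR	2ULL		/* stand-in for VMEMMAP_RESERVE_NR */

/* Hypothetical stand-ins for the kernel structures. */
struct model_altmap { int unused; };
struct model_pgmap { unsigned int vmemmap_shift; };

static bool model_is_power_of_2(unsigned long long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Models the old check: vmemmap_can_optimize() && full section. */
static bool model_old_check(const struct model_altmap *altmap,
			    const struct model_pgmap *pgmap,
			    unsigned long long nr_pages)
{
	unsigned long long nr_compound, nr_vmemmap_pages;

	if (!pgmap || !model_is_power_of_2(MODEL_STRUCT_PAGE_SIZE))
		return false;

	/* Pages per compound page, as pgmap_vmemmap_nr() computes it. */
	nr_compound = 1ULL << pgmap->vmemmap_shift;
	nr_vmemmap_pages = (nr_compound * MODEL_STRUCT_PAGE_SIZE) >> MODEL_PAGE_SHIFT;

	return !altmap && nr_vmemmap_pages > MODEL_RESERVE_NR &&
	       nr_pages == MODEL_PAGES_PER_SECTION;
}

/* Models the simplified check from this patch. */
static bool model_new_check(const struct model_altmap *altmap,
			    const struct model_pgmap *pgmap,
			    unsigned long long nr_pages)
{
	return !altmap && pgmap && nr_pages == MODEL_PAGES_PER_SECTION;
}
```

Under these assumptions both checks agree for a pgmap with
vmemmap_shift = 9, for a NULL pgmap, for a non-NULL altmap, and for a
partial section. They differ for vmemmap_shift = 0, where the old check
rejects the section and the new one accepts it; per the description
above, that case is left to the code that consumes the section order.
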
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
arch/powerpc/Kconfig | 1 -
arch/riscv/Kconfig | 1 -
arch/x86/Kconfig | 1 -
include/linux/mm.h | 34 ----------------------------------
mm/Kconfig | 14 ++++++++------
mm/sparse-vmemmap.c | 2 +-
6 files changed, 9 insertions(+), 44 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index da4e2ec2af20..8158d5d0c226 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -184,7 +184,6 @@ config PPC
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
select ARCH_WANT_LD_ORPHAN_WARN
- select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP if PPC_RADIX_MMU
select ARCH_WANTS_MODULES_DATA_IN_VMALLOC if PPC_BOOK3S_32 || PPC_8xx
select ARCH_WEAK_RELEASE_ACQUIRE
select BINFMT_ELF
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 61a9d8d3ea64..a8eccb828e7b 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -85,7 +85,6 @@ config RISCV
select ARCH_WANT_GENERAL_HUGETLB if !RISCV_ISA_SVNAPOT
select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
select ARCH_WANT_LD_ORPHAN_WARN if !XIP_KERNEL
- select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP
select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
select ARCH_WANTS_NO_INSTR
select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f19625648f0f..83c55e286b40 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -146,7 +146,6 @@ config X86
select ARCH_WANT_GENERAL_HUGETLB
select ARCH_WANT_HUGE_PMD_SHARE if X86_64
select ARCH_WANT_LD_ORPHAN_WARN
- select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP if X86_64
select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP if X86_64
select ARCH_WANTS_THP_SWAP if X86_64
select ARCH_HAS_PARANOID_L1D_FLUSH
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c36001c9d571..8baa224444be 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4910,40 +4910,6 @@ static inline void vmem_altmap_free(struct vmem_altmap *altmap,
}
#endif
-#define VMEMMAP_RESERVE_NR OPTIMIZED_FOLIO_VMEMMAP_PAGES
-#ifdef CONFIG_ARCH_WANT_OPTIMIZE_DAX_VMEMMAP
-static inline bool __vmemmap_can_optimize(struct vmem_altmap *altmap,
- struct dev_pagemap *pgmap)
-{
- unsigned long nr_pages;
- unsigned long nr_vmemmap_pages;
-
- if (!pgmap || !is_power_of_2(sizeof(struct page)))
- return false;
-
- nr_pages = pgmap_vmemmap_nr(pgmap);
- nr_vmemmap_pages = ((nr_pages * sizeof(struct page)) >> PAGE_SHIFT);
- /*
- * For vmemmap optimization with DAX we need minimum 2 vmemmap
- * pages. See layout diagram in Documentation/mm/vmemmap_dedup.rst
- */
- return !altmap && (nr_vmemmap_pages > VMEMMAP_RESERVE_NR);
-}
-/*
- * If we don't have an architecture override, use the generic rule
- */
-#ifndef vmemmap_can_optimize
-#define vmemmap_can_optimize __vmemmap_can_optimize
-#endif
-
-#else
-static inline bool vmemmap_can_optimize(struct vmem_altmap *altmap,
- struct dev_pagemap *pgmap)
-{
- return false;
-}
-#endif
-
enum mf_flags {
MF_COUNT_INCREASED = 1 << 0,
MF_ACTION_REQUIRED = 1 << 1,
diff --git a/mm/Kconfig b/mm/Kconfig
index e81aa77182b2..166552d5d69a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -411,17 +411,19 @@ config SPARSEMEM_VMEMMAP
efficient option when sufficient kernel resources are available.
config SPARSEMEM_VMEMMAP_OPTIMIZATION
- bool
+ bool "Enable Vmemmap Optimization Infrastructure"
+ default y
depends on SPARSEMEM_VMEMMAP
+ help
+ This allows features like HugeTLB and DAX to map multiple contiguous
+ vmemmap pages to a single underlying physical page to save memory.
+
+ If unsure, say Y.
#
# Select this config option from the architecture Kconfig, if it is preferred
-# to enable the feature of HugeTLB/dev_dax vmemmap optimization.
+# to enable the feature of HugeTLB vmemmap optimization.
#
-config ARCH_WANT_OPTIMIZE_DAX_VMEMMAP
- bool
- select SPARSEMEM_VMEMMAP_OPTIMIZATION if SPARSEMEM_VMEMMAP
-
config ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
bool
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index ac2efba9ef92..752a48112504 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -698,7 +698,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
return ret;
ms = __nr_to_section(section_nr);
- if (vmemmap_can_optimize(altmap, pgmap) && nr_pages == PAGES_PER_SECTION) {
+ if (!altmap && pgmap && nr_pages == PAGES_PER_SECTION) {
section_set_order(ms, pgmap->vmemmap_shift);
#ifdef CONFIG_ZONE_DEVICE
section_set_zone(ms, ZONE_DEVICE);
--
2.20.1
Thread overview: 53+ messages
2026-04-05 12:51 [PATCH 00/49] mm: Generalize vmemmap optimization for DAX and HugeTLB Muchun Song
2026-04-05 12:51 ` [PATCH 01/49] mm/sparse: fix vmemmap accounting imbalance on memory hotplug error Muchun Song
2026-04-05 12:51 ` [PATCH 02/49] mm/sparse: add a @pgmap argument to memory deactivation paths Muchun Song
2026-04-05 12:51 ` [PATCH 03/49] mm/sparse: fix vmemmap page accounting for HVOed DAX Muchun Song
2026-04-05 12:51 ` [PATCH 04/49] mm/sparse: add a @pgmap parameter to arch vmemmap_populate() Muchun Song
2026-04-05 12:51 ` [PATCH 05/49] mm/sparse: fix missing architecture-specific page table sync for HVO DAX Muchun Song
2026-04-05 12:51 ` [PATCH 06/49] mm/mm_init: fix uninitialized pageblock migratetype for ZONE_DEVICE compound pages Muchun Song
2026-04-05 12:51 ` [PATCH 07/49] mm/mm_init: use pageblock_migratetype_init_range() in deferred_free_pages() Muchun Song
2026-04-05 12:51 ` [PATCH 08/49] mm: Convert vmemmap_p?d_populate() to static functions Muchun Song
2026-04-05 12:52 ` [PATCH 09/49] mm: panic on memory allocation failure in sparse_init_nid() Muchun Song
2026-04-05 12:52 ` [PATCH 10/49] mm: move subsection_map_init() into sparse_init() Muchun Song
2026-04-05 12:52 ` [PATCH 11/49] mm: defer sparse_init() until after zone initialization Muchun Song
2026-04-05 12:52 ` [PATCH 12/49] mm: make set_pageblock_order() static Muchun Song
2026-04-05 12:52 ` [PATCH 13/49] mm: integrate sparse_vmemmap_init_nid_late() into sparse_init_nid() Muchun Song
2026-04-05 12:52 ` [PATCH 14/49] mm/cma: validate hugetlb CMA range by zone at reserve time Muchun Song
2026-04-05 12:52 ` [PATCH 15/49] mm/hugetlb: free cross-zone bootmem gigantic pages after allocation Muchun Song
2026-04-05 12:52 ` [PATCH 16/49] mm/hugetlb: initialize vmemmap optimization in early stage Muchun Song
2026-04-05 12:52 ` [PATCH 17/49] mm: remove sparse_vmemmap_init_nid_late() Muchun Song
2026-04-05 12:52 ` [PATCH 18/49] mm/mm_init: make __init_page_from_nid() static Muchun Song
2026-04-05 12:52 ` [PATCH 19/49] mm/sparse-vmemmap: remove the VMEMMAP_POPULATE_PAGEREF flag Muchun Song
2026-04-05 12:52 ` [PATCH 20/49] mm: rename vmemmap optimization macros to generic names Muchun Song
2026-04-05 12:52 ` [PATCH 21/49] mm/sparse: drop power-of-2 size requirement for struct mem_section Muchun Song
2026-04-05 12:52 ` [PATCH 22/49] mm/sparse: introduce compound page order to mem_section Muchun Song
2026-04-05 12:52 ` [PATCH 23/49] mm/mm_init: skip initializing shared tail pages for compound pages Muchun Song
2026-04-05 12:52 ` [PATCH 24/49] mm/sparse-vmemmap: initialize shared tail vmemmap page upon allocation Muchun Song
2026-04-05 12:52 ` [PATCH 25/49] mm/sparse-vmemmap: support vmemmap-optimizable compound page population Muchun Song
2026-04-05 12:52 ` [PATCH 26/49] mm/hugetlb: use generic vmemmap optimization macros Muchun Song
2026-04-05 12:52 ` [PATCH 27/49] mm: call memblocks_present() before HugeTLB initialization Muchun Song
2026-04-05 12:52 ` [PATCH 28/49] mm/hugetlb: switch HugeTLB to use generic vmemmap optimization Muchun Song
2026-04-05 12:52 ` [PATCH 29/49] mm: extract pfn_to_zone() helper Muchun Song
2026-04-05 12:52 ` [PATCH 30/49] mm/sparse-vmemmap: remove unused SPARSEMEM_VMEMMAP_PREINIT feature Muchun Song
2026-04-05 12:52 ` [PATCH 31/49] mm/hugetlb: remove HUGE_BOOTMEM_HVO flag and simplify pre-HVO logic Muchun Song
2026-04-05 12:52 ` [PATCH 32/49] mm/sparse-vmemmap: consolidate shared tail page allocation Muchun Song
2026-04-05 12:52 ` [PATCH 33/49] mm: introduce CONFIG_SPARSEMEM_VMEMMAP_OPTIMIZATION Muchun Song
2026-04-05 12:52 ` [PATCH 34/49] mm/sparse-vmemmap: switch DAX to use generic vmemmap optimization Muchun Song
2026-04-05 12:52 ` [PATCH 35/49] mm/sparse-vmemmap: introduce section zone to struct mem_section Muchun Song
2026-04-05 12:52 ` [PATCH 36/49] powerpc/mm: use generic vmemmap_shared_tail_page() in compound vmemmap Muchun Song
2026-04-05 12:52 ` [PATCH 37/49] mm/sparse-vmemmap: unify DAX and HugeTLB vmemmap optimization Muchun Song
2026-04-05 12:52 ` [PATCH 38/49] mm/sparse-vmemmap: remap the shared tail pages as read-only Muchun Song
2026-04-05 12:52 ` [PATCH 39/49] mm/sparse-vmemmap: remove unused ptpfn argument Muchun Song
2026-04-05 12:52 ` [PATCH 40/49] mm/hugetlb_vmemmap: remove vmemmap_wrprotect_hvo() and related code Muchun Song
2026-04-05 12:52 ` [PATCH 41/49] mm/sparse: simplify section_vmemmap_pages() Muchun Song
2026-04-05 12:52 ` [PATCH 42/49] mm/sparse-vmemmap: introduce section_vmemmap_page_structs() Muchun Song
2026-04-05 12:52 ` [PATCH 43/49] powerpc/mm: rely on generic vmemmap_can_optimize() to simplify code Muchun Song
2026-04-05 12:52 ` Muchun Song [this message]
2026-04-05 12:52 ` [PATCH 45/49] mm/sparse-vmemmap: drop @pgmap parameter from vmemmap populate APIs Muchun Song
2026-04-05 12:52 ` [PATCH 46/49] mm/sparse: replace pgmap with order and zone in sparse_add_section() Muchun Song
2026-04-05 12:52 ` [PATCH 47/49] mm: redefine HVO as Hugepage Vmemmap Optimization Muchun Song
2026-04-05 12:52 ` [PATCH 48/49] Documentation/mm: restructure vmemmap_dedup.rst to reflect generalized HVO Muchun Song
2026-04-05 12:52 ` [PATCH 49/49] mm: consolidate struct page power-of-2 size checks for HVO Muchun Song
2026-04-05 13:34 ` [PATCH 00/49] mm: Generalize vmemmap optimization for DAX and HugeTLB Mike Rapoport
2026-04-06 19:59 ` David Hildenbrand (arm)
2026-04-08 15:29 ` Frank van der Linden