* [PATCH RFC v3 01/19] mm: thread user_addr through page allocator for cache-friendly zeroing
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
@ 2026-04-21 22:01 ` Michael S. Tsirkin
2026-04-21 22:01 ` [PATCH RFC v3 02/19] mm: add folio_zero_user stub for configs without THP/HUGETLBFS Michael S. Tsirkin
` (17 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Johannes Weiner, Zi Yan, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Wilcox (Oracle),
Muchun Song, Oscar Salvador, Baolin Wang, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
Alistair Popple, Hugh Dickins, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Chris Li, Kairui Song, Kemeng Shi,
Nhat Pham, Baoquan He, linux-fsdevel
Thread a user virtual address from vma_alloc_folio() down through
the page allocator to post_alloc_hook(). This is plumbing preparation
for a subsequent patch that will use user_addr to call folio_zero_user()
for cache-friendly zeroing of user pages.
The user_addr is stored in struct alloc_context and flows through:
vma_alloc_folio -> folio_alloc_mpol -> __alloc_pages_mpol ->
__alloc_frozen_pages -> get_page_from_freelist -> prep_new_page ->
post_alloc_hook
Public APIs (__alloc_pages, __folio_alloc, folio_alloc_mpol) gain a
user_addr parameter directly. Callers that do not need user_addr
pass USER_ADDR_NONE ((unsigned long)-1), since
address 0 is a valid user mapping.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
include/linux/gfp.h | 25 +++++++++++++++++--------
mm/compaction.c | 6 ++----
mm/filemap.c | 3 ++-
mm/hugetlb.c | 36 ++++++++++++++++++++----------------
mm/internal.h | 9 ++++++---
mm/khugepaged.c | 2 +-
mm/mempolicy.c | 39 ++++++++++++++++++++++++++-------------
mm/migrate.c | 2 +-
mm/page_alloc.c | 38 ++++++++++++++++++++++----------------
mm/page_frag_cache.c | 4 ++--
mm/shmem.c | 2 +-
mm/slub.c | 4 ++--
mm/swap_state.c | 2 +-
13 files changed, 103 insertions(+), 69 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 51ef13ed756e..10f653338042 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -226,12 +226,18 @@ static inline void arch_free_page(struct page *page, int order) { }
static inline void arch_alloc_page(struct page *page, int order) { }
#endif
+/*
+ * Sentinel for user_addr: indicates a non-user allocation.
+ * Cannot use 0 because address 0 is a valid userspace mapping.
+ */
+#define USER_ADDR_NONE ((unsigned long)-1)
+
struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
- nodemask_t *nodemask);
+ nodemask_t *nodemask, unsigned long user_addr);
#define __alloc_pages(...) alloc_hooks(__alloc_pages_noprof(__VA_ARGS__))
struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
- nodemask_t *nodemask);
+ nodemask_t *nodemask, unsigned long user_addr);
#define __folio_alloc(...) alloc_hooks(__folio_alloc_noprof(__VA_ARGS__))
unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
@@ -286,7 +292,7 @@ __alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order)
VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
warn_if_node_offline(nid, gfp_mask);
- return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
+ return __alloc_pages_noprof(gfp_mask, order, nid, NULL, USER_ADDR_NONE);
}
#define __alloc_pages_node(...) alloc_hooks(__alloc_pages_node_noprof(__VA_ARGS__))
@@ -297,7 +303,7 @@ struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
warn_if_node_offline(nid, gfp);
- return __folio_alloc_noprof(gfp, order, nid, NULL);
+ return __folio_alloc_noprof(gfp, order, nid, NULL, USER_ADDR_NONE);
}
#define __folio_alloc_node(...) alloc_hooks(__folio_alloc_node_noprof(__VA_ARGS__))
@@ -322,7 +328,8 @@ static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
struct page *alloc_pages_noprof(gfp_t gfp, unsigned int order);
struct folio *folio_alloc_noprof(gfp_t gfp, unsigned int order);
struct folio *folio_alloc_mpol_noprof(gfp_t gfp, unsigned int order,
- struct mempolicy *mpol, pgoff_t ilx, int nid);
+ struct mempolicy *mpol, pgoff_t ilx, int nid,
+ unsigned long user_addr);
struct folio *vma_alloc_folio_noprof(gfp_t gfp, int order, struct vm_area_struct *vma,
unsigned long addr);
#else
@@ -335,14 +342,16 @@ static inline struct folio *folio_alloc_noprof(gfp_t gfp, unsigned int order)
return __folio_alloc_node_noprof(gfp, order, numa_node_id());
}
static inline struct folio *folio_alloc_mpol_noprof(gfp_t gfp, unsigned int order,
- struct mempolicy *mpol, pgoff_t ilx, int nid)
+ struct mempolicy *mpol, pgoff_t ilx, int nid,
+ unsigned long user_addr)
{
- return folio_alloc_noprof(gfp, order);
+ return __folio_alloc_noprof(gfp, order, numa_node_id(), NULL, user_addr);
}
static inline struct folio *vma_alloc_folio_noprof(gfp_t gfp, int order,
struct vm_area_struct *vma, unsigned long addr)
{
- return folio_alloc_noprof(gfp, order);
+ return folio_alloc_mpol_noprof(gfp, order, NULL, 0, numa_node_id(),
+ addr);
}
#endif
diff --git a/mm/compaction.c b/mm/compaction.c
index 1e8f8eca318c..82f2914962f5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -82,8 +82,7 @@ static inline bool is_via_compact_memory(int order) { return false; }
static struct page *mark_allocated_noprof(struct page *page, unsigned int order, gfp_t gfp_flags)
{
- post_alloc_hook(page, order, __GFP_MOVABLE);
+ post_alloc_hook(page, order, __GFP_MOVABLE, USER_ADDR_NONE);
set_page_refcounted(page);
return page;
}
#define mark_allocated(...) alloc_hooks(mark_allocated_noprof(__VA_ARGS__))
@@ -1832,8 +1831,7 @@ static struct folio *compaction_alloc_noprof(struct folio *src, unsigned long da
set_page_private(&freepage[size], start_order);
}
dst = (struct folio *)freepage;
-
- post_alloc_hook(&dst->page, order, __GFP_MOVABLE);
+ post_alloc_hook(&dst->page, order, __GFP_MOVABLE, USER_ADDR_NONE);
set_page_refcounted(&dst->page);
if (order)
prep_compound_page(&dst->page, order);
diff --git a/mm/filemap.c b/mm/filemap.c
index 6cd7974d4ada..bfc6554b993d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -998,7 +998,8 @@ struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order,
if (policy)
return folio_alloc_mpol_noprof(gfp, order, policy,
- NO_INTERLEAVE_INDEX, numa_node_id());
+ NO_INTERLEAVE_INDEX, numa_node_id(),
+ USER_ADDR_NONE);
if (cpuset_do_page_mem_spread()) {
unsigned int cpuset_mems_cookie;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0beb6e22bc26..de8361b503d2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1842,7 +1842,8 @@ struct address_space *hugetlb_folio_mapping_lock_write(struct folio *folio)
}
static struct folio *alloc_buddy_frozen_folio(int order, gfp_t gfp_mask,
- int nid, nodemask_t *nmask, nodemask_t *node_alloc_noretry)
+ int nid, nodemask_t *nmask, nodemask_t *node_alloc_noretry,
+ unsigned long addr)
{
struct folio *folio;
bool alloc_try_hard = true;
@@ -1859,7 +1860,7 @@ static struct folio *alloc_buddy_frozen_folio(int order, gfp_t gfp_mask,
if (alloc_try_hard)
gfp_mask |= __GFP_RETRY_MAYFAIL;
- folio = (struct folio *)__alloc_frozen_pages(gfp_mask, order, nid, nmask);
+ folio = (struct folio *)__alloc_frozen_pages(gfp_mask, order, nid, nmask, addr);
/*
* If we did not specify __GFP_RETRY_MAYFAIL, but still got a
@@ -1888,7 +1889,7 @@ static struct folio *alloc_buddy_frozen_folio(int order, gfp_t gfp_mask,
static struct folio *only_alloc_fresh_hugetlb_folio(struct hstate *h,
gfp_t gfp_mask, int nid, nodemask_t *nmask,
- nodemask_t *node_alloc_noretry)
+ nodemask_t *node_alloc_noretry, unsigned long addr)
{
struct folio *folio;
int order = huge_page_order(h);
@@ -1900,7 +1901,7 @@ static struct folio *only_alloc_fresh_hugetlb_folio(struct hstate *h,
folio = alloc_gigantic_frozen_folio(order, gfp_mask, nid, nmask);
else
folio = alloc_buddy_frozen_folio(order, gfp_mask, nid, nmask,
- node_alloc_noretry);
+ node_alloc_noretry, addr);
if (folio)
init_new_hugetlb_folio(folio);
return folio;
@@ -1914,11 +1915,12 @@ static struct folio *only_alloc_fresh_hugetlb_folio(struct hstate *h,
* pages is zero, and the accounting must be done in the caller.
*/
static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
- gfp_t gfp_mask, int nid, nodemask_t *nmask)
+ gfp_t gfp_mask, int nid, nodemask_t *nmask,
+ unsigned long addr)
{
struct folio *folio;
- folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask, NULL);
+ folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask, NULL, addr);
if (folio)
hugetlb_vmemmap_optimize_folio(h, folio);
return folio;
@@ -1958,7 +1960,7 @@ static struct folio *alloc_pool_huge_folio(struct hstate *h,
struct folio *folio;
folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, node,
- nodes_allowed, node_alloc_noretry);
+ nodes_allowed, node_alloc_noretry, USER_ADDR_NONE);
if (folio)
return folio;
}
@@ -2127,7 +2129,8 @@ int dissolve_free_hugetlb_folios(unsigned long start_pfn, unsigned long end_pfn)
* Allocates a fresh surplus page from the page allocator.
*/
static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
- gfp_t gfp_mask, int nid, nodemask_t *nmask)
+ gfp_t gfp_mask, int nid, nodemask_t *nmask,
+ unsigned long addr)
{
struct folio *folio = NULL;
@@ -2139,7 +2142,7 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
goto out_unlock;
spin_unlock_irq(&hugetlb_lock);
- folio = alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask);
+ folio = alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask, addr);
if (!folio)
return NULL;
@@ -2182,7 +2185,7 @@ static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mas
if (hstate_is_gigantic(h))
return NULL;
- folio = alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask);
+ folio = alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask, USER_ADDR_NONE);
if (!folio)
return NULL;
@@ -2218,14 +2221,14 @@ struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,
if (mpol_is_preferred_many(mpol)) {
gfp_t gfp = gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
- folio = alloc_surplus_hugetlb_folio(h, gfp, nid, nodemask);
+ folio = alloc_surplus_hugetlb_folio(h, gfp, nid, nodemask, addr);
/* Fallback to all nodes if page==NULL */
nodemask = NULL;
}
if (!folio)
- folio = alloc_surplus_hugetlb_folio(h, gfp_mask, nid, nodemask);
+ folio = alloc_surplus_hugetlb_folio(h, gfp_mask, nid, nodemask, addr);
mpol_cond_put(mpol);
return folio;
}
@@ -2332,7 +2335,8 @@ static int gather_surplus_pages(struct hstate *h, long delta)
* down the road to pick the current node if that is the case.
*/
folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
- NUMA_NO_NODE, &alloc_nodemask);
+ NUMA_NO_NODE, &alloc_nodemask,
+ USER_ADDR_NONE);
if (!folio) {
alloc_ok = false;
break;
@@ -2738,7 +2742,7 @@ static int alloc_and_dissolve_hugetlb_folio(struct folio *old_folio,
spin_unlock_irq(&hugetlb_lock);
gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
new_folio = alloc_fresh_hugetlb_folio(h, gfp_mask,
- nid, NULL);
+ nid, NULL, USER_ADDR_NONE);
if (!new_folio)
return -ENOMEM;
goto retry;
@@ -3434,13 +3438,13 @@ static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, nid,
- &node_states[N_MEMORY], NULL);
+ &node_states[N_MEMORY], NULL, USER_ADDR_NONE);
if (!folio && !list_empty(&folio_list) &&
hugetlb_vmemmap_optimizable_size(h)) {
prep_and_add_allocated_folios(h, &folio_list);
INIT_LIST_HEAD(&folio_list);
folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, nid,
- &node_states[N_MEMORY], NULL);
+ &node_states[N_MEMORY], NULL, USER_ADDR_NONE);
}
if (!folio)
break;
diff --git a/mm/internal.h b/mm/internal.h
index cb0af847d7d9..0b9c0bd133d3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -672,6 +672,7 @@ struct alloc_context {
*/
enum zone_type highest_zoneidx;
bool spread_dirty_pages;
+ unsigned long user_addr;
};
/*
@@ -887,16 +888,18 @@ static inline void prep_compound_tail(struct page *head, int tail_idx)
set_page_private(p, 0);
}
-void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags);
+void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags,
+ unsigned long user_addr);
extern bool free_pages_prepare(struct page *page, unsigned int order);
extern int user_min_free_kbytes;
struct page *__alloc_frozen_pages_noprof(gfp_t, unsigned int order, int nid,
- nodemask_t *);
+ nodemask_t *, unsigned long user_addr);
#define __alloc_frozen_pages(...) \
alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
void free_frozen_pages(struct page *page, unsigned int order);
+void free_frozen_pages_zeroed(struct page *page, unsigned int order);
void free_unref_folios(struct folio_batch *fbatch);
#ifdef CONFIG_NUMA
@@ -904,7 +907,7 @@ struct page *alloc_frozen_pages_noprof(gfp_t, unsigned int order);
#else
static inline struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order)
{
- return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL);
+ return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL, USER_ADDR_NONE);
}
#endif
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1dd3cfca610d..f7e0f37f0632 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1055,7 +1055,7 @@ static enum scan_result alloc_charge_folio(struct folio **foliop, struct mm_stru
int node = hpage_collapse_find_target_node(cc);
struct folio *folio;
- folio = __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask);
+ folio = __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask, USER_ADDR_NONE);
if (!folio) {
*foliop = NULL;
count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 0e5175f1c767..ca2f430a7ffd 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1454,7 +1454,7 @@ static struct folio *alloc_migration_target_by_mpol(struct folio *src,
else
gfp = GFP_HIGHUSER_MOVABLE | __GFP_RETRY_MAYFAIL | __GFP_COMP;
- return folio_alloc_mpol(gfp, order, pol, ilx, nid);
+ return folio_alloc_mpol(gfp, order, pol, ilx, nid, USER_ADDR_NONE);
}
#else
@@ -2406,7 +2406,8 @@ bool mempolicy_in_oom_domain(struct task_struct *tsk,
}
static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
- int nid, nodemask_t *nodemask)
+ int nid, nodemask_t *nodemask,
+ unsigned long user_addr)
{
struct page *page;
gfp_t preferred_gfp;
@@ -2419,9 +2420,11 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
*/
preferred_gfp = gfp | __GFP_NOWARN;
preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
- page = __alloc_frozen_pages_noprof(preferred_gfp, order, nid, nodemask);
+ page = __alloc_frozen_pages_noprof(preferred_gfp, order, nid,
+ nodemask, user_addr);
if (!page)
- page = __alloc_frozen_pages_noprof(gfp, order, nid, NULL);
+ page = __alloc_frozen_pages_noprof(gfp, order, nid, NULL,
+ user_addr);
return page;
}
@@ -2436,8 +2439,9 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
*
* Return: The page on success or NULL if allocation fails.
*/
-static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
- struct mempolicy *pol, pgoff_t ilx, int nid)
+static struct page *__alloc_pages_mpol(gfp_t gfp, unsigned int order,
+ struct mempolicy *pol, pgoff_t ilx, int nid,
+ unsigned long user_addr)
{
nodemask_t *nodemask;
struct page *page;
@@ -2445,7 +2449,8 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
nodemask = policy_nodemask(gfp, pol, ilx, &nid);
if (pol->mode == MPOL_PREFERRED_MANY)
- return alloc_pages_preferred_many(gfp, order, nid, nodemask);
+ return alloc_pages_preferred_many(gfp, order, nid, nodemask,
+ user_addr);
if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
/* filter "hugepage" allocation, unless from alloc_pages() */
@@ -2469,7 +2474,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
*/
page = __alloc_frozen_pages_noprof(
gfp | __GFP_THISNODE | __GFP_NORETRY, order,
- nid, NULL);
+ nid, NULL, user_addr);
if (page || !(gfp & __GFP_DIRECT_RECLAIM))
return page;
/*
@@ -2481,7 +2486,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
}
}
- page = __alloc_frozen_pages_noprof(gfp, order, nid, nodemask);
+ page = __alloc_frozen_pages_noprof(gfp, order, nid, nodemask, user_addr);
if (unlikely(pol->mode == MPOL_INTERLEAVE ||
pol->mode == MPOL_WEIGHTED_INTERLEAVE) && page) {
@@ -2497,17 +2502,25 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
return page;
}
-struct folio *folio_alloc_mpol_noprof(gfp_t gfp, unsigned int order,
+static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
struct mempolicy *pol, pgoff_t ilx, int nid)
{
- struct page *page = alloc_pages_mpol(gfp | __GFP_COMP, order, pol,
- ilx, nid);
+ return __alloc_pages_mpol(gfp, order, pol, ilx, nid, USER_ADDR_NONE);
+}
+
+struct folio *folio_alloc_mpol_noprof(gfp_t gfp, unsigned int order,
+ struct mempolicy *pol, pgoff_t ilx, int nid,
+ unsigned long user_addr)
+{
+ struct page *page = __alloc_pages_mpol(gfp | __GFP_COMP, order, pol,
+ ilx, nid, user_addr);
if (!page)
return NULL;
set_page_refcounted(page);
return page_rmappable_folio(page);
}
+EXPORT_SYMBOL(folio_alloc_mpol_noprof);
/**
* vma_alloc_folio - Allocate a folio for a VMA.
@@ -2535,7 +2548,7 @@ struct folio *vma_alloc_folio_noprof(gfp_t gfp, int order, struct vm_area_struct
gfp |= __GFP_NOWARN;
pol = get_vma_policy(vma, addr, order, &ilx);
- folio = folio_alloc_mpol_noprof(gfp, order, pol, ilx, numa_node_id());
+ folio = folio_alloc_mpol_noprof(gfp, order, pol, ilx, numa_node_id(), addr);
mpol_cond_put(pol);
return folio;
}
diff --git a/mm/migrate.c b/mm/migrate.c
index 1bf2cf8c44dd..df805a763991 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2202,7 +2202,7 @@ struct folio *alloc_migration_target(struct folio *src, unsigned long private)
if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
gfp_mask |= __GFP_HIGHMEM;
- return __folio_alloc(gfp_mask, order, nid, mtc->nmask);
+ return __folio_alloc(gfp_mask, order, nid, mtc->nmask, USER_ADDR_NONE);
}
#ifdef CONFIG_NUMA
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d4b6f1a554e..1cf5551849fe 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1837,7 +1837,7 @@ static inline bool should_skip_init(gfp_t flags)
}
inline void post_alloc_hook(struct page *page, unsigned int order,
- gfp_t gfp_flags)
+ gfp_t gfp_flags, unsigned long user_addr)
{
bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) &&
!should_skip_init(gfp_flags);
@@ -1892,9 +1892,10 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
}
static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
- unsigned int alloc_flags)
+ unsigned int alloc_flags,
+ unsigned long user_addr)
{
- post_alloc_hook(page, order, gfp_flags);
+ post_alloc_hook(page, order, gfp_flags, user_addr);
if (order && (gfp_flags & __GFP_COMP))
prep_compound_page(page, order);
@@ -3959,7 +3960,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
page = rmqueue(zonelist_zone(ac->preferred_zoneref), zone, order,
gfp_mask, alloc_flags, ac->migratetype);
if (page) {
- prep_new_page(page, order, gfp_mask, alloc_flags);
+ prep_new_page(page, order, gfp_mask, alloc_flags,
+ ac->user_addr);
/*
* If this is a high-order atomic allocation then check
@@ -4194,7 +4196,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
/* Prep a captured page if available */
if (page)
- prep_new_page(page, order, gfp_mask, alloc_flags);
+ prep_new_page(page, order, gfp_mask, alloc_flags,
+ ac->user_addr);
/* Try get a page from the freelist if available */
if (!page)
@@ -5187,7 +5190,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
}
nr_account++;
- prep_new_page(page, 0, gfp, 0);
+ prep_new_page(page, 0, gfp, 0, USER_ADDR_NONE);
set_page_refcounted(page);
page_array[nr_populated++] = page;
}
@@ -5201,7 +5204,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
return nr_populated;
failed:
- page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask);
+ page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask, USER_ADDR_NONE);
if (page)
page_array[nr_populated++] = page;
goto out;
@@ -5212,12 +5215,13 @@ EXPORT_SYMBOL_GPL(alloc_pages_bulk_noprof);
* This is the 'heart' of the zoned buddy allocator.
*/
struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
- int preferred_nid, nodemask_t *nodemask)
+ int preferred_nid, nodemask_t *nodemask,
+ unsigned long user_addr)
{
struct page *page;
unsigned int alloc_flags = ALLOC_WMARK_LOW;
gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
- struct alloc_context ac = { };
+ struct alloc_context ac = { .user_addr = user_addr };
/*
* There are several places where we assume that the order value is sane
@@ -5277,11 +5281,13 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
EXPORT_SYMBOL(__alloc_frozen_pages_noprof);
struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
- int preferred_nid, nodemask_t *nodemask)
+ int preferred_nid, nodemask_t *nodemask,
+ unsigned long user_addr)
{
struct page *page;
- page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask);
+ page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid,
+ nodemask, user_addr);
if (page)
set_page_refcounted(page);
return page;
@@ -5289,10 +5295,10 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
EXPORT_SYMBOL(__alloc_pages_noprof);
struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
- nodemask_t *nodemask)
+ nodemask_t *nodemask, unsigned long user_addr)
{
struct page *page = __alloc_pages_noprof(gfp | __GFP_COMP, order,
- preferred_nid, nodemask);
+ preferred_nid, nodemask, user_addr);
return page_rmappable_folio(page);
}
EXPORT_SYMBOL(__folio_alloc_noprof);
@@ -6910,7 +6916,7 @@ static void split_free_frozen_pages(struct list_head *list, gfp_t gfp_mask)
list_for_each_entry_safe(page, next, &list[order], lru) {
int i;
- post_alloc_hook(page, order, gfp_mask);
+ post_alloc_hook(page, order, gfp_mask, USER_ADDR_NONE);
if (!order)
continue;
@@ -7116,7 +7122,7 @@ int alloc_contig_frozen_range_noprof(unsigned long start, unsigned long end,
struct page *head = pfn_to_page(start);
check_new_pages(head, order);
- prep_new_page(head, order, gfp_mask, 0);
+ prep_new_page(head, order, gfp_mask, 0, USER_ADDR_NONE);
} else {
ret = -EINVAL;
WARN(true, "PFN range: requested [%lu, %lu), allocated [%lu, %lu)\n",
@@ -7781,7 +7787,7 @@ struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned
gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC | __GFP_COMP
| gfp_flags;
unsigned int alloc_flags = ALLOC_TRYLOCK;
- struct alloc_context ac = { };
+ struct alloc_context ac = { .user_addr = USER_ADDR_NONE };
struct page *page;
VM_WARN_ON_ONCE(gfp_flags & ~__GFP_ACCOUNT);
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index d2423f30577e..bcd3d1aa8589 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -57,10 +57,10 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
__GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
page = __alloc_pages(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER,
- numa_mem_id(), NULL);
+ numa_mem_id(), NULL, USER_ADDR_NONE);
#endif
if (unlikely(!page)) {
- page = __alloc_pages(gfp, 0, numa_mem_id(), NULL);
+ page = __alloc_pages(gfp, 0, numa_mem_id(), NULL, USER_ADDR_NONE);
order = 0;
}
diff --git a/mm/shmem.c b/mm/shmem.c
index b40f3cd48961..896cef466b0c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1927,7 +1927,7 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, int order,
struct folio *folio;
mpol = shmem_get_pgoff_policy(info, index, order, &ilx);
- folio = folio_alloc_mpol(gfp, order, mpol, ilx, numa_node_id());
+ folio = folio_alloc_mpol(gfp, order, mpol, ilx, numa_node_id(), USER_ADDR_NONE);
mpol_cond_put(mpol);
return folio;
diff --git a/mm/slub.c b/mm/slub.c
index 0c906fefc31b..fc8f998a0fe1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3266,7 +3266,7 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
else if (node == NUMA_NO_NODE)
page = alloc_frozen_pages(flags, order);
else
- page = __alloc_frozen_pages(flags, order, node, NULL);
+ page = __alloc_frozen_pages(flags, order, node, NULL, USER_ADDR_NONE);
if (!page)
return NULL;
@@ -5178,7 +5178,7 @@ static void *___kmalloc_large_node(size_t size, gfp_t flags, int node)
if (node == NUMA_NO_NODE)
page = alloc_frozen_pages_noprof(flags, order);
else
- page = __alloc_frozen_pages_noprof(flags, order, node, NULL);
+ page = __alloc_frozen_pages_noprof(flags, order, node, NULL, USER_ADDR_NONE);
if (page) {
ptr = page_address(page);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 6d0eef7470be..12ac29ae818c 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -568,7 +568,7 @@ struct folio *swap_cache_alloc_folio(swp_entry_t entry, gfp_t gfp_mask,
return NULL;
/* Allocate a new folio to be added into the swap cache. */
- folio = folio_alloc_mpol(gfp_mask, 0, mpol, ilx, numa_node_id());
+ folio = folio_alloc_mpol(gfp_mask, 0, mpol, ilx, numa_node_id(), USER_ADDR_NONE);
if (!folio)
return NULL;
/* Try add the new folio, returns existing folio or NULL on failure. */
--
MST
* [PATCH RFC v3 02/19] mm: add folio_zero_user stub for configs without THP/HUGETLBFS
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport
folio_zero_user() is defined in mm/memory.c under
CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS. A subsequent patch
will call it from post_alloc_hook() for all user page zeroing, so
configs without THP or HUGETLBFS will need a stub.
Add a macro in the #else branch that falls back to
clear_user_highpages(), which handles cache aliasing correctly on
VIPT architectures and is always available via highmem.h.
Without THP/HUGETLBFS, only order-0 user pages are allocated, so
the locality optimization in the real folio_zero_user() (zero near
the faulting address last) is not needed.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
include/linux/mm.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5be3d8a8f806..541d36e5e420 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4718,6 +4718,9 @@ static inline bool vma_is_special_huge(const struct vm_area_struct *vma)
(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)));
}
+#else /* !CONFIG_TRANSPARENT_HUGEPAGE && !CONFIG_HUGETLBFS */
+#define folio_zero_user(folio, addr_hint) \
+ clear_user_highpages(&(folio)->page, (addr_hint), folio_nr_pages(folio))
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
#if MAX_NUMNODES > 1
--
MST
* [PATCH RFC v3 03/19] mm: page_alloc: move prep_compound_page before post_alloc_hook
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Johannes Weiner, Zi Yan
Move prep_compound_page() before post_alloc_hook() in prep_new_page().
The next patch adds a folio_zero_user() call to post_alloc_hook(),
which uses folio_nr_pages() to determine how many pages to zero.
Without compound metadata set up first, folio_nr_pages() returns 1
for higher-order allocations, so only the first page would be zeroed.
All other operations in post_alloc_hook() (arch_alloc_page, KASAN,
debug, page owner, etc.) use raw page pointers with explicit order
counts and are unaffected by this reordering.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
mm/page_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1cf5551849fe..99c01eb2d59e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1895,11 +1895,11 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
unsigned int alloc_flags,
unsigned long user_addr)
{
- post_alloc_hook(page, order, gfp_flags, user_addr);
-
if (order && (gfp_flags & __GFP_COMP))
prep_compound_page(page, order);
+ post_alloc_hook(page, order, gfp_flags, user_addr);
+
/*
* page is set pfmemalloc when ALLOC_NO_WATERMARKS was necessary to
* allocate the page. The expectation is that the caller is taking
--
MST
* [PATCH RFC v3 04/19] mm: use folio_zero_user for user pages in post_alloc_hook
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Johannes Weiner, Zi Yan
When post_alloc_hook() needs to zero a page for an explicit
__GFP_ZERO allocation and user_addr is set, use folio_zero_user()
instead of kernel_init_pages(). This zeros near the faulting
address last, keeping those cachelines hot for the impending
user access.
folio_zero_user() is only used for explicit __GFP_ZERO, not for
init_on_alloc. On architectures with virtually-indexed caches
(e.g., ARM), clear_user_highpage() performs per-line cache
operations; using it for init_on_alloc would add overhead that
kernel_init_pages() avoids (the page fault path flushes the
cache at PTE installation time regardless).
No functional change yet: current callers do not pass __GFP_ZERO
for user pages (they zero at the callsite instead). Subsequent
patches will convert them.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
mm/page_alloc.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 99c01eb2d59e..db2192ffc27c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1882,9 +1882,20 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
for (i = 0; i != 1 << order; ++i)
page_kasan_tag_reset(page + i);
}
- /* If memory is still not initialized, initialize it now. */
- if (init)
- kernel_init_pages(page, 1 << order);
+ /*
+ * If memory is still not initialized, initialize it now.
+ * When __GFP_ZERO was explicitly requested and user_addr is set,
+ * use folio_zero_user() which zeros near the faulting address
+ * last, keeping those cachelines hot. For init_on_alloc, use
+ * kernel_init_pages() to avoid unnecessary cache flush overhead
+ * on architectures with virtually-indexed caches.
+ */
+ if (init) {
+ if ((gfp_flags & __GFP_ZERO) && user_addr != USER_ADDR_NONE)
+ folio_zero_user(page_folio(page), user_addr);
+ else
+ kernel_init_pages(page, 1 << order);
+ }
set_page_owner(page, order, gfp_flags);
page_table_check_alloc(page, order);
--
MST
* [PATCH RFC v3 05/19] mm: use __GFP_ZERO in vma_alloc_zeroed_movable_folio
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (3 preceding siblings ...)
2026-04-21 22:01 ` [PATCH RFC v3 04/19] mm: use folio_zero_user for user pages in post_alloc_hook Michael S. Tsirkin
@ 2026-04-21 22:01 ` Michael S. Tsirkin
2026-04-21 22:01 ` [PATCH RFC v3 06/19] mm: use __GFP_ZERO in alloc_anon_folio Michael S. Tsirkin
` (13 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport
Now that post_alloc_hook() handles cache-friendly user page
zeroing via folio_zero_user(), convert vma_alloc_zeroed_movable_folio()
to pass __GFP_ZERO instead of zeroing at the callsite.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
include/linux/highmem.h | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index af03db851a1d..ffa683f64f1d 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -320,13 +320,8 @@ static inline
struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
unsigned long vaddr)
{
- struct folio *folio;
-
- folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
- if (folio && user_alloc_needs_zeroing())
- clear_user_highpage(&folio->page, vaddr);
-
- return folio;
+ return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO,
+ 0, vma, vaddr);
}
#endif
--
MST
* [PATCH RFC v3 06/19] mm: use __GFP_ZERO in alloc_anon_folio
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (4 preceding siblings ...)
2026-04-21 22:01 ` [PATCH RFC v3 05/19] mm: use __GFP_ZERO in vma_alloc_zeroed_movable_folio Michael S. Tsirkin
@ 2026-04-21 22:01 ` Michael S. Tsirkin
2026-04-21 22:01 ` [PATCH RFC v3 07/19] mm: use __GFP_ZERO in vma_alloc_anon_folio_pmd Michael S. Tsirkin
` (12 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport
Convert alloc_anon_folio() to pass __GFP_ZERO instead of
zeroing at the callsite. The allocator now handles
cache-friendly zeroing via folio_zero_user() in post_alloc_hook().
Also pass the exact fault address (vmf->address, not the aligned
addr) to vma_alloc_folio() in alloc_swap_folio() and
alloc_anon_folio(), so folio_zero_user() zeros near the actual
faulting address last.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
mm/memory.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 07778814b4a8..ed3797a6e121 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4662,7 +4662,7 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
gfp = vma_thp_gfp_mask(vma);
while (orders) {
addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
- folio = vma_alloc_folio(gfp, order, vma, addr);
+ folio = vma_alloc_folio(gfp, order, vma, vmf->address);
if (folio) {
if (!mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
gfp, entry))
@@ -5176,10 +5176,10 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
goto fallback;
/* Try allocating the highest of the remaining orders. */
- gfp = vma_thp_gfp_mask(vma);
+ gfp = vma_thp_gfp_mask(vma) | __GFP_ZERO;
while (orders) {
addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
- folio = vma_alloc_folio(gfp, order, vma, addr);
+ folio = vma_alloc_folio(gfp, order, vma, vmf->address);
if (folio) {
if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
@@ -5187,15 +5187,6 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
goto next;
}
folio_throttle_swaprate(folio, gfp);
- /*
- * When a folio is not zeroed during allocation
- * (__GFP_ZERO not used) or user folios require special
- * handling, folio_zero_user() is used to make sure
- * that the page corresponding to the faulting address
- * will be hot in the cache after zeroing.
- */
- if (user_alloc_needs_zeroing())
- folio_zero_user(folio, vmf->address);
return folio;
}
next:
--
MST
* [PATCH RFC v3 07/19] mm: use __GFP_ZERO in vma_alloc_anon_folio_pmd
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (5 preceding siblings ...)
2026-04-21 22:01 ` [PATCH RFC v3 06/19] mm: use __GFP_ZERO in alloc_anon_folio Michael S. Tsirkin
@ 2026-04-21 22:01 ` Michael S. Tsirkin
2026-04-21 22:01 ` [PATCH RFC v3 08/19] mm: hugetlb: use __GFP_ZERO and skip zeroing for zeroed pages Michael S. Tsirkin
` (11 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Lorenzo Stoakes, Zi Yan, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang
Convert vma_alloc_anon_folio_pmd() to pass __GFP_ZERO instead of
zeroing at the callsite.
Pass the exact fault address (not PMD-aligned) to vma_alloc_folio()
to ensure the cache locality optimization in folio_zero_user()
works correctly. The NUMA interleave index computation already
shifts by PAGE_SHIFT + order, so the unmasked address gives the
same result.
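The "same result" claim is simple shift arithmetic and can be checked directly. A minimal sketch, using illustrative x86-64 constants (PAGE_SHIFT = 12, HPAGE_PMD_ORDER = 9; these values are assumptions for the demo, not taken from the patch):

```c
#include <assert.h>

#define PAGE_SHIFT	12
#define HPAGE_PMD_ORDER	9
#define HPAGE_PMD_SHIFT	(PAGE_SHIFT + HPAGE_PMD_ORDER)
#define HPAGE_PMD_MASK	(~((1UL << HPAGE_PMD_SHIFT) - 1))

/*
 * The NUMA interleave index only looks at address bits at or above
 * PAGE_SHIFT + order -- exactly the bits HPAGE_PMD_MASK preserves --
 * so masking the address before the shift cannot change the result.
 */
static unsigned long interleave_index(unsigned long addr, int order)
{
	return addr >> (PAGE_SHIFT + order);
}
```

Any address and its PMD-aligned counterpart therefore map to the same interleave index at HPAGE_PMD_ORDER.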
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
mm/huge_memory.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8e2746ea74ad..3f2a868cf9e9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1256,11 +1256,11 @@ EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
unsigned long addr)
{
- gfp_t gfp = vma_thp_gfp_mask(vma);
+ gfp_t gfp = vma_thp_gfp_mask(vma) | __GFP_ZERO;
const int order = HPAGE_PMD_ORDER;
struct folio *folio;
- folio = vma_alloc_folio(gfp, order, vma, addr & HPAGE_PMD_MASK);
+ folio = vma_alloc_folio(gfp, order, vma, addr);
if (unlikely(!folio)) {
count_vm_event(THP_FAULT_FALLBACK);
@@ -1279,14 +1279,6 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
}
folio_throttle_swaprate(folio, gfp);
- /*
- * When a folio is not zeroed during allocation (__GFP_ZERO not used)
- * or user folios require special handling, folio_zero_user() is used to
- * make sure that the page corresponding to the faulting address will be
- * hot in the cache after zeroing.
- */
- if (user_alloc_needs_zeroing())
- folio_zero_user(folio, addr);
/*
* The memory barrier inside __folio_mark_uptodate makes sure that
* folio_zero_user writes become visible before the set_pmd_at()
--
MST
* [PATCH RFC v3 08/19] mm: hugetlb: use __GFP_ZERO and skip zeroing for zeroed pages
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (6 preceding siblings ...)
2026-04-21 22:01 ` [PATCH RFC v3 07/19] mm: use __GFP_ZERO in vma_alloc_anon_folio_pmd Michael S. Tsirkin
@ 2026-04-21 22:01 ` Michael S. Tsirkin
2026-04-21 22:01 ` [PATCH RFC v3 09/19] mm: memfd: skip zeroing for zeroed hugetlb pool pages Michael S. Tsirkin
` (10 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Muchun Song, Oscar Salvador
Convert the hugetlb fault and fallocate paths to use __GFP_ZERO.
For pages allocated from the buddy allocator, post_alloc_hook()
handles zeroing (and can skip it entirely when the host has
already zeroed the page).
Hugetlb surplus pages need special handling because they can be
pre-allocated into the pool during mmap (by hugetlb_acct_memory)
before any page fault. Pool pages are kept around and may need
zeroing long after buddy allocation, so PG_zeroed (consumed at
allocation time) cannot track their state.
Add a bool *zeroed output parameter to alloc_hugetlb_folio()
so callers know whether the page still needs zeroing.
Buddy-allocated pages are always returned zeroed (by
post_alloc_hook()). Pool
pages use a new HPG_zeroed flag to track whether the page is
known-zero (freshly buddy-allocated, never mapped to userspace).
The flag is set in alloc_surplus_hugetlb_folio() after buddy
allocation and cleared in free_huge_folio() when a user-mapped
page returns to the pool.
Callers that do not need zeroing (CoW, migration) pass NULL for
zeroed and 0 for gfp.
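The flag protocol described above can be sketched as a toy model (struct and function names are illustrative, not the kernel's hugetlb code): the known-zero marker is set when a pool page is freshly buddy-zeroed, consumed exactly once at allocation, and cleared when a user-mapped page returns to the pool.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a hugetlb pool page carrying HPG_zeroed. */
struct pool_page {
	bool hpg_zeroed;
};

/* Allocation consumes the marker: report it once, then clear it. */
static bool pool_alloc_consume_zeroed(struct pool_page *p)
{
	bool zeroed = p->hpg_zeroed;

	p->hpg_zeroed = false;	/* consumed: no longer known-zero */
	return zeroed;
}

/* A page freed after being user-mapped is dirty by definition. */
static void pool_free_from_user(struct pool_page *p)
{
	p->hpg_zeroed = false;
}
```

Consuming the flag at allocation time is what makes a long-lived pool workable: the marker tracks "never touched since buddy zeroing", not a property of the page forever.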
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
fs/hugetlbfs/inode.c | 10 ++++++--
include/linux/hugetlb.h | 8 ++++--
mm/hugetlb.c | 54 ++++++++++++++++++++++++++++++++---------
3 files changed, 56 insertions(+), 16 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3f70c47981de..d5d570d6eff4 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -822,14 +822,20 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
* folios in these areas, we need to consume the reserves
* to keep reservation accounting consistent.
*/
- folio = alloc_hugetlb_folio(&pseudo_vma, addr, false);
+ {
+ bool zeroed;
+
+ folio = alloc_hugetlb_folio(&pseudo_vma, addr, false,
+ __GFP_ZERO, &zeroed);
if (IS_ERR(folio)) {
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
error = PTR_ERR(folio);
goto out;
}
- folio_zero_user(folio, addr);
+ if (!zeroed)
+ folio_zero_user(folio, addr);
__folio_mark_uptodate(folio);
+ }
error = hugetlb_add_to_page_cache(folio, mapping, index);
if (unlikely(error)) {
restore_reserve_on_error(h, &pseudo_vma, addr, folio);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 65910437be1c..094714c607f9 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -598,6 +598,7 @@ enum hugetlb_page_flags {
HPG_vmemmap_optimized,
HPG_raw_hwp_unreliable,
HPG_cma,
+ HPG_zeroed,
__NR_HPAGEFLAGS,
};
@@ -658,6 +659,7 @@ HPAGEFLAG(Freed, freed)
HPAGEFLAG(VmemmapOptimized, vmemmap_optimized)
HPAGEFLAG(RawHwpUnreliable, raw_hwp_unreliable)
HPAGEFLAG(Cma, cma)
+HPAGEFLAG(Zeroed, zeroed)
#ifdef CONFIG_HUGETLB_PAGE
@@ -705,7 +707,8 @@ int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list);
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
void wait_for_freed_hugetlb_folios(void);
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
- unsigned long addr, bool cow_from_owner);
+ unsigned long addr, bool cow_from_owner,
+ gfp_t gfp, bool *zeroed);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
nodemask_t *nmask, gfp_t gfp_mask,
bool allow_alloc_fallback);
@@ -1117,7 +1120,8 @@ static inline void wait_for_freed_hugetlb_folios(void)
static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
- bool cow_from_owner)
+ bool cow_from_owner,
+ gfp_t gfp, bool *zeroed)
{
return NULL;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index de8361b503d2..4f0ed01f5b13 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1744,6 +1744,9 @@ void free_huge_folio(struct folio *folio)
int nid = folio_nid(folio);
struct hugepage_subpool *spool = hugetlb_folio_subpool(folio);
bool restore_reserve;
unsigned long flags;
+
+ /* Page was mapped to userspace; no longer known-zero */
+ folio_clear_hugetlb_zeroed(folio);
VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
@@ -2146,6 +2149,10 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
if (!folio)
return NULL;
+ /* Mark as known-zero only if __GFP_ZERO was requested */
+ if (gfp_mask & __GFP_ZERO)
+ folio_set_hugetlb_zeroed(folio);
+
spin_lock_irq(&hugetlb_lock);
/*
* nr_huge_pages needs to be adjusted within the same lock cycle
@@ -2209,11 +2216,11 @@ static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mas
*/
static
struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,
- struct vm_area_struct *vma, unsigned long addr)
+ struct vm_area_struct *vma, unsigned long addr, gfp_t gfp)
{
struct folio *folio = NULL;
struct mempolicy *mpol;
- gfp_t gfp_mask = htlb_alloc_mask(h);
+ gfp_t gfp_mask = htlb_alloc_mask(h) | gfp;
int nid;
nodemask_t *nodemask;
@@ -2910,7 +2917,8 @@ typedef enum {
* When it's set, the allocation will bypass all vma level reservations.
*/
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
- unsigned long addr, bool cow_from_owner)
+ unsigned long addr, bool cow_from_owner,
+ gfp_t gfp, bool *zeroed)
{
struct hugepage_subpool *spool = subpool_vma(vma);
struct hstate *h = hstate_vma(vma);
@@ -2919,7 +2927,9 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
map_chg_state map_chg;
int ret, idx;
struct hugetlb_cgroup *h_cg = NULL;
- gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
+ bool from_pool;
+
+ gfp |= htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
idx = hstate_index(h);
@@ -2987,13 +2997,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
if (!folio) {
spin_unlock_irq(&hugetlb_lock);
- folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
+ folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr, gfp);
if (!folio)
goto out_uncharge_cgroup;
spin_lock_irq(&hugetlb_lock);
list_add(&folio->lru, &h->hugepage_activelist);
folio_ref_unfreeze(folio, 1);
- /* Fall through */
+ from_pool = false;
+ } else {
+ from_pool = true;
}
/*
@@ -3016,6 +3028,14 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
spin_unlock_irq(&hugetlb_lock);
+ if (zeroed) {
+ if (from_pool)
+ *zeroed = folio_test_hugetlb_zeroed(folio);
+ else
+ *zeroed = true; /* buddy-allocated, zeroed by post_alloc_hook */
+ folio_clear_hugetlb_zeroed(folio);
+ }
+
hugetlb_set_folio_subpool(folio, spool);
if (map_chg != MAP_CHG_ENFORCED) {
@@ -5004,7 +5024,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
/* Do not use reserve as it's private owned */
- new_folio = alloc_hugetlb_folio(dst_vma, addr, false);
+ new_folio = alloc_hugetlb_folio(dst_vma, addr, false, 0, NULL);
if (IS_ERR(new_folio)) {
folio_put(pte_folio);
ret = PTR_ERR(new_folio);
@@ -5533,7 +5553,7 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
* be acquired again before returning to the caller, as expected.
*/
spin_unlock(vmf->ptl);
- new_folio = alloc_hugetlb_folio(vma, vmf->address, cow_from_owner);
+ new_folio = alloc_hugetlb_folio(vma, vmf->address, cow_from_owner, 0, NULL);
if (IS_ERR(new_folio)) {
/*
@@ -5793,7 +5813,11 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
goto out;
}
- folio = alloc_hugetlb_folio(vma, vmf->address, false);
+ {
+ bool zeroed;
+
+ folio = alloc_hugetlb_folio(vma, vmf->address, false,
+ __GFP_ZERO, &zeroed);
if (IS_ERR(folio)) {
/*
* Returning error will result in faulting task being
@@ -5813,9 +5837,15 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
ret = 0;
goto out;
}
- folio_zero_user(folio, vmf->real_address);
+ /*
+ * Buddy-allocated pages are zeroed in post_alloc_hook().
+ * Pool pages bypass the allocator, zero them here.
+ */
+ if (!zeroed)
+ folio_zero_user(folio, vmf->real_address);
__folio_mark_uptodate(folio);
new_folio = true;
+ }
if (vma->vm_flags & VM_MAYSHARE) {
int err = hugetlb_add_to_page_cache(folio, mapping,
@@ -6252,7 +6282,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
goto out;
}
- folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
+ folio = alloc_hugetlb_folio(dst_vma, dst_addr, false, 0, NULL);
if (IS_ERR(folio)) {
pte_t *actual_pte = hugetlb_walk(dst_vma, dst_addr, PMD_SIZE);
if (actual_pte) {
@@ -6299,7 +6329,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
goto out;
}
- folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
+ folio = alloc_hugetlb_folio(dst_vma, dst_addr, false, 0, NULL);
if (IS_ERR(folio)) {
folio_put(*foliop);
ret = -ENOMEM;
--
MST
* [PATCH RFC v3 09/19] mm: memfd: skip zeroing for zeroed hugetlb pool pages
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (7 preceding siblings ...)
2026-04-21 22:01 ` [PATCH RFC v3 08/19] mm: hugetlb: use __GFP_ZERO and skip zeroing for zeroed pages Michael S. Tsirkin
@ 2026-04-21 22:01 ` Michael S. Tsirkin
2026-04-21 22:01 ` [PATCH RFC v3 10/19] mm: remove arch vma_alloc_zeroed_movable_folio overrides Michael S. Tsirkin
` (9 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Muchun Song, Oscar Salvador, Hugh Dickins, Baolin Wang
gather_surplus_pages() pre-allocates hugetlb pages into the pool
during mmap. Pass __GFP_ZERO so these pages are zeroed by the
buddy allocator, and HPG_zeroed is set by alloc_surplus_hugetlb_folio.
Add bool *zeroed output to alloc_hugetlb_folio_reserve() so
callers can check whether the pool page is known-zero. memfd's
memfd_alloc_folio() uses this to skip the explicit folio_zero_user()
when the page is already zero.
This avoids redundant zeroing for memfd hugetlb pages that were
pre-allocated into the pool and never mapped to userspace.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
include/linux/hugetlb.h | 6 ++++--
mm/hugetlb.c | 11 +++++++++--
mm/memfd.c | 17 +++++++++++------
3 files changed, 24 insertions(+), 10 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 094714c607f9..93bb06a33f57 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -713,7 +713,8 @@ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
nodemask_t *nmask, gfp_t gfp_mask,
bool allow_alloc_fallback);
struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask);
+ nodemask_t *nmask, gfp_t gfp_mask,
+ bool *zeroed);
int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
pgoff_t idx);
@@ -1128,7 +1129,8 @@ static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
static inline struct folio *
alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask)
+ nodemask_t *nmask, gfp_t gfp_mask,
+ bool *zeroed)
{
return NULL;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4f0ed01f5b13..f02583b9faab 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2241,7 +2241,7 @@ struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,
}
struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask)
+ nodemask_t *nmask, gfp_t gfp_mask, bool *zeroed)
{
struct folio *folio;
@@ -2257,6 +2257,12 @@ struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
h->resv_huge_pages--;
spin_unlock_irq(&hugetlb_lock);
+
+ if (zeroed && folio) {
+ *zeroed = folio_test_hugetlb_zeroed(folio);
+ folio_clear_hugetlb_zeroed(folio);
+ }
+
return folio;
}
@@ -2341,7 +2347,8 @@ static int gather_surplus_pages(struct hstate *h, long delta)
* It is okay to use NUMA_NO_NODE because we use numa_mem_id()
* down the road to pick the current node if that is the case.
*/
- folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
+ folio = alloc_surplus_hugetlb_folio(h,
+ htlb_alloc_mask(h) | __GFP_ZERO,
NUMA_NO_NODE, &alloc_nodemask,
USER_ADDR_NONE);
if (!folio) {
diff --git a/mm/memfd.c b/mm/memfd.c
index 919c2a53eb96..b9b44ed54db5 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -90,20 +90,24 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
if (nr_resv < 0)
return ERR_PTR(nr_resv);
+ {
+ bool zeroed;
+
folio = alloc_hugetlb_folio_reserve(h,
numa_node_id(),
NULL,
- gfp_mask);
+ gfp_mask,
+ &zeroed);
if (folio) {
u32 hash;
/*
- * Zero the folio to prevent information leaks to userspace.
- * Use folio_zero_user() which is optimized for huge/gigantic
- * pages. Pass 0 as addr_hint since this is not a faulting path
- * and we don't have a user virtual address yet.
+ * Zero the folio to prevent information leaks to
+ * userspace. Skip if the pool page is known-zero
+ * (HPG_zeroed set during pool pre-allocation).
*/
- folio_zero_user(folio, 0);
+ if (!zeroed)
+ folio_zero_user(folio, 0);
/*
* Mark the folio uptodate before adding to page cache,
@@ -139,6 +143,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
hugetlb_unreserve_pages(inode, idx, idx + 1, 0);
return ERR_PTR(err);
}
+ }
#endif
return shmem_read_folio(memfd->f_mapping, idx);
}
--
MST
* [PATCH RFC v3 10/19] mm: remove arch vma_alloc_zeroed_movable_folio overrides
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (8 preceding siblings ...)
2026-04-21 22:01 ` [PATCH RFC v3 09/19] mm: memfd: skip zeroing for zeroed hugetlb pool pages Michael S. Tsirkin
@ 2026-04-21 22:01 ` Michael S. Tsirkin
2026-04-21 22:01 ` [PATCH RFC v3 11/19] mm: page_alloc: propagate PageReported flag across buddy splits Michael S. Tsirkin
` (8 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Richard Henderson, Matt Turner, Magnus Lindholm, Greg Ungerer,
Geert Uytterhoeven, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-alpha, linux-m68k, linux-s390
Now that the generic vma_alloc_zeroed_movable_folio() uses
__GFP_ZERO, the arch-specific macros on alpha, m68k, s390, and
x86 that did the same thing are redundant. Remove them.
arm64 is not affected: it has a real function override that
handles MTE tag zeroing, not just __GFP_ZERO.
Suggested-by: David Hildenbrand <david@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/alpha/include/asm/page.h | 3 ---
arch/m68k/include/asm/page_no.h | 3 ---
arch/s390/include/asm/page.h | 3 ---
arch/x86/include/asm/page.h | 3 ---
4 files changed, 12 deletions(-)
diff --git a/arch/alpha/include/asm/page.h b/arch/alpha/include/asm/page.h
index 59d01f9b77f6..4327029cd660 100644
--- a/arch/alpha/include/asm/page.h
+++ b/arch/alpha/include/asm/page.h
@@ -12,9 +12,6 @@
extern void clear_page(void *page);
-#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
- vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr)
-
extern void copy_page(void * _to, void * _from);
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
diff --git a/arch/m68k/include/asm/page_no.h b/arch/m68k/include/asm/page_no.h
index d2532bc407ef..f511b763a235 100644
--- a/arch/m68k/include/asm/page_no.h
+++ b/arch/m68k/include/asm/page_no.h
@@ -12,9 +12,6 @@ extern unsigned long memory_end;
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
-#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
- vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr)
-
#define __pa(vaddr) ((unsigned long)(vaddr))
#define __va(paddr) ((void *)((unsigned long)(paddr)))
diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index f339258135f7..04020a19a5cf 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -67,9 +67,6 @@ static inline void copy_page(void *to, void *from)
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
-#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
- vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr)
-
#ifdef CONFIG_STRICT_MM_TYPECHECKS
#define STRICT_MM_TYPECHECKS
#endif
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 416dc88e35c1..92fa975b46f3 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -28,9 +28,6 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
copy_page(to, from);
}
-#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
- vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr)
-
#ifndef __pa
#define __pa(x) __phys_addr((unsigned long)(x))
#endif
--
MST
* [PATCH RFC v3 11/19] mm: page_alloc: propagate PageReported flag across buddy splits
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (9 preceding siblings ...)
2026-04-21 22:01 ` [PATCH RFC v3 10/19] mm: remove arch vma_alloc_zeroed_movable_folio overrides Michael S. Tsirkin
@ 2026-04-21 22:01 ` Michael S. Tsirkin
2026-04-21 22:01 ` [PATCH RFC v3 12/19] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (7 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Johannes Weiner, Zi Yan
When a reported free page is split via expand() to satisfy a
smaller allocation, the sub-pages placed back on the free lists
lose the PageReported flag. This means they will be unnecessarily
re-reported to the hypervisor in the next reporting cycle, wasting
work.
Propagate the PageReported flag to sub-pages during expand() so
that they are recognized as already-reported.
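A buddy split and the flag propagation can be sketched with a toy model (array-based, illustrative only; the kernel's expand() works on struct page free lists): halving an order-`high` block down to order-`low` frees one buddy per step, and each freed buddy inherits the parent's reported state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 16

/* reported[i] models PageReported on the head page of a free block
 * starting at page index i. */
static bool reported[MAX_PAGES];

/*
 * Toy expand(): split the block at `base` from order `high` down to
 * order `low`. At each step the upper half goes back on the free
 * list; if the parent was reported, mark that buddy reported too so
 * it is not re-reported to the host.
 */
static void split_block(size_t base, int low, int high, bool was_reported)
{
	size_t size = (size_t)1 << high;

	while (high > low) {
		high--;
		size >>= 1;
		if (was_reported)
			reported[base + size] = true;
	}
}
```

Splitting a reported order-2 block down to order 0 leaves the order-1 buddy at offset 2 and the order-0 buddy at offset 1 both marked, while the allocated page at offset 0 leaves the free lists entirely.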
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
mm/page_alloc.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index db2192ffc27c..211e9e32b91d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1730,7 +1730,7 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
* -- nyc
*/
static inline unsigned int expand(struct zone *zone, struct page *page, int low,
- int high, int migratetype)
+ int high, int migratetype, bool reported)
{
unsigned int size = 1 << high;
unsigned int nr_added = 0;
@@ -1752,6 +1752,15 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low,
__add_to_free_list(&page[size], zone, high, migratetype, false);
set_buddy_order(&page[size], high);
nr_added += size;
+
+ /*
+ * The parent page has been reported to the host. The
+ * sub-pages are part of the same reported block, so mark
+ * them reported too. This avoids re-reporting pages that
+ * the host already knows about.
+ */
+ if (reported)
+ __SetPageReported(&page[size]);
}
return nr_added;
@@ -1762,9 +1771,10 @@ static __always_inline void page_del_and_expand(struct zone *zone,
int high, int migratetype)
{
int nr_pages = 1 << high;
+ bool was_reported = page_reported(page);
__del_page_from_free_list(page, zone, high, migratetype);
- nr_pages -= expand(zone, page, low, high, migratetype);
+ nr_pages -= expand(zone, page, low, high, migratetype, was_reported);
account_freepages(zone, -nr_pages, migratetype);
}
@@ -2334,7 +2344,8 @@ try_to_claim_block(struct zone *zone, struct page *page,
del_page_from_free_list(page, zone, current_order, block_type);
change_pageblock_range(page, current_order, start_type);
- nr_added = expand(zone, page, order, current_order, start_type);
+ nr_added = expand(zone, page, order, current_order, start_type,
+ false);
account_freepages(zone, nr_added, start_type);
return page;
}
--
MST
* [PATCH RFC v3 12/19] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (10 preceding siblings ...)
2026-04-21 22:01 ` [PATCH RFC v3 11/19] mm: page_alloc: propagate PageReported flag across buddy splits Michael S. Tsirkin
@ 2026-04-21 22:01 ` Michael S. Tsirkin
2026-04-21 22:01 ` [PATCH RFC v3 13/19] virtio_balloon: a hack to enable host-zeroed page optimization Michael S. Tsirkin
` (6 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport, Johannes Weiner,
Zi Yan
When a guest reports free pages to the hypervisor via the page reporting
framework (used by virtio-balloon and hv_balloon), the host typically
zeros those pages when reclaiming their backing memory. However, when
those pages are later allocated in the guest, post_alloc_hook()
unconditionally zeros them again if __GFP_ZERO is set. This
double-zeroing is wasteful, especially for large pages.
Avoid redundant zeroing:
- Add a host_zeroes_pages flag to page_reporting_dev_info, allowing
drivers to declare that their host zeros reported pages on reclaim.
A static key (page_reporting_host_zeroes) gates the fast path.
- Add PG_zeroed page flag (sharing PG_private bit) to mark pages
that have been zeroed by the host. Set it on reported pages during
allocation from the buddy in page_del_and_expand().
- Thread the zeroed bool through rmqueue -> prep_new_page ->
post_alloc_hook, where it skips redundant zeroing for __GFP_ZERO
allocations.
No driver sets host_zeroes_pages yet; a follow-up patch makes
virtio_balloon opt in.
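The allocation-side decision reduces to a small predicate. A minimal
user-space sketch, with hypothetical names (the kernel version folds
this into the existing "init" bool inside post_alloc_hook() and also
interacts with want_init_on_free() and kasan):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative sketch of the zero-skip rule: a host-zeroed page may
 * skip memory initialization unless memory tags still need zeroing,
 * since the host knows nothing about tags. Names are stand-ins.
 */
static bool need_zero_on_alloc(bool want_init, bool host_zeroed,
			       bool zero_tags)
{
	/* Host already zeroed the page and no tags to initialize: skip. */
	if (host_zeroed && !zero_tags)
		return false;
	return want_init;
}
```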
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
include/linux/mm.h | 28 +++++++++++++++++
include/linux/page-flags.h | 12 ++++++-
include/linux/page_reporting.h | 3 ++
mm/compaction.c | 5 +--
mm/internal.h | 2 +-
mm/page_alloc.c | 57 ++++++++++++++++++++++------------
mm/page_reporting.c | 14 ++++++++-
mm/page_reporting.h | 12 +++++++
8 files changed, 108 insertions(+), 25 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 541d36e5e420..821034dd33d1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4817,6 +4817,34 @@ static inline bool user_alloc_needs_zeroing(void)
&init_on_alloc);
}
+/**
+ * __page_test_clear_zeroed - test and clear the zeroed marker.
+ * @page: the page to test.
+ *
+ * Returns true if the page was zeroed by the host, and clears
+ * the marker. Caller must have exclusive access to @page.
+ */
+static inline bool __page_test_clear_zeroed(struct page *page)
+{
+ if (PageZeroed(page)) {
+ __ClearPageZeroed(page);
+ return true;
+ }
+ return false;
+}
+
+/**
+ * folio_test_clear_zeroed - test and clear the zeroed marker.
+ * @folio: the folio to test.
+ *
+ * Returns true if the folio was zeroed by the host, and clears
+ * the marker. Callers can skip their own zeroing.
+ */
+static inline bool folio_test_clear_zeroed(struct folio *folio)
+{
+ return __page_test_clear_zeroed(&folio->page);
+}
+
int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status);
int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f7a0e4af0c73..aa0de99247d4 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -135,6 +135,8 @@ enum pageflags {
PG_swapcache = PG_owner_priv_1, /* Swap page: swp_entry_t in private */
/* Some filesystems */
PG_checked = PG_owner_priv_1,
+ /* Page contents are known to be zero */
+ PG_zeroed = PG_private,
/*
* Depending on the way an anonymous folio can be mapped into a page
@@ -679,6 +681,13 @@ FOLIO_TEST_CLEAR_FLAG_FALSE(young)
FOLIO_FLAG_FALSE(idle)
#endif
+/*
+ * PageZeroed() tracks pages known to be zero. The allocator
+ * uses this to skip redundant zeroing in post_alloc_hook().
+ */
+__PAGEFLAG(Zeroed, zeroed, PF_NO_COMPOUND)
+#define __PG_ZEROED (1UL << PG_zeroed)
+
/*
* PageReported() is used to track reported free pages within the Buddy
* allocator. We can use the non-atomic version of the test and set
@@ -1207,9 +1216,10 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
*
* __PG_HWPOISON is exceptional because it needs to be kept beyond page's
* alloc-free cycle to prevent from reusing the page.
+ * __PG_ZEROED survives alloc-free cycles to track known-zero pages.
*/
#define PAGE_FLAGS_CHECK_AT_PREP \
- ((PAGEFLAGS_MASK & ~__PG_HWPOISON) | LRU_GEN_MASK | LRU_REFS_MASK)
+ ((PAGEFLAGS_MASK & ~(__PG_HWPOISON | __PG_ZEROED)) | LRU_GEN_MASK | LRU_REFS_MASK)
/*
* Flags stored in the second page of a compound page. They may overlap
diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
index fe648dfa3a7c..10faadfeb4fb 100644
--- a/include/linux/page_reporting.h
+++ b/include/linux/page_reporting.h
@@ -13,6 +13,9 @@ struct page_reporting_dev_info {
int (*report)(struct page_reporting_dev_info *prdev,
struct scatterlist *sg, unsigned int nents);
+ /* If true, host zeros reported pages on reclaim */
+ bool host_zeroes_pages;
+
/* work struct for processing reports */
struct delayed_work work;
diff --git a/mm/compaction.c b/mm/compaction.c
index 82f2914962f5..3d9ae727a98a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -82,7 +82,8 @@ static inline bool is_via_compact_memory(int order) { return false; }
static struct page *mark_allocated_noprof(struct page *page, unsigned int order, gfp_t gfp_flags)
{
- post_alloc_hook(page, order, __GFP_MOVABLE, USER_ADDR_NONE);
+ post_alloc_hook(page, order, __GFP_MOVABLE, false, USER_ADDR_NONE);
+ set_page_refcounted(page);
return page;
}
#define mark_allocated(...) alloc_hooks(mark_allocated_noprof(__VA_ARGS__))
@@ -1831,7 +1832,7 @@ static struct folio *compaction_alloc_noprof(struct folio *src, unsigned long da
set_page_private(&freepage[size], start_order);
}
dst = (struct folio *)freepage;
- post_alloc_hook(&dst->page, order, __GFP_MOVABLE, USER_ADDR_NONE);
+ post_alloc_hook(&dst->page, order, __GFP_MOVABLE, false, USER_ADDR_NONE);
set_page_refcounted(&dst->page);
if (order)
prep_compound_page(&dst->page, order);
diff --git a/mm/internal.h b/mm/internal.h
index 0b9c0bd133d3..4c33249e03f0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -889,7 +889,7 @@ static inline void prep_compound_tail(struct page *head, int tail_idx)
}
void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags,
- unsigned long user_addr);
+ bool zeroed, unsigned long user_addr);
extern bool free_pages_prepare(struct page *page, unsigned int order);
extern int user_min_free_kbytes;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 211e9e32b91d..2098d569d80c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1774,6 +1774,7 @@ static __always_inline void page_del_and_expand(struct zone *zone,
bool was_reported = page_reported(page);
__del_page_from_free_list(page, zone, high, migratetype);
+
nr_pages -= expand(zone, page, low, high, migratetype, was_reported);
account_freepages(zone, -nr_pages, migratetype);
}
@@ -1846,8 +1847,10 @@ static inline bool should_skip_init(gfp_t flags)
return (flags & __GFP_SKIP_ZERO);
}
+
inline void post_alloc_hook(struct page *page, unsigned int order,
- gfp_t gfp_flags, unsigned long user_addr)
+ gfp_t gfp_flags, bool zeroed,
+ unsigned long user_addr)
{
bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) &&
!should_skip_init(gfp_flags);
@@ -1856,6 +1859,14 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
set_page_private(page, 0);
+ /*
+ * If the page is zeroed, skip memory initialization.
+ * We still need to handle tag zeroing separately since the host
+ * does not know about memory tags.
+ */
+ if (zeroed && init && !zero_tags)
+ init = false;
+
arch_alloc_page(page, order);
debug_pagealloc_map_pages(page, 1 << order);
@@ -1913,13 +1924,13 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
}
static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
- unsigned int alloc_flags,
- unsigned long user_addr)
+ unsigned int alloc_flags, bool zeroed,
+ unsigned long user_addr)
{
if (order && (gfp_flags & __GFP_COMP))
prep_compound_page(page, order);
- post_alloc_hook(page, order, gfp_flags, user_addr);
+ post_alloc_hook(page, order, gfp_flags, zeroed, user_addr);
/*
* page is set pfmemalloc when ALLOC_NO_WATERMARKS was necessary to
@@ -3261,7 +3272,7 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
static __always_inline
struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
unsigned int order, unsigned int alloc_flags,
- int migratetype)
+ int migratetype, bool *zeroed)
{
struct page *page;
unsigned long flags;
@@ -3296,6 +3307,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
}
}
spin_unlock_irqrestore(&zone->lock, flags);
+ *zeroed = __page_test_clear_zeroed(page);
} while (check_new_pages(page, order));
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
@@ -3357,10 +3369,9 @@ static int nr_pcp_alloc(struct per_cpu_pages *pcp, struct zone *zone, int order)
/* Remove page from the per-cpu list, caller must protect the list */
static inline
struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
- int migratetype,
- unsigned int alloc_flags,
+ int migratetype, unsigned int alloc_flags,
struct per_cpu_pages *pcp,
- struct list_head *list)
+ struct list_head *list, bool *zeroed)
{
struct page *page;
@@ -3381,6 +3392,7 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
page = list_first_entry(list, struct page, pcp_list);
list_del(&page->pcp_list);
pcp->count -= 1 << order;
+ *zeroed = __page_test_clear_zeroed(page);
} while (check_new_pages(page, order));
return page;
@@ -3389,7 +3401,8 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
/* Lock and remove page from the per-cpu list */
static struct page *rmqueue_pcplist(struct zone *preferred_zone,
struct zone *zone, unsigned int order,
- int migratetype, unsigned int alloc_flags)
+ int migratetype, unsigned int alloc_flags,
+ bool *zeroed)
{
struct per_cpu_pages *pcp;
struct list_head *list;
@@ -3408,7 +3421,8 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
*/
pcp->free_count >>= 1;
list = &pcp->lists[order_to_pindex(migratetype, order)];
- page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
+ page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags,
+ pcp, list, zeroed);
pcp_spin_unlock(pcp, UP_flags);
if (page) {
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
@@ -3433,19 +3447,19 @@ static inline
struct page *rmqueue(struct zone *preferred_zone,
struct zone *zone, unsigned int order,
gfp_t gfp_flags, unsigned int alloc_flags,
- int migratetype)
+ int migratetype, bool *zeroed)
{
struct page *page;
if (likely(pcp_allowed_order(order))) {
page = rmqueue_pcplist(preferred_zone, zone, order,
- migratetype, alloc_flags);
+ migratetype, alloc_flags, zeroed);
if (likely(page))
goto out;
}
page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
- migratetype);
+ migratetype, zeroed);
out:
/* Separate test+clear to avoid unnecessary atomics */
@@ -3836,6 +3850,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
struct pglist_data *last_pgdat = NULL;
bool last_pgdat_dirty_ok = false;
bool no_fallback;
+ bool zeroed;
bool skip_kswapd_nodes = nr_online_nodes > 1;
bool skipped_kswapd_nodes = false;
@@ -3980,10 +3995,11 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
try_this_zone:
page = rmqueue(zonelist_zone(ac->preferred_zoneref), zone, order,
- gfp_mask, alloc_flags, ac->migratetype);
+ gfp_mask, alloc_flags, ac->migratetype,
+ &zeroed);
if (page) {
prep_new_page(page, order, gfp_mask, alloc_flags,
- ac->user_addr);
+ zeroed, ac->user_addr);
/*
* If this is a high-order atomic allocation then check
@@ -4218,7 +4234,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
/* Prep a captured page if available */
if (page)
- prep_new_page(page, order, gfp_mask, alloc_flags,
+ prep_new_page(page, order, gfp_mask, alloc_flags, false,
ac->user_addr);
/* Try get a page from the freelist if available */
@@ -5193,6 +5209,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
/* Attempt the batch allocation */
pcp_list = &pcp->lists[order_to_pindex(ac.migratetype, 0)];
while (nr_populated < nr_pages) {
+ bool zeroed = false;
/* Skip existing pages */
if (page_array[nr_populated]) {
@@ -5201,7 +5218,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
}
page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
- pcp, pcp_list);
+ pcp, pcp_list, &zeroed);
if (unlikely(!page)) {
/* Try and allocate at least one page */
if (!nr_account) {
@@ -5212,7 +5229,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
}
nr_account++;
- prep_new_page(page, 0, gfp, 0, USER_ADDR_NONE);
+ prep_new_page(page, 0, gfp, 0, zeroed, USER_ADDR_NONE);
set_page_refcounted(page);
page_array[nr_populated++] = page;
}
@@ -6938,7 +6955,7 @@ static void split_free_frozen_pages(struct list_head *list, gfp_t gfp_mask)
list_for_each_entry_safe(page, next, &list[order], lru) {
int i;
- post_alloc_hook(page, order, gfp_mask, USER_ADDR_NONE);
+ post_alloc_hook(page, order, gfp_mask, false, USER_ADDR_NONE);
if (!order)
continue;
@@ -7144,7 +7161,7 @@ int alloc_contig_frozen_range_noprof(unsigned long start, unsigned long end,
struct page *head = pfn_to_page(start);
check_new_pages(head, order);
- prep_new_page(head, order, gfp_mask, 0, USER_ADDR_NONE);
+ prep_new_page(head, order, gfp_mask, 0, false, USER_ADDR_NONE);
} else {
ret = -EINVAL;
WARN(true, "PFN range: requested [%lu, %lu), allocated [%lu, %lu)\n",
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index f0042d5743af..6177d2413743 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -50,6 +50,8 @@ EXPORT_SYMBOL_GPL(page_reporting_order);
#define PAGE_REPORTING_DELAY (2 * HZ)
static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
+DEFINE_STATIC_KEY_FALSE(page_reporting_host_zeroes);
+
enum {
PAGE_REPORTING_IDLE = 0,
PAGE_REPORTING_REQUESTED,
@@ -129,8 +131,11 @@ page_reporting_drain(struct page_reporting_dev_info *prdev,
* report on the new larger page when we make our way
* up to that higher order.
*/
- if (PageBuddy(page) && buddy_order(page) == order)
+ if (PageBuddy(page) && buddy_order(page) == order) {
__SetPageReported(page);
+ if (page_reporting_host_zeroes_pages())
+ __SetPageZeroed(page);
+ }
} while ((sg = sg_next(sg)));
/* reinitialize scatterlist now that it is empty */
@@ -386,6 +391,10 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
/* Assign device to allow notifications */
rcu_assign_pointer(pr_dev_info, prdev);
+ /* enable zeroed page optimization if host zeroes reported pages */
+ if (prdev->host_zeroes_pages)
+ static_branch_enable(&page_reporting_host_zeroes);
+
/* enable page reporting notification */
if (!static_key_enabled(&page_reporting_enabled)) {
static_branch_enable(&page_reporting_enabled);
@@ -410,6 +419,9 @@ void page_reporting_unregister(struct page_reporting_dev_info *prdev)
/* Flush any existing work, and lock it out */
cancel_delayed_work_sync(&prdev->work);
+
+ if (prdev->host_zeroes_pages)
+ static_branch_disable(&page_reporting_host_zeroes);
}
mutex_unlock(&page_reporting_mutex);
diff --git a/mm/page_reporting.h b/mm/page_reporting.h
index c51dbc228b94..736ea7b37e9e 100644
--- a/mm/page_reporting.h
+++ b/mm/page_reporting.h
@@ -15,6 +15,13 @@ DECLARE_STATIC_KEY_FALSE(page_reporting_enabled);
extern unsigned int page_reporting_order;
void __page_reporting_notify(void);
+DECLARE_STATIC_KEY_FALSE(page_reporting_host_zeroes);
+
+static inline bool page_reporting_host_zeroes_pages(void)
+{
+ return static_branch_unlikely(&page_reporting_host_zeroes);
+}
+
static inline bool page_reported(struct page *page)
{
return static_branch_unlikely(&page_reporting_enabled) &&
@@ -46,6 +53,11 @@ static inline void page_reporting_notify_free(unsigned int order)
#else /* CONFIG_PAGE_REPORTING */
#define page_reported(_page) false
+static inline bool page_reporting_host_zeroes_pages(void)
+{
+ return false;
+}
+
static inline void page_reporting_notify_free(unsigned int order)
{
}
--
MST
* [PATCH RFC v3 13/19] virtio_balloon: a hack to enable host-zeroed page optimization
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (11 preceding siblings ...)
2026-04-21 22:01 ` [PATCH RFC v3 12/19] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
@ 2026-04-21 22:01 ` Michael S. Tsirkin
2026-04-21 22:01 ` [PATCH RFC v3 14/19] mm: page_reporting: add flush parameter with page budget Michael S. Tsirkin
` (5 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Xuan Zhuo, Eugenio Pérez
Add a module parameter host_zeroes_pages to opt in to the zeroed
page optimization. A proper virtio feature flag is needed before
this can be merged.
insmod virtio_balloon.ko host_zeroes_pages=1
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
drivers/virtio/virtio_balloon.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index d1fbc8fe8470..165b123caa64 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -19,6 +19,11 @@
#include <linux/mm.h>
#include <linux/page_reporting.h>
+static bool host_zeroes_pages;
+module_param(host_zeroes_pages, bool, 0444);
+MODULE_PARM_DESC(host_zeroes_pages,
+ "Host zeroes reported pages, skip guest re-zeroing");
+
/*
* Balloon device works in 4K page units. So each page is pointed to by
* multiple balloon pages. All memory counters in this driver are in balloon
@@ -1039,6 +1044,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
vb->pr_dev_info.order = 5;
#endif
+ /* TODO: needs a virtio feature flag */
+ vb->pr_dev_info.host_zeroes_pages = host_zeroes_pages;
err = page_reporting_register(&vb->pr_dev_info);
if (err)
goto out_unregister_oom;
--
MST
* [PATCH RFC v3 14/19] mm: page_reporting: add flush parameter with page budget
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (12 preceding siblings ...)
2026-04-21 22:01 ` [PATCH RFC v3 13/19] virtio_balloon: a hack to enable host-zeroed page optimization Michael S. Tsirkin
@ 2026-04-21 22:01 ` Michael S. Tsirkin
2026-04-21 22:01 ` [PATCH RFC v3 15/19] mm: add free_frozen_pages_zeroed Michael S. Tsirkin
` (4 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Johannes Weiner, Zi Yan
Add a write-only module parameter 'flush' that triggers immediate
page reporting. The value specifies approximately how many pages
(at page_reporting_order) to report. The flush loops through
reporting cycles, each processing up to PAGE_REPORTING_CAPACITY
pages, until the budget is exhausted, all pages are reported, or
a signal is pending.
This is helpful when there is a lot of memory freed quickly,
and a single cycle may not process all free pages due to
internal budget limits.
echo 512 > /sys/module/page_reporting/parameters/flush
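The budget-to-cycle arithmetic can be sketched as follows; the 32-page
PAGE_REPORTING_CAPACITY mirrors the value used by the page reporting
code, but treat the constant as an assumption here:

```c
#include <assert.h>

/* Hedged sketch: each flush iteration reports at most
 * PAGE_REPORTING_CAPACITY pages, so a budget of N pages needs at most
 * ceil(N / PAGE_REPORTING_CAPACITY) iterations (fewer if the zone
 * runs out of reportable pages or a signal arrives first). */
#define PAGE_REPORTING_CAPACITY 32U

static unsigned int flush_iterations(unsigned int budget)
{
	return (budget + PAGE_REPORTING_CAPACITY - 1) / PAGE_REPORTING_CAPACITY;
}
```

So `echo 512` above corresponds to at most 16 reporting cycles.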
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
mm/page_reporting.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 6177d2413743..c09a8ac754dc 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -354,6 +354,48 @@ static void page_reporting_process(struct work_struct *work)
static DEFINE_MUTEX(page_reporting_mutex);
DEFINE_STATIC_KEY_FALSE(page_reporting_enabled);
+static int page_reporting_flush_set(const char *val,
+ const struct kernel_param *kp)
+{
+ struct page_reporting_dev_info *prdev;
+ unsigned int budget;
+ int err;
+
+ err = kstrtouint(val, 0, &budget);
+ if (err)
+ return err;
+ if (!budget)
+ return 0;
+
+ mutex_lock(&page_reporting_mutex);
+ prdev = rcu_dereference_protected(pr_dev_info,
+ lockdep_is_held(&page_reporting_mutex));
+ if (prdev) {
+ unsigned int reported;
+
+ for (reported = 0; reported < budget;
+ reported += PAGE_REPORTING_CAPACITY) {
+ flush_delayed_work(&prdev->work);
+ __page_reporting_request(prdev);
+ flush_delayed_work(&prdev->work);
+ if (atomic_read(&prdev->state) == PAGE_REPORTING_IDLE)
+ break;
+ if (signal_pending(current))
+ break;
+ }
+ }
+ mutex_unlock(&page_reporting_mutex);
+ return 0;
+}
+
+static const struct kernel_param_ops flush_ops = {
+ .set = page_reporting_flush_set,
+ .get = param_get_uint,
+};
+static unsigned int page_reporting_flush;
+module_param_cb(flush, &flush_ops, &page_reporting_flush, 0200);
+MODULE_PARM_DESC(flush, "Report up to N pages at page_reporting_order");
+
int page_reporting_register(struct page_reporting_dev_info *prdev)
{
int err = 0;
--
MST
* [PATCH RFC v3 15/19] mm: add free_frozen_pages_zeroed
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (13 preceding siblings ...)
2026-04-21 22:01 ` [PATCH RFC v3 14/19] mm: page_reporting: add flush parameter with page budget Michael S. Tsirkin
@ 2026-04-21 22:01 ` Michael S. Tsirkin
2026-04-21 22:02 ` [PATCH RFC v3 16/19] mm: add put_page_zeroed and folio_put_zeroed Michael S. Tsirkin
` (3 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:01 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Johannes Weiner, Zi Yan, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport
Add free_frozen_pages_zeroed(page, order) to free a frozen page
while marking it as zeroed, so the next allocation can skip
redundant zeroing.
An FPI_ZEROED internal flag carries the hint through the free path.
PageZeroed is set after __free_pages_prepare() clears all flags,
so the hint survives on the free list.
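The ordering constraint (apply the marker only after the prepare step
has wiped the page flags) can be sketched with plain integers standing
in for page flags; bit positions are illustrative, not the kernel's:

```c
#include <assert.h>

/* Sketch of the free-path ordering this patch relies on: the prepare
 * step clears all page flags, then FPI_ZEROED re-applies the PG_zeroed
 * marker so it survives on the free list. */
#define FPI_ZEROED (1U << 3)
#define PG_ZEROED  (1U << 0)

static unsigned int free_path_flags(unsigned int page_flags,
				    unsigned int fpi_flags)
{
	page_flags = 0;				/* __free_pages_prepare() */
	if (fpi_flags & FPI_ZEROED)
		page_flags |= PG_ZEROED;	/* __SetPageZeroed() */
	return page_flags;
}
```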
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
include/linux/gfp.h | 1 +
mm/internal.h | 1 -
mm/page_alloc.c | 21 ++++++++++++++++++++-
3 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 10f653338042..12ab91e2ed57 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -398,6 +398,7 @@ __meminit void *alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mas
extern void __free_pages(struct page *page, unsigned int order);
extern void free_pages_nolock(struct page *page, unsigned int order);
extern void free_pages(unsigned long addr, unsigned int order);
+void free_frozen_pages_zeroed(struct page *page, unsigned int order);
#define __free_page(page) __free_pages((page), 0)
#define free_page(addr) free_pages((addr), 0)
diff --git a/mm/internal.h b/mm/internal.h
index 4c33249e03f0..e655385b269c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -899,7 +899,6 @@ struct page *__alloc_frozen_pages_noprof(gfp_t, unsigned int order, int nid,
#define __alloc_frozen_pages(...) \
alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
void free_frozen_pages(struct page *page, unsigned int order);
-void free_frozen_pages_zeroed(struct page *page, unsigned int order);
void free_unref_folios(struct folio_batch *fbatch);
#ifdef CONFIG_NUMA
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2098d569d80c..9311374bbd2d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -90,6 +90,13 @@ typedef int __bitwise fpi_t;
/* Free the page without taking locks. Rely on trylock only. */
#define FPI_TRYLOCK ((__force fpi_t)BIT(2))
+/*
+ * The page contents are known to be zero (e.g., the host zeroed them
+ * during balloon deflate). Set PageZeroed after free so the next
+ * allocation can skip redundant zeroing.
+ */
+#define FPI_ZEROED ((__force fpi_t)BIT(3))
+
/* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
static DEFINE_MUTEX(pcp_batch_high_lock);
#define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8)
@@ -1611,8 +1618,11 @@ static void __free_pages_ok(struct page *page, unsigned int order,
unsigned long pfn = page_to_pfn(page);
struct zone *zone = page_zone(page);
- if (__free_pages_prepare(page, order, fpi_flags))
+ if (__free_pages_prepare(page, order, fpi_flags)) {
+ if (fpi_flags & FPI_ZEROED)
+ __SetPageZeroed(page);
free_one_page(zone, page, pfn, order, fpi_flags);
+ }
}
void __meminit __free_pages_core(struct page *page, unsigned int order,
@@ -3012,6 +3022,9 @@ static void __free_frozen_pages(struct page *page, unsigned int order,
if (!__free_pages_prepare(page, order, fpi_flags))
return;
+ if (fpi_flags & FPI_ZEROED)
+ __SetPageZeroed(page);
+
/*
* We only track unmovable, reclaimable and movable on pcp lists.
* Place ISOLATE pages on the isolated list because they are being
@@ -3050,6 +3063,12 @@ void free_frozen_pages(struct page *page, unsigned int order)
__free_frozen_pages(page, order, FPI_NONE);
}
+void free_frozen_pages_zeroed(struct page *page, unsigned int order)
+{
+ __free_frozen_pages(page, order, FPI_ZEROED);
+}
+EXPORT_SYMBOL(free_frozen_pages_zeroed);
+
void free_frozen_pages_nolock(struct page *page, unsigned int order)
{
__free_frozen_pages(page, order, FPI_TRYLOCK);
--
MST
* [PATCH RFC v3 16/19] mm: add put_page_zeroed and folio_put_zeroed
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (14 preceding siblings ...)
2026-04-21 22:01 ` [PATCH RFC v3 15/19] mm: add free_frozen_pages_zeroed Michael S. Tsirkin
@ 2026-04-21 22:02 ` Michael S. Tsirkin
2026-04-21 22:02 ` [PATCH RFC v3 17/19] mm: page_alloc: clear PG_zeroed on buddy merge if not both zero Michael S. Tsirkin
` (2 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:02 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport, Axel Rasmussen,
Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi,
Nhat Pham, Baoquan He, Barry Song
Add put_page_zeroed() / folio_put_zeroed() for callers that hold
a reference to a page known to be zeroed.
If this drops the last reference, the page goes through
__folio_put_zeroed() which calls free_frozen_pages_zeroed() so
the zeroed hint is preserved. If someone else still holds a
reference, the hint is simply lost -- this is best-effort.
This is useful for balloon drivers during deflation: the host
has already zeroed the pages, and the balloon is typically the
sole owner. But if the page happens to be shared, silently
dropping the hint is safe and avoids the need for callers to
check the refcount.
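The best-effort semantics can be sketched with a plain integer modeling
the folio refcount (the kernel uses folio_put_testzero(); this is an
illustration, not the implementation):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of folio_put_zeroed() semantics: the zeroed hint reaches the
 * allocator only when this caller drops the final reference; if other
 * references remain, the hint is silently discarded. */
static bool put_zeroed_preserves_hint(int refcount)
{
	return --refcount == 0;
}
```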
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
include/linux/mm.h | 12 ++++++++++++
mm/swap.c | 18 ++++++++++++++++--
2 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 821034dd33d1..878544830369 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1640,6 +1640,7 @@ static inline struct folio *virt_to_folio(const void *x)
}
void __folio_put(struct folio *folio);
+void __folio_put_zeroed(struct folio *folio);
void split_page(struct page *page, unsigned int order);
void folio_copy(struct folio *dst, struct folio *src);
@@ -1817,6 +1818,17 @@ static inline void folio_put(struct folio *folio)
__folio_put(folio);
}
+static inline void folio_put_zeroed(struct folio *folio)
+{
+ if (folio_put_testzero(folio))
+ __folio_put_zeroed(folio);
+}
+
+static inline void put_page_zeroed(struct page *page)
+{
+ folio_put_zeroed(page_folio(page));
+}
+
/**
* folio_put_refs - Reduce the reference count on a folio.
* @folio: The folio.
diff --git a/mm/swap.c b/mm/swap.c
index bb19ccbece46..5d05a463b46a 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -94,7 +94,7 @@ static void page_cache_release(struct folio *folio)
unlock_page_lruvec_irqrestore(lruvec, flags);
}
-void __folio_put(struct folio *folio)
+static void ___folio_put(struct folio *folio, bool zeroed)
{
if (unlikely(folio_is_zone_device(folio))) {
free_zone_device_folio(folio);
@@ -109,10 +109,24 @@ void __folio_put(struct folio *folio)
page_cache_release(folio);
folio_unqueue_deferred_split(folio);
mem_cgroup_uncharge(folio);
- free_frozen_pages(&folio->page, folio_order(folio));
+ if (zeroed)
+ free_frozen_pages_zeroed(&folio->page, folio_order(folio));
+ else
+ free_frozen_pages(&folio->page, folio_order(folio));
+}
+
+void __folio_put(struct folio *folio)
+{
+ ___folio_put(folio, false);
}
EXPORT_SYMBOL(__folio_put);
+void __folio_put_zeroed(struct folio *folio)
+{
+ ___folio_put(folio, true);
+}
+EXPORT_SYMBOL(__folio_put_zeroed);
+
typedef void (*move_fn_t)(struct lruvec *lruvec, struct folio *folio);
static void lru_add(struct lruvec *lruvec, struct folio *folio)
--
MST
* [PATCH RFC v3 17/19] mm: page_alloc: clear PG_zeroed on buddy merge if not both zero
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (15 preceding siblings ...)
2026-04-21 22:02 ` [PATCH RFC v3 16/19] mm: add put_page_zeroed and folio_put_zeroed Michael S. Tsirkin
@ 2026-04-21 22:02 ` Michael S. Tsirkin
2026-04-21 22:02 ` [PATCH RFC v3 18/19] mm: page_alloc: preserve PG_zeroed in page_del_and_expand Michael S. Tsirkin
2026-04-21 22:02 ` [PATCH RFC v3 19/19] virtio_balloon: mark deflated pages as zeroed Michael S. Tsirkin
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:02 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Johannes Weiner, Zi Yan
When two buddy pages merge in __free_one_page(), preserve
PG_zeroed on the merged page only if both buddies have the
flag set. Otherwise clear it.
Without this, a zeroed page (freed via free_frozen_pages_zeroed
from balloon deflate) could merge with a non-zero buddy. The merged
page would inherit PG_zeroed, and a later __GFP_ZERO allocation
would skip zeroing stale data in the non-zero half.
The page reporting path is not affected: it sets PG_zeroed during
allocation (page_del_and_expand), not on free list pages.
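The merge rule is a logical AND of the two markers, sketched here in
isolation from the buddy bookkeeping:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the rule this patch adds to __free_one_page(): the
 * order+1 page produced by a buddy merge is all-zero only when both
 * halves were, so PG_zeroed survives the merge only in that case. */
static bool merged_zeroed(bool page_zeroed, bool buddy_zeroed)
{
	return page_zeroed && buddy_zeroed;
}
```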
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
mm/page_alloc.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9311374bbd2d..122b49a6d435 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -991,6 +991,8 @@ static inline void __free_one_page(struct page *page,
unsigned long buddy_pfn = 0;
unsigned long combined_pfn;
struct page *buddy;
+ bool buddy_zeroed;
+ bool page_zeroed;
bool to_tail;
VM_BUG_ON(!zone_is_initialized(zone));
@@ -1029,6 +1031,8 @@ static inline void __free_one_page(struct page *page,
goto done_merging;
}
+ buddy_zeroed = PageZeroed(buddy);
+
/*
* Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
* merge with it and move up one order.
@@ -1047,10 +1051,17 @@ static inline void __free_one_page(struct page *page,
change_pageblock_range(buddy, order, migratetype);
}
+ page_zeroed = PageZeroed(page);
+ __ClearPageZeroed(page);
+ __ClearPageZeroed(buddy);
+
combined_pfn = buddy_pfn & pfn;
page = page + (combined_pfn - pfn);
pfn = combined_pfn;
order++;
+
+ if (page_zeroed && buddy_zeroed)
+ __SetPageZeroed(page);
}
done_merging:
--
MST
* [PATCH RFC v3 18/19] mm: page_alloc: preserve PG_zeroed in page_del_and_expand
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (16 preceding siblings ...)
2026-04-21 22:02 ` [PATCH RFC v3 17/19] mm: page_alloc: clear PG_zeroed on buddy merge if not both zero Michael S. Tsirkin
@ 2026-04-21 22:02 ` Michael S. Tsirkin
2026-04-21 22:02 ` [PATCH RFC v3 19/19] virtio_balloon: mark deflated pages as zeroed Michael S. Tsirkin
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:02 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Johannes Weiner, Zi Yan
Don't unconditionally clear PG_zeroed for non-reported pages in
page_del_and_expand(). Pages freed via free_frozen_pages_zeroed()
(balloon deflate) already have the flag set and should keep it
through buddy allocation, not just PCP reuse.
While at it, make try_to_claim_block() preserve both PG_reported and
PG_zeroed when splitting a claimed block, instead of passing false
unconditionally to expand().
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
mm/page_alloc.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 122b49a6d435..3f5ed022cb9c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1751,7 +1751,8 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
* -- nyc
*/
static inline unsigned int expand(struct zone *zone, struct page *page, int low,
- int high, int migratetype, bool reported)
+ int high, int migratetype, bool reported,
+ bool zeroed)
{
unsigned int size = 1 << high;
unsigned int nr_added = 0;
@@ -1782,6 +1783,8 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low,
*/
if (reported)
__SetPageReported(&page[size]);
+ if (zeroed)
+ __SetPageZeroed(&page[size]);
}
return nr_added;
@@ -1793,10 +1796,12 @@ static __always_inline void page_del_and_expand(struct zone *zone,
{
int nr_pages = 1 << high;
bool was_reported = page_reported(page);
+ bool was_zeroed = PageZeroed(page);
__del_page_from_free_list(page, zone, high, migratetype);
- nr_pages -= expand(zone, page, low, high, migratetype, was_reported);
+ nr_pages -= expand(zone, page, low, high, migratetype, was_reported,
+ was_zeroed);
account_freepages(zone, -nr_pages, migratetype);
}
@@ -2373,11 +2378,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
/* Take ownership for orders >= pageblock_order */
if (current_order >= pageblock_order) {
unsigned int nr_added;
+ bool was_reported = page_reported(page);
+ bool was_zeroed = PageZeroed(page);
del_page_from_free_list(page, zone, current_order, block_type);
change_pageblock_range(page, current_order, start_type);
nr_added = expand(zone, page, order, current_order, start_type,
- false);
+ was_reported, was_zeroed);
account_freepages(zone, nr_added, start_type);
return page;
}
--
MST
* [PATCH RFC v3 19/19] virtio_balloon: mark deflated pages as zeroed
2026-04-21 22:01 [PATCH RFC v3 00/19] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
` (17 preceding siblings ...)
2026-04-21 22:02 ` [PATCH RFC v3 18/19] mm: page_alloc: preserve PG_zeroed in page_del_and_expand Michael S. Tsirkin
@ 2026-04-21 22:02 ` Michael S. Tsirkin
18 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:02 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
Andrea Arcangeli, Gregory Price, linux-mm, virtualization,
Xuan Zhuo, Eugenio Pérez
When host_zeroes_pages is set, the host has zeroed the balloon
pages on reclaim. Use put_page_zeroed() during deflation so
the freed pages are marked as zeroed in the buddy allocator,
allowing the next allocation to skip redundant zeroing.
put_page_zeroed() is best-effort: if the balloon is the sole
holder (the common case), the zeroed hint reaches the buddy
allocator via free_frozen_pages_zeroed(). If someone else
holds a reference, the hint is silently lost.
Once balloon pages are converted to frozen pages (no refcount),
this can switch to free_frozen_pages_zeroed() directly.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
drivers/virtio/virtio_balloon.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 165b123caa64..3058d48fc8de 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -299,7 +299,10 @@ static void release_pages_balloon(struct virtio_balloon *vb,
list_for_each_entry_safe(page, next, pages, lru) {
list_del(&page->lru);
- put_page(page); /* balloon reference */
+ if (host_zeroes_pages && !page_poisoning_enabled_static())
+ put_page_zeroed(page);
+ else
+ put_page(page); /* balloon reference */
}
}
--
MST