linux-mm.kvack.org archive mirror
* [PATCH RFC 0/9] mm/virtio: skip redundant zeroing of host-zeroed reported pages
@ 2026-04-12 22:50 Michael S. Tsirkin
  2026-04-12 22:50 ` [PATCH RFC 1/9] mm: page_alloc: propagate PageReported flag across buddy splits Michael S. Tsirkin
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2026-04-12 22:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization

When a guest reports free pages to the hypervisor via virtio-balloon's
free page reporting, the host typically zeros those pages when reclaiming
their backing memory (e.g., via MADV_DONTNEED on anonymous mappings).
When the guest later reallocates those pages, the kernel zeros them
again -- redundantly.

This series eliminates that double-zeroing by propagating the "host
already zeroed this page" information through the buddy allocator and
into the page fault path.

Performance with THP enabled on a 2GB VM, 1 vCPU, allocating
256MB of anonymous pages:

  metric         baseline    optimized   delta
  task-clock     179ms       99ms        -45%
  cache-misses   1.22M       287K        -76%
  instructions   15.1M       13.9M       -8%

With hugetlb surplus pages:

  metric         baseline    optimized   delta
  task-clock     322ms       9.9ms       -97%
  cache-misses   659K        88K         -87%
  instructions   18.3M       10.6M       -42%

Notes:
- The virtio_balloon patch (9/9) is a testing hack with a module
  parameter.  A proper virtio feature flag is needed before merging.
- Patch 8/9 adds a sysfs flush trigger for deterministic testing
  (avoids waiting for the 2-second reporting delay).
- The optimization is most effective with THP, where entire 2MB
  pages are allocated directly from reported order-9+ buddy pages.
  Without THP, only ~21% of order-0 allocations come from reported
  pages due to low-order fragmentation.
- Persistent hugetlb pool pages are not covered: when freed by
  userspace they return to the hugetlb free pool, not the buddy
  allocator, so they are never reported to the host.  Surplus
  hugetlb pages are allocated from buddy and do benefit.

Test program:

  /*
   * alloc_once.c: map <size_mb> MB of anonymous (optionally hugetlb)
   * memory, fault it in with MADV_POPULATE_WRITE, then unmap.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/mman.h>

  #ifndef MADV_POPULATE_WRITE
  #define MADV_POPULATE_WRITE 23
  #endif
  #ifndef MAP_HUGETLB
  #define MAP_HUGETLB 0x40000
  #endif

  int main(int argc, char **argv)
  {
      unsigned long size;
      int flags = MAP_PRIVATE | MAP_ANONYMOUS;
      void *p;
      int r;

      if (argc < 2) {
          fprintf(stderr, "usage: %s <size_mb> [huge]\n", argv[0]);
          return 1;
      }
      size = atol(argv[1]) * 1024UL * 1024;
      if (argc >= 3 && strcmp(argv[2], "huge") == 0)
          flags |= MAP_HUGETLB;
      p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
      if (p == MAP_FAILED) {
          perror("mmap");
          return 1;
      }
      r = madvise(p, size, MADV_POPULATE_WRITE);
      if (r) {
          perror("madvise");
          return 1;
      }
      munmap(p, size);
      return 0;
  }

Test script (bench.sh):

  #!/bin/bash
  # Usage: bench.sh <size_mb> <mode> <iterations> [huge]
  # mode 0 = baseline, mode 1 = skip zeroing
  SZ=${1:-256}; MODE=${2:-0}; ITER=${3:-10}; HUGE=${4:-}
  FLUSH=/sys/module/page_reporting/parameters/flush
  PERF_DATA=/tmp/perf-$MODE.data
  rmmod virtio_balloon 2>/dev/null
  insmod virtio_balloon.ko host_zeroes_pages=$MODE
  echo 1 > $FLUSH
  [ "$HUGE" = "huge" ] && echo $((SZ/2)) > /proc/sys/vm/nr_overcommit_hugepages
  rm -f $PERF_DATA
  echo "=== sz=${SZ}MB mode=$MODE iter=$ITER $HUGE ==="
  for i in $(seq 1 $ITER); do
      echo 3 > /proc/sys/vm/drop_caches
      echo 1 > $FLUSH
      perf stat record -e task-clock,instructions,cache-misses \
          -o $PERF_DATA --append -- ./alloc_once $SZ $HUGE
  done
  [ "$HUGE" = "huge" ] && echo 0 > /proc/sys/vm/nr_overcommit_hugepages
  rmmod virtio_balloon
  perf stat report -i $PERF_DATA

Compile and run:
  gcc -static -O2 -o alloc_once alloc_once.c
  bash bench.sh 256 0 10          # baseline (regular pages)
  bash bench.sh 256 1 10          # optimized (regular pages)
  bash bench.sh 256 0 10 huge     # baseline (hugetlb surplus)
  bash bench.sh 256 1 10 huge     # optimized (hugetlb surplus)

Written with assistance from Claude.  Everything was read manually;
the patchset was split and the commit logs edited by hand.

Michael S. Tsirkin (9):
  mm: page_alloc: propagate PageReported flag across buddy splits
  mm: page_reporting: skip redundant zeroing of host-zeroed reported
    pages
  mm: add __GFP_PREZEROED flag and folio_test_clear_prezeroed()
  mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed
    pages
  mm: skip zeroing in alloc_anon_folio for pre-zeroed pages
  mm: skip zeroing in vma_alloc_anon_folio_pmd for pre-zeroed pages
  mm: hugetlb: skip zeroing of pre-zeroed hugetlb pages
  mm: page_reporting: add flush parameter to trigger immediate reporting
  virtio_balloon: a hack to enable host-zeroed page optimization

 drivers/virtio/virtio_balloon.c |  7 +++++
 fs/hugetlbfs/inode.c            |  3 ++-
 include/linux/gfp_types.h       |  5 ++++
 include/linux/highmem.h         |  6 +++--
 include/linux/hugetlb.h         |  2 +-
 include/linux/mm.h              | 22 ++++++++++++++++
 include/linux/page_reporting.h  |  3 +++
 mm/huge_memory.c                |  4 +--
 mm/hugetlb.c                    |  3 ++-
 mm/memory.c                     |  5 ++--
 mm/page_alloc.c                 | 46 ++++++++++++++++++++++++++++++---
 mm/page_reporting.c             | 34 ++++++++++++++++++++++++
 mm/page_reporting.h             |  2 ++
 13 files changed, 129 insertions(+), 13 deletions(-)

-- 
MST



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH RFC 1/9] mm: page_alloc: propagate PageReported flag across buddy splits
  2026-04-12 22:50 [PATCH RFC 0/9] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
@ 2026-04-12 22:50 ` Michael S. Tsirkin
  2026-04-12 22:50 ` [PATCH RFC 2/9] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2026-04-12 22:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Johannes Weiner,
	Zi Yan

When a reported free page is split via expand() to satisfy a
smaller allocation, the sub-pages placed back on the free lists
lose the PageReported flag.  This means they will be unnecessarily
re-reported to the hypervisor in the next reporting cycle, wasting
work.

Propagate the PageReported flag to sub-pages during expand() so
that they are recognized as already-reported.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
 mm/page_alloc.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d4b6f1a554e..edbb1edf463d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1730,7 +1730,7 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
  * -- nyc
  */
 static inline unsigned int expand(struct zone *zone, struct page *page, int low,
-				  int high, int migratetype)
+				  int high, int migratetype, bool reported)
 {
 	unsigned int size = 1 << high;
 	unsigned int nr_added = 0;
@@ -1752,6 +1752,15 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low,
 		__add_to_free_list(&page[size], zone, high, migratetype, false);
 		set_buddy_order(&page[size], high);
 		nr_added += size;
+
+		/*
+		 * The parent page has been reported to the host.  The
+		 * sub-pages are part of the same reported block, so mark
+		 * them reported too.  This avoids re-reporting pages that
+		 * the host already knows about.
+		 */
+		if (reported)
+			__SetPageReported(&page[size]);
 	}
 
 	return nr_added;
@@ -1762,9 +1771,10 @@ static __always_inline void page_del_and_expand(struct zone *zone,
 						int high, int migratetype)
 {
 	int nr_pages = 1 << high;
+	bool was_reported = page_reported(page);
 
 	__del_page_from_free_list(page, zone, high, migratetype);
-	nr_pages -= expand(zone, page, low, high, migratetype);
+	nr_pages -= expand(zone, page, low, high, migratetype, was_reported);
 	account_freepages(zone, -nr_pages, migratetype);
 }
 
@@ -2322,7 +2332,8 @@ try_to_claim_block(struct zone *zone, struct page *page,
 
 		del_page_from_free_list(page, zone, current_order, block_type);
 		change_pageblock_range(page, current_order, start_type);
-		nr_added = expand(zone, page, order, current_order, start_type);
+		nr_added = expand(zone, page, order, current_order, start_type,
+				  false);
 		account_freepages(zone, nr_added, start_type);
 		return page;
 	}
-- 
MST




* [PATCH RFC 2/9] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages
  2026-04-12 22:50 [PATCH RFC 0/9] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
  2026-04-12 22:50 ` [PATCH RFC 1/9] mm: page_alloc: propagate PageReported flag across buddy splits Michael S. Tsirkin
@ 2026-04-12 22:50 ` Michael S. Tsirkin
  2026-04-13  8:00   ` David Hildenbrand (Arm)
  2026-04-12 22:50 ` [PATCH RFC 3/9] mm: add __GFP_PREZEROED flag and folio_test_clear_prezeroed() Michael S. Tsirkin
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2026-04-12 22:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Johannes Weiner, Zi Yan

When a guest reports free pages to the hypervisor via the page reporting
framework (used by virtio-balloon and hv_balloon), the host typically
zeros those pages when reclaiming their backing memory.  However, when
those pages are later allocated in the guest, post_alloc_hook()
unconditionally zeros them again if __GFP_ZERO is set.  This
double-zeroing is wasteful, especially for large pages.

Avoid redundant zeroing by propagating the "host already zeroed this"
information through the allocation path:

1. Add a host_zeroes_pages flag to page_reporting_dev_info, allowing
   drivers to declare that their host zeros reported pages on reclaim.
   A static key (page_reporting_host_zeroes) gates the fast path.

2. In page_del_and_expand(), when the page was reported and the
   static key is enabled, stash a sentinel value (MAGIC_PAGE_ZEROED)
   in page->private.

3. In post_alloc_hook(), check page->private for the sentinel.  If
   present and zeroing was requested (but not tag zeroing), skip
   kernel_init_pages().

In particular, __GFP_ZERO is used by the x86 arch override of
vma_alloc_zeroed_movable_folio.

No driver sets host_zeroes_pages yet; a follow-up patch to
virtio_balloon is needed to opt in.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
 include/linux/mm.h             |  6 ++++++
 include/linux/page_reporting.h |  3 +++
 mm/page_alloc.c                | 21 +++++++++++++++++++++
 mm/page_reporting.c            |  9 +++++++++
 mm/page_reporting.h            |  2 ++
 5 files changed, 41 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5be3d8a8f806..59fc77c4c90e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4814,6 +4814,12 @@ static inline bool user_alloc_needs_zeroing(void)
 				   &init_on_alloc);
 }
 
+/*
+ * Sentinel stored in page->private to indicate the page was pre-zeroed
+ * by the hypervisor (via free page reporting).
+ */
+#define MAGIC_PAGE_ZEROED	0x5A45524FU	/* ZERO */
+
 int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status);
 int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
 int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
index fe648dfa3a7c..10faadfeb4fb 100644
--- a/include/linux/page_reporting.h
+++ b/include/linux/page_reporting.h
@@ -13,6 +13,9 @@ struct page_reporting_dev_info {
 	int (*report)(struct page_reporting_dev_info *prdev,
 		      struct scatterlist *sg, unsigned int nents);
 
+	/* If true, host zeros reported pages on reclaim */
+	bool host_zeroes_pages;
+
 	/* work struct for processing reports */
 	struct delayed_work work;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index edbb1edf463d..efb65eee826b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1774,8 +1774,20 @@ static __always_inline void page_del_and_expand(struct zone *zone,
 	bool was_reported = page_reported(page);
 
 	__del_page_from_free_list(page, zone, high, migratetype);
+
+	was_reported = was_reported &&
+		       static_branch_unlikely(&page_reporting_host_zeroes);
+
 	nr_pages -= expand(zone, page, low, high, migratetype, was_reported);
 	account_freepages(zone, -nr_pages, migratetype);
+
+	/*
+	 * If the page was reported and the host is known to zero reported
+	 * pages, mark it zeroed via page->private so that
+	 * post_alloc_hook() can skip redundant zeroing.
+	 */
+	if (was_reported)
+		set_page_private(page, MAGIC_PAGE_ZEROED);
 }
 
 static void check_new_page_bad(struct page *page)
@@ -1851,11 +1863,20 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 {
 	bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) &&
 			!should_skip_init(gfp_flags);
+	bool prezeroed = page_private(page) == MAGIC_PAGE_ZEROED;
 	bool zero_tags = init && (gfp_flags & __GFP_ZEROTAGS);
 	int i;
 
 	set_page_private(page, 0);
 
+	/*
+	 * If the page is pre-zeroed, skip memory initialization.
+	 * We still need to handle tag zeroing separately since the host
+	 * does not know about memory tags.
+	 */
+	if (prezeroed && init && !zero_tags)
+		init = false;
+
 	arch_alloc_page(page, order);
 	debug_pagealloc_map_pages(page, 1 << order);
 
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index f0042d5743af..cb24832bdf4e 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -50,6 +50,8 @@ EXPORT_SYMBOL_GPL(page_reporting_order);
 #define PAGE_REPORTING_DELAY	(2 * HZ)
 static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
 
+DEFINE_STATIC_KEY_FALSE(page_reporting_host_zeroes);
+
 enum {
 	PAGE_REPORTING_IDLE = 0,
 	PAGE_REPORTING_REQUESTED,
@@ -386,6 +388,10 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
 	/* Assign device to allow notifications */
 	rcu_assign_pointer(pr_dev_info, prdev);
 
+	/* enable zeroed page optimization if host zeroes reported pages */
+	if (prdev->host_zeroes_pages)
+		static_branch_enable(&page_reporting_host_zeroes);
+
 	/* enable page reporting notification */
 	if (!static_key_enabled(&page_reporting_enabled)) {
 		static_branch_enable(&page_reporting_enabled);
@@ -410,6 +416,9 @@ void page_reporting_unregister(struct page_reporting_dev_info *prdev)
 
 		/* Flush any existing work, and lock it out */
 		cancel_delayed_work_sync(&prdev->work);
+
+		if (prdev->host_zeroes_pages)
+			static_branch_disable(&page_reporting_host_zeroes);
 	}
 
 	mutex_unlock(&page_reporting_mutex);
diff --git a/mm/page_reporting.h b/mm/page_reporting.h
index c51dbc228b94..2bbf99f456f5 100644
--- a/mm/page_reporting.h
+++ b/mm/page_reporting.h
@@ -15,6 +15,8 @@ DECLARE_STATIC_KEY_FALSE(page_reporting_enabled);
 extern unsigned int page_reporting_order;
 void __page_reporting_notify(void);
 
+DECLARE_STATIC_KEY_FALSE(page_reporting_host_zeroes);
+
 static inline bool page_reported(struct page *page)
 {
 	return static_branch_unlikely(&page_reporting_enabled) &&
-- 
MST




* [PATCH RFC 3/9] mm: add __GFP_PREZEROED flag and folio_test_clear_prezeroed()
  2026-04-12 22:50 [PATCH RFC 0/9] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
  2026-04-12 22:50 ` [PATCH RFC 1/9] mm: page_alloc: propagate PageReported flag across buddy splits Michael S. Tsirkin
  2026-04-12 22:50 ` [PATCH RFC 2/9] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
@ 2026-04-12 22:50 ` Michael S. Tsirkin
  2026-04-13  9:05   ` David Hildenbrand (Arm)
  2026-04-12 22:50 ` [PATCH RFC 4/9] mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed pages Michael S. Tsirkin
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2026-04-12 22:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Johannes Weiner, Zi Yan

The previous patch skips zeroing in post_alloc_hook() when
__GFP_ZERO is used.  However, several page allocation paths
zero pages via folio_zero_user() or clear_user_highpage() after
allocation, not via __GFP_ZERO.

Add a __GFP_PREZEROED gfp flag that tells post_alloc_hook() to
preserve the MAGIC_PAGE_ZEROED sentinel in page->private so the
caller can detect pre-zeroed pages and skip its own zeroing, and
add a folio_test_clear_prezeroed() helper to check and clear the
sentinel.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
 include/linux/gfp_types.h |  5 +++++
 include/linux/mm.h        | 16 ++++++++++++++++
 mm/page_alloc.c           |  8 +++++++-
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 6c75df30a281..903f87c7fec9 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -56,6 +56,7 @@ enum {
 	___GFP_NOLOCKDEP_BIT,
 #endif
 	___GFP_NO_OBJ_EXT_BIT,
+	___GFP_PREZEROED_BIT,
 	___GFP_LAST_BIT
 };
 
@@ -97,6 +98,7 @@ enum {
 #define ___GFP_NOLOCKDEP	0
 #endif
 #define ___GFP_NO_OBJ_EXT       BIT(___GFP_NO_OBJ_EXT_BIT)
+#define ___GFP_PREZEROED	BIT(___GFP_PREZEROED_BIT)
 
 /*
  * Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -292,6 +294,9 @@ enum {
 #define __GFP_SKIP_ZERO ((__force gfp_t)___GFP_SKIP_ZERO)
 #define __GFP_SKIP_KASAN ((__force gfp_t)___GFP_SKIP_KASAN)
 
+/* Caller handles pre-zeroed pages; preserve MAGIC_PAGE_ZEROED in private */
+#define __GFP_PREZEROED ((__force gfp_t)___GFP_PREZEROED)
+
 /* Disable lockdep for GFP context tracking */
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 59fc77c4c90e..caa1de31bbca 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4820,6 +4820,22 @@ static inline bool user_alloc_needs_zeroing(void)
  */
 #define MAGIC_PAGE_ZEROED	0x5A45524FU	/* ZERO */
 
+/**
+ * folio_test_clear_prezeroed - test and clear the pre-zeroed marker.
+ * @folio: the folio to test.
+ *
+ * Returns true if the folio was pre-zeroed by the host, and clears
+ * the marker.  Callers can skip their own zeroing.
+ */
+static inline bool folio_test_clear_prezeroed(struct folio *folio)
+{
+	if (page_private(&folio->page) == MAGIC_PAGE_ZEROED) {
+		set_page_private(&folio->page, 0);
+		return true;
+	}
+	return false;
+}
+
 int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status);
 int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
 int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index efb65eee826b..fba8321c45ed 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1867,7 +1867,13 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 	bool zero_tags = init && (gfp_flags & __GFP_ZEROTAGS);
 	int i;
 
-	set_page_private(page, 0);
+	/*
+	 * If the page is pre-zeroed and the caller opted in via
+	 * __GFP_PREZEROED, preserve the marker so the caller can
+	 * skip its own zeroing.  Otherwise always clear private.
+	 */
+	if (!(prezeroed && (gfp_flags & __GFP_PREZEROED)))
+		set_page_private(page, 0);
 
 	/*
 	 * If the page is pre-zeroed, skip memory initialization.
-- 
MST




* [PATCH RFC 4/9] mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed pages
  2026-04-12 22:50 [PATCH RFC 0/9] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (2 preceding siblings ...)
  2026-04-12 22:50 ` [PATCH RFC 3/9] mm: add __GFP_PREZEROED flag and folio_test_clear_prezeroed() Michael S. Tsirkin
@ 2026-04-12 22:50 ` Michael S. Tsirkin
  2026-04-12 22:50 ` [PATCH RFC 5/9] mm: skip zeroing in alloc_anon_folio " Michael S. Tsirkin
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2026-04-12 22:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport

Use __GFP_PREZEROED and folio_test_clear_prezeroed() to skip
clear_user_highpage() when the page is already zeroed.

On x86, vma_alloc_zeroed_movable_folio is overridden by a macro
that uses __GFP_ZERO directly, so this change has no effect there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
 include/linux/highmem.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index af03db851a1d..b649e7e315f4 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -322,8 +322,10 @@ struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
 {
 	struct folio *folio;
 
-	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
-	if (folio && user_alloc_needs_zeroing())
+	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_PREZEROED,
+			       0, vma, vaddr);
+	if (folio && user_alloc_needs_zeroing() &&
+	    !folio_test_clear_prezeroed(folio))
 		clear_user_highpage(&folio->page, vaddr);
 
 	return folio;
-- 
MST




* [PATCH RFC 5/9] mm: skip zeroing in alloc_anon_folio for pre-zeroed pages
  2026-04-12 22:50 [PATCH RFC 0/9] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (3 preceding siblings ...)
  2026-04-12 22:50 ` [PATCH RFC 4/9] mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed pages Michael S. Tsirkin
@ 2026-04-12 22:50 ` Michael S. Tsirkin
  2026-04-12 22:50 ` [PATCH RFC 6/9] mm: skip zeroing in vma_alloc_anon_folio_pmd " Michael S. Tsirkin
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2026-04-12 22:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport

Use __GFP_PREZEROED and folio_test_clear_prezeroed() to skip
folio_zero_user() in the mTHP anonymous page allocation path
when the page is already zeroed.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
 mm/memory.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 07778814b4a8..2f61321a81fd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5176,7 +5176,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 		goto fallback;
 
 	/* Try allocating the highest of the remaining orders. */
-	gfp = vma_thp_gfp_mask(vma);
+	gfp = vma_thp_gfp_mask(vma) | __GFP_PREZEROED;
 	while (orders) {
 		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
 		folio = vma_alloc_folio(gfp, order, vma, addr);
@@ -5194,7 +5194,8 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 			 * that the page corresponding to the faulting address
 			 * will be hot in the cache after zeroing.
 			 */
-			if (user_alloc_needs_zeroing())
+			if (user_alloc_needs_zeroing() &&
+			    !folio_test_clear_prezeroed(folio))
 				folio_zero_user(folio, vmf->address);
 			return folio;
 		}
-- 
MST




* [PATCH RFC 6/9] mm: skip zeroing in vma_alloc_anon_folio_pmd for pre-zeroed pages
  2026-04-12 22:50 [PATCH RFC 0/9] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (4 preceding siblings ...)
  2026-04-12 22:50 ` [PATCH RFC 5/9] mm: skip zeroing in alloc_anon_folio " Michael S. Tsirkin
@ 2026-04-12 22:50 ` Michael S. Tsirkin
  2026-04-12 22:51 ` [PATCH RFC 7/9] mm: hugetlb: skip zeroing of pre-zeroed hugetlb pages Michael S. Tsirkin
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2026-04-12 22:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang

Use __GFP_PREZEROED and folio_test_clear_prezeroed() to skip
folio_zero_user() in the PMD THP anonymous page allocation path
when the page is already zeroed.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8e2746ea74ad..3b9b53fad0f1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1256,7 +1256,7 @@ EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
 static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
 		unsigned long addr)
 {
-	gfp_t gfp = vma_thp_gfp_mask(vma);
+	gfp_t gfp = vma_thp_gfp_mask(vma) | __GFP_PREZEROED;
 	const int order = HPAGE_PMD_ORDER;
 	struct folio *folio;
 
@@ -1285,7 +1285,7 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
 	* make sure that the page corresponding to the faulting address will be
 	* hot in the cache after zeroing.
 	*/
-	if (user_alloc_needs_zeroing())
+	if (user_alloc_needs_zeroing() && !folio_test_clear_prezeroed(folio))
 		folio_zero_user(folio, addr);
 	/*
 	 * The memory barrier inside __folio_mark_uptodate makes sure that
-- 
MST




* [PATCH RFC 7/9] mm: hugetlb: skip zeroing of pre-zeroed hugetlb pages
  2026-04-12 22:50 [PATCH RFC 0/9] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (5 preceding siblings ...)
  2026-04-12 22:50 ` [PATCH RFC 6/9] mm: skip zeroing in vma_alloc_anon_folio_pmd " Michael S. Tsirkin
@ 2026-04-12 22:51 ` Michael S. Tsirkin
  2026-04-12 22:51 ` [PATCH RFC 8/9] mm: page_reporting: add flush parameter to trigger immediate reporting Michael S. Tsirkin
  2026-04-12 22:51 ` [PATCH RFC 9/9] virtio_balloon: a hack to enable host-zeroed page optimization Michael S. Tsirkin
  8 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2026-04-12 22:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Muchun Song,
	Oscar Salvador

When a surplus hugetlb page is allocated from the buddy allocator
and the page was previously reported to the host (and zeroed on
reclaim), skip the redundant folio_zero_user() in the hugetlb
fault path.

This only benefits surplus hugetlb pages that are freshly allocated
from the buddy.  Pages from the persistent hugetlb pool are not
affected since they are not allocated from buddy at fault time.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
 fs/hugetlbfs/inode.c    | 3 ++-
 include/linux/hugetlb.h | 2 +-
 mm/hugetlb.c            | 3 ++-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3f70c47981de..301567ad160f 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -828,7 +828,8 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 			error = PTR_ERR(folio);
 			goto out;
 		}
-		folio_zero_user(folio, addr);
+		if (!folio_test_clear_prezeroed(folio))
+			folio_zero_user(folio, addr);
 		__folio_mark_uptodate(folio);
 		error = hugetlb_add_to_page_cache(folio, mapping, index);
 		if (unlikely(error)) {
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 65910437be1c..07e3ef8c0418 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -937,7 +937,7 @@ static inline bool hugepage_movable_supported(struct hstate *h)
 /* Movability of hugepages depends on migration support. */
 static inline gfp_t htlb_alloc_mask(struct hstate *h)
 {
-	gfp_t gfp = __GFP_COMP | __GFP_NOWARN;
+	gfp_t gfp = __GFP_COMP | __GFP_NOWARN | __GFP_PREZEROED;
 
 	gfp |= hugepage_movable_supported(h) ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER;
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0beb6e22bc26..5b23b006c37c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5809,7 +5809,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 				ret = 0;
 			goto out;
 		}
-		folio_zero_user(folio, vmf->real_address);
+		if (!folio_test_clear_prezeroed(folio))
+			folio_zero_user(folio, vmf->real_address);
 		__folio_mark_uptodate(folio);
 		new_folio = true;
 
-- 
MST




* [PATCH RFC 8/9] mm: page_reporting: add flush parameter to trigger immediate reporting
  2026-04-12 22:50 [PATCH RFC 0/9] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (6 preceding siblings ...)
  2026-04-12 22:51 ` [PATCH RFC 7/9] mm: hugetlb: skip zeroing of pre-zeroed hugetlb pages Michael S. Tsirkin
@ 2026-04-12 22:51 ` Michael S. Tsirkin
  2026-04-12 22:51 ` [PATCH RFC 9/9] virtio_balloon: a hack to enable host-zeroed page optimization Michael S. Tsirkin
  8 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2026-04-12 22:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Johannes Weiner,
	Zi Yan

Add a write-only module parameter 'flush' that triggers an immediate
page reporting cycle.  Writing any value flushes pending work and
runs one cycle synchronously.

This is useful for testing and benchmarking the pre-zeroed page
optimization, where the reporting delay (2 seconds) makes it hard
to ensure pages are reported before measuring allocation performance.

  echo 1 > /sys/module/page_reporting/parameters/flush

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
 mm/page_reporting.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index cb24832bdf4e..e9a2186e4c48 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -351,6 +351,31 @@ static void page_reporting_process(struct work_struct *work)
 static DEFINE_MUTEX(page_reporting_mutex);
 DEFINE_STATIC_KEY_FALSE(page_reporting_enabled);
 
+static int page_reporting_flush_set(const char *val,
+				    const struct kernel_param *kp)
+{
+	struct page_reporting_dev_info *prdev;
+
+	mutex_lock(&page_reporting_mutex);
+	prdev = rcu_dereference_protected(pr_dev_info,
+				lockdep_is_held(&page_reporting_mutex));
+	if (prdev) {
+		flush_delayed_work(&prdev->work);
+		__page_reporting_request(prdev);
+		flush_delayed_work(&prdev->work);
+	}
+	mutex_unlock(&page_reporting_mutex);
+	return 0;
+}
+
+static const struct kernel_param_ops flush_ops = {
+	.set = page_reporting_flush_set,
+	.get = param_get_uint,
+};
+static unsigned int page_reporting_flush;
+module_param_cb(flush, &flush_ops, &page_reporting_flush, 0200);
+MODULE_PARM_DESC(flush, "Trigger immediate page reporting cycle");
+
 int page_reporting_register(struct page_reporting_dev_info *prdev)
 {
 	int err = 0;
-- 
MST




* [PATCH RFC 9/9] virtio_balloon: a hack to enable host-zeroed page optimization
  2026-04-12 22:50 [PATCH RFC 0/9] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (7 preceding siblings ...)
  2026-04-12 22:51 ` [PATCH RFC 8/9] mm: page_reporting: add flush parameter to trigger immediate reporting Michael S. Tsirkin
@ 2026-04-12 22:51 ` Michael S. Tsirkin
  8 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2026-04-12 22:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Xuan Zhuo,
	Eugenio Pérez

Add a module parameter host_zeroes_pages to opt in to the pre-zeroed
page optimization.  A proper virtio feature flag is needed before
this can be merged.

  insmod virtio_balloon.ko host_zeroes_pages=1

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
 drivers/virtio/virtio_balloon.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index d1fbc8fe8470..5d37196daa75 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -19,6 +19,11 @@
 #include <linux/mm.h>
 #include <linux/page_reporting.h>
 
+static bool host_zeroes_pages;
+module_param(host_zeroes_pages, bool, 0644);
+MODULE_PARM_DESC(host_zeroes_pages,
+		 "Host zeroes reported pages, skip guest re-zeroing");
+
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
  * multiple balloon pages.  All memory counters in this driver are in balloon
@@ -1039,6 +1044,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
 		vb->pr_dev_info.order = 5;
 #endif
 
+		/* TODO: needs a virtio feature flag */
+		vb->pr_dev_info.host_zeroes_pages = host_zeroes_pages;
 		err = page_reporting_register(&vb->pr_dev_info);
 		if (err)
 			goto out_unregister_oom;
-- 
MST




* Re: [PATCH RFC 2/9] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages
  2026-04-12 22:50 ` [PATCH RFC 2/9] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
@ 2026-04-13  8:00   ` David Hildenbrand (Arm)
  2026-04-13  8:10     ` Michael S. Tsirkin
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-13  8:00 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: Andrew Morton, Vlastimil Babka, Brendan Jackman, Michal Hocko,
	Suren Baghdasaryan, Jason Wang, Andrea Arcangeli, linux-mm,
	virtualization, Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport,
	Johannes Weiner, Zi Yan

On 4/13/26 00:50, Michael S. Tsirkin wrote:
> When a guest reports free pages to the hypervisor via the page reporting
> framework (used by virtio-balloon and hv_balloon), the host typically
> zeros those pages when reclaiming their backing memory.  However, when
> those pages are later allocated in the guest, post_alloc_hook()
> unconditionally zeros them again if __GFP_ZERO is set.  This
> double-zeroing is wasteful, especially for large pages.
> 
> Avoid redundant zeroing by propagating the "host already zeroed this"
> information through the allocation path:
> 
> 1. Add a host_zeroes_pages flag to page_reporting_dev_info, allowing
>    drivers to declare that their host zeros reported pages on reclaim.
>    A static key (page_reporting_host_zeroes) gates the fast path.
> 
> 2. In page_del_and_expand(), when the page was reported and the
>    static key is enabled, stash a sentinel value (MAGIC_PAGE_ZEROED)
>    in page->private.
> 
> 3. In post_alloc_hook(), check page->private for the sentinel.  If
>    present and zeroing was requested (but not tag zeroing), skip
>    kernel_init_pages().
> 
> In particular, __GFP_ZERO is used by the x86 arch override of
> vma_alloc_zeroed_movable_folio.
> 
> No driver sets host_zeroes_pages yet; a follow-up patch to
> virtio_balloon is needed to opt in.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Assisted-by: Claude:claude-opus-4-6
> ---
>  include/linux/mm.h             |  6 ++++++
>  include/linux/page_reporting.h |  3 +++
>  mm/page_alloc.c                | 21 +++++++++++++++++++++
>  mm/page_reporting.c            |  9 +++++++++
>  mm/page_reporting.h            |  2 ++
>  5 files changed, 41 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5be3d8a8f806..59fc77c4c90e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4814,6 +4814,12 @@ static inline bool user_alloc_needs_zeroing(void)
>  				   &init_on_alloc);
>  }
>  
> +/*
> + * Sentinel stored in page->private to indicate the page was pre-zeroed
> + * by the hypervisor (via free page reporting).
> + */
> +#define MAGIC_PAGE_ZEROED	0x5A45524FU	/* ZERO */

Why are we not using another page flag that is yet unused for buddy pages?

Using page->private for that, and exposing it to buddy users with the
__GFP_PREZEROED flag (I hope we can avoid that) does not sound
particularly elegant.

Also, if we're going to remember that some pages in the buddy are
pre-zeroed, it should better not be free-page-reporting specific.

I'd assume ordinary inflating+deflating of the balloon would also end up
with pre-zeroed pages. We'd just need a (mm/balloon.c -specific)
interface to tell the buddy that the pages are zeroed.


-- 
Cheers,

David



* Re: [PATCH RFC 2/9] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages
  2026-04-13  8:00   ` David Hildenbrand (Arm)
@ 2026-04-13  8:10     ` Michael S. Tsirkin
  2026-04-13  8:15       ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2026-04-13  8:10 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, Andrew Morton, Vlastimil Babka, Brendan Jackman,
	Michal Hocko, Suren Baghdasaryan, Jason Wang, Andrea Arcangeli,
	linux-mm, virtualization, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Johannes Weiner, Zi Yan

On Mon, Apr 13, 2026 at 10:00:58AM +0200, David Hildenbrand (Arm) wrote:
> On 4/13/26 00:50, Michael S. Tsirkin wrote:
> > When a guest reports free pages to the hypervisor via the page reporting
> > framework (used by virtio-balloon and hv_balloon), the host typically
> > zeros those pages when reclaiming their backing memory.  However, when
> > those pages are later allocated in the guest, post_alloc_hook()
> > unconditionally zeros them again if __GFP_ZERO is set.  This
> > double-zeroing is wasteful, especially for large pages.
> > 
> > Avoid redundant zeroing by propagating the "host already zeroed this"
> > information through the allocation path:
> > 
> > 1. Add a host_zeroes_pages flag to page_reporting_dev_info, allowing
> >    drivers to declare that their host zeros reported pages on reclaim.
> >    A static key (page_reporting_host_zeroes) gates the fast path.
> > 
> > 2. In page_del_and_expand(), when the page was reported and the
> >    static key is enabled, stash a sentinel value (MAGIC_PAGE_ZEROED)
> >    in page->private.
> > 
> > 3. In post_alloc_hook(), check page->private for the sentinel.  If
> >    present and zeroing was requested (but not tag zeroing), skip
> >    kernel_init_pages().
> > 
> > In particular, __GFP_ZERO is used by the x86 arch override of
> > vma_alloc_zeroed_movable_folio.
> > 
> > No driver sets host_zeroes_pages yet; a follow-up patch to
> > virtio_balloon is needed to opt in.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > Assisted-by: Claude:claude-opus-4-6
> > ---
> >  include/linux/mm.h             |  6 ++++++
> >  include/linux/page_reporting.h |  3 +++
> >  mm/page_alloc.c                | 21 +++++++++++++++++++++
> >  mm/page_reporting.c            |  9 +++++++++
> >  mm/page_reporting.h            |  2 ++
> >  5 files changed, 41 insertions(+)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 5be3d8a8f806..59fc77c4c90e 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4814,6 +4814,12 @@ static inline bool user_alloc_needs_zeroing(void)
> >  				   &init_on_alloc);
> >  }
> >  
> > +/*
> > + * Sentinel stored in page->private to indicate the page was pre-zeroed
> > + * by the hypervisor (via free page reporting).
> > + */
> > +#define MAGIC_PAGE_ZEROED	0x5A45524FU	/* ZERO */
> 
> Why are we not using another page flag that is yet unused for buddy pages?

Because we need to report the status *after* the page has left the buddy,
and all page flags are in use at that point.


> Using page->private for that, and exposing it to buddy users with the
> __GFP_PREZEROED flag (I hope we can avoid that) does not sound
> particularly elegant.

But propagating this all over mm does not sound too palatable, right?
There's precedent with MAGIC_HWPOISON already.
Better ideas? Thanks!

> Also, if we're going to remember that some pages in the buddy are
> pre-zeroed, it should better not be free-page-reporting specific.
> I'd assume ordinary inflating+deflating of the balloon would also end up
> with pre-zeroed pages. We'd just need a (mm/balloon.c -specific)
> interface to tell the buddy that the pages are zeroed.
> 

Indeed, it's also easily possible - it's a separate optimization, though.
Another simple enhancement is including hugetlbfs freelists in page
reporting.
Doesn't need to block this patchset though, right?

> 
> -- 
> Cheers,
> 
> David




* Re: [PATCH RFC 2/9] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages
  2026-04-13  8:10     ` Michael S. Tsirkin
@ 2026-04-13  8:15       ` David Hildenbrand (Arm)
  2026-04-13  8:29         ` Michael S. Tsirkin
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-13  8:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Andrew Morton, Vlastimil Babka, Brendan Jackman,
	Michal Hocko, Suren Baghdasaryan, Jason Wang, Andrea Arcangeli,
	linux-mm, virtualization, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Johannes Weiner, Zi Yan

On 4/13/26 10:10, Michael S. Tsirkin wrote:
> On Mon, Apr 13, 2026 at 10:00:58AM +0200, David Hildenbrand (Arm) wrote:
>> On 4/13/26 00:50, Michael S. Tsirkin wrote:
>>> When a guest reports free pages to the hypervisor via the page reporting
>>> framework (used by virtio-balloon and hv_balloon), the host typically
>>> zeros those pages when reclaiming their backing memory.  However, when
>>> those pages are later allocated in the guest, post_alloc_hook()
>>> unconditionally zeros them again if __GFP_ZERO is set.  This
>>> double-zeroing is wasteful, especially for large pages.
>>>
>>> Avoid redundant zeroing by propagating the "host already zeroed this"
>>> information through the allocation path:
>>>
>>> 1. Add a host_zeroes_pages flag to page_reporting_dev_info, allowing
>>>    drivers to declare that their host zeros reported pages on reclaim.
>>>    A static key (page_reporting_host_zeroes) gates the fast path.
>>>
>>> 2. In page_del_and_expand(), when the page was reported and the
>>>    static key is enabled, stash a sentinel value (MAGIC_PAGE_ZEROED)
>>>    in page->private.
>>>
>>> 3. In post_alloc_hook(), check page->private for the sentinel.  If
>>>    present and zeroing was requested (but not tag zeroing), skip
>>>    kernel_init_pages().
>>>
>>> In particular, __GFP_ZERO is used by the x86 arch override of
>>> vma_alloc_zeroed_movable_folio.
>>>
>>> No driver sets host_zeroes_pages yet; a follow-up patch to
>>> virtio_balloon is needed to opt in.
>>>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>> Assisted-by: Claude:claude-opus-4-6
>>> ---
>>>  include/linux/mm.h             |  6 ++++++
>>>  include/linux/page_reporting.h |  3 +++
>>>  mm/page_alloc.c                | 21 +++++++++++++++++++++
>>>  mm/page_reporting.c            |  9 +++++++++
>>>  mm/page_reporting.h            |  2 ++
>>>  5 files changed, 41 insertions(+)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 5be3d8a8f806..59fc77c4c90e 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -4814,6 +4814,12 @@ static inline bool user_alloc_needs_zeroing(void)
>>>  				   &init_on_alloc);
>>>  }
>>>  
>>> +/*
>>> + * Sentinel stored in page->private to indicate the page was pre-zeroed
>>> + * by the hypervisor (via free page reporting).
>>> + */
>>> +#define MAGIC_PAGE_ZEROED	0x5A45524FU	/* ZERO */
>>
>> Why are we not using another page flag that is yet unused for buddy pages?
> 
> Because we need to report the status *after* it left buddy.
> And all flags are in use at that point.

I'll comment on that on the other patch, where __GFP_PREZEROED, which I
really hate, is added.

> 
> 
>> Using page->private for that, and exposing it to buddy users with the
>> __GFP_PREZEROED flag (I hope we can avoid that) does not sound
>> particularly elegant.
> 
> But propagating this all over mm does not sound too palatable, right?
> There's precedent with MAGIC_HWPOISON already.
> Better ideas? Thanks!

I'll comment on the __GFP_PREZEROED patch.

> 
>> Also, if we're going to remember that some pages in the buddy are
>> pre-zeroed, it should better not be free-page-reporting specific.
>> I'd assume ordinary inflating+deflating of the balloon would also end up
>> with pre-zeroed pages. We'd just need a (mm/balloon.c -specific)
>> interface to tell the buddy that the pages are zeroed.
>>
> 
> Indeed, it's also easily possible - it's a separate optimization, though.
> Another simple enhancement is including hugetlbfs freelists in page
> reporting.
> Doesn't need to block this patchset though, right?

Not blocking, but I don't want something that is too coupled to
free-page reporting optimizations in the buddy. The comment above
MAGIC_PAGE_ZEROED triggered my reaction.

-- 
Cheers,

David



* Re: [PATCH RFC 2/9] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages
  2026-04-13  8:15       ` David Hildenbrand (Arm)
@ 2026-04-13  8:29         ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2026-04-13  8:29 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, Andrew Morton, Vlastimil Babka, Brendan Jackman,
	Michal Hocko, Suren Baghdasaryan, Jason Wang, Andrea Arcangeli,
	linux-mm, virtualization, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Johannes Weiner, Zi Yan

On Mon, Apr 13, 2026 at 10:15:08AM +0200, David Hildenbrand (Arm) wrote:
> On 4/13/26 10:10, Michael S. Tsirkin wrote:
> > On Mon, Apr 13, 2026 at 10:00:58AM +0200, David Hildenbrand (Arm) wrote:
> >> On 4/13/26 00:50, Michael S. Tsirkin wrote:
> >>> When a guest reports free pages to the hypervisor via the page reporting
> >>> framework (used by virtio-balloon and hv_balloon), the host typically
> >>> zeros those pages when reclaiming their backing memory.  However, when
> >>> those pages are later allocated in the guest, post_alloc_hook()
> >>> unconditionally zeros them again if __GFP_ZERO is set.  This
> >>> double-zeroing is wasteful, especially for large pages.
> >>>
> >>> Avoid redundant zeroing by propagating the "host already zeroed this"
> >>> information through the allocation path:
> >>>
> >>> 1. Add a host_zeroes_pages flag to page_reporting_dev_info, allowing
> >>>    drivers to declare that their host zeros reported pages on reclaim.
> >>>    A static key (page_reporting_host_zeroes) gates the fast path.
> >>>
> >>> 2. In page_del_and_expand(), when the page was reported and the
> >>>    static key is enabled, stash a sentinel value (MAGIC_PAGE_ZEROED)
> >>>    in page->private.
> >>>
> >>> 3. In post_alloc_hook(), check page->private for the sentinel.  If
> >>>    present and zeroing was requested (but not tag zeroing), skip
> >>>    kernel_init_pages().
> >>>
> >>> In particular, __GFP_ZERO is used by the x86 arch override of
> >>> vma_alloc_zeroed_movable_folio.
> >>>
> >>> No driver sets host_zeroes_pages yet; a follow-up patch to
> >>> virtio_balloon is needed to opt in.
> >>>
> >>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >>> Assisted-by: Claude:claude-opus-4-6
> >>> ---
> >>>  include/linux/mm.h             |  6 ++++++
> >>>  include/linux/page_reporting.h |  3 +++
> >>>  mm/page_alloc.c                | 21 +++++++++++++++++++++
> >>>  mm/page_reporting.c            |  9 +++++++++
> >>>  mm/page_reporting.h            |  2 ++
> >>>  5 files changed, 41 insertions(+)
> >>>
> >>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>> index 5be3d8a8f806..59fc77c4c90e 100644
> >>> --- a/include/linux/mm.h
> >>> +++ b/include/linux/mm.h
> >>> @@ -4814,6 +4814,12 @@ static inline bool user_alloc_needs_zeroing(void)
> >>>  				   &init_on_alloc);
> >>>  }
> >>>  
> >>> +/*
> >>> + * Sentinel stored in page->private to indicate the page was pre-zeroed
> >>> + * by the hypervisor (via free page reporting).
> >>> + */
> >>> +#define MAGIC_PAGE_ZEROED	0x5A45524FU	/* ZERO */
> >>
> >> Why are we not using another page flag that is yet unused for buddy pages?
> > 
> > Because we need to report the status *after* it left buddy.
> > And all flags are in use at that point.
> 
> I'll comment on that on the other patch, where __GFP_PREZEROED, which I
> really hate, is added.
> 
> > 
> > 
> >> Using page->private for that, and exposing it to buddy users with the
> >> __GFP_PREZEROED flag (I hope we can avoid that) does not sound
> >> particularly elegant.
> > 
> > But propagating this all over mm does not sound too palatable, right?
> > There's precedent with MAGIC_HWPOISON already.
> > Better ideas? Thanks!
> 
> I'll comment on the __GFP_PREZEROED patch.
> 
> > 
> >> Also, if we're going to remember that some pages in the buddy are
> >> pre-zeroed, it should better not be free-page-reporting specific.
> >> I'd assume ordinary inflating+deflating of the balloon would also end up
> >> with pre-zeroed pages. We'd just need a (mm/balloon.c -specific)
> >> interface to tell the buddy that the pages are zeroed.
> >>
> > 
> > Indeed, it's also easily possible - it's a separate optimization, though.
> > Another simple enhancement is including hugetlbfs freelists in page
> > reporting.
> > Doesn't need to block this patchset though, right?
> 
> Not blocking, but I don't want something that is too coupled to
> free-page reporting optimizations in the buddy.


I can add that in the next version if you like, sure.  The main issue is
that it requires a flag that survives the free path.  And the benefit is
much smaller - unlike page reporting, deflates are rare.

> The comment above
> MAGIC_PAGE_ZEROED triggered my reaction.

yea, that's more confusing than helpful.

> -- 
> Cheers,
> 
> David




* Re: [PATCH RFC 3/9] mm: add __GFP_PREZEROED flag and folio_test_clear_prezeroed()
  2026-04-12 22:50 ` [PATCH RFC 3/9] mm: add __GFP_PREZEROED flag and folio_test_clear_prezeroed() Michael S. Tsirkin
@ 2026-04-13  9:05   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-13  9:05 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: Andrew Morton, Vlastimil Babka, Brendan Jackman, Michal Hocko,
	Suren Baghdasaryan, Jason Wang, Andrea Arcangeli, linux-mm,
	virtualization, Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport,
	Johannes Weiner, Zi Yan

On 4/13/26 00:50, Michael S. Tsirkin wrote:
> The previous patch skips zeroing in post_alloc_hook() when
> __GFP_ZERO is used.  However, several page allocation paths
> zero pages via folio_zero_user() or clear_user_highpage() after
> allocation, not via __GFP_ZERO.
> 
> Add __GFP_PREZEROED gfp flag that tells post_alloc_hook() to
> preserve the MAGIC_PAGE_ZEROED sentinel in page->private so the
> caller can detect pre-zeroed pages and skip its own zeroing.
> Add folio_test_clear_prezeroed() helper to check and clear
> the sentinel.

I really don't like __GFP_PREZEROED, and wonder how we can avoid it.


What you want is to allocate a folio (well, actually a page that becomes
a folio) and to know, once we establish the folio from the page, whether
zeroing is still required.

Or you just allocate a folio, specify __GFP_ZERO, and let the folio
allocation code deal with that.


I think we have two options:

(1) Use an indication that can be sticky for callers that do not care.

Assuming we would use a page flag that is only ever used on folios, all
we'd have to do is make sure that we clear the flag once we convert
the page to a folio.

For example, PG_dropbehind is only ever set on folios in the pagecache. 

Paths that allocate folios would have to clear the flag. For non-hugetlb
folios that happens through page_rmappable_folio().

I'm not super-happy about that, but it would be doable.


(2) Use a dedicated allocation interface for user pages in the buddy.

I hate the whole user_alloc_needs_zeroing()+folio_zero_user() handling.

It shouldn't exist. We should just be passing __GFP_ZERO and let the buddy handle
all that.


For example, vma_alloc_folio() already gets passed the address in.

Pass the address from vma_alloc_folio_noprof()->folio_alloc_noprof(), and let
folio_alloc_noprof() use a buddy interface that can handle it.

Imagine if we had an alloc_user_pages_noprof() that consumes an address. It could just
do what folio_zero_user() does, and only if really required.

The whole user_alloc_needs_zeroing() could go away and you could just handle the
pre-zeroed optimization internally.

-- 
Cheers,

David



Thread overview: 15+ messages
2026-04-12 22:50 [PATCH RFC 0/9] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
2026-04-12 22:50 ` [PATCH RFC 1/9] mm: page_alloc: propagate PageReported flag across buddy splits Michael S. Tsirkin
2026-04-12 22:50 ` [PATCH RFC 2/9] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
2026-04-13  8:00   ` David Hildenbrand (Arm)
2026-04-13  8:10     ` Michael S. Tsirkin
2026-04-13  8:15       ` David Hildenbrand (Arm)
2026-04-13  8:29         ` Michael S. Tsirkin
2026-04-12 22:50 ` [PATCH RFC 3/9] mm: add __GFP_PREZEROED flag and folio_test_clear_prezeroed() Michael S. Tsirkin
2026-04-13  9:05   ` David Hildenbrand (Arm)
2026-04-12 22:50 ` [PATCH RFC 4/9] mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed pages Michael S. Tsirkin
2026-04-12 22:50 ` [PATCH RFC 5/9] mm: skip zeroing in alloc_anon_folio " Michael S. Tsirkin
2026-04-12 22:50 ` [PATCH RFC 6/9] mm: skip zeroing in vma_alloc_anon_folio_pmd " Michael S. Tsirkin
2026-04-12 22:51 ` [PATCH RFC 7/9] mm: hugetlb: skip zeroing of pre-zeroed hugetlb pages Michael S. Tsirkin
2026-04-12 22:51 ` [PATCH RFC 8/9] mm: page_reporting: add flush parameter to trigger immediate reporting Michael S. Tsirkin
2026-04-12 22:51 ` [PATCH RFC 9/9] virtio_balloon: a hack to enable host-zeroed page optimization Michael S. Tsirkin
