From: "Michael S. Tsirkin" <mst@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Brendan Jackman <jackmanb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Jason Wang <jasowang@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-mm@kvack.org, virtualization@lists.linux.dev
Subject: [PATCH RFC 0/9] mm/virtio: skip redundant zeroing of host-zeroed reported pages
Date: Sun, 12 Apr 2026 18:50:36 -0400
Message-ID: <cover.1776033471.git.mst@redhat.com>
When a guest reports free pages to the hypervisor via virtio-balloon's
free page reporting, the host typically zeros those pages when reclaiming
their backing memory (e.g., via MADV_DONTNEED on anonymous mappings).
When the guest later reallocates those pages, the kernel zeros them
again -- redundantly.
This series eliminates that double-zeroing by propagating the "host
already zeroed this page" information through the buddy allocator and
into the page fault path.
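
The zero-on-reclaim behaviour mentioned above can be observed from
userspace on Linux: a private anonymous mapping that is dirtied and then
dropped with MADV_DONTNEED reads back as zeros on the next access. The
following is only a minimal sketch of that effect, not code from this
series; the function name dontneed_reads_back_zero() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Dirty a private anonymous mapping, drop its backing memory with
 * MADV_DONTNEED, then check that it reads back as zeros.
 * Returns 1 if every byte is zero afterwards, 0 if not, -1 on error.
 */
static int dontneed_reads_back_zero(size_t len)
{
	unsigned char *p;
	size_t i;
	int ok = 1;

	assert(len > 0);	/* mmap() rejects zero-length mappings */

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xaa, len);			/* dirty every page */

	if (madvise(p, len, MADV_DONTNEED)) {	/* discard the backing memory */
		munmap(p, len);
		return -1;
	}

	for (i = 0; i < len; i++) {		/* refault yields zero-filled pages */
		if (p[i] != 0) {
			ok = 0;
			break;
		}
	}

	munmap(p, len);
	return ok;
}
```

Calling dontneed_reads_back_zero(1 << 20) from a small main() is enough
to see the effect. In the guest case the same zero-filled memory is then
cleared a second time by the guest kernel, which is the work this series
avoids.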
Performance with THP enabled on a 2GB VM, 1 vCPU, allocating
256MB of anonymous pages:

  metric         baseline  optimized  delta
  task-clock     179ms     99ms       -45%
  cache-misses   1.22M     287K       -76%
  instructions   15.1M     13.9M      -8%

With hugetlb surplus pages:

  metric         baseline  optimized  delta
  task-clock     322ms     9.9ms      -97%
  cache-misses   659K      88K        -87%
  instructions   18.3M     10.6M      -42%
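
For reference, the delta column is consistent with the relative change
(optimized - baseline) / baseline, rounded to the nearest percent. A
small sketch of that arithmetic, with the hypothetical helper
delta_percent(), assuming both values are given in the same unit:

```c
#include <assert.h>

/*
 * Relative change from baseline to optimized, in percent, rounded to
 * the nearest integer (halves round away from zero).
 */
static long delta_percent(double baseline, double optimized)
{
	double d;

	assert(baseline > 0.0);
	d = (optimized - baseline) / baseline * 100.0;
	return (long)(d < 0.0 ? d - 0.5 : d + 0.5);
}
```

For example, delta_percent(179, 99) reproduces the task-clock figure of
the THP run, and delta_percent(322, 9.9) that of the hugetlb run.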
Notes:

- The virtio_balloon patch (9/9) is a testing hack with a module
  parameter. A proper virtio feature flag is needed before merging.

- Patch 8/9 adds a sysfs flush trigger for deterministic testing
  (avoids waiting for the 2-second reporting delay).

- The optimization is most effective with THP, where entire 2MB
  pages are allocated directly from reported order-9+ buddy pages.
  Without THP, only ~21% of order-0 allocations come from reported
  pages due to low-order fragmentation.

- Persistent hugetlb pool pages are not covered: when freed by
  userspace they return to the hugetlb free pool, not the buddy
  allocator, so they are never reported to the host. Surplus
  hugetlb pages are allocated from buddy and do benefit.
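
To illustrate the order-9 versus order-0 point, here is a toy userspace
model of a buddy split that carries a "reported" flag down to the
smaller blocks, in the spirit of patch 1/9. It is only a sketch under
stated assumptions: it is not the kernel's code, it assumes 4 KiB base
pages, and every name in it (toy_block, toy_split, toy_block_bytes) is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

struct toy_block {
	unsigned int order;	/* block covers 2^order base pages */
	bool reported;		/* block was reported to the host */
};

/* Size in bytes of a block of the given order. */
static unsigned long toy_block_bytes(unsigned int order)
{
	return 1UL << (TOY_PAGE_SHIFT + order);
}

/*
 * Halve *blk until it reaches order 'target'. Each split-off buddy is
 * written to out[] and inherits the reported flag of its parent.
 * Stops early if out[] is full. Returns the number of buddies written.
 */
static size_t toy_split(struct toy_block *blk, unsigned int target,
			struct toy_block *out, size_t max)
{
	size_t n = 0;

	assert(target <= blk->order);
	while (blk->order > target && n < max) {
		blk->order--;
		out[n].order = blk->order;
		out[n].reported = blk->reported;
		n++;
	}
	return n;
}
```

In this model toy_block_bytes(9) is 2 MiB, so a THP fault can take a
whole reported order-9 block without any split. An order-0 allocation
from the same block needs nine splits, and it only benefits if the flag
survives each of them.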
Test program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Fallbacks for older userspace headers (asm-generic values) */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif
#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000
#endif

int main(int argc, char **argv)
{
	unsigned long size;
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	void *p;
	int r;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <size_mb> [huge]\n", argv[0]);
		return 1;
	}
	size = atol(argv[1]) * 1024UL * 1024;
	if (argc >= 3 && strcmp(argv[2], "huge") == 0)
		flags |= MAP_HUGETLB;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Fault in the whole range for write, without touching it from userspace */
	r = madvise(p, size, MADV_POPULATE_WRITE);
	if (r) {
		perror("madvise");
		return 1;
	}

	munmap(p, size);
	return 0;
}
Test script (bench.sh):

#!/bin/bash
# Usage: bench.sh <size_mb> <mode> <iterations> [huge]
# mode 0 = baseline, mode 1 = skip zeroing

SZ=${1:-256}; MODE=${2:-0}; ITER=${3:-10}; HUGE=${4:-}
FLUSH=/sys/module/page_reporting/parameters/flush
PERF_DATA=/tmp/perf-$MODE.data

rmmod virtio_balloon 2>/dev/null
insmod virtio_balloon.ko host_zeroes_pages=$MODE
echo 1 > $FLUSH

[ "$HUGE" = "huge" ] && echo $((SZ/2)) > /proc/sys/vm/nr_overcommit_hugepages

rm -f $PERF_DATA
echo "=== sz=${SZ}MB mode=$MODE iter=$ITER $HUGE ==="
for i in $(seq 1 $ITER); do
	echo 3 > /proc/sys/vm/drop_caches
	echo 1 > $FLUSH
	perf stat record -e task-clock,instructions,cache-misses \
		-o $PERF_DATA --append -- ./alloc_once $SZ $HUGE
done

[ "$HUGE" = "huge" ] && echo 0 > /proc/sys/vm/nr_overcommit_hugepages
rmmod virtio_balloon
perf stat report -i $PERF_DATA
Compile and run:

  gcc -static -O2 -o alloc_once alloc_once.c
  bash bench.sh 256 0 10        # baseline (regular pages)
  bash bench.sh 256 1 10        # optimized (regular pages)
  bash bench.sh 256 0 10 huge   # baseline (hugetlb surplus)
  bash bench.sh 256 1 10 huge   # optimized (hugetlb surplus)
Written with assistance from Claude. Everything was read manually; the
patchset was split and the commit logs were edited manually.
Michael S. Tsirkin (9):
  mm: page_alloc: propagate PageReported flag across buddy splits
  mm: page_reporting: skip redundant zeroing of host-zeroed reported
    pages
  mm: add __GFP_PREZEROED flag and folio_test_clear_prezeroed()
  mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed
    pages
  mm: skip zeroing in alloc_anon_folio for pre-zeroed pages
  mm: skip zeroing in vma_alloc_anon_folio_pmd for pre-zeroed pages
  mm: hugetlb: skip zeroing of pre-zeroed hugetlb pages
  mm: page_reporting: add flush parameter to trigger immediate reporting
  virtio_balloon: a hack to enable host-zeroed page optimization

 drivers/virtio/virtio_balloon.c |  7 +++++
 fs/hugetlbfs/inode.c            |  3 ++-
 include/linux/gfp_types.h       |  5 ++++
 include/linux/highmem.h         |  6 +++--
 include/linux/hugetlb.h         |  2 +-
 include/linux/mm.h              | 22 ++++++++++++++++
 include/linux/page_reporting.h  |  3 +++
 mm/huge_memory.c                |  4 +--
 mm/hugetlb.c                    |  3 ++-
 mm/memory.c                     |  5 ++--
 mm/page_alloc.c                 | 46 ++++++++++++++++++++++++++++++---
 mm/page_reporting.c             | 34 ++++++++++++++++++++++++
 mm/page_reporting.h             |  2 ++
 13 files changed, 129 insertions(+), 13 deletions(-)
--
MST