From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 20 Apr 2026 08:51:13 -0400
From: "Michael S. Tsirkin"
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka, Brendan Jackman,
	Michal Hocko, Suren Baghdasaryan, Jason Wang, Andrea Arcangeli,
	linux-mm@kvack.org, virtualization@lists.linux.dev
Subject: [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

v2 - this is an attempt to address David Hildenbrand's comments:
overloading GFP and using page->private, support for balloon deflate.
I hope this one is acceptable, API-wise.

I also went ahead and implemented an alternative approach that David
suggested: using GFP_ZERO to zero userspace pages. The issue is simple:
on some architectures, one has to know the userspace fault address in
order to flush the cache. So I had to propagate the fault address
everywhere. That is a lot of churn, and my concern is that if we miss
even one place, silent, subtle data corruption will result, and only on
some arches (x86 will be fine).

Still, you can view that approach here:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git gfp_zero

David, if you still feel I should switch to that approach, let me know.
Personally, I'd rather keep that as a separate project from this
optimization.

Still an RFC as the virtio bits need work, but I would very much like
to get general agreement on the mm bits first. Thanks!

Patch 1 is a minor optimization that I am carrying here to avoid
conflicts. It might make sense to merge it straight away.

-------

When a guest reports free pages to the hypervisor via virtio-balloon's
free page reporting, the host typically zeros those pages when
reclaiming their backing memory (e.g., via MADV_DONTNEED on anonymous
mappings). When the guest later reallocates those pages, the kernel
zeros them again -- redundantly.
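The host-side zeroing mentioned above can be observed from plain
userspace on Linux. The following is only a minimal sketch and not part
of the series: dontneed_reads_zero() is a made-up helper name. It
dirties a private anonymous mapping, drops the backing with
MADV_DONTNEED, and checks that the range then reads back as zeroes.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if a private anonymous mapping reads back as all zeroes
 * after MADV_DONTNEED, 0 if any byte is nonzero, -1 if mmap() or
 * madvise() fails.
 */
int dontneed_reads_zero(size_t len)
{
	unsigned char *p;
	size_t i;
	int ok = 1;

	assert(len > 0);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Dirty every page so there is real backing memory to drop. */
	memset(p, 0xA5, len);

	/* Drop the backing; the next access faults in fresh pages. */
	if (madvise(p, len, MADV_DONTNEED)) {
		munmap(p, len);
		return -1;
	}

	/* Scan the range and stop at the first nonzero byte. */
	for (i = 0; i < len; i++) {
		if (p[i] != 0) {
			ok = 0;
			break;
		}
	}

	munmap(p, len);
	return ok;
}
```

Calling dontneed_reads_zero(1 << 20) is expected to return 1 on Linux.
A guest whose memory is backed this way gets the same zero-filled pages
back, which is what makes the guest's own zeroing redundant.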

This series eliminates that double-zeroing by propagating the "host
already zeroed this page" information through the buddy allocator and
into the page fault path.

Performance with THP enabled on a 2GB VM, 1 vCPU, allocating
256MB of anonymous pages:

  metric         baseline          optimized         delta
  task-clock     191 +- 31 ms      60 +- 35 ms       -68%
  cache-misses   1.10M +- 460K     269K +- 31K       -76%
  instructions   4.54M +- 275K     4.10M +- 130K     -10%

With hugetlb surplus pages:

  metric         baseline          optimized         delta
  task-clock     183 +- 24 ms      45 +- 23 ms       -76%
  cache-misses   1.27M +- 544K     270K +- 16K       -79%
  instructions   5.37M +- 254K     4.94M +- 155K     -8%

Notes:
- The virtio_balloon module parameter (15/18) is a testing hack.
  A proper virtio feature flag is needed before merging.
- Patch 16/18 adds a sysfs flush trigger for deterministic testing
  (avoids waiting for the 2-second reporting delay).
- When host_zeroes_pages is set, callers skip folio_zero_user() for
  pages known to be zeroed by the host. This is safe on all
  architectures because the hypervisor invalidates guest cache lines
  when reclaiming page backing (MADV_DONTNEED).
- PG_zeroed is aliased to PG_private. It is excluded from
  PAGE_FLAGS_CHECK_AT_PREP because it must survive on free-list pages
  until post_alloc_hook() consumes and clears it. Is this acceptable,
  or should a different bit be used?
- The optimization is most effective with THP, where entire 2MB pages
  are allocated directly from reported order-9+ buddy pages. Without
  THP, only ~21% of order-0 allocations come from reported pages due
  to low-order fragmentation.
- Persistent hugetlb pool pages are not covered: when freed by
  userspace they return to the hugetlb free pool, not the buddy
  allocator, so they are never reported to the host. Surplus hugetlb
  pages are allocated from buddy and do benefit.
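
To make the flag lifecycle in the notes above concrete, here is a toy
userspace model. It is not kernel code: struct toy_block and the toy_*
functions are invented for illustration. The merge rule (a merged block
is known-zero only if both buddies are) is an assumption about what
tracking the flag has to mean, not something taken from the patches.
The one point it tries to show is the single consume-and-clear step at
allocation time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a free block; known_zero plays the role of PG_zeroed. */
struct toy_block {
	unsigned char *mem;
	size_t len;
	bool known_zero;
};

/* Assumed rule: a merged block is known-zero only if both halves are. */
struct toy_block toy_merge(struct toy_block lo, struct toy_block hi)
{
	struct toy_block merged;

	assert(lo.mem + lo.len == hi.mem);	/* halves must be adjacent */
	merged.mem = lo.mem;
	merged.len = lo.len + hi.len;
	merged.known_zero = lo.known_zero && hi.known_zero;
	return merged;
}

/* Splitting: both halves inherit the parent's hint. */
void toy_split(struct toy_block b, struct toy_block *lo, struct toy_block *hi)
{
	size_t half = b.len / 2;

	*lo = (struct toy_block){ b.mem, half, b.known_zero };
	*hi = (struct toy_block){ b.mem + half, b.len - half, b.known_zero };
}

/*
 * Allocation: consume and clear the hint in one place, and zero the
 * memory only when the hint is absent. Returns true if zeroing was
 * skipped.
 */
bool toy_alloc_zeroed(struct toy_block *b)
{
	bool skipped = b->known_zero;

	if (!skipped)
		memset(b->mem, 0, b->len);
	b->known_zero = false;
	return skipped;
}
```

In this model, clearing the hint at allocation means a block that is
later freed without being reported again cannot be mistaken for a
zeroed one.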

Test program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Fallback definitions for older libc headers. */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif
#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000
#endif

int main(int argc, char **argv)
{
	unsigned long size;
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	void *p;
	int r;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <size_mb> [huge]\n", argv[0]);
		return 1;
	}
	/* argv[1] is the allocation size in MB. */
	size = atol(argv[1]) * 1024UL * 1024;
	if (argc >= 3 && strcmp(argv[2], "huge") == 0)
		flags |= MAP_HUGETLB;
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* Fault in every page for writing; this is the path being measured. */
	r = madvise(p, size, MADV_POPULATE_WRITE);
	if (r) {
		perror("madvise");
		return 1;
	}
	munmap(p, size);
	return 0;
}

Test script (bench.sh):

#!/bin/bash
# Usage: bench.sh [size_mb] [mode] [iterations] [huge]
# mode 0 = baseline, mode 1 = skip zeroing
SZ=${1:-256}; MODE=${2:-0}; ITER=${3:-10}; HUGE=${4:-}
FLUSH=/sys/module/page_reporting/parameters/flush
PERF_DATA=/tmp/perf-$MODE.csv
rmmod virtio_balloon 2>/dev/null
insmod virtio_balloon.ko host_zeroes_pages=$MODE
echo 512 > $FLUSH
[ "$HUGE" = "huge" ] && echo $((SZ/2)) > /proc/sys/vm/nr_overcommit_hugepages
rm -f $PERF_DATA
echo "=== sz=${SZ}MB mode=$MODE iter=$ITER $HUGE ==="
for i in $(seq 1 $ITER); do
	echo 3 > /proc/sys/vm/drop_caches
	echo 512 > $FLUSH
	perf stat -e task-clock,instructions,cache-misses \
		-x, -o $PERF_DATA --append -- ./alloc_once $SZ $HUGE
done
[ "$HUGE" = "huge" ] && echo 0 > /proc/sys/vm/nr_overcommit_hugepages
rmmod virtio_balloon
awk -F, '/^#/||/^$/{next}{v=$1+0;e=$3;gsub(/ /,"",e);s[e]+=v;n[e]++}
	END{for(e in s)printf " %-16s %10.2f (n=%d)\n",e,s[e]/n[e],n[e]}' $PERF_DATA

Compile and run:

gcc -static -O2 -o alloc_once alloc_once.c
bash bench.sh 256 0 10       # baseline (regular pages)
bash bench.sh 256 1 10       # optimized (regular pages)
bash bench.sh 256 0 10 huge  # baseline (hugetlb surplus)
bash bench.sh 256 1 10 huge  # optimized (hugetlb surplus)

Changes since v1:
- Replaced __GFP_PREZEROED with PG_zeroed page flag (aliased PG_private)
- Added pghint_t type and
  vma_alloc_folio_hints() API
- Track PG_zeroed across buddy merges and splits
- Added post_alloc_hook integration (single consume/clear point)
- Added hugetlb support (pool pages + memfd)
- Added page_reporting flush parameter for deterministic testing
- Added free_frozen_pages_hint/put_page_hint for balloon deflate path
- Added try_to_claim_block PG_zeroed preservation
- Updated perf numbers with per-iteration flush methodology

Michael S. Tsirkin (18):
  mm: page_alloc: propagate PageReported flag across buddy splits
  mm: add pghint_t type and vma_alloc_folio_hints API
  mm: add PG_zeroed page flag for known-zero pages
  mm: page_alloc: track PG_zeroed across buddy merges
  mm: page_alloc: preserve PG_zeroed in try_to_claim_block
  mm: page_alloc: thread pghint_t through get_page_from_freelist
  mm: post_alloc_hook: use PG_zeroed to skip zeroing, return pghint_t
  mm: hugetlb: thread pghint_t through buddy allocation chain
  mm: hugetlb: use PG_zeroed for pool pages, skip redundant zeroing
  mm: page_reporting: support host-zeroed reported pages
  mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed pages
  mm: skip zeroing in alloc_anon_folio for pre-zeroed pages
  mm: skip zeroing in vma_alloc_anon_folio_pmd for pre-zeroed pages
  mm: memfd: skip zeroing for pre-zeroed hugetlb pages
  virtio_balloon: add host_zeroes_pages module parameter
  mm: page_reporting: add flush parameter with page budget
  mm: add free_frozen_pages_hint and put_page_hint APIs
  virtio_balloon: mark deflated pages as pre-zeroed

 drivers/virtio/virtio_balloon.c |  11 ++-
 fs/hugetlbfs/inode.c            |   5 +-
 include/linux/gfp.h             |  17 +++++
 include/linux/highmem.h         |   6 +-
 include/linux/hugetlb.h         |   6 +-
 include/linux/mm.h              |  12 +++
 include/linux/page-flags.h      |  13 +++-
 include/linux/page_reporting.h  |   3 +
 mm/compaction.c                 |   4 +-
 mm/huge_memory.c                |  12 +--
 mm/hugetlb.c                    |  52 +++++++++----
 mm/internal.h                   |   7 +-
 mm/memfd.c                      |  12 +--
 mm/memory.c                     |  14 ++--
 mm/mempolicy.c                  |  85 +++++++++++++++++++++
 mm/page_alloc.c                 | 131 ++++++++++++++++++++++++--------
 mm/page_reporting.c             |  55 +++++++++++++-
 mm/page_reporting.h             |  11 +++
 mm/swap.c                       |  19 +++++
 19 files changed, 392 insertions(+), 83 deletions(-)

-- 
MST