From: "Li Zhe" <lizhe.67@bytedance.com>
To: <gourry@gourry.net>
Cc: <akpm@linux-foundation.org>, <ankur.a.arora@oracle.com>,
<dan.j.williams@intel.com>, <dave@stgolabs.net>,
<david.laight.linux@gmail.com>, <david@kernel.org>,
<fvdl@google.com>, <joao.m.martins@oracle.com>,
<jonathan.cameron@huawei.com>, <linux-cxl@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
<lizhe.67@bytedance.com>, <mhocko@suse.com>, <mjguzik@gmail.com>,
<muchun.song@linux.dev>, <osalvador@suse.de>,
<raghavendra.kt@amd.com>, <wangzhou1@hisilicon.com>,
<zhanjie9@hisilicon.com>
Subject: Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism
Date: Wed, 21 Jan 2026 16:03:48 +0800
Message-ID: <20260121080348.36253-1-lizhe.67@bytedance.com>
In-Reply-To: <aW_G66HeWLbyiPHs@gourry-fedora-PF4VCD3F>

On Tue, 20 Jan 2026 13:18:19 -0500, gourry@gourry.net wrote:
> On Tue, Jan 20, 2026 at 06:39:48PM +0800, Li Zhe wrote:
> > On Tue, 20 Jan 2026 09:47:44 +0000, david.laight.linux@gmail.com wrote:
> >
> > > On Tue, 20 Jan 2026 14:27:06 +0800
> > > "Li Zhe" <lizhe.67@bytedance.com> wrote:
> > >
> > > > In light of the preceding discussion, we appear to have reached the
> > > > following understanding:
> > > >
> > > > (1) At present we prefer to mitigate slow application startup (e.g.,
> > > > VM creation) by zeroing huge pages at the moment they are freed
> > > > (init_on_free). The principal benefit is that user space gains the
> > > > performance improvement without deploying any additional user space
> > > > daemon.
> > >
> > > Am I missing something?
> > > If userspace does:
> > > $ program_a; program_b
> > > and pages used by program_a are zeroed when it exits you get the delay
> > > for zeroing all the pages it used before program_b starts.
> > > OTOH if the zeroing is deferred program_b only needs to zero the pages
> > > it needs to start (and there may be some lurking).
> >
> > Under the init_on_free approach, improving the speed of zeroing may
> > indeed prove necessary.
> >
> > However, I believe we should first reach consensus on adopting
> > "init_on_free" as the solution to slow application startup before
> > turning to performance tuning.
> >
>
> His point was init_on_free may not actually reduce any delays on serial
> applications, and can actually introduce additional delays.
>
> Example
> -------
> program_a: alloc_hugepages(10);
>            exit();
>
> program_b: alloc_hugepages(5);
>            exit();
>
> /* Run programs in serial */
> sh: program_a && program_b
>
> in zero_on_alloc():
>   program_a eats zero(10) cost on startup
>   program_b eats zero(5) cost on startup
>   Overall zero(15) cost to start program_b
>
> in zero_on_free():
>   program_a eats zero(10) cost on startup
>   program_a eats zero(10) cost on exit
>   program_b eats zero(0) cost on startup
>   Overall zero(20) cost to start program_b
>
> zero_on_free is worse by zero(5)
> -------
>
> This is a trivial example, but it's unclear zero_on_free actually
> provides a benefit. You have to know ahead of time what the runtime
> behavior, pre-zeroed count, and allocation pattern (0->10->5->...) would
> be to determine whether there's an actual reduction in startup time.
>
> But just trivially, starting from the base case of no pages being
> zeroed, you're just injecting an additional zero(X) cost if program_a()
> consumes more hugepages than program_b().
>
> Long way of saying the shift from alloc to free seems heuristic-y and
> you need stronger analysis / better data to show this change is actually
> beneficial in the general case.

I understand your concern. At some point some process must pay the
cost of zeroing, and the optimal strategy is inevitably
workload-dependent.

Our "zero-on-free for huge pages" draws on the existing kernel
init_on_free mechanism. Of course, it may prove sub-optimal in certain
scenarios.

Consistent with "provide tools, not policy", perhaps the decision is
better left to user space. And that is exactly what this patchset
does. Requiring a userspace daemon to decide when to zero pages
certainly adds complexity, but it also gives administrators a single,
flexible knob that can be tuned for any workload.
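
As a rough illustration of that workload dependence, the accounting
from the quoted example can be modelled as below. This is only a
sketch and not part of the patchset: the function name
cost_to_start_last(), its parameters, and the cost unit (zeroing one
huge page costs 1) are assumptions made for illustration. It counts
only the zeroing work that sits on the critical path before the last
program in a serial run has started.

```python
def cost_to_start_last(demands, zero_on_free, prezeroed=0):
    """Zeroing work (in huge pages) paid before the last program has started.

    demands      -- huge pages allocated by each program, run serially
    zero_on_free -- True: pages are zeroed when freed and reused as-is;
                    False: every page is zeroed at allocation time
    prezeroed    -- pages already zeroed in the free pool at the beginning
                    (hypothetical parameter, only used when zero_on_free)
    """
    total = 0
    last = len(demands) - 1
    for i, pages in enumerate(demands):
        if not zero_on_free:
            # zero_on_alloc: each program zeroes everything it allocates.
            total += pages
            continue
        # zero_on_free: take pre-zeroed pages first, zero only the shortfall.
        reused = min(pages, prezeroed)
        prezeroed -= reused
        total += pages - reused
        if i != last:
            # Exit of an earlier program: all of its pages are zeroed on free
            # and that work delays the start of the next program.
            total += pages
            prezeroed += pages
    return total


if __name__ == "__main__":
    for demands in ([10, 5], [5, 10]):
        on_alloc = cost_to_start_last(demands, zero_on_free=False)
        on_free = cost_to_start_last(demands, zero_on_free=True)
        print(demands, "zero_on_alloc:", on_alloc, "zero_on_free:", on_free)
```

Under these assumptions, the 10-then-5 case from the example gives 15
versus 20, while the reverse order (5-then-10) gives 15 for both
policies, which matches the observation that the extra cost only
appears when the earlier program uses more huge pages than the later
one.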

Thanks,
Zhe