From: "Li Zhe" <lizhe.67@bytedance.com>
To: <david@kernel.org>
Cc: <akpm@linux-foundation.org>, <ankur.a.arora@oracle.com>,
<fvdl@google.com>, <joao.m.martins@oracle.com>,
<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
<lizhe.67@bytedance.com>, <mhocko@suse.com>, <mjguzik@gmail.com>,
<muchun.song@linux.dev>, <osalvador@suse.de>,
<raghavendra.kt@amd.com>
Subject: Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism
Date: Thu, 15 Jan 2026 17:36:41 +0800
Message-ID: <20260115093641.44404-1-lizhe.67@bytedance.com>
In-Reply-To: <9daa39e6-9653-45cc-8c00-abf5f3bae974@kernel.org>
On Wed, 14 Jan 2026 18:21:08 +0100, david@kernel.org wrote:
> >> But again, I think the main motivation here is "increase application
> >> startup", not optimize that the zeroing happens at specific points in
> >> time during system operation (e.g., when idle etc).
> >>
> >
> > Framing this as "increase application startup" and merely shifting the
> > overhead to shutdown seems like gaming the problem statement to me.
> > The real problem is total real time spent on it while pages are
> > needed.
> >
> > Support for background zeroing can give you more usable pages provided
> > it has the cpu + ram to do it. If it does not, you are in the worst
> > case in the same spot as with zeroing on free.
> >
> > Let's take a look at some examples.
> >
> > Say there are no free huge pages and you kill a vm + start a new one.
> > On top of that all CPUs are pegged as is. In this case total time is
> > the same for "zero on free" as it is for background zeroing.
>
> Right. If the pages get freed to immediately get allocated again, it
> doesn't really matter who does the freeing. There might be some details,
> of course.
>
> >
> > Say the system is freshly booted and you start up a vm. There are no
> > pre-zeroed pages available so it suffers at start time no matter what.
> > However, with some support for background zeroing, the machinery could
> > respond to demand and do it in parallel in some capacity, shortening
> > the real time needed.
>
> Just like for init_on_free, I would start with zeroing these pages
> during boot.
>
> init_on_free assures that all pages in the buddy were zeroed out. Which
> greatly simplifies the implementation, because there is no need to track
> what was initialized and what was not.
>
> It's a good question if initialization during that should be done in
> parallel, possibly asynchronously during boot. Reminds me a bit of
> deferred page initialization during boot. But that is rather an
> extension that could be added somewhat transparently on top later.
>
> If ever required we could dynamically enable this setting for a running
> system. Whoever would enable it (flips the magic toggle) would zero out
> all hugetlb pages that are already in the hugetlb allocator as free, but
> not initialized yet.
>
> But again, these are extensions on top of the basic design of having all
> free hugetlb folios be zeroed.
>
> >
> > Say a little bit of real time passes and you start another vm. With
> > merely zeroing on free there are still no pre-zeroed pages available
> > so it again suffers the overhead. With background zeroing some of the
> > that memory would be already sorted out, speeding up said startup.
>
> The moment they end up in the hugetlb allocator as free folios they
> would have to get initialized.
>
> Now, I am sure there are downsides to this approach (how to speedup
> process exit by parallelizing zeroing, if ever required)? But it sounds
> like being a bit ... simpler without user space changes required. In
> theory :)

I strongly agree that the init_on_free strategy effectively eliminates
the latency incurred during VM creation. However, it appears to
introduce two new issues.

First, the process that later allocates a page may not be the one that
freed it, raising the question of which process should bear the cost
of zeroing.

Second, put_page() may be called from atomic context, making it
inappropriate to clear an entire huge page within that context;
off-loading the zeroing to a workqueue merely reopens the same
accounting problem.
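
To make the second point concrete, here is a minimal user-space sketch
(not kernel code) of the deferral pattern: the free path only marks the
page as dirty, and a separate worker does the zeroing later. All names
(sim_pool, sim_free(), sim_worker(), sim_alloc()) and the two counters
are hypothetical and exist only for illustration; a small buffer stands
in for a huge page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_PAGES 4
#define PAGE_SZ  4096	/* stand-in size; real huge pages are far larger */

struct sim_page {
	unsigned char data[PAGE_SZ];
	bool free;	/* page sits in the free pool */
	bool zeroed;	/* contents are known to be zero */
};

struct sim_pool {
	struct sim_page pages[NR_PAGES];
	unsigned long zeroed_by_worker;		/* cost charged to no task */
	unsigned long zeroed_by_allocator;	/* cost paid by the allocator */
};

/* Free path: models an atomic-context caller, so it must not zero here. */
static void sim_free(struct sim_page *page)
{
	assert(!page->free);	/* catch double frees in the simulation */
	page->zeroed = false;
	page->free = true;
}

/* Deferred worker: zeroes at most 'budget' dirty free pages per run. */
static unsigned int sim_worker(struct sim_pool *pool, unsigned int budget)
{
	unsigned int done = 0;

	for (size_t i = 0; i < NR_PAGES && done < budget; i++) {
		struct sim_page *p = &pool->pages[i];

		if (p->free && !p->zeroed) {
			memset(p->data, 0, PAGE_SZ);
			p->zeroed = true;
			pool->zeroed_by_worker++;
			done++;
		}
	}
	return done;
}

/* Allocation: prefer a pre-zeroed page, otherwise zero one inline. */
static struct sim_page *sim_alloc(struct sim_pool *pool)
{
	struct sim_page *fallback = NULL;

	for (size_t i = 0; i < NR_PAGES; i++) {
		struct sim_page *p = &pool->pages[i];

		if (!p->free)
			continue;
		if (p->zeroed) {
			p->free = false;
			return p;
		}
		if (!fallback)
			fallback = p;
	}
	if (fallback) {
		/* The worker has not reached this page; the allocator pays. */
		memset(fallback->data, 0, PAGE_SZ);
		fallback->zeroed = true;
		fallback->free = false;
		pool->zeroed_by_allocator++;
	}
	return fallback;
}
```

The two counters show where the cost lands in this model: whatever the
worker zeroes is charged to no particular task, and whatever it has not
reached yet is still paid by the allocating task.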

Do you have any recommendations regarding these issues?

Thanks,
Zhe