From: "Li Zhe" <lizhe.67@bytedance.com>
To: <david@kernel.org>
Cc: <akpm@linux-foundation.org>, <dan.j.williams@intel.com>,
<dave@stgolabs.net>, <ankur.a.arora@oracle.com>,
<fvdl@google.com>, <gourry@gourry.net>,
<joao.m.martins@oracle.com>, <jonathan.cameron@huawei.com>,
<linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linux-mm@kvack.org>, <lizhe.67@bytedance.com>,
<mhocko@suse.com>, <mjguzik@gmail.com>, <muchun.song@linux.dev>,
<osalvador@suse.de>, <raghavendra.kt@amd.com>,
<wangzhou1@hisilicon.com>, <zhanjie9@hisilicon.com>
Subject: Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism
Date: Tue, 20 Jan 2026 14:27:06 +0800
Message-ID: <20260120062706.91078-1-lizhe.67@bytedance.com>
In-Reply-To: <87wm1ih5kb.fsf@oracle.com>
In light of the preceding discussion, we appear to have reached the
following understanding:
(1) At present we prefer to mitigate slow application startup (e.g.,
VM creation) by zeroing huge pages at the moment they are freed
(init_on_free). The principal benefit is that user space gains the
performance improvement without deploying any additional user space
daemon.
(2) Moving the zeroing from allocation to release means that the
thread that frees a page may occasionally differ from the one that
originally allocated it, in which case the clearing cost is not
charged to the allocating thread. Because this situation is rare and
the existing init_on_free mechanism in the kernel already exhibits
the same behavior, we deem the consequence acceptable.
(3) The function __unmap_hugepage_range() employs the MMU-gather
mechanism, which refrains from dropping the page reference while
holding the PTL (spinlock). This allows huge-page zeroing to be
performed in a non-atomic context.
(4) Given that, in the vast majority of cases, the same thread that
allocates a huge page also frees it, and that the exceptions
highlighted by David are genuinely rare [1], we can achieve faster
application startup by implementing an init_on_free-style mechanism.
(5) Going forward we can further optimize the zeroing process by
leveraging a DMA engine.
If the foregoing is accurate, I propose we add a new hugetlbfs mount
option to achieve the init-on-free behavior.
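To make the idea in points (1) and (4) concrete, here is a minimal
user-space sketch of zero-on-free. It is not kernel code and not the
proposed implementation: struct pool, pool_alloc(), pool_free() and
the "zeroed" flag are hypothetical names chosen for illustration, and
malloc() buffers stand in for huge pages. The point is only that a
page cleared at release time can be handed out again without paying
the clearing cost on the allocation path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES (2u * 1024 * 1024) /* stand-in for a 2 MiB huge page */
#define POOL_PAGES 4

/* Hypothetical per-page bookkeeping; not a kernel structure. */
struct pool_page {
	unsigned char *mem;
	bool is_free;
	bool zeroed; /* contents are known to be all zero */
};

struct pool {
	struct pool_page pages[POOL_PAGES];
	unsigned long clears_on_alloc; /* zeroing paid by the allocator */
	unsigned long clears_on_free;  /* zeroing paid by the releaser */
};

/* Release all backing buffers. */
static void pool_destroy(struct pool *p)
{
	for (int i = 0; i < POOL_PAGES; i++) {
		free(p->pages[i].mem);
		p->pages[i].mem = NULL;
	}
}

/* Fill the pool with pages that hold stale (non-zero) data. */
static int pool_init(struct pool *p)
{
	memset(p, 0, sizeof(*p));
	for (int i = 0; i < POOL_PAGES; i++) {
		p->pages[i].mem = malloc(PAGE_BYTES);
		if (!p->pages[i].mem) {
			pool_destroy(p);
			return -1;
		}
		memset(p->pages[i].mem, 0xAA, PAGE_BYTES);
		p->pages[i].is_free = true;
		p->pages[i].zeroed = false;
	}
	return 0;
}

/* Allocation path: clear only if the page was not zeroed on release. */
static unsigned char *pool_alloc(struct pool *p)
{
	for (int i = 0; i < POOL_PAGES; i++) {
		struct pool_page *pg = &p->pages[i];

		if (!pg->is_free)
			continue;
		if (!pg->zeroed) {
			memset(pg->mem, 0, PAGE_BYTES);
			p->clears_on_alloc++;
		}
		pg->is_free = false;
		pg->zeroed = false; /* the caller may dirty it from now on */
		return pg->mem;
	}
	return NULL;
}

/* Release path: pay the clearing cost here (init-on-free style). */
static void pool_free(struct pool *p, unsigned char *mem)
{
	for (int i = 0; i < POOL_PAGES; i++) {
		struct pool_page *pg = &p->pages[i];

		if (pg->mem != mem || pg->is_free)
			continue;
		memset(pg->mem, 0, PAGE_BYTES);
		p->clears_on_free++;
		pg->zeroed = true;
		pg->is_free = true;
		return;
	}
}
```

In this sketch the first pool_alloc() of a stale page still has to
clear it, but once that page has gone through pool_free() the next
pool_alloc() returns it without touching clears_on_alloc. As noted in
point (2), the thread calling pool_free() is the one that pays.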
Thanks,
Zhe
[1]: https://lore.kernel.org/all/83798495-915b-4a5d-9638-f5b3de913b71@kernel.org/#t