linux-mm.kvack.org archive mirror
From: Ankur Arora <ankur.a.arora@oracle.com>
To: Li Zhe <lizhe.67@bytedance.com>
Cc: muchun.song@linux.dev, akpm@linux-foundation.org,
	david@kernel.org, fvdl@google.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, osalvador@suse.de, mjguzik@gmail.com,
	mhocko@suse.com, joao.m.martins@oracle.com,
	ankur.a.arora@oracle.com, raghavendra.kt@amd.com
Subject: Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism
Date: Mon, 12 Jan 2026 14:00:23 -0800	[thread overview]
Message-ID: <87qzrujxu0.fsf@oracle.com> (raw)
In-Reply-To: <20260112112728.94590-1-lizhe.67@bytedance.com>


Li Zhe <lizhe.67@bytedance.com> writes:

> On Fri, 9 Jan 2026 14:05:01 +0800, muchun.song@linux.dev wrote:
>
>> > On Jan 7, 2026, at 19:31, Li Zhe <lizhe.67@bytedance.com> wrote:
>> >
>> > This patchset is based on this commit[1]("mm/hugetlb: optionally
>> > pre-zero hugetlb pages").
>>
>> I’d like you to add a brief summary here that roughly explains
>> what concerns the previous attempts raised and whether the
>> current proposal has already addressed those concerns, so more
>> people can quickly grasp the context.
>
> In my opinion, the main concerns raised in the preceding discussion[1]
> may be summarized as follows:
>
> (1) The CPU cost of background zeroing is not attributable to the
> task that consumes the pages, breaking fairness and cgroup accounting.
>
> (2) Policy (when to zero, with how many threads) is hard-coded in the
> kernel; user space lacks adequate means of control.
>
> (3) Comparable functionality is already available in user space
> (QEMU supports parallel preallocation).
>
> (4) A faster zeroing method is provided in the kernel[2].
>
> In my view, these concerns have already been addressed by this patchset.
>
> This patchset merely supplies the mechanism and leaves all policy
> decisions to user space; the kernel just performs the zeroing on
> behalf of the user, thereby resolving concerns (1) and (2).
>
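For illustration, a userspace policy agent consuming such an interface
might look roughly like the sketch below. The sysfs path, the meaning of
the value read, and the notification model are assumptions based only on
the patch subjects (4/8 and 7/8), not on the patches themselves; the
actual trigger for zeroing is deliberately left out.

/*
 * Hypothetical sketch: wait for updates on a pollable per-node sysfs
 * attribute and apply a userspace-defined policy.  The path below and
 * the assumption that the file reports a count of huge pages eligible
 * for zeroing are guesses from the patch subjects, not from the code.
 */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

#define ATTR "/sys/devices/system/node/node0/hugepages/" \
             "hugepages-1048576kB/zeroable_hugepages"   /* assumed path */

int main(void)
{
        char buf[64];
        int fd = open(ATTR, O_RDONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }

        for (;;) {
                /* sysfs poll(): read, wait for POLLPRI, seek back, re-read. */
                lseek(fd, 0, SEEK_SET);
                ssize_t n = read(fd, buf, sizeof(buf) - 1);
                if (n < 0) {
                        perror("read");
                        break;
                }
                buf[n] = '\0';
                printf("node0 zeroable huge pages: %s", buf);

                /*
                 * A real agent would apply its own policy here: decide when
                 * to zero, with how many threads, from which cgroup, etc.
                 * (The same fd could instead be registered with epoll using
                 * EPOLLPRI, which is what patch 7/8 appears to enable.)
                 */

                struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
                if (poll(&pfd, 1, -1) < 0) {
                        perror("poll");
                        break;
                }
        }

        close(fd);
        return 0;
}
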
> Regarding concern (3), I am aware that QEMU has implemented a parallel
> page-touch mechanism, which does reduce VM creation time; nevertheless,
> in our measurements it still consumes a non-trivial amount of time.
> (According to feedback from QEMU colleagues, bringing up a 2 TB VM
> still requires more than 40 seconds for zeroing.)
>
>> > Fresh hugetlb pages are zeroed out when they are faulted in,
>> > just like with all other page types. This can take up a good
>> > amount of time for larger page sizes (e.g. around 250
>> > milliseconds for a 1G page on a Skylake machine).
>> >
>> > This normally isn't a problem, since hugetlb pages are typically
>> > mapped by the application for a long time, and the initial
>> > delay when touching them isn't much of an issue.
>> >
>> > However, there are some use cases where a large number of hugetlb
>> > pages are touched when an application starts (such as a VM backed
>> > by these pages), rendering the launch noticeably slow.
>> >
>> > On a Skylake platform running v6.19-rc2, faulting in 64 × 1 GB huge
>> > pages takes about 16 seconds, roughly 250 ms per page. Even with
>> > Ankur’s optimizations[2], the time drops only to ~13 seconds,
>> > ~200 ms per page, still a noticeable delay.
>
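For concreteness, the single-threaded fault-in workload being measured
above looks roughly like the sketch below (an illustration only, not the
benchmark that produced the numbers; it needs 64 free 1G hugetlb pages
reserved in the pool):

/*
 * Map 64 x 1G hugetlb pages and time the first touch of each; every
 * first touch zeroes the page in the fault path.
 * Build: gcc -O2 -o fault-1g fault-1g.c
 */
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB    (30 << 26)      /* log2(1G) << MAP_HUGE_SHIFT */
#endif

#define NR_PAGES        64UL
#define PAGE_1G         (1UL << 30)

int main(void)
{
        size_t len = NR_PAGES * PAGE_1G;
        struct timespec t0, t1;

        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                       -1, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (unsigned long i = 0; i < NR_PAGES; i++)
                p[i * PAGE_1G] = 1;     /* fault in (and zero) one 1G page */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) +
                      (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("faulted %lu x 1G pages in %.2f s (%.0f ms/page)\n",
               NR_PAGES, secs, secs * 1000 / NR_PAGES);

        munmap(p, len);
        return 0;
}
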
> As for concern (4), I believe it is orthogonal to this patchset, and
> the cover letter already contains a performance comparison that
> demonstrates the additional benefit.

That comparison isn't quite apples to apples though. In the fault
workload above, you are looking at single-threaded zeroing, but
realistically clearing pages at VM init is multi-threaded (QEMU does
that, as David describes).

Also, Skylake probably has one of the slowest REP; STOS implementations
I've tried.
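
For reference, the kind of parallel clearing referred to above is
roughly the pattern below: split the guest RAM mapping into chunks and
let several threads fault their chunk in concurrently. This is only an
illustration of the pattern, not QEMU's code; the thread count and
chunking are arbitrary choices.

/*
 * Touch NR_PAGES x 1G hugetlb pages with NR_THREADS threads so the
 * zeroing in the fault path runs on several CPUs in parallel.
 * Build: gcc -O2 -pthread -o prefault prefault.c
 * (Assumes NR_PAGES is a multiple of NR_THREADS and that enough 1G
 * hugetlb pages are reserved in the pool.)
 */
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB    (30 << 26)      /* log2(1G) << MAP_HUGE_SHIFT */
#endif

#define NR_THREADS      8
#define NR_PAGES        64UL
#define PAGE_1G         (1UL << 30)

struct chunk {
        char *base;
        unsigned long pages;
};

static void *touch_chunk(void *arg)
{
        struct chunk *c = arg;

        for (unsigned long i = 0; i < c->pages; i++)
                c->base[i * PAGE_1G] = 1;       /* fault in one 1G page */
        return NULL;
}

int main(void)
{
        size_t len = NR_PAGES * PAGE_1G;
        unsigned long per_thread = NR_PAGES / NR_THREADS;
        struct chunk chunks[NR_THREADS];
        pthread_t tid[NR_THREADS];

        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                       -1, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        for (int i = 0; i < NR_THREADS; i++) {
                chunks[i].base = p + (unsigned long)i * per_thread * PAGE_1G;
                chunks[i].pages = per_thread;
                pthread_create(&tid[i], NULL, touch_chunk, &chunks[i]);
        }
        for (int i = 0; i < NR_THREADS; i++)
                pthread_join(tid[i], NULL);

        munmap(p, len);
        return 0;
}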

--
ankur



Thread overview: 16+ messages
2026-01-07 11:31 Li Zhe
2026-01-07 11:31 ` [PATCH v2 1/8] mm/hugetlb: add pre-zeroed framework Li Zhe
2026-01-07 11:31 ` [PATCH v2 2/8] mm/hugetlb: convert to prep_account_new_hugetlb_folio() Li Zhe
2026-01-07 11:31 ` [PATCH v2 3/8] mm/hugetlb: move the huge folio to the end of the list during enqueue Li Zhe
2026-01-07 11:31 ` [PATCH v2 4/8] mm/hugetlb: introduce per-node sysfs interface "zeroable_hugepages" Li Zhe
2026-01-07 11:31 ` [PATCH v2 5/8] mm/hugetlb: simplify function hugetlb_sysfs_add_hstate() Li Zhe
2026-01-07 11:31 ` [PATCH v2 6/8] mm/hugetlb: relocate the per-hstate struct kobject pointer Li Zhe
2026-01-07 11:31 ` [PATCH v2 7/8] mm/hugetlb: add epoll support for interface "zeroable_hugepages" Li Zhe
2026-01-07 11:31 ` [PATCH v2 8/8] mm/hugetlb: limit event generation frequency of function do_zero_free_notify() Li Zhe
2026-01-07 16:19 ` [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism Andrew Morton
2026-01-12 11:25   ` Li Zhe
2026-01-09  6:05 ` Muchun Song
2026-01-12 11:27   ` Li Zhe
2026-01-12 19:52     ` David Hildenbrand (Red Hat)
2026-01-12 22:00     ` Ankur Arora [this message]
2026-01-12 22:01 ` Ankur Arora
