linux-mm.kvack.org archive mirror
From: "Li Zhe" <lizhe.67@bytedance.com>
To: <ankur.a.arora@oracle.com>
Cc: <akpm@linux-foundation.org>, <david@kernel.org>,
	<fvdl@google.com>,  <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>,  <lizhe.67@bytedance.com>,
	<muchun.song@linux.dev>, <osalvador@suse.de>
Subject: Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism
Date: Tue, 13 Jan 2026 14:41:54 +0800
Message-ID: <20260113064155.29900-1-lizhe.67@bytedance.com>
In-Reply-To: <87jyxmjxs6.fsf@oracle.com>

On Mon, 12 Jan 2026 14:01:29 -0800, ankur.a.arora@oracle.com wrote:

> > In user space, we can use system calls such as epoll and write to zero
> > huge folios as they become available, and sleep when none are ready. The
> > following pseudocode illustrates this approach. The pseudocode spawns
> > eight threads (each running thread_fun()) that wait for huge pages on
> > node 0 to become eligible for zeroing; whenever such pages are available,
> > the threads clear them in parallel.
> >
> >   static void thread_fun(void)
> >   {
> >   	epoll_create();
> >   	epoll_ctl();
> >   	while (1) {
> >   		val = read("/sys/devices/system/node/node0/hugepages/hugepages-1048576kB/zeroable_hugepages");
> >   		if (val > 0)
> >   			system("echo max > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/zeroable_hugepages");
> >   		epoll_wait();
> >   	}
> >   }
> 
> Given that zeroable_hugepages is per node, anybody who writes to
> it would need to know how much the aggregate demand would be.
> 
> Seems to me that the only value that might make sense would be "max".
> And at that point this approach seems a little bit like init_on_free.

Yes, writing “max” suffices for the vast majority of workloads.

However, once multiple mutually independent application processes each
need huge pages, the ability to specify an exact value becomes
essential, because the CPU time each process spends on zeroing can
then be charged to its own cgroup. If “max” is considered sufficient
for now, we can implement support for that value alone and extend it
later when necessary.
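
As a rough illustration of the exact-value case, the sketch below has a
process write a decimal count to the proposed attribute, so that the
zeroing it triggers would run in (and be charged to) the writer's own
context. It assumes the attribute accepts a decimal count as well as
"max", as discussed in this thread; the helper name request_zeroing()
is hypothetical, and the path is passed in rather than hard-coded
because the interface is only proposed by this series.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask for up to `count` huge pages to be zeroed by
 * writing a decimal number to the attribute at `path`.
 * Returns 0 on success or a negative errno value.
 */
static int request_zeroing(const char *path, unsigned long count)
{
	char buf[32];
	int len = snprintf(buf, sizeof(buf), "%lu\n", count);
	int fd, err = 0;
	ssize_t ret;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* A single write() carries the whole request. */
	ret = write(fd, buf, len);
	if (ret < 0)
		err = -errno;
	else if (ret != len)
		err = -EIO;

	close(fd);
	return err;
}
```

A process that needs, say, four 1 GiB pages on node 0 would call
request_zeroing() with the node 0 zeroable_hugepages path and a count
of 4 before mapping them.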

Although “max” resembles init_on_free at first glance, it leaves the
decision of “when and on which CPU to zero” entirely to user space,
thereby eliminating the concern previously raised.
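
To make the "when and on which CPU" point concrete, here is a minimal
C sketch of one worker from the pseudocode quoted above, pinned to a
CPU chosen by user space. It is only a sketch under assumptions: the
attribute is taken to signal readiness via EPOLLPRI, as sysfs
attributes that use sysfs_notify() normally do, which may not match
what the patch implements; the helper names read_count(), pin_to_cpu()
and zero_worker() are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Read the attribute from offset 0 and parse it as a count; -1 on error. */
static long read_count(int fd)
{
	char buf[32];
	ssize_t n = pread(fd, buf, sizeof(buf) - 1, 0);

	if (n <= 0)
		return -1;
	buf[n] = '\0';
	return strtol(buf, NULL, 10);
}

/* Pin the calling thread to `cpu`, so user space picks where zeroing runs. */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/* Worker loop; only returns on error, with a negative errno value. */
static int zero_worker(const char *path, int cpu)
{
	struct epoll_event ev = { .events = EPOLLPRI }; /* assumed event type */
	int fd, epfd, err = 0;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -errno;
	epfd = epoll_create1(0);
	if (epfd < 0) {
		err = -errno;
		goto out_fd;
	}
	if (pin_to_cpu(cpu) < 0 ||
	    epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		err = -errno;
		goto out_ep;
	}
	for (;;) {
		/* Zero everything currently eligible, then sleep until notified. */
		if (read_count(fd) > 0 && write(fd, "max", 3) < 0) {
			err = -errno;
			break;
		}
		if (epoll_wait(epfd, &ev, 1, -1) < 0 && errno != EINTR) {
			err = -errno;
			break;
		}
	}
out_ep:
	close(epfd);
out_fd:
	close(fd);
	return err;
}
```

Running one such worker per chosen CPU, for example from pthreads,
would correspond to the eight-thread setup in the pseudocode.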

Thanks,
Zhe



Thread overview: 19+ messages
2026-01-07 11:31 [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism Li Zhe
2026-01-07 11:31 ` [PATCH v2 1/8] mm/hugetlb: add pre-zeroed framework Li Zhe
2026-01-07 11:31 ` [PATCH v2 2/8] mm/hugetlb: convert to prep_account_new_hugetlb_folio() Li Zhe
2026-01-07 11:31 ` [PATCH v2 3/8] mm/hugetlb: move the huge folio to the end of the list during enqueue Li Zhe
2026-01-07 11:31 ` [PATCH v2 4/8] mm/hugetlb: introduce per-node sysfs interface "zeroable_hugepages" Li Zhe
2026-01-07 11:31 ` [PATCH v2 5/8] mm/hugetlb: simplify function hugetlb_sysfs_add_hstate() Li Zhe
2026-01-07 11:31 ` [PATCH v2 6/8] mm/hugetlb: relocate the per-hstate struct kobject pointer Li Zhe
2026-01-07 11:31 ` [PATCH v2 7/8] mm/hugetlb: add epoll support for interface "zeroable_hugepages" Li Zhe
2026-01-07 11:31 ` [PATCH v2 8/8] mm/hugetlb: limit event generation frequency of function do_zero_free_notify() Li Zhe
2026-01-07 16:19 ` [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism Andrew Morton
2026-01-12 11:25   ` Li Zhe
2026-01-09  6:05 ` Muchun Song
2026-01-12 11:27   ` Li Zhe
2026-01-12 19:52     ` David Hildenbrand (Red Hat)
2026-01-13  6:37       ` Li Zhe
2026-01-12 22:00     ` Ankur Arora
2026-01-13  6:39       ` Li Zhe
2026-01-12 22:01 ` Ankur Arora
2026-01-13  6:41   ` Li Zhe [this message]
