* [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
@ 2026-02-14 8:45 Wenchao Hao
2026-02-16 11:34 ` Kiryl Shutsemau
` (4 more replies)
0 siblings, 5 replies; 26+ messages in thread
From: Wenchao Hao @ 2026-02-14 8:45 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel
Cc: Wenchao Hao
Add kernel command line option "count_zero_page" to track anonymous pages
that have been allocated and mapped to userspace but are still zero-filled.

This feature is mainly used to debug the large folio mechanism, which
pre-allocates and maps more pages than actually needed, leading to memory
waste from unaccessed pages.

Export the result in /proc/pid/smaps as the "AnonZero" field.
Link: https://lore.kernel.org/linux-mm/20260210043456.2137482-1-haowenchao22@gmail.com/
Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
---
Documentation/filesystems/proc.rst | 5 +++++
fs/proc/task_mmu.c | 10 ++++++++++
2 files changed, 15 insertions(+)
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index b0c0d1b45b99..573c8b015e39 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -545,6 +545,11 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
does not take into account swapped out page of underlying shmem objects.
"Locked" indicates whether the mapping is locked in memory or not.
+"AnonZero" shows the size of anonymous pages that have never been accessed
+after mapping, and it can reflect the memory waste caused by huge pages.
+Implemented by scanning the size of zero-filled pages of the VMA. It
+is default disabled, and enabled via cmdline param "count_zero_page=true".
+
"THPeligible" indicates whether the mapping is eligible for allocating
naturally aligned THP pages of any currently enabled size. 1 if true, 0
otherwise.
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index dd3b5cf9f0b7..c39ebd015724 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -880,6 +880,7 @@ struct mem_size_stats {
u64 pss_dirty;
u64 pss_locked;
u64 swap_pss;
+ u64 anon_zero;
};
static void smaps_page_accumulate(struct mem_size_stats *mss,
@@ -912,6 +913,10 @@ static void smaps_page_accumulate(struct mem_size_stats *mss,
}
}
+/* Whether to scan and count zero-filled pages */
+static bool count_zero_page;
+core_param(count_zero_page, count_zero_page, bool, 0644);
+
static void smaps_account(struct mem_size_stats *mss, struct page *page,
bool compound, bool young, bool dirty, bool locked,
bool present)
@@ -931,6 +936,9 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
if (!folio_test_swapbacked(folio) && !dirty &&
!folio_test_dirty(folio))
mss->lazyfree += size;
+
+ if (count_zero_page && pages_identical(page, ZERO_PAGE(0)))
+ mss->anon_zero += PAGE_SIZE;
}
if (folio_test_ksm(folio))
@@ -1363,6 +1371,8 @@ static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss,
mss->swap_pss >> PSS_SHIFT);
SEQ_PUT_DEC(" kB\nLocked: ",
mss->pss_locked >> PSS_SHIFT);
+ if (count_zero_page)
+ SEQ_PUT_DEC(" kB\nAnonZero: ", mss->anon_zero);
seq_puts(m, " kB\n");
}
--
2.45.0
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-14 8:45 [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages Wenchao Hao
@ 2026-02-16 11:34 ` Kiryl Shutsemau
2026-02-16 11:45 ` David Hildenbrand (Arm)
2026-02-16 12:15 ` David Hildenbrand (Arm)
` (3 subsequent siblings)
4 siblings, 1 reply; 26+ messages in thread
From: Kiryl Shutsemau @ 2026-02-16 11:34 UTC (permalink / raw)
To: Wenchao Hao
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel
On Sat, Feb 14, 2026 at 04:45:14PM +0800, Wenchao Hao wrote:
> Add kernel command line option "count_zero_page" to track anonymous pages
> have been allocated and mapped to userspace but zero-filled.
>
> This feature is mainly used to debug large folio mechanism, which
> pre-allocates and map more pages than actually needed, leading to memory
> waste from unaccessed pages.
>
> Export the result in /proc/pid/smaps as "AnonZero" field.
I expect it to slow down /proc/pid/smaps reads substantially. I don't
think this line in smaps is worth it.
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 11:34 ` Kiryl Shutsemau
@ 2026-02-16 11:45 ` David Hildenbrand (Arm)
2026-02-16 11:58 ` Kiryl Shutsemau
0 siblings, 1 reply; 26+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-16 11:45 UTC (permalink / raw)
To: Kiryl Shutsemau, Wenchao Hao
Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
On 2/16/26 12:34, Kiryl Shutsemau wrote:
> On Sat, Feb 14, 2026 at 04:45:14PM +0800, Wenchao Hao wrote:
>> Add kernel command line option "count_zero_page" to track anonymous pages
>> have been allocated and mapped to userspace but zero-filled.
>>
>> This feature is mainly used to debug large folio mechanism, which
>> pre-allocates and map more pages than actually needed, leading to memory
>> waste from unaccessed pages.
>>
>> Export the result in /proc/pid/smaps as "AnonZero" field.
>
> I expect it to slowdown /proc/pid/smaps read substantially. I don't
> think this line in smaps worth it.
>
That's why it's enabled through a command line parameter.
--
Cheers,
David
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 11:45 ` David Hildenbrand (Arm)
@ 2026-02-16 11:58 ` Kiryl Shutsemau
2026-02-16 12:19 ` David Hildenbrand (Arm)
2026-02-16 15:59 ` Wenchao Hao
0 siblings, 2 replies; 26+ messages in thread
From: Kiryl Shutsemau @ 2026-02-16 11:58 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Wenchao Hao, Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
On Mon, Feb 16, 2026 at 12:45:13PM +0100, David Hildenbrand (Arm) wrote:
> On 2/16/26 12:34, Kiryl Shutsemau wrote:
> > On Sat, Feb 14, 2026 at 04:45:14PM +0800, Wenchao Hao wrote:
> > > Add kernel command line option "count_zero_page" to track anonymous pages
> > > have been allocated and mapped to userspace but zero-filled.
> > >
> > > This feature is mainly used to debug large folio mechanism, which
> > > pre-allocates and map more pages than actually needed, leading to memory
> > > waste from unaccessed pages.
> > >
> > > Export the result in /proc/pid/smaps as "AnonZero" field.
> >
> > I expect it to slowdown /proc/pid/smaps read substantially. I don't
> > think this line in smaps worth it.
> >
>
> That's why it's enabled through a command line parameter.
One user wants the stat and all users on the machine pay the price?
That's a poor trade-off.

In general, smaps scales poorly. It collects a lot of stats and most of
them are ignored by the user. We need something like statx(2) where the user
can declare what he is interested in, so the kernel won't waste cycles.

A kernel cmdline parameter is the wrong place to declare what stats you
want to see.
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-14 8:45 [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages Wenchao Hao
2026-02-16 11:34 ` Kiryl Shutsemau
@ 2026-02-16 12:15 ` David Hildenbrand (Arm)
2026-02-16 15:10 ` Wenchao Hao
2026-02-16 14:22 ` Matthew Wilcox
` (2 subsequent siblings)
4 siblings, 1 reply; 26+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-16 12:15 UTC (permalink / raw)
To: Wenchao Hao, Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
On 2/14/26 09:45, Wenchao Hao wrote:
> Add kernel command line option "count_zero_page" to track anonymous pages
> have been allocated and mapped to userspace but zero-filled.
"count_zero_page" is rather sub-optimal parameter name. "anonzero_in_smaps" or sth like that?
Still wondering if there could be a better way to enable this dynamically.
In particular, not using a core parameter.

If you use a module parameter, you can just set it on the cmdline

	proc.anonzero_in_smaps=1

and dynamically set/observe it in

	/sys/module/proc/parameters/anonzero_in_smaps

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index d7d52e259055..0301b9cd28f8 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -34,6 +34,10 @@
#define SEQ_PUT_DEC(str, val) \
seq_put_decimal_ull_width(m, str, (val) << (PAGE_SHIFT-10), 8)
+
+static bool anonzero_in_smaps;
+module_param(anonzero_in_smaps, bool, 0644);
+
void task_mem(struct seq_file *m, struct mm_struct *mm)
{
unsigned long text, lib, swap, anon, file, shmem;
>
> This feature is mainly used to debug large folio mechanism, which
> pre-allocates and map more pages than actually needed, leading to memory
> waste from unaccessed pages.
>
> Export the result in /proc/pid/smaps as "AnonZero" field.
>
> Link: https://lore.kernel.org/linux-mm/20260210043456.2137482-1-haowenchao22@gmail.com/
> Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
> ---
> Documentation/filesystems/proc.rst | 5 +++++
> fs/proc/task_mmu.c | 10 ++++++++++
> 2 files changed, 15 insertions(+)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index b0c0d1b45b99..573c8b015e39 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -545,6 +545,11 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
> does not take into account swapped out page of underlying shmem objects.
> "Locked" indicates whether the mapping is locked in memory or not.
>
> +"AnonZero" shows the size of anonymous pages that have never been accessed
> +after mapping, and it can reflect the memory waste caused by huge pages.
That's not correct. They could be read/written, but with zeroes.
> +Implemented by scanning the size of zero-filled pages of the VMA. It
> +is default disabled, and enabled via cmdline param "count_zero_page=true".
Probably best to keep it simpler:
"AnonZero" shows the size of anonymous pages that contain zeroes. Zero-filled
pages can indicate memory waste caused by memory-overallocation with (m)THPs.
Availability is controlled through XYZ.
--
Cheers,
David
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 11:58 ` Kiryl Shutsemau
@ 2026-02-16 12:19 ` David Hildenbrand (Arm)
2026-02-16 15:59 ` Wenchao Hao
1 sibling, 0 replies; 26+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-16 12:19 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Wenchao Hao, Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
On 2/16/26 12:58, Kiryl Shutsemau wrote:
> On Mon, Feb 16, 2026 at 12:45:13PM +0100, David Hildenbrand (Arm) wrote:
>> On 2/16/26 12:34, Kiryl Shutsemau wrote:
>>>
>>> I expect it to slowdown /proc/pid/smaps read substantially. I don't
>>> think this line in smaps worth it.
>>>
>>
>> That's why it's enabled through a command line parameter.
>
> One users want the stat and all users on the machine pay the price?
> That's a poor trade off.
>
It's a debug mechanism as stated in the patch description.
Similar to CONFIG_DEBUG_VM slowing your machine down.
> In general, smaps scales poorly. It collects a lot of stats and most of
> them are ignored by user. We need something like statx(2) where user can
> declare what he is interested in, so kernel won't waste cycles.
>
> Kernel cmdline parameter is the wrong place to declare what stats you
> want to see.
If you have a good idea, please shoot. I proposed using a module parameter
that can get toggled more easily by the admin that debugs something.

It's a good question whether someone would want to use this mechanism not
just for debugging. Then, indeed, a more selective (per-process) toggle
could be warranted. But likely the scanning would in any case be too
expensive for any non-debug case.

So let's not over-engineer a debug mechanism if this is intended to stay
a debug mechanism.
--
Cheers,
David
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-14 8:45 [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages Wenchao Hao
2026-02-16 11:34 ` Kiryl Shutsemau
2026-02-16 12:15 ` David Hildenbrand (Arm)
@ 2026-02-16 14:22 ` Matthew Wilcox
2026-02-16 15:55 ` Wenchao Hao
2026-02-16 17:03 ` Matthew Wilcox
2026-02-17 15:22 ` Wenchao Hao
4 siblings, 1 reply; 26+ messages in thread
From: Matthew Wilcox @ 2026-02-16 14:22 UTC (permalink / raw)
To: Wenchao Hao
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel
On Sat, Feb 14, 2026 at 04:45:14PM +0800, Wenchao Hao wrote:
> Add kernel command line option "count_zero_page" to track anonymous pages
> have been allocated and mapped to userspace but zero-filled.
>
> This feature is mainly used to debug large folio mechanism, which
> pre-allocates and map more pages than actually needed, leading to memory
> waste from unaccessed pages.
Why are you trying to get this upstream when you admitted in an earlier
email this is just for your internal use?
Why do you think that "unaccessed pages" are the only, or even the
largest source of extra memory consumption? The vast majority of files
are never mmaped.
This just seems like a giant waste of time.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 12:15 ` David Hildenbrand (Arm)
@ 2026-02-16 15:10 ` Wenchao Hao
2026-02-16 15:18 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 26+ messages in thread
From: Wenchao Hao @ 2026-02-16 15:10 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
On Mon, Feb 16, 2026 at 8:15 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 2/14/26 09:45, Wenchao Hao wrote:
> > Add kernel command line option "count_zero_page" to track anonymous pages
> > have been allocated and mapped to userspace but zero-filled.
>
> "count_zero_page" is rather sub-optimal parameter name. "anonzero_in_smaps" or sth like that?
>
Your naming suggestion is indeed better than mine. If this patch is still needed
for further development, I will modify it according to your advice.
> Still wondering if there could be a better way to enable this dynamically.
>
> In particular, not using a core parameter.
>
> If you use a module parameter, you can just set on the cmdline
>
> proc.anonzero_in_smaps=1
>
> And dynamically set/observe it in
>
> /sys/module/proc/parameters/anonzero_in_smaps
>
Regarding the use of module parameters versus core_param, I think
either approach is OK.

Currently, the core_param I'm using also supports two modification methods:
1. Command line parameter: count_zero_page=Y
2. After system boot, view or modify it via
   /sys/module/kernel/parameters/count_zero_page

It is true that this parameter would be more appropriately placed
in the proc module, which aligns with your earlier suggestion.

If there is a possibility of further iteration on this change, I will
move it to the proc module instead of using a core_param.
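
For illustration only, a tool could check the toggle at runtime roughly as
sketched below. This is a minimal userspace sketch, not part of the patch:
the helper name read_bool_param() is hypothetical, and the sysfs path only
exists on a kernel with this patch applied. It relies on boolean parameters
being reported as 'Y' or 'N' in sysfs.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a boolean kernel parameter file such as
 * /sys/module/kernel/parameters/count_zero_page (present only with the
 * patch applied). Returns 1 for enabled, 0 for disabled, -1 on error.
 */
int read_bool_param(const char *path)
{
	FILE *f;
	int c;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;	/* parameter not available on this kernel */

	c = fgetc(f);		/* the first character carries the value */
	fclose(f);

	switch (c) {
	case 'Y': case 'y': case '1':
		return 1;
	case 'N': case 'n': case '0':
		return 0;
	default:
		return -1;	/* empty or unexpected content */
	}
}
```

A caller would pass "/sys/module/kernel/parameters/count_zero_page" and only
look for the AnonZero line in smaps when the helper returns 1.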
> >
> > This feature is mainly used to debug large folio mechanism, which
> > pre-allocates and map more pages than actually needed, leading to memory
> > waste from unaccessed pages.
> >
> > Export the result in /proc/pid/smaps as "AnonZero" field.
> >
> > Link: https://lore.kernel.org/linux-mm/20260210043456.2137482-1-haowenchao22@gmail.com/
> > Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
> > ---
> > Documentation/filesystems/proc.rst | 5 +++++
> > fs/proc/task_mmu.c | 10 ++++++++++
> > 2 files changed, 15 insertions(+)
> >
> > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > index b0c0d1b45b99..573c8b015e39 100644
> > --- a/Documentation/filesystems/proc.rst
> > +++ b/Documentation/filesystems/proc.rst
> > @@ -545,6 +545,11 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
> > does not take into account swapped out page of underlying shmem objects.
> > "Locked" indicates whether the mapping is locked in memory or not.
> >
> > +"AnonZero" shows the size of anonymous pages that have never been accessed
> > +after mapping, and it can reflect the memory waste caused by huge pages.
>
> That's not correct. They could be read/written, but with zeroes.
>
> > +Implemented by scanning the size of zero-filled pages of the VMA. It
> > +is default disabled, and enabled via cmdline param "count_zero_page=true".
>
> Probably best to keep it simpler:
>
> "AnonZero" shows the size of anonymous pages that contain zeroes. Zero-filled
> pages can indicate memory waste caused by memory-overallocation with (m)THPs.
> Availability is controlled through XYZ.
>
As with your earlier suggestion, if this change is to be iterated on
further, I will update it accordingly.
Thanks.
> --
> Cheers,
>
> David
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 15:10 ` Wenchao Hao
@ 2026-02-16 15:18 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 26+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-16 15:18 UTC (permalink / raw)
To: Wenchao Hao
Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
On 2/16/26 16:10, Wenchao Hao wrote:
> On Mon, Feb 16, 2026 at 8:15 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 2/14/26 09:45, Wenchao Hao wrote:
>>> Add kernel command line option "count_zero_page" to track anonymous pages
>>> have been allocated and mapped to userspace but zero-filled.
>>
>> "count_zero_page" is rather sub-optimal parameter name. "anonzero_in_smaps" or sth like that?
>>
> Your naming suggestion is indeed better than mine. If this patch is still needed
> for further development, I will modify it according to your advice.
>
>> Still wondering if there could be a better way to enable this dynamically.
>>
>> In particular, not using a core parameter.
>>
>> If you use a module parameter, you can just set on the cmdline
>>
>> proc.anonzero_in_smaps=1
>>
>> And dynamically set/observe it in
>>
>> /sys/module/proc/parameters/anonzero_in_smaps
>>
> Regarding the use of module parameters versus core_param, I think
> either approach is ok.
> Currently, the core_param I'm using also supports two modification methods:
>
> 1. Command line parameter: count_zero_page=Y
> 2. After system boot, view or modify it via
> /sys/module/kernel/parameters/count_zero_page
>
> It is true that this modification would be more appropriately placed
> in the proc module,
> which aligns with your earlier suggestion.
>
> If there is a possibility of further iteration on this change, I will
> move it to the proc module instead
> of using a core_param
>
>>>
>>> This feature is mainly used to debug large folio mechanism, which
>>> pre-allocates and map more pages than actually needed, leading to memory
>>> waste from unaccessed pages.
>>>
>>> Export the result in /proc/pid/smaps as "AnonZero" field.
>>>
>>> Link: https://lore.kernel.org/linux-mm/20260210043456.2137482-1-haowenchao22@gmail.com/
>>> Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
>>> ---
>>> Documentation/filesystems/proc.rst | 5 +++++
>>> fs/proc/task_mmu.c | 10 ++++++++++
>>> 2 files changed, 15 insertions(+)
>>>
>>> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
>>> index b0c0d1b45b99..573c8b015e39 100644
>>> --- a/Documentation/filesystems/proc.rst
>>> +++ b/Documentation/filesystems/proc.rst
>>> @@ -545,6 +545,11 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
>>> does not take into account swapped out page of underlying shmem objects.
>>> "Locked" indicates whether the mapping is locked in memory or not.
>>>
>>> +"AnonZero" shows the size of anonymous pages that have never been accessed
>>> +after mapping, and it can reflect the memory waste caused by huge pages.
>>
>> That's not correct. They could be read/written, but with zeroes.
>>
>>> +Implemented by scanning the size of zero-filled pages of the VMA. It
>>> +is default disabled, and enabled via cmdline param "count_zero_page=true".
>>
>> Probably best to keep it simpler:
>>
>> "AnonZero" shows the size of anonymous pages that contain zeroes. Zero-filled
>> pages can indicate memory waste caused by memory-overallocation with (m)THPs.
>> Availability is controlled through XYZ.
>>
> As with your earlier suggestion, if this change is to be iterated on
> further, I will update it according
> to your suggestion.
Let's wait first for other opinions. Willy doesn't seem to like it if
there is no use case besides for limited experiments where you could
just use the OOT patch, possibly.
--
Cheers,
David
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 14:22 ` Matthew Wilcox
@ 2026-02-16 15:55 ` Wenchao Hao
0 siblings, 0 replies; 26+ messages in thread
From: Wenchao Hao @ 2026-02-16 15:55 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel
On Mon, Feb 16, 2026 at 10:23 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sat, Feb 14, 2026 at 04:45:14PM +0800, Wenchao Hao wrote:
> > Add kernel command line option "count_zero_page" to track anonymous pages
> > have been allocated and mapped to userspace but zero-filled.
> >
> > This feature is mainly used to debug large folio mechanism, which
> > pre-allocates and map more pages than actually needed, leading to memory
> > waste from unaccessed pages.
>
> Why are you trying to get this upstream when you admitted in an earlier
> email this is just for your internal use?
>
I see this as a debugging feature, not limited to internal use only.

Our real goal is to gain more precise visibility into how system memory is used.
A basic requirement is to measure the memory overhead caused by anonymous
hugepages that have been pre-allocated but never accessed.

With this information, we can implement various policies:
- Allocate only 4K pages for applications that suffer severe memory waste from
  anonymous hugepages.
- Evaluate per-process hugepage waste during low system load and proactively
  split hugepages accordingly.

So I believe this debugging feature still provides value when merged upstream.

Currently, there is no effective way to account for memory waste from
pre-allocated but unused anonymous hugepages, and this feature fills that gap.
Or do you have any suggestions about how to get this info?

> Why do you think that "unaccessed pages" are the only, or even the
> largest source of extra memory consumption? The vast majority of files
> are never mmaped.
>
In my view, memory waste from anonymous hugepages is less acceptable than that
from file pages.

Although file pages may also be unmapped, a cache hit can still reduce
I/O overhead. By contrast, pre-allocated anonymous hugepages that are never
accessed represent pure waste.

Furthermore, the total number of unmapped file pages can already be
estimated from /proc/meminfo, so we can already apply policies to control
file page waste.

From my research, many memory-sensitive environments already apply
special policies for file pages, for example the RFC patch from vivo that
manages readahead file pages separately:
https://lore.kernel.org/linux-mm/20250916072226.220426-1-liulei.rjpt@vivo.com/

But file page waste is not the main point I want to focus on here.

Thanks.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 11:58 ` Kiryl Shutsemau
2026-02-16 12:19 ` David Hildenbrand (Arm)
@ 2026-02-16 15:59 ` Wenchao Hao
2026-02-16 16:42 ` Michal Hocko
2026-02-16 16:54 ` Kiryl Shutsemau
1 sibling, 2 replies; 26+ messages in thread
From: Wenchao Hao @ 2026-02-16 15:59 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: David Hildenbrand (Arm),
Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
On Mon, Feb 16, 2026 at 7:58 PM Kiryl Shutsemau <kirill@shutemov.name> wrote:
>
> On Mon, Feb 16, 2026 at 12:45:13PM +0100, David Hildenbrand (Arm) wrote:
> > On 2/16/26 12:34, Kiryl Shutsemau wrote:
> > > On Sat, Feb 14, 2026 at 04:45:14PM +0800, Wenchao Hao wrote:
> > > > Add kernel command line option "count_zero_page" to track anonymous pages
> > > > have been allocated and mapped to userspace but zero-filled.
> > > >
> > > > This feature is mainly used to debug large folio mechanism, which
> > > > pre-allocates and map more pages than actually needed, leading to memory
> > > > waste from unaccessed pages.
> > > >
> > > > Export the result in /proc/pid/smaps as "AnonZero" field.
> > >
> > > I expect it to slowdown /proc/pid/smaps read substantially. I don't
> > > think this line in smaps worth it.
> > >
> >
> > That's why it's enabled through a command line parameter.
>
> One users want the stat and all users on the machine pay the price?
> That's a poor trade off.
>
> In general, smaps scales poorly. It collects a lot of stats and most of
> them are ignored by user. We need something like statx(2) where user can
> declare what he is interested in, so kernel won't waste cycles.
>
I initially considered two approaches:

First, exposing the needed information via smaps. This does incur some
performance cost but is the simplest to implement. The new feature can be
dynamically toggled via a command-line parameter. When disabled, the
overhead is negligible: only a minor if check, which is insignificant compared
to the full smaps cost.

Second, adding a new system call or extending madvise with a new command
like MADV_GET_ZEROANON. Userspace tools can then use it to measure
memory waste from zero-filled anonymous huge pages.

This is slightly more complex but minimizes system impact: environments that
don't care about zero-filled anonymous pages pay zero overhead when the
command is not used.

The exact implementation approach can be discussed after we confirm whether
the upstream kernel needs this debugging feature.

Thanks.

> Kernel cmdline parameter is the wrong place to declare what stats you
> want to see.
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 15:59 ` Wenchao Hao
@ 2026-02-16 16:42 ` Michal Hocko
2026-02-16 16:56 ` David Hildenbrand (Arm)
2026-02-16 16:54 ` Kiryl Shutsemau
1 sibling, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2026-02-16 16:42 UTC (permalink / raw)
To: Wenchao Hao
Cc: Kiryl Shutsemau, David Hildenbrand (Arm),
Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, linux-mm,
linux-kernel
On Mon 16-02-26 23:59:50, Wenchao Hao wrote:
> On Mon, Feb 16, 2026 at 7:58 PM Kiryl Shutsemau <kirill@shutemov.name> wrote:
> >
> > On Mon, Feb 16, 2026 at 12:45:13PM +0100, David Hildenbrand (Arm) wrote:
> > > On 2/16/26 12:34, Kiryl Shutsemau wrote:
> > > > On Sat, Feb 14, 2026 at 04:45:14PM +0800, Wenchao Hao wrote:
> > > > > Add kernel command line option "count_zero_page" to track anonymous pages
> > > > > have been allocated and mapped to userspace but zero-filled.
> > > > >
> > > > > This feature is mainly used to debug large folio mechanism, which
> > > > > pre-allocates and map more pages than actually needed, leading to memory
> > > > > waste from unaccessed pages.
> > > > >
> > > > > Export the result in /proc/pid/smaps as "AnonZero" field.
> > > >
> > > > I expect it to slowdown /proc/pid/smaps read substantially. I don't
> > > > think this line in smaps worth it.
> > > >
> > >
> > > That's why it's enabled through a command line parameter.
> >
> > One users want the stat and all users on the machine pay the price?
> > That's a poor trade off.
> >
> > In general, smaps scales poorly. It collects a lot of stats and most of
> > them are ignored by user. We need something like statx(2) where user can
> > declare what he is interested in, so kernel won't waste cycles.
> >
>
> I initially considered two approaches:
>
> First, exposing the needed information via smaps. This does incur some
> performance cost but is the simplest to implement. The new feature can be
> dynamically toggled via a command-line parameter. When disabled, the
> overhead is negligible—only a minor if check, which is insignificant compared
> to the full smaps cost.
You are comparing the content of all anon pages, aren't you? Depending on
the content this can add a lot of overhead.
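
To make the content dependence concrete, here is a rough userspace analogue
of the check. It is a sketch only: scan_page_for_zero() and SKETCH_PAGE_SIZE
are hypothetical names, a 4 KiB page is assumed, and it is not the kernel's
pages_identical() implementation. A page with early non-zero data is
rejected after a few words, while a page that really is zero-filled has to
be read to the end, so the cost is highest exactly for the pages being
counted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4 KiB base pages */

/*
 * Userspace analogue of comparing a page against the zero page.
 * Returns the number of 64-bit words examined before reaching a verdict;
 * *is_zero is set to 1 only if every word in the page was zero.
 */
size_t scan_page_for_zero(const uint64_t *page, int *is_zero)
{
	size_t nwords = SKETCH_PAGE_SIZE / sizeof(uint64_t);
	size_t i;

	assert(page != NULL && is_zero != NULL);

	for (i = 0; i < nwords; i++) {
		if (page[i] != 0) {
			*is_zero = 0;
			return i + 1;	/* stopped at the first non-zero word */
		}
	}

	*is_zero = 1;
	return nwords;		/* the whole page had to be read */
}
```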
> Second, adding a new system call or extending madvise with a new command
> like MADV_GET_ZEROANON. Userspace tools can then use it to measure
> memory waste from zero-filled anonymous huge pages.
MADV_GET_ZEROPAGE would make more sense to me. But a more fundamental
question is whether this metric is really what you want long term.
The kernel can do all sorts of optimizations behind userspace's back - e.g.
map the shared zero page - so just learning that a process has a lot of
pages filled up with zeroes doesn't tell you all that much. Or does it?

Also think about what you would do with those numbers. Any action performed
later is inherently racy. Aren't you really looking for something like
MADV_COMPACT or maybe even MADV_COMPRESS?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 15:59 ` Wenchao Hao
2026-02-16 16:42 ` Michal Hocko
@ 2026-02-16 16:54 ` Kiryl Shutsemau
2026-02-16 17:01 ` Matthew Wilcox
1 sibling, 1 reply; 26+ messages in thread
From: Kiryl Shutsemau @ 2026-02-16 16:54 UTC (permalink / raw)
To: Wenchao Hao
Cc: David Hildenbrand (Arm),
Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
On Mon, Feb 16, 2026 at 11:59:50PM +0800, Wenchao Hao wrote:
> On Mon, Feb 16, 2026 at 7:58 PM Kiryl Shutsemau <kirill@shutemov.name> wrote:
> >
> > On Mon, Feb 16, 2026 at 12:45:13PM +0100, David Hildenbrand (Arm) wrote:
> > > On 2/16/26 12:34, Kiryl Shutsemau wrote:
> > > > On Sat, Feb 14, 2026 at 04:45:14PM +0800, Wenchao Hao wrote:
> > > > > Add kernel command line option "count_zero_page" to track anonymous pages
> > > > > have been allocated and mapped to userspace but zero-filled.
> > > > >
> > > > > This feature is mainly used to debug large folio mechanism, which
> > > > > pre-allocates and map more pages than actually needed, leading to memory
> > > > > waste from unaccessed pages.
> > > > >
> > > > > Export the result in /proc/pid/smaps as "AnonZero" field.
> > > >
> > > > I expect it to slowdown /proc/pid/smaps read substantially. I don't
> > > > think this line in smaps worth it.
> > > >
> > >
> > > That's why it's enabled through a command line parameter.
> >
> > One users want the stat and all users on the machine pay the price?
> > That's a poor trade off.
> >
> > In general, smaps scales poorly. It collects a lot of stats and most of
> > them are ignored by user. We need something like statx(2) where user can
> > declare what he is interested in, so kernel won't waste cycles.
> >
>
> I initially considered two approaches:
>
> First, exposing the needed information via smaps. This does incur some
> performance cost but is the simplest to implement. The new feature can be
> dynamically toggled via a command-line parameter. When disabled, the
> overhead is negligible—only a minor if check, which is insignificant compared
> to the full smaps cost.
>
> Second, adding a new system call or extending madvise with a new command
> like MADV_GET_ZEROANON. Userspace tools can then use it to measure
> memory waste from zero-filled anonymous huge pages.
>
> This is slightly more complex but minimizes system impact: environments that
> don’t care about zero-filled anonymous pages pay zero overhead when the
> command is not used.
>
> The exact implementation approach can be discussed after we confirm whether
> the upstream kernel needs this debugging feature.
What I would like to see in the kernel is a syscall that returns the
memory stats in binary form. Something like

	size_t memstat(int pidfd, struct memstat memstatbuf[], size_t n,
		       unsigned long flags, unsigned long start, unsigned long end);

The syscall will fill up to n memstatbufs, one per VMA. What exactly is
filled in is defined by flags. The return value is how many memstatbufs
were populated. The caller can call it multiple times to walk the address
space it is interested in.

We can also have a flag that mirrors smaps_rollup behaviour and collects
all the data into a single memstatbuf.

Internally, the kernel can use the infrastructure built for this syscall
to provide /proc/<PID>/{maps,smaps,smaps_rollup}. This way we will not
duplicate the code.
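
To make the calling convention concrete, here is a minimal userspace
sketch of the batch-and-resume walk described above. Nothing in it exists
in the kernel: the layout of struct memstat, the MEMSTAT_* flag bits, and
the memstat_mock() stand-in (which takes a fake address space instead of a
pidfd) are all assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: the proposal does not define struct memstat. */
struct memstat {
	unsigned long start;   /* VMA start address */
	unsigned long end;     /* VMA end address (exclusive) */
	unsigned long rss_kb;  /* reported only if MEMSTAT_RSS is requested */
	unsigned long anon_kb; /* reported only if MEMSTAT_ANON is requested */
};

/* Hypothetical flag bits selecting which statistics to collect. */
#define MEMSTAT_RSS  (1UL << 0)
#define MEMSTAT_ANON (1UL << 1)

/* Stand-in for a process address space; a real call would take a pidfd. */
struct mock_mm {
	const struct memstat *vmas; /* sorted by start, non-overlapping */
	size_t nr_vmas;
};

/*
 * Mock with the proposed calling convention: fill up to n entries for
 * VMAs overlapping [start, end) and return how many were populated.
 */
static size_t memstat_mock(const struct mock_mm *mm, struct memstat buf[],
			   size_t n, unsigned long flags,
			   unsigned long start, unsigned long end)
{
	size_t filled = 0;

	assert(buf != NULL || n == 0);
	for (size_t i = 0; i < mm->nr_vmas && filled < n; i++) {
		const struct memstat *vma = &mm->vmas[i];

		if (vma->end <= start || vma->start >= end)
			continue;
		buf[filled].start = vma->start;
		buf[filled].end = vma->end;
		buf[filled].rss_kb = (flags & MEMSTAT_RSS) ? vma->rss_kb : 0;
		buf[filled].anon_kb = (flags & MEMSTAT_ANON) ? vma->anon_kb : 0;
		filled++;
	}
	return filled;
}

/* Walk a range two VMAs at a time, resuming after the last VMA returned. */
static unsigned long total_rss_kb(const struct mock_mm *mm,
				  unsigned long start, unsigned long end)
{
	struct memstat batch[2];
	unsigned long total = 0;
	size_t got;

	while ((got = memstat_mock(mm, batch, 2, MEMSTAT_RSS, start, end)) > 0) {
		for (size_t i = 0; i < got; i++)
			total += batch[i].rss_kb;
		start = batch[got - 1].end; /* continue past the last VMA seen */
	}
	return total;
}
```

The caller only pays for the statistics it names in flags, and a small
fixed-size buffer is enough to walk an arbitrarily large address space.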
--
Kiryl Shutsemau / Kirill A. Shutemov
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 16:42 ` Michal Hocko
@ 2026-02-16 16:56 ` David Hildenbrand (Arm)
2026-02-16 17:10 ` Michal Hocko
0 siblings, 1 reply; 26+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-16 16:56 UTC (permalink / raw)
To: Michal Hocko, Wenchao Hao
Cc: Kiryl Shutsemau, Andrew Morton, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, linux-mm, linux-kernel
On 2/16/26 17:42, Michal Hocko wrote:
> On Mon 16-02-26 23:59:50, Wenchao Hao wrote:
>> On Mon, Feb 16, 2026 at 7:58 PM Kiryl Shutsemau <kirill@shutemov.name> wrote:
>>>
>>>
>>> One users want the stat and all users on the machine pay the price?
>>> That's a poor trade off.
>>>
>>> In general, smaps scales poorly. It collects a lot of stats and most of
>>> them are ignored by user. We need something like statx(2) where user can
>>> declare what he is interested in, so kernel won't waste cycles.
>>>
>>
>> I initially considered two approaches:
>>
>> First, exposing the needed information via smaps. This does incur some
>> performance cost but is the simplest to implement. The new feature can be
>> dynamically toggled via a command-line parameter. When disabled, the
>> overhead is negligible—only a minor if check, which is insignificant compared
>> to the full smaps cost.
>
> You are comparing content of all anon pages, aren't you? Depending on
> the content this can add a lot of overhead.
>
>> Second, adding a new system call or extending madvise with a new command
>> like MADV_GET_ZEROANON. Userspace tools can then use it to measure
>> memory waste from zero-filled anonymous huge pages.
>
> MADV_GET_ZEROPAGE would make more sense to me. But a more fundamental
> question is whether this metric is really what you want long term.
> Kernel can do all sorts of optimizations behind userspace back - e.g.
> map shared zero page - so just learning that a process has a lot of
> pages filled up with zeroes doesn't tell you all that much. Or does it?
If a sysadmin wants to see where THP hurt (zero-filled pages), surely
MADV_GET_ZEROPAGE is the wrong (ugly) interface.
All we want are per-process stats. What am I missing?
KSM could deduplicate them, the deferred shrinker could remove them.
Unless both mechanisms are not desired or are ineffective for another
reason (e.g., page pinning).
--
Cheers,
David
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 16:54 ` Kiryl Shutsemau
@ 2026-02-16 17:01 ` Matthew Wilcox
2026-02-16 17:10 ` David Hildenbrand (Arm)
2026-02-16 17:18 ` Kiryl Shutsemau
0 siblings, 2 replies; 26+ messages in thread
From: Matthew Wilcox @ 2026-02-16 17:01 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Wenchao Hao, David Hildenbrand (Arm),
Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
On Mon, Feb 16, 2026 at 04:54:05PM +0000, Kiryl Shutsemau wrote:
> What I would like to see in the kernel is a syscall that return the
> memory stats in binary form. Something like
>
> size_t memstat(int pidfd, struct memstat memstatbuf[], size_t n,
> unsigned long flags, unsigned long start, unsigned long end);
>
> The syscall will fill up to n memstatbufs, one per-VMA. What exactly
> filled there defined by flags. The return value is how many memstatbuf
> is populated. The caller can call it multiple times to walk address
> space it is interested in.
>
> We also can have a flag that mirrors smaps_rollup behaviour and collect
> all the data into a single memstatbuf.
But is that what we want? Let's say a process allocates a 2MB THP, uses
12kB of it and then forks. A lot. Now all children that haven't called
exec() see the wasted 2036kB. Would we rather have something that scans
(say) the LRU list looking for zero memory?
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-14 8:45 [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages Wenchao Hao
` (2 preceding siblings ...)
2026-02-16 14:22 ` Matthew Wilcox
@ 2026-02-16 17:03 ` Matthew Wilcox
2026-02-17 15:22 ` Wenchao Hao
4 siblings, 0 replies; 26+ messages in thread
From: Matthew Wilcox @ 2026-02-16 17:03 UTC (permalink / raw)
To: Wenchao Hao
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel
On Sat, Feb 14, 2026 at 04:45:14PM +0800, Wenchao Hao wrote:
> + if (count_zero_page && pages_identical(page, ZERO_PAGE(0)))
Pretty sure you want memchr_inv() here?
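
For readers unfamiliar with it: memchr_inv(start, c, bytes) returns a
pointer to the first byte that is not c, or NULL if every byte matches, so
a zero check only has to read the page under test instead of comparing it
against a second page. A byte-wise userspace stand-in is sketched below;
the names memchr_inv_sketch() and page_is_zero_filled() are made up for
this illustration, and the kernel helper is more optimized than this loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's memchr_inv(): return a pointer to
 * the first byte in [start, start + bytes) that differs from c, or NULL
 * if the whole range consists of c.
 */
static const void *memchr_inv_sketch(const void *start, int c, size_t bytes)
{
	const unsigned char *p = start;

	for (size_t i = 0; i < bytes; i++)
		if (p[i] != (unsigned char)c)
			return p + i;
	return NULL;
}

/* A page counts as zero-filled when no non-zero byte is found in it. */
static bool page_is_zero_filled(const void *page, size_t page_size)
{
	assert(page != NULL);
	return memchr_inv_sketch(page, 0, page_size) == NULL;
}
```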
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 17:01 ` Matthew Wilcox
@ 2026-02-16 17:10 ` David Hildenbrand (Arm)
2026-02-16 17:18 ` Kiryl Shutsemau
1 sibling, 0 replies; 26+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-16 17:10 UTC (permalink / raw)
To: Matthew Wilcox, Kiryl Shutsemau
Cc: Wenchao Hao, Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
On 2/16/26 18:01, Matthew Wilcox wrote:
> On Mon, Feb 16, 2026 at 04:54:05PM +0000, Kiryl Shutsemau wrote:
>> What I would like to see in the kernel is a syscall that return the
>> memory stats in binary form. Something like
>>
>> size_t memstat(int pidfd, struct memstat memstatbuf[], size_t n,
>> unsigned long flags, unsigned long start, unsigned long end);
>>
>> The syscall will fill up to n memstatbufs, one per-VMA. What exactly
>> filled there defined by flags. The return value is how many memstatbuf
>> is populated. The caller can call it multiple times to walk address
>> space it is interested in.
>>
>> We also can have a flag that mirrors smaps_rollup behaviour and collect
>> all the data into a single memstatbuf.
>
> But is that what we want? Let's say a process allocates a 2MB THP, uses
> 12kB of it and then forks.
That's just how all of these entries (Rss, Anonymous, KSM) behave, except
the Pss ones.
--
Cheers,
David
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 16:56 ` David Hildenbrand (Arm)
@ 2026-02-16 17:10 ` Michal Hocko
2026-02-16 17:17 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2026-02-16 17:10 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Wenchao Hao, Kiryl Shutsemau, Andrew Morton, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, linux-mm, linux-kernel
On Mon 16-02-26 17:56:12, David Hildenbrand wrote:
> On 2/16/26 17:42, Michal Hocko wrote:
> > On Mon 16-02-26 23:59:50, Wenchao Hao wrote:
> > > On Mon, Feb 16, 2026 at 7:58 PM Kiryl Shutsemau <kirill@shutemov.name> wrote:
> > > >
> > > >
> > > > One users want the stat and all users on the machine pay the price?
> > > > That's a poor trade off.
> > > >
> > > > In general, smaps scales poorly. It collects a lot of stats and most of
> > > > them are ignored by user. We need something like statx(2) where user can
> > > > declare what he is interested in, so kernel won't waste cycles.
> > > >
> > >
> > > I initially considered two approaches:
> > >
> > > First, exposing the needed information via smaps. This does incur some
> > > performance cost but is the simplest to implement. The new feature can be
> > > dynamically toggled via a command-line parameter. When disabled, the
> > > overhead is negligible—only a minor if check, which is insignificant compared
> > > to the full smaps cost.
> >
> > You are comparing content of all anon pages, aren't you? Depending on
> > the content this can add a lot of overhead.
> >
> > > Second, adding a new system call or extending madvise with a new command
> > > like MADV_GET_ZEROANON. Userspace tools can then use it to measure
> > > memory waste from zero-filled anonymous huge pages.
> >
> > MADV_GET_ZEROPAGE would make more sense to me. But a more fundamental
> > question is whether this metric is really what you want long term.
> > Kernel can do all sorts of optimizations behind userspace back - e.g.
> > map shared zero page - so just learning that a process has a lot of
> > pages filled up with zeroes doesn't tell you all that much. Or does it?
>
> If a sysadmin wants to see where THP hurt (zero-filled pages), surely
> MADV_GET_ZEROPAGE is the wrong (ugly) interface.
The question is whether a sysadmin should really ask questions like that.
Without a deeper understanding of the workload the answer could be
misleading at best, no matter what interface is available.

If you know and understand the workload you already know that THP is not
a good fit and you do not need to ask about that. If you want to
understand whether your particular workload has significant internal
fragmentation due to THPs then MADV_GET_ZEROPAGE sounds like a
reasonable fit to me.

From a sysadmin POV you care about the overall memory consumption, right?
And for that I believe you need some sort of high level compression or
similar interface.
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 17:10 ` Michal Hocko
@ 2026-02-16 17:17 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 26+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-16 17:17 UTC (permalink / raw)
To: Michal Hocko
Cc: Wenchao Hao, Kiryl Shutsemau, Andrew Morton, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, linux-mm, linux-kernel
On 2/16/26 18:10, Michal Hocko wrote:
> On Mon 16-02-26 17:56:12, David Hildenbrand wrote:
>> On 2/16/26 17:42, Michal Hocko wrote:
>>>
>>> You are comparing content of all anon pages, aren't you? Depending on
>>> the content this can add a lot of overhead.
>>>
>>>
>>> MADV_GET_ZEROPAGE would make more sense to me. But a more fundamental
>>> question is whether this metric is really what you want long term.
>>> Kernel can do all sorts of optimizations behind userspace back - e.g.
>>> map shared zero page - so just learning that a process has a lot of
>>> pages filled up with zeroes doesn't tell you all that much. Or does it?
>>
>> If a sysadmin wants to see where THP hurt (zero-filled pages), surely
>> MADV_GET_ZEROPAGE is the wrong (ugly) interface.
>
> The question is whether sysadmin should really ask questions like that.
> Without a deeper understanding of the workload the answer could be
> misleading at best, no matter what interface is available.
Given the requests for per-process control of THPs I assume some
sysadmins (at hyperscalers :) ) really care about the relevant processes
and even the relevant memory areas (e.g., the jemalloc area).
>
> If you know and understand the workload you already know that THP is not
> a good fit and you do not need to ask about that. If you want to
> understand whether your particular workload has a big internal
> fragmentation due to THPs then MADV_GET_ZEROPAGE sounds like a
> reasonable fit to me.
You could also just scan the pages yourself for 0-content, or what's the
benefit of letting the kernel do that?
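
As a rough illustration of the do-it-yourself option, a process can count
zero-filled pages in one of its own mappings as sketched below. This is
only a sketch under stated assumptions: count_zero_pages() and
demo_sparse_mapping() are hypothetical helper names, the scan covers only
the caller's own address space, and reading every page means the scan
itself touches the memory it inspects.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the pages in [addr, addr + len) whose bytes are all zero. */
static size_t count_zero_pages(const void *addr, size_t len, size_t page_size)
{
	const unsigned char *p = addr;
	size_t zero_pages = 0;

	assert(page_size > 0 && len % page_size == 0);
	for (size_t off = 0; off < len; off += page_size) {
		size_t i = 0;

		while (i < page_size && p[off + i] == 0)
			i++;
		if (i == page_size)
			zero_pages++;
	}
	return zero_pages;
}

/*
 * Map nr_pages of private anonymous memory, dirty the first `used` pages,
 * and return how many pages still read back as zeroes. Returns (size_t)-1
 * if the mapping cannot be created.
 */
static size_t demo_sparse_mapping(size_t nr_pages, size_t used)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = nr_pages * page_size;
	unsigned char *map;
	size_t zero_pages;

	assert(used <= nr_pages);
	map = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return (size_t)-1;
	memset(map, 0xaa, used * page_size);
	zero_pages = count_zero_pages(map, len, page_size);
	munmap(map, len);
	return zero_pages;
}
```

With 512 pages mapped and 3 dirtied, 509 pages remain zero-filled, which
with 4kB pages corresponds to the 2036kB from the fork example earlier in
the thread.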
>
> From a sysadmin POV you care about the overall memory consuption, right?
I'm not so sure about that.
--
Cheers,
David
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-16 17:01 ` Matthew Wilcox
2026-02-16 17:10 ` David Hildenbrand (Arm)
@ 2026-02-16 17:18 ` Kiryl Shutsemau
1 sibling, 0 replies; 26+ messages in thread
From: Kiryl Shutsemau @ 2026-02-16 17:18 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Wenchao Hao, David Hildenbrand (Arm),
Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
On Mon, Feb 16, 2026 at 05:01:51PM +0000, Matthew Wilcox wrote:
> On Mon, Feb 16, 2026 at 04:54:05PM +0000, Kiryl Shutsemau wrote:
> > What I would like to see in the kernel is a syscall that return the
> > memory stats in binary form. Something like
> >
> > size_t memstat(int pidfd, struct memstat memstatbuf[], size_t n,
> > unsigned long flags, unsigned long start, unsigned long end);
> >
> > The syscall will fill up to n memstatbufs, one per-VMA. What exactly
> > filled there defined by flags. The return value is how many memstatbuf
> > is populated. The caller can call it multiple times to walk address
> > space it is interested in.
> >
> > We also can have a flag that mirrors smaps_rollup behaviour and collect
> > all the data into a single memstatbuf.
>
> But is that what we want? Let's say a process allocates a 2MB THP, uses
> 12kB of it and then forks. A lot. Now all children that haven't called
> exec() see the wasted 2036kB. Would we rather have something that scans
> (say) the LRU list looking for zero memory?
Are you describing the THP shrinker? :)

I don't particularly care about the original poster's use-case. I want to
see a more scalable way to access memory statistics in general.
smaps is slow because it collects information the user doesn't care about.
--
Kiryl Shutsemau / Kirill A. Shutemov
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-14 8:45 [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages Wenchao Hao
` (3 preceding siblings ...)
2026-02-16 17:03 ` Matthew Wilcox
@ 2026-02-17 15:22 ` Wenchao Hao
2026-02-17 20:29 ` David Hildenbrand (Arm)
2026-02-18 7:52 ` Michal Hocko
4 siblings, 2 replies; 26+ messages in thread
From: Wenchao Hao @ 2026-02-17 15:22 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel
On Sat, Feb 14, 2026 at 4:45 PM Wenchao Hao <haowenchao22@gmail.com> wrote:
>
> Add kernel command line option "count_zero_page" to track anonymous pages
> have been allocated and mapped to userspace but zero-filled.
>
> This feature is mainly used to debug large folio mechanism, which
> pre-allocates and map more pages than actually needed, leading to memory
> waste from unaccessed pages.
>
> Export the result in /proc/pid/smaps as "AnonZero" field.
>
> Link: https://lore.kernel.org/linux-mm/20260210043456.2137482-1-haowenchao22@gmail.com/
Sorry for the late reply. We are now on Chinese New Year holiday, so...
The original goal of this patch is to measure memory waste from anonymous
THPs - pages pre-allocated on fault but never accessed.
On memory-sensitive devices like mobile phones, this helps us make better
decisions about when and how to enable THP. I think this is useful for
guiding THP policies, even as a debugging feature.
Let me summarize the discussion so far:
- Matthew Wilcox questioned the value and raised concerns about the
  fork-without-exec path.
- Michal Hocko criticized the inefficiency of scanning zero-filled pages.
- Kiryl Shutsemau prefers a system-call-based interface.
- David Hildenbrand acknowledged the value and suggested implementation
improvements.
Please correct me if I missed or misrepresented anything.
I suggest we first agree whether this functionality is useful for upstream,
before discussing implementation details.
My reasons why this should go upstream:
- Anonymous THP can introduce real memory waste, but we currently have no
good way to measure it.
- With accurate metrics, we can make better THP policy decisions: disable
  THP for low-utilization cases, unmap early to relieve memory pressure,
  and so on. This is especially valuable for mobile/embedded devices.
Possible implementations:
1. A new smaps counter (default-off) to count zero-filled pages.
2. A new madvise command like MADV_GET_ZEROPAGE
3. A dedicated system call
I welcome feedback on whether this is useful, and any better approaches.
Thank you.
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-17 15:22 ` Wenchao Hao
@ 2026-02-17 20:29 ` David Hildenbrand (Arm)
2026-02-17 21:53 ` Kiryl Shutsemau
2026-02-18 7:52 ` Michal Hocko
1 sibling, 1 reply; 26+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-17 20:29 UTC (permalink / raw)
To: Wenchao Hao, Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel, Kiryl Shutsemau
On 2/17/26 16:22, Wenchao Hao wrote:
> On Sat, Feb 14, 2026 at 4:45 PM Wenchao Hao <haowenchao22@gmail.com> wrote:
>>
>> Add kernel command line option "count_zero_page" to track anonymous pages
>> have been allocated and mapped to userspace but zero-filled.
>>
>> This feature is mainly used to debug large folio mechanism, which
>> pre-allocates and map more pages than actually needed, leading to memory
>> waste from unaccessed pages.
>>
>> Export the result in /proc/pid/smaps as "AnonZero" field.
>>
>> Link: https://lore.kernel.org/linux-mm/20260210043456.2137482-1-haowenchao22@gmail.com/
>
> Sorry for the late reply. We are now on Chinese New Year holiday, so...
>
> The original goal of this patch is to measure memory waste from anonymous
> THPs - pages pre-allocated on fault but never accessed.
>
> On memory-sensitive devices like mobile phones, this helps us make better
> decisions about when and how to enable THP. I think this is useful for
> guiding THP policies, even as a debugging feature.
>
> Let me summarize the discussion so far:
> - Matthew Wilcox questioned the value and raised concerns fork but haven't
> exec path
> - Michal Hocko criticized the inefficiency of scanning zero-filled pages.
> - Kiryl Shutsemau prefers a system-call-based interface.
> - David Hildenbrand acknowledged the value and suggested implementation
> improvements.
> Please correct me if I missed or misrepresented anything.
>
> I suggest we first agree whether this functionality is useful for upstream,
> before discussing implementation details.
>
> Reasons why this should go upstream from me:
>
> - Anonymous THP can introduce real memory waste, but we currently have no
> good way to measure it.
> - With accurate metrics, we can make better THP policy: disable for
> low-utilization cases, or early-unmap to relieve memory pressure and so
> on. This is especially valuable for mobile/embedded devices.
>
> Possible implementations:
>
> 1. A new smaps counter (default-off) to count zero-filled pages.
> 2. A new madvise command like MADV_GET_ZEROPAGE
> 3. A dedicated system call
I understand Kiryl's point that smaps already provides too much
information that users might not be interested in. So sorting that out
might provide a real benefit to other users that are only interested in
specific stats (e.g., Rss).
Providing a system call where one can specify/filter in theory sounds
like a good idea. A syscall implies that one has to write a tool to
obtain these metrics.
The nice thing about smaps/smaps_rollup is that it can be easily
consumed on any system while debugging.
I wonder if there could be a way to achieve something similar with a
file. Likely not, but maybe someone reading along can surprise me :)
Otherwise we'd have to go with a tool.
--
Cheers,
David
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-17 20:29 ` David Hildenbrand (Arm)
@ 2026-02-17 21:53 ` Kiryl Shutsemau
2026-02-19 2:11 ` Wenchao Hao
0 siblings, 1 reply; 26+ messages in thread
From: Kiryl Shutsemau @ 2026-02-17 21:53 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Wenchao Hao, Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
On Tue, Feb 17, 2026 at 09:29:02PM +0100, David Hildenbrand (Arm) wrote:
> On 2/17/26 16:22, Wenchao Hao wrote:
> > On Sat, Feb 14, 2026 at 4:45 PM Wenchao Hao <haowenchao22@gmail.com> wrote:
> > >
> > > Add kernel command line option "count_zero_page" to track anonymous pages
> > > have been allocated and mapped to userspace but zero-filled.
> > >
> > > This feature is mainly used to debug large folio mechanism, which
> > > pre-allocates and map more pages than actually needed, leading to memory
> > > waste from unaccessed pages.
> > >
> > > Export the result in /proc/pid/smaps as "AnonZero" field.
> > >
> > > Link: https://lore.kernel.org/linux-mm/20260210043456.2137482-1-haowenchao22@gmail.com/
> >
> > Sorry for the late reply. We are now on Chinese New Year holiday, so...
> >
> > The original goal of this patch is to measure memory waste from anonymous
> > THPs - pages pre-allocated on fault but never accessed.
> >
> > On memory-sensitive devices like mobile phones, this helps us make better
> > decisions about when and how to enable THP. I think this is useful for
> > guiding THP policies, even as a debugging feature.
> >
> > Let me summarize the discussion so far:
> > - Matthew Wilcox questioned the value and raised concerns fork but haven't
> > exec path
> > - Michal Hocko criticized the inefficiency of scanning zero-filled pages.
> > - Kiryl Shutsemau prefers a system-call-based interface.
> > - David Hildenbrand acknowledged the value and suggested implementation
> > improvements.
> > Please correct me if I missed or misrepresented anything.
> >
> > I suggest we first agree whether this functionality is useful for upstream,
> > before discussing implementation details.
> >
> > Reasons why this should go upstream from me:
> >
> > - Anonymous THP can introduce real memory waste, but we currently have no
> > good way to measure it.
> > - With accurate metrics, we can make better THP policy: disable for
> > low-utilization cases, or early-unmap to relieve memory pressure and so
> > on. This is especially valuable for mobile/embedded devices.
> >
> > Possible implementations:
> >
> > 1. A new smaps counter (default-off) to count zero-filled pages.
> > 2. A new madvise command like MADV_GET_ZEROPAGE
> > 3. A dedicated system call
>
> I understand Kiyls point about smaps providing too much information users
> might not be interested in already. So sorting that out might provide a real
> benefit to other users that are only interested in specific stats (e.g.,
> Rss).
You can also limit the range of virtual address space you want to look
at.
> Providing a system call where one can specify/filter in theory sounds like a
> good idea. A syscall implies that one has to write a tool to obtain these
> metrics.
>
> The nice thing about smaps/smaps_rollup is that it can be easily consumed on
> any system while debugging.
>
> I wonder if there could be a way to achieve something similar with a file.
> Likely not, but maybe someone reading along can surprise me :)
I guess you can open a file, write to it what you want to get, and then
read the result back. It is awkward to keep a file descriptor around from
the shell, but doable.
> Otherwise we'd have to go with a tool.
A tool might be more ergonomic.
To minimize friction, it would be nice to put the tool into util-linux
(or whatever the trendy Rust rewrite is called), so it would find its way
to every machine. Eventually.
--
Kiryl Shutsemau / Kirill A. Shutemov
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-17 15:22 ` Wenchao Hao
2026-02-17 20:29 ` David Hildenbrand (Arm)
@ 2026-02-18 7:52 ` Michal Hocko
2026-02-19 2:47 ` Wenchao Hao
1 sibling, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2026-02-18 7:52 UTC (permalink / raw)
To: Wenchao Hao
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, linux-mm, linux-kernel
On Tue 17-02-26 23:22:20, Wenchao Hao wrote:
> On Sat, Feb 14, 2026 at 4:45 PM Wenchao Hao <haowenchao22@gmail.com> wrote:
> >
> > Add kernel command line option "count_zero_page" to track anonymous pages
> > have been allocated and mapped to userspace but zero-filled.
> >
> > This feature is mainly used to debug large folio mechanism, which
> > pre-allocates and map more pages than actually needed, leading to memory
> > waste from unaccessed pages.
> >
> > Export the result in /proc/pid/smaps as "AnonZero" field.
> >
> > Link: https://lore.kernel.org/linux-mm/20260210043456.2137482-1-haowenchao22@gmail.com/
>
> Sorry for the late reply. We are now on Chinese New Year holiday, so...
>
> The original goal of this patch is to measure memory waste from anonymous
> THPs - pages pre-allocated on fault but never accessed.
I believe you wanted to say "but never modified". Unless you map THPs
through PTEs, you simply do not have that information. Reading
zeroes might be just what your workload needs (e.g. large sparse data
structures).
> On memory-sensitive devices like mobile phones, this helps us make better
> decisions about when and how to enable THP. I think this is useful for
> guiding THP policies, even as a debugging feature.
>
> Let me summarize the discussion so far:
> - Matthew Wilcox questioned the value and raised concerns fork but haven't
> exec path
> - Michal Hocko criticized the inefficiency of scanning zero-filled pages.
Let me clarify. I am not objecting to the inefficiency. _If_ you need to
recognize zero content then there is no way around it. I have merely
pointed out that the overhead for /proc/<pid>/smaps is not negligible,
contrary to what you suggested.
> - Kiryl Shutsemau prefers a system-call-based interface.
> - David Hildenbrand acknowledged the value and suggested implementation
> improvements.
> Please correct me if I missed or misrepresented anything.
>
> I suggest we first agree whether this functionality is useful for upstream,
> before discussing implementation details.
Completely agreed!
> Reasons why this should go upstream from me:
>
> - Anonymous THP can introduce real memory waste, but we currently have no
> good way to measure it.
> - With accurate metrics, we can make better THP policy: disable for
> low-utilization cases, or early-unmap to relieve memory pressure and so
> on. This is especially valuable for mobile/embedded devices.
While I agree with your first point, I am not so sure about the second.
You can easily run the same workload with and without THP enabled and
compare the rss to learn about typical internal fragmentation (there
are several levels of precision you can control - only for the process,
madvise...). This is a very crude estimate, but it gives you some
picture. Is it convenient? Not at all, but it is likely sufficient if you
are debugging a reproducible workload.
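
A minimal sketch of how such a comparison could be scripted, assuming the
numbers are taken from /proc/<pid>/smaps_rollup once per run. The helper
name smaps_field_kb() is made up for this illustration; "Rss:" and
"AnonHugePages:" are existing smaps fields.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value in kB of the first line that starts with `field`
 * (for example "Rss:"), or -1 if no such line is found.
 */
static long smaps_field_kb(FILE *f, const char *field)
{
	char line[256];
	size_t flen = strlen(field);
	long kb;

	assert(f != NULL);
	rewind(f); /* allow several lookups on the same stream */
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, field, flen) == 0 &&
		    sscanf(line + flen, "%ld", &kb) == 1)
			return kb;
	}
	return -1;
}
```

Opening /proc/<pid>/smaps_rollup with fopen() for a THP-enabled run and a
THP-disabled run of the same workload and subtracting the two "Rss:"
values gives the crude estimate described above.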
So I would start by explaining why this crude approach is not really
feasible. You are talking about early-unmap. How exactly do you envision
this being done? I mean, finding zero pages is one thing, but how do you
make an educated guess that a particular sparsely used page needs to
be broken down and partially unmapped? What kind of interface do you
want to use for that? MADV_FREE for all zero subranges?
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-17 21:53 ` Kiryl Shutsemau
@ 2026-02-19 2:11 ` Wenchao Hao
0 siblings, 0 replies; 26+ messages in thread
From: Wenchao Hao @ 2026-02-19 2:11 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: David Hildenbrand (Arm),
Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
On Wed, Feb 18, 2026 at 5:53 AM Kiryl Shutsemau <kas@kernel.org> wrote:
>
> On Tue, Feb 17, 2026 at 09:29:02PM +0100, David Hildenbrand (Arm) wrote:
> > On 2/17/26 16:22, Wenchao Hao wrote:
> > > On Sat, Feb 14, 2026 at 4:45 PM Wenchao Hao <haowenchao22@gmail.com> wrote:
> > > >
> > > > Add kernel command line option "count_zero_page" to track anonymous pages
> > > > have been allocated and mapped to userspace but zero-filled.
> > > >
> > > > This feature is mainly used to debug large folio mechanism, which
> > > > pre-allocates and map more pages than actually needed, leading to memory
> > > > waste from unaccessed pages.
> > > >
> > > > Export the result in /proc/pid/smaps as "AnonZero" field.
> > > >
> > > > Link: https://lore.kernel.org/linux-mm/20260210043456.2137482-1-haowenchao22@gmail.com/
> > >
> > > Sorry for the late reply. We are now on Chinese New Year holiday, so...
> > >
> > > The original goal of this patch is to measure memory waste from anonymous
> > > THPs - pages pre-allocated on fault but never accessed.
> > >
> > > On memory-sensitive devices like mobile phones, this helps us make better
> > > decisions about when and how to enable THP. I think this is useful for
> > > guiding THP policies, even as a debugging feature.
> > >
> > > Let me summarize the discussion so far:
> > > - Matthew Wilcox questioned the value and raised concerns fork but haven't
> > > exec path
> > > - Michal Hocko criticized the inefficiency of scanning zero-filled pages.
> > > - Kiryl Shutsemau prefers a system-call-based interface.
> > > - David Hildenbrand acknowledged the value and suggested implementation
> > > improvements.
> > > Please correct me if I missed or misrepresented anything.
> > >
> > > I suggest we first agree whether this functionality is useful for upstream,
> > > before discussing implementation details.
> > >
> > > Reasons why this should go upstream from me:
> > >
> > > - Anonymous THP can introduce real memory waste, but we currently have no
> > > good way to measure it.
> > > - With accurate metrics, we can make better THP policy: disable for
> > > low-utilization cases, or early-unmap to relieve memory pressure and so
> > > on. This is especially valuable for mobile/embedded devices.
> > >
> > > Possible implementations:
> > >
> > > 1. A new smaps counter (default-off) to count zero-filled pages.
> > > 2. A new madvise command like MADV_GET_ZEROPAGE
> > > 3. A dedicated system call
> >
> > I understand Kiryl's point about smaps providing too much information users
> > might not be interested in already. So sorting that out might provide a real
> > benefit to other users that are only interested in specific stats (e.g.,
> > Rss).
>
> You can also limit the range of virtual address space you want to look
> at.
>
> > Providing a system call where one can specify/filter in theory sounds like a
> > good idea. A syscall implies that one has to write a tool to obtain these
> > metrics.
> >
> > The nice thing about smaps/smaps_rollup is that it can be easily consumed on
> > any system while debugging.
> >
> > I wonder if there could be a way to achieve something similar with a file.
> > Likely not, but maybe someone reading along can surprise me :)
>
> I guess you can open a file write to it what you want to get and then
> read. It is awkward from shell to keep file descriptor around, but doable.
>
That approach doesn't seem very elegant and could be rather complex.
In practice, we might need to analyze multiple processes simultaneously,
which would be quite difficult to implement.

> > Otherwise we'd have to go with a tool.
>
> A tool might be more ergonomic.
>
> To minimize friction, it would be nice to put the tool into util-linux
> (or whatever trendy Rust-rewrite called), so it would find its way to
> every machine. Eventually.
>
At the moment, we have two possible implementation approaches.

One is to extend smaps, with a dynamic toggle so that zero-filled pages are
counted only when explicitly enabled, avoiding unnecessary overhead by
default.

A potential downside of smaps is its relatively low information density, as
it may include data users are not interested in. In my view, however, the
rich information from smaps is still very helpful for memory analysis. For
example, when a process shows large memory fluctuations under certain
scenarios, smaps snapshots can help us investigate the root cause.

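As a rough illustration of how such a counter could be consumed from a
snapshot, here is a minimal userspace sketch that sums one smaps field over
all VMAs. The "AnonZero" field only exists with the proposed patch applied,
and the helper name sum_smaps_field_kb() is made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum the kB values of every "<field>:" line in smaps-formatted text.
 * Hypothetical helper for illustration only. Lines longer than the
 * buffer are read in pieces, which is good enough for a sketch.
 */
long sum_smaps_field_kb(FILE *fp, const char *field)
{
    char line[512];
    size_t flen;
    long total = 0;

    assert(fp != NULL && field != NULL);
    flen = strlen(field);

    while (fgets(line, sizeof(line), fp)) {
        long kb;

        /* Match "Field:" at the start of the line, e.g. "Rss:   128 kB". */
        if (strncmp(line, field, flen) != 0 || line[flen] != ':')
            continue;
        if (sscanf(line + flen + 1, "%ld kB", &kb) == 1)
            total += kb;
    }
    return total;
}
```

A caller would fopen() /proc/<pid>/smaps and pass "AnonZero" (with the patch
applied) or an existing field such as "Rss" to get a per-process total.
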
The other approach is to introduce a new system call, which could then be
packaged into a standard tool. This would allow higher information density
and filter out redundant data compared to smaps.

But so far we have been focusing on implementation details. I believe the
higher priority question is whether upstream actually needs this kind of
functionality.

I don't want to waste community time and resources discussing something
that mainline does not consider valuable. I'd appreciate hearing everyone's
thoughts on this.

Thanks.

> --
> Kiryl Shutsemau / Kirill A. Shutemov
* Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
2026-02-18 7:52 ` Michal Hocko
@ 2026-02-19 2:47 ` Wenchao Hao
0 siblings, 0 replies; 26+ messages in thread
From: Wenchao Hao @ 2026-02-19 2:47 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, linux-mm, linux-kernel
On Wed, Feb 18, 2026 at 3:52 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 17-02-26 23:22:20, Wenchao Hao wrote:
> > On Sat, Feb 14, 2026 at 4:45 PM Wenchao Hao <haowenchao22@gmail.com> wrote:
> > >
> > > Add kernel command line option "count_zero_page" to track anonymous pages
> > > have been allocated and mapped to userspace but zero-filled.
> > >
> > > This feature is mainly used to debug large folio mechanism, which
> > > pre-allocates and map more pages than actually needed, leading to memory
> > > waste from unaccessed pages.
> > >
> > > Export the result in /proc/pid/smaps as "AnonZero" field.
> > >
> > > Link: https://lore.kernel.org/linux-mm/20260210043456.2137482-1-haowenchao22@gmail.com/
> >
> > Sorry for the late reply. We are now on Chinese New Year holiday, so...
> >
> > The original goal of this patch is to measure memory waste from anonymous
> > THPs - pages pre-allocated on fault but never accessed.
>
> I believe you wanted to say "but never modified". Unless you map THP
> through ptes you simply do not have that information. Reading
> zeroes might be just what your workload needs (e.g. large sparse data
> structures).
>
Yes, my description is more focused on THPs mapped via PTEs, such as 64K
huge pages. PMD-mapped THPs are rarely available on memory-constrained
devices like mobile phones because it is hard to obtain such a large
physically contiguous range.

The reason we need to scan for zero-filled pages is that the accessed bit
in the PTEs cannot reflect whether the underlying pages are actually used.

This has been discussed earlier in the thread:
https://lore.kernel.org/linux-mm/20260210043456.2137482-1-haowenchao22@gmail.com/

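To make the zero-fill scan concrete, here is a small userspace sketch of the
idea, not the code from the patch: every byte of each page has to be read to
decide whether the page is still zero-filled. The function names and the
page_size parameter are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if buf[0..len) contains only zero bytes. */
bool range_is_zero(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len == 0)
        return true;
    /* First byte is zero and each byte equals its successor: all zero. */
    return buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0;
}

/* Count how many page_size-sized chunks of buf are entirely zero. */
size_t count_zero_pages(const unsigned char *buf, size_t len, size_t page_size)
{
    size_t zero_pages = 0;

    assert(page_size > 0 && len % page_size == 0);
    for (size_t off = 0; off < len; off += page_size)
        zero_pages += range_is_zero(buf + off, page_size);
    return zero_pages;
}
```

The cost is linear in the amount of memory scanned, which is where the
overhead concern for /proc/<pid>/smaps comes from.
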
> > On memory-sensitive devices like mobile phones, this helps us make better
> > decisions about when and how to enable THP. I think this is useful for
> > guiding THP policies, even as a debugging feature.
> >
> > Let me summarize the discussion so far:
> > - Matthew Wilcox questioned the value and raised concerns about the
> > fork-but-not-yet-exec path
> > - Michal Hocko criticized the inefficiency of scanning zero-filled pages.
>
> Let me clarify. I am not objecting to the inefficiency. _If_ you need to
> recognize zero content then there is no way around it. I have merely
> mentioned that the overhead is not negligible for /proc/<pid>/smaps as
> you suggested.
>
> > - Kiryl Shutsemau prefers a system-call-based interface.
> > - David Hildenbrand acknowledged the value and suggested implementation
> > improvements.
> > Please correct me if I missed or misrepresented anything.
> >
> > I suggest we first agree whether this functionality is useful for upstream,
> > before discussing implementation details.
>
> Completely agreed!
>
> > Reasons why this should go upstream from me:
> >
> > - Anonymous THP can introduce real memory waste, but we currently have no
> > good way to measure it.
> > - With accurate metrics, we can make better THP policy: disable for
> > low-utilization cases, or early-unmap to relieve memory pressure and so
> > on. This is especially valuable for mobile/embedded devices.
>
> While I agree with your first point I am not so sure about the second.
> You can easily run the same workload with and without THP enabled and
> compare the rss to learn about a typical internal fragmentation (there
> are several layers of precision you can influence - only for process,
> madvise...). This is a very crude estimate but it gives you some
> picture. Is it convenient? Not at all, but likely sufficient if you are
> debugging a reproducible workload.
>
Let me briefly describe the typical workload we are dealing with.

On Android devices, we monitor different apps and analyze the memory
overhead introduced by huge pages (such as 64K pages).

Even for the same app and the same scenario, memory allocation and access
patterns can vary significantly and fluctuate widely, so the behavior is
not reproducible.

We could certainly use a controlled demo app for testing, but it cannot
reflect real-world usage.

> So I would start by explaining why this crude approach is not really
> feasible. You are talking about early-unmap. How exactly do you envision
> this to be done? I mean finding zero pages is one thing but how do you
> make any educated guess that that particular sparsely used page needs to
> be broken down and partially unmapped. What kind of interface do you
> want to use for that? MADV_FREE for all zero subranges?

This is just my early thinking, since we haven't even finished the first
step: quantifying the memory waste introduced by huge pages.

My idea is to provide a mechanism, for example "MADV_SPLIT", which would
offer the basic ability to split huge pages within a given range.

For example, split huge pages in a VMA whose access ratio is below a
certain threshold. The upper layer would then call "MADV_SPLIT" based on
the current system load.

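A minimal sketch of what such an upper-layer decision could look like,
assuming the zero-filled size is used as a proxy for the access ratio.
"MADV_SPLIT" does not exist today, and struct vma_usage, used_percent() and
should_split() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VMA statistics; not a kernel structure. */
struct vma_usage {
    size_t mapped_kb; /* anonymous memory mapped in the VMA */
    size_t zero_kb;   /* part of it found zero-filled (e.g. AnonZero) */
};

/* Percentage of the mapped memory that holds non-zero data. */
unsigned int used_percent(const struct vma_usage *u)
{
    assert(u != NULL && u->zero_kb <= u->mapped_kb);
    if (u->mapped_kb == 0)
        return 100; /* nothing mapped, nothing wasted */
    return (unsigned int)((u->mapped_kb - u->zero_kb) * 100 / u->mapped_kb);
}

/*
 * Decide whether userspace would ask for a split of this VMA: only when
 * the system is under memory pressure and usage is below the threshold.
 */
bool should_split(const struct vma_usage *u, unsigned int min_used_percent,
                  bool under_pressure)
{
    return under_pressure && used_percent(u) < min_used_percent;
}
```

For a VMA with 1024 kB mapped and 768 kB zero-filled, used_percent() is 25,
so with a 50% threshold the VMA would be a split candidate under pressure.
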
Another approach would be to disable huge pages for apps with severe
memory waste to avoid unnecessary overhead.

All of these ideas are built on the first step: identifying and quantifying
the memory waste.

Thanks

> --
> Michal Hocko
> SUSE Labs
Thread overview: 26+ messages
2026-02-14 8:45 [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages Wenchao Hao
2026-02-16 11:34 ` Kiryl Shutsemau
2026-02-16 11:45 ` David Hildenbrand (Arm)
2026-02-16 11:58 ` Kiryl Shutsemau
2026-02-16 12:19 ` David Hildenbrand (Arm)
2026-02-16 15:59 ` Wenchao Hao
2026-02-16 16:42 ` Michal Hocko
2026-02-16 16:56 ` David Hildenbrand (Arm)
2026-02-16 17:10 ` Michal Hocko
2026-02-16 17:17 ` David Hildenbrand (Arm)
2026-02-16 16:54 ` Kiryl Shutsemau
2026-02-16 17:01 ` Matthew Wilcox
2026-02-16 17:10 ` David Hildenbrand (Arm)
2026-02-16 17:18 ` Kiryl Shutsemau
2026-02-16 12:15 ` David Hildenbrand (Arm)
2026-02-16 15:10 ` Wenchao Hao
2026-02-16 15:18 ` David Hildenbrand (Arm)
2026-02-16 14:22 ` Matthew Wilcox
2026-02-16 15:55 ` Wenchao Hao
2026-02-16 17:03 ` Matthew Wilcox
2026-02-17 15:22 ` Wenchao Hao
2026-02-17 20:29 ` David Hildenbrand (Arm)
2026-02-17 21:53 ` Kiryl Shutsemau
2026-02-19 2:11 ` Wenchao Hao
2026-02-18 7:52 ` Michal Hocko
2026-02-19 2:47 ` Wenchao Hao