linux-mm.kvack.org archive mirror
From: Wenchao Hao <haowenchao22@gmail.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	 Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: Add AnonZero accounting for zero-filled anonymous pages
Date: Thu, 19 Feb 2026 10:47:51 +0800
Message-ID: <CAOptpSNPmOZGq5_a6AmtnBiSnf+2gL_H3JkGW8u=xwuharXnng@mail.gmail.com>
In-Reply-To: <aZVvoWZDyJFfBqBr@tiehlicka>

On Wed, Feb 18, 2026 at 3:52 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 17-02-26 23:22:20, Wenchao Hao wrote:
> > On Sat, Feb 14, 2026 at 4:45 PM Wenchao Hao <haowenchao22@gmail.com> wrote:
> > >
> > > Add kernel command line option "count_zero_page" to track anonymous pages
> > > have been allocated and mapped to userspace but zero-filled.
> > >
> > > This feature is mainly used to debug large folio mechanism, which
> > > pre-allocates and map more pages than actually needed, leading to memory
> > > waste from unaccessed pages.
> > >
> > > Export the result in /proc/pid/smaps as "AnonZero" field.
> > >
> > > Link: https://lore.kernel.org/linux-mm/20260210043456.2137482-1-haowenchao22@gmail.com/
> >
> > Sorry for the late reply. We are now on Chinese New Year holiday, so...
> >
> > The original goal of this patch is to measure memory waste from anonymous
> > THPs - pages pre-allocated on fault but never accessed.
>
> I believe you wanted to say "but never modified". Unless you map THP
> through ptes you have simply do not have that information. Reading
> zeroes might be just what your workload needs (e.g. large sparce data
> structures).
>
Yes, my description is focused on THPs mapped via PTEs, such as 64K
large folios. PMD-mapped THPs are rarely available on memory-constrained
devices like mobile phones, because physically contiguous regions of that
size are hard to allocate there.

The reason we need to scan for zero-filled pages is that the accessed bit
in the PTEs does not reliably tell us whether the mapped pages were
actually used.

This has been discussed earlier in the thread.
https://lore.kernel.org/linux-mm/20260210043456.2137482-1-haowenchao22@gmail.com/
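
To illustrate what such a scan amounts to, here is a minimal userspace
sketch. It is not the code from the patch: the helper names and the 4K
SUBPAGE_SIZE are assumptions made for the example, and kernel code would
more likely use an existing helper such as memchr_inv(). It counts how
many base pages of a buffer (for example one 64K folio, i.e. 16 base
pages) still contain only zeroes:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SUBPAGE_SIZE 4096u	/* assumed base page size */

/* Return true if all len bytes starting at p are zero. */
static bool range_is_zero(const unsigned char *p, size_t len)
{
	/*
	 * If the first byte is zero and every byte equals its
	 * predecessor, the whole range is zero. memcmp() only reads,
	 * so the overlapping arguments are fine.
	 */
	return len == 0 || (p[0] == 0 && memcmp(p, p + 1, len - 1) == 0);
}

/*
 * Count zero-filled base pages in [buf, buf + len).
 * A trailing partial page, if any, is ignored.
 */
static size_t count_zero_subpages(const unsigned char *buf, size_t len)
{
	size_t zero = 0;

	for (size_t off = 0; off + SUBPAGE_SIZE <= len; off += SUBPAGE_SIZE)
		if (range_is_zero(buf + off, SUBPAGE_SIZE))
			zero++;
	return zero;
}
```

Every byte has to be read in the worst case, which is where the
non-negligible cost for /proc/<pid>/smaps comes from.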

> > On memory-sensitive devices like mobile phones, this helps us make better
> > decisions about when and how to enable THP. I think this is useful for
> > guiding THP policies, even as a debugging feature.
> >
> > Let me summarize the discussion so far:
> > - Matthew Wilcox questioned the value and raised concerns fork but haven't
> >   exec path
> > - Michal Hocko criticized the inefficiency of scanning zero-filled pages.
>
> Let me clarify. I am not objecting the inefficiency. _If_ you need to
> recognize zero content then there are no ways around. I have merely
> mentioned that the overhead is not negligible for /proc/<pid>/smaps as
> you suggested.
>
> > - Kiryl Shutsemau prefers a system-call-based interface.
> > - David Hildenbrand acknowledged the value and suggested implementation
> >   improvements.
> > Please correct me if I missed or misrepresented anything.
> >
> > I suggest we first agree whether this functionality is useful for upstream,
> > before discussing implementation details.
>
> Completely agreed!
>
> > Reasons why this should go upstream from me:
> >
> > - Anonymous THP can introduce real memory waste, but we currently have no
> >   good way to measure it.
> > - With accurate metrics, we can make better THP policy: disable for
> >   low-utilization cases, or early-unmap to relieve memory pressure and so
> >   on. This is especially valuable for mobile/embedded devices.
>
> While I agree with your first point I am not so sure about the second.
> You can easily run the same workload with and without THP enabled and
> compare the rss to learn about a typical internal fragmentation (there
> are several layers of precision you can influence - only for process,
> madvise...). This is a very crude estimate but it gives you some
> picture. Is it convenient. Not at all but likely sufficient if you are
> debugging a reproducible workload.
>

Let me briefly describe the typical workload we are dealing with.
On Android devices, we monitor different apps and analyze the memory
overhead introduced by huge pages (such as 64K pages).
Even for the same app in the same scenario, memory allocation and access
patterns vary significantly from run to run, so the behavior is not
reproducible.

We could certainly use a controlled demo app for testing, but it cannot
reflect real-world usage.

> So I would start by explaining why this crude approach is not really
> feasible. You are talking about early-unmap. How exactly do you envision
> this to be done? I mean finding zero pages is one thing but how do you
> make any educated guess that that particular sparsely used page needs to
> be broken down and partially unmapped. What kind of interface do you
> want to use for that? MADV_FREE for all zero subranges?

This is just my early thinking, since we haven't even finished the first
step: quantifying the memory waste introduced by huge pages.

My idea is to provide a mechanism, for example "MADV_SPLIT", which offers
the basic ability to split huge pages within a given range, for example
the huge pages in a VMA whose access ratio is below a certain threshold.
The upper layer would then call "MADV_SPLIT" based on the current system
load.
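
As a rough sketch of the decision the upper layer would make (all names
here are hypothetical, "MADV_SPLIT" does not exist today, and the ratio of
non-zero pages is only used as a stand-in for the access ratio):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VMA statistics; not an existing kernel structure. */
struct vma_usage {
	size_t mapped_pages;	/* base pages mapped through large folios */
	size_t zero_pages;	/* of those, pages found to be zero-filled */
};

/* Percentage (0..100) of mapped pages that hold non-zero data. */
static unsigned int used_percent(const struct vma_usage *u)
{
	/* Nothing mapped or inconsistent input: treat as fully used. */
	if (u->mapped_pages == 0 || u->zero_pages > u->mapped_pages)
		return 100;
	return (unsigned int)(((u->mapped_pages - u->zero_pages) * 100) /
			      u->mapped_pages);
}

/*
 * Policy check: would this VMA be a candidate for splitting?
 * The caller would then issue the (hypothetical) MADV_SPLIT on the range.
 */
static bool should_split(const struct vma_usage *u,
			 unsigned int threshold_percent)
{
	return used_percent(u) < threshold_percent;
}
```

For a single 64K folio with 15 of its 16 base pages still zero-filled,
used_percent() gives 6, so any threshold above 6% would select it.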

Another approach would be to disable huge pages for apps with severe
memory waste to avoid unnecessary overhead.
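
One existing knob that could serve this purpose is
prctl(PR_SET_THP_DISABLE). A minimal sketch follows; the wrapper name is
made up for the example, and I am assuming the flag also covers the
smaller PTE-mapped sizes on the kernel in use, which should be verified:

```c
#include <sys/prctl.h>

/*
 * Set the "THP disable" flag for the calling process. Per prctl(2) the
 * flag is inherited across fork() and preserved across execve(), so a
 * launcher can set it before starting an app.
 * Returns 0 on success, -1 on error with errno set.
 */
static int disable_thp_for_self(void)
{
	return prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
}
```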

All of these ideas are built on the first step: identifying and quantifying
the memory waste.

Thanks

> --
> Michal Hocko
> SUSE Labs



Thread overview: 26+ messages
2026-02-14  8:45 Wenchao Hao
2026-02-16 11:34 ` Kiryl Shutsemau
2026-02-16 11:45   ` David Hildenbrand (Arm)
2026-02-16 11:58     ` Kiryl Shutsemau
2026-02-16 12:19       ` David Hildenbrand (Arm)
2026-02-16 15:59       ` Wenchao Hao
2026-02-16 16:42         ` Michal Hocko
2026-02-16 16:56           ` David Hildenbrand (Arm)
2026-02-16 17:10             ` Michal Hocko
2026-02-16 17:17               ` David Hildenbrand (Arm)
2026-02-16 16:54         ` Kiryl Shutsemau
2026-02-16 17:01           ` Matthew Wilcox
2026-02-16 17:10             ` David Hildenbrand (Arm)
2026-02-16 17:18             ` Kiryl Shutsemau
2026-02-16 12:15 ` David Hildenbrand (Arm)
2026-02-16 15:10   ` Wenchao Hao
2026-02-16 15:18     ` David Hildenbrand (Arm)
2026-02-16 14:22 ` Matthew Wilcox
2026-02-16 15:55   ` Wenchao Hao
2026-02-16 17:03 ` Matthew Wilcox
2026-02-17 15:22 ` Wenchao Hao
2026-02-17 20:29   ` David Hildenbrand (Arm)
2026-02-17 21:53     ` Kiryl Shutsemau
2026-02-19  2:11       ` Wenchao Hao
2026-02-18  7:52   ` Michal Hocko
2026-02-19  2:47     ` Wenchao Hao [this message]
