linux-mm.kvack.org archive mirror
* [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
@ 2026-02-10  4:34 Wenchao Hao
  2026-02-10  9:07 ` David Hildenbrand (Arm)
  2026-02-10 11:56 ` Kiryl Shutsemau
  0 siblings, 2 replies; 17+ messages in thread
From: Wenchao Hao @ 2026-02-10  4:34 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel
  Cc: Wenchao Hao

When do_anonymous_page() creates mappings for huge pages, it currently sets
the access bit for all mapped PTEs (Page Table Entries) by default.

This causes an issue where the Referenced field in /proc/pid/smaps cannot
distinguish whether a page was actually accessed.

This patch introduces a new interface, set_anon_ptes(), which sets the
access bit only for the PTE corresponding to the faulting address. This
allows accurate tracking of page access status in /proc/pid/smaps before
memory reclaim scans the folios.

During memory reclaim, folio_referenced() checks and clears the access bits
of PTEs; rmap walks all PTEs mapping a folio, and if any PTE mapping a
subpage of the folio has its access bit set, the folio is retained. Setting
the access bit only for the faulting PTE in do_anonymous_page() is therefore
safe, as it does not interfere with reclaim decisions.
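
For reference, here is a minimal sketch of that rmap-side aggregation
(loosely modeled on folio_referenced_one(); the helper name and structure
are illustrative, not the exact kernel code):

/*
 * Illustrative only: reclaim treats a folio as referenced if any of the
 * nr PTEs mapping it has the access bit set; the bit is tested and
 * cleared per PTE and folded into one answer for the whole folio.
 */
static bool folio_ptes_referenced(struct vm_area_struct *vma,
				  unsigned long addr, pte_t *ptep,
				  unsigned int nr)
{
	bool referenced = false;
	unsigned int i;

	for (i = 0; i < nr; i++, addr += PAGE_SIZE) {
		/* Test and clear the access bit of one PTE. */
		if (ptep_clear_flush_young(vma, addr, ptep + i))
			referenced = true;
	}
	return referenced;
}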

The patch only supports architectures without custom set_ptes()
implementations (e.g., x86). ARM64 and other architectures are not yet
supported.

Additionally, I have some questions regarding the contiguous page tables
for 64K huge pages on the ARM64 architecture.

Commit 4602e5757bcc ("arm64/mm: wire up PTE_CONT for user mappings")
describes this as follows:

> Since a contpte block only has a single access and dirty bit, the semantic
> here changes slightly; when getting a pte (e.g.  ptep_get()) that is part
> of a contpte mapping, the access and dirty information are pulled from the
> block (so all ptes in the block return the same access/dirty info).

While the ARM64 manual states:

> If hardware updates a translation table entry, and if the Contiguous bit in
> that entry is 1, then the members in a group of contiguous translation table
> entries can have different AF, AP[2], and S2AP[1] values.

Does this mean the 16 PTEs are not required to share the same AF bit on ARM64?

Currently, for ARM64 64K huge pages mapped with contiguous page table entries,
the access and dirty bits are actually folded in software.
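
Roughly, that folding looks like the following (a simplified sketch of
arm64's contpte_ptep_get(); "block" is assumed to point at the first PTE
of the contiguous block, and CONT_PTES is the number of PTEs per block):

/*
 * Simplified sketch: fold the access and dirty bits of all entries in a
 * contpte block into the value returned for any one of them.
 */
static pte_t contpte_fold_ad(pte_t *block, pte_t pte)
{
	unsigned int i;

	for (i = 0; i < CONT_PTES; i++) {
		pte_t entry = __ptep_get(block + i);

		if (pte_young(entry))
			pte = pte_mkyoung(pte);
		if (pte_dirty(entry))
			pte = pte_mkdirty(pte);
	}
	return pte;
}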

However, I haven't been able to determine whether these access and dirty bits
affect the TLB coalescing of contiguous page table entries. If they do not,
I think ARM64 could also set the access bit only for the PTE corresponding to
the actual fault address in do_anonymous_page().

Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
---
 include/linux/pgtable.h | 28 ++++++++++++++++++++++++++++
 mm/memory.c             |  2 +-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 652f287c1ef6..e2f3c932d672 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -302,6 +302,34 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 #endif
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
+#ifndef set_ptes
+static inline void set_anon_ptes(struct mm_struct *mm, unsigned long addr,
+		unsigned long fault_addr, pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	bool young = pte_young(pte);
+
+	page_table_check_ptes_set(mm, ptep, pte, nr);
+
+	for (;;) {
+		if (young && addr == fault_addr)
+			pte = pte_mkyoung(pte);
+		else
+			pte = pte_mkold(pte);
+
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+
+		addr += PAGE_SIZE;
+		ptep++;
+		pte = pte_next_pfn(pte);
+	}
+}
+#else
+#define set_anon_ptes(mm, addr, fault_addr, ptep, pte, nr) \
+		set_ptes(mm, addr, ptep, pte, nr)
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 extern int ptep_set_access_flags(struct vm_area_struct *vma,
 				 unsigned long address, pte_t *ptep,
diff --git a/mm/memory.c b/mm/memory.c
index da360a6eb8a4..65c69c7116a7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5273,7 +5273,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 setpte:
 	if (vmf_orig_pte_uffd_wp(vmf))
 		entry = pte_mkuffd_wp(entry);
-	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
+	set_anon_ptes(vma->vm_mm, addr, vmf->address, vmf->pte, entry, nr_pages);
 
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
-- 
2.45.0




* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-10  4:34 [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page Wenchao Hao
@ 2026-02-10  9:07 ` David Hildenbrand (Arm)
  2026-02-11  0:49   ` Wenchao Hao
  2026-02-10 11:56 ` Kiryl Shutsemau
  1 sibling, 1 reply; 17+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-10  9:07 UTC (permalink / raw)
  To: Wenchao Hao, Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm, linux-kernel

On 2/10/26 05:34, Wenchao Hao wrote:
> When do_anonymous_page() creates mappings for huge pages, it currently sets
> the access bit for all mapped PTEs (Page Table Entries) by default.
> 
> This causes an issue where the Referenced field in /proc/pid/smaps cannot
> distinguish whether a page was actually accessed.

What is the use case that cares about that?

What we have right now is the exact same behavior as if you would get a 
PMD THP that has a single access+dirty bit at fault time.

Also, architectures that support transparent PTE coalescing will not be 
able to coalesce until all PTE bits are equal.

This level of imprecision is to be expected with large folios that only 
have a single access+dirty bit.

-- 
Cheers,

David



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-10  4:34 [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page Wenchao Hao
  2026-02-10  9:07 ` David Hildenbrand (Arm)
@ 2026-02-10 11:56 ` Kiryl Shutsemau
  2026-02-11  1:00   ` Wenchao Hao
  1 sibling, 1 reply; 17+ messages in thread
From: Kiryl Shutsemau @ 2026-02-10 11:56 UTC (permalink / raw)
  To: Wenchao Hao
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel

On Tue, Feb 10, 2026 at 12:34:56PM +0800, Wenchao Hao wrote:
> When do_anonymous_page() creates mappings for huge pages, it currently sets
> the access bit for all mapped PTEs (Page Table Entries) by default.
> 
> This causes an issue where the Referenced field in /proc/pid/smaps cannot
> distinguish whether a page was actually accessed.
> 
> This patch introduces a new interface, set_anon_ptes(), which sets the
> access bit only for the PTE corresponding to the faulting address. This
> allows accurate tracking of page access status in /proc/pid/smaps before
> memory reclaim scans the folios.
>
> During memory reclaim, folio_referenced() checks and clears the access bits
> of PTEs; rmap walks all PTEs mapping a folio, and if any PTE mapping a
> subpage of the folio has its access bit set, the folio is retained. Setting
> the access bit only for the faulting PTE in do_anonymous_page() is therefore
> safe, as it does not interfere with reclaim decisions.

We had a similar discussion about faultaround and briefly made it produce
old ptes, but it caused a performance regression, as old ptes require an
additional pagewalk to set the accessed bit on touch. It got reverted,
but an arch can opt in to setting up old ptes for non-fault addresses.

See commits:

5c0a85fad949 ("mm: make faultaround produce old ptes")
315d09bf30c2 ("Revert "mm: make faultaround produce old ptes"")
46bdb4277f98 ("mm: Allow architectures to request 'old' entries when prefaulting")
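
The opt-in from the last commit boils down to an arch hook along these
lines (sketched from memory; see the commit for the exact form):

#ifndef arch_wants_old_prefaulted_pte
/*
 * Default: prefault ptes young, since most CPUs need an extra
 * page-table walk to set the accessed bit on first touch.
 */
static inline bool arch_wants_old_prefaulted_pte(void)
{
	return false;
}
#endif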

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-10  9:07 ` David Hildenbrand (Arm)
@ 2026-02-11  0:49   ` Wenchao Hao
  2026-02-11  4:18     ` Dev Jain
  2026-02-11  9:05     ` David Hildenbrand (Arm)
  0 siblings, 2 replies; 17+ messages in thread
From: Wenchao Hao @ 2026-02-11  0:49 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm, linux-kernel

On Tue, Feb 10, 2026 at 5:07 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 2/10/26 05:34, Wenchao Hao wrote:
> > When do_anonymous_page() creates mappings for huge pages, it currently sets
> > the access bit for all mapped PTEs (Page Table Entries) by default.
> >
> > This causes an issue where the Referenced field in /proc/pid/smaps cannot
> > distinguish whether a page was actually accessed.
>
> What is the use case that cares about that?
>

We have enabled 64KB large folios on Android devices, which may introduce
some memory waste. I want to figure out the proportion of memory waste
caused by large folios. Reading the "Referenced" field from /proc/pid/smaps
is a relatively low-cost method.

Additionally, considering future hot/cold page identification, we aim to
detect 64KB large folios where some pages are actually unaccessed and split
them into normal pages to avoid memory waste.

However, the current large folio implementation sets the access bit for all
page table entries (PTEs) of the large folio in the do_anonymous_page
function, making it hard to distinguish whether pre-allocated pages were
truly accessed.

> What we have right now is the exact same behavior as if you would get a
> PMD THP that has a single access+dirty bit at fault time.
>
> Also, architectures that support transparent PTE coalescing will not be
> able to coalesce until all PTE bits are equal.
>
> This level of imprecision is to be expected with large folios that only
> have a single access+dirty bit.
>

Thanks a lot for the response.

I saw this description in the ARM manual, “D8.5.5 Use of the Contiguous bit
with hardware updates to the translation tables”:


> If hardware updates a translation table entry, and if the Contiguous bit in
> that entry is 1, then the members in a group of contiguous translation table
> entries can have different AF, AP[2], and S2AP[1] values.

Does this mean that after hardware aggregates multiple PTEs, it can still
independently set the AF and other flag bits corresponding to specific
sub-PTE?

If so, can software also set different AF bits for a group of 16 PTEs
without affecting the transparent PTE coalescing function?

The reason I have this confusion is that there is such a description in
“D8.7.1 The Contiguous bit:”

> Software is required to ensure that all of the adjacent translation table
> entries for the contiguous region point to a contiguous OA range with
> consistent attributes and permissions.

It does not specify whether attributes and permissions include the AF bit.

> --
> Cheers,
>
> David



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-10 11:56 ` Kiryl Shutsemau
@ 2026-02-11  1:00   ` Wenchao Hao
  2026-02-11 11:03     ` Kiryl Shutsemau
  0 siblings, 1 reply; 17+ messages in thread
From: Wenchao Hao @ 2026-02-11  1:00 UTC (permalink / raw)
  To: Kiryl Shutsemau
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel

On Tue, Feb 10, 2026 at 7:56 PM Kiryl Shutsemau <kirill@shutemov.name> wrote:
>
> On Tue, Feb 10, 2026 at 12:34:56PM +0800, Wenchao Hao wrote:
> > When do_anonymous_page() creates mappings for huge pages, it currently sets
> > the access bit for all mapped PTEs (Page Table Entries) by default.
> >
> > This causes an issue where the Referenced field in /proc/pid/smaps cannot
> > distinguish whether a page was actually accessed.
> >
> > > This patch introduces a new interface, set_anon_ptes(), which sets the
> > > access bit only for the PTE corresponding to the faulting address. This
> > > allows accurate tracking of page access status in /proc/pid/smaps before
> > > memory reclaim scans the folios.
> > >
> > > During memory reclaim, folio_referenced() checks and clears the access bits
> > > of PTEs; rmap walks all PTEs mapping a folio, and if any PTE mapping a
> > > subpage of the folio has its access bit set, the folio is retained. Setting
> > > the access bit only for the faulting PTE in do_anonymous_page() is therefore
> > > safe, as it does not interfere with reclaim decisions.
>
> We had similar discussion about faultaround and briefly made it produce
> old ptes, but it caused performance regression as old ptes require
> additional pagewalk to set accessed bit on touch. It got reverted,
> but arch can opt-in for setting up old ptes for non-fault address.
>
> See commits:
>
> 5c0a85fad949 ("mm: make faultaround produce old ptes")
> 315d09bf30c2 ("Revert "mm: make faultaround produce old ptes"")
> 46bdb4277f98 ("mm: Allow architectures to request 'old' entries when prefaulting")
>
It does look similar—our modifications both revolve around whether pre-mapped
PTEs should be marked as "new."

Was there any analysis into why your changes led to performance regressions?
This could help guide whether my modifications are meaningful, and perhaps I
could reference your approach to implement similar changes for different
architectures.

> --
>   Kiryl Shutsemau / Kirill A. Shutemov



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-11  0:49   ` Wenchao Hao
@ 2026-02-11  4:18     ` Dev Jain
  2026-02-12  1:42       ` Wenchao Hao
  2026-02-11  9:05     ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 17+ messages in thread
From: Dev Jain @ 2026-02-11  4:18 UTC (permalink / raw)
  To: Wenchao Hao, David Hildenbrand (Arm)
  Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm, linux-kernel


On 11/02/26 6:19 am, Wenchao Hao wrote:
> On Tue, Feb 10, 2026 at 5:07 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>> On 2/10/26 05:34, Wenchao Hao wrote:
>>> When do_anonymous_page() creates mappings for huge pages, it currently sets
>>> the access bit for all mapped PTEs (Page Table Entries) by default.
>>>
>>> This causes an issue where the Referenced field in /proc/pid/smaps cannot
>>> distinguish whether a page was actually accessed.
>> What is the use case that cares about that?
>>
> We have enabled 64KB large folios on Android devices, which may introduce
> some memory waste. I want to figure out the proportion of memory waste
> caused by large folios. Reading the "Referenced" field from /proc/pid/smaps
> is a relatively low-cost method.
>
> Additionally, considering future hot/cold page identification, we aim to
> detect 64KB large folios where some pages are actually unaccessed and split
> them into normal pages to avoid memory waste.
>
> However, the current large folio implementation sets the access bit for all
> page table entries (PTEs) of the large folio in the do_anonymous_page
> function, making it hard to distinguish whether pre-allocated pages were
> truly accessed.
>
>> What we have right now is the exact same behavior as if you would get a
>> PMD THP that has a single access+dirty bit at fault time.
>>
>> Also, architectures that support transparent PTE coalescing will not be
>> able to coalesce until all PTE bits are equal.
>>
>> This level of imprecision is to be expected with large folios that only
>> have a single access+dirty bit.
>>
> Thanks a lot for the response.
>
> I saw this description in the ARM manual, “D8.5.5 Use of the Contiguous bit
> with hardware updates to the translation tables”:
>
>
>> If hardware updates a translation table entry, and if the Contiguous bit in
>> that entry is 1, then the members in a group of contiguous translation table
>> entries can have different AF, AP[2], and S2AP[1] values.
> Does this mean that after hardware aggregates multiple PTEs, it can still
> independently set the AF and other flag bits corresponding to specific
> sub-PTE?

Yes. Hardware can update access and dirty bits per-pte. It is the job
of software to aggregate them.

>
> If so, can software also set different AF bits for a group of 16 PTEs
> without affecting the transparent PTE coalescing function?

Yes. See set_ptes -> __contpte_try_fold: look at pte_mkold(pte_mkclean()).
We ignore the a/d bits while constructing the next expected pte.
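
A sketch of that check (loosely based on the arm64 folding logic; the
helper name and loop are illustrative):

/*
 * Illustrative only: nr PTEs are foldable into a contpte block when
 * they are identical apart from the pfn and the a/d bits, which are
 * masked off via pte_mkold()/pte_mkclean() before comparing.
 */
static bool contpte_block_foldable(pte_t *ptep, unsigned int nr)
{
	pte_t expected = pte_mkold(pte_mkclean(ptep_get(ptep)));
	unsigned int i;

	for (i = 1; i < nr; i++) {
		pte_t pte = pte_mkold(pte_mkclean(ptep_get(ptep + i)));

		expected = pte_next_pfn(expected);
		if (!pte_same(pte, expected))
			return false;
	}
	return true;
}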

>
> The reason I have this confusion is that there is such a description in
> “D8.7.1 The Contiguous bit:”
>
>> Software is required to ensure that all of the adjacent translation table
>> entries for the contiguous region point to a contiguous OA range with
>> consistent attributes and permissions.
> It does not specify whether attributes and permissions include the AF bit.
>
>> --
>> Cheers,
>>
>> David



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-11  0:49   ` Wenchao Hao
  2026-02-11  4:18     ` Dev Jain
@ 2026-02-11  9:05     ` David Hildenbrand (Arm)
  2026-02-12  1:57       ` Wenchao Hao
  1 sibling, 1 reply; 17+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-11  9:05 UTC (permalink / raw)
  To: Wenchao Hao
  Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm, linux-kernel

On 2/11/26 01:49, Wenchao Hao wrote:
> On Tue, Feb 10, 2026 at 5:07 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 2/10/26 05:34, Wenchao Hao wrote:
>>> When do_anonymous_page() creates mappings for huge pages, it currently sets
>>> the access bit for all mapped PTEs (Page Table Entries) by default.
>>>
>>> This causes an issue where the Referenced field in /proc/pid/smaps cannot
>>> distinguish whether a page was actually accessed.
>>
>> What is the use case that cares about that?
>>
> 
> We have enabled 64KB large folios on Android devices, which may introduce
> some memory waste. I want to figure out the proportion of memory waste
> caused by large folios. Reading the "Referenced" field from /proc/pid/smaps
> is a relatively low-cost method.

Right. And that imprecision is to be expected when you opt in to
something that manages memory at a different granularity and only has a
single a/d bit: a large folio.

Sure, individual PTEs *might* have independent a/d bits, but the 
underlying thing (folio) has only a single one. And optimizations that 
build on top (pte coalescing) reuse that principle that having a single 
logical a/d bit is fine.

> 
> Additionally, considering future hot/cold page identification, we aim to
> detect 64KB large folios where some pages are actually unaccessed and split
> them into normal pages to avoid memory waste.
> 
> However, the current large folio implementation sets the access bit for all
> page table entries (PTEs) of the large folio in the do_anonymous_page
> function, making it hard to distinguish whether pre-allocated pages were
> truly accessed.

The deferred shrinker uses a much simpler mechanism: if the page content 
is zero, likely it was over-allocated and never used later.

It's not completely lightweight (scan pages for 0 content), but is 
reliable, independent of the mapping type (PMD, cont-pte, whatever) and 
independent of any access/dirty bits, leaving performance unharmed.

When you say "I want to figure out the proportion of memory waste", are 
we talking about a debug feature?

-- 
Cheers,

David



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-11  1:00   ` Wenchao Hao
@ 2026-02-11 11:03     ` Kiryl Shutsemau
  2026-02-12  2:08       ` Wenchao Hao
  0 siblings, 1 reply; 17+ messages in thread
From: Kiryl Shutsemau @ 2026-02-11 11:03 UTC (permalink / raw)
  To: Wenchao Hao
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel

On Wed, Feb 11, 2026 at 09:00:45AM +0800, Wenchao Hao wrote:
> On Tue, Feb 10, 2026 at 7:56 PM Kiryl Shutsemau <kirill@shutemov.name> wrote:
> >
> > On Tue, Feb 10, 2026 at 12:34:56PM +0800, Wenchao Hao wrote:
> > > When do_anonymous_page() creates mappings for huge pages, it currently sets
> > > the access bit for all mapped PTEs (Page Table Entries) by default.
> > >
> > > This causes an issue where the Referenced field in /proc/pid/smaps cannot
> > > distinguish whether a page was actually accessed.
> > >
> > > This patch introduces a new interface, set_anon_ptes(), which sets the
> > > access bit only for the PTE corresponding to the faulting address. This
> > > allows accurate tracking of page access status in /proc/pid/smaps before
> > > memory reclaim scans the folios.
> > >
> > > During memory reclaim, folio_referenced() checks and clears the access bits
> > > of PTEs; rmap walks all PTEs mapping a folio, and if any PTE mapping a
> > > subpage of the folio has its access bit set, the folio is retained. Setting
> > > the access bit only for the faulting PTE in do_anonymous_page() is therefore
> > > safe, as it does not interfere with reclaim decisions.
> >
> > We had a similar discussion about faultaround and briefly made it produce
> > old ptes, but it caused a performance regression, as old ptes require an
> > additional pagewalk to set the accessed bit on touch. It got reverted,
> > but an arch can opt in to setting up old ptes for non-fault addresses.
> >
> > See commits:
> >
> > 5c0a85fad949 ("mm: make faultaround produce old ptes")
> > 315d09bf30c2 ("Revert "mm: make faultaround produce old ptes"")
> > 46bdb4277f98 ("mm: Allow architectures to request 'old' entries when prefaulting")
> >
> It does look similar—our modifications both revolve around whether pre-mapped
> PTEs should be marked as "new."
> 
> Was there any analysis into why your changes led to performance regressions?

As I mentioned, my theory was that it is due to the additional pagewalks
the CPU has to do to flip the access bit when it touches the memory, but I
didn't profile it to confirm.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-11  4:18     ` Dev Jain
@ 2026-02-12  1:42       ` Wenchao Hao
  2026-02-12  5:04         ` Dev Jain
  0 siblings, 1 reply; 17+ messages in thread
From: Wenchao Hao @ 2026-02-12  1:42 UTC (permalink / raw)
  To: Dev Jain
  Cc: David Hildenbrand (Arm),
	Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm, linux-kernel

On Wed, Feb 11, 2026 at 12:18 PM Dev Jain <dev.jain@arm.com> wrote:
>
>
> On 11/02/26 6:19 am, Wenchao Hao wrote:
> > On Tue, Feb 10, 2026 at 5:07 PM David Hildenbrand (Arm)
> > <david@kernel.org> wrote:
> >> On 2/10/26 05:34, Wenchao Hao wrote:
> >>> When do_anonymous_page() creates mappings for huge pages, it currently sets
> >>> the access bit for all mapped PTEs (Page Table Entries) by default.
> >>>
> >>> This causes an issue where the Referenced field in /proc/pid/smaps cannot
> >>> distinguish whether a page was actually accessed.
> >> What is the use case that cares about that?
> >>
> > We have enabled 64KB large folios on Android devices, which may introduce
> > some memory waste. I want to figure out the proportion of memory waste
> > caused by large folios. Reading the "Referenced" field from /proc/pid/smaps
> > is a relatively low-cost method.
> >
> > Additionally, considering future hot/cold page identification, we aim to
> > detect 64KB large folios where some pages are actually unaccessed and split
> > them into normal pages to avoid memory waste.
> >
> > However, the current large folio implementation sets the access bit for all
> > page table entries (PTEs) of the large folio in the do_anonymous_page
> > function, making it hard to distinguish whether pre-allocated pages were
> > truly accessed.
> >
> >> What we have right now is the exact same behavior as if you would get a
> >> PMD THP that has a single access+dirty bit at fault time.
> >>
> >> Also, architectures that support transparent PTE coalescing will not be
> >> able to coalesce until all PTE bits are equal.
> >>
> >> This level of imprecision is to be expected with large folios that only
> >> have a single access+dirty bit.
> >>
> > Thanks a lot for the response.
> >
> > I saw this description in the ARM manual, “D8.5.5 Use of the Contiguous bit
> > with hardware updates to the translation tables”:
> >
> >
> >> If hardware updates a translation table entry, and if the Contiguous bit in
> >> that entry is 1, then the members in a group of contiguous translation table
> >> entries can have different AF, AP[2], and S2AP[1] values.
> > Does this mean that after hardware aggregates multiple PTEs, it can still
> > independently set the AF and other flag bits corresponding to specific
> > sub-PTE?
>
> Yes. Hardware can update access and dirty bits per-pte. It is the job
> of software to aggregate them.
>
> >
> > If so, can software also set different AF bits for a group of 16 PTEs
> > without affecting the transparent PTE coalescing function?
>
> Yes. See set_ptes -> __contpte_try_fold: look at pte_mkold(pte_mkclean()).
> We ignore the a/d bits while constructing the next expected pte.
>

Thank you for your answer. I think we can now draw the following conclusion:
from a hardware perspective, once the Contiguous bit is set, the access
and dirty flags of the individual PTEs do not affect transparent PTE
coalescing.

> >
> > The reason I have this confusion is that there is such a description in
> > “D8.7.1 The Contiguous bit:”
> >
> >> Software is required to ensure that all of the adjacent translation table
> >> entries for the contiguous region point to a contiguous OA range with
> >> consistent attributes and permissions.
> > It does not specify whether attributes and permissions include the AF bit.
> >
> >> --
> >> Cheers,
> >>
> >> David



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-11  9:05     ` David Hildenbrand (Arm)
@ 2026-02-12  1:57       ` Wenchao Hao
  2026-02-12  8:54         ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 17+ messages in thread
From: Wenchao Hao @ 2026-02-12  1:57 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm, linux-kernel

On Wed, Feb 11, 2026 at 5:05 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 2/11/26 01:49, Wenchao Hao wrote:
> > On Tue, Feb 10, 2026 at 5:07 PM David Hildenbrand (Arm)
> > <david@kernel.org> wrote:
> >>
> >> On 2/10/26 05:34, Wenchao Hao wrote:
> >>> When do_anonymous_page() creates mappings for huge pages, it currently sets
> >>> the access bit for all mapped PTEs (Page Table Entries) by default.
> >>>
> >>> This causes an issue where the Referenced field in /proc/pid/smaps cannot
> >>> distinguish whether a page was actually accessed.
> >>
> >> What is the use case that cares about that?
> >>
> >
> > We have enabled 64KB large folios on Android devices, which may introduce
> > some memory waste. I want to figure out the proportion of memory waste
> > caused by large folios. Reading the "Referenced" field from /proc/pid/smaps
> > is a relatively low-cost method.
>
> Right. And that imprecision is to be expected when you opt in to
> something that manages memory at a different granularity and only has a
> single a/d bit: a large folio.
>
> Sure, individual PTEs *might* have independent a/d bits, but the
> underlying thing (folio) has only a single one. And optimizations that
> build on top (pte coalescing) reuse that principle that having a single
> logical a/d bit is fine.
>
> >
> > Additionally, considering future hot/cold page identification, we aim to
> > detect 64KB large folios where some pages are actually unaccessed and split
> > them into normal pages to avoid memory waste.
> >
> > However, the current large folio implementation sets the access bit for all
> > page table entries (PTEs) of the large folio in the do_anonymous_page
> > function, making it hard to distinguish whether pre-allocated pages were
> > truly accessed.
>
> The deferred shrinker uses a much simpler mechanism: if the page content
> is zero, likely it was over-allocated and never used later.
>
> It's not completely lightweight (scan pages for 0 content), but is
> reliable, independent of the mapping type (PMD, cont-pte, whatever) and
> independent of any access/dirty bits, leaving performance unharmed.
>
> When you say "I want to figure out the proportion of memory waste", are
> we talking about a debug feature?
>

Thanks for your explanation. I now understand the design logic.

What I’m proposing is mainly for debugging. After enabling 64K large folio
on Android, we observed increased application memory footprint, especially
for anonymous pages.

Since Android app memory usage depends on runtime scenarios, we cannot
confirm if the growth is directly caused by large folio. We want to
analyze memory usage via the `Referenced` field in `/proc/[pid]/smaps`.

However, with the current 64K anonymous large folio mapping implementation,
the `Referenced` field does not reflect actual page access activity correctly.

This is why I’m sending this RFC patch.

> --
> Cheers,
>
> David



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-11 11:03     ` Kiryl Shutsemau
@ 2026-02-12  2:08       ` Wenchao Hao
  0 siblings, 0 replies; 17+ messages in thread
From: Wenchao Hao @ 2026-02-12  2:08 UTC (permalink / raw)
  To: Kiryl Shutsemau
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel

On Wed, Feb 11, 2026 at 7:03 PM Kiryl Shutsemau <kirill@shutemov.name> wrote:
>
> On Wed, Feb 11, 2026 at 09:00:45AM +0800, Wenchao Hao wrote:
> > On Tue, Feb 10, 2026 at 7:56 PM Kiryl Shutsemau <kirill@shutemov.name> wrote:
> > >
> > > On Tue, Feb 10, 2026 at 12:34:56PM +0800, Wenchao Hao wrote:
> > > > When do_anonymous_page() creates mappings for huge pages, it currently sets
> > > > the access bit for all mapped PTEs (Page Table Entries) by default.
> > > >
> > > > This causes an issue where the Referenced field in /proc/pid/smaps cannot
> > > > distinguish whether a page was actually accessed.
> > > >
> > > > This patch introduces a new interface, set_anon_ptes(), which sets the
> > > > access bit only for the PTE corresponding to the faulting address. This
> > > > allows accurate tracking of page access status in /proc/pid/smaps before
> > > > memory reclaim scans the folios.
> > > >
> > > > During memory reclaim, folio_referenced() checks and clears the access bits
> > > > of PTEs; rmap walks all PTEs mapping a folio, and if any PTE mapping a
> > > > subpage of the folio has its access bit set, the folio is retained. Setting
> > > > the access bit only for the faulting PTE in do_anonymous_page() is therefore
> > > > safe, as it does not interfere with reclaim decisions.
> > >
> > > We had a similar discussion about faultaround and briefly made it produce
> > > old ptes, but it caused a performance regression, as old ptes require an
> > > additional pagewalk to set the accessed bit on touch. It got reverted,
> > > but an arch can opt in to setting up old ptes for non-fault addresses.
> > >
> > > See commits:
> > >
> > > 5c0a85fad949 ("mm: make faultaround produce old ptes")
> > > 315d09bf30c2 ("Revert "mm: make faultaround produce old ptes"")
> > > 46bdb4277f98 ("mm: Allow architectures to request 'old' entries when prefaulting")
> > >
> > It does look similar—our modifications both revolve around whether pre-mapped
> > PTEs should be marked as "new."
> >
> > Was there any analysis into why your changes led to performance regressions?
>
> As I mentioned, my theory was that it is due to the additional pagewalks
> the CPU has to do to flip the access bit when it touches the memory, but I
> didn't profile it to confirm.
>
Thanks for your reply.
My change is mainly for debugging purposes and targeted at huge pages.
The only place I can see that might relate to the access bit of
contiguous huge pages is memory reclaim.
However, the huge page reclaim logic checks the access status of the
entire folio, so my conclusion is that this change should not affect
performance.
As for the overhead of the CPU flipping access bits that you mentioned,
I cannot assess it at this point.


> --
>   Kiryl Shutsemau / Kirill A. Shutemov



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-12  1:42       ` Wenchao Hao
@ 2026-02-12  5:04         ` Dev Jain
  0 siblings, 0 replies; 17+ messages in thread
From: Dev Jain @ 2026-02-12  5:04 UTC (permalink / raw)
  To: Wenchao Hao
  Cc: David Hildenbrand (Arm),
	Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm, linux-kernel


On 12/02/26 7:12 am, Wenchao Hao wrote:
> On Wed, Feb 11, 2026 at 12:18 PM Dev Jain <dev.jain@arm.com> wrote:
>>
>> On 11/02/26 6:19 am, Wenchao Hao wrote:
>>> On Tue, Feb 10, 2026 at 5:07 PM David Hildenbrand (Arm)
>>> <david@kernel.org> wrote:
>>>> On 2/10/26 05:34, Wenchao Hao wrote:
>>>>> When do_anonymous_page() creates mappings for huge pages, it currently sets
>>>>> the access bit for all mapped PTEs (Page Table Entries) by default.
>>>>>
>>>>> This causes an issue where the Referenced field in /proc/pid/smaps cannot
>>>>> distinguish whether a page was actually accessed.
>>>> What is the use case that cares about that?
>>>>
>>> We have enabled 64KB large folios on Android devices, which may introduce
>>> some memory waste. I want to figure out the proportion of memory waste
>>> caused by large folios. Reading the "Referenced" field from /proc/pid/smaps
>>> is a relatively low-cost method.
>>>
>>> Additionally, considering future hot/cold page identification, we aim to
>>> detect 64KB large folios where some pages are actually unaccessed and split
>>> them into normal pages to avoid memory waste.
>>>
>>> However, the current large folio implementation sets the access bit for all
>>> page table entries (PTEs) of the large folio in the do_anonymous_page
>>> function, making it hard to distinguish whether pre-allocated pages were
>>> truly accessed.
>>>
>>>> What we have right now is the exact same behavior as if you would get a
>>>> PMD THP that has a single access+dirty bit at fault time.
>>>>
>>>> Also, architectures that support transparent PTE coalescing will not be
>>>> able to coalesce until all PTE bits are equal.
>>>>
>>>> This level of imprecision is to be expected with large folios that only
>>>> have a single access+dirty bit.
>>>>
>>> Thanks a lot for the response.
>>>
>>> I saw this description in the ARM manual, “D8.5.5 Use of the Contiguous bit
>>> with hardware updates to the translation tables”:
>>>
>>>
>>>> If hardware updates a translation table entry, and if the Contiguous bit in
>>>> that entry is 1, then the members in a group of contiguous translation table
>>>> entries can have different AF, AP[2], and S2AP[1] values.
>>> Does this mean that after hardware aggregates multiple PTEs, it can still
>>> independently set the AF and other flag bits corresponding to specific
>>> sub-PTE?
>> Yes. Hardware can update access and dirty bits per-pte. It is the job
>> of software to aggregate them.
>>
>>> If so, can software also set different AF bits for a group of 16 PTEs
>>> without affecting the transparent PTE coalescing function?
>> Yes. See set_ptes -> __contpte_try_fold: look at pte_mkold(pte_mkclean()).
>> We ignore the a/d bits while constructing the next expected pte.
>>
> Thank you for your answer. I think we can now draw the following conclusion:
> from a hardware perspective, once the Contiguous bit is set, the access
> and dirty flags of the individual PTEs do not affect transparent PTE
> coalescing.

Keep in mind that this is the case in software - there is also transparent
coalescing done by hardware, and I am not aware of the spec for that.

>
>>> The reason I have this confusion is that there is such a description in
>>> “D8.7.1 The Contiguous bit:”
>>>
>>>> Software is required to ensure that all of the adjacent translation table
>>>> entries for the contiguous region point to a contiguous OA range with
>>>> consistent attributes and permissions.
>>> It does not specify whether attributes and permissions include the AF bit.
>>>
>>>> --
>>>> Cheers,
>>>>
>>>> David



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-12  1:57       ` Wenchao Hao
@ 2026-02-12  8:54         ` David Hildenbrand (Arm)
  2026-02-13  9:02           ` Wenchao Hao
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-12  8:54 UTC (permalink / raw)
  To: Wenchao Hao
  Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm, linux-kernel

On 2/12/26 02:57, Wenchao Hao wrote:
> On Wed, Feb 11, 2026 at 5:05 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 2/11/26 01:49, Wenchao Hao wrote:
>>> On Tue, Feb 10, 2026 at 5:07 PM David Hildenbrand (Arm)
>>> <david@kernel.org> wrote:
>>>
>>> We have enabled 64KB large folios on Android devices, which may introduce
>>> some memory waste. I want to figure out the proportion of memory waste
>>> caused by large folios. Reading the "Referenced" field from /proc/pid/smaps
>>> is a relatively low-cost method.
>>
> > Right. And that imprecision is to be expected when you opt in to
> > something that manages memory at a different granularity and only has a
> > single a/d bit: a large folio.
>>
>> Sure, individual PTEs *might* have independent a/d bits, but the
>> underlying thing (folio) has only a single one. And optimizations that
>> build on top (pte coalescing) reuse that principle that having a single
>> logical a/d bit is fine.
>>
>>>
>>> Additionally, considering future hot/cold page identification, we aim to
>>> detect 64KB large folios where some pages are actually unaccessed and split
>>> them into normal pages to avoid memory waste.
>>>
>>> However, the current large folio implementation sets the access bit for all
>>> page table entries (PTEs) of the large folio in the do_anonymous_page
>>> function, making it hard to distinguish whether pre-allocated pages were
>>> truly accessed.
>>
>> The deferred shrinker uses a much simpler mechanism: if the page content
>> is zero, likely it was over-allocated and never used later.
>>
>> It's not completely lightweight (scan pages for 0 content), but is
>> reliable, independent of the mapping type (PMD, cont-pte, whatever) and
>> independent of any access/dirty bits, leaving performance unharmed.
>>
>> When you say "I want to figure out the proportion of memory waste", are
>> we talking about a debug feature?
>>
> 
> Thanks for your explanation. I now understand the design logic.
> 
> What I’m proposing is mainly for debugging. After enabling 64K large folio
> on Android, we observed increased application memory footprint, especially
> for anonymous pages.
> 
> Since Android app memory usage depends on runtime scenarios, we cannot
> confirm if the growth is directly caused by large folio. We want to
> analyze memory
> usage via the `Referenced` field in `/proc/[pid]/smaps`.

Scanning for zero-filled pages will be much easier and more reliable.
For a debug feature, that's good enough.

I'm wondering what the best interface for something like that could be: 
we don't want to make "/proc/[pid]/smaps" slower for all users.

Maybe we could for debug kernels.

For example, adding with CONFIG_DEBUG_KERNEL a new entry

	Anon_Zero:

counter that just tests whether the page content of an anonymous page is 
all zeroes could be doable.
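
The per-page test itself could be as simple as this (the helper name is
made up; the smaps integration and counter plumbing are left out):

/* Hypothetical helper: true if the page content is all zeroes. */
static bool anon_page_is_zero(struct page *page)
{
	void *kaddr = kmap_local_page(page);
	bool zero = !memchr_inv(kaddr, 0, PAGE_SIZE);

	kunmap_local(kaddr);
	return zero;
}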

-- 
Cheers,

David



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-12  8:54         ` David Hildenbrand (Arm)
@ 2026-02-13  9:02           ` Wenchao Hao
  2026-02-13  9:07             ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 17+ messages in thread
From: Wenchao Hao @ 2026-02-13  9:02 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm, linux-kernel

On Thu, Feb 12, 2026 at 4:54 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 2/12/26 02:57, Wenchao Hao wrote:
> > On Wed, Feb 11, 2026 at 5:05 PM David Hildenbrand (Arm)
> > <david@kernel.org> wrote:
> >>
> >> On 2/11/26 01:49, Wenchao Hao wrote:
> >>> On Tue, Feb 10, 2026 at 5:07 PM David Hildenbrand (Arm)
> >>> <david@kernel.org> wrote:
> >>>
> >>> We have enabled 64KB large folios on Android devices, which may introduce
> >>> some memory waste. I want to figure out the proportion of memory waste
> >>> caused by large folios. Reading the "Referenced" field from /proc/pid/smaps
> >>> is a relatively low-cost method.
> >>
> >> Right. And that imprecision is to be expected when you opt in to
> >> something that manages memory at a different granularity and only has a
> >> single a/d bit: a large folio.
> >>
> >> Sure, individual PTEs *might* have independent a/d bits, but the
> >> underlying thing (folio) has only a single one. And optimizations that
> >> build on top (pte coalescing) reuse that principle that having a single
> >> logical a/d bit is fine.
> >>
> >>>
> >>> Additionally, considering future hot/cold page identification, we aim to
> >>> detect 64KB large folios where some pages are actually unaccessed and split
> >>> them into normal pages to avoid memory waste.
> >>>
> >>> However, the current large folio implementation sets the access bit for all
> >>> page table entries (PTEs) of the large folio in the do_anonymous_page
> >>> function, making it hard to distinguish whether pre-allocated pages were
> >>> truly accessed.
> >>
> >> The deferred shrinker uses a much simpler mechanism: if the page content
> >> is zero, likely it was over-allocated and never used later.
> >>
> >> It's not completely lightweight (scan pages for 0 content), but is
> >> reliable, independent of the mapping type (PMD, cont-pte, whatever) and
> >> independent of any access/dirty bits, leaving performance unharmed.
> >>
> >> When you say "I want to figure out the proportion of memory waste", are
> >> we talking about a debug feature?
> >>
> >
> > Thanks for your explanation. I now understand the design logic.
> >
> > What I’m proposing is mainly for debugging. After enabling 64K large folio
> > on Android, we observed increased application memory footprint, especially
> > for anonymous pages.
> >
> > Since Android app memory usage depends on runtime scenarios, we cannot
> > confirm if the growth is directly caused by large folio. We want to
> > analyze memory
> > usage via the `Referenced` field in `/proc/[pid]/smaps`.
>
> Scanning for zero-filled pages will be much easier and more reliable.
> For a debug feature, that's good enough.
>
> I'm wondering what the best interface for something like that could be:
> we don't want to make "/proc/[pid]/smaps" slower for all users.
>
> Maybe we could for debug kernels.
>
> For example, adding with CONFIG_DEBUG_KERNEL a new entry
>
>         Anon_Zero:
>
> counter that just tests whether the page content of an anonymous page is
> all zeroes could be doable.
>

Apologies for the delayed reply – I was just writing a demo to verify the
approach you mentioned.

Using the CONFIG_DEBUG_KERNEL compile-time macro to isolate this feature
is indeed an excellent idea.

However, in engineering practice, it requires recompiling and replacing
the kernel, which can be cumbersome. Could we instead use a dynamic switch
to control whether to scan for zero-filled pages when reading
/proc/[pid]/smaps?

> --
> Cheers,
>
> David



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-13  9:02           ` Wenchao Hao
@ 2026-02-13  9:07             ` David Hildenbrand (Arm)
  2026-02-13 14:52               ` Wenchao Hao
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-13  9:07 UTC (permalink / raw)
  To: Wenchao Hao
  Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm, linux-kernel

On 2/13/26 10:02, Wenchao Hao wrote:
> On Thu, Feb 12, 2026 at 4:54 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 2/12/26 02:57, Wenchao Hao wrote:
>>> On Wed, Feb 11, 2026 at 5:05 PM David Hildenbrand (Arm)
>>> <david@kernel.org> wrote:
>>>
>>> Thanks for your explanation. I now understand the design logic.
>>>
>>> What I’m proposing is mainly for debugging. After enabling 64K large folio
>>> on Android, we observed increased application memory footprint, especially
>>> for anonymous pages.
>>>
>>> Since Android app memory usage depends on runtime scenarios, we cannot
>>> confirm if the growth is directly caused by large folio. We want to
>>> analyze memory
>>> usage via the `Referenced` field in `/proc/[pid]/smaps`.
>>
>> Scanning for zero-filled pages will be much easier and more reliable.
>> For a debug feature, that's good enough.
>>
>> I'm wondering what the best interface for something like that could be:
>> we don't want to make "/proc/[pid]/smaps" slower for all users.
>>
>> Maybe we could for debug kernels.
>>
>> For example, adding with CONFIG_DEBUG_KERNEL a new entry
>>
>>          Anon_Zero:
>>
>> counter that just tests whether the page content of an anonymous page is
>> all zeroes could be doable.
>>
> 
> Apologies for the delayed reply – I was just writing a demo to verify the
> approach you mentioned.
> 
> Using the CONFIG_DEBUG_KERNEL compile-time macro to isolate this feature
> is indeed an excellent idea.
> 
> However, in engineering practice, it requires recompiling and
> replacing the kernel,
> which can be cumbersome. Could we instead use a dynamic switch to control
> whether to scan for zero-filled pages when reading /proc/[pid]/smaps?

Maybe a kernel cmdline option could do? Selectively enabling it for some 
PIDs only is not really possible, but also, maybe it's not really needed.
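
Something like this, for example (the parameter name is invented for
illustration):

/* Hypothetical boot parameter: smaps_anon_zero=1 enables the scan. */
static bool smaps_anon_zero_enabled __ro_after_init;

static int __init smaps_anon_zero_setup(char *str)
{
	/* __setup handlers return 1 when the option was consumed. */
	return kstrtobool(str, &smaps_anon_zero_enabled) == 0;
}
__setup("smaps_anon_zero=", smaps_anon_zero_setup);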

-- 
Cheers,

David



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-13  9:07             ` David Hildenbrand (Arm)
@ 2026-02-13 14:52               ` Wenchao Hao
  2026-02-13 15:08                 ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 17+ messages in thread
From: Wenchao Hao @ 2026-02-13 14:52 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm, linux-kernel

On Fri, Feb 13, 2026 at 5:08 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 2/13/26 10:02, Wenchao Hao wrote:
> > On Thu, Feb 12, 2026 at 4:54 PM David Hildenbrand (Arm)
> > <david@kernel.org> wrote:
> >>
> >> On 2/12/26 02:57, Wenchao Hao wrote:
> >>> On Wed, Feb 11, 2026 at 5:05 PM David Hildenbrand (Arm)
> >>> <david@kernel.org> wrote:
> >>>
> >>> Thanks for your explanation. I now understand the design logic.
> >>>
> >>> What I’m proposing is mainly for debugging. After enabling 64K large folio
> >>> on Android, we observed increased application memory footprint, especially
> >>> for anonymous pages.
> >>>
> >>> Since Android app memory usage depends on runtime scenarios, we cannot
> >>> confirm if the growth is directly caused by large folio. We want to
> >>> analyze memory
> >>> usage via the `Referenced` field in `/proc/[pid]/smaps`.
> >>
> >> Scanning for zero-filled pages will be much easier and more reliable.
> >> For a debug feature, that's good enough.
> >>
> >> I'm wondering what the best interface for something like that could be:
> >> we don't want to make "/proc/[pid]/smaps" slower for all users.
> >>
> >> Maybe we could for debug kernels.
> >>
> >> For example, adding with CONFIG_DEBUG_KERNEL a new entry
> >>
> >>          Anon_Zero:
> >>
> >> counter that just tests whether the page content of an anonymous page is
> >> all zeroes could be doable.
> >>
> >
> > Apologies for the delayed reply – I was just writing a demo to verify the
> > approach you mentioned.
> >
> > Using the CONFIG_DEBUG_KERNEL compile-time macro to isolate this feature
> > is indeed an excellent idea.
> >
> > However, in engineering practice, it requires recompiling and
> > replacing the kernel,
> > which can be cumbersome. Could we instead use a dynamic switch to control
> > whether to scan for zero-filled pages when reading /proc/[pid]/smaps?
>
> Maybe a kernel cmdline option could do?
Kernel command line parameters can meet our requirements.

> Selectively enabling it for some PIDs only is not really possible, but also, maybe it's not really needed.
Yes, it is unnecessary to do that.

By the way, will you send a new patch for this, or shall I take care of it?
Thanks.
>
> --
> Cheers,
>
> David



* Re: [RFC PATCH] mm: only set fault address's access bit in do_anonymous_page
  2026-02-13 14:52               ` Wenchao Hao
@ 2026-02-13 15:08                 ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-13 15:08 UTC (permalink / raw)
  To: Wenchao Hao
  Cc: Andrew Morton, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm, linux-kernel

On 2/13/26 15:52, Wenchao Hao wrote:
> On Fri, Feb 13, 2026 at 5:08 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 2/13/26 10:02, Wenchao Hao wrote:
>>> On Thu, Feb 12, 2026 at 4:54 PM David Hildenbrand (Arm)
>>> <david@kernel.org> wrote:
>>>
>>> Apologies for the delayed reply – I was just writing a demo to verify the
>>> approach you mentioned.
>>>
>>> Using the CONFIG_DEBUG_KERNEL compile-time macro to isolate this feature
>>> is indeed an excellent idea.
>>>
>>> However, in engineering practice, it requires recompiling and
>>> replacing the kernel,
>>> which can be cumbersome. Could we instead use a dynamic switch to control
>>> whether scan for zero-filled pages when reading /proc/[pid]/smaps?
>>
>> Maybe a kernel cmdline option could do?
> Kernel command line parameters can meet our requirements.
> 
>> Selectively enabling it for some PIDs only is not really possible, but also, maybe it's not really needed.
> Yes, it is unnecessary to do that.
> 
> By the way, will you send a new patch for this, or shall I take care of it?

You :)

-- 
Cheers,

David

