* [PATCH v2] smaps: Report correct page sizes with THP
@ 2026-02-25 23:27 Andi Kleen
2026-02-26 12:08 ` Usama Arif
2026-02-26 17:31 ` David Hildenbrand (Arm)
0 siblings, 2 replies; 8+ messages in thread
From: Andi Kleen @ 2026-02-25 23:27 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, Andi Kleen
The earlier version of this patch kit wasn't that well received,
with the main objection being the lack of mTHP support. This variant
tracks any mTHP sizes in a VMA and reports them as MMUPageSizeN in smaps,
with increasing N. The base page size is still reported without a
number suffix to stay compatible.
The nice thing is that the patch is actually simpler and more
straightforward than the THP-only variant. Also improved the
documentation.
Recently I wasted quite some time debugging why THP apparently didn't work,
when in fact smaps was simply always reporting the base page size. smaps has
separate counters for (non-m) THP, but interpreting them is not always obvious.
I left KernelPageSize alone.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
Documentation/filesystems/proc.rst | 8 ++++++--
fs/proc/task_mmu.c | 14 +++++++++++++-
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index b0c0d1b45b99..c5102ef7a2eb 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -452,6 +452,7 @@ Memory Area, or VMA) there is a series of lines such as the following::
Size: 1084 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
+ MMUPageSize2: 2048 kB
Rss: 892 kB
Pss: 374 kB
Pss_Dirty: 0 kB
@@ -476,14 +477,17 @@ Memory Area, or VMA) there is a series of lines such as the following::
VmFlags: rd ex mr mw me dw
The first of these lines shows the same information as is displayed for
-the mapping in /proc/PID/maps. Following lines show the size of the
+the mapping in /proc/PID/maps (except that there may be additional page
+sizes if the mapping contains them).
+Following lines show the size of the
mapping (size); the size of each page allocated when backing a VMA
(KernelPageSize), which is usually the same as the size in the page table
entries; the page size used by the MMU when backing a VMA (in most cases,
the same as KernelPageSize); the amount of the mapping that is currently
resident in RAM (RSS); the process's proportional share of this mapping
(PSS); and the number of clean and dirty shared and private pages in the
-mapping.
+mapping. If the mapping has multiple page sizes there may be multiple
+numbered MMUPageSize entries.
The "proportional set size" (PSS) of a process is the count of pages it has
in memory, where each page is divided by the number of processes sharing it.
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e091931d7ca1..8bfd8b13c2ed 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -874,6 +874,7 @@ struct mem_size_stats {
unsigned long shared_hugetlb;
unsigned long private_hugetlb;
unsigned long ksm;
+ unsigned long compound_orders;
u64 pss;
u64 pss_anon;
u64 pss_file;
@@ -942,6 +943,9 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
if (young || folio_test_young(folio) || folio_test_referenced(folio))
mss->referenced += size;
+ mss->compound_orders |=
+ BIT_ULL(compound ? folio_large_order(folio) : 0);
+
/*
* Then accumulate quantities that may depend on sharing, or that may
* differ page-by-page.
@@ -1371,6 +1375,7 @@ static int show_smap(struct seq_file *m, void *v)
{
struct vm_area_struct *vma = v;
struct mem_size_stats mss = {};
+ int i, cnt = 0;
smap_gather_stats(vma, &mss, 0);
@@ -1378,7 +1383,14 @@ static int show_smap(struct seq_file *m, void *v)
SEQ_PUT_DEC("Size: ", vma->vm_end - vma->vm_start);
SEQ_PUT_DEC(" kB\nKernelPageSize: ", vma_kernel_pagesize(vma));
- SEQ_PUT_DEC(" kB\nMMUPageSize: ", vma_mmu_pagesize(vma));
+
+ for_each_set_bit(i, &mss.compound_orders, BITS_PER_LONG) {
+ if (cnt++ == 0)
+ SEQ_PUT_DEC(" kB\nMMUPageSize: ", PAGE_SIZE << i);
+ else
+ seq_printf(m, " kB\nMMUPageSize%d: %8u",
+ cnt, 1 << (PAGE_SHIFT-10+i));
+ }
seq_puts(m, " kB\n");
__show_smap(m, &mss, false);
--
2.53.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] smaps: Report correct page sizes with THP
2026-02-25 23:27 [PATCH v2] smaps: Report correct page sizes with THP Andi Kleen
@ 2026-02-26 12:08 ` Usama Arif
2026-03-01 17:20 ` Andi Kleen
2026-02-26 17:31 ` David Hildenbrand (Arm)
1 sibling, 1 reply; 8+ messages in thread
From: Usama Arif @ 2026-02-26 12:08 UTC (permalink / raw)
To: Andi Kleen; +Cc: Usama Arif, linux-mm, akpm
On Wed, 25 Feb 2026 15:27:08 -0800 Andi Kleen <ak@linux.intel.com> wrote:
> The earlier version of this patch kit wasn't that well received,
> with the main objection being the lack of mTHP support. This variant
> tracks any mTHP sizes in a VMA and reports them as MMUPageSizeN in smaps,
> with increasing N. The base page size is still reported without a
> number suffix to stay compatible.
>
> The nice thing is that the patch is actually simpler and more
> straightforward than the THP-only variant. Also improved the
> documentation.
>
> Recently I wasted quite some time debugging why THP apparently didn't work,
> when in fact smaps was simply always reporting the base page size. smaps has
> separate counters for (non-m) THP, but interpreting them is not always obvious.
> I left KernelPageSize alone.
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
> Documentation/filesystems/proc.rst | 8 ++++++--
> fs/proc/task_mmu.c | 14 +++++++++++++-
> 2 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index b0c0d1b45b99..c5102ef7a2eb 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -452,6 +452,7 @@ Memory Area, or VMA) there is a series of lines such as the following::
> Size: 1084 kB
> KernelPageSize: 4 kB
> MMUPageSize: 4 kB
> + MMUPageSize2: 2048 kB
> Rss: 892 kB
> Pss: 374 kB
> Pss_Dirty: 0 kB
> @@ -476,14 +477,17 @@ Memory Area, or VMA) there is a series of lines such as the following::
> VmFlags: rd ex mr mw me dw
>
> The first of these lines shows the same information as is displayed for
> -the mapping in /proc/PID/maps. Following lines show the size of the
> +the mapping in /proc/PID/maps (except that there may be additional page
> +sizes if the mapping contains them).
> +Following lines show the size of the
> mapping (size); the size of each page allocated when backing a VMA
> (KernelPageSize), which is usually the same as the size in the page table
> entries; the page size used by the MMU when backing a VMA (in most cases,
> the same as KernelPageSize); the amount of the mapping that is currently
> resident in RAM (RSS); the process's proportional share of this mapping
> (PSS); and the number of clean and dirty shared and private pages in the
> -mapping.
> +mapping. If the mapping has multiple page sizes there may be multiple
> +numbered MMUPageSize entries.
>
> The "proportional set size" (PSS) of a process is the count of pages it has
> in memory, where each page is divided by the number of processes sharing it.
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index e091931d7ca1..8bfd8b13c2ed 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -874,6 +874,7 @@ struct mem_size_stats {
> unsigned long shared_hugetlb;
> unsigned long private_hugetlb;
> unsigned long ksm;
> + unsigned long compound_orders;
> u64 pss;
> u64 pss_anon;
> u64 pss_file;
> @@ -942,6 +943,9 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
> if (young || folio_test_young(folio) || folio_test_referenced(folio))
> mss->referenced += size;
>
> + mss->compound_orders |=
> + BIT_ULL(compound ? folio_large_order(folio) : 0);
> +
> /*
> * Then accumulate quantities that may depend on sharing, or that may
> * differ page-by-page.
> @@ -1371,6 +1375,7 @@ static int show_smap(struct seq_file *m, void *v)
> {
> struct vm_area_struct *vma = v;
> struct mem_size_stats mss = {};
> + int i, cnt = 0;
>
> smap_gather_stats(vma, &mss, 0);
>
> @@ -1378,7 +1383,14 @@ static int show_smap(struct seq_file *m, void *v)
>
> SEQ_PUT_DEC("Size: ", vma->vm_end - vma->vm_start);
> SEQ_PUT_DEC(" kB\nKernelPageSize: ", vma_kernel_pagesize(vma));
> - SEQ_PUT_DEC(" kB\nMMUPageSize: ", vma_mmu_pagesize(vma));
> +
> + for_each_set_bit(i, &mss.compound_orders, BITS_PER_LONG) {
Hello Andi!
When a VMA has no resident pages (e.g., freshly mmap'd but not yet
faulted), compound_orders will be zero and the for_each_set_bit loop
will not execute at all. This means no MMUPageSize line is emitted
for that VMA.
Previously, vma_mmu_pagesize() was called unconditionally and always
produced the MMUPageSize field. Userspace tools that parse smaps and
expect MMUPageSize to always be present would break on VMAs with no
resident pages. Should we always add it?
Thanks
> + if (cnt++ == 0)
> + SEQ_PUT_DEC(" kB\nMMUPageSize: ", PAGE_SIZE << i);
> + else
> + seq_printf(m, " kB\nMMUPageSize%d: %8u",
> + cnt, 1 << (PAGE_SHIFT-10+i));
> + }
> seq_puts(m, " kB\n");
>
> __show_smap(m, &mss, false);
> --
> 2.53.0
>
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] smaps: Report correct page sizes with THP
2026-02-25 23:27 [PATCH v2] smaps: Report correct page sizes with THP Andi Kleen
2026-02-26 12:08 ` Usama Arif
@ 2026-02-26 17:31 ` David Hildenbrand (Arm)
2026-03-01 17:35 ` Andi Kleen
1 sibling, 1 reply; 8+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-26 17:31 UTC (permalink / raw)
To: Andi Kleen, linux-mm; +Cc: akpm
On 2/26/26 00:27, Andi Kleen wrote:
> The earlier version of this patch kit wasn't that well received,
> with the main objection being the lack of mTHP support. This variant
> tracks any mTHP sizes in a VMA and reports them as MMUPageSizeN in smaps,
> with increasing N. The base page size is still reported without a
> number suffix to stay compatible.
>
> The nice thing is that the patch is actually simpler and more
> straightforward than the THP-only variant. Also improved the
> documentation.
>
> Recently I wasted quite some time debugging why THP apparently didn't work,
> when in fact smaps was simply always reporting the base page size. smaps has
> separate counters for (non-m) THP, but interpreting them is not always obvious.
> I left KernelPageSize alone.
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
You should CC the people that commented on earlier versions.
I still don't like this.
a) Just because a folio has a certain order does not imply that hw actually
coalesces anything. MMUPageSize is otherwise misleading.
b) Simply because you find a folio of a certain order does not imply that
it is even fully mapped in there.
c) PTE coalescing on AMD can even span folios
But more importantly
d) MMUPageSize is independent of the actual page mappings, and I don't
think we should change these semantics.
Let's see why MMUPageSize was added in the first place:
commit 3340289ddf29ca75c3acfb3a6b72f234b2f74d5c
Author: Mel Gorman <mel@csn.ul.ie>
Date: Tue Jan 6 14:38:54 2009 -0800
mm: report the MMU pagesize in /proc/pid/smaps
The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the
kernel to back a VMA. This matches the size used by the MMU in the
majority of cases. However, one counter-example occurs on PPC64 kernels
whereby a kernel using 64K as a base pagesize may still use 4K pages for
the MMU on older processor. To distinguish, this patch reports
MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps.
So instead of 64K (PAGE_SIZE), they reported 4K. Always. Even if nothing is mapped.
So you could indicate all MMUPageSize that hardware possibly supports in here.
I don't think it's that helpful.
We once discussed exporting more stats here (similar to AnonHugePages/ShmemPmdMapped, ...)
but we were concerned about creating a mess with mTHP stats.
For this reason, Ryan developed a tool (tools/mm/thpmaps) to introspect the
actual mappings.
See
commit 2444172cfde45a3d6e655f50c620727c76bab4a2
Author: Ryan Roberts <ryan.roberts@arm.com>
Date: Tue Jan 16 14:12:35 2024 +0000
tools/mm: add thpmaps script to dump THP usage info
With the proliferation of large folios for file-backed memory, and more
recently the introduction of multi-size THP for anonymous memory, it is
becoming useful to be able to see exactly how large folios are mapped into
processes. For some architectures (e.g. arm64), if most memory is mapped
using contpte-sized and -aligned blocks, TLB usage can be optimized so
it's useful to see where these requirements are and are not being met.
--
Cheers,
David
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] smaps: Report correct page sizes with THP
2026-02-26 12:08 ` Usama Arif
@ 2026-03-01 17:20 ` Andi Kleen
0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2026-03-01 17:20 UTC (permalink / raw)
To: Usama Arif; +Cc: linux-mm, akpm
> When a VMA has no resident pages (e.g., freshly mmap'd but not yet
> faulted), compound_orders will be zero and the for_each_set_bit loop
> will not execute at all. This means no MMUPageSize line is emitted
> for that VMA.
>
> Previously, vma_mmu_pagesize() was called unconditionally and always
> produced the MMUPageSize field. Userspace tools that parse smaps and
> expect MMUPageSize to always be present would break on VMAs with no
> resident pages. Should we always add it?
Yes that's a good point. Should fall back to the base page size
then. Will just make the simple logic a bit more complex again.
-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] smaps: Report correct page sizes with THP
2026-02-26 17:31 ` David Hildenbrand (Arm)
@ 2026-03-01 17:35 ` Andi Kleen
2026-03-02 19:29 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2026-03-01 17:35 UTC (permalink / raw)
To: David Hildenbrand (Arm); +Cc: linux-mm, akpm
>
> a) Just because a folio has a certain order does not imply that hw actually
> coalesces anything. MMUPageSize is otherwise misleading.
That's true. However, reporting 4K for a 2MB THP mapping today is even
more misleading. That's where I started; it misled me completely!
So you're asking for an architecture specific / cpu specific hook to filter it?
I suppose it could be added, however it might take a very long time
to get merged, and even that cannot handle all corner cases.
> b) Simply because you find a folio of a certain order does not imply that
> it is even fully mapped in there.
Ok. I suppose the walker could handle that.
>
> c) PTE coalescing on AMD can even span folios
and d) it might randomly not happen due to various runtime reasons.
It seems the only thing that would satisfy all your correctness
criteria would be to not report a MMUPageSize at all, but we cannot do that
for compatibility reasons as you yourself pointed out.
Given that your requirements are impossible to fully satisfy, we have to
settle for an approximation. I still think that what I proposed is a good
compromise, although yes, it's far from perfect.
>
> But more importantly
>
> d) MMUPageSize is independent of the actual page mappings, and I don't
> think we should change these semantics.
That makes no sense. What is it good for then? Just a random number
that looks good?
>
>
> Let's see why MMUPageSize was added in the first place:
>
> commit 3340289ddf29ca75c3acfb3a6b72f234b2f74d5c
> Author: Mel Gorman <mel@csn.ul.ie>
> Date: Tue Jan 6 14:38:54 2009 -0800
>
> mm: report the MMU pagesize in /proc/pid/smaps
>
> The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the
> kernel to back a VMA. This matches the size used by the MMU in the
> majority of cases. However, one counter-example occurs on PPC64 kernels
> whereby a kernel using 64K as a base pagesize may still use 4K pages for
> the MMU on older processor. To distinguish, this patch reports
> MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps.
>
>
> So instead of 64K (PAGE_SIZE), they reported 4K. Always. Even if nothing is mapped.
It doesn't seem like a good design. I don't know what that is good for.
What would be reasonable is to report something at least approximating
what is really mapped.
>
> So you could indicate all MMUPageSize that hardware possibly supports in here.
> I don't think it's that helpful.
Right.
>
> We once discussed exporting more stats here (similar to AnonHugePages/ShmemPmdMapped, ...)
> but we were concerned about creating a mess with mTHP stats.
>
> For this reason, Ryan developed a tool (tools/mm/thpmaps) to introspect the
> actual mappings.
Some other magic tool doesn't help with the current output confusing
people.
Yes, I can always dump the page tables through debugfs (or at least I could,
but most distributions don't bother to enable that config option, for
unknown reasons).
-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] smaps: Report correct page sizes with THP
2026-03-01 17:35 ` Andi Kleen
@ 2026-03-02 19:29 ` David Hildenbrand (Arm)
2026-03-02 20:41 ` Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-02 19:29 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-mm, akpm, Lorenzo Stoakes
>>
>> But more importantly
>>
>> d) MMUPageSize is independent of the actual page mappings, and I don't
>> think we should change these semantics.
>
> That makes no sense. What is it good for then? Just a random number
> that looks good?
Especially after reading 3340289ddf29ca75c3acfb3a6b72f234b2f74d5c I
ended up with the following conclusion:
"KernelPageSize" tells you in which granularity we can mmap()/munmap()
etc. Simple.
"MMUPageSize" tells us how this granularity is implemented (or emulated)
under the hood.
The case we care about is when MMUPageSize < KernelPageSize. A process
might have to know that detail even when nothing is currently/yet
faulted in.
Assume a process would perform an atomic that would cross MMUPageSize,
but not KernelPageSize. Depending on the architecture, atomics would not
work as expected in that case.
I'd expect other cases where an architecture might have to care about
the actual, smallest possible MMUPageSize it might be executed on while
running the program.
It's a shame we had to add MMUPageSize, but maybe it might resurface if
we ever support emulating 64K/16K user pagesizes on 4K MMUs.
>
>>
>>
>> Let's see why MMUPageSize was added in the first place:
>>
>> commit 3340289ddf29ca75c3acfb3a6b72f234b2f74d5c
>> Author: Mel Gorman <mel@csn.ul.ie>
>> Date: Tue Jan 6 14:38:54 2009 -0800
>>
>> mm: report the MMU pagesize in /proc/pid/smaps
>>
>> The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the
>> kernel to back a VMA. This matches the size used by the MMU in the
>> majority of cases. However, one counter-example occurs on PPC64 kernels
>> whereby a kernel using 64K as a base pagesize may still use 4K pages for
>> the MMU on older processor. To distinguish, this patch reports
>> MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps.
>>
>>
>> So instead of 64K (PAGE_SIZE), they reported 4K. Always. Even if nothing is mapped.
>
> It doesn't seem like a good design. I don't know what that is good for.
>
> What would be reasonable is to report something at least approximating
> what is really mapped.
I disagree. It is not of much use to know "In this 1 TiB region,
there is something mapped with 2 MiB", and then to expose it under the
MMUPageSize umbrella with different (mapped) semantics.
AnonHugePages/ShmemPmdMapped/FilePmdMapped are better in that regard,
although historically suboptimal.
Ideally we'd have better statistics out of the box (similar to what
thpmaps does), but as we recently discussed in the context of "AnonZero"
we should much rather look into a better interface to expose something
like that (e.g., a new syscall where one can enable selected statistics),
including new tooling that can obtain exactly the statistics we want.
>
>>
>> We once discussed exporting more stats here (similar to AnonHugePages/ShmemPmdMapped, ...)
>> but we were concerned about creating a mess with mTHP stats.
>>
>> For this reason, Ryan developed a tool (tools/mm/thpmaps) to introspect the
>> actual mappings.
>
>
> Some other magic tool doesn't help with the current output confusing
> people.
Adding more confusion with this MMUPageSize extension is not an option.
We should likely clarify what MMUPageSize really means.
So my NAK stands.
CCing Lorenzo:
https://lore.kernel.org/all/20260225232708.87833-1-ak@linux.intel.com/
--
Cheers,
David
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] smaps: Report correct page sizes with THP
2026-03-02 19:29 ` David Hildenbrand (Arm)
@ 2026-03-02 20:41 ` Andi Kleen
2026-03-02 21:05 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2026-03-02 20:41 UTC (permalink / raw)
To: David Hildenbrand (Arm); +Cc: linux-mm, akpm, Lorenzo Stoakes
> Especially after reading 3340289ddf29ca75c3acfb3a6b72f234b2f74d5c I
> ended up with the following conclusion:
>
> "KernelPageSize" tells you in which granularity we can mmap()/munmap()
> etc. Simple.
>
> "MMUPageSize" tells us how this granularity is implemented (or emulated)
> under the hood.
>
> The case we care about is when MMUPageSize < KernelPageSize. A process
> might have to know that detail even when nothing is currently/yet
> faulted in.
>
> Assume a process would perform an atomic that would cross MMUPageSize,
> but not KernelPageSize. Depending on the architecture, atomics would not
> work as expected in that case.
I thought most architectures don't support atomics crossing pages
anyway. x86 supports it, but it's discouraged.
>
> I'd expect other cases where an architecture might have to care about
> the actual, smallest possible MMUPageSize it might be executed on while
> running the program.
That's fine, they can use the min, or just the first match
(which is always the smallest)
>
> It's a shame we had to add MMUPageSize, but maybe it might resurface if
> we ever support emulating 64K/16K user pagesizes on 4K MMUs.
Okay, so if I follow that correctly you're suggesting
to change KernelPageSize, not MMUPageSize. I can do that change.
> Adding more confusion with this MMUPageSize extension is not an option.
> We should likely clarify what MMUPageSize really means.
That would be an orthogonal documentation patch.
Thanks,
-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] smaps: Report correct page sizes with THP
2026-03-02 20:41 ` Andi Kleen
@ 2026-03-02 21:05 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-02 21:05 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-mm, akpm, Lorenzo Stoakes
On 3/2/26 21:41, Andi Kleen wrote:
>> Especially after reading 3340289ddf29ca75c3acfb3a6b72f234b2f74d5c I
>> ended up with the following conclusion:
>>
>> "KernelPageSize" tells you in which granularity we can mmap()/munmap()
>> etc. Simple.
>>
>> "MMUPageSize" tells us how this granularity is implemented (or emulated)
>> under the hood.
>>
>> The case we care about is when MMUPageSize < KernelPageSize. A process
>> might have to know that detail even when nothing is currently/yet
>> faulted in.
>>
>> Assume a process would perform an atomic that would cross MMUPageSize,
>> but not KernelPageSize. Depending on the architecture, atomics would not
>> work as expected in that case.
>
> I thought most architectures don't support atomics crossing pages
> anyways. x86 supports it, but it's discouraged.
Right. And if your user space thinks it has "64k" pages, when it's
actually emulated through "4k" pages under the hood, that could be a
problem: user space could perform an atomic "within" a 64k page that
actually crosses two 4k pages. So it must be aware that mmap etc operate
in 64k, but the underlying emulation might be smaller.
I suspect (but don't know for sure) that ppc exposed it for such a
reason (maybe not atomic, but something else where user space might be
able to detect the difference).
It's not that common nowadays, but I suspect it might get more common in
the future again.
>
>>
>> I'd expect other cases where an architecture might have to care about
>> the actual, smallest possible MMUPageSize it might be executed on while
>> running the program.
>
> That's fine, they can use the min, or just the first match
> (which is always the smallest)
>
>>
>> It's a shame we had to add MMUPageSize, but maybe it might resurface if
>> we ever support emulating 64K/16K user pagesizes on 4K MMUs.
>
> Okay, so if I follow that correctly you're suggesting
> to change KernelPageSize, not MMUPageSize. I can do that change.
Not at all. I'm saying that we leave KernelPageSize and MMUPageSize
alone, just as they are today.
Instead, if we want better statistics (and I think we want) regarding
how things are mapped, we should look into a better interface.
Ideally, that will tell you "how much" of a certain contiguous size is
mapped, if that contiguous size benefits somehow from TLB optimizations,
etc.
Similar to thpmaps but easier to consume (and obtain; IIRC thpmaps
requires root permissions and has to jump through some racy hoops to
detect folio sizes) and easier to configure and extend.
--
Cheers,
David
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-03-02 21:05 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-25 23:27 [PATCH v2] smaps: Report correct page sizes with THP Andi Kleen
2026-02-26 12:08 ` Usama Arif
2026-03-01 17:20 ` Andi Kleen
2026-02-26 17:31 ` David Hildenbrand (Arm)
2026-03-01 17:35 ` Andi Kleen
2026-03-02 19:29 ` David Hildenbrand (Arm)
2026-03-02 20:41 ` Andi Kleen
2026-03-02 21:05 ` David Hildenbrand (Arm)
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox