linux-mm.kvack.org archive mirror
* [PATCH v2 0/2] KSM: Optimizations for rmap_walk_ksm
@ 2026-02-06  9:56 xu.xin16
  2026-02-06  9:57 ` [PATCH v2 1/2] ksm: Initialize the addr only once in rmap_walk_ksm xu.xin16
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: xu.xin16 @ 2026-02-06  9:56 UTC (permalink / raw)
  To: david, akpm
  Cc: chengming.zhou, hughd, wang.yaxin, yang.yang29, linux-mm,
	linux-kernel, xu.xin16

From: xu xin <xu.xin16@zte.com.cn>

There are two performance optimization patches for rmap_walk_ksm.

Patch [1/2] moves the initialization of addr from inside the loop to before
the loop, since the variable does not change across iterations.

Patch [2/2] optimizes rmap_walk_ksm by passing a suitable page offset range
to the anon_vma_interval_tree_foreach loop to reduce ineffective checks.

Performance numbers are given in patch [2/2].

xu xin (2):
  ksm: Initialize the addr only once in rmap_walk_ksm
  ksm: Optimize rmap_walk_ksm by passing a suitable address range

mm/ksm.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

--
2.25.1



* [PATCH v2 1/2] ksm: Initialize the addr only once in rmap_walk_ksm
  2026-02-06  9:56 [PATCH v2 0/2] KSM: Optimizations for rmap_walk_ksm xu.xin16
@ 2026-02-06  9:57 ` xu.xin16
  2026-02-06 10:46   ` David Hildenbrand (Arm)
  2026-02-06 10:01 ` [PATCH v2 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range xu.xin16
  2026-02-06 10:47 ` [PATCH v2 0/2] KSM: Optimizations for rmap_walk_ksm David Hildenbrand (Arm)
  2 siblings, 1 reply; 8+ messages in thread
From: xu.xin16 @ 2026-02-06  9:57 UTC (permalink / raw)
  To: xu.xin16, david, akpm
  Cc: chengming.zhou, hughd, wang.yaxin, yang.yang29, linux-mm, linux-kernel

From: xu xin <xu.xin16@zte.com.cn>

This is a minor performance optimization, especially when there are many
for-loop iterations, because the addr variable doesn’t change across
iterations.

Therefore, it only needs to be initialized once before the loop.

Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
 mm/ksm.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 2d89a7c8b4eb..950e122bcbf4 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -3168,6 +3168,8 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
 		return;
 again:
 	hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
+		/* Ignore the stable/unstable/sqnr flags */
+		const unsigned long addr = rmap_item->address & PAGE_MASK;
 		struct anon_vma *anon_vma = rmap_item->anon_vma;
 		struct anon_vma_chain *vmac;
 		struct vm_area_struct *vma;
@@ -3180,16 +3182,13 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
 			}
 			anon_vma_lock_read(anon_vma);
 		}
+
 		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
 					       0, ULONG_MAX) {
-			unsigned long addr;

 			cond_resched();
 			vma = vmac->vma;

-			/* Ignore the stable/unstable/sqnr flags */
-			addr = rmap_item->address & PAGE_MASK;
-
 			if (addr < vma->vm_start || addr >= vma->vm_end)
 				continue;
 			/*
-- 
2.25.1



* [PATCH v2 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
  2026-02-06  9:56 [PATCH v2 0/2] KSM: Optimizations for rmap_walk_ksm xu.xin16
  2026-02-06  9:57 ` [PATCH v2 1/2] ksm: Initialize the addr only once in rmap_walk_ksm xu.xin16
@ 2026-02-06 10:01 ` xu.xin16
  2026-02-06 11:01   ` David Hildenbrand (Arm)
  2026-02-06 10:47 ` [PATCH v2 0/2] KSM: Optimizations for rmap_walk_ksm David Hildenbrand (Arm)
  2 siblings, 1 reply; 8+ messages in thread
From: xu.xin16 @ 2026-02-06 10:01 UTC (permalink / raw)
  To: xu.xin16, david, akpm
  Cc: chengming.zhou, hughd, wang.yaxin, yang.yang29, linux-mm, linux-kernel

From: xu xin <xu.xin16@zte.com.cn>

Problem
=======
When available memory is extremely tight, causing KSM pages to be swapped
out, or when there is significant memory fragmentation and THP triggers
memory compaction, the system will invoke the rmap_walk_ksm function to
perform reverse mapping. However, we observed that this function becomes
particularly time-consuming when a large number of VMAs (e.g., 20,000)
share the same anon_vma. Through debug trace analysis, we found that most
of the latency occurs within anon_vma_interval_tree_foreach, leading to an
excessively long hold time on the anon_vma lock (even reaching 500ms or
more), which in turn causes upper-layer applications (waiting for the
anon_vma lock) to be blocked for extended periods.

Root Reaon
==========
Further investigation revealed that 99.9% of iterations inside the
anon_vma_interval_tree_foreach loop are skipped due to the first check
"if (addr < vma->vm_start || addr >= vma->vm_end)", indicating that a large
number of loop iterations are ineffective. This inefficiency arises because
the pgoff_start and pgoff_end parameters passed to
anon_vma_interval_tree_foreach span the entire address space from 0 to
ULONG_MAX, resulting in very poor loop efficiency.
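
For reference, here is a simplified sketch (paraphrasing the anon_vma interval
tree helpers, not the verbatim kernel code; the helper name is made up) of the
overlap test that decides which VMAs the loop visits. With a query range of
[0, ULONG_MAX], every VMA attached to the anon_vma overlaps and is visited:

	/*
	 * Simplified sketch, not verbatim kernel code: the anon_vma interval
	 * tree is keyed by page offsets, and a VMA is visited when its page
	 * offset range overlaps the queried [first, last] range.
	 */
	static bool vma_pgoff_overlaps(pgoff_t vm_pgoff, unsigned long vm_npages,
				       pgoff_t first, pgoff_t last)
	{
		pgoff_t vm_last = vm_pgoff + vm_npages - 1;

		/* With first == 0 and last == ULONG_MAX this is always true. */
		return vm_pgoff <= last && vm_last >= first;
	}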

Solution
========
In fact, we can significantly improve performance by passing a more precise
range based on the given addr. Since the original pages merged by KSM
correspond to anonymous VMAs, the page offset can be calculated as
pgoff = address >> PAGE_SHIFT. Therefore, we can optimize the call by
defining:

	pgoff_start = rmap_item->address >> PAGE_SHIFT;

Since KSM folios are always order-0, folio_nr_pages() is always 1 for them,
so the line:

	"pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;"

becomes directly:

	"pgoff_end = pgoff_start;"

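A minimal sketch of why this single page offset selects only VMAs that can
actually contain addr, assuming the usual linear anonymous mapping where
vma->vm_pgoff == vma->vm_start >> PAGE_SHIFT (the helper below is made up
purely for illustration and is not part of the patch):

	static bool pgoff_within_vma(pgoff_t pgoff, struct vm_area_struct *vma)
	{
		pgoff_t first = vma->vm_pgoff;
		pgoff_t last = vma->vm_pgoff + vma_pages(vma) - 1;

		/*
		 * For a linear anon VMA this is equivalent to
		 * vma->vm_start <= (pgoff << PAGE_SHIFT) < vma->vm_end,
		 * so the in-loop address check rarely has to skip anything.
		 */
		return pgoff >= first && pgoff <= last;
	}
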
Performance
===========
In our real embedded Linux environment, the measured metrics were as follows:

1) Time_ms: Max time spent holding the anon_vma lock in a single rmap_walk_ksm.
2) Nr_iteration_total: Max number of iterations of one anon_vma_interval_tree_foreach loop.
3) Skip_addr_out_of_range: Max number of iterations skipped by the first check (vma->vm_start
            and vma->vm_end) in one anon_vma_interval_tree_foreach loop.
4) Skip_mm_mismatch: Max number of iterations skipped by the second check (rmap_item->mm == vma->vm_mm)
            in one anon_vma_interval_tree_foreach loop.

The result is as follows:

                 Time_ms      Nr_iteration_total    Skip_addr_out_of_range   Skip_mm_mismatch
Before patched:  228.65       22169                 22168                    0
After pacthed:   0.396        3                     0                        2

The referenced reproducer of rmap_walk_ksm can be found at:
https://lore.kernel.org/all/20260206151424734QIyWL_pA-1QeJPbJlUxsO@zte.com.cn/

Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
 mm/ksm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 950e122bcbf4..54f72e92b7f3 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -3170,6 +3170,9 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
 	hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
 		/* Ignore the stable/unstable/sqnr flags */
 		const unsigned long addr = rmap_item->address & PAGE_MASK;
+		const pgoff_t pgoff_start = rmap_item->address >> PAGE_SHIFT;
+		/* KSM folios are always order-0 normal pages */
+		const pgoff_t pgoff_end = pgoff_start;
 		struct anon_vma *anon_vma = rmap_item->anon_vma;
 		struct anon_vma_chain *vmac;
 		struct vm_area_struct *vma;
@@ -3184,7 +3187,7 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
 		}

 		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
-					       0, ULONG_MAX) {
+					       pgoff_start, pgoff_end) {

 			cond_resched();
 			vma = vmac->vma;
-- 
2.25.1



* Re: [PATCH v2 1/2] ksm: Initialize the addr only once in rmap_walk_ksm
  2026-02-06  9:57 ` [PATCH v2 1/2] ksm: Initialize the addr only once in rmap_walk_ksm xu.xin16
@ 2026-02-06 10:46   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-06 10:46 UTC (permalink / raw)
  To: xu.xin16, akpm
  Cc: chengming.zhou, hughd, wang.yaxin, yang.yang29, linux-mm, linux-kernel

On 2/6/26 10:57, xu.xin16@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
> 
> This is a minor performance optimization, especially when there are many
> for-loop iterations, because the addr variable doesn’t change across
> iterations.
> 
> Therefore, it only needs to be initialized once before the loop.
> 
> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
> ---

Make sure to pick up any tags when resending! :)

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David



* Re: [PATCH v2 0/2] KSM: Optimizations for rmap_walk_ksm
  2026-02-06  9:56 [PATCH v2 0/2] KSM: Optimizations for rmap_walk_ksm xu.xin16
  2026-02-06  9:57 ` [PATCH v2 1/2] ksm: Initialize the addr only once in rmap_walk_ksm xu.xin16
  2026-02-06 10:01 ` [PATCH v2 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range xu.xin16
@ 2026-02-06 10:47 ` David Hildenbrand (Arm)
  2 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-06 10:47 UTC (permalink / raw)
  To: xu.xin16, akpm
  Cc: chengming.zhou, hughd, wang.yaxin, yang.yang29, linux-mm, linux-kernel

On 2/6/26 10:56, xu.xin16@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
> 
> There are two performance optimization patches for rmap_walk_ksm.
> 
> Patch [1/2] moves the initialization of addr from inside the loop to before
> the loop, since the variable does not change across iterations.
> 
> Patch [2/2] optimizes rmap_walk_ksm by passing a suitable page offset range
> to the anon_vma_interval_tree_foreach loop to reduce ineffective checks.
> 
> Performance numbers are given in patch [2/2].

For the future, we usually describe what changed between versions 
briefly in the cover letter. Thanks!

-- 
Cheers,

David



* Re: [PATCH v2 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
  2026-02-06 10:01 ` [PATCH v2 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range xu.xin16
@ 2026-02-06 11:01   ` David Hildenbrand (Arm)
  2026-02-06 16:15     ` xu.xin16
  0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-06 11:01 UTC (permalink / raw)
  To: xu.xin16, akpm
  Cc: chengming.zhou, hughd, wang.yaxin, yang.yang29, linux-mm, linux-kernel

On 2/6/26 11:01, xu.xin16@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
> 
> Problem
> =======
> When available memory is extremely tight, causing KSM pages to be swapped
> out, or when there is significant memory fragmentation and THP triggers
> memory compaction, the system will invoke the rmap_walk_ksm function to
> perform reverse mapping. However, we observed that this function becomes
> particularly time-consuming when a large number of VMAs (e.g., 20,000)
> share the same anon_vma. Through debug trace analysis, we found that most
> of the latency occurs within anon_vma_interval_tree_foreach, leading to an
> excessively long hold time on the anon_vma lock (even reaching 500ms or
> more), which in turn causes upper-layer applications (waiting for the
> anon_vma lock) to be blocked for extended periods.
> 
> Root Reaon

s/Reaon/Reason/ or better "Cause"

> ==========
> Further investigation revealed that 99.9% of iterations inside the
> anon_vma_interval_tree_foreach loop are skipped due to the first check
> "if (addr < vma->vm_start || addr >= vma->vm_end)", indicating that a large
> number of loop iterations are ineffective. This inefficiency arises because
> the pgoff_start and pgoff_end parameters passed to
> anon_vma_interval_tree_foreach span the entire address space from 0 to
> ULONG_MAX, resulting in very poor loop efficiency.
> 
> Solution
> ========
> In fact, we can significantly improve performance by passing a more precise
> range based on the given addr. Since the original pages merged by KSM
> correspond to anonymous VMAs, the page offset can be calculated as
> pgoff = address >> PAGE_SHIFT. Therefore, we can optimize the call by
> defining:
> 
> 	pgoff_start = rmap_item->address >> PAGE_SHIFT;
> 
> Since KSM folios are always order-0, folio_nr_pages() is always 1 for them,
> so the line:
> 
> 	"pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;"
> 
> becomes directly:
> 
> 	"pgoff_end = pgoff_start;"
> 
> Performance
> ===========
> In our real embedded Linux environment, the measured metrics were as follows:
> 
> 1) Time_ms: Max time spent holding the anon_vma lock in a single rmap_walk_ksm.
> 2) Nr_iteration_total: Max number of iterations of one anon_vma_interval_tree_foreach loop.
> 3) Skip_addr_out_of_range: Max number of iterations skipped by the first check (vma->vm_start
>              and vma->vm_end) in one anon_vma_interval_tree_foreach loop.
> 4) Skip_mm_mismatch: Max number of iterations skipped by the second check (rmap_item->mm == vma->vm_mm)
>              in one anon_vma_interval_tree_foreach loop.
> 
> The result is as follows:
> 
>                   Time_ms      Nr_iteration_total    Skip_addr_out_of_range   Skip_mm_mismatch
> Before patched:  228.65       22169                 22168                    0
> After pacthed:   0.396        3                     0                        2

s/pacthed/patched/

But I would just call it "Before" and "After".

> 
> The referenced reproducer of rmap_walk_ksm can be found at:
> https://lore.kernel.org/all/20260206151424734QIyWL_pA-1QeJPbJlUxsO@zte.com.cn/
> 
> Signed-off-by: xu xin <xu.xin16@zte.com.cn>

Did you accidentally drop a

	Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn>

?

> ---
>   mm/ksm.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 950e122bcbf4..54f72e92b7f3 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -3170,6 +3170,9 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
>   	hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
>   		/* Ignore the stable/unstable/sqnr flags */
>   		const unsigned long addr = rmap_item->address & PAGE_MASK;
> +		const pgoff_t pgoff_start = rmap_item->address >> PAGE_SHIFT;
> +		/* KSM folios are always order-0 normal pages */
> +		const pgoff_t pgoff_end = pgoff_start;


Maybe simply

const pgoff_t pgoff = rmap_item->address >> PAGE_SHIFT;

and drop pgoff_end? Then you simply pass pgoff as start and end below. 
You could add the KSM folio comment above the 
anon_vma_interval_tree_foreach.
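
Roughly something like this (just an untested sketch of how the hunk could
look, not a real patch):

	hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
		/* Ignore the stable/unstable/sqnr flags */
		const unsigned long addr = rmap_item->address & PAGE_MASK;
		const pgoff_t pgoff = rmap_item->address >> PAGE_SHIFT;
		...
		/* KSM folios are always order-0, so one pgoff covers the folio */
		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
					       pgoff, pgoff) {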


If the tools/testing/selftests/mm/rmap.c selftest keeps passing,
rmap_walk_ksm() should be working as expected. Did you run it to make sure?

-- 
Cheers,

David



* Re: [PATCH v2 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
  2026-02-06 11:01   ` David Hildenbrand (Arm)
@ 2026-02-06 16:15     ` xu.xin16
  0 siblings, 0 replies; 8+ messages in thread
From: xu.xin16 @ 2026-02-06 16:15 UTC (permalink / raw)
  To: david
  Cc: akpm, chengming.zhou, hughd, wang.yaxin, yang.yang29, linux-mm,
	linux-kernel

> s/Reaon/Reason/ or better "Cause"

Thanks. Will fix it in v3.

> > In our real embedded Linux environment, the measured metrics were as follows:
> > 
> > 1) Time_ms: Max time spent holding the anon_vma lock in a single rmap_walk_ksm.
> > 2) Nr_iteration_total: Max number of iterations of one anon_vma_interval_tree_foreach loop.
> > 3) Skip_addr_out_of_range: Max number of iterations skipped by the first check (vma->vm_start
> >              and vma->vm_end) in one anon_vma_interval_tree_foreach loop.
> > 4) Skip_mm_mismatch: Max number of iterations skipped by the second check (rmap_item->mm == vma->vm_mm)
> >              in one anon_vma_interval_tree_foreach loop.
> > 
> > The result is as follows:
> > 
> >                   Time_ms      Nr_iteration_total    Skip_addr_out_of_range   Skip_mm_mismatch
> > Before patched:  228.65       22169                 22168                    0
> > After pacthed:   0.396        3                     0                        2
> 
> s/pacthed/patched/
> 
> But I would just call it "Before" and "After".

Thanks.

> 
> > 
> > The referenced reproducer of rmap_walk_ksm can be found at:
> > https://lore.kernel.org/all/20260206151424734QIyWL_pA-1QeJPbJlUxsO@zte.com.cn/
> > 
> > Signed-off-by: xu xin <xu.xin16@zte.com.cn>
> 
> Did you accidentally drop a
> 
> 	Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn>
> 
> ?

Oh, yes, Thanks.

> 
> > ---
> >   mm/ksm.c | 5 ++++-
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/ksm.c b/mm/ksm.c
> > index 950e122bcbf4..54f72e92b7f3 100644
> > --- a/mm/ksm.c
> > +++ b/mm/ksm.c
> > @@ -3170,6 +3170,9 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
> >   	hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
> >   		/* Ignore the stable/unstable/sqnr flags */
> >   		const unsigned long addr = rmap_item->address & PAGE_MASK;
> > +		const pgoff_t pgoff_start = rmap_item->address >> PAGE_SHIFT;
> > +		/* KSM folios are always order-0 normal pages */
> > +		const pgoff_t pgoff_end = pgoff_start;
> 
> 
> Maybe simply
> 
> const pgoff_t pgoff = rmap_item->address >> PAGE_SHIFT;
> 
> and drop pgoff_end? Then you simply pass pgoff as start and end below. 
> You could add the KSM folio comment above the 
> anon_vma_interval_tree_foreach.

Will do it in v3.

> 
> 
> If the tools/testing/selftests/mm/rmap.c selftest keeps passing,
> rmap_walk_ksm() should be working as expected. Did you run it to make sure?

Yes, when running tools/testing/selftests/mm/rmap.c, rmap_walk_ksm() works as
expected, but it won't trigger the long delay in anon_vma_interval_tree_foreach
because only a few VMAs share the anon_vma in that test.



* [PATCH v2 0/2] KSM: Optimizations for rmap_walk_ksm
@ 2026-02-06  9:55 xu.xin16
  0 siblings, 0 replies; 8+ messages in thread
From: xu.xin16 @ 2026-02-06  9:55 UTC (permalink / raw)
  To: david, akpm
  Cc: chengming.zhou, hughd, wang.yaxin, yang.yang29, linux-mm, linux-kernel

From: xu xin <xu.xin16@zte.com.cn>

There are two performance optimization patches for rmap_walk_ksm.

Patch [1/2] moves the initialization of addr from inside the loop to before
the loop, since the variable does not change across iterations.

Patch [2/2] optimizes rmap_walk_ksm by passing a suitable page offset range
to the anon_vma_interval_tree_foreach loop to reduce ineffective checks.

Performance numbers are given in patch [2/2].

xu xin (2):
  ksm: Initialize the addr only once in rmap_walk_ksm
  ksm: Optimize rmap_walk_ksm by passing a suitable address range

 mm/ksm.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

-- 
2.25.1



end of thread, other threads:[~2026-02-06 16:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-06  9:56 [PATCH v2 0/2] KSM: Optimizations for rmap_walk_ksm xu.xin16
2026-02-06  9:57 ` [PATCH v2 1/2] ksm: Initialize the addr only once in rmap_walk_ksm xu.xin16
2026-02-06 10:46   ` David Hildenbrand (Arm)
2026-02-06 10:01 ` [PATCH v2 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range xu.xin16
2026-02-06 11:01   ` David Hildenbrand (Arm)
2026-02-06 16:15     ` xu.xin16
2026-02-06 10:47 ` [PATCH v2 0/2] KSM: Optimizations for rmap_walk_ksm David Hildenbrand (Arm)
  -- strict thread matches above, loose matches on Subject: below --
2026-02-06  9:55 xu.xin16
