* [PATCH v3 0/2] KSM: Optimizations for rmap_walk_ksm
@ 2026-02-12 11:28 xu.xin16
2026-02-12 11:29 ` [PATCH v3 1/2] ksm: Initialize the addr only once in rmap_walk_ksm xu.xin16
2026-02-12 11:30 ` [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range xu.xin16
0 siblings, 2 replies; 27+ messages in thread
From: xu.xin16 @ 2026-02-12 11:28 UTC (permalink / raw)
To: david, akpm, xu.xin16
Cc: chengming.zhou, hughd, wang.yaxin, yang.yang29, linux-mm, linux-kernel
From: xu xin <xu.xin16@zte.com.cn>
There are two performance optimization patches for rmap_walk_ksm.
Patch [1/2] moves the initialization of addr from inside the loop to
before the loop, since the variable does not change across iterations.
Patch [2/2] optimizes rmap_walk_ksm by passing a suitable page-offset
range to the anon_vma_interval_tree_foreach loop, reducing ineffective
checks.
The performance figures and reproducer can be found in patch [2/2].
Changes in v3:
- Fix some typos in commit description
- Replace 'pgoff_start' and 'pgoff_end' with 'pgoff'.
Changes in v2:
- Use const variables to initialize 'addr', 'pgoff_start' and 'pgoff_end'
- Let pgoff_end = pgoff_start, since KSM folios are always order-0 (Suggested by David)
xu xin (2):
ksm: Initialize the addr only once in rmap_walk_ksm
ksm: Optimize rmap_walk_ksm by passing a suitable address range
mm/ksm.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v3 1/2] ksm: Initialize the addr only once in rmap_walk_ksm
2026-02-12 11:28 [PATCH v3 0/2] KSM: Optimizations for rmap_walk_ksm xu.xin16
@ 2026-02-12 11:29 ` xu.xin16
2026-02-12 11:30 ` [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range xu.xin16
1 sibling, 0 replies; 27+ messages in thread
From: xu.xin16 @ 2026-02-12 11:29 UTC (permalink / raw)
To: akpm, xu.xin16
Cc: chengming.zhou, hughd, wang.yaxin, yang.yang29, linux-mm, linux-kernel
From: xu xin <xu.xin16@zte.com.cn>
This is a minor performance optimization, especially when there are many
for-loop iterations: the addr variable does not change across iterations,
so it only needs to be initialized once, before the loop.
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
---
mm/ksm.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index 2d89a7c8b4eb..950e122bcbf4 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -3168,6 +3168,8 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
return;
again:
hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
+ /* Ignore the stable/unstable/sqnr flags */
+ const unsigned long addr = rmap_item->address & PAGE_MASK;
struct anon_vma *anon_vma = rmap_item->anon_vma;
struct anon_vma_chain *vmac;
struct vm_area_struct *vma;
@@ -3180,16 +3182,13 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
}
anon_vma_lock_read(anon_vma);
}
+
anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
0, ULONG_MAX) {
- unsigned long addr;
cond_resched();
vma = vmac->vma;
- /* Ignore the stable/unstable/sqnr flags */
- addr = rmap_item->address & PAGE_MASK;
-
if (addr < vma->vm_start || addr >= vma->vm_end)
continue;
/*
--
2.25.1
* [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-02-12 11:28 [PATCH v3 0/2] KSM: Optimizations for rmap_walk_ksm xu.xin16
2026-02-12 11:29 ` [PATCH v3 1/2] ksm: Initialize the addr only once in rmap_walk_ksm xu.xin16
@ 2026-02-12 11:30 ` xu.xin16
2026-02-12 12:21 ` David Hildenbrand (Arm)
2026-04-05 4:44 ` Hugh Dickins
1 sibling, 2 replies; 27+ messages in thread
From: xu.xin16 @ 2026-02-12 11:30 UTC (permalink / raw)
To: akpm, xu.xin16, david
Cc: chengming.zhou, hughd, wang.yaxin, yang.yang29, linux-mm, linux-kernel
From: xu xin <xu.xin16@zte.com.cn>
Problem
=======
When available memory is extremely tight, causing KSM pages to be swapped
out, or when there is significant memory fragmentation and THP triggers
memory compaction, the system will invoke the rmap_walk_ksm function to
perform reverse mapping. However, we observed that this function becomes
particularly time-consuming when a large number of VMAs (e.g., 20,000)
share the same anon_vma. Through debug trace analysis, we found that most
of the latency occurs within anon_vma_interval_tree_foreach, leading to an
excessively long hold time on the anon_vma lock (even reaching 500ms or
more), which in turn causes upper-layer applications (waiting for the
anon_vma lock) to be blocked for extended periods.
Root Cause
==========
Further investigation revealed that 99.9% of iterations inside the
anon_vma_interval_tree_foreach loop are skipped due to the first check
"if (addr < vma->vm_start || addr >= vma->vm_end)", indicating that a large
number of loop iterations are ineffective. This inefficiency arises because
the pgoff_start and pgoff_end parameters passed to
anon_vma_interval_tree_foreach span the entire page offset range from 0 to
ULONG_MAX, resulting in very poor loop efficiency.
Solution
========
In fact, we can significantly improve performance by passing a more precise
range based on the given addr. Since the original pages merged by KSM
correspond to anonymous VMAs, the page offset can be calculated as
pgoff = address >> PAGE_SHIFT. Therefore, we can optimize the call by
defining:
pgoff = rmap_item->address >> PAGE_SHIFT;
Performance
===========
In our real embedded Linux environment, the measured metrics were as
follows:
1) Time_ms: Max time holding the anon_vma lock in a single rmap_walk_ksm.
2) Nr_iteration_total: Max number of iterations of the anon_vma_interval_tree_foreach loop.
3) Skip_addr_out_of_range: Max number of iterations skipped by the first check (vma->vm_start
and vma->vm_end) in the anon_vma_interval_tree_foreach loop.
4) Skip_mm_mismatch: Max number of iterations skipped by the second check (rmap_item->mm == vma->vm_mm)
in the anon_vma_interval_tree_foreach loop.
The result is as follows:
Time_ms Nr_iteration_total Skip_addr_out_of_range Skip_mm_mismatch
Before: 228.65 22169 22168 0
After : 0.396 3 0 2
The referenced reproducer of rmap_walk_ksm can be found at:
https://lore.kernel.org/all/20260206151424734QIyWL_pA-1QeJPbJlUxsO@zte.com.cn/
Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
mm/ksm.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index 950e122bcbf4..7b974f333391 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -3170,6 +3170,7 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
/* Ignore the stable/unstable/sqnr flags */
const unsigned long addr = rmap_item->address & PAGE_MASK;
+ const pgoff_t pgoff = rmap_item->address >> PAGE_SHIFT;
struct anon_vma *anon_vma = rmap_item->anon_vma;
struct anon_vma_chain *vmac;
struct vm_area_struct *vma;
@@ -3183,8 +3184,12 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
anon_vma_lock_read(anon_vma);
}
+ /*
+ * Currently KSM folios are order-0 normal pages, so pgoff_end
+ * should be the same as pgoff_start.
+ */
anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
- 0, ULONG_MAX) {
+ pgoff, pgoff) {
cond_resched();
vma = vmac->vma;
--
2.25.1
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-02-12 11:30 ` [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range xu.xin16
@ 2026-02-12 12:21 ` David Hildenbrand (Arm)
2026-04-05 4:44 ` Hugh Dickins
1 sibling, 0 replies; 27+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-12 12:21 UTC (permalink / raw)
To: xu.xin16, akpm
Cc: chengming.zhou, hughd, wang.yaxin, yang.yang29, linux-mm, linux-kernel
On 2/12/26 12:30, xu.xin16@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
>
> Problem
> =======
> When available memory is extremely tight, causing KSM pages to be swapped
> out, or when there is significant memory fragmentation and THP triggers
> memory compaction, the system will invoke the rmap_walk_ksm function to
> perform reverse mapping. However, we observed that this function becomes
> particularly time-consuming when a large number of VMAs (e.g., 20,000)
> share the same anon_vma. Through debug trace analysis, we found that most
> of the latency occurs within anon_vma_interval_tree_foreach, leading to an
> excessively long hold time on the anon_vma lock (even reaching 500ms or
> more), which in turn causes upper-layer applications (waiting for the
> anon_vma lock) to be blocked for extended periods.
>
> Root Cause
> ==========
> Further investigation revealed that 99.9% of iterations inside the
> anon_vma_interval_tree_foreach loop are skipped due to the first check
> "if (addr < vma->vm_start || addr >= vma->vm_end)), indicating that a large
> number of loop iterations are ineffective. This inefficiency arises because
> the pgoff_start and pgoff_end parameters passed to
> anon_vma_interval_tree_foreach span the entire address space from 0 to
> ULONG_MAX, resulting in very poor loop efficiency.
>
> Solution
> ========
> In fact, we can significantly improve performance by passing a more precise
> range based on the given addr. Since the original pages merged by KSM
> correspond to anonymous VMAs, the page offset can be calculated as
> pgoff = address >> PAGE_SHIFT. Therefore, we can optimize the call by
> defining:
>
> pgoff = rmap_item->address >> PAGE_SHIFT;
>
> Performance
> ===========
> In our real embedded Linux environment, the measured metrcis were as
> follows:
>
> 1) Time_ms: Max time for holding anon_vma lock in a single rmap_walk_ksm.
> 2) Nr_iteration_total: The max times of iterations in a loop of anon_vma_interval_tree_foreach
> 3) Skip_addr_out_of_range: The max times of skipping due to the first check (vma->vm_start
> and vma->vm_end) in a loop of anon_vma_interval_tree_foreach.
> 4) Skip_mm_mismatch: The max times of skipping due to the second check (rmap_item->mm == vma->vm_mm)
> in a loop of anon_vma_interval_tree_foreach.
>
> The result is as follows:
>
> Time_ms Nr_iteration_total Skip_addr_out_of_range Skip_mm_mismatch
> Before: 228.65 22169 22168 0
> After : 0.396 3 0 2
>
> The referenced reproducer of rmap_walk_ksm can be found at:
> https://lore.kernel.org/all/20260206151424734QIyWL_pA-1QeJPbJlUxsO@zte.com.cn/
>
> Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn>
> Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
> ---
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-02-12 11:30 ` [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range xu.xin16
2026-02-12 12:21 ` David Hildenbrand (Arm)
@ 2026-04-05 4:44 ` Hugh Dickins
2026-04-05 21:01 ` Andrew Morton
` (3 more replies)
1 sibling, 4 replies; 27+ messages in thread
From: Hugh Dickins @ 2026-04-05 4:44 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, david, chengming.zhou, hughd, wang.yaxin, yang.yang29,
Michel Lespinasse, Lorenzo Stoakes, linux-mm, linux-kernel
On Thu, 12 Feb 2026, xu.xin16@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
>
> Problem
> =======
> When available memory is extremely tight, causing KSM pages to be swapped
> out, or when there is significant memory fragmentation and THP triggers
> memory compaction, the system will invoke the rmap_walk_ksm function to
> perform reverse mapping. However, we observed that this function becomes
> particularly time-consuming when a large number of VMAs (e.g., 20,000)
> share the same anon_vma. Through debug trace analysis, we found that most
> of the latency occurs within anon_vma_interval_tree_foreach, leading to an
> excessively long hold time on the anon_vma lock (even reaching 500ms or
> more), which in turn causes upper-layer applications (waiting for the
> anon_vma lock) to be blocked for extended periods.
>
> Root Cause
> ==========
> Further investigation revealed that 99.9% of iterations inside the
> anon_vma_interval_tree_foreach loop are skipped due to the first check
> "if (addr < vma->vm_start || addr >= vma->vm_end)), indicating that a large
> number of loop iterations are ineffective. This inefficiency arises because
> the pgoff_start and pgoff_end parameters passed to
> anon_vma_interval_tree_foreach span the entire address space from 0 to
> ULONG_MAX, resulting in very poor loop efficiency.
>
> Solution
> ========
> In fact, we can significantly improve performance by passing a more precise
> range based on the given addr. Since the original pages merged by KSM
> correspond to anonymous VMAs, the page offset can be calculated as
> pgoff = address >> PAGE_SHIFT. Therefore, we can optimize the call by
> defining:
>
> pgoff = rmap_item->address >> PAGE_SHIFT;
>
> Performance
> ===========
> In our real embedded Linux environment, the measured metrcis were as
> follows:
>
> 1) Time_ms: Max time for holding anon_vma lock in a single rmap_walk_ksm.
> 2) Nr_iteration_total: The max times of iterations in a loop of anon_vma_interval_tree_foreach
> 3) Skip_addr_out_of_range: The max times of skipping due to the first check (vma->vm_start
> and vma->vm_end) in a loop of anon_vma_interval_tree_foreach.
> 4) Skip_mm_mismatch: The max times of skipping due to the second check (rmap_item->mm == vma->vm_mm)
> in a loop of anon_vma_interval_tree_foreach.
>
> The result is as follows:
>
> Time_ms Nr_iteration_total Skip_addr_out_of_range Skip_mm_mismatch
> Before: 228.65 22169 22168 0
> After : 0.396 3 0 2
>
> The referenced reproducer of rmap_walk_ksm can be found at:
> https://lore.kernel.org/all/20260206151424734QIyWL_pA-1QeJPbJlUxsO@zte.com.cn/
>
> Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn>
> Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
This is a very attractive speedup, but I believe it's flawed in the
special case when a range has been mremap-moved, where the anon folio
indexes and anon_vma pgoff correspond to the original user address,
not to the current user address.
In which case, rmap_walk_ksm() will be unable to find all the PTEs
for that KSM folio, which will consequently be pinned in memory -
unable to be reclaimed, unable to be migrated, unable to be hotremoved,
until it's finally unmapped or KSM disabled.
But it's years since I worked on KSM or on anon_vma, so I may be confused
and my belief wrong. I have tried to test it, and my testcase did appear
to show 7.0-rc6 successfully swapping out even mremap-moved KSM folios,
but mm.git failing to do so. However, I say "appear to show" because I
found swapping out any KSM pages harder than I'd been expecting: so have
some doubts about my testing. Let me give more detail on that at the
bottom of this mail: it's a tangent which had better not distract from
your speedup.
If I'm right that your patch is flawed, what to do?
Perhaps there is, or could be, a cleverer way for KSM to walk the anon_vma
interval tree, which can handle the mremap-moved pgoffs appropriately.
Cc'ing Michel, whose bf181b9f9d8d ("mm anon rmap: replace same_anon_vma
linked list with an interval tree.") specifically chose the 0, ULONG_MAX
which you are replacing.
Cc'ing Lorenzo, who is currently considering replacing anon_vma by
something more like my anonmm, which preceded Andrea's anon_vma in 2.6.7;
but Lorenzo supplementing it with the mremap tracking which defeated me.
This rmap_walk_ksm() might well benefit from his approach. (I'm not
actually expecting any input from Lorenzo here, or Michel: more FYIs.)
But more realistic in the short term, might be for you to keep your
optimization, but fix the lookup, by keeping a count of PTEs found,
and when that falls short, take a second pass with 0, ULONG_MAX.
Somewhat ugly, certainly imperfect, but good enough for now.
More comment on KSM swapout below...
> ---
> mm/ksm.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 950e122bcbf4..7b974f333391 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -3170,6 +3170,7 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
> hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
> /* Ignore the stable/unstable/sqnr flags */
> const unsigned long addr = rmap_item->address & PAGE_MASK;
> + const pgoff_t pgoff = rmap_item->address >> PAGE_SHIFT;
> struct anon_vma *anon_vma = rmap_item->anon_vma;
> struct anon_vma_chain *vmac;
> struct vm_area_struct *vma;
> @@ -3183,8 +3184,12 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
> anon_vma_lock_read(anon_vma);
> }
>
> + /*
> + * Currently KSM folios are order-0 normal pages, so pgoff_end
> + * should be the same as pgoff_start.
> + */
> anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> - 0, ULONG_MAX) {
> + pgoff, pgoff) {
>
> cond_resched();
> vma = vmac->vma;
> --
> 2.25.1
Unrelated to this patch, but when I tried to test KSM swapout (even
without mremap), it first appeared not to be working. Quite likely
my testcase was too simple and naive, not indicating any problem in
real world usage. But checking back on much older kernels, I did
find that 5.8 swapped KSM as I was expecting, 5.9 not.
Bisected to commit b518154e59aa ("mm/vmscan: protect the workingset
on anonymous LRU"), the one which changed all those
lru_cache_add_active_or_unevictable()s to
lru_cache_add_inactive_or_unevictable()s.
I rather think that mm/ksm.c should have been updated at that time.
Here's the patch I went on to use in testing the mremap question
(I still had to do more memhogging than 5.8 had needed, but that's
probably just reflective of what that commit was intended to fix).
I'm not saying the below is the right patch (it would probably be
better to replicate the existing flags); but throw it out there
for someone more immersed in KSM to pick up and improve upon.
Hugh
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1422,7 +1422,7 @@ static int replace_page(struct vm_area_s
if (!is_zero_pfn(page_to_pfn(kpage))) {
folio_get(kfolio);
folio_add_anon_rmap_pte(kfolio, kpage, vma, addr, RMAP_NONE);
- newpte = mk_pte(kpage, vma->vm_page_prot);
+ newpte = pte_mkold(mk_pte(kpage, vma->vm_page_prot));
} else {
/*
* Use pte_mkdirty to mark the zero page mapped by KSM, and then
@@ -1514,7 +1514,7 @@ static int try_to_merge_one_page(struct
* stable_tree_insert() will update stable_node.
*/
folio_set_stable_node(folio, NULL);
- folio_mark_accessed(folio);
+// folio_mark_accessed(folio);
/*
* Page reclaim just frees a clean folio with no dirty
* ptes: make sure that the ksm page would be swapped.
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-05 4:44 ` Hugh Dickins
@ 2026-04-05 21:01 ` Andrew Morton
2026-04-07 9:43 ` Lorenzo Stoakes (Oracle)
2026-04-06 1:58 ` xu.xin16
` (2 subsequent siblings)
3 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2026-04-05 21:01 UTC (permalink / raw)
To: Hugh Dickins
Cc: xu.xin16, david, chengming.zhou, wang.yaxin, yang.yang29,
Michel Lespinasse, Lorenzo Stoakes, linux-mm, linux-kernel
On Sat, 4 Apr 2026 21:44:14 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
> This is a very attractive speedup, but I believe it's flawed: in the
> special case when a range has been mremap-moved, when its anon folio
> indexes and anon_vma pgoff correspond to the original user address,
> not to the current user address.
>
> In which case, rmap_walk_ksm() will be unable to find all the PTEs
> for that KSM folio, which will consequently be pinned in memory -
> unable to be reclaimed, unable to be migrated, unable to be hotremoved,
> until it's finally unmapped or KSM disabled.
>
> But it's years since I worked on KSM or on anon_vma, so I may be confused
> and my belief wrong. I have tried to test it, and my testcase did appear
> to show 7.0-rc6 successfully swapping out even mremap-moved KSM folios,
> but mm.git failing to do so. However, I say "appear to show" because I
> found swapping out any KSM pages harder than I'd been expecting: so have
> some doubts about my testing. Let me give more detail on that at the
> bottom of this mail: it's a tangent which had better not distract from
> your speedup.
>
> If I'm right that your patch is flawed, what to do?
Thanks, Hugh. Administrivia:
I've removed this patch from the mm-stable branch and I reworked its
[1/2] "ksm: initialize the addr only once in rmap_walk_ksm" to be
presented as a singleton patch.
For now I've restaged this patch ("ksm: optimize rmap_walk_ksm by
passing a suitable address range") at the tail of the mm-unstable
branch and I'll enter wait-and-see mode.
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-05 4:44 ` Hugh Dickins
2026-04-05 21:01 ` Andrew Morton
@ 2026-04-06 1:58 ` xu.xin16
2026-04-06 5:35 ` Hugh Dickins
2026-04-06 9:21 ` David Hildenbrand (arm)
2026-04-07 9:39 ` Lorenzo Stoakes (Oracle)
3 siblings, 1 reply; 27+ messages in thread
From: xu.xin16 @ 2026-04-06 1:58 UTC (permalink / raw)
To: hughd
Cc: akpm, david, chengming.zhou, hughd, wang.yaxin, yang.yang29,
michel, ljs, linux-mm, linux-kernel
> > The result is as follows:
> >
> > Time_ms Nr_iteration_total Skip_addr_out_of_range Skip_mm_mismatch
> > Before: 228.65 22169 22168 0
> > After : 0.396 3 0 2
> >
> > The referenced reproducer of rmap_walk_ksm can be found at:
> > https://lore.kernel.org/all/20260206151424734QIyWL_pA-1QeJPbJlUxsO@zte.com.cn/
> >
> > Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn>
> > Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
> > Signed-off-by: xu xin <xu.xin16@zte.com.cn>
>
> This is a very attractive speedup, but I believe it's flawed: in the
> special case when a range has been mremap-moved, when its anon folio
> indexes and anon_vma pgoff correspond to the original user address,
> not to the current user address.
>
> In which case, rmap_walk_ksm() will be unable to find all the PTEs
> for that KSM folio, which will consequently be pinned in memory -
> unable to be reclaimed, unable to be migrated, unable to be hotremoved,
> until it's finally unmapped or KSM disabled.
>
> But it's years since I worked on KSM or on anon_vma, so I may be confused
> and my belief wrong. I have tried to test it, and my testcase did appear
> to show 7.0-rc6 successfully swapping out even mremap-moved KSM folios,
> but mm.git failing to do so.
Thank you very much for providing such detailed historical context. However,
I'm curious about your test case: how did you observe that KSM pages in mm.git
could not be swapped out, while 7.0-rc6 worked fine?
From the current implementation of mremap, before it succeeds, it always calls
prep_move_vma() -> madvise(MADV_UNMERGEABLE) -> break_ksm(), which splits KSM pages
into regular anonymous pages, which appears to be based on a patch you introduced
over a decade ago, 1ff829957316(ksm: prevent mremap move poisoning). Given this,
KSM pages should already be broken prior to the move, so they wouldn't remain as
mergeable pages after mremap. Could there be a scenario where this breaking mechanism
is bypassed, or am I missing a subtlety in the sequence of operations?
Thanks!
> However, I say "appear to show" because I
> found swapping out any KSM pages harder than I'd been expecting: so have
> some doubts about my testing. Let me give more detail on that at the
> bottom of this mail: it's a tangent which had better not distract from
> your speedup.
>
> If I'm right that your patch is flawed, what to do?
>
> Perhaps there is, or could be, a cleverer way for KSM to walk the anon_vma
> interval tree, which can handle the mremap-moved pgoffs appropriately.
> Cc'ing Michel, whose bf181b9f9d8d ("mm anon rmap: replace same_anon_vma
> linked list with an interval tree.") specifically chose the 0, ULONG_MAX
> which you are replacing.
>
> Cc'ing Lorenzo, who is currently considering replacing anon_vma by
> something more like my anonmm, which preceded Andrea's anon_vma in 2.6.7;
> but Lorenzo supplementing it with the mremap tracking which defeated me.
> This rmap_walk_ksm() might well benefit from his approach. (I'm not
> actually expecting any input from Lorenzo here, or Michel: more FYIs.)
>
> But more realistic in the short term, might be for you to keep your
> optimization, but fix the lookup, by keeping a count of PTEs found,
> and when that falls short, take a second pass with 0, ULONG_MAX.
> Somewhat ugly, certainly imperfect, but good enough for now.
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-06 1:58 ` xu.xin16
@ 2026-04-06 5:35 ` Hugh Dickins
2026-04-07 6:21 ` xu.xin16
0 siblings, 1 reply; 27+ messages in thread
From: Hugh Dickins @ 2026-04-06 5:35 UTC (permalink / raw)
To: xu.xin16
Cc: hughd, akpm, david, chengming.zhou, wang.yaxin, yang.yang29,
michel, ljs, linux-mm, linux-kernel
On Mon, 6 Apr 2026, xu.xin16@zte.com.cn wrote:
> >
> > But it's years since I worked on KSM or on anon_vma, so I may be confused
> > and my belief wrong. I have tried to test it, and my testcase did appear
> > to show 7.0-rc6 successfully swapping out even mremap-moved KSM folios,
> > but mm.git failing to do so.
>
> Thank you very much for providing such detailed historical context. However,
> I'm curious about your test case: how did you observe that KSM pages in mm.git
> could not be swapped out, while 7.0-rc6 worked fine?
>
> From the current implementation of mremap, before it succeeds, it always calls
> prep_move_vma() -> madvise(MADV_UNMERGEABLE) -> break_ksm(), which splits KSM pages
> into regular anonymous pages, which appears to be based on a patch you introduced
> over a decade ago, 1ff829957316(ksm: prevent mremap move poisoning). Given this,
> KSM pages should already be broken prior to the move, so they wouldn't remain as
> mergeable pages after mremap. Could there be a scenario where this breaking mechanism
> is bypassed, or am I missing a subtlety in the sequence of operations?
I'd completely forgotten that patch by now! But it's dealing with a
different issue; and note how it's intentionally leaving MADV_MERGEABLE
on the vma itself, just using MADV_UNMERGEABLE (with &dummy) as an
interface to CoW the KSM pages at that time, letting them be remerged after.
The sequence in my testcase was:
boot with mem=1G
echo 1 >/sys/kernel/mm/ksm/run
base = mmap(NULL, 3*PAGE_SIZE, PROT_READ|PROT_WRITE,
MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
madvise(base, 3*PAGE_SIZE, MADV_MERGEABLE);
madvise(base, 3*PAGE_SIZE, MADV_DONTFORK); /* in case system() used */
memset(base, 0x77, 2*PAGE_SIZE);
sleep(1); /* I think not required */
mremap(base + PAGE_SIZE, PAGE_SIZE, PAGE_SIZE,
MREMAP_MAYMOVE|MREMAP_FIXED, base + 2*PAGE_SIZE);
base2 = mmap(NULL, 512K, PROT_READ|PROT_WRITE,
MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
madvise(base2, 512K, MADV_DONTFORK); /* in case system() used */
memset(base2, 0x77, 512K);
print pages_shared pages_sharing /* 1 1 expected, 1 1 seen */
run something to mmap 1G anon, touch all, touch again, exit
print pages_shared pages_sharing /* 0 0 expected, 1 1 seen */
exit
Those base2 lines were a late addition, to get the test without mremap
showing 0 0 instead of 1 1 at the end; just as I had to apply that
pte_mkold-without-folio_mark_accessed patch to the kernel's mm/ksm.c.
Originally I was checking the testcase's /proc/pid/smaps manually
before exit; then found printing pages_shared pages_sharing easier.
Hugh
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-05 4:44 ` Hugh Dickins
2026-04-05 21:01 ` Andrew Morton
2026-04-06 1:58 ` xu.xin16
@ 2026-04-06 9:21 ` David Hildenbrand (arm)
2026-04-06 9:23 ` David Hildenbrand (arm)
2026-04-07 9:39 ` Lorenzo Stoakes (Oracle)
3 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand (arm) @ 2026-04-06 9:21 UTC (permalink / raw)
To: Hugh Dickins, xu.xin16
Cc: akpm, chengming.zhou, wang.yaxin, yang.yang29, Michel Lespinasse,
Lorenzo Stoakes, linux-mm, linux-kernel
On 4/5/26 06:44, Hugh Dickins wrote:
> On Thu, 12 Feb 2026, xu.xin16@zte.com.cn wrote:
>
>> From: xu xin <xu.xin16@zte.com.cn>
>>
>> Problem
>> =======
>> When available memory is extremely tight, causing KSM pages to be swapped
>> out, or when there is significant memory fragmentation and THP triggers
>> memory compaction, the system will invoke the rmap_walk_ksm function to
>> perform reverse mapping. However, we observed that this function becomes
>> particularly time-consuming when a large number of VMAs (e.g., 20,000)
>> share the same anon_vma. Through debug trace analysis, we found that most
>> of the latency occurs within anon_vma_interval_tree_foreach, leading to an
>> excessively long hold time on the anon_vma lock (even reaching 500ms or
>> more), which in turn causes upper-layer applications (waiting for the
>> anon_vma lock) to be blocked for extended periods.
>>
>> Root Cause
>> ==========
>> Further investigation revealed that 99.9% of iterations inside the
>> anon_vma_interval_tree_foreach loop are skipped due to the first check
>> "if (addr < vma->vm_start || addr >= vma->vm_end)), indicating that a large
>> number of loop iterations are ineffective. This inefficiency arises because
>> the pgoff_start and pgoff_end parameters passed to
>> anon_vma_interval_tree_foreach span the entire address space from 0 to
>> ULONG_MAX, resulting in very poor loop efficiency.
>>
>> Solution
>> ========
>> In fact, we can significantly improve performance by passing a more precise
>> range based on the given addr. Since the original pages merged by KSM
>> correspond to anonymous VMAs, the page offset can be calculated as
>> pgoff = address >> PAGE_SHIFT. Therefore, we can optimize the call by
>> defining:
>>
>> pgoff = rmap_item->address >> PAGE_SHIFT;
>>
>> Performance
>> ===========
>> In our real embedded Linux environment, the measured metrcis were as
>> follows:
>>
>> 1) Time_ms: Max time for holding anon_vma lock in a single rmap_walk_ksm.
>> 2) Nr_iteration_total: The max times of iterations in a loop of anon_vma_interval_tree_foreach
>> 3) Skip_addr_out_of_range: The max times of skipping due to the first check (vma->vm_start
>> and vma->vm_end) in a loop of anon_vma_interval_tree_foreach.
>> 4) Skip_mm_mismatch: The max times of skipping due to the second check (rmap_item->mm == vma->vm_mm)
>> in a loop of anon_vma_interval_tree_foreach.
>>
>> The result is as follows:
>>
>> Time_ms Nr_iteration_total Skip_addr_out_of_range Skip_mm_mismatch
>> Before: 228.65 22169 22168 0
>> After : 0.396 3 0 2
>>
>> The referenced reproducer of rmap_walk_ksm can be found at:
>> https://lore.kernel.org/all/20260206151424734QIyWL_pA-1QeJPbJlUxsO@zte.com.cn/
>>
>> Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn>
>> Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
>> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
>
> This is a very attractive speedup, but I believe it's flawed: in the
> special case when a range has been mremap-moved, when its anon folio
> indexes and anon_vma pgoff correspond to the original user address,
> not to the current user address.
[as discussed in earlier versions of this patch set]
mremap() breaks KSM in the range to be moved.
See prep_move_vma()->ksm_madvise(MADV_UNMERGEABLE)
So I am not sure what you say can trigger.
But I'm just scrolling by, as I'm still busy celebrating Easter :)
--
Cheers,
David
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-06 9:21 ` David Hildenbrand (arm)
@ 2026-04-06 9:23 ` David Hildenbrand (arm)
0 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand (arm) @ 2026-04-06 9:23 UTC (permalink / raw)
To: Hugh Dickins, xu.xin16
Cc: akpm, chengming.zhou, wang.yaxin, yang.yang29, Michel Lespinasse,
Lorenzo Stoakes, linux-mm, linux-kernel
On 4/6/26 11:21, David Hildenbrand (arm) wrote:
> On 4/5/26 06:44, Hugh Dickins wrote:
>> On Thu, 12 Feb 2026, xu.xin16@zte.com.cn wrote:
>>
>>> From: xu xin <xu.xin16@zte.com.cn>
>>>
>>> Problem
>>> =======
>>> When available memory is extremely tight, causing KSM pages to be swapped
>>> out, or when there is significant memory fragmentation and THP triggers
>>> memory compaction, the system will invoke the rmap_walk_ksm function to
>>> perform reverse mapping. However, we observed that this function becomes
>>> particularly time-consuming when a large number of VMAs (e.g., 20,000)
>>> share the same anon_vma. Through debug trace analysis, we found that most
>>> of the latency occurs within anon_vma_interval_tree_foreach, leading to an
>>> excessively long hold time on the anon_vma lock (even reaching 500ms or
>>> more), which in turn causes upper-layer applications (waiting for the
>>> anon_vma lock) to be blocked for extended periods.
>>>
>>> Root Cause
>>> ==========
>>> Further investigation revealed that 99.9% of iterations inside the
>>> anon_vma_interval_tree_foreach loop are skipped due to the first check
>>> "if (addr < vma->vm_start || addr >= vma->vm_end)", indicating that a large
>>> number of loop iterations are ineffective. This inefficiency arises because
>>> the pgoff_start and pgoff_end parameters passed to
>>> anon_vma_interval_tree_foreach span the entire address space from 0 to
>>> ULONG_MAX, resulting in very poor loop efficiency.
>>>
>>> Solution
>>> ========
>>> In fact, we can significantly improve performance by passing a more precise
>>> range based on the given addr. Since the original pages merged by KSM
>>> correspond to anonymous VMAs, the page offset can be calculated as
>>> pgoff = address >> PAGE_SHIFT. Therefore, we can optimize the call by
>>> defining:
>>>
>>> pgoff = rmap_item->address >> PAGE_SHIFT;
>>>
>>> Performance
>>> ===========
>>> In our real embedded Linux environment, the measured metrics were as
>>> follows:
>>>
>>> 1) Time_ms: Max time holding the anon_vma lock in a single rmap_walk_ksm.
>>> 2) Nr_iteration_total: Max number of iterations in an anon_vma_interval_tree_foreach loop.
>>> 3) Skip_addr_out_of_range: Max number of iterations skipped by the first check (vma->vm_start
>>> and vma->vm_end) in an anon_vma_interval_tree_foreach loop.
>>> 4) Skip_mm_mismatch: Max number of iterations skipped by the second check (rmap_item->mm == vma->vm_mm)
>>> in an anon_vma_interval_tree_foreach loop.
>>>
>>> The result is as follows:
>>>
>>> Time_ms Nr_iteration_total Skip_addr_out_of_range Skip_mm_mismatch
>>> Before: 228.65 22169 22168 0
>>> After : 0.396 3 0 2
>>>
>>> The referenced reproducer of rmap_walk_ksm can be found at:
>>> https://lore.kernel.org/all/20260206151424734QIyWL_pA-1QeJPbJlUxsO@zte.com.cn/
>>>
>>> Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn>
>>> Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
>>> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
>>
>> This is a very attractive speedup, but I believe it's flawed: in the
>> special case when a range has been mremap-moved, when its anon folio
>> indexes and anon_vma pgoff correspond to the original user address,
>> not to the current user address.
>
> [as discussed in earlier versions of this patch set]
>
> mremap() breaks KSM in the range to be moved.
>
> See prep_move_vma()->ksm_madvise(MADV_UNMERGEABLE)
>
> So I am not sure what you say can trigger.
>
> But I'm just scrolling by, as I'm still busy celebrating Easter :)
>
[realizing threading is somehow messed up and Xu Xin commented that already]
--
Cheers,
David
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-06 5:35 ` Hugh Dickins
@ 2026-04-07 6:21 ` xu.xin16
2026-04-07 9:36 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 27+ messages in thread
From: xu.xin16 @ 2026-04-07 6:21 UTC (permalink / raw)
To: hughd
Cc: hughd, akpm, david, chengming.zhou, wang.yaxin, yang.yang29,
michel, ljs, linux-mm, linux-kernel
> > From the current implementation of mremap, before it succeeds, it always calls
> > prep_move_vma() -> madvise(MADV_UNMERGEABLE) -> break_ksm(), which splits KSM pages
> > into regular anonymous pages, which appears to be based on a patch you introduced
> > over a decade ago, commit 1ff829957316 ("ksm: prevent mremap move poisoning"). Given this,
> > KSM pages should already be broken prior to the move, so they wouldn't remain as
> > mergeable pages after mremap. Could there be a scenario where this breaking mechanism
> > is bypassed, or am I missing a subtlety in the sequence of operations?
>
> I'd completely forgotten that patch by now! But it's dealing with a
> different issue; and note how it's intentionally leaving MADV_MERGEABLE
> on the vma itself, just using MADV_UNMERGEABLE (with &dummy) as an
> interface to CoW the KSM pages at that time, letting them be remerged after.
>
> The sequence in my testcase was:
>
> boot with mem=1G
> echo 1 >/sys/kernel/mm/ksm/run
> base = mmap(NULL, 3*PAGE_SIZE, PROT_READ|PROT_WRITE,
> MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> madvise(base, 3*PAGE_SIZE, MADV_MERGEABLE);
> madvise(base, 3*PAGE_SIZE, MADV_DONTFORK); /* in case system() used */
> memset(base, 0x77, 2*PAGE_SIZE);
> sleep(1); /* I think not required */
> mremap(base + PAGE_SIZE, PAGE_SIZE, PAGE_SIZE,
> MREMAP_MAYMOVE|MREMAP_FIXED, base + 2*PAGE_SIZE);
> base2 = mmap(NULL, 512K, PROT_READ|PROT_WRITE,
> MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> madvise(base2, 512K, MADV_DONTFORK); /* in case system() used */
> memset(base2, 0x77, 512K);
> print pages_shared pages_sharing /* 1 1 expected, 1 1 seen */
> run something to mmap 1G anon, touch all, touch again, exit
> print pages_shared pages_sharing /* 0 0 expected, 1 1 seen */
> exit
>
> Those base2 lines were a late addition, to get the test without mremap
> showing 0 0 instead of 1 1 at the end; just as I had to apply that
> pte_mkold-without-folio_mark_accessed patch to the kernel's mm/ksm.c.
>
> Originally I was checking the testcase's /proc/pid/smaps manually
> before exit; then found printing pages_shared pages_sharing easier.
>
> Hugh
Following the idea from your test case, I wrote a similar test program,
using migration instead of swap to trigger reverse mapping. The results
show that pages after mremap can still be successfully migrated.
See my testcase:
https://lore.kernel.org/all/20260407140805858ViqJKFhfmYSfq0FynsaEY@zte.com.cn/
Therefore, I suspect that the reason your test program did not swap out
the pages might lie elsewhere, rather than being caused by this optimization.
Thanks.
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-07 6:21 ` xu.xin16
@ 2026-04-07 9:36 ` Lorenzo Stoakes (Oracle)
2026-04-08 12:57 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 27+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-04-07 9:36 UTC (permalink / raw)
To: xu.xin16
Cc: hughd, akpm, david, chengming.zhou, wang.yaxin, yang.yang29,
michel, linux-mm, linux-kernel
On Tue, Apr 07, 2026 at 02:21:41PM +0800, xu.xin16@zte.com.cn wrote:
> > > From the current implementation of mremap, before it succeeds, it always calls
> > > prep_move_vma() -> madvise(MADV_UNMERGEABLE) -> break_ksm(), which splits KSM pages
> > > into regular anonymous pages, which appears to be based on a patch you introduced
> > > over a decade ago, commit 1ff829957316 ("ksm: prevent mremap move poisoning"). Given this,
> > > KSM pages should already be broken prior to the move, so they wouldn't remain as
> > > mergeable pages after mremap. Could there be a scenario where this breaking mechanism
> > > is bypassed, or am I missing a subtlety in the sequence of operations?
> >
> > I'd completely forgotten that patch by now! But it's dealing with a
> > different issue; and note how it's intentionally leaving MADV_MERGEABLE
> > on the vma itself, just using MADV_UNMERGEABLE (with &dummy) as an
> > interface to CoW the KSM pages at that time, letting them be remerged after.
Hmm yeah, we mark them unmergeable but don't update the VMA flags (since using
&dummy), so they can just be re-merged later, right?
And then the:
void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
{
...
const pgoff_t pgoff = rmap_item->address >> PAGE_SHIFT;
...
anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
pgoff, pgoff) {
...
}
...
}
Would _assume_ that folio->pgoff == addr >> PAGE_SHIFT, which will no longer be
the case here?
And yeah this all sucks (come to my lsf talk etc.)
This does make me realise I have to also radically change KSM (gulp) in that
work too. So maybe time for me to actually learn more about it...
> >
> > The sequence in my testcase was:
> >
> > boot with mem=1G
> > echo 1 >/sys/kernel/mm/ksm/run
> > base = mmap(NULL, 3*PAGE_SIZE, PROT_READ|PROT_WRITE,
> > MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> > madvise(base, 3*PAGE_SIZE, MADV_MERGEABLE);
> > madvise(base, 3*PAGE_SIZE, MADV_DONTFORK); /* in case system() used */
> > memset(base, 0x77, 2*PAGE_SIZE);
> > sleep(1); /* I think not required */
> > mremap(base + PAGE_SIZE, PAGE_SIZE, PAGE_SIZE,
> > MREMAP_MAYMOVE|MREMAP_FIXED, base + 2*PAGE_SIZE);
> > base2 = mmap(NULL, 512K, PROT_READ|PROT_WRITE,
> > MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> > madvise(base2, 512K, MADV_DONTFORK); /* in case system() used */
> > memset(base2, 0x77, 512K);
> > print pages_shared pages_sharing /* 1 1 expected, 1 1 seen */
> > run something to mmap 1G anon, touch all, touch again, exit
> > print pages_shared pages_sharing /* 0 0 expected, 1 1 seen */
> > exit
> >
> > Those base2 lines were a late addition, to get the test without mremap
> > showing 0 0 instead of 1 1 at the end; just as I had to apply that
> > pte_mkold-without-folio_mark_accessed patch to the kernel's mm/ksm.c.
> >
> > Originally I was checking the testcase's /proc/pid/smaps manually
> > before exit; then found printing pages_shared pages_sharing easier.
> >
> > Hugh
>
> Following the idea from your test case, I wrote a similar test program,
> using migration instead of swap to trigger reverse mapping. The results
> show that pages after mremap can still be successfully migrated.
>
> See my testcase:
> https://lore.kernel.org/all/20260407140805858ViqJKFhfmYSfq0FynsaEY@zte.com.cn/
>
> Therefore, I suspect that the reason your test program did not swap out
> the pages might lie elsewhere, rather than being caused by this optimization.
>
> Thanks.
Maybe test programs are not happening to hit the 'merge again' case after the
initial force-unmerging?
I may be missing things here, my bandwidth is now unfortunately seriously
hampered and likely to remain so for some time :'(
Cheers, Lorenzo
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-05 4:44 ` Hugh Dickins
` (2 preceding siblings ...)
2026-04-06 9:21 ` David Hildenbrand (arm)
@ 2026-04-07 9:39 ` Lorenzo Stoakes (Oracle)
3 siblings, 0 replies; 27+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-04-07 9:39 UTC (permalink / raw)
To: Hugh Dickins
Cc: xu.xin16, akpm, david, chengming.zhou, wang.yaxin, yang.yang29,
Michel Lespinasse, linux-mm, linux-kernel
On Sat, Apr 04, 2026 at 09:44:14PM -0700, Hugh Dickins wrote:
> Perhaps there is, or could be, a cleverer way for KSM to walk the anon_vma
> interval tree, which can handle the mremap-moved pgoffs appropriately.
> Cc'ing Michel, whose bf181b9f9d8d ("mm anon rmap: replace same_anon_vma
> linked list with an interval tree.") specifically chose the 0, ULONG_MAX
> which you are replacing.
No, I don't think there could be, and I wouldn't want anybody to try to
implement any kind of remap-tracking that might clash with my future work, not
that I think there's a hugely sensible way of doing that with the current
anon_vma implementation.
>
> Cc'ing Lorenzo, who is currently considering replacing anon_vma by
> something more like my anonmm, which preceded Andrea's anon_vma in 2.6.7;
> but Lorenzo supplementing it with the mremap tracking which defeated me.
> This rmap_walk_ksm() might well benefit from his approach. (I'm not
> actually expecting any input from Lorenzo here, or Michel: more FYIs.)
Thanks :)
Maybe I should go read your anonmm implementation... the mremap-tracking is
tricky but I have it working (modulo, KSM, yeah this whole thing was a good hint
that I need to look at that, too [+ whatever else I've missed so far]).
Bandwidth is low for the foreseeable future so expectations of non-reply are probably
fairly valid atm (and yet here I am, replying :)
>
> But more realistic in the short term, might be for you to keep your
> optimization, but fix the lookup, by keeping a count of PTEs found,
> and when that falls short, take a second pass with 0, ULONG_MAX.
> Somewhat ugly, certainly imperfect, but good enough for now.
Yeah that could work, it's not likely that many of these would be mremap()'d
right?
Yes ugly, but anon_vma is (very) ugly.
Cheers, Lorenzo
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-05 21:01 ` Andrew Morton
@ 2026-04-07 9:43 ` Lorenzo Stoakes (Oracle)
2026-04-07 21:21 ` Andrew Morton
0 siblings, 1 reply; 27+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-04-07 9:43 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, xu.xin16, david, chengming.zhou, wang.yaxin,
yang.yang29, Michel Lespinasse, linux-mm, linux-kernel
On Sun, Apr 05, 2026 at 02:01:32PM -0700, Andrew Morton wrote:
> On Sat, 4 Apr 2026 21:44:14 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
>
> > This is a very attractive speedup, but I believe it's flawed: in the
> > special case when a range has been mremap-moved, when its anon folio
> > indexes and anon_vma pgoff correspond to the original user address,
> > not to the current user address.
> >
> > In which case, rmap_walk_ksm() will be unable to find all the PTEs
> > for that KSM folio, which will consequently be pinned in memory -
> > unable to be reclaimed, unable to be migrated, unable to be hotremoved,
> > until it's finally unmapped or KSM disabled.
> >
> > But it's years since I worked on KSM or on anon_vma, so I may be confused
> > and my belief wrong. I have tried to test it, and my testcase did appear
> > to show 7.0-rc6 successfully swapping out even mremap-moved KSM folios,
> > but mm.git failing to do so. However, I say "appear to show" because I
> > found swapping out any KSM pages harder than I'd been expecting: so have
> > some doubts about my testing. Let me give more detail on that at the
> > bottom of this mail: it's a tangent which had better not distract from
> > your speedup.
> >
> > If I'm right that your patch is flawed, what to do?
>
> Thanks, Hugh. Administreevia:
>
> I've removed this patch from the mm-stable branch and I reworked its
> [1/2] "ksm: initialize the addr only once in rmap_walk_ksm" to be
> presented as a singleton patch.
>
> For now I've restaged this patch ("ksm: optimize rmap_walk_ksm by
> passing a suitable address range") at the tail of the mm-unstable
> branch and I'll enter wait-and-see mode.
>
Given we're at -rc7 now, I think we should delay this patch until 7.2, unless
I'm much mistaken wrt Hugh's concerns.
I'm concerned this is a subtle way of breaking things so we really want to be
confident.
We should also bundle up the test at
https://lore.kernel.org/all/20260407140805858ViqJKFhfmYSfq0FynsaEY@zte.com.cn/
with this patch (should we find it's ok) as a separate series.
Really overall I think safest to yank until 7.2 honestly.
Thanks, Lorenzo
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-07 9:43 ` Lorenzo Stoakes (Oracle)
@ 2026-04-07 21:21 ` Andrew Morton
2026-04-08 6:29 ` Lorenzo Stoakes
0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2026-04-07 21:21 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Hugh Dickins, xu.xin16, david, chengming.zhou, wang.yaxin,
yang.yang29, Michel Lespinasse, linux-mm, linux-kernel
On Tue, 7 Apr 2026 10:43:12 +0100 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> > Thanks, Hugh. Administreevia:
> >
> > I've removed this patch from the mm-stable branch and I reworked its
> > [1/2] "ksm: initialize the addr only once in rmap_walk_ksm" to be
> > presented as a singleton patch.
> >
> > For now I've restaged this patch ("ksm: optimize rmap_walk_ksm by
> > passing a suitable address range") at the tail of the mm-unstable
> > branch and I'll enter wait-and-see mode.
> >
>
> Given we're at -rc7 now, I think we should delay this patch until 7.2, unless
> I'm much mistaken wrt Hugh's concerns.
>
> I'm concerned this is a subtle way of breaking things so we really want to be
> confident.
>
> We should also bundle up the test at
> https://lore.kernel.org/all/20260407140805858ViqJKFhfmYSfq0FynsaEY@zte.com.cn/
> with this patch (should we find it's ok) as a separate series.
>
> Really overall I think safest to yank until 7.2 honestly.
OK. But let's not lose sight of those potential efficiency gains:
Time_ms Nr_iteration_total Skip_addr_out_of_range Skip_mm_mismatch
Before: 228.65 22169 22168 0
After : 0.396 3 0 2
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-07 21:21 ` Andrew Morton
@ 2026-04-08 6:29 ` Lorenzo Stoakes
0 siblings, 0 replies; 27+ messages in thread
From: Lorenzo Stoakes @ 2026-04-08 6:29 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, xu.xin16, david, chengming.zhou, wang.yaxin,
yang.yang29, Michel Lespinasse, linux-mm, linux-kernel
On Tue, Apr 07, 2026 at 02:21:42PM -0700, Andrew Morton wrote:
> On Tue, 7 Apr 2026 10:43:12 +0100 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
> > > Thanks, Hugh. Administreevia:
> > >
> > > I've removed this patch from the mm-stable branch and I reworked its
> > > [1/2] "ksm: initialize the addr only once in rmap_walk_ksm" to be
> > > presented as a singleton patch.
> > >
> > > For now I've restaged this patch ("ksm: optimize rmap_walk_ksm by
> > > passing a suitable address range") at the tail of the mm-unstable
> > > branch and I'll enter wait-and-see mode.
> > >
> >
> > Given we're at -rc7 now, I think we should delay this patch until 7.2, unless
> > I'm much mistaken wrt Hugh's concerns.
> >
> > I'm concerned this is a subtle way of breaking things so we really want to be
> > confident.
> >
> > We should also bundle up the test at
> > https://lore.kernel.org/all/20260407140805858ViqJKFhfmYSfq0FynsaEY@zte.com.cn/
> > with this patch (should we find it's ok) as a separate series.
> >
> > Really overall I think safest to yank until 7.2 honestly.
>
> OK. But let's not lose sight of those potential efficiency gains:
>
> Time_ms Nr_iteration_total Skip_addr_out_of_range Skip_mm_mismatch
> Before: 228.65 22169 22168 0
> After : 0.396 3 0 2
>
Yes, sure. We could possibly achieve something similar by doing a quick search
first, then trying the broader search as suggested by Hugh?
But want to make sure correctness is there!
Thanks, Lorenzo
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-07 9:36 ` Lorenzo Stoakes (Oracle)
@ 2026-04-08 12:57 ` David Hildenbrand (Arm)
2026-04-09 9:18 ` Lorenzo Stoakes
0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-08 12:57 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle), xu.xin16
Cc: hughd, akpm, chengming.zhou, wang.yaxin, yang.yang29, michel,
linux-mm, linux-kernel
On 4/7/26 11:36, Lorenzo Stoakes (Oracle) wrote:
> On Tue, Apr 07, 2026 at 02:21:41PM +0800, xu.xin16@zte.com.cn wrote:
>>>
>>> I'd completely forgotten that patch by now! But it's dealing with a
>>> different issue; and note how it's intentionally leaving MADV_MERGEABLE
>>> on the vma itself, just using MADV_UNMERGEABLE (with &dummy) as an
>>> interface to CoW the KSM pages at that time, letting them be remerged after.
>
> Hmm yeah, we mark them unmergeable but don't update the VMA flags (since using
> &dummy), so they can just be merged later right?
>
> And then the:
>
> void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
> {
> ...
> const pgoff_t pgoff = rmap_item->address >> PAGE_SHIFT;
> ...
> anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> pgoff, pgoff) {
> ...
> }
> ...
> }
>
> Would _assume_ that folio->pgoff == addr >> PAGE_SHIFT, which will no longer be
> the case here?
I'm wondering whether we could figure the pgoff out, somehow, so we
wouldn't have to store it elsewhere.
What we need is essentially what __folio_set_anon() would have done for
the original folio we replaced.
folio->index = linear_page_index(vma, address);
Could we obtain that from the anon_vma assigned to our rmap_item?
pgoff_t pgoff;
pgoff = (rmap_item->address - anon_vma->vma->vm_start) >> PAGE_SHIFT;
pgoff += anon_vma->vma->vm_pgoff;
It would be the same adjustment everywhere we look in child processes,
because the moment they would mremap() would be where we would have
unshared.
Just a thought after reading avc_start_pgoff ...
--
Cheers,
David
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-08 12:57 ` David Hildenbrand (Arm)
@ 2026-04-09 9:18 ` Lorenzo Stoakes
2026-04-09 9:37 ` David Hildenbrand (Arm)
2026-04-09 10:06 ` xu.xin16
0 siblings, 2 replies; 27+ messages in thread
From: Lorenzo Stoakes @ 2026-04-09 9:18 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: xu.xin16, hughd, akpm, chengming.zhou, wang.yaxin, yang.yang29,
michel, linux-mm, linux-kernel
On Wed, Apr 08, 2026 at 02:57:10PM +0200, David Hildenbrand (Arm) wrote:
> On 4/7/26 11:36, Lorenzo Stoakes (Oracle) wrote:
> > On Tue, Apr 07, 2026 at 02:21:41PM +0800, xu.xin16@zte.com.cn wrote:
> >>>
> >>> I'd completely forgotten that patch by now! But it's dealing with a
> >>> different issue; and note how it's intentionally leaving MADV_MERGEABLE
> >>> on the vma itself, just using MADV_UNMERGEABLE (with &dummy) as an
> >>> interface to CoW the KSM pages at that time, letting them be remerged after.
> >
> > Hmm yeah, we mark them unmergeable but don't update the VMA flags (since using
> > &dummy), so they can just be merged later right?
> >
> > And then the:
> >
> > void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
> > {
> > ...
> > const pgoff_t pgoff = rmap_item->address >> PAGE_SHIFT;
> > ...
> > anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> > pgoff, pgoff) {
> > ...
> > }
> > ...
> > }
> >
> > Would _assume_ that folio->pgoff == addr >> PAGE_SHIFT, which will no longer be
> > the case here?
>
> I'm wondering whether we could figure the pgoff out, somehow, so we
> wouldn't have to store it elsewhere.
>
> What we need is essentially what __folio_set_anon() would have done for
> the original folio we replaced.
>
> folio->index = linear_page_index(vma, address);
>
> Could we obtain that from the anon_vma assigned to our rmap_item?
>
> pgoff_t pgoff;
>
> pgoff = (rmap_item->address - anon_vma->vma->vm_start) >> PAGE_SHIFT;
> pgoff += anon_vma->vma->vm_pgoff;
anon_vma doesn't have a vma field :) it has anon_vma->rb_root which maps to all
'related' VMAs.
And we're already looking at what might be covered by the anon_vma by
invoking anon_vma_interval_tree_foreach() on anon_vma->rb_root in [0,
ULONG_MAX).
>
> It would be the same adjustment everywhere we look in child processes,
> because the moment they would mremap() would be where we would have
> unshared.
>
> Just a thought after reading avc_start_pgoff ...
One interesting thing here is in the anon_vma_interval_tree_foreach() loop
we check:
if (addr < vma->vm_start || addr >= vma->vm_end)
continue;
Which is the same as saying 'hey we are ignoring remaps'.
But... if _we_ got remapped previously (the unsharing is only temporary),
then we'd _still_ have an anon_vma with an old index != addr >> PAGE_SHIFT,
and would still not be able to figure out the correct pgoff after sharing.
I wonder if we could just store the pgoff in the rmap_item though?
Because we unshare on remap, so we'd expect a new share after remapping, at
which point we could account for the remapping by just setting
rmap_item->pgoff = vma->vm_pgoff I think?
Then we're back in business.
Another way around this issue is to do the rmap_walk_ksm() loop for (addr
>> PAGE_SHIFT) _first_, but that'd only be useful for walkers that can exit
early once they find the mapping they care about, and I worry about somehow
missing remapped cases, so probably not actually all that useful.
>
> --
> Cheers,
>
> David
Cheers, Lorenzo
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-09 9:18 ` Lorenzo Stoakes
@ 2026-04-09 9:37 ` David Hildenbrand (Arm)
2026-04-09 9:41 ` David Hildenbrand (Arm)
2026-04-09 10:06 ` xu.xin16
1 sibling, 1 reply; 27+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-09 9:37 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: xu.xin16, hughd, akpm, chengming.zhou, wang.yaxin, yang.yang29,
michel, linux-mm, linux-kernel
On 4/9/26 11:18, Lorenzo Stoakes wrote:
> On Wed, Apr 08, 2026 at 02:57:10PM +0200, David Hildenbrand (Arm) wrote:
>> On 4/7/26 11:36, Lorenzo Stoakes (Oracle) wrote:
>>>
>>> Hmm yeah, we mark them unmergeable but don't update the VMA flags (since using
>>> &dummy), so they can just be merged later right?
>>>
>>> And then the:
>>>
>>> void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
>>> {
>>> ...
>>> const pgoff_t pgoff = rmap_item->address >> PAGE_SHIFT;
>>> ...
>>> anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
>>> pgoff, pgoff) {
>>> ...
>>> }
>>> ...
>>> }
>>>
>>> Would _assume_ that folio->pgoff == addr >> PAGE_SHIFT, which will no longer be
>>> the case here?
>>
>> I'm wondering whether we could figure the pgoff out, somehow, so we
>> wouldn't have to store it elsewhere.
>>
>> What we need is essentially what __folio_set_anon() would have done for
>> the original folio we replaced.
>>
>> folio->index = linear_page_index(vma, address);
>>
>> Could we obtain that from the anon_vma assigned to our rmap_item?
>>
>> pgoff_t pgoff;
>>
>> pgoff = (rmap_item->address - anon_vma->vma->vm_start) >> PAGE_SHIFT;
>> pgoff += anon_vma->vma->vm_pgoff;
>
> anon_vma doesn't have a vma field :) it has anon_vma->rb_root which maps to all
> 'related' VMAs.
Right, anon_vma_chain has. Dammit.
>
> And we're already looking at what might be covered by the anon_vma by
> invoking anon_vma_interval_tree_foreach() on anon_vma->rb_root in [0,
> ULONG_MAX).
>
>>
>> It would be the same adjustment everywhere we look in child processes,
>> because the moment they would mremap() would be where we would have
>> unshared.
>>
>> Just a thought after reading avc_start_pgoff ...
>
> One interesting thing here is in the anon_vma_interval_tree_foreach() loop
> we check:
>
> if (addr < vma->vm_start || addr >= vma->vm_end)
> continue;
>
> Which is the same as saying 'hey we are ignoring remaps'.
>
> But... if _we_ got remapped previously (the unsharing is only temporary),
> then we'd _still_ have an anon_vma with an old index != addr >> PAGE_SHIFT,
> and would still not be able to figure out the correct pgoff after sharing.
>
> I wonder if we could just store the pgoff in the rmap_item though?
That's what I said elsewhere and what I was trying to avoid here.
It's 64 bytes, and adding a new field will increase it to 96 bytes IIUC.
--
Cheers,
David
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-09 9:37 ` David Hildenbrand (Arm)
@ 2026-04-09 9:41 ` David Hildenbrand (Arm)
2026-04-09 9:53 ` Lorenzo Stoakes
2026-04-09 9:55 ` David Hildenbrand (Arm)
0 siblings, 2 replies; 27+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-09 9:41 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: xu.xin16, hughd, akpm, chengming.zhou, wang.yaxin, yang.yang29,
michel, linux-mm, linux-kernel
On 4/9/26 11:37, David Hildenbrand (Arm) wrote:
> On 4/9/26 11:18, Lorenzo Stoakes wrote:
>> On Wed, Apr 08, 2026 at 02:57:10PM +0200, David Hildenbrand (Arm) wrote:
>>>
>>> I'm wondering whether we could figure the pgoff out, somehow, so we
>>> wouldn't have to store it elsewhere.
>>>
>>> What we need is essentially what __folio_set_anon() would have done for
>>> the original folio we replaced.
>>>
>>> folio->index = linear_page_index(vma, address);
>>>
>>> Could we obtain that from the anon_vma assigned to our rmap_item?
>>>
>>> pgoff_t pgoff;
>>>
>>> pgoff = (rmap_item->address - anon_vma->vma->vm_start) >> PAGE_SHIFT;
>>> pgoff += anon_vma->vma->vm_pgoff;
>>
>> anon_vma doesn't have a vma field :) it has anon_vma->rb_root which maps to all
>> 'related' VMAs.
>
> Right, anon_vma_chain has. Dammit.
>
>>
>> And we're already looking at what might be covered by the anon_vma by
>> invoking anon_vma_interval_tree_foreach() on anon_vma->rb_root in [0,
>> ULONG_MAX).
>>
>>>
>>> It would be the same adjustment everywhere we look in child processes,
>>> because the moment they would mremap() would be where we would have
>>> unshared.
>>>
>>> Just a thought after reading avc_start_pgoff ...
>>
>> One interesting thing here is in the anon_vma_interval_tree_foreach() loop
>> we check:
>>
>> if (addr < vma->vm_start || addr >= vma->vm_end)
>> continue;
>>
>> Which is the same as saying 'hey we are ignoring remaps'.
>>
>> But... if _we_ got remapped previously (the unsharing is only temporary),
>> then we'd _still_ have an anon_vma with an old index != addr >> PAGE_SHIFT,
>> and would still not be able to figure out the correct pgoff after sharing.
>>
>> I wonder if we could just store the pgoff in the rmap_item though?
>
> That's what I said elsewhere and what I was trying to avoid here.
>
> It's 64bytes, and adding a new item will increase it to 96 bytes IIUC.
As we're using a dedicated kmem cache it might "only" add 8 bytes, not
sure. Still an undesired increase given that we need that for each entry
in the stable/unstable tree.
--
Cheers,
David
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-09 9:41 ` David Hildenbrand (Arm)
@ 2026-04-09 9:53 ` Lorenzo Stoakes
2026-04-09 9:56 ` David Hildenbrand (Arm)
2026-04-09 9:55 ` David Hildenbrand (Arm)
1 sibling, 1 reply; 27+ messages in thread
From: Lorenzo Stoakes @ 2026-04-09 9:53 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: xu.xin16, hughd, akpm, chengming.zhou, wang.yaxin, yang.yang29,
michel, linux-mm, linux-kernel
On Thu, Apr 09, 2026 at 11:41:46AM +0200, David Hildenbrand (Arm) wrote:
> On 4/9/26 11:37, David Hildenbrand (Arm) wrote:
> > On 4/9/26 11:18, Lorenzo Stoakes wrote:
> >> On Wed, Apr 08, 2026 at 02:57:10PM +0200, David Hildenbrand (Arm) wrote:
> >>>
> >>> I'm wondering whether we could figure the pgoff out, somehow, so we
> >>> wouldn't have to store it elsewhere.
> >>>
> >>> What we need is essentially what __folio_set_anon() would have done for
> >>> the original folio we replaced.
> >>>
> >>> folio->index = linear_page_index(vma, address);
> >>>
> >>> Could we obtain that from the anon_vma assigned to our rmap_item?
> >>>
> >>> pgoff_t pgoff;
> >>>
> >>> pgoff = (rmap_item->address - anon_vma->vma->vm_start) >> PAGE_SHIFT;
> >>> pgoff += anon_vma->vma->vm_pgoff;
> >>
> >> anon_vma doesn't have a vma field :) it has anon_vma->rb_root which maps to all
> >> 'related' VMAs.
> >
> > Right, anon_vma_chain has. Dammit.
> >
> >>
> >> And we're already looking at what might be covered by the anon_vma by
> >> invoking anon_vma_interval_tree_foreach() on anon_vma->rb_root in [0,
> >> ULONG_MAX).
> >>
> >>>
> >>> It would be the same adjustment everywhere we look in child processes,
> >>> because the moment they would mremap() would be where we would have
> >>> unshared.
> >>>
> >>> Just a thought after reading avc_start_pgoff ...
> >>
> >> One interesting thing here is in the anon_vma_interval_tree_foreach() loop
> >> we check:
> >>
> >> if (addr < vma->vm_start || addr >= vma->vm_end)
> >> continue;
> >>
> >> Which is the same as saying 'hey we are ignoring remaps'.
> >>
> >> But... if _we_ got remapped previously (the unsharing is only temporary),
> >> then we'd _still_ have an anon_vma with an old index != addr >> PAGE_SHIFT,
> >> and would still not be able to figure out the correct pgoff after sharing.
> >>
> >> I wonder if we could just store the pgoff in the rmap_item though?
> >
> > That's what I said elsewhere and what I was trying to avoid here.
> >
> > It's 64bytes, and adding a new item will increase it to 96 bytes IIUC.
>
> As we're using a dedicated kmem cache it might "only" add 8 bytes, not
> sure. Still an undesired increase given that we need that for each entry
> in the stable/unstable tree.
Hm, random idea, but I wonder if we could cram a bit somewhere that
indicates whether a remap has in fact taken place?
rmap_item->some_field |= !!(vma->vm_start >> PAGE_SHIFT != vma->vm_pgoff);
(yeah obviously _not implemented like that_ but you get the point)
Since the remap case should be rare: if that bit is clear, take the cheap
path, otherwise do the expensive one?
Longer term, my anon_vma rework should fix this more broadly :)
>
> --
> Cheers,
>
> David
Cheers, Lorenzo
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-09 9:41 ` David Hildenbrand (Arm)
2026-04-09 9:53 ` Lorenzo Stoakes
@ 2026-04-09 9:55 ` David Hildenbrand (Arm)
2026-04-09 9:59 ` Lorenzo Stoakes
2026-04-09 10:56 ` 答复: " xu.xin16
1 sibling, 2 replies; 27+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-09 9:55 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: xu.xin16, hughd, akpm, chengming.zhou, wang.yaxin, yang.yang29,
michel, linux-mm, linux-kernel
On 4/9/26 11:41, David Hildenbrand (Arm) wrote:
> On 4/9/26 11:37, David Hildenbrand (Arm) wrote:
>> On 4/9/26 11:18, Lorenzo Stoakes wrote:
>>>
>>> anon_vma doesn't have a vma field :) it has anon_vma->rb_root which maps to all
>>> 'related' VMAs.
>>
>> Right, anon_vma_chain has. Dammit.
>>
>>>
>>> And we're already looking at what might be covered by the anon_vma by
>>> invoking anon_vma_interval_tree_foreach() on anon_vma->rb_root in [0,
>>> ULONG_MAX).
>>>
>>>
>>> One interesting thing here is in the anon_vma_interval_tree_foreach() loop
>>> we check:
>>>
>>> if (addr < vma->vm_start || addr >= vma->vm_end)
>>> continue;
>>>
>>> Which is the same as saying 'hey we are ignoring remaps'.
>>>
>>> But... if _we_ got remapped previously (the unsharing is only temporary),
>>> then we'd _still_ have an anon_vma with an old index != addr >> PAGE_SHIFT,
>>> and would still not be able to figure out the correct pgoff after sharing.
>>>
>>> I wonder if we could just store the pgoff in the rmap_item though?
>>
>> That's what I said elsewhere and what I was trying to avoid here.
>>
>> It's 64bytes, and adding a new item will increase it to 96 bytes IIUC.
>
> As we're using a dedicated kmem cache it might "only" add 8 bytes, not
> sure. Still an undesired increase given that we need that for each entry
> in the stable/unstable tree.
>
Hmm, maybe we could do the following. I think the other members are only
relevant for the unstable tree.
diff --git a/mm/ksm.c b/mm/ksm.c
index 7d5b76478f0b..0c6bfed280f7 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -191,12 +191,13 @@ struct ksm_stable_node {
* @nid: NUMA node id of unstable tree in which linked (may not match page)
* @mm: the memory structure this rmap_item is pointing into
* @address: the virtual address this rmap_item tracks (+ flags in low bits)
- * @oldchecksum: previous checksum of the page at that virtual address
+ * @oldchecksum: previous checksum of the page at that virtual address (unstable tree)
* @node: rb node of this rmap_item in the unstable tree
* @head: pointer to stable_node heading this list in the stable tree
* @hlist: link into hlist of rmap_items hanging off that stable_node
- * @age: number of scan iterations since creation
- * @remaining_skips: how many scans to skip
+ * @age: number of scan iterations since creation (unstable tree)
+ * @remaining_skips: how many scans to skip (unstable tree)
+ * @pgoff: pgoff into @anon_vma where the page is mapped (stable tree)
*/
struct ksm_rmap_item {
struct ksm_rmap_item *rmap_list;
@@ -208,9 +209,14 @@ struct ksm_rmap_item {
};
struct mm_struct *mm;
unsigned long address; /* + low bits used for flags below */
- unsigned int oldchecksum; /* when unstable */
- rmap_age_t age;
- rmap_age_t remaining_skips;
+ union {
+ struct {
+ unsigned int oldchecksum;
+ rmap_age_t age;
+ rmap_age_t remaining_skips;
+ };
+ pgoff_t pgoff;
+ };
union {
struct rb_node node; /* when node of unstable tree */
struct { /* when listed from stable tree */
@@ -1600,6 +1606,7 @@ static int try_to_merge_with_ksm_page(struct ksm_rmap_item *rmap_item,
/* Must get reference to anon_vma while still holding mmap_lock */
rmap_item->anon_vma = vma->anon_vma;
+ rmap_item->pgoff = linear_page_index(vma, rmap_item->address);
get_anon_vma(vma->anon_vma);
out:
mmap_read_unlock(mm);
--
2.43.0
--
Cheers,
David
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-09 9:53 ` Lorenzo Stoakes
@ 2026-04-09 9:56 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-09 9:56 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: xu.xin16, hughd, akpm, chengming.zhou, wang.yaxin, yang.yang29,
michel, linux-mm, linux-kernel
On 4/9/26 11:53, Lorenzo Stoakes wrote:
> On Thu, Apr 09, 2026 at 11:41:46AM +0200, David Hildenbrand (Arm) wrote:
>> On 4/9/26 11:37, David Hildenbrand (Arm) wrote:
>>>
>>> Right, anon_vma_chain has. Dammit.
>>>
>>>
>>> That's what I said elsewhere and what I was trying to avoid here.
>>>
>>> It's 64bytes, and adding a new item will increase it to 96 bytes IIUC.
>>
>> As we're using a dedicated kmem cache it might "only" add 8 bytes, not
>> sure. Still an undesired increase given that we need that for each entry
>> in the stable/unstable tree.
>
> Hm, random idea, but I wonder if we could cram a bit somewhere that
> indicates whether a remap has in fact taken place?
Heh, also what I raised elsewhere :)
We would have space for that in the rmap_item.
But see my other message, maybe we can indeed store the pgoff. Needs a
second thought.
--
Cheers,
David
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-09 9:55 ` David Hildenbrand (Arm)
@ 2026-04-09 9:59 ` Lorenzo Stoakes
2026-04-09 10:56 ` 答复: " xu.xin16
1 sibling, 0 replies; 27+ messages in thread
From: Lorenzo Stoakes @ 2026-04-09 9:59 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: xu.xin16, hughd, akpm, chengming.zhou, wang.yaxin, yang.yang29,
michel, linux-mm, linux-kernel
On Thu, Apr 09, 2026 at 11:55:10AM +0200, David Hildenbrand (Arm) wrote:
> On 4/9/26 11:41, David Hildenbrand (Arm) wrote:
> > On 4/9/26 11:37, David Hildenbrand (Arm) wrote:
> >> On 4/9/26 11:18, Lorenzo Stoakes wrote:
> >>>
> >>> anon_vma doesn't have a vma field :) it has anon_vma->rb_root which maps to all
> >>> 'related' VMAs.
> >>
> >> Right, anon_vma_chain has. Dammit.
> >>
> >>>
> >>> And we're already looking at what might be covered by the anon_vma by
> >>> invoking anon_vma_interval_tree_foreach() on anon_vma->rb_root in [0,
> >>> ULONG_MAX).
> >>>
> >>>
> >>> One interesting thing here is in the anon_vma_interval_tree_foreach() loop
> >>> we check:
> >>>
> >>> if (addr < vma->vm_start || addr >= vma->vm_end)
> >>> continue;
> >>>
> >>> Which is the same as saying 'hey we are ignoring remaps'.
> >>>
> >>> But... if _we_ got remapped previously (the unsharing is only temporary),
> >>> then we'd _still_ have an anon_vma with an old index != addr >> PAGE_SHIFT,
> >>> and would still not be able to figure out the correct pgoff after sharing.
> >>>
> >>> I wonder if we could just store the pgoff in the rmap_item though?
> >>
> >> That's what I said elsewhere and what I was trying to avoid here.
> >>
> >> It's 64bytes, and adding a new item will increase it to 96 bytes IIUC.
> >
> > As we're using a dedicated kmem cache it might "only" add 8 bytes, not
> > sure. Still an undesired increase given that we need that for each entry
> > in the stable/unstable tree.
> >
>
> Hmm, maybe we could do the following. I think the other members are only
> relevant for the unstable tree.
Nice, will leave the KSM stuff to you to confirm :)
This kind of approach should work fine...
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 7d5b76478f0b..0c6bfed280f7 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -191,12 +191,13 @@ struct ksm_stable_node {
> * @nid: NUMA node id of unstable tree in which linked (may not match page)
> * @mm: the memory structure this rmap_item is pointing into
> * @address: the virtual address this rmap_item tracks (+ flags in low bits)
> - * @oldchecksum: previous checksum of the page at that virtual address
> + * @oldchecksum: previous checksum of the page at that virtual address (unstable tree)
> * @node: rb node of this rmap_item in the unstable tree
> * @head: pointer to stable_node heading this list in the stable tree
> * @hlist: link into hlist of rmap_items hanging off that stable_node
> - * @age: number of scan iterations since creation
> - * @remaining_skips: how many scans to skip
> + * @age: number of scan iterations since creation (unstable tree)
> + * @remaining_skips: how many scans to skip (unstable tree)
> + * @pgoff: pgoff into @anon_vma where the page is mapped (stable tree)
> */
> struct ksm_rmap_item {
> struct ksm_rmap_item *rmap_list;
> @@ -208,9 +209,14 @@ struct ksm_rmap_item {
> };
> struct mm_struct *mm;
> unsigned long address; /* + low bits used for flags below */
> - unsigned int oldchecksum; /* when unstable */
> - rmap_age_t age;
> - rmap_age_t remaining_skips;
> + union {
> + struct {
> + unsigned int oldchecksum;
> + rmap_age_t age;
> + rmap_age_t remaining_skips;
> + };
> + pgoff_t pgoff;
> + };
union to the rescue :)
> union {
> struct rb_node node; /* when node of unstable tree */
> struct { /* when listed from stable tree */
> @@ -1600,6 +1606,7 @@ static int try_to_merge_with_ksm_page(struct ksm_rmap_item *rmap_item,
>
> /* Must get reference to anon_vma while still holding mmap_lock */
> rmap_item->anon_vma = vma->anon_vma;
> + rmap_item->pgoff = linear_page_index(vma, rmap_item->address);
> get_anon_vma(vma->anon_vma);
> out:
> mmap_read_unlock(mm);
> --
> 2.43.0
>
> --
> Cheers,
>
> David
Cheers, Lorenzo
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-09 9:18 ` Lorenzo Stoakes
2026-04-09 9:37 ` David Hildenbrand (Arm)
@ 2026-04-09 10:06 ` xu.xin16
2026-04-09 10:09 ` Lorenzo Stoakes
1 sibling, 1 reply; 27+ messages in thread
From: xu.xin16 @ 2026-04-09 10:06 UTC (permalink / raw)
To: ljs, david
Cc: hughd, akpm, chengming.zhou, wang.yaxin, yang.yang29, michel,
linux-mm, linux-kernel
> > >>> I'd completely forgotten that patch by now! But it's dealing with a
> > >>> different issue; and note how it's intentionally leaving MADV_MERGEABLE
> > >>> on the vma itself, just using MADV_UNMERGEABLE (with &dummy) as an
> > >>> interface to CoW the KSM pages at that time, letting them be remerged after.
> > >
> > > Hmm yeah, we mark them unmergeable but don't update the VMA flags (since using
> > > &dummy), so they can just be merged later right?
> > >
> > > And then the:
> > >
> > > void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
> > > {
> > > ...
> > > const pgoff_t pgoff = rmap_item->address >> PAGE_SHIFT;
> > > ...
> > > anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> > > pgoff, pgoff) {
> > > ...
> > > }
> > > ...
> > > }
> > >
> > > Would _assume_ that folio->pgoff == addr >> PAGE_SHIFT, which will no longer be
> > > the case here?
> >
> > I'm wondering whether we could figure the pgoff out, somehow, so we
> > wouldn't have to store it elsewhere.
> >
> > What we need is essentially what __folio_set_anon() would have done for
> > the original folio we replaced.
> >
> > folio->index = linear_page_index(vma, address);
> >
> > Could we obtain that from the anon_vma assigned to our rmap_item?
> >
> > pgoff_t pgoff;
> >
> > pgoff = (rmap_item->address - anon_vma->vma->vm_start) >> PAGE_SHIFT;
> > pgoff += anon_vma->vma->vm_pgoff;
>
> anon_vma doesn't have a vma field :) it has anon_vma->rb_root which maps to all
> 'related' VMAs.
Yes, we cannot rely solely on anon_vma to locate all PTEs mapping this page; we
must also have the original page's pgoff. In fact, I believe only the current
vma->vm_pgoff is necessary. I've examined the implementation of
anon_vma_interval_tree_foreach: it essentially iterates to find a suitable
VMA such that the provided pgoff falls within the VMA's range
[vm_pgoff, vm_pgoff + vma_pages(vma) - 1].
The root cause of the issue Hugh points out is that the pgoff calculated from
rmap_item->address (which derives from vma->vm_start) is not the pgoff of
the page prior to merging. Consequently, the anon_vma_interval_tree_foreach
traversal cannot match the correct VMA, i.e. one satisfying
avc_start_pgoff <= pgoff <= avc_last_pgoff.
This originates from an existing fact: if a user invokes mremap(), the new
vma->vm_start may change while the mapped page's index remains unchanged, but
vma->vm_pgoff is updated synchronously so that the vma_address() calculation
remains valid, as in rmap_walk_anon() in mm/rmap.c.
Based on the above, I think there is a simpler approach below that does not
increase the size of the ksm_rmap_item struct.
>
> And we're already looking at what might be covered by the anon_vma by
> invoking anon_vma_interval_tree_foreach() on anon_vma->rb_root in [0,
> ULONG_MAX).
>
> >
> > It would be the same adjustment everywhere we look in child processes,
> > because the moment they would mremap() would be where we would have
> > unshared.
> >
> > Just a thought after reading avc_start_pgoff ...
>
> One interesting thing here is in the anon_vma_interval_tree_foreach() loop
> we check:
>
> if (addr < vma->vm_start || addr >= vma->vm_end)
> continue;
>
> Which is the same as saying 'hey we are ignoring remaps'.
>
> But... if _we_ got remapped previously (the unsharing is only temporary),
> then we'd _still_ have an anon_vma with an old index != addr >> PAGE_SHIFT,
> and would still not be able to figure out the correct pgoff after sharing.
>
> I wonder if we could just store the pgoff in the rmap_item though?
>
> Because we unshare on remap, so we'd expect a new share after remapping, at
> which point we could account for the remapping by just setting
> rmap_item->pgoff = vma->vm_pgoff I think?
Can we just replace the stored anon_vma of ksm_rmap_item with the orig_vma
at KSM merge time? Then, from rmap_item->orig_vma, we could directly obtain both
the anon_vma and the vm_pgoff, allowing us to locate all PTEs mapping
this page without any ambiguity.
Cheers, Xu
>
> Then we're back in business.
>
> Another way around this issue is to do the rmap_walk_ksm() loop for (addr
> >> PAGE_SHIFT) _first_, but that'd only be useful for walkers that can exit
> early once they find the mapping they care about, and I worry about 'some
> how' missing remapped cases, so probably not actually all that useful.
>
> >
> > --
> > Cheers,
> >
> > David
>
> Cheers, Lorenzo
* Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-09 10:06 ` xu.xin16
@ 2026-04-09 10:09 ` Lorenzo Stoakes
0 siblings, 0 replies; 27+ messages in thread
From: Lorenzo Stoakes @ 2026-04-09 10:09 UTC (permalink / raw)
To: xu.xin16
Cc: david, hughd, akpm, chengming.zhou, wang.yaxin, yang.yang29,
michel, linux-mm, linux-kernel
On Thu, Apr 09, 2026 at 06:06:05PM +0800, xu.xin16@zte.com.cn wrote:
> Can we just replace the stored anon_vma of "ksm_rmap_item" with the orig_vma
> when KSM merging? Then, from rmap_item->orig_vma, we can directly obtain both
> the anon_vma and the vm_pgoff, thereby enabling the location of all PTEs mapping
> this page without any ambiguity.
Please no :) that's a UAF waiting to happen: VMAs are highly dynamic objects
that can change at any time if the appropriate locks aren't held, and they
aren't refcounted either.
David suggested a way of storing the vm_pgoff without increasing rmap item
struct size, hopefully that's viable and then we can get the benefits here
without breaking anything!
Cheers, Lorenzo
* 答复: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
2026-04-09 9:55 ` David Hildenbrand (Arm)
2026-04-09 9:59 ` Lorenzo Stoakes
@ 2026-04-09 10:56 ` xu.xin16
1 sibling, 0 replies; 27+ messages in thread
From: xu.xin16 @ 2026-04-09 10:56 UTC (permalink / raw)
To: david, shr, ljs
Cc: ljs, hughd, akpm, chengming.zhou, wang.yaxin, yang.yang29,
michel, linux-mm, linux-kernel
> On 4/9/26 11:41, David Hildenbrand (Arm) wrote:
> > On 4/9/26 11:37, David Hildenbrand (Arm) wrote:
> >> On 4/9/26 11:18, Lorenzo Stoakes wrote:
> >>>
> >>> anon_vma doesn't have a vma field :) it has anon_vma->rb_root which maps to all
> >>> 'related' VMAs.
> >>
> >> Right, anon_vma_chain has. Dammit.
> >>
> >>>
> >>> And we're already looking at what might be covered by the anon_vma by
> >>> invoking anon_vma_interval_tree_foreach() on anon_vma->rb_root in [0,
> >>> ULONG_MAX).
> >>>
> >>>
> >>> One interesting thing here is in the anon_vma_interval_tree_foreach() loop
> >>> we check:
> >>>
> >>> if (addr < vma->vm_start || addr >= vma->vm_end)
> >>> continue;
> >>>
> >>> Which is the same as saying 'hey we are ignoring remaps'.
> >>>
> >>> But... if _we_ got remapped previously (the unsharing is only temporary),
> >>> then we'd _still_ have an anon_vma with an old index != addr >> PAGE_SHIFT,
> >>> and would still not be able to figure out the correct pgoff after sharing.
> >>>
> >>> I wonder if we could just store the pgoff in the rmap_item though?
> >>
> >> That's what I said elsewhere and what I was trying to avoid here.
> >>
> >> It's 64bytes, and adding a new item will increase it to 96 bytes IIUC.
> >
> > As we're using a dedicated kmem cache it might "only" add 8 bytes, not
> > sure. Still an undesired increase given that we need that for each entry
> > in the stable/unstable tree.
> >
>
> Hmm, maybe we could do the following. I think the other members are only
> relevant for the unstable tree.
Well, I suspect that the "SmartScan-related" members might also be needed and
used even when it's a stable rmap_item. In should_skip_rmap_item(), if its page
is a KSM page, it can't be skipped. What if the rmap_item is stable, but its
page is not KSM?
Cc Stefan.
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 7d5b76478f0b..0c6bfed280f7 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -191,12 +191,13 @@ struct ksm_stable_node {
> * @nid: NUMA node id of unstable tree in which linked (may not match page)
> * @mm: the memory structure this rmap_item is pointing into
> * @address: the virtual address this rmap_item tracks (+ flags in low bits)
> - * @oldchecksum: previous checksum of the page at that virtual address
> + * @oldchecksum: previous checksum of the page at that virtual address (unstable tree)
> * @node: rb node of this rmap_item in the unstable tree
> * @head: pointer to stable_node heading this list in the stable tree
> * @hlist: link into hlist of rmap_items hanging off that stable_node
> - * @age: number of scan iterations since creation
> - * @remaining_skips: how many scans to skip
> + * @age: number of scan iterations since creation (unstable tree)
> + * @remaining_skips: how many scans to skip (unstable tree)
> + * @pgoff: pgoff into @anon_vma where the page is mapped (stable tree)
> */
> struct ksm_rmap_item {
> struct ksm_rmap_item *rmap_list;
> @@ -208,9 +209,14 @@ struct ksm_rmap_item {
> };
> struct mm_struct *mm;
> unsigned long address; /* + low bits used for flags below */
> - unsigned int oldchecksum; /* when unstable */
> - rmap_age_t age;
> - rmap_age_t remaining_skips;
> + union {
> + struct {
> + unsigned int oldchecksum;
> + rmap_age_t age;
> + rmap_age_t remaining_skips;
> + };
> + pgoff_t pgoff;
> + };
> union {
> struct rb_node node; /* when node of unstable tree */
> struct { /* when listed from stable tree */
> @@ -1600,6 +1606,7 @@ static int try_to_merge_with_ksm_page(struct ksm_rmap_item *rmap_item,
>
> /* Must get reference to anon_vma while still holding mmap_lock */
> rmap_item->anon_vma = vma->anon_vma;
> + rmap_item->pgoff = linear_page_index(vma, rmap_item->address);
> get_anon_vma(vma->anon_vma);
> out:
> mmap_read_unlock(mm);
> --
> 2.43.0
>
> --
> Cheers,
>
> David
>
end of thread, other threads:[~2026-04-09 10:56 UTC | newest]
Thread overview: 27+ messages
2026-02-12 11:28 [PATCH v3 0/2] KSM: Optimizations for rmap_walk_ksm xu.xin16
2026-02-12 11:29 ` [PATCH v3 1/2] ksm: Initialize the addr only once in rmap_walk_ksm xu.xin16
2026-02-12 11:30 ` [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range xu.xin16
2026-02-12 12:21 ` David Hildenbrand (Arm)
2026-04-05 4:44 ` Hugh Dickins
2026-04-05 21:01 ` Andrew Morton
2026-04-07 9:43 ` Lorenzo Stoakes (Oracle)
2026-04-07 21:21 ` Andrew Morton
2026-04-08 6:29 ` Lorenzo Stoakes
2026-04-06 1:58 ` xu.xin16
2026-04-06 5:35 ` Hugh Dickins
2026-04-07 6:21 ` xu.xin16
2026-04-07 9:36 ` Lorenzo Stoakes (Oracle)
2026-04-08 12:57 ` David Hildenbrand (Arm)
2026-04-09 9:18 ` Lorenzo Stoakes
2026-04-09 9:37 ` David Hildenbrand (Arm)
2026-04-09 9:41 ` David Hildenbrand (Arm)
2026-04-09 9:53 ` Lorenzo Stoakes
2026-04-09 9:56 ` David Hildenbrand (Arm)
2026-04-09 9:55 ` David Hildenbrand (Arm)
2026-04-09 9:59 ` Lorenzo Stoakes
2026-04-09 10:56 ` 答复: " xu.xin16
2026-04-09 10:06 ` xu.xin16
2026-04-09 10:09 ` Lorenzo Stoakes
2026-04-06 9:21 ` David Hildenbrand (arm)
2026-04-06 9:23 ` David Hildenbrand (arm)
2026-04-07 9:39 ` Lorenzo Stoakes (Oracle)