> > The result is as follows:

> > 

> >          Time_ms      Nr_iteration_total    Skip_addr_out_of_range   Skip_mm_mismatch

> > Before:  228.65       22169                 22168                    0

> > After :   0.396        3                     0                       2

> > 

> > The referenced reproducer of rmap_walk_ksm can be found at:

> > https://lore.kernel.org/all/20260206151424734QIyWL_pA-1QeJPbJlUxsO@zte.com.cn/

> > 

> > Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn>

> > Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>

> > Signed-off-by: xu xin <xu.xin16@zte.com.cn>

> This is a very attractive speedup, but I believe it's flawed: in the

> special case when a range has been mremap-moved, when its anon folio

> indexes and anon_vma pgoff correspond to the original user address,

> not to the current user address.

> In which case, rmap_walk_ksm() will be unable to find all the PTEs

> for that KSM folio, which will consequently be pinned in memory -

> unable to be reclaimed, unable to be migrated, unable to be hotremoved,

> until it's finally unmapped or KSM disabled.

> But it's years since I worked on KSM or on anon_vma, so I may be confused

> and my belief wrong.  I have tried to test it, and my testcase did appear

> to show 7.0-rc6 successfully swapping out even mremap-moved KSM folios,

> but mm.git failing to do so.  


Thank you very much for providing such detailed historical context. However,

I'm curious about your test case: how did you observe that KSM pages in mm.git

could not be swapped out, while 7.0-rc6 worked fine?  


Looking at the current implementation of mremap: before the move succeeds, it always

goes through prep_move_vma() -> madvise(MADV_UNMERGEABLE) -> break_ksm(), which breaks

KSM pages back into regular anonymous pages. That behavior appears to date back to a

patch you introduced over a decade ago, commit 1ff829957316 ("ksm: prevent mremap move

poisoning"). Given this, KSM pages should already be broken prior to the move, so they

wouldn't remain as mergeable pages after mremap. Could there be a scenario where this

breaking mechanism is bypassed, or am I missing a subtlety in the sequence of operations?


Thanks!


> However, I say "appear to show" because I

> found swapping out any KSM pages harder than I'd been expecting: so have

> some doubts about my testing.  Let me give more detail on that at the

> bottom of this mail: it's a tangent which had better not distract from

> your speedup.

> If I'm right that your patch is flawed, what to do?

> Perhaps there is, or could be, a cleverer way for KSM to walk the anon_vma

> interval tree, which can handle the mremap-moved pgoffs appropriately.

> Cc'ing Michel, whose bf181b9f9d8d ("mm anon rmap: replace same_anon_vma

> linked list with an interval tree.") specifically chose the 0, ULONG_MAX

> which you are replacing.

> Cc'ing Lorenzo, who is currently considering replacing anon_vma by

> something more like my anonmm, which preceded Andrea's anon_vma in 2.6.7;

> but Lorenzo supplementing it with the mremap tracking which defeated me.

> This rmap_walk_ksm() might well benefit from his approach.  (I'm not

> actually expecting any input from Lorenzo here, or Michel: more FYIs.)

> But more realistic in the short term, might be for you to keep your

> optimization, but fix the lookup, by keeping a count of PTEs found,

> and when that falls short, take a second pass with 0, ULONG_MAX.

> Somewhat ugly, certainly imperfect, but good enough for now.