> > The result is as follows:
> >
> >          Time_ms  Nr_iteration_total  Skip_addr_out_of_range  Skip_mm_mismatch
> > Before:   228.65               22169                   22168                 0
> > After :    0.396                   3                       0                 2
> >
> > The referenced reproducer of rmap_walk_ksm can be found at:
> > https://lore.kernel.org/all/20260206151424734QIyWL_pA-1QeJPbJlUxsO@zte.com.cn/
> >
> > Co-developed-by: Wang Yaxin
> > Signed-off-by: Wang Yaxin
> > Signed-off-by: xu xin
>
> This is a very attractive speedup, but I believe it's flawed: in the
> special case when a range has been mremap-moved, its anon folio
> indexes and anon_vma pgoff correspond to the original user address,
> not to the current user address.
>
> In which case, rmap_walk_ksm() will be unable to find all the PTEs
> for that KSM folio, which will consequently be pinned in memory -
> unable to be reclaimed, unable to be migrated, unable to be hotremoved,
> until it's finally unmapped or KSM disabled.
>
> But it's years since I worked on KSM or on anon_vma, so I may be confused
> and my belief wrong. I have tried to test it, and my testcase did appear
> to show 7.0-rc6 successfully swapping out even mremap-moved KSM folios,
> but mm.git failing to do so.

Thank you very much for providing such detailed historical context.

However, I'm curious about your test case: how did you observe that KSM
pages in mm.git could not be swapped out, while 7.0-rc6 worked fine?

From the current implementation of mremap, before it succeeds, it always
calls prep_move_vma() -> madvise(MADV_UNMERGEABLE) -> break_ksm(), which
splits KSM pages into regular anonymous pages. This appears to be based
on a patch you introduced over a decade ago, 1ff829957316 ("ksm: prevent
mremap move poisoning").

Given this, KSM pages should already be broken prior to the move, so they
wouldn't remain as mergeable pages after mremap. Could there be a scenario
where this breaking mechanism is bypassed, or am I missing a subtlety in
the sequence of operations?

Thanks!
> However, I say "appear to show" because I
> found swapping out any KSM pages harder than I'd been expecting: so have
> some doubts about my testing. Let me give more detail on that at the
> bottom of this mail: it's a tangent which had better not distract from
> your speedup.
>
> If I'm right that your patch is flawed, what to do?
>
> Perhaps there is, or could be, a cleverer way for KSM to walk the anon_vma
> interval tree, which can handle the mremap-moved pgoffs appropriately.
> Cc'ing Michel, whose bf181b9f9d8d ("mm anon rmap: replace same_anon_vma
> linked list with an interval tree.") specifically chose the 0, ULONG_MAX
> which you are replacing.
>
> Cc'ing Lorenzo, who is currently considering replacing anon_vma by
> something more like my anonmm, which preceded Andrea's anon_vma in 2.6.7,
> but supplementing it with the mremap tracking which defeated me.
> This rmap_walk_ksm() might well benefit from his approach. (I'm not
> actually expecting any input from Lorenzo here, or Michel: more FYIs.)
>
> But more realistic in the short term might be for you to keep your
> optimization, but fix the lookup, by keeping a count of PTEs found,
> and when that falls short, take a second pass with 0, ULONG_MAX.
> Somewhat ugly, certainly imperfect, but good enough for now.