From: <xu.xin16@zte.com.cn>
To: <hughd@google.com>
Cc: <akpm@linux-foundation.org>, <david@kernel.org>,
<chengming.zhou@linux.dev>, <hughd@google.com>,
<wang.yaxin@zte.com.cn>, <yang.yang29@zte.com.cn>,
<michel@lespinasse.org>, <ljs@kernel.org>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
Date: Mon, 6 Apr 2026 09:58:04 +0800 (CST)
Message-ID: <20260406095804589iRP1BCGrNX3DviT29nv2O@zte.com.cn>
In-Reply-To: <02e1b8df-d568-8cbb-b8f6-46d5476d9d75@google.com>
> > The result is as follows:
> >
> >          Time_ms  Nr_iteration_total  Skip_addr_out_of_range  Skip_mm_mismatch
> > Before:  228.65   22169               22168                   0
> > After :  0.396    3                   0                       2
> >
> > The referenced reproducer of rmap_walk_ksm can be found at:
> > https://lore.kernel.org/all/20260206151424734QIyWL_pA-1QeJPbJlUxsO@zte.com.cn/
> >
> > Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn>
> > Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
> > Signed-off-by: xu xin <xu.xin16@zte.com.cn>
>
> This is a very attractive speedup, but I believe it's flawed: in the
> special case when a range has been mremap-moved, when its anon folio
> indexes and anon_vma pgoff correspond to the original user address,
> not to the current user address.
>
> In which case, rmap_walk_ksm() will be unable to find all the PTEs
> for that KSM folio, which will consequently be pinned in memory -
> unable to be reclaimed, unable to be migrated, unable to be hotremoved,
> until it's finally unmapped or KSM disabled.
>
> But it's years since I worked on KSM or on anon_vma, so I may be confused
> and my belief wrong. I have tried to test it, and my testcase did appear
> to show 7.0-rc6 successfully swapping out even mremap-moved KSM folios,
> but mm.git failing to do so.

Thank you very much for the detailed historical context. I'm curious about
your test case, though: how did you observe that KSM pages could not be
swapped out on mm.git, while 7.0-rc6 worked fine?

In the current implementation, mremap always calls
prep_move_vma() -> ksm_madvise(MADV_UNMERGEABLE) -> break_ksm() before a
move succeeds, which splits KSM pages back into regular anonymous pages.
This appears to be based on a patch you introduced over a decade ago,
1ff829957316 ("ksm: prevent mremap move poisoning"). Given this, KSM pages
should already be broken before the move, so they would not remain
mergeable pages after mremap. Could there be a scenario where this breaking
mechanism is bypassed, or am I missing a subtlety in the sequence of
operations?

Thanks!

> However, I say "appear to show" because I
> found swapping out any KSM pages harder than I'd been expecting: so have
> some doubts about my testing. Let me give more detail on that at the
> bottom of this mail: it's a tangent which had better not distract from
> your speedup.
>
> If I'm right that your patch is flawed, what to do?
>
> Perhaps there is, or could be, a cleverer way for KSM to walk the anon_vma
> interval tree, which can handle the mremap-moved pgoffs appropriately.
> Cc'ing Michel, whose bf181b9f9d8d ("mm anon rmap: replace same_anon_vma
> linked list with an interval tree.") specifically chose the 0, ULONG_MAX
> which you are replacing.
>
> Cc'ing Lorenzo, who is currently considering replacing anon_vma by
> something more like my anonmm, which preceded Andrea's anon_vma in 2.6.7;
> but Lorenzo supplementing it with the mremap tracking which defeated me.
> This rmap_walk_ksm() might well benefit from his approach. (I'm not
> actually expecting any input from Lorenzo here, or Michel: more FYIs.)
>
> But more realistic in the short term, might be for you to keep your
> optimization, but fix the lookup, by keeping a count of PTEs found,
> and when that falls short, take a second pass with 0, ULONG_MAX.
> Somewhat ugly, certainly imperfect, but good enough for now.
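
The two-pass fallback suggested above can be illustrated with a small
userspace toy model. This is only a sketch under stated assumptions, not
kernel code: struct toy_vma, toy_walk(), toy_rmap_walk() and the example
addresses are all hypothetical, and it assumes the narrowed walk looks up
only the pgoff derived from the current user address. It shows why an
mremap-moved VMA (whose pgoff still reflects the old address) is missed by
the narrow pass and found again by the 0, ULONG_MAX pass.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12

/* Hypothetical stand-in for a VMA linked into an anon_vma interval tree. */
struct toy_vma {
	unsigned long vm_start;	/* current user address of the mapping */
	unsigned long nr_pages;	/* length of the mapping in pages */
	unsigned long vm_pgoff;	/* after an mremap move, still the OLD address >> shift */
};

/*
 * Visit only VMAs whose pgoff interval overlaps [first, last], as an
 * interval tree lookup would, and count those that contain addr.
 */
int toy_walk(const struct toy_vma *vmas, size_t n,
	     unsigned long first, unsigned long last, unsigned long addr)
{
	int found = 0;

	assert(first <= last);
	for (size_t i = 0; i < n; i++) {
		unsigned long lo = vmas[i].vm_pgoff;
		unsigned long hi = lo + vmas[i].nr_pages - 1;
		unsigned long end = vmas[i].vm_start +
				    (vmas[i].nr_pages << TOY_PAGE_SHIFT);

		if (hi < first || lo > last)
			continue;	/* never visited by the narrowed lookup */
		if (addr >= vmas[i].vm_start && addr < end)
			found++;
	}
	return found;
}

/* Narrow pass first; if fewer PTEs than expected turn up, retry with 0, ULONG_MAX. */
int toy_rmap_walk(const struct toy_vma *vmas, size_t n, unsigned long addr,
		  int expected, int *used_fallback)
{
	unsigned long pgoff = addr >> TOY_PAGE_SHIFT;
	int found = toy_walk(vmas, n, pgoff, pgoff, addr);

	*used_fallback = 0;
	if (found < expected) {
		*used_fallback = 1;
		found = toy_walk(vmas, n, 0, ULONG_MAX, addr);
	}
	return found;
}

/* One VMA that never moved, and one moved from 0x20000 to 0x90000. */
const struct toy_vma toy_example[] = {
	{ 0x10000UL, 4, 0x10UL },	/* pgoff matches the current address */
	{ 0x90000UL, 4, 0x20UL },	/* pgoff still matches the old address */
};
```

In this model, a lookup at 0x11000 is satisfied by the narrow pass alone,
while a lookup at 0x91000 finds nothing in the narrow pass and succeeds
only after the full-range retry. The cost of the retry is paid only when
the count falls short, which is the trade-off described above.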