From: David Hildenbrand <david@redhat.com>
To: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Xu Xin <xu.xin16@zte.com.cn>,
Chengming Zhou <chengming.zhou@linux.dev>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
craftfever <craftfever@airmail.cc>
Subject: Re: [PATCH v5] ksm: use range-walk function to jump over holes in scan_get_next_rmap_item
Date: Thu, 23 Oct 2025 12:13:32 +0200
Message-ID: <e0886fdf-d198-4130-bd9a-be276c59da37@redhat.com>
In-Reply-To: <20251023035841.41406-1-pedrodemargomes@gmail.com>
On 23.10.25 05:58, Pedro Demarchi Gomes wrote:
> Currently, scan_get_next_rmap_item() walks every page address in a VMA
> to locate mergeable pages. This becomes highly inefficient when scanning
> large virtual memory areas that contain mostly unmapped regions, causing
> ksmd to use a large amount of CPU without deduplicating many pages.
>
> This patch replaces the per-address lookup with a range walk using
> walk_page_range(). The range walker allows KSM to skip over entire
> unmapped holes in a VMA, avoiding unnecessary lookups.
> This problem was previously discussed in [1].
>
> Consider the following test program which creates a 32 TiB mapping in
> the virtual address space but only populates a single page:
>
> #include <unistd.h>
> #include <stdio.h>
> #include <sys/mman.h>
>
> /* 32 TiB */
> const size_t size = 32ul * 1024 * 1024 * 1024 * 1024;
>
> int main(void) {
>     char *area = mmap(NULL, size, PROT_READ | PROT_WRITE,
>                       MAP_NORESERVE | MAP_PRIVATE | MAP_ANON, -1, 0);
>
>     if (area == MAP_FAILED) {
>         /* perror() appends ": <error text>" and a newline itself. */
>         perror("mmap() failed");
>         return -1;
>     }
>
>     /* Populate a single page such that we get an anon_vma. */
>     *area = 0;
>
>     /* Enable KSM. */
>     madvise(area, size, MADV_MERGEABLE);
>     pause();
>     return 0;
> }
>
> $ ./ksm-sparse &
> $ echo 1 > /sys/kernel/mm/ksm/run
>
> Without this patch, ksmd uses 100% of the CPU for a long time (more than
> 1 hour on my test machine) scanning the entire 32 TiB virtual address
> space, which contains only one mapped page. This effectively stalls ksmd,
> leaving it unable to deduplicate anything of value.
> With this patch, ksmd walks only the one mapped page and skips the rest of
> the 32 TiB virtual address space, making the scan fast and using little CPU.
>
> [1] https://lore.kernel.org/linux-mm/423de7a3-1c62-4e72-8e79-19a6413e420c@redhat.com/
>
> ---
> v5:
> - Improve patch description
>
> v4: https://lore.kernel.org/linux-mm/20251022153059.22763-1-pedrodemargomes@gmail.com/
> - Make minimal changes to replace folio_walk by walk_page_range_vma
>
> v3: https://lore.kernel.org/all/20251016012236.4189-1-pedrodemargomes@gmail.com/
> - Treat THPs in ksm_pmd_entry
> - Update ksm_scan.address outside walk_page_range
> - Change goto to while loop
>
> v2: https://lore.kernel.org/all/20251014151126.87589-1-pedrodemargomes@gmail.com/
> - Use pmd_entry to walk page range
> - Use cond_resched inside pmd_entry()
> - walk_page_range returns page+folio
>
> v1: https://lore.kernel.org/all/20251014055828.124522-1-pedrodemargomes@gmail.com/
>
> Reported-by: craftfever <craftfever@airmail.cc>
> Closes: https://lkml.kernel.org/r/020cf8de6e773bb78ba7614ef250129f11a63781@murena.io
> Suggested-by: David Hildenbrand <david@redhat.com>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Fixes: 31dbd01f3143 ("ksm: Kernel SamePage Merging")
> Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>

I think we really want to

Cc: stable@vger.kernel.org

Andrew can do that when applying.

Acked-by: David Hildenbrand <david@redhat.com>

As a note, we have similar code that should probably be doing a range
walk instead: unmerge_ksm_pages()->break_ksm().

It can be triggered on a range through unmerge_ksm_pages(), which gets
called from:

* ksm_madvise() through madvise(MADV_UNMERGEABLE). There are not a lot
  of users of that function.

* __ksm_del_vma() through ksm_del_vmas(). Effectively called when
  disabling KSM for a process either through the prctl or from s390x gmap
  code when enabling storage keys for a VM.

In both cases, it's not ksmd that's blocked, it's just that the
operation (triggered by the app) takes longer.

So neither is as critical as this thing here, but likely we should
take care of them at some point.
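
For reference, the first path can be reached from userspace roughly like this. This is only a minimal sketch: the helper name toy_ksm_opt_in_and_out() is made up for illustration, and it assumes a kernel built with CONFIG_KSM (otherwise madvise() is expected to fail with EINVAL).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MADV_MERGEABLE / MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not part of the patch: opt a range into KSM and
 * then back out again.
 *
 * Returns 0 on success or a negative errno value on failure.
 */
int toy_ksm_opt_in_and_out(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);

    /* Mark the range as a candidate for merging by ksmd. */
    if (madvise(addr, len, MADV_MERGEABLE) != 0)
        return -errno; /* e.g. -EINVAL on kernels without KSM */

    /*
     * Undo it again. Per the description above, this is the call that
     * reaches ksm_madvise() and from there unmerge_ksm_pages() for
     * the whole range.
     */
    if (madvise(addr, len, MADV_UNMERGEABLE) != 0)
        return -errno;

    return 0;
}
```

The cost of the second madvise() call is what grows with the size of the range when addresses are walked one by one, even if almost nothing in the range is populated.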

Interestingly, I converted that from a walk_page_range_vma() to
folio_walk_start() after converting it from follow_page() to
walk_page_range_vma().

But we never did a range walk, we just walked individual addresses,
because that's what break_ksm() does.

We could effectively revert e317a8d8b4f600fc7ec9725e26417030ee594f52 and
adjust it to perform an actual range walk by passing a range to break_ksm().

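
To illustrate why walking individual addresses and doing an actual range walk differ so much on sparse ranges, here is a rough userspace toy model. It is not kernel code: all toy_* names are made up, and it models only a single upper level (one pointer per 512-entry table), whereas the real page table walker can skip holes at several levels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define ENTRIES_PER_TABLE 512 /* entries per toy leaf table */

/* Toy stand-in for a leaf page table; NOT a kernel structure. */
struct toy_table {
    bool present[ENTRIES_PER_TABLE];
};

/* Toy address space: one pointer per leaf table, NULL means "hole". */
struct toy_mm {
    struct toy_table **tables;
    size_t nr_tables;
};

int toy_mm_init(struct toy_mm *mm, size_t nr_tables)
{
    mm->tables = calloc(nr_tables, sizeof(*mm->tables));
    mm->nr_tables = nr_tables;
    return mm->tables ? 0 : -1;
}

void toy_mm_destroy(struct toy_mm *mm)
{
    for (size_t i = 0; i < mm->nr_tables; i++)
        free(mm->tables[i]);
    free(mm->tables);
}

/* Mark one page as populated, allocating its leaf table on demand. */
int toy_populate(struct toy_mm *mm, size_t page)
{
    size_t t = page / ENTRIES_PER_TABLE;

    assert(t < mm->nr_tables);
    if (!mm->tables[t]) {
        mm->tables[t] = calloc(1, sizeof(struct toy_table));
        if (!mm->tables[t])
            return -1;
    }
    mm->tables[t]->present[page % ENTRIES_PER_TABLE] = true;
    return 0;
}

/* One lookup per page in [start, end), like walking single addresses. */
size_t toy_scan_per_address(const struct toy_mm *mm, size_t start,
                            size_t end, size_t *found)
{
    size_t steps = 0;

    assert(end <= mm->nr_tables * ENTRIES_PER_TABLE);
    *found = 0;
    for (size_t page = start; page < end; page++) {
        const struct toy_table *tbl = mm->tables[page / ENTRIES_PER_TABLE];

        steps++;
        if (tbl && tbl->present[page % ENTRIES_PER_TABLE])
            (*found)++;
    }
    return steps;
}

/* Range walk: a missing leaf table is skipped with a single check. */
size_t toy_scan_range_walk(const struct toy_mm *mm, size_t start,
                           size_t end, size_t *found)
{
    size_t steps = 0;
    size_t page = start;

    assert(end <= mm->nr_tables * ENTRIES_PER_TABLE);
    *found = 0;
    while (page < end) {
        size_t t = page / ENTRIES_PER_TABLE;
        size_t table_end = (t + 1) * ENTRIES_PER_TABLE;

        if (table_end > end)
            table_end = end;
        steps++; /* one check of the upper-level entry */
        if (mm->tables[t]) {
            for (; page < table_end; page++) {
                steps++;
                if (mm->tables[t]->present[page % ENTRIES_PER_TABLE])
                    (*found)++;
            }
        }
        page = table_end; /* jump over the hole (or move to next table) */
    }
    return steps;
}
```

With 1024 toy tables and only page 0 populated, the per-address scan does 524288 lookups, while the range walk does 1536 (1024 upper-level checks plus the 512 entries of the one populated table). Both find the same single page.
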
--
Cheers
David / dhildenb