* Re: [Regression] [KSM] KSM CPU overhead in 6.16+ kernel compared to <=6.15 versions ("folio_walk_start" kernel object overhead)
[not found] <34d27471-80a4-49f8-b6cb-f2e51518d9ea@airmail.cc>
@ 2025-10-13 18:55 ` David Hildenbrand
[not found] ` <46d26246-5bd5-43f7-b1a4-dc721f717413@airmail.cc>
0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2025-10-13 18:55 UTC (permalink / raw)
To: 423de7a3-1c62-4e72-8e79-19a6413e420c
Cc: akpm, chengming.zhou, craftfever, linux-kernel, linux-mm,
regressions, xu.xin16, Lorenzo Stoakes
On 13.10.25 19:09, craftfever wrote:
> > Looking again, no, that's not the case. We do a cond_resched() after
> > every page we looked up.
> >
> > Also, b1d3e9bbccb4 was introduced in v6.12 already. Regarding
> > folio_walk_start(), also nothing major changed ever since v6.12.
> >
> > Looking at scan_get_next_rmap_item(). I guess we might hold the mmap
> > lock for quite a long time (if we're iterating large areas where there
> > are no suitable pages mapped -- very large sparse areas).
> >
> > That would explain why we end up calling folio_walk_start() that
> > frequently.
> >
> > But nothing really changed in that regard lately in KSM code.
> >
> > What we probably should be doing is giving up the mmap lock after
> > scanning a certain amount of address space. Or better, switching to
> > per-VMA locks if possible.
> >
> > Also, looking up each address is highly inefficient if we end up having
> > large empty areas. A range-walk function would be much better suited
> > for that, so we can just jump over holes completely.
> >
> > But anyhow, nothing seems to have changed ever since 6.15 AFAICT, so
> > I'm not really sure what's going on here. Likely it's unrelated to KSM
> > changes.
> >
> > --
> > Cheers
> >
> > David / dhildenb
> >
>
> I have to make a correction: folio_walk_start is present in "perf top"
> statistics on 6.12-6.15 as well; it just consumes 0.5-1% of kernel time
> there, compared to 11-14% on 6.16+, where it causes 100% ksmd CPU usage
> compared to <=6.15 kernels.
I'm currently looking at the diff from 6.15 -> 6.16.
In KSM code nothing changed, really.
In folio_walk_start() itself nothing changed.
In the functions it calls also nothing relevant should have changed.
So the only explanation would be that it is simply called much more frequently.
And that might be the case if we are now scanning much, much larger VMAs that
are mostly empty, that would otherwise not be scanned.
I now recall that we had a fix from Lorenzo:
commit cf7e7a3503df0b71afd68ee84e9a09d4514cc2dd
Author: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Thu May 29 18:15:47 2025 +0100
mm: prevent KSM from breaking VMA merging for new VMAs
If a user wishes to enable KSM mergeability for an entire process and all
fork/exec'd processes that come after it, they use the prctl()
PR_SET_MEMORY_MERGE operation.
That went into 6.17.
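(For context, PR_SET_MEMORY_MERGE is also what systemd's MemoryKSM= flips
under the hood. A minimal userspace sketch, assuming the prctl value from
include/uapi/linux/prctl.h; on kernels without KSM prctl support this just
fails with EINVAL:)

#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67	/* include/uapi/linux/prctl.h */
#endif

int main(void)
{
	/*
	 * Opt the whole process (and, per the commit message above, the
	 * fork/exec'd processes that come after it) into KSM merging of
	 * its anonymous VMAs.
	 */
	if (prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0))
		perror("PR_SET_MEMORY_MERGE");
	return 0;
}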
Assuming we merge more carefully now, we might no longer run into the

	if (!vma->anon_vma)
		ksm_scan.address = vma->vm_end;

shortcut for the gigantic VMAs, and possibly end up scanning these gigantic
empty VMAs.
Just a thought:
A) Can you reproduce on 6.17?
B) Does the 6.16 you are testing with contain a backport of that commit?
Definitely, scan_get_next_rmap_item() must be optimized to walk a sparse page table
more efficiently.
> I understand that something changed in the linked function that is
> affecting KSM behavior. Maybe you can reproduce it with the same
> settings; it especially happens with Chromium apps, where the V8 sandbox
> has a huge VM size. Maybe you could reproduce the problem with the same
> MemoryKSM=yes in user@.service, which enables KSM processing for all
> user processes, especially when Chromium is running. KSM CPU usage
> really differs between 6.12-6.15 and 6.16+. Maybe it's related to your
> explanation.
I'm afraid I don't currently have time to reproduce.
--
Cheers
David / dhildenb
* Re: [Regression] [KSM] KSM CPU overhead in 6.16+ kernel compared to <=6.15 versions ("folio_walk_start" kernel object overhead)
[not found] ` <46d26246-5bd5-43f7-b1a4-dc721f717413@airmail.cc>
@ 2025-10-13 19:58 ` David Hildenbrand
2025-10-14 7:59 ` David Hildenbrand
0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2025-10-13 19:58 UTC (permalink / raw)
To: craftfever
Cc: akpm, chengming.zhou, linux-kernel, linux-mm, lorenzo.stoakes,
regressions, xu.xin16
On 13.10.25 21:54, craftfever wrote:
>
> Unfortunately, yes, I can reproduce it. I thought the lockups did not
> happen anymore, but I was wrong: I booted today with an updated 6.17.2
> and KSM enabled, and the whole situation is back. But it only happens
> when scanning pages of a process with a huge VM size, like Chromium
> with 1TB of virtual memory. The rest is alright. It looks like
> folio_walk_start is called with a much higher frequency than in the
> 6.12-6.15 versions; in those versions, page scanning of processes with
> a huge VM size is pretty fast and flawless. Right now, when Chromium is
> running, I am seeing a constant 42% folio_walk_start and 15%
> ksm_scan_thread on the 6.17.2 kernel (contrary to 1% folio_walk_start
> and even less ksm_scan_thread on 6.12-6.15). I must admit that the
> whole system is not freezing, just Chromium, with high CPU usage from
> ksmd and the kernel.
What about 6.16?
--
Cheers
David / dhildenb
* Re: [Regression] [KSM] KSM CPU overhead in 6.16+ kernel compared to <=6.15 versions ("folio_walk_start" kernel object overhead)
2025-10-13 19:58 ` David Hildenbrand
@ 2025-10-14 7:59 ` David Hildenbrand
0 siblings, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2025-10-14 7:59 UTC (permalink / raw)
To: craftfever
Cc: akpm, chengming.zhou, linux-kernel, linux-mm, lorenzo.stoakes,
regressions, xu.xin16
On 13.10.25 21:58, David Hildenbrand wrote:
> On 13.10.25 21:54, craftfever wrote:
>>
>> Unfortunately, yes, I can reproduce it. I thought the lockups did not
>> happen anymore, but I was wrong: I booted today with an updated 6.17.2
>> and KSM enabled, and the whole situation is back. But it only happens
>> when scanning pages of a process with a huge VM size, like Chromium
>> with 1TB of virtual memory. The rest is alright. It looks like
>> folio_walk_start is called with a much higher frequency than in the
>> 6.12-6.15 versions; in those versions, page scanning of processes with
>> a huge VM size is pretty fast and flawless. Right now, when Chromium is
>> running, I am seeing a constant 42% folio_walk_start and 15%
>> ksm_scan_thread on the 6.17.2 kernel (contrary to 1% folio_walk_start
>> and even less ksm_scan_thread on 6.12-6.15). I must admit that the
>> whole system is not freezing, just Chromium, with high CPU usage from
>> ksmd and the kernel.
>
> What about 6.16?
>
What you replied in private:
Just compared stock kernels (6.16.8 and 6.17.2) and must admit that the
behavior is pretty much the same: the same lockup when just starting
Chromium, and the same kernel objects and ksmd overhead. No difference.
(Approx. 20-32% "folio_walk_start" and 10% ksm_scan_thread at this time
on both kernels.)
IIUC, 6.16.8 does not contain a backport of Lorenzo's fix, so we can rule that one out, I think.
There is another VMA-merging-related one in 6.16:
commit 879bca0a2c4f40b08d09a95a2a0c3c6513060b5c
Author: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Tue Apr 8 10:29:31 2025 +0100
mm/vma: fix incorrectly disallowed anonymous VMA merges
Patch series "fix incorrectly disallowed anonymous VMA merges", v2.
It appears that we have been incorrectly rejecting merge cases for 15
years, apparently by mistake.
Imagine a range of anonymous mapped memory divided into two VMAs like
this, with incompatible protection bits:
Could you try reverting 879bca0a2c4f40b08d09a95a2a0c3c6513060b5c on top of 6.16 and
see if the problem goes away?
Meanwhile I'll try using an ordinary pagewalk that covers a larger area
instead of a foliowalk that walks each address.
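Very roughly something along these lines (completely untested sketch, just
to show the direction; ksm_walk_private and ksm_next_anon_page are invented
names, and a real patch would still have to handle THP, zone-device pages,
the KSM-specific checks, etc.):

#include <linux/mm.h>
#include <linux/pagewalk.h>

/* Invented for this sketch: the result of walking one chunk of a VMA. */
struct ksm_walk_private {
	struct page *page;	/* next candidate page, with a reference */
	unsigned long addr;	/* address it is mapped at */
};

static int ksm_scan_pte_entry(pte_t *pte, unsigned long addr,
			      unsigned long next, struct mm_walk *walk)
{
	struct ksm_walk_private *priv = walk->private;
	pte_t ptent = ptep_get(pte);
	struct page *page;

	if (!pte_present(ptent))
		return 0;

	page = vm_normal_page(walk->vma, addr, ptent);
	if (!page || !PageAnon(page))
		return 0;

	get_page(page);
	priv->page = page;
	priv->addr = addr;
	return 1;	/* any non-zero return stops the walk */
}

static const struct mm_walk_ops ksm_scan_walk_ops = {
	.pte_entry	= ksm_scan_pte_entry,
};

/*
 * Find the next anon page in [start, end) of @vma, or return NULL. Ranges
 * that have no page tables at all are skipped at the PMD/PUD level instead
 * of being probed PAGE_SIZE by PAGE_SIZE with folio_walk_start().
 * Caller holds the mmap read lock, as in the existing scan loop.
 */
static struct page *ksm_next_anon_page(struct vm_area_struct *vma,
				       unsigned long start, unsigned long end,
				       unsigned long *found_addr)
{
	struct ksm_walk_private priv = { };

	if (walk_page_range(vma->vm_mm, start, end,
			    &ksm_scan_walk_ops, &priv) <= 0)
		return NULL;

	*found_addr = priv.addr;
	return priv.page;
}

The interesting part is that the walker never even visits the empty ranges,
which is exactly where ksmd seems to be burning cycles right now.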
--
Cheers
David / dhildenb
* Re: [Regression] [KSM] KSM CPU overhead in 6.16+ kernel compared to <=6.15 versions ("folio_walk_start" kernel object overhead)
2025-10-13 9:52 ` David Hildenbrand
@ 2025-10-13 10:18 ` David Hildenbrand
0 siblings, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2025-10-13 10:18 UTC (permalink / raw)
To: craftfever, akpm, xu.xin16, chengming.zhou
Cc: linux-mm, linux-kernel, regressions
On 13.10.25 11:52, David Hildenbrand wrote:
> On 13.10.25 11:22, craftfever@murena.io wrote:
>
> Hi,
>
>> I've already posted about this problem on bugzilla (#220599), but the
>> maintainers asked to report issues on the mailing list.
>> The problem with freezes during KSM page scanning of certain processes
>> with a huge virtual memory size, like Chromium, was fixed in 6.17.1
>> compared to 6.16.x/6.17, but the problem with huge CPU overhead is still
>> present. Compared to Linux <=6.15, where the overhead is much lighter
>> and KSM scanning does not consume much CPU, there is a "folio_walk_start"
>> kernel object present (which I observed with "perf top") that does not
>> show up during KSM work on versions <=6.15 and which is in use starting
>> from Linux 6.16. This method is very resource-consuming compared to the
>> algorithm used in <=6.15 versions. Is there a kernel parameter to disable
>> it, or does it need more optimization?
>
> I doubt that it has a lot to do with folio_walk_start(); that's just a
> simple page table walk replacing the previous walk based on follow_page().
>
> So that's why you would suddenly spot it in perf top -- before commit
> b1d3e9bbccb4 ("mm/ksm: convert scan_get_next_rmap_item() from
> follow_page() to folio_walk") we would have used follow_page().
>
> Do you see any kernel splats / soft-lockups?
>
> I can see that in commit b1d3e9bbccb4 I removed a cond_resched(). Maybe
> that's why it's a problem in your kernel config.
Looking again, no, that's not the case. We do a cond_resched() after
every page we looked up.
Also, b1d3e9bbccb4 was introduced in v6.12 already. Regarding
folio_walk_start(), also nothing major changed ever since v6.12.
Looking at scan_get_next_rmap_item(). I guess we might hold the mmap
lock for quite a long time (if we're iterating large areas where there
are no suitable pages mapped -- very large sparse areas).
That would explain why we end up calling folio_walk_start() that frequently.
But nothing really changed in that regard lately in KSM code.
What we probably should be doing is giving up the mmap lock after
scanning a certain amount of address space. Or better, switching to
per-VMA locks if possible.
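The first part could be as simple as a helper called from the scan loop
every so often (untested sketch; ksm_relax_mmap_lock is an invented name):

/*
 * Untested idea: called from the scan loop after every batch of scanned
 * address space. Drops the mmap read lock so writers and the scheduler
 * get a chance, retakes it, and looks the VMA up again, because it may
 * have been unmapped or split in the meantime.
 */
static struct vm_area_struct *ksm_relax_mmap_lock(struct mm_struct *mm,
						  unsigned long addr)
{
	mmap_read_unlock(mm);
	cond_resched();
	mmap_read_lock(mm);

	/*
	 * find_vma() returns the first VMA ending above addr, so the caller
	 * must re-check vma->vm_start (and bail out if the mm is exiting).
	 */
	return find_vma(mm, addr);
}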
Also, looking up each address is highly inefficient if we end up having
large empty areas. A range-walk function would be much better suited for
that, so we can just jump over holes completely.
But anyhow, nothing seems to have changed ever since 6.15 AFAICT, so I'm
not really sure what's going on here. Likely it's unrelated to KSM changes.
--
Cheers
David / dhildenb
* Re: [Regression] [KSM] KSM CPU overhead in 6.16+ kernel compared to <=6.15 versions ("folio_walk_start" kernel object overhead)
2025-10-13 9:22 craftfever
@ 2025-10-13 9:52 ` David Hildenbrand
2025-10-13 10:18 ` David Hildenbrand
0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2025-10-13 9:52 UTC (permalink / raw)
To: craftfever, akpm, xu.xin16, chengming.zhou
Cc: linux-mm, linux-kernel, regressions
On 13.10.25 11:22, craftfever@murena.io wrote:
Hi,
> I've already posted about this problem on bugzilla (#220599), but the
> maintainers asked to report issues on the mailing list.
> The problem with freezes during KSM page scanning of certain processes
> with a huge virtual memory size, like Chromium, was fixed in 6.17.1
> compared to 6.16.x/6.17, but the problem with huge CPU overhead is still
> present. Compared to Linux <=6.15, where the overhead is much lighter
> and KSM scanning does not consume much CPU, there is a "folio_walk_start"
> kernel object present (which I observed with "perf top") that does not
> show up during KSM work on versions <=6.15 and which is in use starting
> from Linux 6.16. This method is very resource-consuming compared to the
> algorithm used in <=6.15 versions. Is there a kernel parameter to disable
> it, or does it need more optimization?
I doubt that it has a lot to do with folio_walk_start(); that's just a
simple page table walk replacing the previous walk based on follow_page().
So that's why you would suddenly spot it in perf top -- before commit
b1d3e9bbccb4 ("mm/ksm: convert scan_get_next_rmap_item() from
follow_page() to folio_walk") we would have used follow_page().
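For reference, the lookup in scan_get_next_rmap_item() is now roughly this
pattern (simplified, not the verbatim mm/ksm.c code):

	/*
	 * Simplified: one folio_walk -- i.e. one full page table walk --
	 * per PAGE_SIZE step, holes included. That is what shows up as
	 * folio_walk_start in perf top where follow_page used to.
	 */
	while (ksm_scan.address < vma->vm_end) {
		struct folio_walk fw;
		struct folio *folio;

		folio = folio_walk_start(&fw, vma, ksm_scan.address, 0);
		if (folio) {
			if (folio_test_anon(folio)) {
				/* ... hand fw.page to the merge logic ... */
			}
			folio_walk_end(&fw, vma);
		}
		ksm_scan.address += PAGE_SIZE;
		cond_resched();
	}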
Do you see any kernel splats / soft-lockups?
I can see that in commit b1d3e9bbccb4 I removed a cond_resched(). Maybe
that's why it's a problem in your kernel config.
--
Cheers
David / dhildenb
* [Regression] [KSM] KSM CPU overhead in 6.16+ kernel compared to <=6.15 versions ("folio_walk_start" kernel object overhead)
@ 2025-10-13 9:22 craftfever
2025-10-13 9:52 ` David Hildenbrand
0 siblings, 1 reply; 6+ messages in thread
From: craftfever @ 2025-10-13 9:22 UTC (permalink / raw)
To: akpm, david, xu.xin16, chengming.zhou; +Cc: linux-mm, linux-kernel, regressions
I've already posted about this problem on bugzilla (#220599), but the
maintainers asked to report issues on the mailing list.
The problem with freezes during KSM page scanning of certain processes
with a huge virtual memory size, like Chromium, was fixed in 6.17.1
compared to 6.16.x/6.17, but the problem with huge CPU overhead is still
present. Compared to Linux <=6.15, where the overhead is much lighter
and KSM scanning does not consume much CPU, there is a "folio_walk_start"
kernel object present (which I observed with "perf top") that does not
show up during KSM work on versions <=6.15 and which is in use starting
from Linux 6.16. This method is very resource-consuming compared to the
algorithm used in <=6.15 versions. Is there a kernel parameter to disable
it, or does it need more optimization?
I'm using the MemoryKSM setting in systemd's user@.service for KSM process
merging; it is very light on <=6.15, but CPU-consuming on 6.16+ (6.17.1
without freezes) due to the reasons stated above.
Thread overview: 6+ messages
[not found] <34d27471-80a4-49f8-b6cb-f2e51518d9ea@airmail.cc>
2025-10-13 18:55 ` [Regression] [KSM] KSM CPU overhead in 6.16+ kernel compared to <=6.15 versions ("folio_walk_start" kernel object overhead) David Hildenbrand
[not found] ` <46d26246-5bd5-43f7-b1a4-dc721f717413@airmail.cc>
2025-10-13 19:58 ` David Hildenbrand
2025-10-14 7:59 ` David Hildenbrand
2025-10-13 9:22 craftfever
2025-10-13 9:52 ` David Hildenbrand
2025-10-13 10:18 ` David Hildenbrand