* [PATCH] mm/khugepaged: Fix skipping of alloc sleep after second failure
@ 2025-11-24 6:19 Zhiheng Tao
2025-11-24 9:14 ` David Hildenbrand (Red Hat)
0 siblings, 1 reply; 5+ messages in thread
From: Zhiheng Tao @ 2025-11-24 6:19 UTC (permalink / raw)
To: akpm, david, lorenzo.stoakes
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, lance.yang, shy828301, zokeefe, peterx, linux-mm,
linux-kernel, Zhiheng Tao
In khugepaged_do_scan(), two consecutive allocation failures cause
the logic to skip the dedicated 60s throttling sleep
(khugepaged_alloc_sleep_millisecs), forcing a fallback to the
shorter 10s scanning interval via the outer loop.
Since fragmentation is unlikely to resolve in 10s, this results in
wasted CPU cycles on immediate retries.
Reorder the failure logic to ensure khugepaged_alloc_sleep() is
always called on each allocation failure.
Fixes: c6a7f445a272 ("mm: khugepaged: don't carry huge page to the next loop for !CONFIG_NUMA")
Signed-off-by: Zhiheng Tao <junchuan.tzh@antgroup.com>
---
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index abe54f0..c3f9721 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2562,12 +2562,12 @@ static void khugepaged_do_scan(struct collapse_control *cc)
 		if (result == SCAN_ALLOC_HUGE_PAGE_FAIL) {
 			/*
 			 * If fail to allocate the first time, try to sleep for
-			 * a while. When hit again, cancel the scan.
+			 * a while. When hit again, sleep and cancel the scan.
 			 */
+			khugepaged_alloc_sleep();
 			if (!wait)
 				break;
 			wait = false;
-			khugepaged_alloc_sleep();
 		}
 	}
 }
--
1.8.3.1
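
The reordering in the hunk above can be modelled in user space. The following is a minimal sketch, not kernel code: model_do_scan(), struct scan_outcome and the sleep_first flag are hypothetical names, and the model assumes that `wait` starts out true, which the hunk itself does not show. It only counts how often khugepaged_alloc_sleep() would be reached for a given sequence of scan results.

```c
#include <stdbool.h>
#include <stddef.h>

/* Outcome of one modelled pass through the khugepaged_do_scan() loop. */
struct scan_outcome {
	int alloc_sleeps;	/* times khugepaged_alloc_sleep() would run */
	int attempts;		/* scan attempts made before leaving the loop */
};

/*
 * Hypothetical model of the allocation-failure branch only.
 * alloc_fails[i] is true when the i-th scan attempt ends with
 * SCAN_ALLOC_HUGE_PAGE_FAIL.  sleep_first selects the patched ordering.
 * Assumption: wait is initialised to true before the loop.
 */
static struct scan_outcome model_do_scan(const bool *alloc_fails, size_t n,
					 bool sleep_first)
{
	struct scan_outcome out = { 0, 0 };
	bool wait = true;

	for (size_t i = 0; i < n; i++) {
		out.attempts++;
		if (!alloc_fails[i])
			continue;
		if (sleep_first)
			out.alloc_sleeps++;	/* patched: sleep on every failure */
		if (!wait)
			break;			/* second failure: cancel the scan */
		wait = false;
		if (!sleep_first)
			out.alloc_sleeps++;	/* original: sleep on the first failure only */
	}
	return out;
}
```

With two failures in the input, the original ordering reaches the sleep once and the patched ordering reaches it twice; the number of scan attempts is the same in both cases.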
* Re: [PATCH] mm/khugepaged: Fix skipping of alloc sleep after second failure
2025-11-24 6:19 [PATCH] mm/khugepaged: Fix skipping of alloc sleep after second failure Zhiheng Tao
@ 2025-11-24 9:14 ` David Hildenbrand (Red Hat)
2025-11-24 9:27 ` Lance Yang
2025-11-25 4:15 ` Zhiheng Tao
0 siblings, 2 replies; 5+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-24 9:14 UTC (permalink / raw)
To: Zhiheng Tao, akpm, lorenzo.stoakes
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, lance.yang, shy828301, zokeefe, peterx, linux-mm,
linux-kernel
On 11/24/25 07:19, Zhiheng Tao wrote:
> In khugepaged_do_scan(), two consecutive allocation failures cause
> the logic to skip the dedicated 60s throttling sleep
> (khugepaged_alloc_sleep_millisecs), forcing a fallback to the
> shorter 10s scanning interval via the outer loop
>
> Since fragmentation is unlikely to resolve in 10s, this results in
> wasted CPU cycles on immediate retries.
Why shouldn't memory compaction be able to compact a single THP in 10s?
Why should it resolve in 60s?
>
> Reorder the failure logic to ensure khugepaged_alloc_sleep() is
> always called on each allocation failure.
>
> Fixes: c6a7f445a272 ("mm: khugepaged: don't carry huge page to the next loop for !CONFIG_NUMA")
What are we fixing here? This sounds like a change that might be better
on some systems, but worse on others?
We really need more information on when/how an issue was hit, and how
this patch here really moves the needle in any way.
--
Cheers
David
* Re: [PATCH] mm/khugepaged: Fix skipping of alloc sleep after second failure
2025-11-24 9:14 ` David Hildenbrand (Red Hat)
@ 2025-11-24 9:27 ` Lance Yang
2025-11-25 4:26 ` Zhiheng Tao
2025-11-25 4:15 ` Zhiheng Tao
1 sibling, 1 reply; 5+ messages in thread
From: Lance Yang @ 2025-11-24 9:27 UTC (permalink / raw)
To: Zhiheng Tao
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, shy828301, zokeefe, peterx, akpm, linux-mm, linux-kernel,
lorenzo.stoakes, David Hildenbrand (Red Hat)
On 2025/11/24 17:14, David Hildenbrand (Red Hat) wrote:
> On 11/24/25 07:19, Zhiheng Tao wrote:
>> In khugepaged_do_scan(), two consecutive allocation failures cause
>> the logic to skip the dedicated 60s throttling sleep
>> (khugepaged_alloc_sleep_millisecs), forcing a fallback to the
>> shorter 10s scanning interval via the outer loop
>>
>> Since fragmentation is unlikely to resolve in 10s, this results in
>> wasted CPU cycles on immediate retries.
>
> Why shouldn't memory compaction be able to compact a single THP in 10s?
>
> Why should it resolve in 60s?
>
>>
>> Reorder the failure logic to ensure khugepaged_alloc_sleep() is
>> always called on each allocation failure.
>>
>> Fixes: c6a7f445a272 ("mm: khugepaged: don't carry huge page to the
>> next loop for !CONFIG_NUMA")
>
> What are we fixing here? This sounds like a change that might be better
> on some systems, but worse on others?
Seems like we're not honoring khugepaged_alloc_sleep_millisecs on the
second allocation failure... but is that actually a problem?
>
> We really need more information on when/how an issue was hit, and how
> this patch here really moves the needle in any way.
+1
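
The timing difference under discussion can be written down as simple arithmetic. This is a rough sketch under stated assumptions: the 10s and 60s values are the ones quoted in the thread, delay_after_second_failure_ms() is a hypothetical helper, and it assumes that the outer loop still applies its normal scan sleep after the scan is cancelled, which the patch does not show.

```c
/* Values quoted in the thread, in milliseconds. */
#define SCAN_SLEEP_MS	10000u	/* scanning interval of the outer loop */
#define ALLOC_SLEEP_MS	60000u	/* khugepaged_alloc_sleep_millisecs */

/*
 * Hypothetical helper: time between the second allocation failure and
 * the next scan attempt.  Assumes the outer loop sleeps for the scan
 * interval after the scan has been cancelled.
 */
static unsigned int delay_after_second_failure_ms(unsigned int scan_sleep_ms,
						  unsigned int alloc_sleep_ms,
						  int patched)
{
	/* Patched: the alloc sleep runs before the break, then the scan sleep. */
	if (patched)
		return alloc_sleep_ms + scan_sleep_ms;
	/* Original: the break skips the alloc sleep entirely. */
	return scan_sleep_ms;
}
```

Under these assumptions the retry after a second failure moves from 10s to 70s, which is the trade-off the questions above are asking the author to justify.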
* Re: [PATCH] mm/khugepaged: Fix skipping of alloc sleep after second failure
2025-11-24 9:14 ` David Hildenbrand (Red Hat)
2025-11-24 9:27 ` Lance Yang
@ 2025-11-25 4:15 ` Zhiheng Tao
1 sibling, 0 replies; 5+ messages in thread
From: Zhiheng Tao @ 2025-11-25 4:15 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: akpm, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, lance.yang, shy828301, zokeefe,
peterx, linux-mm, linux-kernel
On Mon, Nov 24, 2025 at 10:14:20AM +0100, David Hildenbrand (Red Hat) wrote:
> On 11/24/25 07:19, Zhiheng Tao wrote:
> >In khugepaged_do_scan(), two consecutive allocation failures cause
> >the logic to skip the dedicated 60s throttling sleep
> >(khugepaged_alloc_sleep_millisecs), forcing a fallback to the
> >shorter 10s scanning interval via the outer loop
> >
> >Since fragmentation is unlikely to resolve in 10s, this results in
> >wasted CPU cycles on immediate retries.
>
> Why shouldn't memory compaction be able to compact a single THP in 10s?
>
> Why should it resolve in 60s?
>
It may resolve in 10s or 60s. The problem is that the sleep controlled
by khugepaged_alloc_sleep_millisecs should not be skipped if allocation
fails.
> >
> >Reorder the failure logic to ensure khugepaged_alloc_sleep() is
> >always called on each allocation failure.
> >
> >Fixes: c6a7f445a272 ("mm: khugepaged: don't carry huge page to the next loop for !CONFIG_NUMA")
>
> What are we fixing here? This sounds like a change that might be
> better on some systems, but worse on others?
>
> We really need more information on when/how an issue was hit, and
> how this patch here really moves the needle in any way.
>
It works better. The skipped khugepaged_alloc_sleep() call was not
introduced by this change. Maybe I should remove "Fix".
> --
> Cheers
>
> David
* Re: [PATCH] mm/khugepaged: Fix skipping of alloc sleep after second failure
2025-11-24 9:27 ` Lance Yang
@ 2025-11-25 4:26 ` Zhiheng Tao
0 siblings, 0 replies; 5+ messages in thread
From: Zhiheng Tao @ 2025-11-25 4:26 UTC (permalink / raw)
To: Lance Yang
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, shy828301, zokeefe, peterx, akpm, linux-mm, linux-kernel,
lorenzo.stoakes, David Hildenbrand (Red Hat)
On Mon, Nov 24, 2025 at 05:27:23PM +0800, Lance Yang wrote:
>
>
> On 2025/11/24 17:14, David Hildenbrand (Red Hat) wrote:
> >On 11/24/25 07:19, Zhiheng Tao wrote:
> >>In khugepaged_do_scan(), two consecutive allocation failures cause
> >>the logic to skip the dedicated 60s throttling sleep
> >>(khugepaged_alloc_sleep_millisecs), forcing a fallback to the
> >>shorter 10s scanning interval via the outer loop
> >>
> >>Since fragmentation is unlikely to resolve in 10s, this results in
> >>wasted CPU cycles on immediate retries.
> >
> >Why shouldn't memory compaction be able to compact a single THP in 10s?
> >
> >Why should it resolve in 60s?
> >
> >>
> >>Reorder the failure logic to ensure khugepaged_alloc_sleep() is
> >>always called on each allocation failure.
> >>
> >>Fixes: c6a7f445a272 ("mm: khugepaged: don't carry huge page to
> >>the next loop for !CONFIG_NUMA")
> >
> >What are we fixing here? This sounds like a change that might be
> >better on some systems, but worse on others?
>
> Seems like we're not honoring khugepaged_alloc_sleep_millisecs on the
> second allocation failure... but is that actually a problem?
>
Wouldn't it be more appropriate to honor the sleep on the second
allocation failure as well? It was a problem before commit c6a7f445a272
when khugepaged_pages_to_scan=512.
> >
> >We really need more information on when/how an issue was hit, and
> >how this patch here really moves the needle in any way.
>
> +1
>
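
For anyone wanting to report the settings a system is running with, the tunables named in this thread are exposed under the khugepaged sysfs directory. The following is a minimal sketch for reading one of them from user space; read_tunable() and the KHUGEPAGED_SYSFS macro are hypothetical names, and the directory is assumed to exist only on kernels built with transparent hugepage support.

```c
#include <stdio.h>

/* Sysfs directory holding the khugepaged tunables (assumes THP is built in). */
#define KHUGEPAGED_SYSFS "/sys/kernel/mm/transparent_hugepage/khugepaged/"

/*
 * Hypothetical helper: read one unsigned integer from a file such as
 * KHUGEPAGED_SYSFS "pages_to_scan".  Returns 0 on success, -1 if the
 * file is missing, unreadable, or does not start with a number.
 */
static int read_tunable(const char *path, unsigned long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller would pass, for example, KHUGEPAGED_SYSFS "alloc_sleep_millisecs" or KHUGEPAGED_SYSFS "scan_sleep_millisecs" and check the return value before using the result.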