* [PATCH 1/1] mm: avoid processing mlocked THPs in deferred split shrinker
@ 2025-09-08 4:07 Lance Yang
2025-09-08 7:38 ` David Hildenbrand
From: Lance Yang @ 2025-09-08 4:07 UTC
To: akpm
Cc: david, Liam.Howlett, baohua, baolin.wang, dev.jain, linux-kernel,
linux-mm, lorenzo.stoakes, npache, ryan.roberts, usamaarif642,
ziy, Lance Yang
From: Lance Yang <lance.yang@linux.dev>
When a new THP is faulted in or collapsed, it is unconditionally added to
the deferred split queue. If this THP is subsequently mlocked, it remains
on the queue but is removed from the LRU and marked unevictable.
During memory reclaim, deferred_split_scan() will still pick up this large
folio. Because it's not partially mapped, it will proceed to call
thp_underused() and then attempt to split_folio() to free all zero-filled
subpages.
This is a pointless waste of CPU cycles. The folio is mlocked and
unevictable, so any attempt to reclaim memory from it via splitting is
doomed to fail.
So, let's add an early folio_test_mlocked() check to skip this case.
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
mm/huge_memory.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 77f0c3417973..d2e84015d6b4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4183,6 +4183,9 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		bool underused = false;
 
 		if (!folio_test_partially_mapped(folio)) {
+			/* An mlocked folio is not a candidate for the shrinker. */
+			if (folio_test_mlocked(folio))
+				goto next;
 			underused = thp_underused(folio);
 			if (!underused)
 				goto next;
--
2.49.0
* Re: [PATCH 1/1] mm: avoid processing mlocked THPs in deferred split shrinker
2025-09-08 4:07 [PATCH 1/1] mm: avoid processing mlocked THPs in deferred split shrinker Lance Yang
@ 2025-09-08 7:38 ` David Hildenbrand
2025-09-08 8:13 ` Lance Yang
From: David Hildenbrand @ 2025-09-08 7:38 UTC
To: Lance Yang, akpm
Cc: Liam.Howlett, baohua, baolin.wang, dev.jain, linux-kernel,
linux-mm, lorenzo.stoakes, npache, ryan.roberts, usamaarif642,
ziy
On 08.09.25 06:07, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
Subject should likely be more specific:
mm: skip mlocked THPs that are underused early in deferred_split_scan()
>
> When a new THP is faulted in or collapsed, it is unconditionally added to
> the deferred split queue. If this THP is subsequently mlocked, it remains
> on the queue but is removed from the LRU and marked unevictable.
>
> During memory reclaim, deferred_split_scan() will still pick up this large
> folio. Because it's not partially mapped, it will proceed to call
> thp_underused() and then attempt to split_folio() to free all zero-filled
> subpages.
>
> This is a pointless waste of CPU cycles. The folio is mlocked and
> unevictable, so any attempt to reclaim memory from it via splitting is
> doomed to fail.
I think the whole description is a bit misleading: we're not reclaiming
memory from mlocked, fully-mapped THPs even when they are underused,
because that could violate mlock() semantics, where we don't want a page
fault plus memory allocation on the next access.
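A similarly hypothetical model of this point: after a split, zero-filled subpages only give memory back if they are remapped to the shared zeropage, and per the suggested changelog and code comment in this reply, try_to_map_unused_to_zeropage() does not do that for an mlocked folio. `model_pages_freed_by_split()` is an invented helper; it models only the mlock bail-out and ignores every other condition the real code checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: how many base pages a split could give back by
 * remapping zero-filled subpages to the shared zeropage. Only the mlock
 * condition is modeled; other bail-out conditions are ignored.
 */
static size_t model_pages_freed_by_split(const bool *subpage_is_zero,
					 size_t nr_subpages, bool mlocked)
{
	size_t freed = 0;

	assert(subpage_is_zero != NULL || nr_subpages == 0);
	/* Models the mlock bail-out in try_to_map_unused_to_zeropage(). */
	if (mlocked)
		return 0;
	for (size_t i = 0; i < nr_subpages; i++)
		if (subpage_is_zero[i])
			freed++;
	return freed;
}
```

For an mlocked folio the result is 0 no matter how many subpages are zero-filled, so the split only costs the PMD mapping and gains nothing.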
So something like the following might be clearer.
"When we stumble over a fully-mapped mlocked THP in the deferred shrinker,
it does not make sense to try to detect whether it is underused, because
try_to_map_unused_to_zeropage(), called while splitting the folio, will
not actually replace any zeroed pages with the shared zeropage.

Splitting the folio in that case does not make any sense, so let's not
even scan to check whether the folio is underused.
"
If I run my reproducer from [1] and mlock() the pages just after
allocating them, then I essentially get
AnonHugePages: 1048576 kB
converted to
Anonymous: 1048580 kB
Which makes sense (no memory optimized out) as discussed above.
[1] https://lkml.kernel.org/r/20250905141137.3529867-1-david@redhat.com
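As a rough sanity check of the smaps numbers above, the sketch below converts them into page counts. It assumes x86-64 defaults of 4 KiB base pages and 2 MiB PMD-sized THPs; the thread does not state the platform, and `thp_count()` / `base_pages()` are hypothetical helpers.

```c
#include <assert.h>

/* Assumed page geometry (x86-64 defaults); not stated in the thread. */
#define MODEL_BASE_PAGE_KB 4UL
#define MODEL_THP_KB 2048UL

/* Number of PMD-sized THPs behind a given AnonHugePages value (kB). */
static unsigned long thp_count(unsigned long anon_huge_kb)
{
	assert(anon_huge_kb % MODEL_THP_KB == 0);
	return anon_huge_kb / MODEL_THP_KB;
}

/* Number of base pages behind a given Anonymous value (kB). */
static unsigned long base_pages(unsigned long anon_kb)
{
	assert(anon_kb % MODEL_BASE_PAGE_KB == 0);
	return anon_kb / MODEL_BASE_PAGE_KB;
}
```

Under these assumptions, 1048576 kB of AnonHugePages is 512 THPs, i.e. 512 * 512 = 262144 base pages once split, and the post-split Anonymous value of 1048580 kB corresponds to 262145 base pages, so nothing was freed by the split.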
>
> So, let's add an early folio_test_mlocked() check to skip this case.
>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
> mm/huge_memory.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 77f0c3417973..d2e84015d6b4 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4183,6 +4183,9 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>  		bool underused = false;
>
>  		if (!folio_test_partially_mapped(folio)) {
> +			/* An mlocked folio is not a candidate for the shrinker. */
/*
* See try_to_map_unused_to_zeropage(): we cannot optimize zero-filled
* pages after splitting an mlocked folio.
*/
> +			if (folio_test_mlocked(folio))
> +				goto next;
>  			underused = thp_underused(folio);
>  			if (!underused)
>  				goto next;
--
Cheers
David / dhildenb
* Re: [PATCH 1/1] mm: avoid processing mlocked THPs in deferred split shrinker
2025-09-08 7:38 ` David Hildenbrand
@ 2025-09-08 8:13 ` Lance Yang
2025-09-08 8:33 ` David Hildenbrand
From: Lance Yang @ 2025-09-08 8:13 UTC
To: David Hildenbrand
Cc: Liam.Howlett, baohua, baolin.wang, dev.jain, linux-kernel,
linux-mm, lorenzo.stoakes, npache, ryan.roberts, usamaarif642,
ziy, akpm
On 2025/9/8 15:38, David Hildenbrand wrote:
> On 08.09.25 06:07, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>
> Subject should likely be more specific:
>
> mm: skip mlocked THPs that are underused early in deferred_split_scan()
Right, that's a much better and more precise subject. Thanks!
>
>>
>> When a new THP is faulted in or collapsed, it is unconditionally added to
>> the deferred split queue. If this THP is subsequently mlocked, it remains
>> on the queue but is removed from the LRU and marked unevictable.
>>
>> During memory reclaim, deferred_split_scan() will still pick up this large
>> folio. Because it's not partially mapped, it will proceed to call
>> thp_underused() and then attempt to split_folio() to free all zero-filled
>> subpages.
>>
>> This is a pointless waste of CPU cycles. The folio is mlocked and
>> unevictable, so any attempt to reclaim memory from it via splitting is
>> doomed to fail.
>
> I think the whole description is a bit misleading: we're not reclaiming
> memory from mlocked, fully-mapped THPs even when they are underused,
> because that could violate mlock() semantics, where we don't want a page
> fault plus memory allocation on the next access.
>
> So something like the following might be clearer.
>
> "When we stumble over a fully-mapped mlocked THP in the deferred shrinker,
> it does not make sense to try to detect whether it is underused, because
> try_to_map_unused_to_zeropage(), called while splitting the folio, will
> not actually replace any zeroed pages with the shared zeropage.
>
> Splitting the folio in that case does not make any sense, so let's not
> even scan to check whether the folio is underused.
> "
Nice, that makes it much clearer. My understanding was indeed imprecise.
>
>
>
> If I run my reproducer from [1] and mlock() the pages just after
> allocating them, then I essentially get
>
> AnonHugePages: 1048576 kB
>
> converted to
>
> Anonymous: 1048580 kB
>
> Which makes sense (no memory optimized out) as discussed above.
Yes, my reproducer also shows exactly that. It's clear a lot of work is
done but no memory is actually optimized out ;)
>
>
> [1] https://lkml.kernel.org/r/20250905141137.3529867-1-david@redhat.com
>
>>
>> So, let's add an early folio_test_mlocked() check to skip this case.
>>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>> mm/huge_memory.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 77f0c3417973..d2e84015d6b4 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -4183,6 +4183,9 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>>  		bool underused = false;
>>
>>  		if (!folio_test_partially_mapped(folio)) {
>> +			/* An mlocked folio is not a candidate for the shrinker. */
>
> /*
> * See try_to_map_unused_to_zeropage(): we cannot optimize zero-filled
> * pages after splitting an mlocked folio.
> */
Got it. I'll update the changelog and this comment as suggested.
>
>> +			if (folio_test_mlocked(folio))
>> +				goto next;
>>  			underused = thp_underused(folio);
>>  			if (!underused)
>>  				goto next;
>
>
Cheers,
Lance
* Re: [PATCH 1/1] mm: avoid processing mlocked THPs in deferred split shrinker
2025-09-08 8:13 ` Lance Yang
@ 2025-09-08 8:33 ` David Hildenbrand
From: David Hildenbrand @ 2025-09-08 8:33 UTC
To: Lance Yang
Cc: Liam.Howlett, baohua, baolin.wang, dev.jain, linux-kernel,
linux-mm, lorenzo.stoakes, npache, ryan.roberts, usamaarif642,
ziy, akpm
>>
>> If I run my reproducer from [1] and mlock() the pages just after
>> allocating them, then I essentially get
>>
>> AnonHugePages: 1048576 kB
>>
>> converted to
>>
>> Anonymous: 1048580 kB
>>
>> Which makes sense (no memory optimized out) as discussed above.
>
> Yes, my reproducer also shows exactly that. It's clear a lot of work is
> done but no memory is actually optimized out ;)
I'm not really concerned about the scanning overhead. The real harm is
splitting a THP without any benefit.
--
Cheers
David / dhildenb