linux-mm.kvack.org archive mirror
* [PATCH] mm: migrate: requeue destination folio on deferred split queue
@ 2026-03-06 13:35 Usama Arif
  2026-03-06 13:49 ` David Hildenbrand (Arm)
  2026-03-06 13:51 ` David Hildenbrand (Arm)
  0 siblings, 2 replies; 10+ messages in thread
From: Usama Arif @ 2026-03-06 13:35 UTC
  To: Andrew Morton, npache, david, ziy, linux-mm
  Cc: matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
	gourry, ying.huang, apopple, linux-kernel, kernel-team,
	Usama Arif

During folio migration, __folio_migrate_mapping() removes the source
folio from the deferred split queue, but the destination folio is never
re-queued.  This causes underutilized THPs to escape the shrinker after
NUMA migration, since they silently drop off the deferred split list.

Fix this by calling deferred_split_folio() on the destination folio
after a successful migration, for large rmappable folios.

Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Fixes: dafff3f4c850 ("mm: split underused THPs")
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
 mm/migrate.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index ece77ccb2ec0..98d0a594f7b7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	if (old_page_state & PAGE_WAS_MAPPED)
 		remove_migration_ptes(src, dst, 0);
 
+	/*
+	 * Requeue the destination folio on the deferred split queue if
+	 * the source was a large folio that was on the queue. Without
+	 * this, NUMA migration causes underutilized THPs to escape
+	 * the shrinker since the source is unqueued in
+	 * __folio_migrate_mapping() and the destination is never
+	 * re-queued.
+	 */
+	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
+		deferred_split_folio(dst, false);
+
 out_unlock_both:
 	folio_unlock(dst);
 	folio_set_owner_migrate_reason(dst, reason);
-- 
2.47.3
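
A minimal userspace sketch of the behaviour described in the commit message, for illustration only. It is not kernel code: struct toy_folio and every toy_* function are hypothetical stand-ins, and the real deferred split queue is a locked list rather than a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct folio: only the state discussed in this thread. */
struct toy_folio {
	bool large_rmappable;	/* models folio_test_large_rmappable() */
	bool on_deferred_list;	/* models a non-empty _deferred_list entry */
};

/* Migration as reported: the source is unqueued, the destination is
 * never queued, so queue membership is lost. */
static void toy_migrate_buggy(struct toy_folio *dst, struct toy_folio *src)
{
	assert(dst && src);
	dst->large_rmappable = src->large_rmappable;
	dst->on_deferred_list = false;
	src->on_deferred_list = false;
}

/* Number of folios a shrinker walking the queue would still visit. */
static size_t toy_shrinker_visible(const struct toy_folio *folios, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (folios[i].on_deferred_list)
			count++;
	return count;
}
```

In this model, a queued folio that is migrated leaves toy_shrinker_visible() at zero, which is the "silently drop off the deferred split list" case above.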




* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
  2026-03-06 13:35 [PATCH] mm: migrate: requeue destination folio on deferred split queue Usama Arif
@ 2026-03-06 13:49 ` David Hildenbrand (Arm)
  2026-03-06 14:12   ` Usama Arif
  2026-03-06 13:51 ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 10+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-06 13:49 UTC
  To: Usama Arif, Andrew Morton, npache, ziy, linux-mm
  Cc: matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
	gourry, ying.huang, apopple, linux-kernel, kernel-team

On 3/6/26 14:35, Usama Arif wrote:
> During folio migration, __folio_migrate_mapping() removes the source
> folio from the deferred split queue, but the destination folio is never
> re-queued.  This causes underutilized THPs to escape the shrinker after
> NUMA migration, since they silently drop off the deferred split list.
> 
> Fix this by calling deferred_split_folio() on the destination folio
> after a successful migration, for large rmappable folios.
> 
> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> Fixes: dafff3f4c850 ("mm: split underused THPs")
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> ---
>  mm/migrate.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ece77ccb2ec0..98d0a594f7b7 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  	if (old_page_state & PAGE_WAS_MAPPED)
>  		remove_migration_ptes(src, dst, 0);
>  
> +	/*
> +	 * Requeue the destination folio on the deferred split queue if
> +	 * the source was a large folio that was on the queue. Without
> +	 * this, NUMA migration causes underutilized THPs to escape
> +	 * the shrinker since the source is unqueued in
> +	 * __folio_migrate_mapping() and the destination is never
> +	 * re-queued.
> +	 */
> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
> +		deferred_split_folio(dst, false);

Doesn't that mean that you will readd any large folios, even if already
previously taken off the list after scanning?

So I am not sure if your "if the source was a large folio that was on
the queue." comment is accurate?
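
A minimal sketch of the concern, assuming a toy model in which queue membership is a single boolean. The toy_* names are hypothetical and this is not the kernel implementation; it only shows that the check in the posted hunk looks at the destination's size, not at whether the source was queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct folio. */
struct toy_folio {
	bool large_rmappable;	/* models folio_test_large_rmappable() */
	bool on_deferred_list;	/* models queue membership */
};

/* Models the posted hunk: every large rmappable destination is queued,
 * regardless of whether the source was on the queue before the move. */
static void toy_migrate_v1(struct toy_folio *dst, struct toy_folio *src)
{
	assert(dst && src);
	dst->large_rmappable = src->large_rmappable;
	src->on_deferred_list = false;
	dst->on_deferred_list = dst->large_rmappable;
}
```

With a source that is large but was already taken off the queue, the destination still ends up queued in this model.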

-- 
Cheers,

David



* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
  2026-03-06 13:35 [PATCH] mm: migrate: requeue destination folio on deferred split queue Usama Arif
  2026-03-06 13:49 ` David Hildenbrand (Arm)
@ 2026-03-06 13:51 ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 10+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-06 13:51 UTC
  To: Usama Arif, Andrew Morton, npache, ziy, linux-mm
  Cc: matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
	gourry, ying.huang, apopple, linux-kernel, kernel-team

On 3/6/26 14:35, Usama Arif wrote:
> During folio migration, __folio_migrate_mapping() removes the source
> folio from the deferred split queue, but the destination folio is never
> re-queued.  This causes underutilized THPs to escape the shrinker after
> NUMA migration, since they silently drop off the deferred split list.
> 
> Fix this by calling deferred_split_folio() on the destination folio
> after a successful migration, for large rmappable folios.
> 
> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> Fixes: dafff3f4c850 ("mm: split underused THPs")
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> ---
>  mm/migrate.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ece77ccb2ec0..98d0a594f7b7 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  	if (old_page_state & PAGE_WAS_MAPPED)
>  		remove_migration_ptes(src, dst, 0);
>  
> +	/*
> +	 * Requeue the destination folio on the deferred split queue if
> +	 * the source was a large folio that was on the queue. Without
> +	 * this, NUMA migration causes underutilized THPs to escape
> +	 * the shrinker since the source is unqueued in
> +	 * __folio_migrate_mapping() and the destination is never
> +	 * re-queued.
> +	 */
> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
> +		deferred_split_folio(dst, false);

Also, should you be checking for anon and non-device folios?

-- 
Cheers,

David



* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
  2026-03-06 13:49 ` David Hildenbrand (Arm)
@ 2026-03-06 14:12   ` Usama Arif
  2026-03-06 14:46     ` Zi Yan
  2026-03-06 16:08     ` Matthew Wilcox
  0 siblings, 2 replies; 10+ messages in thread
From: Usama Arif @ 2026-03-06 14:12 UTC
  To: David Hildenbrand (Arm), Andrew Morton, npache, ziy, linux-mm
  Cc: matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
	gourry, ying.huang, apopple, linux-kernel, kernel-team



On 06/03/2026 13:49, David Hildenbrand (Arm) wrote:
> On 3/6/26 14:35, Usama Arif wrote:
>> During folio migration, __folio_migrate_mapping() removes the source
>> folio from the deferred split queue, but the destination folio is never
>> re-queued.  This causes underutilized THPs to escape the shrinker after
>> NUMA migration, since they silently drop off the deferred split list.
>>
>> Fix this by calling deferred_split_folio() on the destination folio
>> after a successful migration, for large rmappable folios.
>>
>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>> ---
>>  mm/migrate.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index ece77ccb2ec0..98d0a594f7b7 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>  	if (old_page_state & PAGE_WAS_MAPPED)
>>  		remove_migration_ptes(src, dst, 0);
>>  
>> +	/*
>> +	 * Requeue the destination folio on the deferred split queue if
>> +	 * the source was a large folio that was on the queue. Without
>> +	 * this, NUMA migration causes underutilized THPs to escape
>> +	 * the shrinker since the source is unqueued in
>> +	 * __folio_migrate_mapping() and the destination is never
>> +	 * re-queued.
>> +	 */
>> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
>> +		deferred_split_folio(dst, false);
> 
> Doesn't that mean that you will readd any large folios, even if already
> previously taken off the list after scanning?
> 
> So I am not sure if your "if the source was a large folio that was on
> the queue." comment is accurate?
> 

Yes, you are right. How about something like below? We also won't need to check
for anon and non-device folios with this, as we only set the flag if the source
was already on the deferred_split list.


diff --git a/mm/migrate.c b/mm/migrate.c
index ece77ccb2ec0..9e0780d380e4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1137,7 +1137,9 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
 enum {
        PAGE_WAS_MAPPED = BIT(0),
        PAGE_WAS_MLOCKED = BIT(1),
-       PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED,
+       PAGE_WAS_ON_DEFERRED_SPLIT = BIT(2),
+       PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED |
+                         PAGE_WAS_ON_DEFERRED_SPLIT,
 };
 
 static void __migrate_folio_record(struct folio *dst,
@@ -1373,6 +1375,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
                goto out_unlock_both;
        }
 
+       /*
+        * Record whether the source folio is on the deferred split queue
+        * before move_to_new_folio(), which unqueues it via
+        * __folio_migrate_mapping().
+        */
+       if (folio_test_large(src) && folio_test_large_rmappable(src) &&
+           !data_race(list_empty(&src->_deferred_list)))
+               old_page_state |= PAGE_WAS_ON_DEFERRED_SPLIT;
+
        rc = move_to_new_folio(dst, src, mode);
        if (rc)
                goto out;
@@ -1393,6 +1404,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
        if (old_page_state & PAGE_WAS_MAPPED)
                remove_migration_ptes(src, dst, 0);
 
+       /*
+        * Requeue the destination folio on the deferred split queue if
+        * the source was on the queue.  The source is unqueued in
+        * __folio_migrate_mapping(), so we record and check the state
+        * from before move_to_new_folio().
+        */
+       if (old_page_state & PAGE_WAS_ON_DEFERRED_SPLIT)
+               deferred_split_folio(dst, false);
+
 out_unlock_both:
        folio_unlock(dst);
        folio_set_owner_migrate_reason(dst, reason);



* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
  2026-03-06 14:12   ` Usama Arif
@ 2026-03-06 14:46     ` Zi Yan
  2026-03-06 16:15       ` Usama Arif
  2026-03-06 16:08     ` Matthew Wilcox
  1 sibling, 1 reply; 10+ messages in thread
From: Zi Yan @ 2026-03-06 14:46 UTC
  To: Usama Arif
  Cc: David Hildenbrand (Arm),
	Andrew Morton, npache, linux-mm, matthew.brost, joshua.hahnjy,
	hannes, rakie.kim, byungchul, gourry, ying.huang, apopple,
	linux-kernel, kernel-team

On 6 Mar 2026, at 9:12, Usama Arif wrote:

> On 06/03/2026 13:49, David Hildenbrand (Arm) wrote:
>> On 3/6/26 14:35, Usama Arif wrote:
>>> During folio migration, __folio_migrate_mapping() removes the source
>>> folio from the deferred split queue, but the destination folio is never
>>> re-queued.  This causes underutilized THPs to escape the shrinker after
>>> NUMA migration, since they silently drop off the deferred split list.
>>>
>>> Fix this by calling deferred_split_folio() on the destination folio
>>> after a successful migration, for large rmappable folios.
>>>
>>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>> ---
>>>  mm/migrate.c | 11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index ece77ccb2ec0..98d0a594f7b7 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>>  	if (old_page_state & PAGE_WAS_MAPPED)
>>>  		remove_migration_ptes(src, dst, 0);
>>>
>>> +	/*
>>> +	 * Requeue the destination folio on the deferred split queue if
>>> +	 * the source was a large folio that was on the queue. Without
>>> +	 * this, NUMA migration causes underutilized THPs to escape
>>> +	 * the shrinker since the source is unqueued in
>>> +	 * __folio_migrate_mapping() and the destination is never
>>> +	 * re-queued.
>>> +	 */
>>> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
>>> +		deferred_split_folio(dst, false);
>>
>> Doesn't that mean that you will readd any large folios, even if already
>> previously taken off the list after scanning?
>>
>> So I am not sure if your "if the source was a large folio that was on
>> the queue." comment is accurate?
>>
>
> Yes you are right. How about something like below? We also won't need to check
> for anon and non-device folios with this as we only set the the flag if it was
> already on deferred_split list.

BTW, migrate_pages() tries to split partially mapped folios before migration[1],
so what remains in the deferred_list would be:

1. partially mapped but with a pin,
2. fully mapped but potentially underused.

I wonder if you want to do an underused scan before migration and try to split
underused THPs. Or to avoid this additional scan, find a way of detecting
zero pages at page copy time and split it after migration.
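
A minimal sketch of what such an underused scan could look like, assuming "underused" means that more than some threshold of the subpages are entirely zero-filled. TOY_PAGE_SIZE, toy_folio_underused() and the max_zero parameter are assumptions made for this example, not the kernel's interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u	/* assumed subpage size for this sketch */

/* Returns true if every byte of one subpage is zero. */
static bool toy_page_is_zero(const unsigned char *page)
{
	size_t i;

	for (i = 0; i < TOY_PAGE_SIZE; i++)
		if (page[i])
			return false;
	return true;
}

/* Returns true once more than max_zero of the nr_pages subpages in
 * data are found to be zero-filled; stops scanning at that point. */
static bool toy_folio_underused(const unsigned char *data, size_t nr_pages,
				size_t max_zero)
{
	size_t i, zero = 0;

	assert(data != NULL);
	for (i = 0; i < nr_pages; i++)
		if (toy_page_is_zero(data + i * TOY_PAGE_SIZE) &&
		    ++zero > max_zero)
			return true;
	return false;
}
```

The scan reads every byte of each subpage in the worst case, which is the cost being weighed against doing it at migration time.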

Anyway, it seems that all large folios are in this deferred_list. Maybe, like
David suggested in his LSFMM proposal, we should scan large folios on LRU lists
at reclaim time instead, since there is not much difference between deferred_list
and LRU lists right now.


[1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/migrate.c#L1840

>
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ece77ccb2ec0..9e0780d380e4 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1137,7 +1137,9 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
>  enum {
>         PAGE_WAS_MAPPED = BIT(0),
>         PAGE_WAS_MLOCKED = BIT(1),
> -       PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED,
> +       PAGE_WAS_ON_DEFERRED_SPLIT = BIT(2),
> +       PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED |
> +                         PAGE_WAS_ON_DEFERRED_SPLIT,
>  };
>
>  static void __migrate_folio_record(struct folio *dst,
> @@ -1373,6 +1375,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>                 goto out_unlock_both;
>         }
>
> +       /*
> +        * Record whether the source folio is on the deferred split queue
> +        * before move_to_new_folio(), which unqueues it via
> +        * __folio_migrate_mapping().
> +        */
> +       if (folio_test_large(src) && folio_test_large_rmappable(src) &&
> +           !data_race(list_empty(&src->_deferred_list)))
> +               old_page_state |= PAGE_WAS_ON_DEFERRED_SPLIT;
> +
>         rc = move_to_new_folio(dst, src, mode);
>         if (rc)
>                 goto out;
> @@ -1393,6 +1404,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>         if (old_page_state & PAGE_WAS_MAPPED)
>                 remove_migration_ptes(src, dst, 0);
>
> +       /*
> +        * Requeue the destination folio on the deferred split queue if
> +        * the source was on the queue.  The source is unqueued in
> +        * __folio_migrate_mapping(), so we record and check the state
> +        * from before move_to_new_folio().
> +        */
> +       if (old_page_state & PAGE_WAS_ON_DEFERRED_SPLIT)
> +               deferred_split_folio(dst, false);
> +
>  out_unlock_both:
>         folio_unlock(dst);
>         folio_set_owner_migrate_reason(dst, reason);


Best Regards,
Yan, Zi



* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
  2026-03-06 14:12   ` Usama Arif
  2026-03-06 14:46     ` Zi Yan
@ 2026-03-06 16:08     ` Matthew Wilcox
  2026-03-06 16:19       ` Usama Arif
  1 sibling, 1 reply; 10+ messages in thread
From: Matthew Wilcox @ 2026-03-06 16:08 UTC
  To: Usama Arif
  Cc: David Hildenbrand (Arm),
	Andrew Morton, npache, ziy, linux-mm, matthew.brost,
	joshua.hahnjy, hannes, rakie.kim, byungchul, gourry, ying.huang,
	apopple, linux-kernel, kernel-team

On Fri, Mar 06, 2026 at 05:12:38PM +0300, Usama Arif wrote:
> +       /*
> +        * Record whether the source folio is on the deferred split queue
> +        * before move_to_new_folio(), which unqueues it via
> +        * __folio_migrate_mapping().
> +        */
> +       if (folio_test_large(src) && folio_test_large_rmappable(src) &&
> +           !data_race(list_empty(&src->_deferred_list)))

Why do you need data_race() here?  list_empty() contains a READ_ONCE()
so shouldn't be necessary?
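
A minimal userspace sketch of the point about list_empty(), assuming READ_ONCE() can be approximated by a single volatile load. The toy_* names are hypothetical. To my understanding data_race() is an annotation for the race detector rather than a different kind of load, but treat that as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy version of a circular doubly linked list node. */
struct toy_list_head {
	struct toy_list_head *next, *prev;
};

static void toy_list_init(struct toy_list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert item right after head. */
static void toy_list_add(struct toy_list_head *item,
			 struct toy_list_head *head)
{
	assert(item && head);
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* Lockless emptiness check: one volatile load of ->next, standing in
 * for READ_ONCE(head->next) == head. */
static bool toy_list_empty(const struct toy_list_head *head)
{
	struct toy_list_head *next =
		*(struct toy_list_head *const volatile *)&head->next;

	return next == head;
}
```

A self-linked entry reports empty and a queued entry does not, which is the only property the quoted check relies on.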

> +               old_page_state |= PAGE_WAS_ON_DEFERRED_SPLIT;

You've done a great job of the naming.  So much so that the comment
seems entirely unnecessary?

> +       /*
> +        * Requeue the destination folio on the deferred split queue if
> +        * the source was on the queue.  The source is unqueued in
> +        * __folio_migrate_mapping(), so we record and check the state
> +        * from before move_to_new_folio().
> +        */
> +       if (old_page_state & PAGE_WAS_ON_DEFERRED_SPLIT)
> +               deferred_split_folio(dst, false);

Again, I'm not sure the comment says anything that the code doesn't?



* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
  2026-03-06 14:46     ` Zi Yan
@ 2026-03-06 16:15       ` Usama Arif
  2026-03-06 16:23         ` David Hildenbrand (Arm)
  2026-03-06 16:26         ` Zi Yan
  0 siblings, 2 replies; 10+ messages in thread
From: Usama Arif @ 2026-03-06 16:15 UTC
  To: Zi Yan
  Cc: David Hildenbrand (Arm),
	Andrew Morton, npache, linux-mm, matthew.brost, joshua.hahnjy,
	hannes, rakie.kim, byungchul, gourry, ying.huang, apopple,
	linux-kernel, kernel-team



On 06/03/2026 14:46, Zi Yan wrote:
> On 6 Mar 2026, at 9:12, Usama Arif wrote:
> 
>> On 06/03/2026 13:49, David Hildenbrand (Arm) wrote:
>>> On 3/6/26 14:35, Usama Arif wrote:
>>>> During folio migration, __folio_migrate_mapping() removes the source
>>>> folio from the deferred split queue, but the destination folio is never
>>>> re-queued.  This causes underutilized THPs to escape the shrinker after
>>>> NUMA migration, since they silently drop off the deferred split list.
>>>>
>>>> Fix this by calling deferred_split_folio() on the destination folio
>>>> after a successful migration, for large rmappable folios.
>>>>
>>>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>>>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>> ---
>>>>  mm/migrate.c | 11 +++++++++++
>>>>  1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index ece77ccb2ec0..98d0a594f7b7 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>>>  	if (old_page_state & PAGE_WAS_MAPPED)
>>>>  		remove_migration_ptes(src, dst, 0);
>>>>
>>>> +	/*
>>>> +	 * Requeue the destination folio on the deferred split queue if
>>>> +	 * the source was a large folio that was on the queue. Without
>>>> +	 * this, NUMA migration causes underutilized THPs to escape
>>>> +	 * the shrinker since the source is unqueued in
>>>> +	 * __folio_migrate_mapping() and the destination is never
>>>> +	 * re-queued.
>>>> +	 */
>>>> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
>>>> +		deferred_split_folio(dst, false);
>>>
>>> Doesn't that mean that you will readd any large folios, even if already
>>> previously taken off the list after scanning?
>>>
>>> So I am not sure if your "if the source was a large folio that was on
>>> the queue." comment is accurate?
>>>
>>
>> Yes you are right. How about something like below? We also won't need to check
>> for anon and non-device folios with this as we only set the the flag if it was
>> already on deferred_split list.
> 
> BTW, migrate_pages() tries to split partially mapped folios before migration[1],
> so what remains in the deferred_list would be:
> 
> 1. partially mapped but with a pin,
> 2. fully mapped but potentially underused.
> 

Yes, that's right.

> I wonder if you want to do an underused scan before migration and try to split
> underused THPs.

Hmm, I think we should keep THPs as they are if there is no memory pressure
(proactive or otherwise). Scanning THPs for zeros has a cost, and we would also
lose the benefit of THPs when we don't need the memory.

> Or to avoid this additional scan, find a way of detecting
> zero pages at page copy time and split it after migration.
> 

Yeah, but I think we would lose the benefits of THPs after migration even when
we don't need additional memory?

> Anyway, it seems that all large folios are in this deferred_list. Maybe, like
> David suggested in his LSFMM proposal, we should scan large folios on LRU lists
> at reclaim time instead, since there is not much difference between deferred_list
> and LRU lists right now.
> 

Yeah the THP shrinker is a very basic implementation and there are a lot of 

> 
> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/migrate.c#L1840
> 

Also, Johannes pointed out that it's not great storing this information in page
flags; we can just keep it as a local variable. This is what the patch would look like:


diff --git a/mm/migrate.c b/mm/migrate.c
index ece77ccb2ec0..48a972f158ab 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1360,6 +1360,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
        int rc;
        int old_page_state = 0;
        struct anon_vma *anon_vma = NULL;
+       bool src_deferred_split = false;
        struct list_head *prev;
 
        __migrate_folio_extract(dst, &old_page_state, &anon_vma);
@@ -1373,6 +1374,10 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
                goto out_unlock_both;
        }
 
+       if (folio_test_large(src) && folio_test_large_rmappable(src) &&
+           !data_race(list_empty(&src->_deferred_list)))
+               src_deferred_split = true;
+
        rc = move_to_new_folio(dst, src, mode);
        if (rc)
                goto out;
@@ -1393,6 +1398,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
        if (old_page_state & PAGE_WAS_MAPPED)
                remove_migration_ptes(src, dst, 0);
 
+       /*
+        * Requeue the destination folio on the deferred split queue if
+        * the source was on the queue.  The source is unqueued in
+        * __folio_migrate_mapping(), so we recorded the state from
+        * before move_to_new_folio().
+        */
+       if (src_deferred_split)
+               deferred_split_folio(dst, false);
+
 out_unlock_both:
        folio_unlock(dst);
        folio_set_owner_migrate_reason(dst, reason);
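
A minimal userspace sketch of the record-then-requeue pattern in the diff above, assuming a toy model where queue membership is a boolean. struct toy_folio, toy_migrate_fixed() and the move_rc parameter are hypothetical and only mirror the ordering of the steps, not the kernel's locking or error handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct folio. */
struct toy_folio {
	bool large_rmappable;	/* models folio_test_large_rmappable() */
	bool on_deferred_list;	/* models queue membership */
};

/* move_rc simulates the result of the move step: 0 means success.
 * Returns move_rc unchanged. */
static int toy_migrate_fixed(struct toy_folio *dst, struct toy_folio *src,
			     int move_rc)
{
	bool src_deferred_split;

	assert(dst && src);

	/* Sample membership before the move step clears it. */
	src_deferred_split = src->large_rmappable && src->on_deferred_list;

	if (move_rc)
		return move_rc;	/* this model leaves both folios untouched */

	dst->large_rmappable = src->large_rmappable;
	src->on_deferred_list = false;		/* the move unqueues src */
	dst->on_deferred_list = src_deferred_split;	/* requeue if needed */
	return 0;
}
```

Because the decision is taken from the source's state before the move, a large folio that was not queued stays unqueued after migration in this model.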
 



* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
  2026-03-06 16:08     ` Matthew Wilcox
@ 2026-03-06 16:19       ` Usama Arif
  0 siblings, 0 replies; 10+ messages in thread
From: Usama Arif @ 2026-03-06 16:19 UTC
  To: Matthew Wilcox
  Cc: David Hildenbrand (Arm),
	Andrew Morton, npache, ziy, linux-mm, matthew.brost,
	joshua.hahnjy, hannes, rakie.kim, byungchul, gourry, ying.huang,
	apopple, linux-kernel, kernel-team



On 06/03/2026 16:08, Matthew Wilcox wrote:
> On Fri, Mar 06, 2026 at 05:12:38PM +0300, Usama Arif wrote:
>> +       /*
>> +        * Record whether the source folio is on the deferred split queue
>> +        * before move_to_new_folio(), which unqueues it via
>> +        * __folio_migrate_mapping().
>> +        */
>> +       if (folio_test_large(src) && folio_test_large_rmappable(src) &&
>> +           !data_race(list_empty(&src->_deferred_list)))
> 
> Why do you need data_race() here?  list_empty() contains a READ_ONCE()
> so shouldn't be necessary?

Ah, mainly because we don't acquire split_queue_lock before accessing it,
similar to what we do in folio_unqueue_deferred_split().

> 
>> +               old_page_state |= PAGE_WAS_ON_DEFERRED_SPLIT;
> 
> You've done a great job of the naming.  So much so that the comment
> seems entirely unnecessary?
> 
>> +       /*
>> +        * Requeue the destination folio on the deferred split queue if
>> +        * the source was on the queue.  The source is unqueued in
>> +        * __folio_migrate_mapping(), so we record and check the state
>> +        * from before move_to_new_folio().
>> +        */
>> +       if (old_page_state & PAGE_WAS_ON_DEFERRED_SPLIT)
>> +               deferred_split_folio(dst, false);
> 
> Again, I'm not sure the comment says anything that the code doesn't?

Yeah, there is a much simpler version in reply to Zi's review in [1].
I found the whole migrate_folio_move() function quite verbose already, so I
was keeping up with the theme of that function lol. I will see if I can
cut down on the comment in [1] as well. Thanks!

[1] https://lore.kernel.org/all/28e48b47-f215-4e4a-b55a-01dbf293ff35@linux.dev/




* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
  2026-03-06 16:15       ` Usama Arif
@ 2026-03-06 16:23         ` David Hildenbrand (Arm)
  2026-03-06 16:26         ` Zi Yan
  1 sibling, 0 replies; 10+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-06 16:23 UTC
  To: Usama Arif, Zi Yan
  Cc: Andrew Morton, npache, linux-mm, matthew.brost, joshua.hahnjy,
	hannes, rakie.kim, byungchul, gourry, ying.huang, apopple,
	linux-kernel, kernel-team

On 3/6/26 17:15, Usama Arif wrote:
> 
> 
> On 06/03/2026 14:46, Zi Yan wrote:
>> On 6 Mar 2026, at 9:12, Usama Arif wrote:
>>
>>>
>>> Yes you are right. How about something like below? We also won't need to check
>>> for anon and non-device folios with this as we only set the the flag if it was
>>> already on deferred_split list.
>>
>> BTW, migrate_pages() tries to split partially mapped folios before migration[1],
>> so what remains in the deferred_list would be:
>>
>> 1. partially mapped but with a pin,
>> 2. fully mapped but potentially underused.
>>
> 
> Yes, thats right.
> 
>> I wonder if you want to do an underused scan before migration and try to split
>> underused THPs.
> 
> hmm, I think we should keep THPs as is if there is no memory pressure (proactive
> or otherwise). Scanning THPs for zeros has a cost and we would also lose the benefit
> of THPs when we dont need memory.
> 
>> Or to avoid this additional scan, find a way of detecting
>> zero pages at page copy time and split it after migration.
>>
> 
> Yeah but I think we lose the benefits of THPs after migration when we dont need
> additional memory?
> 
>> Anyway, it seems that all large folios are in this deferred_list. Maybe, like
>> David suggested in his LSFMM proposal, we should scan large folios on LRU lists
>> at reclaim time instead, since there is not much difference between deferred_list
>> and LRU lists right now.
>>
> 
> Yeah the THP shrinker is a very basic implementation and there are a lot of 
> 
>>
>> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/migrate.c#L1840
>>
> 
> Also Johannes pointed out its not great storing this information in page flags,
> we can just keep it as local variable. This is what the patch would look like:

Much cleaner.

-- 
Cheers,

David



* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
  2026-03-06 16:15       ` Usama Arif
  2026-03-06 16:23         ` David Hildenbrand (Arm)
@ 2026-03-06 16:26         ` Zi Yan
  1 sibling, 0 replies; 10+ messages in thread
From: Zi Yan @ 2026-03-06 16:26 UTC
  To: Usama Arif
  Cc: David Hildenbrand (Arm),
	Andrew Morton, npache, linux-mm, matthew.brost, joshua.hahnjy,
	hannes, rakie.kim, byungchul, gourry, ying.huang, apopple,
	linux-kernel, kernel-team

On 6 Mar 2026, at 11:15, Usama Arif wrote:

> On 06/03/2026 14:46, Zi Yan wrote:
>> On 6 Mar 2026, at 9:12, Usama Arif wrote:
>>
>>> On 06/03/2026 13:49, David Hildenbrand (Arm) wrote:
>>>> On 3/6/26 14:35, Usama Arif wrote:
>>>>> During folio migration, __folio_migrate_mapping() removes the source
>>>>> folio from the deferred split queue, but the destination folio is never
>>>>> re-queued.  This causes underutilized THPs to escape the shrinker after
>>>>> NUMA migration, since they silently drop off the deferred split list.
>>>>>
>>>>> Fix this by calling deferred_split_folio() on the destination folio
>>>>> after a successful migration, for large rmappable folios.
>>>>>
>>>>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>>>>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>>> ---
>>>>>  mm/migrate.c | 11 +++++++++++
>>>>>  1 file changed, 11 insertions(+)
>>>>>
>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>> index ece77ccb2ec0..98d0a594f7b7 100644
>>>>> --- a/mm/migrate.c
>>>>> +++ b/mm/migrate.c
>>>>> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>>>>  	if (old_page_state & PAGE_WAS_MAPPED)
>>>>>  		remove_migration_ptes(src, dst, 0);
>>>>>
>>>>> +	/*
>>>>> +	 * Requeue the destination folio on the deferred split queue if
>>>>> +	 * the source was a large folio that was on the queue. Without
>>>>> +	 * this, NUMA migration causes underutilized THPs to escape
>>>>> +	 * the shrinker since the source is unqueued in
>>>>> +	 * __folio_migrate_mapping() and the destination is never
>>>>> +	 * re-queued.
>>>>> +	 */
>>>>> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
>>>>> +		deferred_split_folio(dst, false);
>>>>
>>>> Doesn't that mean that you will readd any large folios, even if already
>>>> previously taken off the list after scanning?
>>>>
>>>> So I am not sure if your "if the source was a large folio that was on
>>>> the queue." comment is accurate?
>>>>
>>>
>>> Yes you are right. How about something like below? We also won't need to check
>>> for anon and non-device folios with this as we only set the the flag if it was
>>> already on deferred_split list.
>>
>> BTW, migrate_pages() tries to split partially mapped folios before migration[1],
>> so what remains in the deferred_list would be:
>>
>> 1. partially mapped but with a pin,
>> 2. fully mapped but potentially underused.
>>
>
> Yes, thats right.
>
>> I wonder if you want to do an underused scan before migration and try to split
>> underused THPs.
>
> hmm, I think we should keep THPs as is if there is no memory pressure (proactive
> or otherwise). Scanning THPs for zeros has a cost and we would also lose the benefit
> of THPs when we dont need memory.

Makes sense.

>
>> Or to avoid this additional scan, find a way of detecting
>> zero pages at page copy time and split it after migration.
>>
>
> Yeah but I think we lose the benefits of THPs after migration when we dont need
> additional memory?

Right.

>
>> Anyway, it seems that all large folios are in this deferred_list. Maybe, like
>> David suggested in his LSFMM proposal, we should scan large folios on LRU lists
>> at reclaim time instead, since there is not much difference between deferred_list
>> and LRU lists right now.
>>
>
> Yeah the THP shrinker is a very basic implementation and there are a lot of
>
>>
>> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/migrate.c#L1840
>>
>
> Also Johannes pointed out its not great storing this information in page flags,
> we can just keep it as local variable. This is what the patch would look like:
>
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ece77ccb2ec0..48a972f158ab 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1360,6 +1360,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>         int rc;
>         int old_page_state = 0;
>         struct anon_vma *anon_vma = NULL;
> +       bool src_deferred_split = false;
>         struct list_head *prev;
>
>         __migrate_folio_extract(dst, &old_page_state, &anon_vma);
> @@ -1373,6 +1374,10 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>                 goto out_unlock_both;
>         }
>
> +       if (folio_test_large(src) && folio_test_large_rmappable(src) &&
> +           !data_race(list_empty(&src->_deferred_list)))
> +               src_deferred_split = true;
> +
>         rc = move_to_new_folio(dst, src, mode);
>         if (rc)
>                 goto out;
> @@ -1393,6 +1398,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>         if (old_page_state & PAGE_WAS_MAPPED)
>                 remove_migration_ptes(src, dst, 0);
>
> +       /*
> +        * Requeue the destination folio on the deferred split queue if
> +        * the source was on the queue.  The source is unqueued in
> +        * __folio_migrate_mapping(), so we recorded the state from
> +        * before move_to_new_folio().
> +        */
> +       if (src_deferred_split)
> +               deferred_split_folio(dst, false);
> +
>  out_unlock_both:
>         folio_unlock(dst);
>         folio_set_owner_migrate_reason(dst, reason);

LGTM. Thanks for improving it.

Best Regards,
Yan, Zi



Thread overview: 10+ messages
2026-03-06 13:35 [PATCH] mm: migrate: requeue destination folio on deferred split queue Usama Arif
2026-03-06 13:49 ` David Hildenbrand (Arm)
2026-03-06 14:12   ` Usama Arif
2026-03-06 14:46     ` Zi Yan
2026-03-06 16:15       ` Usama Arif
2026-03-06 16:23         ` David Hildenbrand (Arm)
2026-03-06 16:26         ` Zi Yan
2026-03-06 16:08     ` Matthew Wilcox
2026-03-06 16:19       ` Usama Arif
2026-03-06 13:51 ` David Hildenbrand (Arm)
