* [PATCH] mm: migrate: requeue destination folio on deferred split queue
From: Usama Arif @ 2026-03-06 13:35 UTC
To: Andrew Morton, npache, david, ziy, linux-mm
Cc: matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
gourry, ying.huang, apopple, linux-kernel, kernel-team,
Usama Arif

During folio migration, __folio_migrate_mapping() removes the source
folio from the deferred split queue, but the destination folio is never
re-queued. This causes underutilized THPs to escape the shrinker after
NUMA migration, since they silently drop off the deferred split list.

Fix this by calling deferred_split_folio() on the destination folio
after a successful migration, for large rmappable folios.

Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Fixes: dafff3f4c850 ("mm: split underused THPs")
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
 mm/migrate.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index ece77ccb2ec0..98d0a594f7b7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	if (old_page_state & PAGE_WAS_MAPPED)
 		remove_migration_ptes(src, dst, 0);
 
+	/*
+	 * Requeue the destination folio on the deferred split queue if
+	 * the source was a large folio that was on the queue. Without
+	 * this, NUMA migration causes underutilized THPs to escape
+	 * the shrinker since the source is unqueued in
+	 * __folio_migrate_mapping() and the destination is never
+	 * re-queued.
+	 */
+	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
+		deferred_split_folio(dst, false);
+
 out_unlock_both:
 	folio_unlock(dst);
 	folio_set_owner_migrate_reason(dst, reason);
--
2.47.3
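
The problem described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct toy_folio` and every `toy_*` function are hypothetical stand-ins invented for illustration, not kernel APIs, and queue membership is reduced to a single boolean.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a folio: only tracks queue membership. */
struct toy_folio {
	bool large_rmappable;	/* models folio_test_large_rmappable() */
	bool on_deferred_list;	/* models being on the deferred split queue */
};

/* Models the unqueue step attributed to __folio_migrate_mapping(). */
static void toy_unqueue(struct toy_folio *folio)
{
	folio->on_deferred_list = false;
}

/*
 * Models migration as described in the report: the source is unqueued
 * and nothing puts the destination on the queue.
 */
static void toy_migrate_no_requeue(struct toy_folio *dst, struct toy_folio *src)
{
	dst->large_rmappable = src->large_rmappable;
	dst->on_deferred_list = false;	/* a fresh destination starts unqueued */
	toy_unqueue(src);
}

/* A shrinker that only walks the queue can see only queued folios. */
static bool toy_shrinker_can_see(const struct toy_folio *folio)
{
	return folio->on_deferred_list;
}
```

In this model, a source that was visible to the shrinker before `toy_migrate_no_requeue()` leaves behind a destination that the shrinker can no longer find, which is the behaviour the patch sets out to change.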

* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
From: David Hildenbrand (Arm) @ 2026-03-06 13:49 UTC
To: Usama Arif, Andrew Morton, npache, ziy, linux-mm
Cc: matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
gourry, ying.huang, apopple, linux-kernel, kernel-team

On 3/6/26 14:35, Usama Arif wrote:
> During folio migration, __folio_migrate_mapping() removes the source
> folio from the deferred split queue, but the destination folio is never
> re-queued. This causes underutilized THPs to escape the shrinker after
> NUMA migration, since they silently drop off the deferred split list.
>
> Fix this by calling deferred_split_folio() on the destination folio
> after a successful migration, for large rmappable folios.
>
> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> Fixes: dafff3f4c850 ("mm: split underused THPs")
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> ---
>  mm/migrate.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ece77ccb2ec0..98d0a594f7b7 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  	if (old_page_state & PAGE_WAS_MAPPED)
>  		remove_migration_ptes(src, dst, 0);
>
> +	/*
> +	 * Requeue the destination folio on the deferred split queue if
> +	 * the source was a large folio that was on the queue. Without
> +	 * this, NUMA migration causes underutilized THPs to escape
> +	 * the shrinker since the source is unqueued in
> +	 * __folio_migrate_mapping() and the destination is never
> +	 * re-queued.
> +	 */
> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
> +		deferred_split_folio(dst, false);

Doesn't that mean that you will readd any large folios, even if already
previously taken off the list after scanning?

So I am not sure if your "if the source was a large folio that was on
the queue." comment is accurate?

-- 
Cheers,

David

* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
From: Usama Arif @ 2026-03-06 14:12 UTC
To: David Hildenbrand (Arm), Andrew Morton, npache, ziy, linux-mm
Cc: matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
gourry, ying.huang, apopple, linux-kernel, kernel-team

On 06/03/2026 13:49, David Hildenbrand (Arm) wrote:
> On 3/6/26 14:35, Usama Arif wrote:
>> During folio migration, __folio_migrate_mapping() removes the source
>> folio from the deferred split queue, but the destination folio is never
>> re-queued. This causes underutilized THPs to escape the shrinker after
>> NUMA migration, since they silently drop off the deferred split list.
>>
>> Fix this by calling deferred_split_folio() on the destination folio
>> after a successful migration, for large rmappable folios.
>>
>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>> ---
>>  mm/migrate.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index ece77ccb2ec0..98d0a594f7b7 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>  	if (old_page_state & PAGE_WAS_MAPPED)
>>  		remove_migration_ptes(src, dst, 0);
>>
>> +	/*
>> +	 * Requeue the destination folio on the deferred split queue if
>> +	 * the source was a large folio that was on the queue. Without
>> +	 * this, NUMA migration causes underutilized THPs to escape
>> +	 * the shrinker since the source is unqueued in
>> +	 * __folio_migrate_mapping() and the destination is never
>> +	 * re-queued.
>> +	 */
>> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
>> +		deferred_split_folio(dst, false);
>
> Doesn't that mean that you will readd any large folios, even if already
> previously taken off the list after scanning?
>
> So I am not sure if your "if the source was a large folio that was on
> the queue." comment is accurate?
>

Yes you are right. How about something like below? We also won't need to check
for anon and non-device folios with this as we only set the flag if it was
already on deferred_split list.


diff --git a/mm/migrate.c b/mm/migrate.c
index ece77ccb2ec0..9e0780d380e4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1137,7 +1137,9 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
 enum {
 	PAGE_WAS_MAPPED = BIT(0),
 	PAGE_WAS_MLOCKED = BIT(1),
-	PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED,
+	PAGE_WAS_ON_DEFERRED_SPLIT = BIT(2),
+	PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED |
+			  PAGE_WAS_ON_DEFERRED_SPLIT,
 };
 
 static void __migrate_folio_record(struct folio *dst,
@@ -1373,6 +1375,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 		goto out_unlock_both;
 	}
 
+	/*
+	 * Record whether the source folio is on the deferred split queue
+	 * before move_to_new_folio(), which unqueues it via
+	 * __folio_migrate_mapping().
+	 */
+	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
+	    !data_race(list_empty(&src->_deferred_list)))
+		old_page_state |= PAGE_WAS_ON_DEFERRED_SPLIT;
+
 	rc = move_to_new_folio(dst, src, mode);
 	if (rc)
 		goto out;
@@ -1393,6 +1404,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	if (old_page_state & PAGE_WAS_MAPPED)
 		remove_migration_ptes(src, dst, 0);
 
+	/*
+	 * Requeue the destination folio on the deferred split queue if
+	 * the source was on the queue. The source is unqueued in
+	 * __folio_migrate_mapping(), so we record and check the state
+	 * from before move_to_new_folio().
+	 */
+	if (old_page_state & PAGE_WAS_ON_DEFERRED_SPLIT)
+		deferred_split_folio(dst, false);
+
 out_unlock_both:
 	folio_unlock(dst);
 	folio_set_owner_migrate_reason(dst, reason);
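
The record-then-requeue pattern in this revision can be sketched in plain userspace C. Everything named `toy_*` or `TOY_*` below is a hypothetical stand-in chosen for illustration; the flag values only mirror the shape of the enum in the diff, and queue membership is modelled as a boolean rather than a real list.

```c
#include <stdbool.h>

#define TOY_BIT(n) (1u << (n))

/* Mirrors the shape of the enum in the proposal; values are illustrative. */
enum {
	TOY_WAS_MAPPED            = TOY_BIT(0),
	TOY_WAS_MLOCKED           = TOY_BIT(1),
	TOY_WAS_ON_DEFERRED_SPLIT = TOY_BIT(2),
};

struct toy_folio {
	bool large_rmappable;
	bool on_deferred_list;
};

/* Snapshot the source state before the move clears it. */
static unsigned int toy_record_state(const struct toy_folio *src)
{
	unsigned int state = 0;

	if (src->large_rmappable && src->on_deferred_list)
		state |= TOY_WAS_ON_DEFERRED_SPLIT;
	return state;
}

/* The move unqueues the source and leaves the destination unqueued. */
static void toy_move(struct toy_folio *dst, struct toy_folio *src)
{
	dst->large_rmappable = src->large_rmappable;
	dst->on_deferred_list = false;
	src->on_deferred_list = false;
}

/* Requeue only when the snapshot says the source had been queued. */
static void toy_migrate(struct toy_folio *dst, struct toy_folio *src)
{
	unsigned int old_state = toy_record_state(src);

	toy_move(dst, src);
	if (old_state & TOY_WAS_ON_DEFERRED_SPLIT)
		dst->on_deferred_list = true;
}
```

The point of the snapshot is the case David raised: a large folio that had already been taken off the queue stays off it after `toy_migrate()`, while a folio that was queued is put back.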

* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
From: Zi Yan @ 2026-03-06 14:46 UTC
To: Usama Arif
Cc: David Hildenbrand (Arm), Andrew Morton, npache, linux-mm,
matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
gourry, ying.huang, apopple, linux-kernel, kernel-team

On 6 Mar 2026, at 9:12, Usama Arif wrote:

> On 06/03/2026 13:49, David Hildenbrand (Arm) wrote:
>> On 3/6/26 14:35, Usama Arif wrote:
>>> During folio migration, __folio_migrate_mapping() removes the source
>>> folio from the deferred split queue, but the destination folio is never
>>> re-queued. This causes underutilized THPs to escape the shrinker after
>>> NUMA migration, since they silently drop off the deferred split list.
>>>
>>> Fix this by calling deferred_split_folio() on the destination folio
>>> after a successful migration, for large rmappable folios.
>>>
>>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>> ---
>>>  mm/migrate.c | 11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index ece77ccb2ec0..98d0a594f7b7 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>>  	if (old_page_state & PAGE_WAS_MAPPED)
>>>  		remove_migration_ptes(src, dst, 0);
>>>
>>> +	/*
>>> +	 * Requeue the destination folio on the deferred split queue if
>>> +	 * the source was a large folio that was on the queue. Without
>>> +	 * this, NUMA migration causes underutilized THPs to escape
>>> +	 * the shrinker since the source is unqueued in
>>> +	 * __folio_migrate_mapping() and the destination is never
>>> +	 * re-queued.
>>> +	 */
>>> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
>>> +		deferred_split_folio(dst, false);
>>
>> Doesn't that mean that you will readd any large folios, even if already
>> previously taken off the list after scanning?
>>
>> So I am not sure if your "if the source was a large folio that was on
>> the queue." comment is accurate?
>>
>
> Yes you are right. How about something like below? We also won't need to check
> for anon and non-device folios with this as we only set the flag if it was
> already on deferred_split list.

BTW, migrate_pages() tries to split partially mapped folios before migration[1],
so what remains in the deferred_list would be:

1. partially mapped but with a pin,
2. fully mapped but potentially underused.

I wonder if you want to do an underused scan before migration and try to split
underused THPs. Or to avoid this additional scan, find a way of detecting
zero pages at page copy time and split it after migration.

Anyway, it seems that all large folios are in this deferred_list. Maybe, like
David suggested in his LSFMM proposal, we should scan large folios on LRU lists
at reclaim time instead, since there is not much difference between deferred_list
and LRU lists right now.

[1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/migrate.c#L1840

>
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ece77ccb2ec0..9e0780d380e4 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1137,7 +1137,9 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
>  enum {
>  	PAGE_WAS_MAPPED = BIT(0),
>  	PAGE_WAS_MLOCKED = BIT(1),
> -	PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED,
> +	PAGE_WAS_ON_DEFERRED_SPLIT = BIT(2),
> +	PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED |
> +			  PAGE_WAS_ON_DEFERRED_SPLIT,
>  };
>
>  static void __migrate_folio_record(struct folio *dst,
> @@ -1373,6 +1375,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  		goto out_unlock_both;
>  	}
>
> +	/*
> +	 * Record whether the source folio is on the deferred split queue
> +	 * before move_to_new_folio(), which unqueues it via
> +	 * __folio_migrate_mapping().
> +	 */
> +	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
> +	    !data_race(list_empty(&src->_deferred_list)))
> +		old_page_state |= PAGE_WAS_ON_DEFERRED_SPLIT;
> +
>  	rc = move_to_new_folio(dst, src, mode);
>  	if (rc)
>  		goto out;
> @@ -1393,6 +1404,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  	if (old_page_state & PAGE_WAS_MAPPED)
>  		remove_migration_ptes(src, dst, 0);
>
> +	/*
> +	 * Requeue the destination folio on the deferred split queue if
> +	 * the source was on the queue. The source is unqueued in
> +	 * __folio_migrate_mapping(), so we record and check the state
> +	 * from before move_to_new_folio().
> +	 */
> +	if (old_page_state & PAGE_WAS_ON_DEFERRED_SPLIT)
> +		deferred_split_folio(dst, false);
> +
>  out_unlock_both:
>  	folio_unlock(dst);
>  	folio_set_owner_migrate_reason(dst, reason);

Best Regards,
Yan, Zi
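
The idea of detecting zero pages at page copy time, floated above, could look roughly like the following userspace sketch. It is not kernel code and nothing like it is proposed in this thread as a patch: `toy_copy_count_zero()`, `toy_page_is_zero()` and `TOY_PAGE_SIZE` are hypothetical names, and a 4096-byte base page is an assumption made only for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u	/* assumed base page size for the sketch */

/* True if every byte of one base page is zero. */
static bool toy_page_is_zero(const unsigned char *page)
{
	for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
		if (page[i])
			return false;
	return true;
}

/*
 * Copy nr_pages base pages from src to dst and return how many of them
 * were entirely zero, so the caller can decide about a later split.
 */
static size_t toy_copy_count_zero(unsigned char *dst, const unsigned char *src,
				  size_t nr_pages)
{
	size_t zero_pages = 0;

	for (size_t i = 0; i < nr_pages; i++) {
		const unsigned char *from = src + i * TOY_PAGE_SIZE;

		memcpy(dst + i * TOY_PAGE_SIZE, from, TOY_PAGE_SIZE);
		if (toy_page_is_zero(from))
			zero_pages++;
	}
	return zero_pages;
}
```

The sketch piggybacks the zero check on a copy that happens anyway, which is the appeal of the suggestion; what to do with the resulting count is left to the caller.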

* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
From: Usama Arif @ 2026-03-06 16:15 UTC
To: Zi Yan
Cc: David Hildenbrand (Arm), Andrew Morton, npache, linux-mm,
matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
gourry, ying.huang, apopple, linux-kernel, kernel-team

On 06/03/2026 14:46, Zi Yan wrote:
> On 6 Mar 2026, at 9:12, Usama Arif wrote:
>
>> On 06/03/2026 13:49, David Hildenbrand (Arm) wrote:
>>> On 3/6/26 14:35, Usama Arif wrote:
>>>> During folio migration, __folio_migrate_mapping() removes the source
>>>> folio from the deferred split queue, but the destination folio is never
>>>> re-queued. This causes underutilized THPs to escape the shrinker after
>>>> NUMA migration, since they silently drop off the deferred split list.
>>>>
>>>> Fix this by calling deferred_split_folio() on the destination folio
>>>> after a successful migration, for large rmappable folios.
>>>>
>>>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>>>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>> ---
>>>>  mm/migrate.c | 11 +++++++++++
>>>>  1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index ece77ccb2ec0..98d0a594f7b7 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>>>  	if (old_page_state & PAGE_WAS_MAPPED)
>>>>  		remove_migration_ptes(src, dst, 0);
>>>>
>>>> +	/*
>>>> +	 * Requeue the destination folio on the deferred split queue if
>>>> +	 * the source was a large folio that was on the queue. Without
>>>> +	 * this, NUMA migration causes underutilized THPs to escape
>>>> +	 * the shrinker since the source is unqueued in
>>>> +	 * __folio_migrate_mapping() and the destination is never
>>>> +	 * re-queued.
>>>> +	 */
>>>> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
>>>> +		deferred_split_folio(dst, false);
>>>
>>> Doesn't that mean that you will readd any large folios, even if already
>>> previously taken off the list after scanning?
>>>
>>> So I am not sure if your "if the source was a large folio that was on
>>> the queue." comment is accurate?
>>>
>>
>> Yes you are right. How about something like below? We also won't need to check
>> for anon and non-device folios with this as we only set the flag if it was
>> already on deferred_split list.
>
> BTW, migrate_pages() tries to split partially mapped folios before migration[1],
> so what remains in the deferred_list would be:
>
> 1. partially mapped but with a pin,
> 2. fully mapped but potentially underused.
>

Yes, that's right.

> I wonder if you want to do an underused scan before migration and try to split
> underused THPs.

Hmm, I think we should keep THPs as is if there is no memory pressure (proactive
or otherwise). Scanning THPs for zeros has a cost and we would also lose the benefit
of THPs when we don't need memory.

> Or to avoid this additional scan, find a way of detecting
> zero pages at page copy time and split it after migration.
>

Yeah but I think we lose the benefits of THPs after migration when we don't need
additional memory?

> Anyway, it seems that all large folios are in this deferred_list. Maybe, like
> David suggested in his LSFMM proposal, we should scan large folios on LRU lists
> at reclaim time instead, since there is not much difference between deferred_list
> and LRU lists right now.
>

Yeah the THP shrinker is a very basic implementation and there are a lot of

>
> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/migrate.c#L1840
>

Also Johannes pointed out it's not great storing this information in page flags,
we can just keep it as a local variable. This is what the patch would look like:


diff --git a/mm/migrate.c b/mm/migrate.c
index ece77ccb2ec0..48a972f158ab 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1360,6 +1360,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	int rc;
 	int old_page_state = 0;
 	struct anon_vma *anon_vma = NULL;
+	bool src_deferred_split = false;
 	struct list_head *prev;
 
 	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
@@ -1373,6 +1374,10 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 		goto out_unlock_both;
 	}
 
+	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
+	    !data_race(list_empty(&src->_deferred_list)))
+		src_deferred_split = true;
+
 	rc = move_to_new_folio(dst, src, mode);
 	if (rc)
 		goto out;
@@ -1393,6 +1398,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	if (old_page_state & PAGE_WAS_MAPPED)
 		remove_migration_ptes(src, dst, 0);
 
+	/*
+	 * Requeue the destination folio on the deferred split queue if
+	 * the source was on the queue. The source is unqueued in
+	 * __folio_migrate_mapping(), so we recorded the state from
+	 * before move_to_new_folio().
+	 */
+	if (src_deferred_split)
+		deferred_split_folio(dst, false);
+
 out_unlock_both:
 	folio_unlock(dst);
 	folio_set_owner_migrate_reason(dst, reason);

* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
From: David Hildenbrand (Arm) @ 2026-03-06 16:23 UTC
To: Usama Arif, Zi Yan
Cc: Andrew Morton, npache, linux-mm, matthew.brost, joshua.hahnjy,
hannes, rakie.kim, byungchul, gourry, ying.huang, apopple,
linux-kernel, kernel-team

On 3/6/26 17:15, Usama Arif wrote:
>
>
> On 06/03/2026 14:46, Zi Yan wrote:
>> On 6 Mar 2026, at 9:12, Usama Arif wrote:
>>
>>>
>>> Yes you are right. How about something like below? We also won't need to check
>>> for anon and non-device folios with this as we only set the flag if it was
>>> already on deferred_split list.
>>
>> BTW, migrate_pages() tries to split partially mapped folios before migration[1],
>> so what remains in the deferred_list would be:
>>
>> 1. partially mapped but with a pin,
>> 2. fully mapped but potentially underused.
>>
>
> Yes, that's right.
>
>> I wonder if you want to do an underused scan before migration and try to split
>> underused THPs.
>
> Hmm, I think we should keep THPs as is if there is no memory pressure (proactive
> or otherwise). Scanning THPs for zeros has a cost and we would also lose the benefit
> of THPs when we don't need memory.
>
>> Or to avoid this additional scan, find a way of detecting
>> zero pages at page copy time and split it after migration.
>>
>
> Yeah but I think we lose the benefits of THPs after migration when we don't need
> additional memory?
>
>> Anyway, it seems that all large folios are in this deferred_list. Maybe, like
>> David suggested in his LSFMM proposal, we should scan large folios on LRU lists
>> at reclaim time instead, since there is not much difference between deferred_list
>> and LRU lists right now.
>>
>
> Yeah the THP shrinker is a very basic implementation and there are a lot of
>
>>
>> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/migrate.c#L1840
>>
>
> Also Johannes pointed out it's not great storing this information in page flags,
> we can just keep it as a local variable. This is what the patch would look like:

Much cleaner.

-- 
Cheers,

David

* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
From: Zi Yan @ 2026-03-06 16:26 UTC
To: Usama Arif
Cc: David Hildenbrand (Arm), Andrew Morton, npache, linux-mm,
matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
gourry, ying.huang, apopple, linux-kernel, kernel-team

On 6 Mar 2026, at 11:15, Usama Arif wrote:

> On 06/03/2026 14:46, Zi Yan wrote:
>> On 6 Mar 2026, at 9:12, Usama Arif wrote:
>>
>>> On 06/03/2026 13:49, David Hildenbrand (Arm) wrote:
>>>> On 3/6/26 14:35, Usama Arif wrote:
>>>>> During folio migration, __folio_migrate_mapping() removes the source
>>>>> folio from the deferred split queue, but the destination folio is never
>>>>> re-queued. This causes underutilized THPs to escape the shrinker after
>>>>> NUMA migration, since they silently drop off the deferred split list.
>>>>>
>>>>> Fix this by calling deferred_split_folio() on the destination folio
>>>>> after a successful migration, for large rmappable folios.
>>>>>
>>>>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>>>>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>>> ---
>>>>>  mm/migrate.c | 11 +++++++++++
>>>>>  1 file changed, 11 insertions(+)
>>>>>
>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>> index ece77ccb2ec0..98d0a594f7b7 100644
>>>>> --- a/mm/migrate.c
>>>>> +++ b/mm/migrate.c
>>>>> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>>>>  	if (old_page_state & PAGE_WAS_MAPPED)
>>>>>  		remove_migration_ptes(src, dst, 0);
>>>>>
>>>>> +	/*
>>>>> +	 * Requeue the destination folio on the deferred split queue if
>>>>> +	 * the source was a large folio that was on the queue. Without
>>>>> +	 * this, NUMA migration causes underutilized THPs to escape
>>>>> +	 * the shrinker since the source is unqueued in
>>>>> +	 * __folio_migrate_mapping() and the destination is never
>>>>> +	 * re-queued.
>>>>> +	 */
>>>>> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
>>>>> +		deferred_split_folio(dst, false);
>>>>
>>>> Doesn't that mean that you will readd any large folios, even if already
>>>> previously taken off the list after scanning?
>>>>
>>>> So I am not sure if your "if the source was a large folio that was on
>>>> the queue." comment is accurate?
>>>>
>>>
>>> Yes you are right. How about something like below? We also won't need to check
>>> for anon and non-device folios with this as we only set the flag if it was
>>> already on deferred_split list.
>>
>> BTW, migrate_pages() tries to split partially mapped folios before migration[1],
>> so what remains in the deferred_list would be:
>>
>> 1. partially mapped but with a pin,
>> 2. fully mapped but potentially underused.
>>
>
> Yes, that's right.
>
>> I wonder if you want to do an underused scan before migration and try to split
>> underused THPs.
>
> Hmm, I think we should keep THPs as is if there is no memory pressure (proactive
> or otherwise). Scanning THPs for zeros has a cost and we would also lose the benefit
> of THPs when we don't need memory.

Makes sense.

>
>> Or to avoid this additional scan, find a way of detecting
>> zero pages at page copy time and split it after migration.
>>
>
> Yeah but I think we lose the benefits of THPs after migration when we don't need
> additional memory?

Right.

>
>> Anyway, it seems that all large folios are in this deferred_list. Maybe, like
>> David suggested in his LSFMM proposal, we should scan large folios on LRU lists
>> at reclaim time instead, since there is not much difference between deferred_list
>> and LRU lists right now.
>>
>
> Yeah the THP shrinker is a very basic implementation and there are a lot of
>
>>
>> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/migrate.c#L1840
>>
>
> Also Johannes pointed out it's not great storing this information in page flags,
> we can just keep it as a local variable. This is what the patch would look like:
>
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ece77ccb2ec0..48a972f158ab 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1360,6 +1360,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  	int rc;
>  	int old_page_state = 0;
>  	struct anon_vma *anon_vma = NULL;
> +	bool src_deferred_split = false;
>  	struct list_head *prev;
>
>  	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
> @@ -1373,6 +1374,10 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  		goto out_unlock_both;
>  	}
>
> +	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
> +	    !data_race(list_empty(&src->_deferred_list)))
> +		src_deferred_split = true;
> +
>  	rc = move_to_new_folio(dst, src, mode);
>  	if (rc)
>  		goto out;
> @@ -1393,6 +1398,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  	if (old_page_state & PAGE_WAS_MAPPED)
>  		remove_migration_ptes(src, dst, 0);
>
> +	/*
> +	 * Requeue the destination folio on the deferred split queue if
> +	 * the source was on the queue. The source is unqueued in
> +	 * __folio_migrate_mapping(), so we recorded the state from
> +	 * before move_to_new_folio().
> +	 */
> +	if (src_deferred_split)
> +		deferred_split_folio(dst, false);
> +
>  out_unlock_both:
>  	folio_unlock(dst);
>  	folio_set_owner_migrate_reason(dst, reason);

LGTM. Thanks for improving it.

Best Regards,
Yan, Zi

* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
From: Matthew Wilcox @ 2026-03-06 16:08 UTC
To: Usama Arif
Cc: David Hildenbrand (Arm), Andrew Morton, npache, ziy, linux-mm,
matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
gourry, ying.huang, apopple, linux-kernel, kernel-team

On Fri, Mar 06, 2026 at 05:12:38PM +0300, Usama Arif wrote:
> +	/*
> +	 * Record whether the source folio is on the deferred split queue
> +	 * before move_to_new_folio(), which unqueues it via
> +	 * __folio_migrate_mapping().
> +	 */
> +	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
> +	    !data_race(list_empty(&src->_deferred_list)))

Why do you need data_race() here? list_empty() contains a READ_ONCE()
so shouldn't be necessary?

> +		old_page_state |= PAGE_WAS_ON_DEFERRED_SPLIT;

You've done a great job of the naming. So much so that the comment
seems entirely unnecessary?

> +	/*
> +	 * Requeue the destination folio on the deferred split queue if
> +	 * the source was on the queue. The source is unqueued in
> +	 * __folio_migrate_mapping(), so we record and check the state
> +	 * from before move_to_new_folio().
> +	 */
> +	if (old_page_state & PAGE_WAS_ON_DEFERRED_SPLIT)
> +		deferred_split_folio(dst, false);

Again, I'm not sure the comment says anything that the code doesn't?

* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
From: Usama Arif @ 2026-03-06 16:19 UTC
To: Matthew Wilcox
Cc: David Hildenbrand (Arm), Andrew Morton, npache, ziy, linux-mm,
matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
gourry, ying.huang, apopple, linux-kernel, kernel-team

On 06/03/2026 16:08, Matthew Wilcox wrote:
> On Fri, Mar 06, 2026 at 05:12:38PM +0300, Usama Arif wrote:
>> +	/*
>> +	 * Record whether the source folio is on the deferred split queue
>> +	 * before move_to_new_folio(), which unqueues it via
>> +	 * __folio_migrate_mapping().
>> +	 */
>> +	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
>> +	    !data_race(list_empty(&src->_deferred_list)))
>
> Why do you need data_race() here? list_empty() contains a READ_ONCE()
> so shouldn't be necessary?

Ah, mainly because we don't acquire split_queue_lock before accessing, similar
to what we do in folio_unqueue_deferred_split().

>
>> +		old_page_state |= PAGE_WAS_ON_DEFERRED_SPLIT;
>
> You've done a great job of the naming. So much so that the comment
> seems entirely unnecessary?
>
>> +	/*
>> +	 * Requeue the destination folio on the deferred split queue if
>> +	 * the source was on the queue. The source is unqueued in
>> +	 * __folio_migrate_mapping(), so we record and check the state
>> +	 * from before move_to_new_folio().
>> +	 */
>> +	if (old_page_state & PAGE_WAS_ON_DEFERRED_SPLIT)
>> +		deferred_split_folio(dst, false);
>
> Again, I'm not sure the comment says anything that the code doesn't?

Yeah, there is a much simpler version in reply to Zi's review in [1]. I found
the whole migrate_folio_move() function quite verbose already, was keeping up
with the theme of that function lol. I will see if I can cut down on the
comment in [1] as well.

Thanks!

[1] https://lore.kernel.org/all/28e48b47-f215-4e4a-b55a-01dbf293ff35@linux.dev/
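
The lockless emptiness check discussed above can be approximated in userspace C. This is a rough analogue only: the `toy_*` names are hypothetical, the volatile load merely stands in for the READ_ONCE() that Matthew mentions, and there is no userspace counterpart here for data_race(), which as far as I understand is an annotation for the kernel's data race detector rather than something that changes the generated load.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list node, in the style of list_head. */
struct toy_list_head {
	struct toy_list_head *next, *prev;
};

static void toy_list_init(struct toy_list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void toy_list_add(struct toy_list_head *entry, struct toy_list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink the entry and leave it pointing at itself, i.e. "empty". */
static void toy_list_del_init(struct toy_list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	toy_list_init(entry);
}

/*
 * Emptiness test using a single volatile load of ->next, so the
 * compiler cannot split or repeat the read. No lock is taken, so the
 * answer is only a snapshot if other threads may modify the list.
 */
static bool toy_list_empty(const struct toy_list_head *head)
{
	struct toy_list_head *next =
		*(struct toy_list_head * const volatile *)&head->next;

	return next == head;
}
```

Applied to a node embedded in an object, as with `_deferred_list` in the patch, the check answers "is this object currently linked on some queue": it is false after `toy_list_add()` and true again after `toy_list_del_init()`.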

* Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
From: David Hildenbrand (Arm) @ 2026-03-06 13:51 UTC
To: Usama Arif, Andrew Morton, npache, ziy, linux-mm
Cc: matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
gourry, ying.huang, apopple, linux-kernel, kernel-team

On 3/6/26 14:35, Usama Arif wrote:
> During folio migration, __folio_migrate_mapping() removes the source
> folio from the deferred split queue, but the destination folio is never
> re-queued. This causes underutilized THPs to escape the shrinker after
> NUMA migration, since they silently drop off the deferred split list.
>
> Fix this by calling deferred_split_folio() on the destination folio
> after a successful migration, for large rmappable folios.
>
> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> Fixes: dafff3f4c850 ("mm: split underused THPs")
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> ---
>  mm/migrate.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ece77ccb2ec0..98d0a594f7b7 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  	if (old_page_state & PAGE_WAS_MAPPED)
>  		remove_migration_ptes(src, dst, 0);
>
> +	/*
> +	 * Requeue the destination folio on the deferred split queue if
> +	 * the source was a large folio that was on the queue. Without
> +	 * this, NUMA migration causes underutilized THPs to escape
> +	 * the shrinker since the source is unqueued in
> +	 * __folio_migrate_mapping() and the destination is never
> +	 * re-queued.
> +	 */
> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
> +		deferred_split_folio(dst, false);

Also, should you be checking for anon and non-device folios?

-- 
Cheers,

David