linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Usama Arif <usama.arif@linux.dev>
To: Zi Yan <ziy@nvidia.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	npache@redhat.com, linux-mm@kvack.org, matthew.brost@intel.com,
	joshua.hahnjy@gmail.com, hannes@cmpxchg.org, rakie.kim@sk.com,
	byungchul@sk.com, gourry@gourry.net,
	ying.huang@linux.alibaba.com, apopple@nvidia.com,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
Date: Fri, 6 Mar 2026 19:15:07 +0300	[thread overview]
Message-ID: <28e48b47-f215-4e4a-b55a-01dbf293ff35@linux.dev> (raw)
In-Reply-To: <993B37FF-29E1-41F5-A1E8-F38B9CD24478@nvidia.com>



On 06/03/2026 14:46, Zi Yan wrote:
> On 6 Mar 2026, at 9:12, Usama Arif wrote:
> 
>> On 06/03/2026 13:49, David Hildenbrand (Arm) wrote:
>>> On 3/6/26 14:35, Usama Arif wrote:
>>>> During folio migration, __folio_migrate_mapping() removes the source
>>>> folio from the deferred split queue, but the destination folio is never
>>>> re-queued.  This causes underutilized THPs to escape the shrinker after
>>>> NUMA migration, since they silently drop off the deferred split list.
>>>>
>>>> Fix this by calling deferred_split_folio() on the destination folio
>>>> after a successful migration, for large rmappable folios.
>>>>
>>>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>>>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>> ---
>>>>  mm/migrate.c | 11 +++++++++++
>>>>  1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index ece77ccb2ec0..98d0a594f7b7 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>>>  	if (old_page_state & PAGE_WAS_MAPPED)
>>>>  		remove_migration_ptes(src, dst, 0);
>>>>
>>>> +	/*
>>>> +	 * Requeue the destination folio on the deferred split queue if
>>>> +	 * the source was a large folio that was on the queue. Without
>>>> +	 * this, NUMA migration causes underutilized THPs to escape
>>>> +	 * the shrinker since the source is unqueued in
>>>> +	 * __folio_migrate_mapping() and the destination is never
>>>> +	 * re-queued.
>>>> +	 */
>>>> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
>>>> +		deferred_split_folio(dst, false);
>>>
>>> Doesn't that mean that you will readd any large folios, even if already
>>> previously taken off the list after scanning?
>>>
>>> So I am not sure if your "if the source was a large folio that was on
>>> the queue." comment is accurate?
>>>
>>
>> Yes you are right. How about something like below? We also won't need to check
>> for anon and non-device folios with this as we only set the the flag if it was
>> already on deferred_split list.
> 
> BTW, migrate_pages() tries to split partially mapped folios before migration[1],
> so what remains in the deferred_list would be:
> 
> 1. partially mapped but with a pin,
> 2. fully mapped but potentially underused.
> 

Yes, thats right.

> I wonder if you want to do an underused scan before migration and try to split
> underused THPs.

hmm, I think we should keep THPs as is if there is no memory pressure (proactive
or otherwise). Scanning THPs for zeros has a cost and we would also lose the benefit
of THPs when we dont need memory.

> Or to avoid this additional scan, find a way of detecting
> zero pages at page copy time and split it after migration.
> 

Yeah but I think we lose the benefits of THPs after migration when we dont need
additional memory?

> Anyway, it seems that all large folios are in this deferred_list. Maybe, like
> David suggested in his LSFMM proposal, we should scan large folios on LRU lists
> at reclaim time instead, since there is not much difference between deferred_list
> and LRU lists right now.
> 

Yeah the THP shrinker is a very basic implementation and there are a lot of 

> 
> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/migrate.c#L1840
> 

Also Johannes pointed out its not great storing this information in page flags,
we can just keep it as local variable. This is what the patch would look like:


diff --git a/mm/migrate.c b/mm/migrate.c
index ece77ccb2ec0..48a972f158ab 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1360,6 +1360,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
        int rc;
        int old_page_state = 0;
        struct anon_vma *anon_vma = NULL;
+       bool src_deferred_split = false;
        struct list_head *prev;
 
        __migrate_folio_extract(dst, &old_page_state, &anon_vma);
@@ -1373,6 +1374,10 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
                goto out_unlock_both;
        }
 
+       if (folio_test_large(src) && folio_test_large_rmappable(src) &&
+           !data_race(list_empty(&src->_deferred_list)))
+               src_deferred_split = true;
+
        rc = move_to_new_folio(dst, src, mode);
        if (rc)
                goto out;
@@ -1393,6 +1398,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
        if (old_page_state & PAGE_WAS_MAPPED)
                remove_migration_ptes(src, dst, 0);
 
+       /*
+        * Requeue the destination folio on the deferred split queue if
+        * the source was on the queue.  The source is unqueued in
+        * __folio_migrate_mapping(), so we recorded the state from
+        * before move_to_new_folio().
+        */
+       if (src_deferred_split)
+               deferred_split_folio(dst, false);
+
 out_unlock_both:
        folio_unlock(dst);
        folio_set_owner_migrate_reason(dst, reason);
 


  reply	other threads:[~2026-03-06 16:15 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-06 13:35 Usama Arif
2026-03-06 13:49 ` David Hildenbrand (Arm)
2026-03-06 14:12   ` Usama Arif
2026-03-06 14:46     ` Zi Yan
2026-03-06 16:15       ` Usama Arif [this message]
2026-03-06 16:23         ` David Hildenbrand (Arm)
2026-03-06 16:26         ` Zi Yan
2026-03-06 16:08     ` Matthew Wilcox
2026-03-06 16:19       ` Usama Arif
2026-03-06 13:51 ` David Hildenbrand (Arm)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=28e48b47-f215-4e4a-b55a-01dbf293ff35@linux.dev \
    --to=usama.arif@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=byungchul@sk.com \
    --cc=david@kernel.org \
    --cc=gourry@gourry.net \
    --cc=hannes@cmpxchg.org \
    --cc=joshua.hahnjy@gmail.com \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=matthew.brost@intel.com \
    --cc=npache@redhat.com \
    --cc=rakie.kim@sk.com \
    --cc=ying.huang@linux.alibaba.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox