From: Zi Yan <ziy@nvidia.com>
To: Usama Arif <usama.arif@linux.dev>
Cc: "Mika Penttilä" <mpenttil@redhat.com>,
"Balbir Singh" <balbirs@nvidia.com>,
"Kiryl Shutsemau" <kas@kernel.org>,
matthew.brost@intel.com, npache@redhat.com, david@kernel.org,
"Usama Arif" <usamaarif642@gmail.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
linux-mm@kvack.org, joshua.hahnjy@gmail.com, hannes@cmpxchg.org,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
riel@surriel.com, shakeel.butt@linux.dev,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
Date: Thu, 05 Mar 2026 12:32:51 -0500
Message-ID: <1EAE2E58-7A71-4B59-B1EF-3A3C753DDC1E@nvidia.com>
In-Reply-To: <7996d5c5-24db-4ef2-b88a-1b9d33f9e976@linux.dev>
On 5 Mar 2026, at 12:00, Usama Arif wrote:
> On 05/03/2026 16:39, Zi Yan wrote:
>> On 5 Mar 2026, at 11:36, Usama Arif wrote:
>>
>>> On 05/03/2026 12:09, Mika Penttilä wrote:
>>>> On 3/5/26 13:44, Usama Arif wrote:
>>>>
>>>>>
>>>>> On 05/03/2026 06:09, Mika Penttilä wrote:
>>>>>> Hi!
>>>>>>
>>>>>> On 3/5/26 01:28, Usama Arif wrote:
>>>>>>
>>>>>>> On 04/03/2026 22:09, Balbir Singh wrote:
>>>>>>>> On 3/5/26 08:54, Zi Yan wrote:
>>>>>>>>> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>>>>>>>>>
>>>>>>>>>> On 3/5/26 02:17, Zi Yan wrote:
>>>>>>>>>>> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>>>>>>>>>>>
>>>>>>>>>>>> From: Usama Arif <usama.arif@linux.dev>
>>>>>>>>>>>>
>>>>>>>>>>>> migrate_vma_split_unmapped_folio() takes an extra reference via
>>>>>>>>>>>> folio_get() before calling folio_split_unmapped(). On success, the
>>>>>>>>>>>> split consumes this reference: __folio_freeze_and_split_unmapped()
>>>>>>>>>>>> expects the +1 in its folio_ref_freeze() check, and distributes it
>>>>>>>>>>>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>>>>>>>>>>>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>>>>>>>>>>>
>>>>>>>>>>>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>>>>>>>>>>>> -EAGAIN), the function returns without calling folio_put(). The extra
>>>>>>>>>>>> reference is never released.
>>>>>>>>>>>>
>>>>>>>>>>>> Add the missing folio_put() on the error path.
>>>>>>>>>>>>
>>>>>>>>>>>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>>>>>>>>>>>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>>>>>>>>> Reported-by: Nico Pache <npache@redhat.com>
>>>>>>>>>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>>>>>>>>>> ---
>>>>>>>>>>>> mm/migrate_device.c | 4 +++-
>>>>>>>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>>>>>>>>> index 0a8b31939640f..351ecd9065d13 100644
>>>>>>>>>>>> --- a/mm/migrate_device.c
>>>>>>>>>>>> +++ b/mm/migrate_device.c
>>>>>>>>>>>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>>>>>>>>>  	folio_get(folio);
>>>>>>>>>>>>  	split_huge_pmd_address(migrate->vma, addr, true);
>>>>>>>>>>>>  	ret = folio_split_unmapped(folio, 0);
>>>>>>>>>>>> -	if (ret)
>>>>>>>>>>>> +	if (ret) {
>>>>>>>>>>>> +		folio_put(folio);
>>>>>>>>>>>>  		return ret;
>>>>>>>>>>>> +	}
>>>>>>>>>>>>  	migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>>>>>>>>>>>>  	flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>>>>>>>>>>>>  	pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.47.3
>>>>>>>>>>> Add Balbir, who wrote the code, to comment on this.
>>>>>>>>>>>
>>>>>>>>>> Thanks Zi!
>>>>>>>>>>
>>>>>>>>>> Just wondering if there is a reproducer for the issue and how the fix was tested?
>>>>>>>>>> I expect migrate_vma_finalize() to be called for folios, even when split failed and
>>>>>>>>>> drop the lock.
>>>>>>>>> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
>>>>>>>>> If so, how does it distinguish between split folios and failed-to-split folios?
>>>>>>>>> By comparing source and destination folio orders?
>>>>>>>>>
>>>>>>>> We reset the MIGRATE_PFN_MIGRATE flag for failing to migrate pfns. We do a folio_put
>>>>>>>> on the src in finalize, if it is split then on all the split folios as well.
>>>>>>>>
>>>>>>>>> What we see from migrate_vma_split_unmapped_folio() is that
>>>>>>>>> it adds a refcount for all input folios, but only drops a refcount
>>>>>>>>> for the split folio. Doesn't that cause failed-to-split folios to have
>>>>>>>>> an additional refcount?
>>>>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> Thanks for reviewing, everyone. It is very difficult to create a reproducer. I think
>>>>>>> the extra reference would need to appear after migrate_device_unmap() but before
>>>>>>> folio_split_unmapped() in migrate_vma_pages(), which is hard to trigger reliably from
>>>>>>> userspace.
>>>>>>>
>>>>>>> The fix came about when Nico indicated there might be an issue if split_huge_pmd_address
>>>>>>> fails in my patch [1].
>>>>>>>
>>>>>>> Below is my understanding of how refcounting is working over here step by step. I
>>>>>>> might very well be wrong on this, and the refcounting is a bit all over the place
>>>>>>> and I might miss a reference change somewhere so would really appreciate if someone
>>>>>>> can confirm this!
>>>>>>>
>>>>>>>
>>>>>>> 1. migrate_vma_collect_huge_pmd():
>>>>>>> a) folio_get(folio) -> +1 (collect reference)
>>>>>>> 2. migrate_device_unmap():
>>>>>>> a) folio_isolate_lru() -> +1 (isolation reference)
>>>>>>> b) folio_put() -> -1 (drops the collect reference)
>>>>>>>
>>>>>>>
>>>>>>> Without this patch fix:
>>>>>>>
>>>>>>> 3. migrate_vma_split_unmapped_folio():
>>>>>>> a) folio_get(folio) -> +1 (split reference)
>>>>>>> b) folio_split_unmapped() -> fails
>>>>>>> c) Returns error — without folio_put() which is the fix
>>>>>>> 4. Caller in migrate_vma_pages(): clears MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND
>>>>>>> 5. __migrate_device_finalize(): sees !(src_pfns[i] & MIGRATE_PFN_MIGRATE), restores the folio:
>>>>>>> a) remove_migration_ptes(src, src) — re-establishes user PTEs
>>>>>>> b) folio_unlock(src)
>>>>>>> c) folio_put(src) -> -1 (drops the isolation reference)
>>>>>>>
>>>>>>> The split reference in 3.a is never released and the folio has a permanently elevated refcount.
>>>>>>> Unless I missed a folio_put somewhere for the refcount increase in folio_isolate_lru() (2.a)?
>>>>>>>
>>>>>>> Please let me know if this makes sense!
>>>>>>>
>>>>>>> [1] https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>>>>
>>>>>>>> Thanks! Yes, the patch makes sense
>>>>>>>>
>>>>>>>> Acked-by: Balbir Singh <balbirs@nvidia.com>
>>>>>>>>
>>>>>>>> Balbir
>>>>>> I remember stumbling on this a while ago also. The folio_get() in migrate_vma_split_unmapped_folio()
>>>>>> is balanced with put_page() in __split_huge_pmd_locked() (freeze = true), can't fail for device pages.
>>>>>> Folios at this point are unmapped but have 1 refcount from "collecting".
>>>>>> After folio_split_unmapped() the refcount(s) is still 1.
>>>>>>
>>>>>> So it seems the code is good as is? A comment though would be good for the extra folio_get..
>>>>>>
>>>>> Hmm, I don't think the put_page() in __split_huge_pmd_locked() is there to balance the folio_get() in
>>>>> migrate_vma_split_unmapped_folio(). There are other places where split_huge_pmd_locked() is called
>>>>> with freeze = true [1], and they don't take a reference before calling split_huge_pmd.
>>>>>
>>>>> I think the folio_put() in the freeze = true case of __split_huge_pmd_locked() is there because
>>>>> migration entries are being installed?
>>>>>
>>>>> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/rmap.c#L2334
>>>>>
>>>>>
>>>> Yes, normally you want to drop the reference when installing migration entries, but in this context
>>>> you have already done the collecting for the THP folio, so you want the folio_get() to balance
>>>> the put_page() and keep the refs unchanged. Is that right, Balbir?
>>>>
>>>> --Mika
>>>>
>>>
>>> Hi Mika,
>>>
>>> You are right, this patch is wrong. I tried the below diff to force folio_split_unmapped to return
>>> -EAGAIN. I ran tools/testing/selftests/mm/hmm-tests -r hmm.hmm_device_private.migrate_anon_huge_err
>>> to trigger the path for folio_split_unmapped.
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 8e2746ea74adf..6df33b4990a13 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -4140,6 +4140,8 @@ int folio_split_unmapped(struct folio *folio, unsigned int new_order)
>>> if (folio_expected_ref_count(folio) != folio_ref_count(folio) - 1)
>>> return -EAGAIN;
>>>
>>> + return -EAGAIN;
>>> +
>>> local_irq_disable();
>>> ret = __folio_freeze_and_split_unmapped(folio, new_order, &folio->page, NULL,
>>> NULL, false, NULL, SPLIT_TYPE_UNIFORM,
>>>
>>>
>>>
>>> I inserted a lot of traces to keep track of refcounts [1]. Without this patch, I get
>>> ....
>>> hmm-tests-129 [000] ..... 1.476233: __migrate_device_pages: SPLIT_UNMAPPED: folio=ffc536e2c4100000 refcount=0 AFTER error NO folio_put
>>> hmm-tests-129 [000] ..... 1.476234: __migrate_device_pages: PAGES: split FAILED folio=ffc536e2c4100000 refcount=0
>>> hmm-tests-129 [000] ..... 1.476236: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 dst=ffc536e2c4100000 src==dst=1 refcount_src=1 mapcount_src=0 order_src=0 migrate=0 BEFORE remove_migration_ptes
>>> hmm-tests-129 [000] ..... 1.476237: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 refcount=1 mapcount=0 AFTER remove_migration_ptes
>>> hmm-tests-129 [000] ..... 1.476237: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 refcount=0 AFTER folio_put(src)
>>>
>>> i.e. refcount = 512, which is correct as split_huge_pmd_address was successful. Full output is
>>> at [2].
>>>
>>> With this patch, I get:
>>>
>>> BUG: Bad rss-counter state mm:00000000cfe88d5e type:MM_FILEPAGES val:-511 Comm:bash Pid:63
>>> BUG: Bad rss-counter state mm:00000000cfe88d5e type:MM_ANONPAGES val:511 Comm:bash Pid:63
>>> ...
>>> hmm-tests-129 [000] ..... 1.468315: __migrate_device_pages: SPLIT_UNMAPPED: folio=ffed210c840f0000 refcount=1 AFTER error folio_put FIX PRESENT
>>> hmm-tests-129 [000] ..... 1.468315: __migrate_device_pages: PAGES: split FAILED folio=ffed210c840f0000 refcount=1
>>> hmm-tests-129 [000] ..... 1.468318: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 dst=ffed210c840f0000 src==dst=1 refcount_src=1 mapcount_src=0 order_src=9 migrate=0 BEFORE remove_migration_ptes
>>> hmm-tests-129 [000] ..... 1.468357: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 refcount=513 mapcount=512 AFTER remove_migration_ptes
>>> hmm-tests-129 [000] ..... 1.468357: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 refcount=512 AFTER folio_put(src)
>>>
>>> refcount=0 means the folio would be freed which is not correct. The full output is at [3].
>>>
>>> Thank you for clearing this up!
>>
>> Thank you for doing the investigation. Can you send a patch to add a comment
>> in migrate_vma_split_unmapped_folio() about this to avoid the confusion
>> in the future?
>>
>
> Yeah this was really confusing.
>
> Does something like below look good?
>
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 78c7acf024615..a302f9d3ce921 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -910,6 +910,11 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>
>  	folio_get(folio);
>  	split_huge_pmd_address(migrate->vma, addr, true);
> +	/*
> +	 * split_huge_pmd_address consumes the folio_get reference above.
> +	 * Therefore no folio_put is needed on the folio_split_unmapped
> +	 * error path.
> +	 */
>  	ret = folio_split_unmapped(folio, 0);
>  	if (ret)
>  		return ret;
I do not think there is a need to explain why there is no folio_put()
below. How about the change below? It does two things:
1. it checks that the folio has the expected ref count,
2. it explains that folio_get() is there for split_huge_pmd_address(), not for
folio_split_unmapped().
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 0a8b31939640f..0b31b878210ba 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -914,8 +914,14 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
 	unsigned long flags;
 	int ret = 0;
 
+	VM_WARN_ON_ONCE(folio_ref_count(folio) != 1);
+	/*
+	 * Take a reference, since split_huge_pmd_address() with freeze = true
+	 * drops a reference at the end.
+	 */
 	folio_get(folio);
 	split_huge_pmd_address(migrate->vma, addr, true);
+
 	ret = folio_split_unmapped(folio, 0);
 	if (ret)
 		return ret;
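
To make the refcount flow discussed in this thread easier to follow, here is
a rough userspace sketch in C. It is a toy model, not kernel code: struct
toy_folio and every toy_* function are hypothetical names made up for
illustration. The starting refcount of 1 for the unmapped folio and the 512
references re-added by remove_migration_ptes() are assumptions taken from the
description and traces quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_PAGES 512 /* base pages in the PMD-sized folio from the traces */

/* Hypothetical stand-in for struct folio: only the refcount is modeled. */
struct toy_folio {
	int refcount;
};

static void toy_get(struct toy_folio *f)
{
	f->refcount++;
}

static void toy_put(struct toy_folio *f)
{
	assert(f->refcount > 0); /* a put with no reference held is a bug */
	f->refcount--;
}

/* Models split_huge_pmd_address(..., freeze = true): drops one reference. */
static void toy_split_pmd_freeze(struct toy_folio *f)
{
	toy_put(f);
}

/*
 * Refcount right after a failed folio_split_unmapped(), starting from the
 * single reference the unmapped folio is assumed to hold. extra_put models
 * the folio_put() that the original patch added on the error path.
 */
int toy_refcount_after_failed_split(bool extra_put)
{
	struct toy_folio f = { .refcount = 1 };

	toy_get(&f);              /* folio_get() before the PMD split: 2 */
	toy_split_pmd_freeze(&f); /* balanced by the frozen PMD split: 1 */
	/* folio_split_unmapped() is assumed to fail at this point. */
	if (extra_put)
		toy_put(&f);      /* unbalanced: drops the last reference */
	return f.refcount;
}

/*
 * Models the restore path in __migrate_device_finalize() for a folio that
 * enters with the given refcount: restoring the PTEs adds one reference per
 * base page, then folio_put(src) drops one.
 */
int toy_refcount_after_finalize(int refcount)
{
	struct toy_folio f = { .refcount = refcount };

	f.refcount += TOY_NR_PAGES; /* remove_migration_ptes(): 1 -> 513 */
	toy_put(&f);                /* folio_put(src): 513 -> 512 */
	return f.refcount;
}
```

In this model, toy_refcount_after_failed_split(false) leaves the folio with
one reference, and passing that to toy_refcount_after_finalize() ends at 512,
one reference per restored PTE. With extra_put set, the count already reaches
0 on the error path, before finalize runs.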
>
>>>
>>>
>>> [1] https://gist.github.com/uarif1/65e1e816af7aa0ae38dd6ec64d62a993
>>> [2] https://gist.github.com/uarif1/79ea9500667daa4e2ef09cb5d308f041
>>> [3] https://gist.github.com/uarif1/8a35a6c65ba8b3a1c1dfe72dc30e821d
>>
>>
>> Best Regards,
>> Yan, Zi
Best Regards,
Yan, Zi