From: Usama Arif <usama.arif@linux.dev>
To: Balbir Singh <balbirs@nvidia.com>, Zi Yan <ziy@nvidia.com>,
Kiryl Shutsemau <kas@kernel.org>,
matthew.brost@intel.com, npache@redhat.com, david@kernel.org
Cc: Usama Arif <usamaarif642@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, joshua.hahnjy@gmail.com, hannes@cmpxchg.org,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
Date: Thu, 5 Mar 2026 02:28:39 +0300
Message-ID: <622eb392-8c04-473d-b42a-ecdc489799c4@linux.dev>
In-Reply-To: <5e59c077-9f06-4e45-86e1-ca696e6105b4@nvidia.com>
On 04/03/2026 22:09, Balbir Singh wrote:
> On 3/5/26 08:54, Zi Yan wrote:
>> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>>
>>> On 3/5/26 02:17, Zi Yan wrote:
>>>> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>>>>
>>>>> From: Usama Arif <usama.arif@linux.dev>
>>>>>
>>>>> migrate_vma_split_unmapped_folio() takes an extra reference via
>>>>> folio_get() before calling folio_split_unmapped(). On success, the
>>>>> split consumes this reference: __folio_freeze_and_split_unmapped()
>>>>> expects the +1 in its folio_ref_freeze() check, and distributes it
>>>>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>>>>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>>>>
>>>>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>>>>> -EAGAIN), the function returns without calling folio_put(). The extra
>>>>> reference is never released.
>>>>>
>>>>> Add the missing folio_put() on the error path.
>>>>>
>>>>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>>>>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>> Reported-by: Nico Pache <npache@redhat.com>
>>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>>> ---
>>>>> mm/migrate_device.c | 4 +++-
>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>> index 0a8b31939640f..351ecd9065d13 100644
>>>>> --- a/mm/migrate_device.c
>>>>> +++ b/mm/migrate_device.c
>>>>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>>  	folio_get(folio);
>>>>>  	split_huge_pmd_address(migrate->vma, addr, true);
>>>>>  	ret = folio_split_unmapped(folio, 0);
>>>>> -	if (ret)
>>>>> +	if (ret) {
>>>>> +		folio_put(folio);
>>>>>  		return ret;
>>>>> +	}
>>>>>  	migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>>>>>  	flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>>>>>  	pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>>>>> --
>>>>> 2.47.3
>>>>
>>>> Add Balbir, who wrote the code, to comment on this.
>>>>
>>>
>>> Thanks Zi!
>>>
>>> Just wondering if there is a reproducer for the issue and how the fix was tested?
>>> I expect migrate_vma_finalize() to be called for folios, even when split failed and
>>> drop the lock.
>>
>> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
>> If so, how does it distinguish between split folios and failed-to-split folios?
>> By comparing source and destination folio orders?
>>
>
> We reset the MIGRATE_PFN_MIGRATE flag for failing to migrate pfns. We do a folio_put
> on the src in finalize, if it is split then on all the split folios as well.
>
>> What we see from migrate_vma_split_unmapped_folio() is that
>> it adds a refcount for all input folios, but only drops a refcount
>> for the split folio. Doesn't that cause failed-to-split folios to have
>> an additional refcount?
>>
Hello!

Thanks for reviewing, everyone. It's very difficult to create a reproducer. I think
the extra reference would need to appear after migrate_device_unmap() but before
folio_split_unmapped() in migrate_vma_pages(), which is hard to trigger reliably from
userspace.

The fix came about when Nico indicated there might be an issue if split_huge_pmd_address
fails in my patch [1].

Below is my understanding of how the refcounting works here, step by step. I might
very well be wrong on this: the refcounting is a bit all over the place and I might
have missed a reference change somewhere, so I would really appreciate it if someone
could confirm this!
1. migrate_vma_collect_huge_pmd():
   a) folio_get(folio) -> +1 (collect reference)
2. migrate_device_unmap():
   a) folio_isolate_lru() -> +1 (isolation reference)
   b) folio_put() -> -1 (drops the collect reference)

Without this patch:

3. migrate_vma_split_unmapped_folio():
   a) folio_get(folio) -> +1 (split reference)
   b) folio_split_unmapped() -> fails
   c) Returns the error without calling folio_put(); adding that call is the fix
4. Caller in migrate_vma_pages(): clears MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND
5. __migrate_device_finalize(): sees !(src_pfns[i] & MIGRATE_PFN_MIGRATE), restores the folio:
   a) remove_migration_ptes(src, src): re-establishes user PTEs
   b) folio_unlock(src)
   c) folio_put(src) -> -1 (drops the isolation reference)
The split reference from 3.a is never released, so the folio is left with a permanently
elevated refcount.
Unless I missed a folio_put() somewhere for the refcount increase in folio_isolate_lru() (2.a)?
Please let me know if this makes sense!
[1] https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>
> Thanks! Yes, the patch makes sense
>
> Acked-by: Balbir Singh <balbirs@nvidia.com>
>
> Balbir