* [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
@ 2026-03-04 12:01 Usama Arif
2026-03-04 14:00 ` Kiryl Shutsemau
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Usama Arif @ 2026-03-04 12:01 UTC (permalink / raw)
To: Andrew Morton, npache, david, ziy, linux-mm
Cc: matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
gourry, ying.huang, apopple, riel, shakeel.butt, kas,
linux-kernel, kernel-team, Usama Arif
From: Usama Arif <usama.arif@linux.dev>
migrate_vma_split_unmapped_folio() takes an extra reference via
folio_get() before calling folio_split_unmapped(). On success, the
split consumes this reference: __folio_freeze_and_split_unmapped()
expects the +1 in its folio_ref_freeze() check, and distributes it
across the resulting sub-folios via folio_ref_unfreeze(...+1), which
are later balanced by folio_put() calls in __migrate_device_finalize().
If folio_split_unmapped() fails (e.g., unexpected pinning returns
-EAGAIN), the function returns without calling folio_put(). The extra
reference is never released.
Add the missing folio_put() on the error path.
Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
Reported-by: Nico Pache <npache@redhat.com>
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
mm/migrate_device.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 0a8b31939640f..351ecd9065d13 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
folio_get(folio);
split_huge_pmd_address(migrate->vma, addr, true);
ret = folio_split_unmapped(folio, 0);
- if (ret)
+ if (ret) {
+ folio_put(folio);
return ret;
+ }
migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
--
2.47.3
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-04 12:01 [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure Usama Arif
@ 2026-03-04 14:00 ` Kiryl Shutsemau
2026-03-04 15:17 ` Zi Yan
2026-03-04 15:25 ` Joshua Hahn
2 siblings, 0 replies; 16+ messages in thread
From: Kiryl Shutsemau @ 2026-03-04 14:00 UTC (permalink / raw)
To: Usama Arif
Cc: Andrew Morton, npache, david, ziy, linux-mm, matthew.brost,
joshua.hahnjy, hannes, rakie.kim, byungchul, gourry, ying.huang,
apopple, riel, shakeel.butt, linux-kernel, kernel-team,
Usama Arif
On Wed, Mar 04, 2026 at 04:01:32AM -0800, Usama Arif wrote:
> From: Usama Arif <usama.arif@linux.dev>
>
> migrate_vma_split_unmapped_folio() takes an extra reference via
> folio_get() before calling folio_split_unmapped(). On success, the
> split consumes this reference: __folio_freeze_and_split_unmapped()
> expects the +1 in its folio_ref_freeze() check, and distributes it
> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
> are later balanced by folio_put() calls in __migrate_device_finalize().
Without this explanation, the folio_get() looks very random. And I still
can't say I understand the reference management for the folios here.
Who takes the reference for a folio that is !THP and gets returned in
_finalize()?
Can we take the reference for THP and !THP at the same spot?
I think we should avoid special-casing THP where possible.
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-04 12:01 [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure Usama Arif
2026-03-04 14:00 ` Kiryl Shutsemau
@ 2026-03-04 15:17 ` Zi Yan
2026-03-04 21:48 ` Balbir Singh
2026-03-04 15:25 ` Joshua Hahn
2 siblings, 1 reply; 16+ messages in thread
From: Zi Yan @ 2026-03-04 15:17 UTC (permalink / raw)
To: Usama Arif, Balbir Singh
Cc: Andrew Morton, npache, david, linux-mm, matthew.brost,
joshua.hahnjy, hannes, rakie.kim, byungchul, gourry, ying.huang,
apopple, riel, shakeel.butt, kas, linux-kernel, kernel-team,
Usama Arif
On 4 Mar 2026, at 7:01, Usama Arif wrote:
> From: Usama Arif <usama.arif@linux.dev>
>
> migrate_vma_split_unmapped_folio() takes an extra reference via
> folio_get() before calling folio_split_unmapped(). On success, the
> split consumes this reference: __folio_freeze_and_split_unmapped()
> expects the +1 in its folio_ref_freeze() check, and distributes it
> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
> are later balanced by folio_put() calls in __migrate_device_finalize().
>
> If folio_split_unmapped() fails (e.g., unexpected pinning returns
> -EAGAIN), the function returns without calling folio_put(). The extra
> reference is never released.
>
> Add the missing folio_put() on the error path.
>
> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
> Reported-by: Nico Pache <npache@redhat.com>
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> ---
> mm/migrate_device.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 0a8b31939640f..351ecd9065d13 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
> folio_get(folio);
> split_huge_pmd_address(migrate->vma, addr, true);
> ret = folio_split_unmapped(folio, 0);
> - if (ret)
> + if (ret) {
> + folio_put(folio);
> return ret;
> + }
> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
> --
> 2.47.3
Add Balbir, who wrote the code, to comment on this.
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-04 12:01 [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure Usama Arif
2026-03-04 14:00 ` Kiryl Shutsemau
2026-03-04 15:17 ` Zi Yan
@ 2026-03-04 15:25 ` Joshua Hahn
2 siblings, 0 replies; 16+ messages in thread
From: Joshua Hahn @ 2026-03-04 15:25 UTC (permalink / raw)
To: Usama Arif
Cc: Andrew Morton, npache, david, ziy, linux-mm, matthew.brost,
joshua.hahnjy, hannes, rakie.kim, byungchul, gourry, ying.huang,
apopple, riel, shakeel.butt, kas, linux-kernel, kernel-team,
Usama Arif
On Wed, 4 Mar 2026 04:01:32 -0800 Usama Arif <usamaarif642@gmail.com> wrote:
> From: Usama Arif <usama.arif@linux.dev>
>
> migrate_vma_split_unmapped_folio() takes an extra reference via
> folio_get() before calling folio_split_unmapped(). On success, the
> split consumes this reference: __folio_freeze_and_split_unmapped()
> expects the +1 in its folio_ref_freeze() check, and distributes it
> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
> are later balanced by folio_put() calls in __migrate_device_finalize().
>
> If folio_split_unmapped() fails (e.g., unexpected pinning returns
> -EAGAIN), the function returns without calling folio_put(). The extra
> reference is never released.
>
> Add the missing folio_put() on the error path.
Agreed with Kiryl that maybe there is an opportunity to do additional
cleanup. But as a fix for the issue, I think that this patch looks good
to me. We can send a follow-up patch in the future if we want to
clean this area up :-)
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
> Reported-by: Nico Pache <npache@redhat.com>
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> ---
> mm/migrate_device.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 0a8b31939640f..351ecd9065d13 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
> folio_get(folio);
> split_huge_pmd_address(migrate->vma, addr, true);
> ret = folio_split_unmapped(folio, 0);
> - if (ret)
> + if (ret) {
> + folio_put(folio);
> return ret;
> + }
> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
> --
> 2.47.3
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-04 15:17 ` Zi Yan
@ 2026-03-04 21:48 ` Balbir Singh
2026-03-04 21:54 ` Zi Yan
0 siblings, 1 reply; 16+ messages in thread
From: Balbir Singh @ 2026-03-04 21:48 UTC (permalink / raw)
To: Zi Yan, Usama Arif
Cc: Andrew Morton, npache, david, linux-mm, matthew.brost,
joshua.hahnjy, hannes, rakie.kim, byungchul, gourry, ying.huang,
apopple, riel, shakeel.butt, kas, linux-kernel, kernel-team,
Usama Arif
On 3/5/26 02:17, Zi Yan wrote:
> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>
>> From: Usama Arif <usama.arif@linux.dev>
>>
>> migrate_vma_split_unmapped_folio() takes an extra reference via
>> folio_get() before calling folio_split_unmapped(). On success, the
>> split consumes this reference: __folio_freeze_and_split_unmapped()
>> expects the +1 in its folio_ref_freeze() check, and distributes it
>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>
>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>> -EAGAIN), the function returns without calling folio_put(). The extra
>> reference is never released.
>>
>> Add the missing folio_put() on the error path.
>>
>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>> Reported-by: Nico Pache <npache@redhat.com>
>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>> ---
>> mm/migrate_device.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>> index 0a8b31939640f..351ecd9065d13 100644
>> --- a/mm/migrate_device.c
>> +++ b/mm/migrate_device.c
>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>> folio_get(folio);
>> split_huge_pmd_address(migrate->vma, addr, true);
>> ret = folio_split_unmapped(folio, 0);
>> - if (ret)
>> + if (ret) {
>> + folio_put(folio);
>> return ret;
>> + }
>> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>> --
>> 2.47.3
>
> Add Balbir, who wrote the code, to comment on this.
>
Thanks Zi!
Just wondering if there is a reproducer for the issue, and how was the fix tested?
I expect migrate_vma_finalize() to be called for these folios even when the split
fails, and to drop the lock.
Balbir
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-04 21:48 ` Balbir Singh
@ 2026-03-04 21:54 ` Zi Yan
2026-03-04 22:02 ` Matthew Brost
2026-03-04 22:09 ` Balbir Singh
0 siblings, 2 replies; 16+ messages in thread
From: Zi Yan @ 2026-03-04 21:54 UTC (permalink / raw)
To: Balbir Singh
Cc: Usama Arif, Andrew Morton, npache, david, linux-mm,
matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
gourry, ying.huang, apopple, riel, shakeel.butt, kas,
linux-kernel, kernel-team, Usama Arif
On 4 Mar 2026, at 16:48, Balbir Singh wrote:
> On 3/5/26 02:17, Zi Yan wrote:
>> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>>
>>> From: Usama Arif <usama.arif@linux.dev>
>>>
>>> migrate_vma_split_unmapped_folio() takes an extra reference via
>>> folio_get() before calling folio_split_unmapped(). On success, the
>>> split consumes this reference: __folio_freeze_and_split_unmapped()
>>> expects the +1 in its folio_ref_freeze() check, and distributes it
>>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>>
>>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>>> -EAGAIN), the function returns without calling folio_put(). The extra
>>> reference is never released.
>>>
>>> Add the missing folio_put() on the error path.
>>>
>>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>> Reported-by: Nico Pache <npache@redhat.com>
>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>> ---
>>> mm/migrate_device.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>> index 0a8b31939640f..351ecd9065d13 100644
>>> --- a/mm/migrate_device.c
>>> +++ b/mm/migrate_device.c
>>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>> folio_get(folio);
>>> split_huge_pmd_address(migrate->vma, addr, true);
>>> ret = folio_split_unmapped(folio, 0);
>>> - if (ret)
>>> + if (ret) {
>>> + folio_put(folio);
>>> return ret;
>>> + }
>>> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>>> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>>> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>>> --
>>> 2.47.3
>>
>> Add Balbir, who wrote the code, to comment on this.
>>
>
> Thanks Zi!
>
> Just wondering if there is a reproducer for the issue and how the fix was tested?
> I expect migrate_vma_finalize() to be called for folios, even when split failed and
> drop the lock.
Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
If so, how does it distinguish between split folios and failed-to-split folios?
By comparing source and destination folio orders?
What we see in migrate_vma_split_unmapped_folio() is that
it adds a refcount to every input folio, but only drops a refcount
for folios that were split. Doesn't that leave failed-to-split folios
with an additional refcount?
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-04 21:54 ` Zi Yan
@ 2026-03-04 22:02 ` Matthew Brost
2026-03-04 22:09 ` Balbir Singh
1 sibling, 0 replies; 16+ messages in thread
From: Matthew Brost @ 2026-03-04 22:02 UTC (permalink / raw)
To: Zi Yan
Cc: Balbir Singh, Usama Arif, Andrew Morton, npache, david, linux-mm,
joshua.hahnjy, hannes, rakie.kim, byungchul, gourry, ying.huang,
apopple, riel, shakeel.butt, kas, linux-kernel, kernel-team,
Usama Arif
On Wed, Mar 04, 2026 at 04:54:01PM -0500, Zi Yan wrote:
> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>
> > On 3/5/26 02:17, Zi Yan wrote:
> >> On 4 Mar 2026, at 7:01, Usama Arif wrote:
> >>
> >>> From: Usama Arif <usama.arif@linux.dev>
> >>>
> >>> migrate_vma_split_unmapped_folio() takes an extra reference via
> >>> folio_get() before calling folio_split_unmapped(). On success, the
> >>> split consumes this reference: __folio_freeze_and_split_unmapped()
> >>> expects the +1 in its folio_ref_freeze() check, and distributes it
> >>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
> >>> are later balanced by folio_put() calls in __migrate_device_finalize().
> >>>
> >>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
> >>> -EAGAIN), the function returns without calling folio_put(). The extra
> >>> reference is never released.
> >>>
> >>> Add the missing folio_put() on the error path.
> >>>
> >>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
> >>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
> >>> Reported-by: Nico Pache <npache@redhat.com>
> >>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> >>> ---
> >>> mm/migrate_device.c | 4 +++-
> >>> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> >>> index 0a8b31939640f..351ecd9065d13 100644
> >>> --- a/mm/migrate_device.c
> >>> +++ b/mm/migrate_device.c
> >>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
> >>> folio_get(folio);
> >>> split_huge_pmd_address(migrate->vma, addr, true);
> >>> ret = folio_split_unmapped(folio, 0);
> >>> - if (ret)
> >>> + if (ret) {
> >>> + folio_put(folio);
> >>> return ret;
> >>> + }
> >>> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
> >>> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
> >>> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
> >>> --
> >>> 2.47.3
> >>
> >> Add Balbir, who wrote the code, to comment on this.
> >>
> >
> > Thanks Zi!
> >
> > Just wondering if there is a reproducer for the issue and how the fix was tested?
> > I expect migrate_vma_finalize() to be called for folios, even when split failed and
> > drop the lock.
>
> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
> If so, how does it distinguish between split folios and failed-to-split folios?
> By comparing source and destination folio orders?
>
> What we see from migrate_vma_split_unmapped_folio() is that
> it adds a refcount for all input folios, but only drops a refcount
> for the split folio. Isn’t it cause failed-to-split folios to have
> additional refcount?
I wonder if I’ve actually seen this bug. I’ve occasionally seen CPU page
faults hang forever spinning, which could be caused by the page’s
refcount accidentally being increased here. It’s quite difficult and
random to reproduce, so I don’t have a real analysis of what’s happening
in this case.
Matt
>
>
> Best Regards,
> Yan, Zi
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-04 21:54 ` Zi Yan
2026-03-04 22:02 ` Matthew Brost
@ 2026-03-04 22:09 ` Balbir Singh
2026-03-04 23:28 ` Usama Arif
1 sibling, 1 reply; 16+ messages in thread
From: Balbir Singh @ 2026-03-04 22:09 UTC (permalink / raw)
To: Zi Yan
Cc: Usama Arif, Andrew Morton, npache, david, linux-mm,
matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
gourry, ying.huang, apopple, riel, shakeel.butt, kas,
linux-kernel, kernel-team, Usama Arif
On 3/5/26 08:54, Zi Yan wrote:
> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>
>> On 3/5/26 02:17, Zi Yan wrote:
>>> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>>>
>>>> From: Usama Arif <usama.arif@linux.dev>
>>>>
>>>> migrate_vma_split_unmapped_folio() takes an extra reference via
>>>> folio_get() before calling folio_split_unmapped(). On success, the
>>>> split consumes this reference: __folio_freeze_and_split_unmapped()
>>>> expects the +1 in its folio_ref_freeze() check, and distributes it
>>>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>>>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>>>
>>>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>>>> -EAGAIN), the function returns without calling folio_put(). The extra
>>>> reference is never released.
>>>>
>>>> Add the missing folio_put() on the error path.
>>>>
>>>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>>>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>> Reported-by: Nico Pache <npache@redhat.com>
>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>> ---
>>>> mm/migrate_device.c | 4 +++-
>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>> index 0a8b31939640f..351ecd9065d13 100644
>>>> --- a/mm/migrate_device.c
>>>> +++ b/mm/migrate_device.c
>>>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>> folio_get(folio);
>>>> split_huge_pmd_address(migrate->vma, addr, true);
>>>> ret = folio_split_unmapped(folio, 0);
>>>> - if (ret)
>>>> + if (ret) {
>>>> + folio_put(folio);
>>>> return ret;
>>>> + }
>>>> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>>>> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>>>> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>>>> --
>>>> 2.47.3
>>>
>>> Add Balbir, who wrote the code, to comment on this.
>>>
>>
>> Thanks Zi!
>>
>> Just wondering if there is a reproducer for the issue and how the fix was tested?
>> I expect migrate_vma_finalize() to be called for folios, even when split failed and
>> drop the lock.
>
> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
> If so, how does it distinguish between split folios and failed-to-split folios?
> By comparing source and destination folio orders?
>
We reset the MIGRATE_PFN_MIGRATE flag for pfns that fail to migrate. In finalize we do
a folio_put() on the src and, if it was split, on all of the split folios as well.
> What we see from migrate_vma_split_unmapped_folio() is that
> it adds a refcount for all input folios, but only drops a refcount
> for the split folio. Isn’t it cause failed-to-split folios to have
> additional refcount?
>
Thanks! Yes, the patch makes sense
Acked-by: Balbir Singh <balbirs@nvidia.com>
Balbir
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-04 22:09 ` Balbir Singh
@ 2026-03-04 23:28 ` Usama Arif
2026-03-05 6:09 ` Mika Penttilä
0 siblings, 1 reply; 16+ messages in thread
From: Usama Arif @ 2026-03-04 23:28 UTC (permalink / raw)
To: Balbir Singh, Zi Yan, Kiryl Shutsemau, matthew.brost, npache, david
Cc: Usama Arif, Andrew Morton, linux-mm, joshua.hahnjy, hannes,
rakie.kim, byungchul, gourry, ying.huang, apopple, riel,
shakeel.butt, kas, linux-kernel, kernel-team
On 04/03/2026 22:09, Balbir Singh wrote:
> On 3/5/26 08:54, Zi Yan wrote:
>> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>>
>>> On 3/5/26 02:17, Zi Yan wrote:
>>>> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>>>>
>>>>> From: Usama Arif <usama.arif@linux.dev>
>>>>>
>>>>> migrate_vma_split_unmapped_folio() takes an extra reference via
>>>>> folio_get() before calling folio_split_unmapped(). On success, the
>>>>> split consumes this reference: __folio_freeze_and_split_unmapped()
>>>>> expects the +1 in its folio_ref_freeze() check, and distributes it
>>>>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>>>>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>>>>
>>>>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>>>>> -EAGAIN), the function returns without calling folio_put(). The extra
>>>>> reference is never released.
>>>>>
>>>>> Add the missing folio_put() on the error path.
>>>>>
>>>>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>>>>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>> Reported-by: Nico Pache <npache@redhat.com>
>>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>>> ---
>>>>> mm/migrate_device.c | 4 +++-
>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>> index 0a8b31939640f..351ecd9065d13 100644
>>>>> --- a/mm/migrate_device.c
>>>>> +++ b/mm/migrate_device.c
>>>>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>> folio_get(folio);
>>>>> split_huge_pmd_address(migrate->vma, addr, true);
>>>>> ret = folio_split_unmapped(folio, 0);
>>>>> - if (ret)
>>>>> + if (ret) {
>>>>> + folio_put(folio);
>>>>> return ret;
>>>>> + }
>>>>> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>>>>> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>>>>> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>>>>> --
>>>>> 2.47.3
>>>>
>>>> Add Balbir, who wrote the code, to comment on this.
>>>>
>>>
>>> Thanks Zi!
>>>
>>> Just wondering if there is a reproducer for the issue and how the fix was tested?
>>> I expect migrate_vma_finalize() to be called for folios, even when split failed and
>>> drop the lock.
>>
>> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
>> If so, how does it distinguish between split folios and failed-to-split folios?
>> By comparing source and destination folio orders?
>>
>
> We reset the MIGRATE_PFN_MIGRATE flag for failing to migrate pfns. We do a folio_put
> on the src in finalize, if it is split then on all the split folios as well.
>
>> What we see from migrate_vma_split_unmapped_folio() is that
>> it adds a refcount for all input folios, but only drops a refcount
>> for the split folio. Isn’t it cause failed-to-split folios to have
>> additional refcount?
>>
Hello!
Thanks for reviewing, everyone. It's very difficult to create a reproducer. I think
the extra reference would need to appear after migrate_device_unmap() but before
folio_split_unmapped() in migrate_vma_pages()? That's hard to trigger reliably from
userspace.
The fix came about when Nico indicated there might be an issue if split_huge_pmd_address()
fails in my patch [1].
Below is my step-by-step understanding of how the refcounting works here. I
might very well be wrong: the refcounting is a bit all over the place
and I may have missed a reference change somewhere, so I would really appreciate it
if someone could confirm this!
1. migrate_vma_collect_huge_pmd():
a) folio_get(folio) -> +1 (collect reference)
2. migrate_device_unmap():
a) folio_isolate_lru() -> +1 (isolation reference)
b) folio_put() -> -1 (drops the collect reference)
Without this patch:
3. migrate_vma_split_unmapped_folio():
a) folio_get(folio) -> +1 (split reference)
b) folio_split_unmapped() -> fails
c) Returns the error without calling folio_put() (adding it is the fix)
4. Caller in migrate_vma_pages(): clears MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND
5. __migrate_device_finalize(): sees !(src_pfns[i] & MIGRATE_PFN_MIGRATE), restores the folio:
a) remove_migration_ptes(src, src): re-establishes user PTEs
b) folio_unlock(src)
c) folio_put(src) -> -1 (drops the isolation reference)
The split reference from 3.a is never released, so the folio is left with a permanently elevated refcount.
Unless I missed a folio_put() somewhere for the refcount increase in folio_isolate_lru() (2.a)?
Please let me know if this makes sense!
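To make the counting in the walkthrough above easier to check, here is a
toy userspace sketch of it. This is not kernel code: struct toy_folio,
toy_get(), toy_put() and leaked_refs_on_split_failure() are made-up names,
and the counter tracks only the references taken and dropped on this
path, under the assumption that steps 1 to 5 are the complete picture.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio; counts only references on this path. */
struct toy_folio {
	int refcount;
};

static void toy_get(struct toy_folio *f)
{
	f->refcount++;
}

static void toy_put(struct toy_folio *f)
{
	assert(f->refcount > 0);	/* catch an unbalanced put */
	f->refcount--;
}

/*
 * Replays steps 1 to 5 for a folio whose split fails and returns the
 * number of references still held after finalize (0 means balanced).
 */
static int leaked_refs_on_split_failure(bool with_fix)
{
	struct toy_folio f = { .refcount = 0 };

	toy_get(&f);		/* 1.a collect reference */
	toy_get(&f);		/* 2.a isolation reference */
	toy_put(&f);		/* 2.b drop the collect reference */

	toy_get(&f);		/* 3.a split reference */
	/* 3.b the split fails, so nothing consumes the split reference */
	if (with_fix)
		toy_put(&f);	/* 3.c the folio_put() added by the patch */

	/* 4. the caller only clears flags, no refcount change */

	toy_put(&f);		/* 5.c finalize drops the isolation reference */

	return f.refcount;
}
```

Calling leaked_refs_on_split_failure(false) models the current code and
leaked_refs_on_split_failure(true) models the patched error path. If a
real put is missing from the steps, the model is wrong in the same way.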
[1] https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>
> Thanks! Yes, the patch makes sense
>
> Acked-by: Balbir Singh <balbirs@nvidia.com>
>
> Balbir
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-04 23:28 ` Usama Arif
@ 2026-03-05 6:09 ` Mika Penttilä
2026-03-05 11:44 ` Usama Arif
0 siblings, 1 reply; 16+ messages in thread
From: Mika Penttilä @ 2026-03-05 6:09 UTC (permalink / raw)
To: Usama Arif, Balbir Singh, Zi Yan, Kiryl Shutsemau, matthew.brost,
npache, david
Cc: Usama Arif, Andrew Morton, linux-mm, joshua.hahnjy, hannes,
rakie.kim, byungchul, gourry, ying.huang, apopple, riel,
shakeel.butt, linux-kernel, kernel-team
Hi!
On 3/5/26 01:28, Usama Arif wrote:
>
> On 04/03/2026 22:09, Balbir Singh wrote:
>> On 3/5/26 08:54, Zi Yan wrote:
>>> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>>>
>>>> On 3/5/26 02:17, Zi Yan wrote:
>>>>> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>>>>>
>>>>>> From: Usama Arif <usama.arif@linux.dev>
>>>>>>
>>>>>> migrate_vma_split_unmapped_folio() takes an extra reference via
>>>>>> folio_get() before calling folio_split_unmapped(). On success, the
>>>>>> split consumes this reference: __folio_freeze_and_split_unmapped()
>>>>>> expects the +1 in its folio_ref_freeze() check, and distributes it
>>>>>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>>>>>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>>>>>
>>>>>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>>>>>> -EAGAIN), the function returns without calling folio_put(). The extra
>>>>>> reference is never released.
>>>>>>
>>>>>> Add the missing folio_put() on the error path.
>>>>>>
>>>>>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>>>>>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>>> Reported-by: Nico Pache <npache@redhat.com>
>>>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>>>> ---
>>>>>> mm/migrate_device.c | 4 +++-
>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>>> index 0a8b31939640f..351ecd9065d13 100644
>>>>>> --- a/mm/migrate_device.c
>>>>>> +++ b/mm/migrate_device.c
>>>>>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>>> folio_get(folio);
>>>>>> split_huge_pmd_address(migrate->vma, addr, true);
>>>>>> ret = folio_split_unmapped(folio, 0);
>>>>>> - if (ret)
>>>>>> + if (ret) {
>>>>>> + folio_put(folio);
>>>>>> return ret;
>>>>>> + }
>>>>>> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>>>>>> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>>>>>> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>>>>>> --
>>>>>> 2.47.3
>>>>> Add Balbir, who wrote the code, to comment on this.
>>>>>
>>>> Thanks Zi!
>>>>
>>>> Just wondering if there is a reproducer for the issue and how the fix was tested?
>>>> I expect migrate_vma_finalize() to be called for folios, even when split failed and
>>>> drop the lock.
>>> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
>>> If so, how does it distinguish between split folios and failed-to-split folios?
>>> By comparing source and destination folio orders?
>>>
>> We reset the MIGRATE_PFN_MIGRATE flag for failing to migrate pfns. We do a folio_put
>> on the src in finalize, if it is split then on all the split folios as well.
>>
>>> What we see from migrate_vma_split_unmapped_folio() is that
>>> it adds a refcount for all input folios, but only drops a refcount
>>> for the split folio. Isn’t it cause failed-to-split folios to have
>>> additional refcount?
>>>
> Hello!
>
> Thanks for reviewing everyone. So its very difficult to create a reproducer I think
> the extra reference would need to appear after migrate_device_unmap() but before
> folio_split_unmapped() in migrate_vma_pages()? That's hard to trigger reliably from
> userspace.
>
> The fix came about when Nico indicated there might be an issue if split_huge_pmd_address
> fails in my patch [1].
>
> Below is my understanding of how refcounting is working over here step by step. I
> might very well be wrong on this, and the refcounting is a bit all over the place
> and I might miss a reference change somewhere so would really appreciate if someone
> can confirm this!
>
>
> 1. migrate_vma_collect_huge_pmd():
> a) folio_get(folio) -> +1 (collect reference)
> 2. migrate_device_unmap():
> a) folio_isolate_lru() -> +1 (isolation reference)
> b) folio_put() -> -1 (drops the collect reference)
>
>
> Without this patch fix:
>
> 3. migrate_vma_split_unmapped_folio():
> a) folio_get(folio) -> +1 (split reference)
> b) folio_split_unmapped() -> fails
> c) Returns error — without folio_put() which is the fix
> 4. Caller in migrate_vma_pages(): clears MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND
> 5. __migrate_device_finalize(): sees !(src_pfns[i] & MIGRATE_PFN_MIGRATE), restores the folio:
> a) remove_migration_ptes(src, src) — re-establishes user PTEs
> b) folio_unlock(src)
> c) folio_put(src) -> -1 (drops the isolation reference)
>
> The split reference in 3.a is never released and the folio has a permanently elevated refcount.
> Unless I missed a folio_put somewhere for the refcount increase in folio_isolate_lru() (2.a)?
>
> Please let me know if this makes sense!
>
> [1] https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>
>> Thanks! Yes, the patch makes sense
>>
>> Acked-by: Balbir Singh <balbirs@nvidia.com>
>>
>> Balbir
I remember stumbling on this a while ago too. The folio_get() in migrate_vma_split_unmapped_folio()
is balanced by the put_page() in __split_huge_pmd_locked() (freeze = true), which can't fail for device pages.
Folios at this point are unmapped but hold 1 refcount from "collecting".
After folio_split_unmapped() the refcount(s) is still 1.

So it seems the code is good as is? A comment explaining the extra folio_get() would be good, though.

Thanks,
--Mika
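
The balance described above can be sketched as a toy userspace model. This is only an illustration under stated assumptions, not kernel code: struct toy_folio, toy_get(), toy_put(), toy_split_step() and toy_refcount_after_failed_split() are hypothetical names, and only the two refcount changes named in the message are modeled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a folio; only the refcount is modeled. */
struct toy_folio {
	int refcount;
};

static void toy_get(struct toy_folio *f)
{
	f->refcount++;
}

static void toy_put(struct toy_folio *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/*
 * Models the two refcount changes named in the message. split_ok only
 * selects the return value; the count is the same on both paths.
 */
static int toy_split_step(struct toy_folio *f, int split_ok)
{
	toy_get(f);	/* folio_get() in migrate_vma_split_unmapped_folio() */
	toy_put(f);	/* put_page() in __split_huge_pmd_locked(), freeze = true */
	return split_ok ? 0 : -EAGAIN;
}

/* Returns the refcount left after a failed split, given the starting count. */
static int toy_refcount_after_failed_split(int start)
{
	struct toy_folio f = { .refcount = start };

	(void)toy_split_step(&f, 0);
	return f.refcount;
}
```

Starting from the single "collecting" reference, toy_refcount_after_failed_split(1) returns 1, so in this model there is no leftover reference for an error path to drop.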
* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-05 6:09 ` Mika Penttilä
@ 2026-03-05 11:44 ` Usama Arif
2026-03-05 12:09 ` Mika Penttilä
0 siblings, 1 reply; 16+ messages in thread
From: Usama Arif @ 2026-03-05 11:44 UTC (permalink / raw)
To: Mika Penttilä,
Balbir Singh, Zi Yan, Kiryl Shutsemau, matthew.brost, npache,
david
Cc: Usama Arif, Andrew Morton, linux-mm, joshua.hahnjy, hannes,
rakie.kim, byungchul, gourry, ying.huang, apopple, riel,
shakeel.butt, linux-kernel, kernel-team
On 05/03/2026 06:09, Mika Penttilä wrote:
> Hi!
>
> On 3/5/26 01:28, Usama Arif wrote:
>
>>
>> On 04/03/2026 22:09, Balbir Singh wrote:
>>> On 3/5/26 08:54, Zi Yan wrote:
>>>> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>>>>
>>>>> On 3/5/26 02:17, Zi Yan wrote:
>>>>>> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>>>>>>
>>>>>>> From: Usama Arif <usama.arif@linux.dev>
>>>>>>>
>>>>>>> migrate_vma_split_unmapped_folio() takes an extra reference via
>>>>>>> folio_get() before calling folio_split_unmapped(). On success, the
>>>>>>> split consumes this reference: __folio_freeze_and_split_unmapped()
>>>>>>> expects the +1 in its folio_ref_freeze() check, and distributes it
>>>>>>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>>>>>>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>>>>>>
>>>>>>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>>>>>>> -EAGAIN), the function returns without calling folio_put(). The extra
>>>>>>> reference is never released.
>>>>>>>
>>>>>>> Add the missing folio_put() on the error path.
>>>>>>>
>>>>>>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>>>>>>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>>>> Reported-by: Nico Pache <npache@redhat.com>
>>>>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>>>>> ---
>>>>>>> mm/migrate_device.c | 4 +++-
>>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>>>> index 0a8b31939640f..351ecd9065d13 100644
>>>>>>> --- a/mm/migrate_device.c
>>>>>>> +++ b/mm/migrate_device.c
>>>>>>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>>>> folio_get(folio);
>>>>>>> split_huge_pmd_address(migrate->vma, addr, true);
>>>>>>> ret = folio_split_unmapped(folio, 0);
>>>>>>> - if (ret)
>>>>>>> + if (ret) {
>>>>>>> + folio_put(folio);
>>>>>>> return ret;
>>>>>>> + }
>>>>>>> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>>>>>>> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>>>>>>> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>>>>>>> --
>>>>>>> 2.47.3
>>>>>> Add Balbir, who wrote the code, to comment on this.
>>>>>>
>>>>> Thanks Zi!
>>>>>
>>>>> Just wondering if there is a reproducer for the issue and how the fix was tested?
>>>>> I expect migrate_vma_finalize() to be called for folios, even when split failed and
>>>>> drop the lock.
>>>> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
>>>> If so, how does it distinguish between split folios and failed-to-split folios?
>>>> By comparing source and destination folio orders?
>>>>
>>> We reset the MIGRATE_PFN_MIGRATE flag for failing to migrate pfns. We do a folio_put
>>> on the src in finalize, if it is split then on all the split folios as well.
>>>
>>>> What we see from migrate_vma_split_unmapped_folio() is that
>>>> it adds a refcount for all input folios, but only drops a refcount
>>>> for the split folio. Isn’t it cause failed-to-split folios to have
>>>> additional refcount?
>>>>
>> Hello!
>>
>> Thanks for reviewing everyone. So its very difficult to create a reproducer I think
>> the extra reference would need to appear after migrate_device_unmap() but before
>> folio_split_unmapped() in migrate_vma_pages()? That's hard to trigger reliably from
>> userspace.
>>
>> The fix came about when Nico indicated there might be an issue if split_huge_pmd_address
>> fails in my patch [1].
>>
>> Below is my understanding of how refcounting is working over here step by step. I
>> might very well be wrong on this, and the refcounting is a bit all over the place
>> and I might miss a reference change somewhere so would really appreciate if someone
>> can confirm this!
>>
>>
>> 1. migrate_vma_collect_huge_pmd():
>> a) folio_get(folio) -> +1 (collect reference)
>> 2. migrate_device_unmap():
>> a) folio_isolate_lru() -> +1 (isolation reference)
>> b) folio_put() -> -1 (drops the collect reference)
>>
>>
>> Without this patch fix:
>>
>> 3. migrate_vma_split_unmapped_folio():
>> a) folio_get(folio) -> +1 (split reference)
>> b) folio_split_unmapped() -> fails
>> c) Returns error — without folio_put() which is the fix
>> 4. Caller in migrate_vma_pages(): clears MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND
>> 5. __migrate_device_finalize(): sees !(src_pfns[i] & MIGRATE_PFN_MIGRATE), restores the folio:
>> a) remove_migration_ptes(src, src) — re-establishes user PTEs
>> b) folio_unlock(src)
>> c) folio_put(src) -> -1 (drops the isolation reference)
>>
>> The split reference in 3.a is never released and the folio has a permanently elevated refcount.
>> Unless I missed a folio_put somewhere for the refcount increase in folio_isolate_lru() (2.a)?
>>
>> Please let me know if this makes sense!
>>
>> [1] https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>
>>> Thanks! Yes, the patch makes sense
>>>
>>> Acked-by: Balbir Singh <balbirs@nvidia.com>
>>>
>>> Balbir
>
> I remember stumbling on this while ago also. The folio_get() in migrate_vma_split_unmapped_folio()
> is balanced with put_page() in __split_huge_pmd_locked() (freeze = true), can't fail for device pages.
> Folios at this point are unmapped but have 1 refcount from "collecting".
> After folio_split_unmapped() the refcount(s) is still 1.
>
> So it seems the code is good as is? A comment though would be good for the extra folio_get..
>
Hmm, I don't think the put_page() in __split_huge_pmd_locked() is there to balance the folio_get() in
migrate_vma_split_unmapped_folio(). There are other places where split_huge_pmd_locked() is called
with freeze = true [1], and they don't take a reference before calling split_huge_pmd.

I think the put_page() in the freeze = true case of __split_huge_pmd_locked() is there because migration
entries are being installed?

[1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/rmap.c#L2334

* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-05 11:44 ` Usama Arif
@ 2026-03-05 12:09 ` Mika Penttilä
2026-03-05 16:36 ` Usama Arif
0 siblings, 1 reply; 16+ messages in thread
From: Mika Penttilä @ 2026-03-05 12:09 UTC (permalink / raw)
To: Usama Arif, Balbir Singh, Zi Yan, Kiryl Shutsemau, matthew.brost,
npache, david
Cc: Usama Arif, Andrew Morton, linux-mm, joshua.hahnjy, hannes,
rakie.kim, byungchul, gourry, ying.huang, apopple, riel,
shakeel.butt, linux-kernel, kernel-team
On 3/5/26 13:44, Usama Arif wrote:
>
> On 05/03/2026 06:09, Mika Penttilä wrote:
>> Hi!
>>
>> On 3/5/26 01:28, Usama Arif wrote:
>>
>>> On 04/03/2026 22:09, Balbir Singh wrote:
>>>> On 3/5/26 08:54, Zi Yan wrote:
>>>>> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>>>>>
>>>>>> On 3/5/26 02:17, Zi Yan wrote:
>>>>>>> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>>>>>>>
>>>>>>>> From: Usama Arif <usama.arif@linux.dev>
>>>>>>>>
>>>>>>>> migrate_vma_split_unmapped_folio() takes an extra reference via
>>>>>>>> folio_get() before calling folio_split_unmapped(). On success, the
>>>>>>>> split consumes this reference: __folio_freeze_and_split_unmapped()
>>>>>>>> expects the +1 in its folio_ref_freeze() check, and distributes it
>>>>>>>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>>>>>>>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>>>>>>>
>>>>>>>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>>>>>>>> -EAGAIN), the function returns without calling folio_put(). The extra
>>>>>>>> reference is never released.
>>>>>>>>
>>>>>>>> Add the missing folio_put() on the error path.
>>>>>>>>
>>>>>>>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>>>>>>>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>>>>> Reported-by: Nico Pache <npache@redhat.com>
>>>>>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>>>>>> ---
>>>>>>>> mm/migrate_device.c | 4 +++-
>>>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>>>>> index 0a8b31939640f..351ecd9065d13 100644
>>>>>>>> --- a/mm/migrate_device.c
>>>>>>>> +++ b/mm/migrate_device.c
>>>>>>>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>>>>> folio_get(folio);
>>>>>>>> split_huge_pmd_address(migrate->vma, addr, true);
>>>>>>>> ret = folio_split_unmapped(folio, 0);
>>>>>>>> - if (ret)
>>>>>>>> + if (ret) {
>>>>>>>> + folio_put(folio);
>>>>>>>> return ret;
>>>>>>>> + }
>>>>>>>> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>>>>>>>> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>>>>>>>> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>>>>>>>> --
>>>>>>>> 2.47.3
>>>>>>> Add Balbir, who wrote the code, to comment on this.
>>>>>>>
>>>>>> Thanks Zi!
>>>>>>
>>>>>> Just wondering if there is a reproducer for the issue and how the fix was tested?
>>>>>> I expect migrate_vma_finalize() to be called for folios, even when split failed and
>>>>>> drop the lock.
>>>>> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
>>>>> If so, how does it distinguish between split folios and failed-to-split folios?
>>>>> By comparing source and destination folio orders?
>>>>>
>>>> We reset the MIGRATE_PFN_MIGRATE flag for failing to migrate pfns. We do a folio_put
>>>> on the src in finalize, if it is split then on all the split folios as well.
>>>>
>>>>> What we see from migrate_vma_split_unmapped_folio() is that
>>>>> it adds a refcount for all input folios, but only drops a refcount
>>>>> for the split folio. Isn’t it cause failed-to-split folios to have
>>>>> additional refcount?
>>>>>
>>> Hello!
>>>
>>> Thanks for reviewing everyone. So its very difficult to create a reproducer I think
>>> the extra reference would need to appear after migrate_device_unmap() but before
>>> folio_split_unmapped() in migrate_vma_pages()? That's hard to trigger reliably from
>>> userspace.
>>>
>>> The fix came about when Nico indicated there might be an issue if split_huge_pmd_address
>>> fails in my patch [1].
>>>
>>> Below is my understanding of how refcounting is working over here step by step. I
>>> might very well be wrong on this, and the refcounting is a bit all over the place
>>> and I might miss a reference change somewhere so would really appreciate if someone
>>> can confirm this!
>>>
>>>
>>> 1. migrate_vma_collect_huge_pmd():
>>> a) folio_get(folio) -> +1 (collect reference)
>>> 2. migrate_device_unmap():
>>> a) folio_isolate_lru() -> +1 (isolation reference)
>>> b) folio_put() -> -1 (drops the collect reference)
>>>
>>>
>>> Without this patch fix:
>>>
>>> 3. migrate_vma_split_unmapped_folio():
>>> a) folio_get(folio) -> +1 (split reference)
>>> b) folio_split_unmapped() -> fails
>>> c) Returns error — without folio_put() which is the fix
>>> 4. Caller in migrate_vma_pages(): clears MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND
>>> 5. __migrate_device_finalize(): sees !(src_pfns[i] & MIGRATE_PFN_MIGRATE), restores the folio:
>>> a) remove_migration_ptes(src, src) — re-establishes user PTEs
>>> b) folio_unlock(src)
>>> c) folio_put(src) -> -1 (drops the isolation reference)
>>>
>>> The split reference in 3.a is never released and the folio has a permanently elevated refcount.
>>> Unless I missed a folio_put somewhere for the refcount increase in folio_isolate_lru() (2.a)?
>>>
>>> Please let me know if this makes sense!
>>>
>>> [1] https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>
>>>> Thanks! Yes, the patch makes sense
>>>>
>>>> Acked-by: Balbir Singh <balbirs@nvidia.com>
>>>>
>>>> Balbir
>> I remember stumbling on this while ago also. The folio_get() in migrate_vma_split_unmapped_folio()
>> is balanced with put_page() in __split_huge_pmd_locked() (freeze = true), can't fail for device pages.
>> Folios at this point are unmapped but have 1 refcount from "collecting".
>> After folio_split_unmapped() the refcount(s) is still 1.
>>
>> So it seems the code is good as is? A comment though would be good for the extra folio_get..
>>
> hmm I dont think the put_page() in __split_huge_pmd_locked() is there to balance the folio_get() in
> migrate_vma_split_unmapped_folio(). There are other points where split_huge_pmd_locked() is called
> with freeze = true [1] and they don't get a reference before calling split_huge_pmd.
>
> I think the folio_put() in __split_huge_pmd_locked() freeze = true case is there as migration
> entries are being installed?
>
> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/rmap.c#L2334
>
>
Yes, normally you want to drop the reference when installing migration entries, but in this context
you have already done the collecting for the THP folio, so you want the folio_get() to balance
the put_page() and keep the refs unchanged. Is that right, Balbir?

--Mika
* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-05 12:09 ` Mika Penttilä
@ 2026-03-05 16:36 ` Usama Arif
2026-03-05 16:39 ` Zi Yan
0 siblings, 1 reply; 16+ messages in thread
From: Usama Arif @ 2026-03-05 16:36 UTC (permalink / raw)
To: Mika Penttilä,
Balbir Singh, Zi Yan, Kiryl Shutsemau, matthew.brost, npache,
david
Cc: Usama Arif, Andrew Morton, linux-mm, joshua.hahnjy, hannes,
rakie.kim, byungchul, gourry, ying.huang, apopple, riel,
shakeel.butt, linux-kernel, kernel-team
On 05/03/2026 12:09, Mika Penttilä wrote:
> On 3/5/26 13:44, Usama Arif wrote:
>
>>
>> On 05/03/2026 06:09, Mika Penttilä wrote:
>>> Hi!
>>>
>>> On 3/5/26 01:28, Usama Arif wrote:
>>>
>>>> On 04/03/2026 22:09, Balbir Singh wrote:
>>>>> On 3/5/26 08:54, Zi Yan wrote:
>>>>>> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>>>>>>
>>>>>>> On 3/5/26 02:17, Zi Yan wrote:
>>>>>>>> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>>>>>>>>
>>>>>>>>> From: Usama Arif <usama.arif@linux.dev>
>>>>>>>>>
>>>>>>>>> migrate_vma_split_unmapped_folio() takes an extra reference via
>>>>>>>>> folio_get() before calling folio_split_unmapped(). On success, the
>>>>>>>>> split consumes this reference: __folio_freeze_and_split_unmapped()
>>>>>>>>> expects the +1 in its folio_ref_freeze() check, and distributes it
>>>>>>>>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>>>>>>>>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>>>>>>>>
>>>>>>>>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>>>>>>>>> -EAGAIN), the function returns without calling folio_put(). The extra
>>>>>>>>> reference is never released.
>>>>>>>>>
>>>>>>>>> Add the missing folio_put() on the error path.
>>>>>>>>>
>>>>>>>>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>>>>>>>>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>>>>>> Reported-by: Nico Pache <npache@redhat.com>
>>>>>>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>>>>>>> ---
>>>>>>>>> mm/migrate_device.c | 4 +++-
>>>>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>>>>>> index 0a8b31939640f..351ecd9065d13 100644
>>>>>>>>> --- a/mm/migrate_device.c
>>>>>>>>> +++ b/mm/migrate_device.c
>>>>>>>>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>>>>>> folio_get(folio);
>>>>>>>>> split_huge_pmd_address(migrate->vma, addr, true);
>>>>>>>>> ret = folio_split_unmapped(folio, 0);
>>>>>>>>> - if (ret)
>>>>>>>>> + if (ret) {
>>>>>>>>> + folio_put(folio);
>>>>>>>>> return ret;
>>>>>>>>> + }
>>>>>>>>> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>>>>>>>>> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>>>>>>>>> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>>>>>>>>> --
>>>>>>>>> 2.47.3
>>>>>>>> Add Balbir, who wrote the code, to comment on this.
>>>>>>>>
>>>>>>> Thanks Zi!
>>>>>>>
>>>>>>> Just wondering if there is a reproducer for the issue and how the fix was tested?
>>>>>>> I expect migrate_vma_finalize() to be called for folios, even when split failed and
>>>>>>> drop the lock.
>>>>>> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
>>>>>> If so, how does it distinguish between split folios and failed-to-split folios?
>>>>>> By comparing source and destination folio orders?
>>>>>>
>>>>> We reset the MIGRATE_PFN_MIGRATE flag for failing to migrate pfns. We do a folio_put
>>>>> on the src in finalize, if it is split then on all the split folios as well.
>>>>>
>>>>>> What we see from migrate_vma_split_unmapped_folio() is that
>>>>>> it adds a refcount for all input folios, but only drops a refcount
>>>>>> for the split folio. Isn’t it cause failed-to-split folios to have
>>>>>> additional refcount?
>>>>>>
>>>> Hello!
>>>>
>>>> Thanks for reviewing everyone. So its very difficult to create a reproducer I think
>>>> the extra reference would need to appear after migrate_device_unmap() but before
>>>> folio_split_unmapped() in migrate_vma_pages()? That's hard to trigger reliably from
>>>> userspace.
>>>>
>>>> The fix came about when Nico indicated there might be an issue if split_huge_pmd_address
>>>> fails in my patch [1].
>>>>
>>>> Below is my understanding of how refcounting is working over here step by step. I
>>>> might very well be wrong on this, and the refcounting is a bit all over the place
>>>> and I might miss a reference change somewhere so would really appreciate if someone
>>>> can confirm this!
>>>>
>>>>
>>>> 1. migrate_vma_collect_huge_pmd():
>>>> a) folio_get(folio) -> +1 (collect reference)
>>>> 2. migrate_device_unmap():
>>>> a) folio_isolate_lru() -> +1 (isolation reference)
>>>> b) folio_put() -> -1 (drops the collect reference)
>>>>
>>>>
>>>> Without this patch fix:
>>>>
>>>> 3. migrate_vma_split_unmapped_folio():
>>>> a) folio_get(folio) -> +1 (split reference)
>>>> b) folio_split_unmapped() -> fails
>>>> c) Returns error — without folio_put() which is the fix
>>>> 4. Caller in migrate_vma_pages(): clears MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND
>>>> 5. __migrate_device_finalize(): sees !(src_pfns[i] & MIGRATE_PFN_MIGRATE), restores the folio:
>>>> a) remove_migration_ptes(src, src) — re-establishes user PTEs
>>>> b) folio_unlock(src)
>>>> c) folio_put(src) -> -1 (drops the isolation reference)
>>>>
>>>> The split reference in 3.a is never released and the folio has a permanently elevated refcount.
>>>> Unless I missed a folio_put somewhere for the refcount increase in folio_isolate_lru() (2.a)?
>>>>
>>>> Please let me know if this makes sense!
>>>>
>>>> [1] https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>
>>>>> Thanks! Yes, the patch makes sense
>>>>>
>>>>> Acked-by: Balbir Singh <balbirs@nvidia.com>
>>>>>
>>>>> Balbir
>>> I remember stumbling on this while ago also. The folio_get() in migrate_vma_split_unmapped_folio()
>>> is balanced with put_page() in __split_huge_pmd_locked() (freeze = true), can't fail for device pages.
>>> Folios at this point are unmapped but have 1 refcount from "collecting".
>>> After folio_split_unmapped() the refcount(s) is still 1.
>>>
>>> So it seems the code is good as is? A comment though would be good for the extra folio_get..
>>>
>> hmm I dont think the put_page() in __split_huge_pmd_locked() is there to balance the folio_get() in
>> migrate_vma_split_unmapped_folio(). There are other points where split_huge_pmd_locked() is called
>> with freeze = true [1] and they don't get a reference before calling split_huge_pmd.
>>
>> I think the folio_put() in __split_huge_pmd_locked() freeze = true case is there as migration
>> entries are being installed?
>>
>> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/rmap.c#L2334
>>
>>
> Yes normally you want to drop the reference when installing migration entries but in this context
> you have already done the collecting for the THP folio and you want to balance with the folio_get()
> the put_page() to keep the refs unchanged. Is that right Balbir?
>
> --Mika
>
Hi Mika,

You are right, this patch is wrong. I tried the diff below to force folio_split_unmapped() to return
-EAGAIN, and ran tools/testing/selftests/mm/hmm-tests -r hmm.hmm_device_private.migrate_anon_huge_err
to exercise the folio_split_unmapped() path.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8e2746ea74adf..6df33b4990a13 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4140,6 +4140,8 @@ int folio_split_unmapped(struct folio *folio, unsigned int new_order)
if (folio_expected_ref_count(folio) != folio_ref_count(folio) - 1)
return -EAGAIN;

+ return -EAGAIN;
+
local_irq_disable();
ret = __folio_freeze_and_split_unmapped(folio, new_order, &folio->page, NULL,
NULL, false, NULL, SPLIT_TYPE_UNIFORM,

I inserted a lot of traces to keep track of the refcounts [1]. Without this patch, I get:
....
hmm-tests-129 [000] ..... 1.476233: __migrate_device_pages: SPLIT_UNMAPPED: folio=ffc536e2c4100000 refcount=0 AFTER error NO folio_put
hmm-tests-129 [000] ..... 1.476234: __migrate_device_pages: PAGES: split FAILED folio=ffc536e2c4100000 refcount=0
hmm-tests-129 [000] ..... 1.476236: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 dst=ffc536e2c4100000 src==dst=1 refcount_src=1 mapcount_src=0 order_src=0 migrate=0 BEFORE remove_migration_ptes
hmm-tests-129 [000] ..... 1.476237: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 refcount=1 mapcount=0 AFTER remove_migration_ptes
hmm-tests-129 [000] ..... 1.476237: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 refcount=0 AFTER folio_put(src)

i.e. refcount = 512, which is correct, as split_huge_pmd_address() succeeded. The full output is
at [2].

With this patch, I get:

BUG: Bad rss-counter state mm:00000000cfe88d5e type:MM_FILEPAGES val:-511 Comm:bash Pid:63
BUG: Bad rss-counter state mm:00000000cfe88d5e type:MM_ANONPAGES val:511 Comm:bash Pid:63
...
hmm-tests-129 [000] ..... 1.468315: __migrate_device_pages: SPLIT_UNMAPPED: folio=ffed210c840f0000 refcount=1 AFTER error folio_put FIX PRESENT
hmm-tests-129 [000] ..... 1.468315: __migrate_device_pages: PAGES: split FAILED folio=ffed210c840f0000 refcount=1
hmm-tests-129 [000] ..... 1.468318: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 dst=ffed210c840f0000 src==dst=1 refcount_src=1 mapcount_src=0 order_src=9 migrate=0 BEFORE remove_migration_ptes
hmm-tests-129 [000] ..... 1.468357: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 refcount=513 mapcount=512 AFTER remove_migration_ptes
hmm-tests-129 [000] ..... 1.468357: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 refcount=512 AFTER folio_put(src)

refcount=0 means the folio would be freed, which is not correct. The full output is at [3].

Thank you for clearing this up!

[1] https://gist.github.com/uarif1/65e1e816af7aa0ae38dd6ec64d62a993
[2] https://gist.github.com/uarif1/79ea9500667daa4e2ef09cb5d308f041
[3] https://gist.github.com/uarif1/8a35a6c65ba8b3a1c1dfe72dc30e821d
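
The arithmetic behind the two outcomes in this message (512 versus 0) can be replayed with a toy userspace model. It is a sketch under stated assumptions, not the kernel's logic: toy_forced_failure() and TOY_NR_PTES are hypothetical names, and the model assumes that restoring the migration entries takes one reference per remapped PTE, which is how refcount=513 with mapcount=512 in the trace is read here.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed number of PTEs covering one PMD-sized THP (4 KiB base pages). */
#define TOY_NR_PTES 512

/*
 * Toy replay of the forced -EAGAIN run. Returns the refcount after the
 * modeled finalize step, or -1 if the count reached zero before finalize,
 * meaning the folio would already have been freed.
 */
static int toy_forced_failure(bool extra_put_on_error)
{
	int refcount = 1;		/* reference held going into the split */

	refcount++;			/* folio_get() before the PMD split */
	refcount--;			/* put_page() in __split_huge_pmd_locked() */

	/* folio_split_unmapped() is forced to return -EAGAIN at this point. */
	if (extra_put_on_error)
		refcount--;		/* the folio_put() added by this patch */

	if (refcount == 0)
		return -1;		/* freed while still expected to be restored */

	refcount += TOY_NR_PTES;	/* remove_migration_ptes() remaps the PTEs */
	refcount--;			/* folio_put(src) in finalize */
	return refcount;
}
```

In this model toy_forced_failure(false) returns 512 and toy_forced_failure(true) returns -1, matching the conclusion above that the extra folio_put() on the error path drops a reference that was already balanced.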
* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-05 16:36 ` Usama Arif
@ 2026-03-05 16:39 ` Zi Yan
2026-03-05 17:00 ` Usama Arif
0 siblings, 1 reply; 16+ messages in thread
From: Zi Yan @ 2026-03-05 16:39 UTC (permalink / raw)
To: Usama Arif
Cc: Mika Penttilä,
Balbir Singh, Kiryl Shutsemau, matthew.brost, npache, david,
Usama Arif, Andrew Morton, linux-mm, joshua.hahnjy, hannes,
rakie.kim, byungchul, gourry, ying.huang, apopple, riel,
shakeel.butt, linux-kernel, kernel-team
On 5 Mar 2026, at 11:36, Usama Arif wrote:
> On 05/03/2026 12:09, Mika Penttilä wrote:
>> On 3/5/26 13:44, Usama Arif wrote:
>>
>>>
>>> On 05/03/2026 06:09, Mika Penttilä wrote:
>>>> Hi!
>>>>
>>>> On 3/5/26 01:28, Usama Arif wrote:
>>>>
>>>>> On 04/03/2026 22:09, Balbir Singh wrote:
>>>>>> On 3/5/26 08:54, Zi Yan wrote:
>>>>>>> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>>>>>>>
>>>>>>>> On 3/5/26 02:17, Zi Yan wrote:
>>>>>>>>> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>>>>>>>>>
>>>>>>>>>> From: Usama Arif <usama.arif@linux.dev>
>>>>>>>>>>
>>>>>>>>>> migrate_vma_split_unmapped_folio() takes an extra reference via
>>>>>>>>>> folio_get() before calling folio_split_unmapped(). On success, the
>>>>>>>>>> split consumes this reference: __folio_freeze_and_split_unmapped()
>>>>>>>>>> expects the +1 in its folio_ref_freeze() check, and distributes it
>>>>>>>>>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>>>>>>>>>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>>>>>>>>>
>>>>>>>>>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>>>>>>>>>> -EAGAIN), the function returns without calling folio_put(). The extra
>>>>>>>>>> reference is never released.
>>>>>>>>>>
>>>>>>>>>> Add the missing folio_put() on the error path.
>>>>>>>>>>
>>>>>>>>>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>>>>>>>>>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>>>>>>> Reported-by: Nico Pache <npache@redhat.com>
>>>>>>>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>>>>>>>> ---
>>>>>>>>>> mm/migrate_device.c | 4 +++-
>>>>>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>>>>>>> index 0a8b31939640f..351ecd9065d13 100644
>>>>>>>>>> --- a/mm/migrate_device.c
>>>>>>>>>> +++ b/mm/migrate_device.c
>>>>>>>>>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>>>>>>> folio_get(folio);
>>>>>>>>>> split_huge_pmd_address(migrate->vma, addr, true);
>>>>>>>>>> ret = folio_split_unmapped(folio, 0);
>>>>>>>>>> - if (ret)
>>>>>>>>>> + if (ret) {
>>>>>>>>>> + folio_put(folio);
>>>>>>>>>> return ret;
>>>>>>>>>> + }
>>>>>>>>>> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>>>>>>>>>> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>>>>>>>>>> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>>>>>>>>>> --
>>>>>>>>>> 2.47.3
>>>>>>>>> Add Balbir, who wrote the code, to comment on this.
>>>>>>>>>
>>>>>>>> Thanks Zi!
>>>>>>>>
>>>>>>>> Just wondering if there is a reproducer for the issue and how the fix was tested?
>>>>>>>> I expect migrate_vma_finalize() to be called for folios, even when split failed and
>>>>>>>> drop the lock.
>>>>>>> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
>>>>>>> If so, how does it distinguish between split folios and failed-to-split folios?
>>>>>>> By comparing source and destination folio orders?
>>>>>>>
>>>>>> We reset the MIGRATE_PFN_MIGRATE flag for failing to migrate pfns. We do a folio_put
>>>>>> on the src in finalize, if it is split then on all the split folios as well.
>>>>>>
>>>>>>> What we see from migrate_vma_split_unmapped_folio() is that
>>>>>>> it adds a refcount for all input folios, but only drops a refcount
>>>>>>> for the split folio. Isn’t it cause failed-to-split folios to have
>>>>>>> additional refcount?
>>>>>>>
>>>>> Hello!
>>>>>
>>>>> Thanks for reviewing everyone. So its very difficult to create a reproducer I think
>>>>> the extra reference would need to appear after migrate_device_unmap() but before
>>>>> folio_split_unmapped() in migrate_vma_pages()? That's hard to trigger reliably from
>>>>> userspace.
>>>>>
>>>>> The fix came about when Nico indicated there might be an issue if split_huge_pmd_address
>>>>> fails in my patch [1].
>>>>>
>>>>> Below is my understanding of how refcounting is working over here step by step. I
>>>>> might very well be wrong on this, and the refcounting is a bit all over the place
>>>>> and I might miss a reference change somewhere so would really appreciate if someone
>>>>> can confirm this!
>>>>>
>>>>>
>>>>> 1. migrate_vma_collect_huge_pmd():
>>>>> a) folio_get(folio) -> +1 (collect reference)
>>>>> 2. migrate_device_unmap():
>>>>> a) folio_isolate_lru() -> +1 (isolation reference)
>>>>> b) folio_put() -> -1 (drops the collect reference)
>>>>>
>>>>>
>>>>> Without this patch fix:
>>>>>
>>>>> 3. migrate_vma_split_unmapped_folio():
>>>>> a) folio_get(folio) -> +1 (split reference)
>>>>> b) folio_split_unmapped() -> fails
>>>>> c) Returns error — without folio_put() which is the fix
>>>>> 4. Caller in migrate_vma_pages(): clears MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND
>>>>> 5. __migrate_device_finalize(): sees !(src_pfns[i] & MIGRATE_PFN_MIGRATE), restores the folio:
>>>>> a) remove_migration_ptes(src, src) — re-establishes user PTEs
>>>>> b) folio_unlock(src)
>>>>> c) folio_put(src) -> -1 (drops the isolation reference)
>>>>>
>>>>> The split reference in 3.a is never released and the folio has a permanently elevated refcount.
>>>>> Unless I missed a folio_put somewhere for the refcount increase in folio_isolate_lru() (2.a)?
>>>>>
>>>>> Please let me know if this makes sense!
>>>>>
>>>>> [1] https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>>
>>>>>> Thanks! Yes, the patch makes sense
>>>>>>
>>>>>> Acked-by: Balbir Singh <balbirs@nvidia.com>
>>>>>>
>>>>>> Balbir
>>>> I remember stumbling on this while ago also. The folio_get() in migrate_vma_split_unmapped_folio()
>>>> is balanced with put_page() in __split_huge_pmd_locked() (freeze = true), can't fail for device pages.
>>>> Folios at this point are unmapped but have 1 refcount from "collecting".
>>>> After folio_split_unmapped() the refcount(s) is still 1.
>>>>
>>>> So it seems the code is good as is? A comment though would be good for the extra folio_get..
>>>>
>>> hmm I dont think the put_page() in __split_huge_pmd_locked() is there to balance the folio_get() in
>>> migrate_vma_split_unmapped_folio(). There are other points where split_huge_pmd_locked() is called
>>> with freeze = true [1] and they don't get a reference before calling split_huge_pmd.
>>>
>>> I think the folio_put() in __split_huge_pmd_locked() freeze = true case is there as migration
>>> entries are being installed?
>>>
>>> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/rmap.c#L2334
>>>
>>>
>> Yes normally you want to drop the reference when installing migration entries but in this context
>> you have already done the collecting for the THP folio and you want to balance with the folio_get()
>> the put_page() to keep the refs unchanged. Is that right Balbir?
>>
>> --Mika
>>
>
> Hi Mika,
>
> You are right, This patch is wrong. I tried the below diff to force folio_split_unmapped to return
> -EAGAIN. I ran tools/testing/selftests/mm/hmm-tests -r hmm.hmm_device_private.migrate_anon_huge_err
> to trigger the path for folio_split_unmapped.
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 8e2746ea74adf..6df33b4990a13 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4140,6 +4140,8 @@ int folio_split_unmapped(struct folio *folio, unsigned int new_order)
> if (folio_expected_ref_count(folio) != folio_ref_count(folio) - 1)
> return -EAGAIN;
>
> + return -EAGAIN;
> +
> local_irq_disable();
> ret = __folio_freeze_and_split_unmapped(folio, new_order, &folio->page, NULL,
> NULL, false, NULL, SPLIT_TYPE_UNIFORM,
>
>
>
> I inserted a lot of traces to keep track of refcounts [1]. Without this patch, I get
> ....
> hmm-tests-129 [000] ..... 1.476233: __migrate_device_pages: SPLIT_UNMAPPED: folio=ffc536e2c4100000 refcount=0 AFTER error NO folio_put
> hmm-tests-129 [000] ..... 1.476234: __migrate_device_pages: PAGES: split FAILED folio=ffc536e2c4100000 refcount=0
> hmm-tests-129 [000] ..... 1.476236: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 dst=ffc536e2c4100000 src==dst=1 refcount_src=1 mapcount_src=0 order_src=0 migrate=0 BEFORE remove_migration_ptes
> hmm-tests-129 [000] ..... 1.476237: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 refcount=1 mapcount=0 AFTER remove_migration_ptes
> hmm-tests-129 [000] ..... 1.476237: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 refcount=0 AFTER folio_put(src)
>
> i.e. refcount = 512, which is correct as split_huge_pmd_address was successful. Full output is
> at [2].
>
> With this patch, I get:
>
> BUG: Bad rss-counter state mm:00000000cfe88d5e type:MM_FILEPAGES val:-511 Comm:bash Pid:63
> BUG: Bad rss-counter state mm:00000000cfe88d5e type:MM_ANONPAGES val:511 Comm:bash Pid:63
> ...
> hmm-tests-129 [000] ..... 1.468315: __migrate_device_pages: SPLIT_UNMAPPED: folio=ffed210c840f0000 refcount=1 AFTER error folio_put FIX PRESENT
> hmm-tests-129 [000] ..... 1.468315: __migrate_device_pages: PAGES: split FAILED folio=ffed210c840f0000 refcount=1
> hmm-tests-129 [000] ..... 1.468318: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 dst=ffed210c840f0000 src==dst=1 refcount_src=1 mapcount_src=0 order_src=9 migrate=0 BEFORE remove_migration_ptes
> hmm-tests-129 [000] ..... 1.468357: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 refcount=513 mapcount=512 AFTER remove_migration_ptes
> hmm-tests-129 [000] ..... 1.468357: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 refcount=512 AFTER folio_put(src)
>
> refcount=0 means the folio would be freed which is not correct. The full output is at [3].
>
> Thank you for clearing this up!
Thank you for doing the investigation. Can you send a patch to add a comment
in migrate_vma_split_unmapped_folio() about this to avoid the confusion
in the future?
>
>
> [1] https://gist.github.com/uarif1/65e1e816af7aa0ae38dd6ec64d62a993
> [2] https://gist.github.com/uarif1/79ea9500667daa4e2ef09cb5d308f041
> [3] https://gist.github.com/uarif1/8a35a6c65ba8b3a1c1dfe72dc30e821d
Best Regards,
Yan, Zi
* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-05 16:39 ` Zi Yan
@ 2026-03-05 17:00 ` Usama Arif
2026-03-05 17:32 ` Zi Yan
0 siblings, 1 reply; 16+ messages in thread
From: Usama Arif @ 2026-03-05 17:00 UTC (permalink / raw)
To: Zi Yan
Cc: Mika Penttilä,
Balbir Singh, Kiryl Shutsemau, matthew.brost, npache, david,
Usama Arif, Andrew Morton, linux-mm, joshua.hahnjy, hannes,
rakie.kim, byungchul, gourry, ying.huang, apopple, riel,
shakeel.butt, linux-kernel, kernel-team
On 05/03/2026 16:39, Zi Yan wrote:
> On 5 Mar 2026, at 11:36, Usama Arif wrote:
>
>> On 05/03/2026 12:09, Mika Penttilä wrote:
>>> On 3/5/26 13:44, Usama Arif wrote:
>>>
>>>>
>>>> On 05/03/2026 06:09, Mika Penttilä wrote:
>>>>> Hi!
>>>>>
>>>>> On 3/5/26 01:28, Usama Arif wrote:
>>>>>
>>>>>> On 04/03/2026 22:09, Balbir Singh wrote:
>>>>>>> On 3/5/26 08:54, Zi Yan wrote:
>>>>>>>> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>>>>>>>>
>>>>>>>>> On 3/5/26 02:17, Zi Yan wrote:
>>>>>>>>>> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>>>>>>>>>>
>>>>>>>>>>> From: Usama Arif <usama.arif@linux.dev>
>>>>>>>>>>>
>>>>>>>>>>> migrate_vma_split_unmapped_folio() takes an extra reference via
>>>>>>>>>>> folio_get() before calling folio_split_unmapped(). On success, the
>>>>>>>>>>> split consumes this reference: __folio_freeze_and_split_unmapped()
>>>>>>>>>>> expects the +1 in its folio_ref_freeze() check, and distributes it
>>>>>>>>>>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>>>>>>>>>>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>>>>>>>>>>
>>>>>>>>>>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>>>>>>>>>>> -EAGAIN), the function returns without calling folio_put(). The extra
>>>>>>>>>>> reference is never released.
>>>>>>>>>>>
>>>>>>>>>>> Add the missing folio_put() on the error path.
>>>>>>>>>>>
>>>>>>>>>>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>>>>>>>>>>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>>>>>>>> Reported-by: Nico Pache <npache@redhat.com>
>>>>>>>>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>>>>>>>>> ---
>>>>>>>>>>> mm/migrate_device.c | 4 +++-
>>>>>>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>>>>>>>> index 0a8b31939640f..351ecd9065d13 100644
>>>>>>>>>>> --- a/mm/migrate_device.c
>>>>>>>>>>> +++ b/mm/migrate_device.c
>>>>>>>>>>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>>>>>>>> folio_get(folio);
>>>>>>>>>>> split_huge_pmd_address(migrate->vma, addr, true);
>>>>>>>>>>> ret = folio_split_unmapped(folio, 0);
>>>>>>>>>>> - if (ret)
>>>>>>>>>>> + if (ret) {
>>>>>>>>>>> + folio_put(folio);
>>>>>>>>>>> return ret;
>>>>>>>>>>> + }
>>>>>>>>>>> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>>>>>>>>>>> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>>>>>>>>>>> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>>>>>>>>>>> --
>>>>>>>>>>> 2.47.3
>>>>>>>>>> Add Balbir, who wrote the code, to comment on this.
>>>>>>>>>>
>>>>>>>>> Thanks Zi!
>>>>>>>>>
>>>>>>>>> Just wondering if there is a reproducer for the issue and how the fix was tested?
>>>>>>>>> I expect migrate_vma_finalize() to be called for folios, even when split failed and
>>>>>>>>> drop the lock.
>>>>>>>> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
>>>>>>>> If so, how does it distinguish between split folios and failed-to-split folios?
>>>>>>>> By comparing source and destination folio orders?
>>>>>>>>
>>>>>>> We reset the MIGRATE_PFN_MIGRATE flag for failing to migrate pfns. We do a folio_put
>>>>>>> on the src in finalize, if it is split then on all the split folios as well.
>>>>>>>
>>>>>>>> What we see from migrate_vma_split_unmapped_folio() is that
>>>>>>>> it adds a refcount for all input folios, but only drops a refcount
>>>>>>>> for the split folio. Isn’t it cause failed-to-split folios to have
>>>>>>>> additional refcount?
>>>>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> Thanks for reviewing everyone. So its very difficult to create a reproducer I think
>>>>>> the extra reference would need to appear after migrate_device_unmap() but before
>>>>>> folio_split_unmapped() in migrate_vma_pages()? That's hard to trigger reliably from
>>>>>> userspace.
>>>>>>
>>>>>> The fix came about when Nico indicated there might be an issue if split_huge_pmd_address
>>>>>> fails in my patch [1].
>>>>>>
>>>>>> Below is my understanding of how refcounting is working over here step by step. I
>>>>>> might very well be wrong on this, and the refcounting is a bit all over the place
>>>>>> and I might miss a reference change somewhere so would really appreciate if someone
>>>>>> can confirm this!
>>>>>>
>>>>>>
>>>>>> 1. migrate_vma_collect_huge_pmd():
>>>>>> a) folio_get(folio) -> +1 (collect reference)
>>>>>> 2. migrate_device_unmap():
>>>>>> a) folio_isolate_lru() -> +1 (isolation reference)
>>>>>> b) folio_put() -> -1 (drops the collect reference)
>>>>>>
>>>>>>
>>>>>> Without this patch fix:
>>>>>>
>>>>>> 3. migrate_vma_split_unmapped_folio():
>>>>>> a) folio_get(folio) -> +1 (split reference)
>>>>>> b) folio_split_unmapped() -> fails
>>>>>> c) Returns error — without folio_put() which is the fix
>>>>>> 4. Caller in migrate_vma_pages(): clears MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND
>>>>>> 5. __migrate_device_finalize(): sees !(src_pfns[i] & MIGRATE_PFN_MIGRATE), restores the folio:
>>>>>> a) remove_migration_ptes(src, src) — re-establishes user PTEs
>>>>>> b) folio_unlock(src)
>>>>>> c) folio_put(src) -> -1 (drops the isolation reference)
>>>>>>
>>>>>> The split reference in 3.a is never released and the folio has a permanently elevated refcount.
>>>>>> Unless I missed a folio_put somewhere for the refcount increase in folio_isolate_lru() (2.b)?
>>>>>>
>>>>>> Please let me know if this makes sense!
>>>>>>
>>>>>> [1] https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>>>
>>>>>>> Thanks! Yes, the patch makes sense
>>>>>>>
>>>>>>> Acked-by: Balbir Singh <balbirs@nvidia.com>
>>>>>>>
>>>>>>> Balbir
>>>>> I remember stumbling on this while ago also. The folio_get() in migrate_vma_split_unmapped_folio()
>>>>> is balanced with put_page() in __split_huge_pmd_locked() (freeze = true), can't fail for device pages.
>>>>> Folios at this point are unmapped but have 1 refcount from "collecting".
>>>>> After folio_split_unmapped() the refcount(s) is still 1.
>>>>>
>>>>> So it seems the code is good as is? A comment though would be good for the extra folio_get..
>>>>>
>>>> hmm I dont think the put_page() in __split_huge_pmd_locked() is there to balance the folio_get() in
>>>> migrate_vma_split_unmapped_folio(). There are other points where split_huge_pmd_locked() is called
>>>> with freeze = true [1] and they don't get a reference before calling split_huge_pmd.
>>>>
>>>> I think the folio_put() in __split_huge_pmd_locked() freeze = true case is there as migration
>>>> entries are being installed?
>>>>
>>>> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/rmap.c#L2334
>>>>
>>>>
>>> Yes normally you want to drop the reference when installing migration entries but in this context
>>> you have already done the collecting for the THP folio and you want to balance with the folio_get()
>>> the put_page() to keep the refs unchanged. Is that right Balbir?
>>>
>>> --Mika
>>>
>>
>> Hi Mika,
>>
>> You are right, This patch is wrong. I tried the below diff to force folio_split_unmapped to return
>> -EAGAIN. I ran tools/testing/selftests/mm/hmm-tests -r hmm.hmm_device_private.migrate_anon_huge_err
>> to trigger the path for folio_split_unmapped.
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 8e2746ea74adf..6df33b4990a13 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -4140,6 +4140,8 @@ int folio_split_unmapped(struct folio *folio, unsigned int new_order)
>> if (folio_expected_ref_count(folio) != folio_ref_count(folio) - 1)
>> return -EAGAIN;
>>
>> + return -EAGAIN;
>> +
>> local_irq_disable();
>> ret = __folio_freeze_and_split_unmapped(folio, new_order, &folio->page, NULL,
>> NULL, false, NULL, SPLIT_TYPE_UNIFORM,
>>
>>
>>
>> I inserted a lot of traces to keep track of refcounts [1]. Without this patch, I get
>> ....
>> hmm-tests-129 [000] ..... 1.476233: __migrate_device_pages: SPLIT_UNMAPPED: folio=ffc536e2c4100000 refcount=0 AFTER error NO folio_put
>> hmm-tests-129 [000] ..... 1.476234: __migrate_device_pages: PAGES: split FAILED folio=ffc536e2c4100000 refcount=0
>> hmm-tests-129 [000] ..... 1.476236: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 dst=ffc536e2c4100000 src==dst=1 refcount_src=1 mapcount_src=0 order_src=0 migrate=0 BEFORE remove_migration_ptes
>> hmm-tests-129 [000] ..... 1.476237: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 refcount=1 mapcount=0 AFTER remove_migration_ptes
>> hmm-tests-129 [000] ..... 1.476237: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 refcount=0 AFTER folio_put(src)
>>
>> i.e. refcount = 512, which is correct as split_huge_pmd_address was successful. Full output is
>> at [2].
>>
>> With this patch, I get:
>>
>> BUG: Bad rss-counter state mm:00000000cfe88d5e type:MM_FILEPAGES val:-511 Comm:bash Pid:63
>> BUG: Bad rss-counter state mm:00000000cfe88d5e type:MM_ANONPAGES val:511 Comm:bash Pid:63
>> ...
>> hmm-tests-129 [000] ..... 1.468315: __migrate_device_pages: SPLIT_UNMAPPED: folio=ffed210c840f0000 refcount=1 AFTER error folio_put FIX PRESENT
>> hmm-tests-129 [000] ..... 1.468315: __migrate_device_pages: PAGES: split FAILED folio=ffed210c840f0000 refcount=1
>> hmm-tests-129 [000] ..... 1.468318: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 dst=ffed210c840f0000 src==dst=1 refcount_src=1 mapcount_src=0 order_src=9 migrate=0 BEFORE remove_migration_ptes
>> hmm-tests-129 [000] ..... 1.468357: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 refcount=513 mapcount=512 AFTER remove_migration_ptes
>> hmm-tests-129 [000] ..... 1.468357: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 refcount=512 AFTER folio_put(src)
>>
>> refcount=0 means the folio would be freed which is not correct. The full output is at [3].
>>
>> Thank you for clearing this up!
>
> Thank you for doing the investigation. Can you send a patch to add a comment
> in migrate_vma_split_unmapped_folio() about this to avoid the confusion
> in the future?
>
Yeah this was really confusing.

Does something like below look good?

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 78c7acf024615..a302f9d3ce921 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -910,6 +910,11 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
 
 	folio_get(folio);
 	split_huge_pmd_address(migrate->vma, addr, true);
+	/*
+	 * split_huge_pmd_address consumes the folio_get reference above.
+	 * Therefore no folio_put is needed on the folio_split_unmapped
+	 * error path.
+	 */
 	ret = folio_split_unmapped(folio, 0);
 	if (ret)
 		return ret;

>>
>>
>> [1] https://gist.github.com/uarif1/65e1e816af7aa0ae38dd6ec64d62a993
>> [2] https://gist.github.com/uarif1/79ea9500667daa4e2ef09cb5d308f041
>> [3] https://gist.github.com/uarif1/8a35a6c65ba8b3a1c1dfe72dc30e821d
>
>
> Best Regards,
> Yan, Zi
* Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
2026-03-05 17:00 ` Usama Arif
@ 2026-03-05 17:32 ` Zi Yan
0 siblings, 0 replies; 16+ messages in thread
From: Zi Yan @ 2026-03-05 17:32 UTC (permalink / raw)
To: Usama Arif
Cc: Mika Penttilä,
Balbir Singh, Kiryl Shutsemau, matthew.brost, npache, david,
Usama Arif, Andrew Morton, linux-mm, joshua.hahnjy, hannes,
rakie.kim, byungchul, gourry, ying.huang, apopple, riel,
shakeel.butt, linux-kernel, kernel-team
On 5 Mar 2026, at 12:00, Usama Arif wrote:
> On 05/03/2026 16:39, Zi Yan wrote:
>> On 5 Mar 2026, at 11:36, Usama Arif wrote:
>>
>>> On 05/03/2026 12:09, Mika Penttilä wrote:
>>>> On 3/5/26 13:44, Usama Arif wrote:
>>>>
>>>>>
>>>>> On 05/03/2026 06:09, Mika Penttilä wrote:
>>>>>> Hi!
>>>>>>
>>>>>> On 3/5/26 01:28, Usama Arif wrote:
>>>>>>
>>>>>>> On 04/03/2026 22:09, Balbir Singh wrote:
>>>>>>>> On 3/5/26 08:54, Zi Yan wrote:
>>>>>>>>> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>>>>>>>>>
>>>>>>>>>> On 3/5/26 02:17, Zi Yan wrote:
>>>>>>>>>>> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>>>>>>>>>>>
>>>>>>>>>>>> From: Usama Arif <usama.arif@linux.dev>
>>>>>>>>>>>>
>>>>>>>>>>>> migrate_vma_split_unmapped_folio() takes an extra reference via
>>>>>>>>>>>> folio_get() before calling folio_split_unmapped(). On success, the
>>>>>>>>>>>> split consumes this reference: __folio_freeze_and_split_unmapped()
>>>>>>>>>>>> expects the +1 in its folio_ref_freeze() check, and distributes it
>>>>>>>>>>>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>>>>>>>>>>>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>>>>>>>>>>>
>>>>>>>>>>>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>>>>>>>>>>>> -EAGAIN), the function returns without calling folio_put(). The extra
>>>>>>>>>>>> reference is never released.
>>>>>>>>>>>>
>>>>>>>>>>>> Add the missing folio_put() on the error path.
>>>>>>>>>>>>
>>>>>>>>>>>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>>>>>>>>>>>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>>>>>>>>> Reported-by: Nico Pache <npache@redhat.com>
>>>>>>>>>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>>>>>>>>>> ---
>>>>>>>>>>>> mm/migrate_device.c | 4 +++-
>>>>>>>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>>>>>>>>> index 0a8b31939640f..351ecd9065d13 100644
>>>>>>>>>>>> --- a/mm/migrate_device.c
>>>>>>>>>>>> +++ b/mm/migrate_device.c
>>>>>>>>>>>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>>>>>>>>> folio_get(folio);
>>>>>>>>>>>> split_huge_pmd_address(migrate->vma, addr, true);
>>>>>>>>>>>> ret = folio_split_unmapped(folio, 0);
>>>>>>>>>>>> - if (ret)
>>>>>>>>>>>> + if (ret) {
>>>>>>>>>>>> + folio_put(folio);
>>>>>>>>>>>> return ret;
>>>>>>>>>>>> + }
>>>>>>>>>>>> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>>>>>>>>>>>> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>>>>>>>>>>>> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.47.3
>>>>>>>>>>> Add Balbir, who wrote the code, to comment on this.
>>>>>>>>>>>
>>>>>>>>>> Thanks Zi!
>>>>>>>>>>
>>>>>>>>>> Just wondering if there is a reproducer for the issue and how the fix was tested?
>>>>>>>>>> I expect migrate_vma_finalize() to be called for folios, even when split failed and
>>>>>>>>>> drop the lock.
>>>>>>>>> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
>>>>>>>>> If so, how does it distinguish between split folios and failed-to-split folios?
>>>>>>>>> By comparing source and destination folio orders?
>>>>>>>>>
>>>>>>>> We reset the MIGRATE_PFN_MIGRATE flag for failing to migrate pfns. We do a folio_put
>>>>>>>> on the src in finalize, if it is split then on all the split folios as well.
>>>>>>>>
>>>>>>>>> What we see from migrate_vma_split_unmapped_folio() is that
>>>>>>>>> it adds a refcount for all input folios, but only drops a refcount
>>>>>>>>> for the split folio. Isn’t it cause failed-to-split folios to have
>>>>>>>>> additional refcount?
>>>>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> Thanks for reviewing everyone. So its very difficult to create a reproducer I think
>>>>>>> the extra reference would need to appear after migrate_device_unmap() but before
>>>>>>> folio_split_unmapped() in migrate_vma_pages()? That's hard to trigger reliably from
>>>>>>> userspace.
>>>>>>>
>>>>>>> The fix came about when Nico indicated there might be an issue if split_huge_pmd_address
>>>>>>> fails in my patch [1].
>>>>>>>
>>>>>>> Below is my understanding of how refcounting is working over here step by step. I
>>>>>>> might very well be wrong on this, and the refcounting is a bit all over the place
>>>>>>> and I might miss a reference change somewhere so would really appreciate if someone
>>>>>>> can confirm this!
>>>>>>>
>>>>>>>
>>>>>>> 1. migrate_vma_collect_huge_pmd():
>>>>>>> a) folio_get(folio) -> +1 (collect reference)
>>>>>>> 2. migrate_device_unmap():
>>>>>>> a) folio_isolate_lru() -> +1 (isolation reference)
>>>>>>> b) folio_put() -> -1 (drops the collect reference)
>>>>>>>
>>>>>>>
>>>>>>> Without this patch fix:
>>>>>>>
>>>>>>> 3. migrate_vma_split_unmapped_folio():
>>>>>>> a) folio_get(folio) -> +1 (split reference)
>>>>>>> b) folio_split_unmapped() -> fails
>>>>>>> c) Returns error — without folio_put() which is the fix
>>>>>>> 4. Caller in migrate_vma_pages(): clears MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND
>>>>>>> 5. __migrate_device_finalize(): sees !(src_pfns[i] & MIGRATE_PFN_MIGRATE), restores the folio:
>>>>>>> a) remove_migration_ptes(src, src) — re-establishes user PTEs
>>>>>>> b) folio_unlock(src)
>>>>>>> c) folio_put(src) -> -1 (drops the isolation reference)
>>>>>>>
>>>>>>> The split reference in 3.a is never released and the folio has a permanently elevated refcount.
>>>>>>> Unless I missed a folio_put somewhere for the refcount increase in folio_isolate_lru() (2.b)?
>>>>>>>
>>>>>>> Please let me know if this makes sense!
>>>>>>>
>>>>>>> [1] https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>>>>
>>>>>>>> Thanks! Yes, the patch makes sense
>>>>>>>>
>>>>>>>> Acked-by: Balbir Singh <balbirs@nvidia.com>
>>>>>>>>
>>>>>>>> Balbir
>>>>>> I remember stumbling on this while ago also. The folio_get() in migrate_vma_split_unmapped_folio()
>>>>>> is balanced with put_page() in __split_huge_pmd_locked() (freeze = true), can't fail for device pages.
>>>>>> Folios at this point are unmapped but have 1 refcount from "collecting".
>>>>>> After folio_split_unmapped() the refcount(s) is still 1.
>>>>>>
>>>>>> So it seems the code is good as is? A comment though would be good for the extra folio_get..
>>>>>>
>>>>> hmm I dont think the put_page() in __split_huge_pmd_locked() is there to balance the folio_get() in
>>>>> migrate_vma_split_unmapped_folio(). There are other points where split_huge_pmd_locked() is called
>>>>> with freeze = true [1] and they don't get a reference before calling split_huge_pmd.
>>>>>
>>>>> I think the folio_put() in __split_huge_pmd_locked() freeze = true case is there as migration
>>>>> entries are being installed?
>>>>>
>>>>> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/rmap.c#L2334
>>>>>
>>>>>
>>>> Yes normally you want to drop the reference when installing migration entries but in this context
>>>> you have already done the collecting for the THP folio and you want to balance with the folio_get()
>>>> the put_page() to keep the refs unchanged. Is that right Balbir?
>>>>
>>>> --Mika
>>>>
>>>
>>> Hi Mika,
>>>
>>> You are right, This patch is wrong. I tried the below diff to force folio_split_unmapped to return
>>> -EAGAIN. I ran tools/testing/selftests/mm/hmm-tests -r hmm.hmm_device_private.migrate_anon_huge_err
>>> to trigger the path for folio_split_unmapped.
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 8e2746ea74adf..6df33b4990a13 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -4140,6 +4140,8 @@ int folio_split_unmapped(struct folio *folio, unsigned int new_order)
>>> if (folio_expected_ref_count(folio) != folio_ref_count(folio) - 1)
>>> return -EAGAIN;
>>>
>>> + return -EAGAIN;
>>> +
>>> local_irq_disable();
>>> ret = __folio_freeze_and_split_unmapped(folio, new_order, &folio->page, NULL,
>>> NULL, false, NULL, SPLIT_TYPE_UNIFORM,
>>>
>>>
>>>
>>> I inserted a lot of traces to keep track of refcounts [1]. Without this patch, I get
>>> ....
>>> hmm-tests-129 [000] ..... 1.476233: __migrate_device_pages: SPLIT_UNMAPPED: folio=ffc536e2c4100000 refcount=0 AFTER error NO folio_put
>>> hmm-tests-129 [000] ..... 1.476234: __migrate_device_pages: PAGES: split FAILED folio=ffc536e2c4100000 refcount=0
>>> hmm-tests-129 [000] ..... 1.476236: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 dst=ffc536e2c4100000 src==dst=1 refcount_src=1 mapcount_src=0 order_src=0 migrate=0 BEFORE remove_migration_ptes
>>> hmm-tests-129 [000] ..... 1.476237: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 refcount=1 mapcount=0 AFTER remove_migration_ptes
>>> hmm-tests-129 [000] ..... 1.476237: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 refcount=0 AFTER folio_put(src)
>>>
>>> i.e. refcount = 512, which is correct as split_huge_pmd_address was successful. Full output is
>>> at [2].
>>>
>>> With this patch, I get:
>>>
>>> BUG: Bad rss-counter state mm:00000000cfe88d5e type:MM_FILEPAGES val:-511 Comm:bash Pid:63
>>> BUG: Bad rss-counter state mm:00000000cfe88d5e type:MM_ANONPAGES val:511 Comm:bash Pid:63
>>> ...
>>> hmm-tests-129 [000] ..... 1.468315: __migrate_device_pages: SPLIT_UNMAPPED: folio=ffed210c840f0000 refcount=1 AFTER error folio_put FIX PRESENT
>>> hmm-tests-129 [000] ..... 1.468315: __migrate_device_pages: PAGES: split FAILED folio=ffed210c840f0000 refcount=1
>>> hmm-tests-129 [000] ..... 1.468318: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 dst=ffed210c840f0000 src==dst=1 refcount_src=1 mapcount_src=0 order_src=9 migrate=0 BEFORE remove_migration_ptes
>>> hmm-tests-129 [000] ..... 1.468357: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 refcount=513 mapcount=512 AFTER remove_migration_ptes
>>> hmm-tests-129 [000] ..... 1.468357: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 refcount=512 AFTER folio_put(src)
>>>
>>> refcount=0 means the folio would be freed which is not correct. The full output is at [3].
>>>
>>> Thank you for clearing this up!
>>
>> Thank you for doing the investigation. Can you send a patch to add a comment
>> in migrate_vma_split_unmapped_folio() about this to avoid the confusion
>> in the future?
>>
>
> Yeah this was really confusing.
>
> Does something like below look good?
>
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 78c7acf024615..a302f9d3ce921 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -910,6 +910,11 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>
> folio_get(folio);
> split_huge_pmd_address(migrate->vma, addr, true);
> + /*
> + * split_huge_pmd_address consumes the folio_get reference above.
> + * Therefore no folio_put is needed on the folio_split_unmapped
> + * error path.
> + */
> ret = folio_split_unmapped(folio, 0);
> if (ret)
> return ret;

I do not think there is a need to explain why there is no folio_put()
below. How about the version below instead?

1. it makes sure the folio has the right ref count,
2. it explains that folio_get() is for split_huge_pmd_address() instead of
   folio_split_unmapped().

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 0a8b31939640f..0b31b878210ba 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -914,8 +914,14 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
 	unsigned long flags;
 	int ret = 0;
 
+	VM_WARN_ON_ONCE(folio_ref_count(folio) == 1);
+	/*
+	 * take a reference, since split_huge_pmd_address() with freeze = true
+	 * drops a reference at the end.
+	 */
 	folio_get(folio);
 	split_huge_pmd_address(migrate->vma, addr, true);
+
 	ret = folio_split_unmapped(folio, 0);
 	if (ret)
 		return ret;

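
For anyone following the refcount arithmetic, here is a minimal userspace
sketch of the failed-split path as I understand it from this thread. It is
not kernel code: struct toy_folio, the toy_* helpers and TOY_NR_PTES are
all made up for illustration. It assumes the folio enters with one
reference, that split_huge_pmd_address() with freeze = true drops exactly
one reference, and that remove_migration_ptes() adds one reference per
restored PTE (512 here, matching the traces above).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio: only the reference count is modeled. */
struct toy_folio {
	int refcount;
	bool freed;
};

/* Assumed number of PTEs restored for one order-9 folio (from the traces). */
#define TOY_NR_PTES 512

static void toy_get(struct toy_folio *f)
{
	assert(!f->freed);
	f->refcount++;
}

static void toy_put(struct toy_folio *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->freed = true;
}

/* Models split_huge_pmd_address(..., freeze = true): drops one reference. */
static void toy_split_pmd_freeze(struct toy_folio *f)
{
	toy_put(f);
}

/*
 * Walks the failed-split path. Returns the final refcount, or -1 if the
 * folio reached refcount 0 before finalize could restore it.
 */
static int toy_failed_split_path(bool extra_put_on_error)
{
	/* One reference held on entry (assumption, see lead-in). */
	struct toy_folio f = { .refcount = 1, .freed = false };

	toy_get(&f);			/* folio_get() before the PMD split */
	toy_split_pmd_freeze(&f);	/* balanced here, back to 1 */

	/* folio_split_unmapped() is assumed to fail at this point. */
	if (extra_put_on_error)
		toy_put(&f);		/* the put added by the original patch */

	if (f.freed)
		return -1;

	f.refcount += TOY_NR_PTES;	/* remove_migration_ptes(): one ref per PTE */
	toy_put(&f);			/* finalize drops the remaining extra ref */
	return f.refcount;
}
```

Under these assumptions, toy_failed_split_path(false) ends at 512 (one
reference per restored PTE), while toy_failed_split_path(true) returns -1
because the count reaches zero before finalize runs. That is the reason
the folio_get() pairs with split_huge_pmd_address() and not with
folio_split_unmapped().
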
>
>>>
>>>
>>> [1] https://gist.github.com/uarif1/65e1e816af7aa0ae38dd6ec64d62a993
>>> [2] https://gist.github.com/uarif1/79ea9500667daa4e2ef09cb5d308f041
>>> [3] https://gist.github.com/uarif1/8a35a6c65ba8b3a1c1dfe72dc30e821d
>>
>>
>> Best Regards,
>> Yan, Zi
Best Regards,
Yan, Zi
end of thread, other threads:[~2026-03-05 17:33 UTC | newest]
Thread overview: 16+ messages
2026-03-04 12:01 [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure Usama Arif
2026-03-04 14:00 ` Kiryl Shutsemau
2026-03-04 15:17 ` Zi Yan
2026-03-04 21:48 ` Balbir Singh
2026-03-04 21:54 ` Zi Yan
2026-03-04 22:02 ` Matthew Brost
2026-03-04 22:09 ` Balbir Singh
2026-03-04 23:28 ` Usama Arif
2026-03-05 6:09 ` Mika Penttilä
2026-03-05 11:44 ` Usama Arif
2026-03-05 12:09 ` Mika Penttilä
2026-03-05 16:36 ` Usama Arif
2026-03-05 16:39 ` Zi Yan
2026-03-05 17:00 ` Usama Arif
2026-03-05 17:32 ` Zi Yan
2026-03-04 15:25 ` Joshua Hahn