Re: A potential refcount issue during __folio

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Re: A potential refcount issue during __folio_split
       [not found] ` <20260222010708.uohpmddmzaa4i4ic@master>
@ 2026-02-22  3:00   ` Zi Yan
  2026-02-22 10:28     ` Wei Yang
  2026-02-23  9:23     ` David Hildenbrand (Arm)
  0 siblings, 2 replies; 4+ messages in thread
From: Zi Yan @ 2026-02-22  3:00 UTC (permalink / raw)
  To: Wei Yang; +Cc: David Hildenbrand (Arm), Linux MM

+linux-mm list, since the topic is interesting and worth having a record in the list.

On 21 Feb 2026, at 20:07, Wei Yang wrote:

> On Sun, Feb 22, 2026 at 01:04:25AM +0000, Wei Yang wrote:
>> Hi, David & Zi Yan
>>
>> With some tests, I may find one refcount issue during __folio_split(), when
>>
>>  * folio is isolated and @list is provided
>>  * @lock_at is not the large folio's head page
>>
>> The tricky thing is after __folio_freeze_and_split_unmapped() and
>> remap_page(), we would release one refcount for each after-split folio except
>> @lock_at after-split folio.
>>
>> But when @list is provided, we would grab extra refcount in
>> __folio_freeze_and_split_unmapped() for each tail after-split folio by
>> lru_add_split_folio(), except head after-split folio.
>>
>> If @lock_at is the large folio's head page, it is fine. If not, the
>> after-split folio's refcount is not well maintained.
>>
>> Take anonymous large folio mapped in one process for example, the refcount
>> change during uniformly __folio_split() to order-0 would look like this:
>>
>> after lru_add_split_folio()  after remap     after unlock if @lockat == head
>>   f0: 1                      1 + 1 = 2       2
>>   f1: 1 + 1 = 2              2 + 1 = 3       3 - 1 = 2
>>   f2: 1 + 1 = 2              2 + 1 = 3       3 - 1 = 2
>>   f3: 1 + 1 = 2              2 + 1 = 3       3 - 1 = 2
>>
>>                                              after unlock if @lockat == head + 1
>>                                              2 - 1 = 1
>>                                              3
>>                                              3 - 1 = 2
>>                                              3 - 1 = 2
>>
>> This shows the refcount of f0/f1 is not correct, if @lockat != head page.

after lru_add_split_folio(), refcount for head should be 0, since it is frozen,
each of the rest subpages should be refcount == 1. Then, head is unfreezed
and its refcount goes to 1. remap adds 1 to all subpages refcount.
after the unlock loop, every subpage get -1 refcount except lock_at.


>>
>> The good news is there is no use case in kernel now. Then I am not sure this
>> worth a fix. Would like to ask for your opinion first. Hope I don't miss
>> something important.
>>
>> Since there is no real case in kernel, I adjust the current debugfs
>> interface(/sys/kernel/debug/split_huge_pages) to trigger it. Below is the diff
>> for the change. This change could pass the selftests/split_huge_page_test to
>> make sure the code change itself is correct.
>>
>> Then change the lockat to folio_page(folio, 1) could trigger the issue, when
>> trying to split a THP through the debugfs from userspace.
>>
>
> Sorry, the diff is lost:
>
> From c6d4c3d81e16f5f4b509cff884540bec0f91e6c3 Mon Sep 17 00:00:00 2001
> From: Wei Yang <richard.weiyang@gmail.com>
> Date: Thu, 19 Feb 2026 08:44:49 +0800
> Subject: [PATCH] [T]mm/huge_memory: test split with isolation
>
> ---
>  mm/huge_memory.c | 31 +++++++++++++++++++++----------
>  1 file changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index ed0375ea22d1..65354c5edfef 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4621,9 +4621,11 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
>  	for (addr = vaddr_start; addr < vaddr_end; addr += PAGE_SIZE) {
>  		struct vm_area_struct *vma = vma_lookup(mm, addr);
>  		struct folio_walk fw;
> -		struct folio *folio;
> +		struct folio *folio, *folio2;
> +		struct page *lockat;
>  		struct address_space *mapping;
>  		unsigned int target_order = new_order;
> +		LIST_HEAD(split_folios);
>
>  		if (!vma)
>  			break;
> @@ -4660,32 +4662,41 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
>  		    folio_expected_ref_count(folio) != folio_ref_count(folio))
>  			goto next;
>
> -		if (!folio_trylock(folio))
> +		if (!folio_isolate_lru(folio))
>  			goto next;
> -		folio_get(folio);
> -		folio_walk_end(&fw, vma);
>
>  		if (!folio_test_anon(folio) && folio->mapping != mapping)
> -			goto unlock;
> +			goto putback;
> +
> +		folio_lock(folio);
> +		folio_walk_end(&fw, vma);
> +		lockat = folio_page(folio, 0);
>
>  		if (in_folio_offset < 0 ||
>  		    in_folio_offset >= folio_nr_pages(folio)) {
> -			if (!split_folio_to_order(folio, target_order))
> +			lockat = folio_page(folio, 0);
> +			if (!split_huge_page_to_list_to_order(lockat, &split_folios, target_order))
>  				split++;
>  		} else {
>  			struct page *split_at = folio_page(folio,
>  							   in_folio_offset);
> -			if (!folio_split(folio, target_order, split_at, NULL))
> +			if (!folio_split(folio, target_order, split_at, &split_folios))
>  				split++;
>  		}
>
> -unlock:
> +		list_add_tail(&folio->lru, &split_folios);
> +		folio_unlock(page_folio(lockat));
>
> -		folio_unlock(folio);
> -		folio_put(folio);
> +		list_for_each_entry_safe(folio, folio2, &split_folios, lru) {
> +			list_del(&folio->lru);
> +			folio_putback_lru(folio);
> +		}
>
>  		cond_resched();
>  		continue;
> +
> +putback:
> +		folio_putback_lru(folio);

        ^^^^^^^^^^ cannot always put folio here.

>  next:
>  		folio_walk_end(&fw, vma);
>  		cond_resched();
> -- 

Your code change is wrong. Because when you are using split_huge_page_to_list_to_order(),
the code pattern should be:

get_page(lock_at);
lock_page(lock_at);
split_huge_page_to_list_to_order(lock_at);
unlock_page(lock_at);
put_page(lock_at);

So the extra refcount in lock_at will be decreased by put_page(lock_at);

But your code change does not do put_page(lock_at) but always does folio_putback_lru(folio),
where folio is the original head.

BTW, in the folio world, I do not think it is possible to perform the aforementioned
split_huge_page_to_list_to_order() pattern any more, since you always work on folio,
the head. Unless there is a need of getting hold of a tail after-split folio after
a folio split, the pattern would be:

tail_page = folio_page(folio, N);

folio_get(folio);
folio_lock(folio);
folio_split(folio, ..., /* new parameter: lock_at = */ tail_page, ...);
tail_folio = page_folio(tail_page);
folio_unlock(tail_folio);
folio_put(tail_folio);



--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: A potential refcount issue during __folio_split
  2026-02-22  3:00   ` A potential refcount issue during __folio_split Zi Yan
@ 2026-02-22 10:28     ` Wei Yang
  2026-02-23  9:23     ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 4+ messages in thread
From: Wei Yang @ 2026-02-22 10:28 UTC (permalink / raw)
  To: Zi Yan; +Cc: Wei Yang, David Hildenbrand (Arm), Linux MM

On Sat, Feb 21, 2026 at 10:00:44PM -0500, Zi Yan wrote:
>+linux-mm list, since the topic is interesting and worth having a record in the list.
>
>On 21 Feb 2026, at 20:07, Wei Yang wrote:
>
>> On Sun, Feb 22, 2026 at 01:04:25AM +0000, Wei Yang wrote:
>>> Hi, David & Zi Yan
>>>
>>> With some tests, I may find one refcount issue during __folio_split(), when
>>>
>>>  * folio is isolated and @list is provided
>>>  * @lock_at is not the large folio's head page
>>>
>>> The tricky thing is after __folio_freeze_and_split_unmapped() and
>>> remap_page(), we would release one refcount for each after-split folio except
>>> @lock_at after-split folio.
>>>
>>> But when @list is provided, we would grab extra refcount in
>>> __folio_freeze_and_split_unmapped() for each tail after-split folio by
>>> lru_add_split_folio(), except head after-split folio.
>>>
>>> If @lock_at is the large folio's head page, it is fine. If not, the
>>> after-split folio's refcount is not well maintained.
>>>
>>> Take anonymous large folio mapped in one process for example, the refcount
>>> change during uniformly __folio_split() to order-0 would look like this:
>>>
>>> after lru_add_split_folio()  after remap     after unlock if @lockat == head
>>>   f0: 1                      1 + 1 = 2       2
>>>   f1: 1 + 1 = 2              2 + 1 = 3       3 - 1 = 2
>>>   f2: 1 + 1 = 2              2 + 1 = 3       3 - 1 = 2
>>>   f3: 1 + 1 = 2              2 + 1 = 3       3 - 1 = 2
>>>
>>>                                              after unlock if @lockat == head + 1
>>>                                              2 - 1 = 1
>>>                                              3
>>>                                              3 - 1 = 2
>>>                                              3 - 1 = 2
>>>
>>> This shows the refcount of f0/f1 is not correct, if @lockat != head page.
>
>after lru_add_split_folio(), refcount for head should be 0, since it is frozen,
>each of the rest subpages should be refcount == 1. Then, head is unfreezed
>and its refcount goes to 1. remap adds 1 to all subpages refcount.
>after the unlock loop, every subpage get -1 refcount except lock_at.
>
>
>>>
>>> The good news is there is no use case in kernel now. Then I am not sure this
>>> worth a fix. Would like to ask for your opinion first. Hope I don't miss
>>> something important.
>>>
>>> Since there is no real case in kernel, I adjust the current debugfs
>>> interface(/sys/kernel/debug/split_huge_pages) to trigger it. Below is the diff
>>> for the change. This change could pass the selftests/split_huge_page_test to
>>> make sure the code change itself is correct.
>>>
>>> Then change the lockat to folio_page(folio, 1) could trigger the issue, when
>>> trying to split a THP through the debugfs from userspace.
>>>
>>
>> Sorry, the diff is lost:
>>
>> From c6d4c3d81e16f5f4b509cff884540bec0f91e6c3 Mon Sep 17 00:00:00 2001
>> From: Wei Yang <richard.weiyang@gmail.com>
>> Date: Thu, 19 Feb 2026 08:44:49 +0800
>> Subject: [PATCH] [T]mm/huge_memory: test split with isolation
>>
>> ---
>>  mm/huge_memory.c | 31 +++++++++++++++++++++----------
>>  1 file changed, 21 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index ed0375ea22d1..65354c5edfef 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -4621,9 +4621,11 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
>>  	for (addr = vaddr_start; addr < vaddr_end; addr += PAGE_SIZE) {
>>  		struct vm_area_struct *vma = vma_lookup(mm, addr);
>>  		struct folio_walk fw;
>> -		struct folio *folio;
>> +		struct folio *folio, *folio2;
>> +		struct page *lockat;
>>  		struct address_space *mapping;
>>  		unsigned int target_order = new_order;
>> +		LIST_HEAD(split_folios);
>>
>>  		if (!vma)
>>  			break;
>> @@ -4660,32 +4662,41 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
>>  		    folio_expected_ref_count(folio) != folio_ref_count(folio))
>>  			goto next;
>>
>> -		if (!folio_trylock(folio))
>> +		if (!folio_isolate_lru(folio))
>>  			goto next;
>> -		folio_get(folio);
>> -		folio_walk_end(&fw, vma);
>>
>>  		if (!folio_test_anon(folio) && folio->mapping != mapping)
>> -			goto unlock;
>> +			goto putback;
>> +
>> +		folio_lock(folio);
>> +		folio_walk_end(&fw, vma);
>> +		lockat = folio_page(folio, 0);
>>
>>  		if (in_folio_offset < 0 ||
>>  		    in_folio_offset >= folio_nr_pages(folio)) {
>> -			if (!split_folio_to_order(folio, target_order))
>> +			lockat = folio_page(folio, 0);
>> +			if (!split_huge_page_to_list_to_order(lockat, &split_folios, target_order))
>>  				split++;
>>  		} else {
>>  			struct page *split_at = folio_page(folio,
>>  							   in_folio_offset);
>> -			if (!folio_split(folio, target_order, split_at, NULL))
>> +			if (!folio_split(folio, target_order, split_at, &split_folios))
>>  				split++;
>>  		}
>>
>> -unlock:
>> +		list_add_tail(&folio->lru, &split_folios);
>> +		folio_unlock(page_folio(lockat));
>>
>> -		folio_unlock(folio);
>> -		folio_put(folio);
>> +		list_for_each_entry_safe(folio, folio2, &split_folios, lru) {
>> +			list_del(&folio->lru);
>> +			folio_putback_lru(folio);
>> +		}
>>
>>  		cond_resched();
>>  		continue;
>> +
>> +putback:
>> +		folio_putback_lru(folio);
>
>        ^^^^^^^^^^ cannot always put folio here.
>

You mean I should put page_folio(lockat)?

This is the error patch. After isolation, if folio mapping
changes, it release the folio. So the folio is not split yet.

For other case, it handles differently. See below.

>>  next:
>>  		folio_walk_end(&fw, vma);
>>  		cond_resched();
>> -- 
>
>Your code change is wrong. Because when you are using split_huge_page_to_list_to_order(),
>the code pattern should be:
>
>get_page(lock_at);
>lock_page(lock_at);
>split_huge_page_to_list_to_order(lock_at);
>unlock_page(lock_at);
>put_page(lock_at);
>
>So the extra refcount in lock_at will be decreased by put_page(lock_at);
>

Yes, generally it is. But we seem not forbid provide a list to put after-split
folio.

And I found there is another requirement, if we want to put after-folio on
specified list, we have to isolate folio first.

I took sometime to realize it.

>But your code change does not do put_page(lock_at) but always does folio_putback_lru(folio),
>where folio is the original head.
>

I put "folio" to @split_folios after split. And then do folio_putback_lru()
for each of them.

If split successful, @split_folios contains all after-split folios, including
@lock_at.

If split fails, @split_folios contains the original folio, including @lock_at
by some kind.

Maybe I misunderstand your point. Would you mind be more specific?

>BTW, in the folio world, I do not think it is possible to perform the aforementioned
>split_huge_page_to_list_to_order() pattern any more, since you always work on folio,
>the head. Unless there is a need of getting hold of a tail after-split folio after
>a folio split, the pattern would be:
>
>tail_page = folio_page(folio, N);
>
>folio_get(folio);
>folio_lock(folio);
>folio_split(folio, ..., /* new parameter: lock_at = */ tail_page, ...);
>tail_folio = page_folio(tail_page);
>folio_unlock(tail_folio);
>folio_put(tail_folio);
>

Yes, in the folio world, we probably won't lockat a middle page.

Another thing I found is in commit e9b61f19858a ("thp: reintroduce
split_huge_page()"), the comment of split_huge_page_to_list() says "@page
can point to any subpage of huge page to split", but the refcount mechanism
seems settle down then. So I am afraid actually @page has to be head since
then.

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: A potential refcount issue during __folio_split
  2026-02-22  3:00   ` A potential refcount issue during __folio_split Zi Yan
  2026-02-22 10:28     ` Wei Yang
@ 2026-02-23  9:23     ` David Hildenbrand (Arm)
  2026-02-23 11:59       ` Wei Yang
  1 sibling, 1 reply; 4+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-23  9:23 UTC (permalink / raw)
  To: Zi Yan, Wei Yang; +Cc: Linux MM

> BTW, in the folio world, I do not think it is possible to perform the aforementioned
> split_huge_page_to_list_to_order() pattern any more, since you always work on folio,
> the head. Unless there is a need of getting hold of a tail after-split folio after
> a folio split, the pattern would be:
> 
> tail_page = folio_page(folio, N);
> 
> folio_get(folio);
> folio_lock(folio);
> folio_split(folio, ..., /* new parameter: lock_at = */ tail_page, ...);
> tail_folio = page_folio(tail_page);
> folio_unlock(tail_folio);
> folio_put(tail_folio);

Agreed. Maybe it would be even nicer if the split function could return 
the new folio directly.

folio_get(folio);
folio_lock(folio);
split_folio = folio_split_XXX(folio, ..., tail_page, ...);
if (IS_ERR_VALUE(split_folio)) {
	...
}
folio_unlock(split_folio);
folio_put(split__folio);

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: A potential refcount issue during __folio_split
  2026-02-23  9:23     ` David Hildenbrand (Arm)
@ 2026-02-23 11:59       ` Wei Yang
  0 siblings, 0 replies; 4+ messages in thread
From: Wei Yang @ 2026-02-23 11:59 UTC (permalink / raw)
  To: David Hildenbrand (Arm); +Cc: Zi Yan, Wei Yang, Linux MM

On Mon, Feb 23, 2026 at 10:23:11AM +0100, David Hildenbrand (Arm) wrote:
>> BTW, in the folio world, I do not think it is possible to perform the aforementioned
>> split_huge_page_to_list_to_order() pattern any more, since you always work on folio,
>> the head. Unless there is a need of getting hold of a tail after-split folio after
>> a folio split, the pattern would be:
>> 
>> tail_page = folio_page(folio, N);
>> 
>> folio_get(folio);
>> folio_lock(folio);
>> folio_split(folio, ..., /* new parameter: lock_at = */ tail_page, ...);
>> tail_folio = page_folio(tail_page);
>> folio_unlock(tail_folio);
>> folio_put(tail_folio);
>

Missed this. Agree.

>Agreed. Maybe it would be even nicer if the split function could return the
>new folio directly.
>
>folio_get(folio);
>folio_lock(folio);
>split_folio = folio_split_XXX(folio, ..., tail_page, ...);
>if (IS_ERR_VALUE(split_folio)) {
>	...
>}
>folio_unlock(split_folio);
>folio_put(split__folio);
>

I am afraid it would be complicated?

Well, we don't have this usecase now, could decide it when we do need it.

>-- 
>Cheers,
>
>David

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-02-23 11:59 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20260222010425.gbsjzhrew3pg4qrw@master>
     [not found] ` <20260222010708.uohpmddmzaa4i4ic@master>
2026-02-22  3:00   ` A potential refcount issue during __folio_split Zi Yan
2026-02-22 10:28     ` Wei Yang
2026-02-23  9:23     ` David Hildenbrand (Arm)
2026-02-23 11:59       ` Wei Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox