* Re: A potential refcount issue during __folio_split
       [not found] ` <20260222010708.uohpmddmzaa4i4ic@master>
@ 2026-02-22  3:00 ` Zi Yan
  2026-02-22 10:28   ` Wei Yang
  2026-02-23  9:23   ` David Hildenbrand (Arm)
  0 siblings, 2 replies; 6+ messages in thread
From: Zi Yan @ 2026-02-22  3:00 UTC (permalink / raw)
To: Wei Yang; +Cc: David Hildenbrand (Arm), Linux MM

+linux-mm list, since the topic is interesting and worth having a record in the list.

On 21 Feb 2026, at 20:07, Wei Yang wrote:

> On Sun, Feb 22, 2026 at 01:04:25AM +0000, Wei Yang wrote:
>> Hi, David & Zi Yan
>>
>> With some tests, I may have found a refcount issue during __folio_split(), when
>>
>>   * the folio is isolated and @list is provided
>>   * @lock_at is not the large folio's head page
>>
>> The tricky thing is that after __folio_freeze_and_split_unmapped() and
>> remap_page(), we release one refcount for each after-split folio except
>> the @lock_at after-split folio.
>>
>> But when @list is provided, we grab an extra refcount in
>> __folio_freeze_and_split_unmapped() for each tail after-split folio via
>> lru_add_split_folio(), except for the head after-split folio.
>>
>> If @lock_at is the large folio's head page, this is fine. If not, the
>> after-split folios' refcounts are not maintained correctly.
>>
>> Take an anonymous large folio mapped in one process as an example. The
>> refcount changes during a uniform __folio_split() to order-0 look like this:
>>
>>         after lru_add_split_folio()   after remap    after unlock if @lock_at == head
>>   f0:              1                   1 + 1 = 2                 2
>>   f1:          1 + 1 = 2               2 + 1 = 3             3 - 1 = 2
>>   f2:          1 + 1 = 2               2 + 1 = 3             3 - 1 = 2
>>   f3:          1 + 1 = 2               2 + 1 = 3             3 - 1 = 2
>>
>>         after unlock if @lock_at == head + 1
>>   f0:          2 - 1 = 1
>>   f1:              3
>>   f2:          3 - 1 = 2
>>   f3:          3 - 1 = 2
>>
>> This shows the refcounts of f0/f1 are not correct if @lock_at != head page.

After lru_add_split_folio(), the refcount of the head should be 0, since it is
frozen; each of the rest of the subpages should have refcount == 1. Then the
head is unfrozen and its refcount goes to 1. Remap adds 1 to all subpages'
refcounts. After the unlock loop, every subpage gets -1 refcount except lock_at.

>>
>> The good news is that there is no use case in the kernel now, so I am not
>> sure this is worth a fix. I would like to ask for your opinion first. Hope
>> I am not missing something important.
>>
>> Since there is no real case in the kernel, I adjusted the current debugfs
>> interface (/sys/kernel/debug/split_huge_pages) to trigger it. Below is the
>> diff for the change. The change passes selftests/split_huge_page_test, to
>> make sure the code change itself is correct.
>>
>> Then changing lockat to folio_page(folio, 1) triggers the issue when
>> trying to split a THP through the debugfs from userspace.
>>
>
> Sorry, the diff is lost:
>
> From c6d4c3d81e16f5f4b509cff884540bec0f91e6c3 Mon Sep 17 00:00:00 2001
> From: Wei Yang <richard.weiyang@gmail.com>
> Date: Thu, 19 Feb 2026 08:44:49 +0800
> Subject: [PATCH] [T]mm/huge_memory: test split with isolation
>
> ---
>  mm/huge_memory.c | 31 +++++++++++++++++++++----------
>  1 file changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index ed0375ea22d1..65354c5edfef 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4621,9 +4621,11 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
>  	for (addr = vaddr_start; addr < vaddr_end; addr += PAGE_SIZE) {
>  		struct vm_area_struct *vma = vma_lookup(mm, addr);
>  		struct folio_walk fw;
> -		struct folio *folio;
> +		struct folio *folio, *folio2;
> +		struct page *lockat;
>  		struct address_space *mapping;
>  		unsigned int target_order = new_order;
> +		LIST_HEAD(split_folios);
>
>  		if (!vma)
>  			break;
> @@ -4660,32 +4662,41 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
>  		    folio_expected_ref_count(folio) != folio_ref_count(folio))
>  			goto next;
>
> -		if (!folio_trylock(folio))
> +		if (!folio_isolate_lru(folio))
>  			goto next;
> -		folio_get(folio);
> -		folio_walk_end(&fw, vma);
>
>  		if (!folio_test_anon(folio) && folio->mapping != mapping)
> -			goto unlock;
> +			goto putback;
> +
> +		folio_lock(folio);
> +		folio_walk_end(&fw, vma);
> +		lockat = folio_page(folio, 0);
>
>  		if (in_folio_offset < 0 ||
>  		    in_folio_offset >= folio_nr_pages(folio)) {
> -			if (!split_folio_to_order(folio, target_order))
> +			lockat = folio_page(folio, 0);
> +			if (!split_huge_page_to_list_to_order(lockat, &split_folios, target_order))
>  				split++;
>  		} else {
>  			struct page *split_at = folio_page(folio,
>  							   in_folio_offset);
> -			if (!folio_split(folio, target_order, split_at, NULL))
> +			if (!folio_split(folio, target_order, split_at, &split_folios))
>  				split++;
>  		}
>
> -unlock:
> +		list_add_tail(&folio->lru, &split_folios);
> +		folio_unlock(page_folio(lockat));
>
> -		folio_unlock(folio);
> -		folio_put(folio);
> +		list_for_each_entry_safe(folio, folio2, &split_folios, lru) {
> +			list_del(&folio->lru);
> +			folio_putback_lru(folio);
> +		}
>
>  		cond_resched();
>  		continue;
> +
> +putback:
> +		folio_putback_lru(folio);

		^^^^^^^^^^ cannot always put folio here.

>  next:
>  		folio_walk_end(&fw, vma);
>  		cond_resched();
> --

Your code change is wrong. Because when you are using
split_huge_page_to_list_to_order(), the code pattern should be:

get_page(lock_at);
lock_page(lock_at);
split_huge_page_to_list_to_order(lock_at);
unlock_page(lock_at);
put_page(lock_at);

So the extra refcount on lock_at is dropped by put_page(lock_at).

But your code change does not do put_page(lock_at); it always does
folio_putback_lru(folio), where folio is the original head.

BTW, in the folio world, I do not think it is possible to perform the
aforementioned split_huge_page_to_list_to_order() pattern any more, since you
always work on the folio, i.e. the head. Unless there is a need to get hold of
a tail after-split folio after a folio split, in which case the pattern would be:

tail_page = folio_page(folio, N);

folio_get(folio);
folio_lock(folio);
folio_split(folio, ..., /* new parameter: lock_at = */ tail_page, ...);
tail_folio = page_folio(tail_page);
folio_unlock(tail_folio);
folio_put(tail_folio);

--
Best Regards,
Yan, Zi
* Re: A potential refcount issue during __folio_split
  2026-02-22  3:00 ` A potential refcount issue during __folio_split Zi Yan
@ 2026-02-22 10:28   ` Wei Yang
  2026-02-23  9:23   ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 6+ messages in thread
From: Wei Yang @ 2026-02-22 10:28 UTC (permalink / raw)
To: Zi Yan; +Cc: Wei Yang, David Hildenbrand (Arm), Linux MM

On Sat, Feb 21, 2026 at 10:00:44PM -0500, Zi Yan wrote:
>+linux-mm list, since the topic is interesting and worth having a record in the list.
>
>On 21 Feb 2026, at 20:07, Wei Yang wrote:
>
>[...]
>
>> +
>> +putback:
>> +		folio_putback_lru(folio);
>
>		^^^^^^^^^^ cannot always put folio here.
>

You mean I should put page_folio(lockat)?

This is the error path. After isolation, if the folio's mapping has changed,
we release the folio, so the folio is not split yet. The other cases are
handled differently, see below.

>>  next:
>>  		folio_walk_end(&fw, vma);
>>  		cond_resched();
>> --
>
>Your code change is wrong. Because when you are using
>split_huge_page_to_list_to_order(), the code pattern should be:
>
>get_page(lock_at);
>lock_page(lock_at);
>split_huge_page_to_list_to_order(lock_at);
>unlock_page(lock_at);
>put_page(lock_at);
>
>So the extra refcount on lock_at is dropped by put_page(lock_at).
>

Yes, generally it is.

But we do not seem to forbid providing a list on which to put the after-split
folios. And I found there is another requirement: if we want to put the
after-split folios on a specified list, we have to isolate the folio first.
It took me some time to realize that.

>But your code change does not do put_page(lock_at); it always does
>folio_putback_lru(folio), where folio is the original head.
>

I put "folio" on @split_folios after the split, and then do
folio_putback_lru() for each folio on the list.

If the split succeeds, @split_folios contains all the after-split folios,
including @lock_at. If the split fails, @split_folios contains the original
folio, which includes @lock_at.

Maybe I misunderstand your point. Would you mind being more specific?

>BTW, in the folio world, I do not think it is possible to perform the
>aforementioned split_huge_page_to_list_to_order() pattern any more, since you
>always work on the folio, i.e. the head. Unless there is a need to get hold of
>a tail after-split folio after a folio split, in which case the pattern would be:
>
>tail_page = folio_page(folio, N);
>
>folio_get(folio);
>folio_lock(folio);
>folio_split(folio, ..., /* new parameter: lock_at = */ tail_page, ...);
>tail_folio = page_folio(tail_page);
>folio_unlock(tail_folio);
>folio_put(tail_folio);
>

Yes, in the folio world, we probably won't lock at a middle page.

Another thing I found: in commit e9b61f19858a ("thp: reintroduce
split_huge_page()"), the comment of split_huge_page_to_list() says "@page can
point to any subpage of huge page to split", but the refcount mechanism seems
to have settled down by then. So I am afraid @page has actually had to be the
head page since then.

--
Wei Yang
Help you, Help me
* Re: A potential refcount issue during __folio_split
  2026-02-22  3:00 ` A potential refcount issue during __folio_split Zi Yan
  2026-02-22 10:28   ` Wei Yang
@ 2026-02-23  9:23   ` David Hildenbrand (Arm)
  2026-02-23 11:59     ` Wei Yang
  1 sibling, 1 reply; 6+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-23  9:23 UTC (permalink / raw)
To: Zi Yan, Wei Yang; +Cc: Linux MM

> BTW, in the folio world, I do not think it is possible to perform the aforementioned
> split_huge_page_to_list_to_order() pattern any more, since you always work on folio,
> the head. Unless there is a need of getting hold of a tail after-split folio after
> a folio split, the pattern would be:
>
> tail_page = folio_page(folio, N);
>
> folio_get(folio);
> folio_lock(folio);
> folio_split(folio, ..., /* new parameter: lock_at = */ tail_page, ...);
> tail_folio = page_folio(tail_page);
> folio_unlock(tail_folio);
> folio_put(tail_folio);

Agreed. Maybe it would be even nicer if the split function could return the
new folio directly.

folio_get(folio);
folio_lock(folio);
split_folio = folio_split_XXX(folio, ..., tail_page, ...);
if (IS_ERR_VALUE(split_folio)) {
	...
}
folio_unlock(split_folio);
folio_put(split_folio);

--
Cheers,

David
* Re: A potential refcount issue during __folio_split
  2026-02-23  9:23     ` David Hildenbrand (Arm)
@ 2026-02-23 11:59       ` Wei Yang
  2026-02-24  4:00         ` Zi Yan
  0 siblings, 1 reply; 6+ messages in thread
From: Wei Yang @ 2026-02-23 11:59 UTC (permalink / raw)
To: David Hildenbrand (Arm); +Cc: Zi Yan, Wei Yang, Linux MM

On Mon, Feb 23, 2026 at 10:23:11AM +0100, David Hildenbrand (Arm) wrote:
>> BTW, in the folio world, I do not think it is possible to perform the aforementioned
>> split_huge_page_to_list_to_order() pattern any more, since you always work on folio,
>> the head. Unless there is a need of getting hold of a tail after-split folio after
>> a folio split, the pattern would be:
>>
>> tail_page = folio_page(folio, N);
>>
>> folio_get(folio);
>> folio_lock(folio);
>> folio_split(folio, ..., /* new parameter: lock_at = */ tail_page, ...);
>> tail_folio = page_folio(tail_page);
>> folio_unlock(tail_folio);
>> folio_put(tail_folio);
>

Missed this. Agree.

>Agreed. Maybe it would be even nicer if the split function could return the
>new folio directly.
>
>folio_get(folio);
>folio_lock(folio);
>split_folio = folio_split_XXX(folio, ..., tail_page, ...);
>if (IS_ERR_VALUE(split_folio)) {
>	...
>}
>folio_unlock(split_folio);
>folio_put(split_folio);
>

I am afraid it would be complicated?

Well, we don't have this use case now; we could decide when we do need it.

>--
>Cheers,
>
>David

--
Wei Yang
Help you, Help me
* Re: A potential refcount issue during __folio_split
  2026-02-23 11:59       ` Wei Yang
@ 2026-02-24  4:00         ` Zi Yan
  2026-02-24  4:25           ` Wei Yang
  0 siblings, 1 reply; 6+ messages in thread
From: Zi Yan @ 2026-02-24  4:00 UTC (permalink / raw)
To: Wei Yang; +Cc: David Hildenbrand (Arm), Linux MM

On 23 Feb 2026, at 6:59, Wei Yang wrote:

> On Mon, Feb 23, 2026 at 10:23:11AM +0100, David Hildenbrand (Arm) wrote:
>
> [...]
>
>> Agreed. Maybe it would be even nicer if the split function could return the
>> new folio directly.
>>
>> folio_get(folio);
>> folio_lock(folio);
>> split_folio = folio_split_XXX(folio, ..., tail_page, ...);
>> if (IS_ERR_VALUE(split_folio)) {
>> 	...
>> }
>> folio_unlock(split_folio);
>> folio_put(split_folio);
>>
>
> I am afraid it would be complicated?
>
> Well, we don't have this use case now; we could decide when we do need it.

The patch below should work, but for now, since we do not have any user,
it is better to update the comment and add a check to make sure @lock_at
always points to the head page if @list is not NULL.

From 66e24e6cc4397caa134f5600d22d77fdb9b58049 Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy@nvidia.com>
Date: Mon, 23 Feb 2026 21:59:18 -0500
Subject: [PATCH] mm/huge_memory: allow caller to unlock any subpage of a folio
 after split

Transfer the to-be-split folio's reference to the after-split folio that the
caller wants to unlock and put.

Also let __folio_split() return the folio containing @lock_at for the caller
to use.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/huge_memory.c | 65 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 47 insertions(+), 18 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0d487649e4de..d051d611c6e5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3768,10 +3768,9 @@ static unsigned int folio_cache_ref_count(const struct folio *folio)
 }

 static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int new_order,
-		struct page *split_at, struct xa_state *xas,
-		struct address_space *mapping, bool do_lru,
-		struct list_head *list, enum split_type split_type,
-		pgoff_t end, int *nr_shmem_dropped)
+		struct page *split_at, struct page *lock_at, struct xa_state *xas,
+		struct address_space *mapping, bool do_lru, struct list_head *list,
+		enum split_type split_type, pgoff_t end, int *nr_shmem_dropped)
 {
 	struct folio *end_folio = folio_next(folio);
 	struct folio *new_folio, *next;
@@ -3855,7 +3854,11 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 		folio_ref_unfreeze(new_folio,
 				folio_cache_ref_count(new_folio) + 1);

-		if (do_lru)
+		/*
+		 * skip @lock_at since caller wants to unlock and put it
+		 * after split
+		 */
+		if (do_lru && new_folio != page_folio(lock_at))
 			lru_add_split_folio(folio, new_folio, lruvec, list);

 		/*
@@ -3898,8 +3901,17 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 	 */
 	folio_ref_unfreeze(folio, folio_cache_ref_count(folio) + 1);

-	if (do_lru)
+	if (do_lru) {
+		/*
+		 * caller wants to unlock and put @lock_at instead of
+		 * @folio, treat @folio as other after-split folios
+		 * by either elevating its refcount and putting it in
+		 * @list or putting it back to lru if @list is NULL.
+		 */
+		if (folio != page_folio(lock_at))
+			lru_add_split_folio(folio, folio, lruvec, list);
 		unlock_page_lruvec(lruvec);
+	}

 	if (ci)
 		swap_cluster_unlock(ci);
@@ -3925,14 +3937,13 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
  * preparing @folio for __split_unmapped_folio().
  *
  * After splitting, the after-split folio containing @lock_at remains locked
- * and others are unlocked:
- * 1. for uniform split, @lock_at points to one of @folio's subpages;
- * 2. for buddy allocator like (non-uniform) split, @lock_at points to @folio.
+ * and others are unlocked and the caller's folio reference is transferred to
+ * @lock_at's folio. @lock_at can point to any one of @folio's subpages.
  *
  * Return: 0 - successful, <0 - failed (if -ENOMEM is returned, @folio might be
  * split but not to @new_order, the caller needs to check)
  */
-static int __folio_split(struct folio *folio, unsigned int new_order,
+static struct folio *__folio_split(struct folio *folio, unsigned int new_order,
 		struct page *split_at, struct page *lock_at,
 		struct list_head *list, enum split_type split_type)
 {
@@ -4052,8 +4063,10 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		}
 	}

-	ret = __folio_freeze_and_split_unmapped(folio, new_order, split_at, &xas, mapping,
-						true, list, split_type, end, &nr_shmem_dropped);
+	ret = __folio_freeze_and_split_unmapped(folio, new_order, split_at,
+						lock_at, &xas, mapping, true,
+						list, split_type, end,
+						&nr_shmem_dropped);
 fail:
 	if (mapping)
 		xas_unlock(&xas);
@@ -4100,7 +4113,10 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	if (old_order == HPAGE_PMD_ORDER)
 		count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
 	count_mthp_stat(old_order, !ret ? MTHP_STAT_SPLIT : MTHP_STAT_SPLIT_FAILED);
-	return ret;
+
+	if (!ret)
+		return page_folio(lock_at);
+	return (struct folio *)ERR_PTR(ret);
 }

 /**
@@ -4138,9 +4154,10 @@ int folio_split_unmapped(struct folio *folio, unsigned int new_order)
 		return -EAGAIN;

 	local_irq_disable();
-	ret = __folio_freeze_and_split_unmapped(folio, new_order, &folio->page, NULL,
-						NULL, false, NULL, SPLIT_TYPE_UNIFORM,
-						0, NULL);
+	ret = __folio_freeze_and_split_unmapped(folio, new_order, &folio->page,
+						&folio->page, NULL, NULL, false,
+						NULL, SPLIT_TYPE_UNIFORM, 0,
+						NULL);
 	local_irq_enable();
 	return ret;
 }
@@ -4196,9 +4213,14 @@ int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		unsigned int new_order)
 {
 	struct folio *folio = page_folio(page);
+	struct folio *ret;

-	return __folio_split(folio, new_order, &folio->page, page, list,
+	ret = __folio_split(folio, new_order, &folio->page, page, list,
 			SPLIT_TYPE_UNIFORM);
+	if (IS_ERR_VALUE(ret))
+		return PTR_ERR(ret);
+
+	return 0;
 }

 /**
@@ -4228,8 +4250,15 @@ int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 int folio_split(struct folio *folio, unsigned int new_order,
 		struct page *split_at, struct list_head *list)
 {
-	return __folio_split(folio, new_order, split_at, &folio->page, list,
+	struct folio *ret;
+
+	ret = __folio_split(folio, new_order, split_at, &folio->page, list,
 			SPLIT_TYPE_NON_UNIFORM);
+
+	if (IS_ERR_VALUE(ret))
+		return PTR_ERR(ret);
+
+	return 0;
 }

 /**
--
2.51.0


Best Regards,
Yan, Zi
* Re: A potential refcount issue during __folio_split
  2026-02-24  4:00         ` Zi Yan
@ 2026-02-24  4:25           ` Wei Yang
  0 siblings, 0 replies; 6+ messages in thread
From: Wei Yang @ 2026-02-24  4:25 UTC (permalink / raw)
To: Zi Yan; +Cc: Wei Yang, David Hildenbrand (Arm), Linux MM

On Mon, Feb 23, 2026 at 11:00:01PM -0500, Zi Yan wrote:
>On 23 Feb 2026, at 6:59, Wei Yang wrote:
>
>[...]
>
>> I am afraid it would be complicated?
>>
>> Well, we don't have this use case now; we could decide when we do need it.
>
>The patch below should work, but for now, since we do not have any user,
>it is better to update the comment and add a check to make sure @lock_at
>always points to the head page if @list is not NULL.
>

Agree.

>From 66e24e6cc4397caa134f5600d22d77fdb9b58049 Mon Sep 17 00:00:00 2001
>From: Zi Yan <ziy@nvidia.com>
>Date: Mon, 23 Feb 2026 21:59:18 -0500
>Subject: [PATCH] mm/huge_memory: allow caller to unlock any subpage of a folio
> after split
>
>[...]
>
>@@ -3855,7 +3854,11 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
> 		folio_ref_unfreeze(new_folio,
> 				folio_cache_ref_count(new_folio) + 1);
>
>-		if (do_lru)
>+		/*
>+		 * skip @lock_at since caller wants to unlock and put it
>+		 * after split
>+		 */
>+		if (do_lru && new_folio != page_folio(lock_at))
> 			lru_add_split_folio(folio, new_folio, lruvec, list);

This makes me think about whether we need to always grab the lru lock. If the
folio has already been removed from the lru, it looks unnecessary?

Well, that is another thing.

>[... rest of the patch quoted unchanged ...]

--
Wei Yang
Help you, Help me
Thread overview: 6+ messages
[not found] <20260222010425.gbsjzhrew3pg4qrw@master>
[not found] ` <20260222010708.uohpmddmzaa4i4ic@master>
2026-02-22 3:00 ` A potential refcount issue during __folio_split Zi Yan
2026-02-22 10:28 ` Wei Yang
2026-02-23 9:23 ` David Hildenbrand (Arm)
2026-02-23 11:59 ` Wei Yang
2026-02-24 4:00 ` Zi Yan
2026-02-24 4:25 ` Wei Yang