linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
@ 2026-02-28  1:06 Zi Yan
  2026-02-28  3:10 ` Lance Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Zi Yan @ 2026-02-28  1:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Lorenzo Stoakes, Zi Yan, Hugh Dickins,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Matthew Wilcox, Bas van Dijk, Eero Kelly,
	Andrew Battat, Adam Bratschi-Kaye, linux-mm, linux-kernel,
	linux-fsdevel, stable

During a pagecache folio split, the values in the related xarray should not
be changed from the original folio at xarray split time until all
after-split folios are well formed and stored in the xarray. Current use
of xas_try_split() in __split_unmapped_folio() lets some after-split folios
show up at wrong indices in the xarray. When these misplaced after-split
folios are unfrozen, before correct folios are stored via __xa_store(), and
grabbed by folio_try_get(), they are returned to userspace at wrong file
indices, causing data corruption.

Fix it by using the original folio in xas_try_split() calls, so that
folio_try_get() can get the right after-split folios after the original
folio is unfrozen.

Uniform split, split_huge_page*(), is not affected, since it uses
xas_split_alloc() and xas_split() only once and stores the original folio
in the xarray.

Fixes below points to the commit introduces the code, but folio_split() is
used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in
truncate operation").

Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used) functions for folio_split()")
Reported-by: Bas van Dijk <bas@dfinity.org>
Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
---
 mm/huge_memory.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 56db54fa48181..e4ed0404e8b55 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3647,6 +3647,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 	const bool is_anon = folio_test_anon(folio);
 	int old_order = folio_order(folio);
 	int start_order = split_type == SPLIT_TYPE_UNIFORM ? new_order : old_order - 1;
+	struct folio *origin_folio = folio;
 	int split_order;
 
 	/*
@@ -3672,7 +3673,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 				xas_split(xas, folio, old_order);
 			else {
 				xas_set_order(xas, folio->index, split_order);
-				xas_try_split(xas, folio, old_order);
+				/*
+				 * use the original folio, so that a parallel
+				 * folio_try_get() waits on it until xarray is
+				 * updated with after-split folios and
+				 * the original one is unfrozen.
+				 */
+				xas_try_split(xas, origin_folio, old_order);
 				if (xas_error(xas))
 					return xas_error(xas);
 			}
-- 
2.51.0



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
  2026-02-28  1:06 [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get() Zi Yan
@ 2026-02-28  3:10 ` Lance Yang
  2026-03-02 14:28   ` David Hildenbrand (Arm)
  2026-03-02 13:30 ` Lorenzo Stoakes
  2026-03-02 14:40 ` David Hildenbrand (Arm)
  2 siblings, 1 reply; 10+ messages in thread
From: Lance Yang @ 2026-02-28  3:10 UTC (permalink / raw)
  To: Zi Yan, Andrew Morton
  Cc: David Hildenbrand, Lorenzo Stoakes, Hugh Dickins, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Matthew Wilcox, Bas van Dijk, Eero Kelly, Andrew Battat,
	Adam Bratschi-Kaye, linux-mm, linux-kernel, linux-fsdevel,
	stable



On 2026/2/28 09:06, Zi Yan wrote:
> During a pagecache folio split, the values in the related xarray should not
> be changed from the original folio at xarray split time until all
> after-split folios are well formed and stored in the xarray. Current use
> of xas_try_split() in __split_unmapped_folio() lets some after-split folios
> show up at wrong indices in the xarray. When these misplaced after-split
> folios are unfrozen, before correct folios are stored via __xa_store(), and
> grabbed by folio_try_get(), they are returned to userspace at wrong file
> indices, causing data corruption.
> 
> Fix it by using the original folio in xas_try_split() calls, so that
> folio_try_get() can get the right after-split folios after the original
> folio is unfrozen.
> 
> Uniform split, split_huge_page*(), is not affected, since it uses
> xas_split_alloc() and xas_split() only once and stores the original folio
> in the xarray.
> 
> Fixes below points to the commit introduces the code, but folio_split() is
> used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in
> truncate operation").
> 
> Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used) functions for folio_split()")
> Reported-by: Bas van Dijk <bas@dfinity.org>
> Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Cc: <stable@vger.kernel.org>
> ---

Thanks for the fix!

I also made a C reproducer and tested this patch - the corruption
disappeared.

Without patch: corruption in < 10 iterations
With patch: 1000 iterations, all clean

Tested-by: Lance Yang <lance.yang@linux.dev>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
  2026-02-28  1:06 [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get() Zi Yan
  2026-02-28  3:10 ` Lance Yang
@ 2026-03-02 13:30 ` Lorenzo Stoakes
  2026-03-02 16:30   ` Zi Yan
  2026-03-02 14:40 ` David Hildenbrand (Arm)
  2 siblings, 1 reply; 10+ messages in thread
From: Lorenzo Stoakes @ 2026-03-02 13:30 UTC (permalink / raw)
  To: Zi Yan
  Cc: Andrew Morton, David Hildenbrand, Hugh Dickins, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Matthew Wilcox, Bas van Dijk, Eero Kelly,
	Andrew Battat, Adam Bratschi-Kaye, linux-mm, linux-kernel,
	linux-fsdevel, stable

On Fri, Feb 27, 2026 at 08:06:14PM -0500, Zi Yan wrote:
> During a pagecache folio split, the values in the related xarray should not
> be changed from the original folio at xarray split time until all
> after-split folios are well formed and stored in the xarray. Current use
> of xas_try_split() in __split_unmapped_folio() lets some after-split folios
> show up at wrong indices in the xarray. When these misplaced after-split
> folios are unfrozen, before correct folios are stored via __xa_store(), and
> grabbed by folio_try_get(), they are returned to userspace at wrong file
> indices, causing data corruption.
>
> Fix it by using the original folio in xas_try_split() calls, so that
> folio_try_get() can get the right after-split folios after the original
> folio is unfrozen.
>
> Uniform split, split_huge_page*(), is not affected, since it uses
> xas_split_alloc() and xas_split() only once and stores the original folio
> in the xarray.
>
> Fixes below points to the commit introduces the code, but folio_split() is
> used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in
> truncate operation").
>
> Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used) functions for folio_split()")
> Reported-by: Bas van Dijk <bas@dfinity.org>
> Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Cc: <stable@vger.kernel.org>
> ---
>  mm/huge_memory.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 56db54fa48181..e4ed0404e8b55 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3647,6 +3647,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>  	const bool is_anon = folio_test_anon(folio);
>  	int old_order = folio_order(folio);
>  	int start_order = split_type == SPLIT_TYPE_UNIFORM ? new_order : old_order - 1;
> +	struct folio *origin_folio = folio;

NIT: 'origin' folio is a bit ambigious, maybe old_folio, since it is of order old_order?

>  	int split_order;
>
>  	/*
> @@ -3672,7 +3673,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>  				xas_split(xas, folio, old_order);

Aside, but this 'if (foo) bar(); else { ... }' pattern is horrible, think it's
justifiable to put both in {}... :)

>  			else {
>  				xas_set_order(xas, folio->index, split_order);
> -				xas_try_split(xas, folio, old_order);
> +				/*
> +				 * use the original folio, so that a parallel
> +				 * folio_try_get() waits on it until xarray is
> +				 * updated with after-split folios and
> +				 * the original one is unfrozen.
> +				 */
> +				xas_try_split(xas, origin_folio, old_order);

Hmm, but won't we have already split the original folio by now? So is
origin_folio/old_folio a pointer to what was the original folio but now is
that but with weird tail page setup? :) like:

|------------------------|
|           f            |
|------------------------|
^old_folio  ^ split_at

|-----------|------------|
|     f     |     f2     |
|-----------|------------|
^old_folio

|-----------|-----|------|
|     f     |  f3 |  f4  |
|-----------|-----|------|
^old_folio

etc.

So the xarray would contain:

|-----------|-----|------|
|    f      |  f  |   f  |
|-----------|-----|------|

Wouldn't it after this?

Oh I guess before it'd contain:

|-----------|-----|------|
|     f     |  f4 |  f4  |
|-----------|-----|------|

Right?


You saying you'll later put the correct xas entries in post-split. Where does
that happen?

And why was it a problem when these new folios were unfrozen?

(Since the folio is a pointer to an offset in the vmemmap)

I guess if you update that later in the xas, it's ok, and everything waits on
the right thing so this is probably fine, and the f4 f4 above is probably not
fine...

I'm guessing the original folio is kept frozen during the operation?

Anyway please help my confusion not so familiar with this code :)


>  				if (xas_error(xas))
>  					return xas_error(xas);
>  			}
> --
> 2.51.0
>

Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
  2026-02-28  3:10 ` Lance Yang
@ 2026-03-02 14:28   ` David Hildenbrand (Arm)
  2026-03-02 15:11     ` Lance Yang
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-02 14:28 UTC (permalink / raw)
  To: Lance Yang, Zi Yan, Andrew Morton
  Cc: Lorenzo Stoakes, Hugh Dickins, Baolin Wang, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Matthew Wilcox,
	Bas van Dijk, Eero Kelly, Andrew Battat, Adam Bratschi-Kaye,
	linux-mm, linux-kernel, linux-fsdevel, stable

On 2/28/26 04:10, Lance Yang wrote:
> 
> 
> On 2026/2/28 09:06, Zi Yan wrote:
>> During a pagecache folio split, the values in the related xarray
>> should not
>> be changed from the original folio at xarray split time until all
>> after-split folios are well formed and stored in the xarray. Current use
>> of xas_try_split() in __split_unmapped_folio() lets some after-split
>> folios
>> show up at wrong indices in the xarray. When these misplaced after-split
>> folios are unfrozen, before correct folios are stored via
>> __xa_store(), and
>> grabbed by folio_try_get(), they are returned to userspace at wrong file
>> indices, causing data corruption.
>>
>> Fix it by using the original folio in xas_try_split() calls, so that
>> folio_try_get() can get the right after-split folios after the original
>> folio is unfrozen.
>>
>> Uniform split, split_huge_page*(), is not affected, since it uses
>> xas_split_alloc() and xas_split() only once and stores the original folio
>> in the xarray.
>>
>> Fixes below points to the commit introduces the code, but
>> folio_split() is
>> used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in
>> truncate operation").
>>
>> Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used)
>> functions for folio_split()")
>> Reported-by: Bas van Dijk <bas@dfinity.org>
>> Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-
>> sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> Cc: <stable@vger.kernel.org>
>> ---
> 
> Thanks for the fix!
> 
> I also made a C reproducer and tested this patch - the corruption
> disappeared.

Should we link that reproducer somehow from the patch description?

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
  2026-02-28  1:06 [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get() Zi Yan
  2026-02-28  3:10 ` Lance Yang
  2026-03-02 13:30 ` Lorenzo Stoakes
@ 2026-03-02 14:40 ` David Hildenbrand (Arm)
  2026-03-02 16:34   ` Zi Yan
  2 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-02 14:40 UTC (permalink / raw)
  To: Zi Yan, Andrew Morton
  Cc: Lorenzo Stoakes, Hugh Dickins, Baolin Wang, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Matthew Wilcox, Bas van Dijk, Eero Kelly, Andrew Battat,
	Adam Bratschi-Kaye, linux-mm, linux-kernel, linux-fsdevel,
	stable

On 2/28/26 02:06, Zi Yan wrote:
> During a pagecache folio split, the values in the related xarray should not
> be changed from the original folio at xarray split time until all
> after-split folios are well formed and stored in the xarray. Current use
> of xas_try_split() in __split_unmapped_folio() lets some after-split folios
> show up at wrong indices in the xarray. When these misplaced after-split
> folios are unfrozen, before correct folios are stored via __xa_store(), and
> grabbed by folio_try_get(), they are returned to userspace at wrong file
> indices, causing data corruption.
> 
> Fix it by using the original folio in xas_try_split() calls, so that
> folio_try_get() can get the right after-split folios after the original
> folio is unfrozen.
> 
> Uniform split, split_huge_page*(), is not affected, since it uses
> xas_split_alloc() and xas_split() only once and stores the original folio
> in the xarray.

Could we make both code paths similar and store the original folio in
both cases?

IIUC, the __xa_store() is performed unconditionally after
__split_unmapped_folio().

I'm wondering, though, about the "new_folio->index >= end" case.
Wouldn't we leave some stale entries in the xarray? But that handling
has always been confusing to me :)

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
  2026-03-02 14:28   ` David Hildenbrand (Arm)
@ 2026-03-02 15:11     ` Lance Yang
  2026-03-02 16:36       ` Zi Yan
  0 siblings, 1 reply; 10+ messages in thread
From: Lance Yang @ 2026-03-02 15:11 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Zi Yan, Andrew Morton
  Cc: Lorenzo Stoakes, Hugh Dickins, Baolin Wang, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Matthew Wilcox,
	Bas van Dijk, Eero Kelly, Andrew Battat, Adam Bratschi-Kaye,
	linux-mm, linux-kernel, linux-fsdevel, stable



On 2026/3/2 22:28, David Hildenbrand (Arm) wrote:
> On 2/28/26 04:10, Lance Yang wrote:
>>
>>
>> On 2026/2/28 09:06, Zi Yan wrote:
>>> During a pagecache folio split, the values in the related xarray
>>> should not
>>> be changed from the original folio at xarray split time until all
>>> after-split folios are well formed and stored in the xarray. Current use
>>> of xas_try_split() in __split_unmapped_folio() lets some after-split
>>> folios
>>> show up at wrong indices in the xarray. When these misplaced after-split
>>> folios are unfrozen, before correct folios are stored via
>>> __xa_store(), and
>>> grabbed by folio_try_get(), they are returned to userspace at wrong file
>>> indices, causing data corruption.
>>>
>>> Fix it by using the original folio in xas_try_split() calls, so that
>>> folio_try_get() can get the right after-split folios after the original
>>> folio is unfrozen.
>>>
>>> Uniform split, split_huge_page*(), is not affected, since it uses
>>> xas_split_alloc() and xas_split() only once and stores the original folio
>>> in the xarray.
>>>
>>> Fixes below points to the commit introduces the code, but
>>> folio_split() is
>>> used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in
>>> truncate operation").
>>>
>>> Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used)
>>> functions for folio_split()")
>>> Reported-by: Bas van Dijk <bas@dfinity.org>
>>> Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-
>>> sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> Cc: <stable@vger.kernel.org>
>>> ---
>>
>> Thanks for the fix!
>>
>> I also made a C reproducer and tested this patch - the corruption
>> disappeared.
> 
> Should we link that reproducer somehow from the patch description?

Yes, the original reproducer provided by Bas is available here[1].

Regarding the C reproducer, Zi plans to add it to selftests in a
follow-up patch (as we discussed off-list).

[1] https://github.com/dfinity/thp-madv-remove-test

Cheers,
Lance


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
  2026-03-02 13:30 ` Lorenzo Stoakes
@ 2026-03-02 16:30   ` Zi Yan
  0 siblings, 0 replies; 10+ messages in thread
From: Zi Yan @ 2026-03-02 16:30 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Hugh Dickins, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Matthew Wilcox, Bas van Dijk, Eero Kelly,
	Andrew Battat, Adam Bratschi-Kaye, linux-mm, linux-kernel,
	linux-fsdevel, stable

On 2 Mar 2026, at 8:30, Lorenzo Stoakes wrote:

> On Fri, Feb 27, 2026 at 08:06:14PM -0500, Zi Yan wrote:
>> During a pagecache folio split, the values in the related xarray should not
>> be changed from the original folio at xarray split time until all
>> after-split folios are well formed and stored in the xarray. Current use
>> of xas_try_split() in __split_unmapped_folio() lets some after-split folios
>> show up at wrong indices in the xarray. When these misplaced after-split
>> folios are unfrozen, before correct folios are stored via __xa_store(), and
>> grabbed by folio_try_get(), they are returned to userspace at wrong file
>> indices, causing data corruption.
>>
>> Fix it by using the original folio in xas_try_split() calls, so that
>> folio_try_get() can get the right after-split folios after the original
>> folio is unfrozen.
>>
>> Uniform split, split_huge_page*(), is not affected, since it uses
>> xas_split_alloc() and xas_split() only once and stores the original folio
>> in the xarray.
>>
>> Fixes below points to the commit introduces the code, but folio_split() is
>> used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in
>> truncate operation").
>>
>> Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used) functions for folio_split()")
>> Reported-by: Bas van Dijk <bas@dfinity.org>
>> Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> Cc: <stable@vger.kernel.org>
>> ---
>>  mm/huge_memory.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 56db54fa48181..e4ed0404e8b55 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3647,6 +3647,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>>  	const bool is_anon = folio_test_anon(folio);
>>  	int old_order = folio_order(folio);
>>  	int start_order = split_type == SPLIT_TYPE_UNIFORM ? new_order : old_order - 1;
>> +	struct folio *origin_folio = folio;
>
> NIT: 'origin' folio is a bit ambigious, maybe old_folio, since it is of order old_order?

OK, will rename it.

>
>>  	int split_order;
>>
>>  	/*
>> @@ -3672,7 +3673,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>>  				xas_split(xas, folio, old_order);
>
> Aside, but this 'if (foo) bar(); else { ... }' pattern is horrible, think it's
> justifiable to put both in {}... :)

I can fix it along with this. It should not cause much trouble during backport.

>
>>  			else {
>>  				xas_set_order(xas, folio->index, split_order);
>> -				xas_try_split(xas, folio, old_order);
>> +				/*
>> +				 * use the original folio, so that a parallel
>> +				 * folio_try_get() waits on it until xarray is
>> +				 * updated with after-split folios and
>> +				 * the original one is unfrozen.
>> +				 */
>> +				xas_try_split(xas, origin_folio, old_order);
>
> Hmm, but won't we have already split the original folio by now? So is
> origin_folio/old_folio a pointer to what was the original folio but now is
> that but with weird tail page setup? :) like:
>
> |------------------------|
> |           f            |
> |------------------------|
> ^old_folio  ^ split_at
>
> |-----------|------------|
> |     f     |     f2     |
> |-----------|------------|
> ^old_folio
>
> |-----------|-----|------|
> |     f     |  f3 |  f4  |
> |-----------|-----|------|
> ^old_folio

This should be:

|-----------|-----|------|
|     f     |  f2 |  f3  |
|-----------|-----|------|
^old_folio

after split, the head page of f2 does not change,
so f2 becomes f2,f3, where f3 is the tail page
in the middle.

>
> etc.
>
> So the xarray would contain:
>
> |-----------|-----|------|
> |    f      |  f  |   f  |
> |-----------|-----|------|

This is the expected xarray state.

>
> Wouldn't it after this?
>
> Oh I guess before it'd contain:
>
> |-----------|-----|------|
> |     f     |  f4 |  f4  |
> |-----------|-----|------|
>
> Right?

You got the gist of it. The reality (see the fix above) is

|-----------|-----|------|
|     f     |  f2 |  f3  |
|-----------|-----|------|

But another split comes at f3, the xarray becomes

|-----------|-----|---|---|
|     f     |  f2 |f3 | f3|
|-----------|-----|---|---|

due to how xas_try_split() works. Yeah, feel free to
blame me, since when I wrote xas_try_split(), I did
not get into all the details. I am planning to
change xas_try_split() so that the xarray will become

|-----------|-----|---|---|
|     f     |  f2 |f3 | f4|
|-----------|-----|---|---|


>
>
> You saying you'll later put the correct xas entries in post-split. Where does
> that happen?

After __split_unmmaped_folio(), when __xa_store() is performed.

>
> And why was it a problem when these new folios were unfrozen?
>
> (Since the folio is a pointer to an offset in the vmemmap)
>
> I guess if you update that later in the xas, it's ok, and everything waits on
> the right thing so this is probably fine, and the f4 f4 above is probably not
> fine...
>
> I'm guessing the original folio is kept frozen during the operation?

Right. f is kept frozen until the entire xarray is updated. But if the xarray
is like (before the fix)

|-----------|-----|---|---|
|     f     |  f2 |f3 | f3|
|-----------|-----|---|---|

the code after __split_unmmaped_folio()
1. unfreezes f2, __xa_store(f2)
2. unfreezes f3, __xa_store(f3)
3. unfreezes f4, __xa_store(f4), which overwrites the second f3 to f4,

and a parallel folio_try_get() that looks at the second f3 at step 2
sees f3 is unfrozen, then gives f3 to user but should have given
f4. It only happens when the split is at the second half of the old
folio.

>
> Anyway please help my confusion not so familiar with this code :)
>

Let me know if you have any more questions.

>
>>  				if (xas_error(xas))
>>  					return xas_error(xas);
>>  			}
>> --
>> 2.51.0
>>
>
> Thanks, Lorenzo


Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
  2026-03-02 14:40 ` David Hildenbrand (Arm)
@ 2026-03-02 16:34   ` Zi Yan
  0 siblings, 0 replies; 10+ messages in thread
From: Zi Yan @ 2026-03-02 16:34 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Andrew Morton, Lorenzo Stoakes, Hugh Dickins, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Matthew Wilcox, Bas van Dijk, Eero Kelly,
	Andrew Battat, Adam Bratschi-Kaye, linux-mm, linux-kernel,
	linux-fsdevel, stable

On 2 Mar 2026, at 9:40, David Hildenbrand (Arm) wrote:

> On 2/28/26 02:06, Zi Yan wrote:
>> During a pagecache folio split, the values in the related xarray should not
>> be changed from the original folio at xarray split time until all
>> after-split folios are well formed and stored in the xarray. Current use
>> of xas_try_split() in __split_unmapped_folio() lets some after-split folios
>> show up at wrong indices in the xarray. When these misplaced after-split
>> folios are unfrozen, before correct folios are stored via __xa_store(), and
>> grabbed by folio_try_get(), they are returned to userspace at wrong file
>> indices, causing data corruption.
>>
>> Fix it by using the original folio in xas_try_split() calls, so that
>> folio_try_get() can get the right after-split folios after the original
>> folio is unfrozen.
>>
>> Uniform split, split_huge_page*(), is not affected, since it uses
>> xas_split_alloc() and xas_split() only once and stores the original folio
>> in the xarray.
>
> Could we make both code paths similar and store the original folio in
> both cases?

Sure.

>
> IIUC, the __xa_store() is performed unconditionally after
> __split_unmapped_folio().
>
> I'm wondering, though, about the "new_folio->index >= end" case.
> Wouldn't we leave some stale entries in the xarray? But that handling
> has always been confusing to me :)

IIUC, __filemap_remove_folio() calls page_cache_delete(), which overwrites
that stale entries with shadow, which is NULL here.


Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
  2026-03-02 15:11     ` Lance Yang
@ 2026-03-02 16:36       ` Zi Yan
  2026-03-02 20:54         ` [External Sender] " Bas van Dijk
  0 siblings, 1 reply; 10+ messages in thread
From: Zi Yan @ 2026-03-02 16:36 UTC (permalink / raw)
  To: Lance Yang, Bas van Dijk
  Cc: David Hildenbrand (Arm),
	Andrew Morton, Lorenzo Stoakes, Hugh Dickins, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Matthew Wilcox, Eero Kelly, Andrew Battat, Adam Bratschi-Kaye,
	linux-mm, linux-kernel, linux-fsdevel, stable

On 2 Mar 2026, at 10:11, Lance Yang wrote:

> On 2026/3/2 22:28, David Hildenbrand (Arm) wrote:
>> On 2/28/26 04:10, Lance Yang wrote:
>>>
>>>
>>> On 2026/2/28 09:06, Zi Yan wrote:
>>>> During a pagecache folio split, the values in the related xarray
>>>> should not
>>>> be changed from the original folio at xarray split time until all
>>>> after-split folios are well formed and stored in the xarray. Current use
>>>> of xas_try_split() in __split_unmapped_folio() lets some after-split
>>>> folios
>>>> show up at wrong indices in the xarray. When these misplaced after-split
>>>> folios are unfrozen, before correct folios are stored via
>>>> __xa_store(), and
>>>> grabbed by folio_try_get(), they are returned to userspace at wrong file
>>>> indices, causing data corruption.
>>>>
>>>> Fix it by using the original folio in xas_try_split() calls, so that
>>>> folio_try_get() can get the right after-split folios after the original
>>>> folio is unfrozen.
>>>>
>>>> Uniform split, split_huge_page*(), is not affected, since it uses
>>>> xas_split_alloc() and xas_split() only once and stores the original folio
>>>> in the xarray.
>>>>
>>>> Fixes below points to the commit introduces the code, but
>>>> folio_split() is
>>>> used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in
>>>> truncate operation").
>>>>
>>>> Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used)
>>>> functions for folio_split()")
>>>> Reported-by: Bas van Dijk <bas@dfinity.org>
>>>> Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-
>>>> sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>> Cc: <stable@vger.kernel.org>
>>>> ---
>>>
>>> Thanks for the fix!
>>>
>>> I also made a C reproducer and tested this patch - the corruption
>>> disappeared.
>>
>> Should we link that reproducer somehow from the patch description?
>
> Yes, the original reproducer provided by Bas is available here[1].
>
> Regarding the C reproducer, Zi plans to add it to selftests in a
> follow-up patch (as we discussed off-list).
>
> [1] https://github.com/dfinity/thp-madv-remove-test

Sure. I will add the reproducer link to the commit log.


Hi Bas,

I used Cursor to convert your rust-based thp-madv-remove-test to C.
Do you have any concern if I add it to kernel’s selftests to check
this race condition?

Thanks.


Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [External Sender] Re: [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
  2026-03-02 16:36       ` Zi Yan
@ 2026-03-02 20:54         ` Bas van Dijk
  0 siblings, 0 replies; 10+ messages in thread
From: Bas van Dijk @ 2026-03-02 20:54 UTC (permalink / raw)
  To: Zi Yan
  Cc: Lance Yang, David Hildenbrand (Arm),
	Andrew Morton, Lorenzo Stoakes, Hugh Dickins, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Matthew Wilcox, Eero Kelly, Andrew Battat, linux-mm,
	linux-kernel, linux-fsdevel, stable, Adam Bratschi-Kaye

[-- Attachment #1: Type: text/plain, Size: 2960 bytes --]

Hi Yan,

For the record it was my colleague Adam Bratschi-Kaye (CCed) who wrote the
rust-based thp-madv-remove-test.

I don't have an issue if you include a C version as a kernel selftest. In
fact, it's a good idea to prevent this from regressing in the future.

Cheers,

Bas


On Mon, 2 Mar 2026, 17:37 Zi Yan, <ziy@nvidia.com> wrote:

> On 2 Mar 2026, at 10:11, Lance Yang wrote:
>
> > On 2026/3/2 22:28, David Hildenbrand (Arm) wrote:
> >> On 2/28/26 04:10, Lance Yang wrote:
> >>>
> >>>
> >>> On 2026/2/28 09:06, Zi Yan wrote:
> >>>> During a pagecache folio split, the values in the related xarray
> >>>> should not
> >>>> be changed from the original folio at xarray split time until all
> >>>> after-split folios are well formed and stored in the xarray. Current
> use
> >>>> of xas_try_split() in __split_unmapped_folio() lets some after-split
> >>>> folios
> >>>> show up at wrong indices in the xarray. When these misplaced
> after-split
> >>>> folios are unfrozen, before correct folios are stored via
> >>>> __xa_store(), and
> >>>> grabbed by folio_try_get(), they are returned to userspace at wrong
> file
> >>>> indices, causing data corruption.
> >>>>
> >>>> Fix it by using the original folio in xas_try_split() calls, so that
> >>>> folio_try_get() can get the right after-split folios after the
> original
> >>>> folio is unfrozen.
> >>>>
> >>>> Uniform split, split_huge_page*(), is not affected, since it uses
> >>>> xas_split_alloc() and xas_split() only once and stores the original
> folio
> >>>> in the xarray.
> >>>>
> >>>> Fixes below points to the commit introduces the code, but
> >>>> folio_split() is
> >>>> used in a later commit 7460b470a131f ("mm/truncate: use folio_split()
> in
> >>>> truncate operation").
> >>>>
> >>>> Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used)
> >>>> functions for folio_split()")
> >>>> Reported-by: Bas van Dijk <bas@dfinity.org>
> >>>> Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-
> >>>> sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
> >>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >>>> Cc: <stable@vger.kernel.org>
> >>>> ---
> >>>
> >>> Thanks for the fix!
> >>>
> >>> I also made a C reproducer and tested this patch - the corruption
> >>> disappeared.
> >>
> >> Should we link that reproducer somehow from the patch description?
> >
> > Yes, the original reproducer provided by Bas is available here[1].
> >
> > Regarding the C reproducer, Zi plans to add it to selftests in a
> > follow-up patch (as we discussed off-list).
> >
> > [1] https://github.com/dfinity/thp-madv-remove-test
>
> Sure. I will add the reproducer link to the commit log.
>
>
> Hi Bas,
>
> I used Cursor to convert your rust-based thp-madv-remove-test to C.
> Do you have any concern if I add it to kernel’s selftests to check
> this race condition?
>
> Thanks.
>
>
> Best Regards,
> Yan, Zi
>

[-- Attachment #2: Type: text/html, Size: 4679 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2026-03-02 20:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-28  1:06 [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get() Zi Yan
2026-02-28  3:10 ` Lance Yang
2026-03-02 14:28   ` David Hildenbrand (Arm)
2026-03-02 15:11     ` Lance Yang
2026-03-02 16:36       ` Zi Yan
2026-03-02 20:54         ` [External Sender] " Bas van Dijk
2026-03-02 13:30 ` Lorenzo Stoakes
2026-03-02 16:30   ` Zi Yan
2026-03-02 14:40 ` David Hildenbrand (Arm)
2026-03-02 16:34   ` Zi Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox