linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
@ 2026-03-02 20:31 Zi Yan
  2026-03-03  6:09 ` Baolin Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Zi Yan @ 2026-03-02 20:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Lorenzo Stoakes, Zi Yan, Hugh Dickins,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Matthew Wilcox, Bas van Dijk, Eero Kelly,
	Andrew Battat, Adam Bratschi-Kaye, linux-mm, linux-kernel,
	linux-fsdevel, stable

During a pagecache folio split, the values in the related xarray should not
be changed from the original folio at xarray split time until all
after-split folios are well formed and stored in the xarray. Current use
of xas_try_split() in __split_unmapped_folio() lets some after-split folios
show up at wrong indices in the xarray. When these misplaced after-split
folios are unfrozen, before correct folios are stored via __xa_store(), and
grabbed by folio_try_get(), they are returned to userspace at wrong file
indices, causing data corruption. More detailed explanation is at the
bottom.

The reproducer is at: https://github.com/dfinity/thp-madv-remove-test
It
1. creates a memfd,
2. forks,
3. in the child process, maps the file with large folios (via shmem code
   path) and reads the mapped file continuously with 16 threads,
4. in the parent process, uses madvise(MADV_REMOVE) to punch poles in the
   large folio.

Data corruption can be observed without the fix. Basically, data from a
wrong page->index is returned.

Fix it by using the original folio in xas_try_split() calls, so that
folio_try_get() can get the right after-split folios after the original
folio is unfrozen.

Uniform split, split_huge_page*(), is not affected, since it uses
xas_split_alloc() and xas_split() only once and stores the original folio
in the xarray. Change xas_split() used in uniform split branch to use
the original folio to avoid confusion.

Fixes below points to the commit introduces the code, but folio_split() is
used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in
truncate operation").

More details:

For example, a folio f is split non-uniformly into f, f2, f3, f4 like
below:
+----------------+---------+----+----+
|       f        |    f2   | f3 | f4 |
+----------------+---------+----+----+
but the xarray would look like below after __split_unmapped_folio() is
done:
+----------------+---------+----+----+
|       f        |    f2   | f3 | f3 |
+----------------+---------+----+----+

After __split_unmapped_folio(), the code changes the xarray and unfreezes
after-split folios:

1. unfreezes f2, __xa_store(f2)
2. unfreezes f3, __xa_store(f3)
3. unfreezes f4, __xa_store(f4), which overwrites the second f3 to f4.
4. unfreezes f.

Meanwhile, a parallel filemap_get_entry() can read the second f3 from the
xarray and use folio_try_get() on it at step 2 when f3 is unfrozen. Then,
f3 is wrongly returned to user.

After the fix, the xarray looks like below after __split_unmapped_folio():
+----------------+---------+----+----+
|       f        |    f    | f  | f  |
+----------------+---------+----+----+
so that the race window no longer exists.

Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used) functions for folio_split()")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reported-by: Bas van Dijk <bas@dfinity.org>
Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
Tested-by: Lance Yang <lance.yang@linux.dev>
Cc: <stable@vger.kernel.org>
---
 mm/huge_memory.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 56db54fa48181..f0bdac3f270b5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3647,6 +3647,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 	const bool is_anon = folio_test_anon(folio);
 	int old_order = folio_order(folio);
 	int start_order = split_type == SPLIT_TYPE_UNIFORM ? new_order : old_order - 1;
+	struct folio *old_folio = folio;
 	int split_order;
 
 	/*
@@ -3668,11 +3669,17 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 			 * irq is disabled to allocate enough memory, whereas
 			 * non-uniform split can handle ENOMEM.
 			 */
-			if (split_type == SPLIT_TYPE_UNIFORM)
-				xas_split(xas, folio, old_order);
-			else {
+			if (split_type == SPLIT_TYPE_UNIFORM) {
+				xas_split(xas, old_folio, old_order);
+			} else {
 				xas_set_order(xas, folio->index, split_order);
-				xas_try_split(xas, folio, old_order);
+				/*
+				 * use the to-be-split folio, so that a parallel
+				 * folio_try_get() waits on it until xarray is
+				 * updated with after-split folios and
+				 * the original one is unfrozen.
+				 */
+				xas_try_split(xas, old_folio, old_order);
 				if (xas_error(xas))
 					return xas_error(xas);
 			}
-- 
2.51.0



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
  2026-03-02 20:31 [PATCH v2] mm/huge_memory: fix a folio_split() race condition with folio_try_get() Zi Yan
@ 2026-03-03  6:09 ` Baolin Wang
  2026-03-03  8:28 ` Wei Yang
  2026-03-03  8:48 ` David Hildenbrand (Arm)
  2 siblings, 0 replies; 4+ messages in thread
From: Baolin Wang @ 2026-03-03  6:09 UTC (permalink / raw)
  To: Zi Yan, Andrew Morton
  Cc: David Hildenbrand, Lorenzo Stoakes, Hugh Dickins,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Matthew Wilcox, Bas van Dijk, Eero Kelly,
	Andrew Battat, Adam Bratschi-Kaye, linux-mm, linux-kernel,
	linux-fsdevel, stable



On 3/3/26 4:31 AM, Zi Yan wrote:
> During a pagecache folio split, the values in the related xarray should not
> be changed from the original folio at xarray split time until all
> after-split folios are well formed and stored in the xarray. Current use
> of xas_try_split() in __split_unmapped_folio() lets some after-split folios
> show up at wrong indices in the xarray. When these misplaced after-split
> folios are unfrozen, before correct folios are stored via __xa_store(), and
> grabbed by folio_try_get(), they are returned to userspace at wrong file
> indices, causing data corruption. More detailed explanation is at the
> bottom.
> 
> The reproducer is at: https://github.com/dfinity/thp-madv-remove-test
> It
> 1. creates a memfd,
> 2. forks,
> 3. in the child process, maps the file with large folios (via shmem code
>     path) and reads the mapped file continuously with 16 threads,
> 4. in the parent process, uses madvise(MADV_REMOVE) to punch poles in the
>     large folio.
> 
> Data corruption can be observed without the fix. Basically, data from a
> wrong page->index is returned.
> 
> Fix it by using the original folio in xas_try_split() calls, so that
> folio_try_get() can get the right after-split folios after the original
> folio is unfrozen.
> 
> Uniform split, split_huge_page*(), is not affected, since it uses
> xas_split_alloc() and xas_split() only once and stores the original folio
> in the xarray. Change xas_split() used in uniform split branch to use
> the original folio to avoid confusion.
> 
> Fixes below points to the commit introduces the code, but folio_split() is
> used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in
> truncate operation").
> 
> More details:
> 
> For example, a folio f is split non-uniformly into f, f2, f3, f4 like
> below:
> +----------------+---------+----+----+
> |       f        |    f2   | f3 | f4 |
> +----------------+---------+----+----+
> but the xarray would look like below after __split_unmapped_folio() is
> done:
> +----------------+---------+----+----+
> |       f        |    f2   | f3 | f3 |
> +----------------+---------+----+----+
> 
> After __split_unmapped_folio(), the code changes the xarray and unfreezes
> after-split folios:
> 
> 1. unfreezes f2, __xa_store(f2)
> 2. unfreezes f3, __xa_store(f3)
> 3. unfreezes f4, __xa_store(f4), which overwrites the second f3 to f4.
> 4. unfreezes f.
> 
> Meanwhile, a parallel filemap_get_entry() can read the second f3 from the
> xarray and use folio_try_get() on it at step 2 when f3 is unfrozen. Then,
> f3 is wrongly returned to user.
> 
> After the fix, the xarray looks like below after __split_unmapped_folio():
> +----------------+---------+----+----+
> |       f        |    f    | f  | f  |
> +----------------+---------+----+----+
> so that the race window no longer exists.

Thanks for the detailed explanation. Make sense to me.

> Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used) functions for folio_split()")
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reported-by: Bas van Dijk <bas@dfinity.org>
> Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
> Tested-by: Lance Yang <lance.yang@linux.dev>
> Cc: <stable@vger.kernel.org>
> ---

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
  2026-03-02 20:31 [PATCH v2] mm/huge_memory: fix a folio_split() race condition with folio_try_get() Zi Yan
  2026-03-03  6:09 ` Baolin Wang
@ 2026-03-03  8:28 ` Wei Yang
  2026-03-03  8:48 ` David Hildenbrand (Arm)
  2 siblings, 0 replies; 4+ messages in thread
From: Wei Yang @ 2026-03-03  8:28 UTC (permalink / raw)
  To: Zi Yan
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Hugh Dickins,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Matthew Wilcox, Bas van Dijk, Eero Kelly,
	Andrew Battat, Adam Bratschi-Kaye, linux-mm, linux-kernel,
	linux-fsdevel, stable

On Mon, Mar 02, 2026 at 03:31:59PM -0500, Zi Yan wrote:
>During a pagecache folio split, the values in the related xarray should not
>be changed from the original folio at xarray split time until all
>after-split folios are well formed and stored in the xarray. Current use
>of xas_try_split() in __split_unmapped_folio() lets some after-split folios
>show up at wrong indices in the xarray. When these misplaced after-split
>folios are unfrozen, before correct folios are stored via __xa_store(), and
>grabbed by folio_try_get(), they are returned to userspace at wrong file
>indices, causing data corruption. More detailed explanation is at the
>bottom.
>
>The reproducer is at: https://github.com/dfinity/thp-madv-remove-test
>It
>1. creates a memfd,
>2. forks,
>3. in the child process, maps the file with large folios (via shmem code
>   path) and reads the mapped file continuously with 16 threads,
>4. in the parent process, uses madvise(MADV_REMOVE) to punch poles in the
>   large folio.
>
>Data corruption can be observed without the fix. Basically, data from a
>wrong page->index is returned.
>
>Fix it by using the original folio in xas_try_split() calls, so that
>folio_try_get() can get the right after-split folios after the original
>folio is unfrozen.
>
>Uniform split, split_huge_page*(), is not affected, since it uses
>xas_split_alloc() and xas_split() only once and stores the original folio
>in the xarray. Change xas_split() used in uniform split branch to use
>the original folio to avoid confusion.
>
>Fixes below points to the commit introduces the code, but folio_split() is
>used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in
>truncate operation").
>
>More details:
>
>For example, a folio f is split non-uniformly into f, f2, f3, f4 like
>below:
>+----------------+---------+----+----+
>|       f        |    f2   | f3 | f4 |
>+----------------+---------+----+----+
>but the xarray would look like below after __split_unmapped_folio() is
>done:
>+----------------+---------+----+----+
>|       f        |    f2   | f3 | f3 |
>+----------------+---------+----+----+
>

Thanks for the detailed explanation, I finally realized it behaves like this.

>After __split_unmapped_folio(), the code changes the xarray and unfreezes
>after-split folios:
>
>1. unfreezes f2, __xa_store(f2)
>2. unfreezes f3, __xa_store(f3)
>3. unfreezes f4, __xa_store(f4), which overwrites the second f3 to f4.
>4. unfreezes f.
>
>Meanwhile, a parallel filemap_get_entry() can read the second f3 from the
>xarray and use folio_try_get() on it at step 2 when f3 is unfrozen. Then,
>f3 is wrongly returned to user.
>
>After the fix, the xarray looks like below after __split_unmapped_folio():
>+----------------+---------+----+----+
>|       f        |    f    | f  | f  |
>+----------------+---------+----+----+
>so that the race window no longer exists.

Since we unfreeze f at last.

>
>Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used) functions for folio_split()")
>Signed-off-by: Zi Yan <ziy@nvidia.com>
>Reported-by: Bas van Dijk <bas@dfinity.org>
>Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
>Tested-by: Lance Yang <lance.yang@linux.dev>
>Cc: <stable@vger.kernel.org>

So thanks for the fix.

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
  2026-03-02 20:31 [PATCH v2] mm/huge_memory: fix a folio_split() race condition with folio_try_get() Zi Yan
  2026-03-03  6:09 ` Baolin Wang
  2026-03-03  8:28 ` Wei Yang
@ 2026-03-03  8:48 ` David Hildenbrand (Arm)
  2 siblings, 0 replies; 4+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-03  8:48 UTC (permalink / raw)
  To: Zi Yan, Andrew Morton
  Cc: Lorenzo Stoakes, Hugh Dickins, Baolin Wang, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Matthew Wilcox, Bas van Dijk, Eero Kelly, Andrew Battat,
	Adam Bratschi-Kaye, linux-mm, linux-kernel, linux-fsdevel,
	stable

On 3/2/26 21:31, Zi Yan wrote:
> During a pagecache folio split, the values in the related xarray should not
> be changed from the original folio at xarray split time until all
> after-split folios are well formed and stored in the xarray. Current use
> of xas_try_split() in __split_unmapped_folio() lets some after-split folios
> show up at wrong indices in the xarray. When these misplaced after-split
> folios are unfrozen, before correct folios are stored via __xa_store(), and
> grabbed by folio_try_get(), they are returned to userspace at wrong file
> indices, causing data corruption. More detailed explanation is at the
> bottom.
> 
> The reproducer is at: https://github.com/dfinity/thp-madv-remove-test
> It
> 1. creates a memfd,
> 2. forks,
> 3. in the child process, maps the file with large folios (via shmem code
>    path) and reads the mapped file continuously with 16 threads,
> 4. in the parent process, uses madvise(MADV_REMOVE) to punch poles in the
>    large folio.
> 
> Data corruption can be observed without the fix. Basically, data from a
> wrong page->index is returned.
> 
> Fix it by using the original folio in xas_try_split() calls, so that
> folio_try_get() can get the right after-split folios after the original
> folio is unfrozen.
> 
> Uniform split, split_huge_page*(), is not affected, since it uses
> xas_split_alloc() and xas_split() only once and stores the original folio
> in the xarray. Change xas_split() used in uniform split branch to use
> the original folio to avoid confusion.
> 
> Fixes below points to the commit introduces the code, but folio_split() is
> used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in
> truncate operation").
> 
> More details:
> 
> For example, a folio f is split non-uniformly into f, f2, f3, f4 like
> below:
> +----------------+---------+----+----+
> |       f        |    f2   | f3 | f4 |
> +----------------+---------+----+----+
> but the xarray would look like below after __split_unmapped_folio() is
> done:
> +----------------+---------+----+----+
> |       f        |    f2   | f3 | f3 |
> +----------------+---------+----+----+
> 
> After __split_unmapped_folio(), the code changes the xarray and unfreezes
> after-split folios:
> 
> 1. unfreezes f2, __xa_store(f2)
> 2. unfreezes f3, __xa_store(f3)
> 3. unfreezes f4, __xa_store(f4), which overwrites the second f3 to f4.
> 4. unfreezes f.
> 
> Meanwhile, a parallel filemap_get_entry() can read the second f3 from the
> xarray and use folio_try_get() on it at step 2 when f3 is unfrozen. Then,
> f3 is wrongly returned to user.
> 
> After the fix, the xarray looks like below after __split_unmapped_folio():
> +----------------+---------+----+----+
> |       f        |    f    | f  | f  |
> +----------------+---------+----+----+
> so that the race window no longer exists.
> 
> Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used) functions for folio_split()")
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reported-by: Bas van Dijk <bas@dfinity.org>
> Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
> Tested-by: Lance Yang <lance.yang@linux.dev>
> Cc: <stable@vger.kernel.org>
> ---
>  mm/huge_memory.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 56db54fa48181..f0bdac3f270b5 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3647,6 +3647,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>  	const bool is_anon = folio_test_anon(folio);
>  	int old_order = folio_order(folio);
>  	int start_order = split_type == SPLIT_TYPE_UNIFORM ? new_order : old_order - 1;
> +	struct folio *old_folio = folio;
>  	int split_order;
>  
>  	/*
> @@ -3668,11 +3669,17 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>  			 * irq is disabled to allocate enough memory, whereas
>  			 * non-uniform split can handle ENOMEM.
>  			 */
> -			if (split_type == SPLIT_TYPE_UNIFORM)
> -				xas_split(xas, folio, old_order);
> -			else {

Just wondering whether we should no move the comment over here now, so
it just covers both cases.

> +			if (split_type == SPLIT_TYPE_UNIFORM) {
> +				xas_split(xas, old_folio, old_order);
> +			} else {
>  				xas_set_order(xas, folio->index, split_order);
> -				xas_try_split(xas, folio, old_order);
> +				/*
> +				 * use the to-be-split folio, so that a parallel
> +				 * folio_try_get() waits on it until xarray is
> +				 * updated with after-split folios and
> +				 * the original one is unfrozen.
> +				 */
> +				xas_try_split(xas, old_folio, old_order);
>  				if (xas_error(xas))
>  					return xas_error(xas);
>  			}

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-03-03  8:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-02 20:31 [PATCH v2] mm/huge_memory: fix a folio_split() race condition with folio_try_get() Zi Yan
2026-03-03  6:09 ` Baolin Wang
2026-03-03  8:28 ` Wei Yang
2026-03-03  8:48 ` David Hildenbrand (Arm)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox