linux-mm.kvack.org archive mirror
* [PATCH] mm/huge_memory: fix NULL pointer dereference when splitting shmem folio in swap cache
@ 2025-11-19  1:26 Wei Yang
  2025-11-19  2:32 ` Zi Yan
  2025-11-19  8:57 ` David Hildenbrand (Red Hat)
  0 siblings, 2 replies; 22+ messages in thread
From: Wei Yang @ 2025-11-19  1:26 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang
  Cc: linux-mm, Wei Yang, stable

Commit c010d47f107f ("mm: thp: split huge page to any lower order
pages") introduced an early check on the folio's order via
mapping->flags before proceeding with the split work.

This check introduced a bug: for shmem folios in the swap cache, the
mapping pointer can be NULL. Accessing mapping->flags in this state
leads directly to a NULL pointer dereference.
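
For reference, the dereference happens in mapping_large_folio_support(),
which reads mapping->flags unconditionally. A paraphrase (abridged from
include/linux/pagemap.h; details vary across the kernel versions listed
in the backport note below):

    static inline bool mapping_large_folio_support(struct address_space *mapping)
    {
            /* with mapping == NULL (shmem folio in the swap cache), this
             * test_bit() is the NULL pointer dereference */
            return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
                   test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
    }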

This commit fixes the issue by moving the check for mapping != NULL
before any attempt to access mapping->flags.

This fix necessarily changes the return value from -EBUSY to -EINVAL
when mapping is NULL. After reviewing current callers, they do not
differentiate between these two error codes, making this change safe.

Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>

---

This patch is based on current mm-new, latest commit:

    056b93566a35 mm/vmalloc: warn only once when vmalloc detect invalid gfp flags

Backport note:

Current code evolved from the original commit through the following four
changes. Each needs a corresponding adjustment when backporting.

commit c010d47f107f609b9f4d6a103b6dfc53889049e9
Author: Zi Yan <ziy@nvidia.com>
Date:   Mon Feb 26 15:55:33 2024 -0500

    mm: thp: split huge page to any lower order pages

commit 6a50c9b512f7734bc356f4bd47885a6f7c98491a
Author: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Date:   Fri Jun 7 17:40:48 2024 +0800

    mm: huge_memory: fix misused mapping_large_folio_support() for anon folios

commit 9b2f764933eb5e3ac9ebba26e3341529219c4401
Author: Zi Yan <ziy@nvidia.com>
Date:   Wed Jan 22 11:19:27 2025 -0500

    mm/huge_memory: allow split shmem large folio to any lower order

commit 58729c04cf1092b87aeef0bf0998c9e2e4771133
Author: Zi Yan <ziy@nvidia.com>
Date:   Fri Mar 7 12:39:57 2025 -0500

    mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
---
 mm/huge_memory.c | 68 +++++++++++++++++++++++++-----------------------
 1 file changed, 35 insertions(+), 33 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7c69572b6c3f..8701c3eef05f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3696,29 +3696,42 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order,
 				"Cannot split to order-1 folio");
 		if (new_order == 1)
 			return false;
-	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
-		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
-		    !mapping_large_folio_support(folio->mapping)) {
-			/*
-			 * We can always split a folio down to a single page
-			 * (new_order == 0) uniformly.
-			 *
-			 * For any other scenario
-			 *   a) uniform split targeting a large folio
-			 *      (new_order > 0)
-			 *   b) any non-uniform split
-			 * we must confirm that the file system supports large
-			 * folios.
-			 *
-			 * Note that we might still have THPs in such
-			 * mappings, which is created from khugepaged when
-			 * CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that
-			 * case, the mapping does not actually support large
-			 * folios properly.
-			 */
-			VM_WARN_ONCE(warns,
-				"Cannot split file folio to non-0 order");
+	} else {
+		const struct address_space *mapping = folio->mapping;
+
+		/* Truncated ? */
+		/*
+		 * TODO: add support for large shmem folio in swap cache.
+		 * When shmem is in swap cache, mapping is NULL and
+		 * folio_test_swapcache() is true.
+		 */
+		if (!mapping)
 			return false;
+
+		if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
+			if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
+			    !mapping_large_folio_support(folio->mapping)) {
+				/*
+				 * We can always split a folio down to a
+				 * single page (new_order == 0) uniformly.
+				 *
+				 * For any other scenario
+				 *   a) uniform split targeting a large folio
+				 *      (new_order > 0)
+				 *   b) any non-uniform split
+				 * we must confirm that the file system
+				 * supports large folios.
+				 *
+				 * Note that we might still have THPs in such
+				 * mappings, which is created from khugepaged
+				 * when CONFIG_READ_ONLY_THP_FOR_FS is
+				 * enabled. But in that case, the mapping does
+				 * not actually support large folios properly.
+				 */
+				VM_WARN_ONCE(warns,
+					"Cannot split file folio to non-0 order");
+				return false;
+			}
 		}
 	}
 
@@ -3965,17 +3978,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 
 		mapping = folio->mapping;
 
-		/* Truncated ? */
-		/*
-		 * TODO: add support for large shmem folio in swap cache.
-		 * When shmem is in swap cache, mapping is NULL and
-		 * folio_test_swapcache() is true.
-		 */
-		if (!mapping) {
-			ret = -EBUSY;
-			goto out;
-		}
-
 		min_order = mapping_min_folio_order(folio->mapping);
 		if (new_order < min_order) {
 			ret = -EINVAL;
-- 
2.34.1




* Re: [PATCH] mm/huge_memory: fix NULL pointer dereference when splitting shmem folio in swap cache
  2025-11-19  1:26 [PATCH] mm/huge_memory: fix NULL pointer dereference when splitting shmem folio in swap cache Wei Yang
@ 2025-11-19  2:32 ` Zi Yan
  2025-11-19  2:56   ` Wei Yang
  2025-11-19  8:57 ` David Hildenbrand (Red Hat)
  1 sibling, 1 reply; 22+ messages in thread
From: Zi Yan @ 2025-11-19  2:32 UTC (permalink / raw)
  To: Wei Yang
  Cc: akpm, david, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, linux-mm, stable

On 18 Nov 2025, at 20:26, Wei Yang wrote:

> Commit c010d47f107f ("mm: thp: split huge page to any lower order
> pages") introduced an early check on the folio's order via
> mapping->flags before proceeding with the split work.
>
> This check introduced a bug: for shmem folios in the swap cache, the
> mapping pointer can be NULL. Accessing mapping->flags in this state
> leads directly to a NULL pointer dereference.
>
> This commit fixes the issue by moving the check for mapping != NULL
> before any attempt to access mapping->flags.
>
> This fix necessarily changes the return value from -EBUSY to -EINVAL
> when mapping is NULL. After reviewing current callers, they do not
> differentiate between these two error codes, making this change safe.
>
> Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: <stable@vger.kernel.org>
>
> ---
>
> This patch is based on current mm-new, latest commit:
>
>     056b93566a35 mm/vmalloc: warn only once when vmalloc detect invalid gfp flags
>
> Backport note:
>
> Current code evolved from the original commit through the following four
> changes. Each needs a corresponding adjustment when backporting.
>
> commit c010d47f107f609b9f4d6a103b6dfc53889049e9
> Author: Zi Yan <ziy@nvidia.com>
> Date:   Mon Feb 26 15:55:33 2024 -0500
>
>     mm: thp: split huge page to any lower order pages
>
> commit 6a50c9b512f7734bc356f4bd47885a6f7c98491a
> Author: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> Date:   Fri Jun 7 17:40:48 2024 +0800
>
>     mm: huge_memory: fix misused mapping_large_folio_support() for anon folios

This is a hot fix to commit c010d47f107f, so the backport should end
at this point.

>
> commit 9b2f764933eb5e3ac9ebba26e3341529219c4401
> Author: Zi Yan <ziy@nvidia.com>
> Date:   Wed Jan 22 11:19:27 2025 -0500
>
>     mm/huge_memory: allow split shmem large folio to any lower order
>
> commit 58729c04cf1092b87aeef0bf0998c9e2e4771133
> Author: Zi Yan <ziy@nvidia.com>
> Date:   Fri Mar 7 12:39:57 2025 -0500
>
>     mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
> ---
>  mm/huge_memory.c | 68 +++++++++++++++++++++++++-----------------------
>  1 file changed, 35 insertions(+), 33 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 7c69572b6c3f..8701c3eef05f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3696,29 +3696,42 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order,
>  				"Cannot split to order-1 folio");
>  		if (new_order == 1)
>  			return false;
> -	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
> -		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
> -		    !mapping_large_folio_support(folio->mapping)) {
> -			/*
> -			 * We can always split a folio down to a single page
> -			 * (new_order == 0) uniformly.
> -			 *
> -			 * For any other scenario
> -			 *   a) uniform split targeting a large folio
> -			 *      (new_order > 0)
> -			 *   b) any non-uniform split
> -			 * we must confirm that the file system supports large
> -			 * folios.
> -			 *
> -			 * Note that we might still have THPs in such
> -			 * mappings, which is created from khugepaged when
> -			 * CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that
> -			 * case, the mapping does not actually support large
> -			 * folios properly.
> -			 */
> -			VM_WARN_ONCE(warns,
> -				"Cannot split file folio to non-0 order");
> +	} else {
> +		const struct address_space *mapping = folio->mapping;
> +
> +		/* Truncated ? */
> +		/*
> +		 * TODO: add support for large shmem folio in swap cache.
> +		 * When shmem is in swap cache, mapping is NULL and
> +		 * folio_test_swapcache() is true.
> +		 */
> +		if (!mapping)
>  			return false;
> +
> +		if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
> +			if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
> +			    !mapping_large_folio_support(folio->mapping)) {

folio->mapping can just be mapping here. The above involved commits would
mostly need separate backport patches, so keeping folio->mapping
as the original code does not make backporting easier.

> +				/*
> +				 * We can always split a folio down to a
> +				 * single page (new_order == 0) uniformly.
> +				 *
> +				 * For any other scenario
> +				 *   a) uniform split targeting a large folio
> +				 *      (new_order > 0)
> +				 *   b) any non-uniform split
> +				 * we must confirm that the file system
> +				 * supports large folios.
> +				 *
> +				 * Note that we might still have THPs in such
> +				 * mappings, which is created from khugepaged
> +				 * when CONFIG_READ_ONLY_THP_FOR_FS is
> +				 * enabled. But in that case, the mapping does
> +				 * not actually support large folios properly.
> +				 */
> +				VM_WARN_ONCE(warns,
> +					"Cannot split file folio to non-0 order");
> +				return false;
> +			}
>  		}
>  	}
>
> @@ -3965,17 +3978,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>
>  		mapping = folio->mapping;
>
> -		/* Truncated ? */
> -		/*
> -		 * TODO: add support for large shmem folio in swap cache.
> -		 * When shmem is in swap cache, mapping is NULL and
> -		 * folio_test_swapcache() is true.
> -		 */
> -		if (!mapping) {
> -			ret = -EBUSY;
> -			goto out;
> -		}
> -
>  		min_order = mapping_min_folio_order(folio->mapping);
>  		if (new_order < min_order) {
>  			ret = -EINVAL;
> -- 
> 2.34.1

Otherwise, LGTM. Thank you for fixing the issue.

Reviewed-by: Zi Yan <ziy@nvidia.com>

Best Regards,
Yan, Zi



* Re: [PATCH] mm/huge_memory: fix NULL pointer dereference when splitting shmem folio in swap cache
  2025-11-19  2:32 ` Zi Yan
@ 2025-11-19  2:56   ` Wei Yang
  0 siblings, 0 replies; 22+ messages in thread
From: Wei Yang @ 2025-11-19  2:56 UTC (permalink / raw)
  To: Zi Yan
  Cc: Wei Yang, akpm, david, lorenzo.stoakes, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang,
	linux-mm, stable

On Tue, Nov 18, 2025 at 09:32:05PM -0500, Zi Yan wrote:
>On 18 Nov 2025, at 20:26, Wei Yang wrote:
>
>> Commit c010d47f107f ("mm: thp: split huge page to any lower order
>> pages") introduced an early check on the folio's order via
>> mapping->flags before proceeding with the split work.
>>
>> This check introduced a bug: for shmem folios in the swap cache, the
>> mapping pointer can be NULL. Accessing mapping->flags in this state
>> leads directly to a NULL pointer dereference.
>>
>> This commit fixes the issue by moving the check for mapping != NULL
>> before any attempt to access mapping->flags.
>>
>> This fix necessarily changes the return value from -EBUSY to -EINVAL
>> when mapping is NULL. After reviewing current callers, they do not
>> differentiate between these two error codes, making this change safe.
>>
>> Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: <stable@vger.kernel.org>
>>
>> ---
>>
>> This patch is based on current mm-new, latest commit:
>>
>>     056b93566a35 mm/vmalloc: warn only once when vmalloc detect invalid gfp flags
>>
>> Backport note:
>>
>> Current code evolved from the original commit through the following four
>> changes. Each needs a corresponding adjustment when backporting.
>>
>> commit c010d47f107f609b9f4d6a103b6dfc53889049e9
>> Author: Zi Yan <ziy@nvidia.com>
>> Date:   Mon Feb 26 15:55:33 2024 -0500
>>
>>     mm: thp: split huge page to any lower order pages
>>
>> commit 6a50c9b512f7734bc356f4bd47885a6f7c98491a
>> Author: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>> Date:   Fri Jun 7 17:40:48 2024 +0800
>>
>>     mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
>
>This is a hot fix to commit c010d47f107f, so the backport should end
>at this point.
>
>>
>> commit 9b2f764933eb5e3ac9ebba26e3341529219c4401
>> Author: Zi Yan <ziy@nvidia.com>
>> Date:   Wed Jan 22 11:19:27 2025 -0500
>>
>>     mm/huge_memory: allow split shmem large folio to any lower order
>>
>> commit 58729c04cf1092b87aeef0bf0998c9e2e4771133
>> Author: Zi Yan <ziy@nvidia.com>
>> Date:   Fri Mar 7 12:39:57 2025 -0500
>>
>>     mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
>> ---
>>  mm/huge_memory.c | 68 +++++++++++++++++++++++++-----------------------
>>  1 file changed, 35 insertions(+), 33 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 7c69572b6c3f..8701c3eef05f 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3696,29 +3696,42 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order,
>>  				"Cannot split to order-1 folio");
>>  		if (new_order == 1)
>>  			return false;
>> -	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
>> -		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>> -		    !mapping_large_folio_support(folio->mapping)) {
>> -			/*
>> -			 * We can always split a folio down to a single page
>> -			 * (new_order == 0) uniformly.
>> -			 *
>> -			 * For any other scenario
>> -			 *   a) uniform split targeting a large folio
>> -			 *      (new_order > 0)
>> -			 *   b) any non-uniform split
>> -			 * we must confirm that the file system supports large
>> -			 * folios.
>> -			 *
>> -			 * Note that we might still have THPs in such
>> -			 * mappings, which is created from khugepaged when
>> -			 * CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that
>> -			 * case, the mapping does not actually support large
>> -			 * folios properly.
>> -			 */
>> -			VM_WARN_ONCE(warns,
>> -				"Cannot split file folio to non-0 order");
>> +	} else {
>> +		const struct address_space *mapping = folio->mapping;
>> +
>> +		/* Truncated ? */
>> +		/*
>> +		 * TODO: add support for large shmem folio in swap cache.
>> +		 * When shmem is in swap cache, mapping is NULL and
>> +		 * folio_test_swapcache() is true.
>> +		 */
>> +		if (!mapping)
>>  			return false;
>> +
>> +		if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
>> +			if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>> +			    !mapping_large_folio_support(folio->mapping)) {
>
>folio->mapping can just be mapping here. The above involved commits would
>mostly need separate backport patches, so keeping folio->mapping
>as the original code does not make backporting easier.
>

Thanks, I think you are right. I tried to keep the foot print small for
backport. But it seems not benefit much.

@Andrew

If an updated version is necessary, please let me know.

>> +				/*
>> +				 * We can always split a folio down to a
>> +				 * single page (new_order == 0) uniformly.
>> +				 *
>> +				 * For any other scenario
>> +				 *   a) uniform split targeting a large folio
>> +				 *      (new_order > 0)
>> +				 *   b) any non-uniform split
>> +				 * we must confirm that the file system
>> +				 * supports large folios.
>> +				 *
>> +				 * Note that we might still have THPs in such
>> +				 * mappings, which is created from khugepaged
>> +				 * when CONFIG_READ_ONLY_THP_FOR_FS is
>> +				 * enabled. But in that case, the mapping does
>> +				 * not actually support large folios properly.
>> +				 */
>> +				VM_WARN_ONCE(warns,
>> +					"Cannot split file folio to non-0 order");
>> +				return false;
>> +			}
>>  		}
>>  	}
>>
>> @@ -3965,17 +3978,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>
>>  		mapping = folio->mapping;
>>
>> -		/* Truncated ? */
>> -		/*
>> -		 * TODO: add support for large shmem folio in swap cache.
>> -		 * When shmem is in swap cache, mapping is NULL and
>> -		 * folio_test_swapcache() is true.
>> -		 */
>> -		if (!mapping) {
>> -			ret = -EBUSY;
>> -			goto out;
>> -		}
>> -
>>  		min_order = mapping_min_folio_order(folio->mapping);
>>  		if (new_order < min_order) {
>>  			ret = -EINVAL;
>> -- 
>> 2.34.1
>
>Otherwise, LGTM. Thank you for fixing the issue.
>
>Reviewed-by: Zi Yan <ziy@nvidia.com>
>
>Best Regards,
>Yan, Zi

-- 
Wei Yang
Help you, Help me



* Re: [PATCH] mm/huge_memory: fix NULL pointer dereference when splitting shmem folio in swap cache
  2025-11-19  1:26 [PATCH] mm/huge_memory: fix NULL pointer dereference when splitting shmem folio in swap cache Wei Yang
  2025-11-19  2:32 ` Zi Yan
@ 2025-11-19  8:57 ` David Hildenbrand (Red Hat)
  2025-11-19 12:23   ` Wei Yang
  2025-11-19 12:42   ` Wei Yang
  1 sibling, 2 replies; 22+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-19  8:57 UTC (permalink / raw)
  To: Wei Yang, akpm, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang
  Cc: linux-mm, stable

On 19.11.25 02:26, Wei Yang wrote:
> Commit c010d47f107f ("mm: thp: split huge page to any lower order
> pages") introduced an early check on the folio's order via
> mapping->flags before proceeding with the split work.
> 
> This check introduced a bug: for shmem folios in the swap cache, the
> mapping pointer can be NULL. Accessing mapping->flags in this state
> leads directly to a NULL pointer dereference.

Under which circumstances would that be the case? Only for large shmem 
folios in the swapcache or also for truncated folios? So I'd assume this
would also affect truncated folios and we should spell that out here?

> 
> This commit fixes the issue by moving the check for mapping != NULL
> before any attempt to access mapping->flags.
> 
> This fix necessarily changes the return value from -EBUSY to -EINVAL
> when mapping is NULL. After reviewing current callers, they do not
> differentiate between these two error codes, making this change safe.

The doc of __split_huge_page_to_list_to_order() would now be outdated 
and has to be updated.

Also, take a look at s390_wiggle_split_folio(): returning -EINVAL 
instead of -EBUSY will make a difference on concurrent truncation. 
-EINVAL will be propagated and make the operation fail, while -EBUSY 
will be translated to -EAGAIN and the caller will simply lookup the 
folio again and retry.

So I think we should try to keep truncation return -EBUSY. For the shmem 
case, I think it's ok to return -EINVAL. I guess we can identify such 
folios by checking for folio_test_swapcache().
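
To make that concrete, a schematic of those retry semantics (helper name
hypothetical, not the actual arch/s390/kernel/uv.c code):

	do {
		/* hypothetical helper: looks the folio up afresh and tries
		 * the split; a split failing with -EBUSY (e.g. concurrent
		 * truncation) is translated to -EAGAIN internally */
		rc = lookup_and_try_split(mm, addr);
	} while (rc == -EAGAIN);	/* truncation race: just retry */

	return rc;			/* -EINVAL propagates and fails */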


Probably worth mentioning that this was identified by code inspection?

> 
> Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: <stable@vger.kernel.org>

Hmm, what would this patch look like when based on current upstream? 
We'd likely want to get that upstream asap.

> 
> ---
> 
> This patch is based on current mm-new, latest commit:
> 
>      056b93566a35 mm/vmalloc: warn only once when vmalloc detect invalid gfp flags
> 
> Backport note:
> 
> Current code evolved from the original commit through the following four
> changes. Each needs a corresponding adjustment when backporting.
> 
> commit c010d47f107f609b9f4d6a103b6dfc53889049e9
> Author: Zi Yan <ziy@nvidia.com>
> Date:   Mon Feb 26 15:55:33 2024 -0500
> 
>      mm: thp: split huge page to any lower order pages
> 
> commit 6a50c9b512f7734bc356f4bd47885a6f7c98491a
> Author: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> Date:   Fri Jun 7 17:40:48 2024 +0800
> 
>      mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
> 
> commit 9b2f764933eb5e3ac9ebba26e3341529219c4401
> Author: Zi Yan <ziy@nvidia.com>
> Date:   Wed Jan 22 11:19:27 2025 -0500
> 
>      mm/huge_memory: allow split shmem large folio to any lower order
> 
> commit 58729c04cf1092b87aeef0bf0998c9e2e4771133
> Author: Zi Yan <ziy@nvidia.com>
> Date:   Fri Mar 7 12:39:57 2025 -0500
> 
>      mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
> ---
>   mm/huge_memory.c | 68 +++++++++++++++++++++++++-----------------------
>   1 file changed, 35 insertions(+), 33 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 7c69572b6c3f..8701c3eef05f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3696,29 +3696,42 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order,
>   				"Cannot split to order-1 folio");
>   		if (new_order == 1)
>   			return false;
> -	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
> -		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
> -		    !mapping_large_folio_support(folio->mapping)) {
> -			/*
> -			 * We can always split a folio down to a single page
> -			 * (new_order == 0) uniformly.
> -			 *
> -			 * For any other scenario
> -			 *   a) uniform split targeting a large folio
> -			 *      (new_order > 0)
> -			 *   b) any non-uniform split
> -			 * we must confirm that the file system supports large
> -			 * folios.
> -			 *
> -			 * Note that we might still have THPs in such
> -			 * mappings, which is created from khugepaged when
> -			 * CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that
> -			 * case, the mapping does not actually support large
> -			 * folios properly.
> -			 */
> -			VM_WARN_ONCE(warns,
> -				"Cannot split file folio to non-0 order");
> +	} else {


Why not simply

} else if (!folio->mapping) {
	/*
	 * Truncated?
	...
	return false;
} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
...

> +		const struct address_space *mapping = folio->mapping;
> +
> +		/* Truncated ? */
> +		/*
> +		 * TODO: add support for large shmem folio in swap cache.
> +		 * When shmem is in swap cache, mapping is NULL and
> +		 * folio_test_swapcache() is true.
> +		 */

While at it, can we merge the two comments into something, like:

/*
  * If there is no mapping then the folio was truncated and we cannot
  * split.
  *
  * TODO: large shmem folios in the swap cache also don't currently have
  * a mapping, but folio_test_swapcache() is true for them.
  */

-- 
Cheers

David



* Re: [PATCH] mm/huge_memory: fix NULL pointer dereference when splitting shmem folio in swap cache
  2025-11-19  8:57 ` David Hildenbrand (Red Hat)
@ 2025-11-19 12:23   ` Wei Yang
  2025-11-19 12:54     ` David Hildenbrand (Red Hat)
  2025-11-19 12:42   ` Wei Yang
  1 sibling, 1 reply; 22+ messages in thread
From: Wei Yang @ 2025-11-19 12:23 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Wei Yang, akpm, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, linux-mm,
	stable

On Wed, Nov 19, 2025 at 09:57:58AM +0100, David Hildenbrand (Red Hat) wrote:
>On 19.11.25 02:26, Wei Yang wrote:
>> Commit c010d47f107f ("mm: thp: split huge page to any lower order
>> pages") introduced an early check on the folio's order via
>> mapping->flags before proceeding with the split work.
>> 
>> This check introduced a bug: for shmem folios in the swap cache, the
>> mapping pointer can be NULL. Accessing mapping->flags in this state
>> leads directly to a NULL pointer dereference.
>
>Under which circumstances would that be the case? Only for large shmem folios
>in the swapcache or also for truncated folios? So I'd assume this
>would also affect truncated folios and we should spell that out here?
>

Sorry, I didn't mention it because I wasn't certain.

Will add it.

>> 
>> This commit fixes the issue by moving the check for mapping != NULL
>> before any attempt to access mapping->flags.
>> 
>> This fix necessarily changes the return value from -EBUSY to -EINVAL
>> when mapping is NULL. After reviewing current callers, they do not
>> differentiate between these two error codes, making this change safe.
>
>The doc of __split_huge_page_to_list_to_order() would now be outdated and has
>to be updated.
>
>Also, take a look at s390_wiggle_split_folio(): returning -EINVAL instead of
>-EBUSY will make a difference on concurrent truncation. -EINVAL will be
>propagated and make the operation fail, while -EBUSY will be translated to
>-EAGAIN and the caller will simply lookup the folio again and retry.
>

Oops, I missed this case.

I will rescan users.

>So I think we should try to keep truncation return -EBUSY. For the shmem
>case, I think it's ok to return -EINVAL. I guess we can identify such folios
>by checking for folio_test_swapcache().
>

Hmm... I don't see how to do this nicely.

It looks like we can't do it in folio_split_supported().

Or should folio_split_supported() return an error code directly?

>
>Probably worth mentioning that this was identified by code inspection?
>

Agree.

>> 
>> Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: <stable@vger.kernel.org>
>
>Hmm, what would this patch look like when based on current upstream? We'd
>likely want to get that upstream asap.
>

This depends on whether we want it on top of [1].

Current upstream doesn't have [1], so we would need to fix it in two places.

Andrew mentioned he prefers a fixup version in [2].

[1]: lkml.kernel.org/r/20251106034155.21398-1-richard.weiyang@gmail.com
[2]: lkml.kernel.org/r/20251118140658.9078de6aab719b2308996387@linux-foundation.org

>> 
>> ---
>> 
>> This patch is based on current mm-new, latest commit:
>> 
>>      056b93566a35 mm/vmalloc: warn only once when vmalloc detect invalid gfp flags
>> 
>> Backport note:
>> 
>> Current code evolved from the original commit through the following four
>> changes. Each needs a corresponding adjustment when backporting.
>> 
>> commit c010d47f107f609b9f4d6a103b6dfc53889049e9
>> Author: Zi Yan <ziy@nvidia.com>
>> Date:   Mon Feb 26 15:55:33 2024 -0500
>> 
>>      mm: thp: split huge page to any lower order pages
>> 
>> commit 6a50c9b512f7734bc356f4bd47885a6f7c98491a
>> Author: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>> Date:   Fri Jun 7 17:40:48 2024 +0800
>> 
>>      mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
>> 
>> commit 9b2f764933eb5e3ac9ebba26e3341529219c4401
>> Author: Zi Yan <ziy@nvidia.com>
>> Date:   Wed Jan 22 11:19:27 2025 -0500
>> 
>>      mm/huge_memory: allow split shmem large folio to any lower order
>> 
>> commit 58729c04cf1092b87aeef0bf0998c9e2e4771133
>> Author: Zi Yan <ziy@nvidia.com>
>> Date:   Fri Mar 7 12:39:57 2025 -0500
>> 
>>      mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
>> ---
>>   mm/huge_memory.c | 68 +++++++++++++++++++++++++-----------------------
>>   1 file changed, 35 insertions(+), 33 deletions(-)
>> 
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 7c69572b6c3f..8701c3eef05f 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3696,29 +3696,42 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order,
>>   				"Cannot split to order-1 folio");
>>   		if (new_order == 1)
>>   			return false;
>> -	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
>> -		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>> -		    !mapping_large_folio_support(folio->mapping)) {
>> -			/*
>> -			 * We can always split a folio down to a single page
>> -			 * (new_order == 0) uniformly.
>> -			 *
>> -			 * For any other scenario
>> -			 *   a) uniform split targeting a large folio
>> -			 *      (new_order > 0)
>> -			 *   b) any non-uniform split
>> -			 * we must confirm that the file system supports large
>> -			 * folios.
>> -			 *
>> -			 * Note that we might still have THPs in such
>> -			 * mappings, which is created from khugepaged when
>> -			 * CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that
>> -			 * case, the mapping does not actually support large
>> -			 * folios properly.
>> -			 */
>> -			VM_WARN_ONCE(warns,
>> -				"Cannot split file folio to non-0 order");
>> +	} else {
>
>
>Why not simply
>
>} else if (!folio->mapping) {
>	/*
>	 * Truncated?
>	...
>	return false;
>} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
>...
>

Ah, just not spot this :-(

>> +		const struct address_space *mapping = folio->mapping;
>> +
>> +		/* Truncated ? */
>> +		/*
>> +		 * TODO: add support for large shmem folio in swap cache.
>> +		 * When shmem is in swap cache, mapping is NULL and
>> +		 * folio_test_swapcache() is true.
>> +		 */
>
>While at it, can we merge the two comments into something, like:
>
>/*
> * If there is no mapping then the folio was truncated and we cannot
> * split.
> *
> * TODO: large shmem folios in the swap cache also don't currently have
> * a mapping, but folio_test_swapcache() is true for them.
> */
>

Sure.

>-- 
>Cheers
>
>David

-- 
Wei Yang
Help you, Help me



* Re: [PATCH] mm/huge_memory: fix NULL pointer dereference when splitting shmem folio in swap cache
  2025-11-19  8:57 ` David Hildenbrand (Red Hat)
  2025-11-19 12:23   ` Wei Yang
@ 2025-11-19 12:42   ` Wei Yang
  2025-11-19 14:13     ` David Hildenbrand (Red Hat)
  1 sibling, 1 reply; 22+ messages in thread
From: Wei Yang @ 2025-11-19 12:42 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Wei Yang, akpm, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, linux-mm,
	stable

On Wed, Nov 19, 2025 at 09:57:58AM +0100, David Hildenbrand (Red Hat) wrote:
>On 19.11.25 02:26, Wei Yang wrote:
>> Commit c010d47f107f ("mm: thp: split huge page to any lower order
>> pages") introduced an early check on the folio's order via
>> mapping->flags before proceeding with the split work.
>> 
>> This check introduced a bug: for shmem folios in the swap cache, the
>> mapping pointer can be NULL. Accessing mapping->flags in this state
>> leads directly to a NULL pointer dereference.
>
>Under which circumstances would that be the case? Only for large shmem folios
>in the swapcache or also for truncated folios? So I'd assume this
>would also affect truncated folios and we should spell that out here?
>
>> 
>> This commit fixes the issue by moving the check for mapping != NULL
>> before any attempt to access mapping->flags.
>> 
>> This fix necessarily changes the return value from -EBUSY to -EINVAL
>> when mapping is NULL. After reviewing current callers, they do not
>> differentiate between these two error codes, making this change safe.
>
>The doc of __split_huge_page_to_list_to_order() would now be outdated and has
>to be updated.
>
>Also, take a look at s390_wiggle_split_folio(): returning -EINVAL instead of
>-EBUSY will make a difference on concurrent truncation. -EINVAL will be
>propagated and make the operation fail, while -EBUSY will be translated to
>-EAGAIN and the caller will simply lookup the folio again and retry.
>
>So I think we should try to keep truncation return -EBUSY. For the shmem
>case, I think it's ok to return -EINVAL. I guess we can identify such folios
>by checking for folio_test_swapcache().
>

I came up with a draft:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7c69572b6c3f..3e140fa1ca13 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3696,6 +3696,18 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order,
                                "Cannot split to order-1 folio");
                if (new_order == 1)
                        return -EINVAL;
+       } else if (!folio->mapping) {
+               /*
+                * If there is no mapping then the folio was truncated and we
+                * cannot split.
+                *
+                * TODO: large shmem folios in the swap cache also don't
+                * currently have a mapping, but folio_test_swapcache() is
+                * true for them.
+                */
+               if (folio_test_swapcache(folio))
+                       return -EINVAL;
+               return -EBUSY;
        } else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
                if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
                    !mapping_large_folio_support(folio->mapping)) {
@@ -3931,8 +3943,9 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
        if (new_order >= old_order)
                return -EINVAL;
 
-       if (!folio_split_supported(folio, new_order, split_type, /* warn = */ true))
-               return -EINVAL;
+       ret = folio_split_supported(folio, new_order, split_type, /* warn = */ true);
+       if (ret)
+               return ret;
 
        is_hzp = is_huge_zero_folio(folio);
        if (is_hzp) {

Not sure I get your point correctly.

-- 
Wei Yang
Help you, Help me



* Re: [PATCH] mm/huge_memory: fix NULL pointer dereference when splitting shmem folio in swap cache
  2025-11-19 12:23   ` Wei Yang
@ 2025-11-19 12:54     ` David Hildenbrand (Red Hat)
  2025-11-19 13:08       ` Zi Yan
  2025-11-19 13:14       ` Wei Yang
  0 siblings, 2 replies; 22+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-19 12:54 UTC (permalink / raw)
  To: Wei Yang
  Cc: akpm, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, linux-mm, stable


> 
>> So I think we should try to keep truncation return -EBUSY. For the shmem
>> case, I think it's ok to return -EINVAL. I guess we can identify such folios
>> by checking for folio_test_swapcache().
>>
> 
> Hmm... I don't see how to do this nicely.
> 
> It looks like we can't do it in folio_split_supported().
> 
> Or should folio_split_supported() return an error code directly?


On upstream, I would do something like the following (untested):

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2f2a521e5d683..33fc3590867e2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3524,6 +3524,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
                                 "Cannot split to order-1 folio");
                 if (new_order == 1)
                         return false;
+       } else if (folio_test_swapcache(folio)) {
+               /* TODO: support shmem folios that are in the swapcache. */
+               return false;
         } else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
             !mapping_large_folio_support(folio->mapping)) {
                 /*
@@ -3556,6 +3559,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order,
                                 "Cannot split to order-1 folio");
                 if (new_order == 1)
                         return false;
+       } else if (folio_test_swapcache(folio)) {
+               /* TODO: support shmem folios that are in the swapcache. */
+               return false;
         } else  if (new_order) {
                 if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
                     !mapping_large_folio_support(folio->mapping)) {
@@ -3619,6 +3625,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
         if (folio != page_folio(split_at) || folio != page_folio(lock_at))
                 return -EINVAL;
  
+       /*
+        * Folios that just got truncated cannot get split. Signal to the
+        * caller that there was a race.
+        *
+        * TODO: support shmem folios that are in the swapcache.
+        */
+       if (!is_anon && !folio->mapping && !folio_test_swapcache(folio))
+               return -EBUSY;
+
         if (new_order >= folio_order(folio))
                 return -EINVAL;
  
@@ -3659,17 +3674,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
                 gfp_t gfp;
  
                 mapping = folio->mapping;
-
-               /* Truncated ? */
-               /*
-                * TODO: add support for large shmem folio in swap cache.
-                * When shmem is in swap cache, mapping is NULL and
-                * folio_test_swapcache() is true.
-                */
-               if (!mapping) {
-                       ret = -EBUSY;
-                       goto out;
-               }
+               VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
  
                 min_order = mapping_min_folio_order(folio->mapping);
                 if (new_order < min_order) {


So rule out the truncated case earlier, leaving only the swapcache check to be handled
later.
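
In other words, for non-anon folios the sketch would give (my summary,
assuming the diff applies as-is):

	/*
	 * !mapping && !swapcache -> -EBUSY  (truncated; s390 caller retries)
	 * !mapping &&  swapcache -> *_split_supported() fails -> -EINVAL
	 *  mapping != NULL       -> proceed to min_order and the other checks
	 */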

Thoughts?

> 
>>
>> Probably worth mentioning that this was identified by code inspection?
>>
> 
> Agree.
> 
>>>
>>> Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Cc: <stable@vger.kernel.org>
>>
>> Hmm, what would this patch look like when based on current upstream? We'd
>> likely want to get that upstream asap.
>>
> 
> This depends on whether we want it on top of [1].
> 
> Current upstream doesn't have [1], so we would need to fix it in two places.
> 
> Andrew mentioned he prefers a fixup version in [2].
> 
> [1]: lkml.kernel.org/r/20251106034155.21398-1-richard.weiyang@gmail.com
> [2]: lkml.kernel.org/r/20251118140658.9078de6aab719b2308996387@linux-foundation.org

As we will want to backport this patch, likely we want to have it apply on current master.

But Andrew can comment on what he prefers in this case of a stable fix.

-- 
Cheers

David



* Re: [PATCH] mm/huge_memory: fix NULL pointer dereference when splitting shmem folio in swap cache
  2025-11-19 12:54     ` David Hildenbrand (Red Hat)
@ 2025-11-19 13:08       ` Zi Yan
  2025-11-19 13:41         ` Wei Yang
  2025-11-19 14:09         ` David Hildenbrand (Red Hat)
  2025-11-19 13:14       ` Wei Yang
  1 sibling, 2 replies; 22+ messages in thread
From: Zi Yan @ 2025-11-19 13:08 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Wei Yang, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, linux-mm,
	stable

On 19 Nov 2025, at 7:54, David Hildenbrand (Red Hat) wrote:

>>
>>> So I think we should try to keep truncation return -EBUSY. For the shmem
>>> case, I think it's ok to return -EINVAL. I guess we can identify such folios
>>> by checking for folio_test_swapcache().
>>>
>>
>> Hmm... I don't see how to do this nicely.
>>
>> It looks like we can't do it in folio_split_supported().
>>
>> Or should folio_split_supported() return an error code directly?
>
>
> On upstream, I would do something like the following (untested):
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 2f2a521e5d683..33fc3590867e2 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3524,6 +3524,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>                                 "Cannot split to order-1 folio");
>                 if (new_order == 1)
>                         return false;
> +       } else if (folio_test_swapcache(folio)) {
> +               /* TODO: support shmem folios that are in the swapcache. */
> +               return false;
>         } else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>             !mapping_large_folio_support(folio->mapping)) {
>                 /*
> @@ -3556,6 +3559,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order,
>                                 "Cannot split to order-1 folio");
>                 if (new_order == 1)
>                         return false;
> +       } else if (folio_test_swapcache(folio)) {
> +               /* TODO: support shmem folios that are in the swapcache. */
> +               return false;
You are splitting the truncate case into shmem one and page cache one.
This is only for shmem in the swap cache and ...

>         } else  if (new_order) {
>                 if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>                     !mapping_large_folio_support(folio->mapping)) {
> @@ -3619,6 +3625,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>         if (folio != page_folio(split_at) || folio != page_folio(lock_at))
>                 return -EINVAL;
>  +       /*
> +        * Folios that just got truncated cannot get split. Signal to the
> +        * caller that there was a race.
> +        *
> +        * TODO: support shmem folios that are in the swapcache.

this is for page cache one. So this TODO is not needed.

> +        */
> +       if (!is_anon && !folio->mapping && !folio_test_swapcache(folio))
> +               return -EBUSY;
> +
>         if (new_order >= folio_order(folio))
>                 return -EINVAL;
>  @@ -3659,17 +3674,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>                 gfp_t gfp;
>                  mapping = folio->mapping;
> -
> -               /* Truncated ? */
> -               /*
> -                * TODO: add support for large shmem folio in swap cache.
> -                * When shmem is in swap cache, mapping is NULL and
> -                * folio_test_swapcache() is true.
> -                */
> -               if (!mapping) {
> -                       ret = -EBUSY;
> -                       goto out;
> -               }
> +               VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
>                  min_order = mapping_min_folio_order(folio->mapping);
>                 if (new_order < min_order) {
>
>
> So rule out the truncated case earlier, leaving only the swapcache check to be handled
> later.
>
> Thoughts?

I thought the truncated case includes both page cache and shmem in the swap cache.

Otherwise, it looks good to me.

>>
>>>
>>> Probably worth mentioning that this was identified by code inspection?
>>>
>>
>> Agree.
>>
>>>>
>>>> Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
>>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>> Cc: <stable@vger.kernel.org>
>>>
>>> Hmm, what would this patch look like when based on current upstream? We'd
>>> likely want to get that upstream asap.
>>>
>>
>> This depends on whether we want it on top of [1].
>>
>> Current upstream doesn't have [1], so we would need to fix it in two places.
>>
>> Andrew mentioned he prefers a fixup version in [2].
>>
>> [1]: lkml.kernel.org/r/20251106034155.21398-1-richard.weiyang@gmail.com
>> [2]: lkml.kernel.org/r/20251118140658.9078de6aab719b2308996387@linux-foundation.org
>
> As we will want to backport this patch, likely we want to have it apply on current master.
>
> But Andrew can comment on what he prefers in this case of a stable fix.

That could mess up the mm-new tree[1], based on Andrew's recent feedback.

[1] https://lore.kernel.org/all/20251118140658.9078de6aab719b2308996387@linux-foundation.org/

--
Best Regards,
Yan, Zi



* Re: [PATCH] mm/huge_memory: fix NULL pointer dereference when splitting shmem folio in swap cache
  2025-11-19 12:54     ` David Hildenbrand (Red Hat)
  2025-11-19 13:08       ` Zi Yan
@ 2025-11-19 13:14       ` Wei Yang
  1 sibling, 0 replies; 22+ messages in thread
From: Wei Yang @ 2025-11-19 13:14 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Wei Yang, akpm, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, linux-mm,
	stable

On Wed, Nov 19, 2025 at 01:54:45PM +0100, David Hildenbrand (Red Hat) wrote:
>
>> 
>> > So I think we should try to keep truncation return -EBUSY. For the shmem
>> > case, I think it's ok to return -EINVAL. I guess we can identify such folios
>> > by checking for folio_test_swapcache().
>> > 
>> 
>> Hmm... I don't see how to do this nicely.
>> 
>> It looks like we can't do it in folio_split_supported().
>> 
>> Or should folio_split_supported() return an error code directly?
>
>
>On upstream, I would do something like the following (untested):
>
>diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>index 2f2a521e5d683..33fc3590867e2 100644
>--- a/mm/huge_memory.c
>+++ b/mm/huge_memory.c
>@@ -3524,6 +3524,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>                                "Cannot split to order-1 folio");
>                if (new_order == 1)
>                        return false;
>+       } else if (folio_test_swapcache(folio)) {
>+               /* TODO: support shmem folios that are in the swapcache. */
>+               return false;
>        } else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>            !mapping_large_folio_support(folio->mapping)) {
>                /*
>@@ -3556,6 +3559,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order,
>                                "Cannot split to order-1 folio");
>                if (new_order == 1)
>                        return false;
>+       } else if (folio_test_swapcache(folio)) {
>+               /* TODO: support shmem folios that are in the swapcache. */
>+               return false;
>        } else  if (new_order) {
>                if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>                    !mapping_large_folio_support(folio->mapping)) {
>@@ -3619,6 +3625,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>        if (folio != page_folio(split_at) || folio != page_folio(lock_at))
>                return -EINVAL;
>+       /*
>+        * Folios that just got truncated cannot get split. Signal to the
>+        * caller that there was a race.
>+        *
>+        * TODO: support shmem folios that are in the swapcache.
>+        */
>+       if (!is_anon && !folio->mapping && !folio_test_swapcache(folio))
>+               return -EBUSY;
>+
>        if (new_order >= folio_order(folio))
>                return -EINVAL;
>@@ -3659,17 +3674,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>                gfp_t gfp;
>                mapping = folio->mapping;
>-
>-               /* Truncated ? */
>-               /*
>-                * TODO: add support for large shmem folio in swap cache.
>-                * When shmem is in swap cache, mapping is NULL and
>-                * folio_test_swapcache() is true.
>-                */
>-               if (!mapping) {
>-                       ret = -EBUSY;
>-                       goto out;
>-               }
>+               VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
>                min_order = mapping_min_folio_order(folio->mapping);
>                if (new_order < min_order) {
>
>
>So rule out the truncated case earlier, leaving only the swapcache check to be handled
>later.
>
>Thoughts?
>

Cleaner, will test this first.

>> 
>> > 
>> > Probably worth mentioning that this was identified by code inspection?
>> > 
>> 
>> Agree.
>> 
>> > > 
>> > > Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
>> > > Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> > > Cc: Zi Yan <ziy@nvidia.com>
>> > > Cc: <stable@vger.kernel.org>
>> > 
>> > Hmm, what would this patch look like when based on current upstream? We'd
>> > likely want to get that upstream asap.
>> > 
>> 
>> This depends on whether we want it on top of [1].
>> 
>> Current upstream doesn't have [1], so we would need to fix it in two places.
>> 
>> Andrew mentioned he prefers a fixup version in [2].
>> 
>> [1]: lkml.kernel.org/r/20251106034155.21398-1-richard.weiyang@gmail.com
>> [2]: lkml.kernel.org/r/20251118140658.9078de6aab719b2308996387@linux-foundation.org
>
>As we will want to backport this patch, likely we want to have it apply on current master.
>
>But Andrew can comment on what he prefers in this case of a stable fix.
>

Yep, I will prepare patches for both current master and current mm-new.

And wait for Andrew's order.

>-- 
>Cheers
>
>David

-- 
Wei Yang
Help you, Help me



* Re: [PATCH] mm/huge_memory: fix NULL pointer dereference when splitting shmem folio in swap cache
  2025-11-19 13:08       ` Zi Yan
@ 2025-11-19 13:41         ` Wei Yang
  2025-11-19 13:58           ` David Hildenbrand (Red Hat)
  2025-11-19 14:09         ` David Hildenbrand (Red Hat)
  1 sibling, 1 reply; 22+ messages in thread
From: Wei Yang @ 2025-11-19 13:41 UTC (permalink / raw)
  To: Zi Yan
  Cc: David Hildenbrand (Red Hat),
	Wei Yang, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, linux-mm,
	stable

On Wed, Nov 19, 2025 at 08:08:01AM -0500, Zi Yan wrote:
>On 19 Nov 2025, at 7:54, David Hildenbrand (Red Hat) wrote:
>
>>>
>>>> So I think we should try to keep truncation return -EBUSY. For the shmem
>>>> case, I think it's ok to return -EINVAL. I guess we can identify such folios
>>>> by checking for folio_test_swapcache().
>>>>
>>>
>>> Hmm... I don't see how to do this nicely.
>>>
>>> It looks like we can't do it in folio_split_supported().
>>>
>>> Or should folio_split_supported() return an error code directly?
>>
>>
>> On upstream, I would do something like the following (untested):
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 2f2a521e5d683..33fc3590867e2 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3524,6 +3524,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>                                 "Cannot split to order-1 folio");
>>                 if (new_order == 1)
>>                         return false;
>> +       } else if (folio_test_swapcache(folio)) {
>> +               /* TODO: support shmem folios that are in the swapcache. */
>> +               return false;

Hmm... are we filtering out all swapcache folios instead of just the shmem
swapcache ones?

Is it possible for (folio->mapping && folio_test_swapcache(folio)) to reach
here? The logic looks a little different, but maybe I missed something.

OK, my brain is running out of steam; I hope I'm not making a stupid mistake.

>>         } else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>>             !mapping_large_folio_support(folio->mapping)) {
>>                 /*
>> @@ -3556,6 +3559,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order,
>>                                 "Cannot split to order-1 folio");
>>                 if (new_order == 1)
>>                         return false;
>> +       } else if (folio_test_swapcache(folio)) {
>> +               /* TODO: support shmem folios that are in the swapcache. */
>> +               return false;
>You are splitting the truncate case into shmem one and page cache one.
>This is only for shmem in the swap cache and ...
>
>>         } else  if (new_order) {
>>                 if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>>                     !mapping_large_folio_support(folio->mapping)) {
>> @@ -3619,6 +3625,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>         if (folio != page_folio(split_at) || folio != page_folio(lock_at))
>>                 return -EINVAL;
>>  +       /*
>> +        * Folios that just got truncated cannot get split. Signal to the
>> +        * caller that there was a race.
>> +        *
>> +        * TODO: support shmem folios that are in the swapcache.
>
>this is for page cache one. So this TODO is not needed.
>
>> +        */
>> +       if (!is_anon && !folio->mapping && !folio_test_swapcache(folio))
>> +               return -EBUSY;
>> +
>>         if (new_order >= folio_order(folio))
>>                 return -EINVAL;
>>  @@ -3659,17 +3674,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>                 gfp_t gfp;
>>                  mapping = folio->mapping;
>> -
>> -               /* Truncated ? */
>> -               /*
>> -                * TODO: add support for large shmem folio in swap cache.
>> -                * When shmem is in swap cache, mapping is NULL and
>> -                * folio_test_swapcache() is true.
>> -                */
>> -               if (!mapping) {
>> -                       ret = -EBUSY;
>> -                       goto out;
>> -               }
>> +               VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
>>                  min_order = mapping_min_folio_order(folio->mapping);
>>                 if (new_order < min_order) {
>>
>>
>> So rule out the truncated case earlier, leaving only the swapcache check to be handled
>> later.
>>
>> Thoughts?
>
>I thought the truncated case includes both page cache and shmem in the swap cache.
>
>Otherwise, it looks good to me.
>
>>>
>>>>
>>>> Probably worth mentioning that this was identified by code inspection?
>>>>
>>>
>>> Agree.
>>>
>>>>>
>>>>> Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
>>>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>>> Cc: <stable@vger.kernel.org>
>>>>
>>>> Hmm, what would this patch look like when based on current upstream? We'd
>>>> likely want to get that upstream asap.
>>>>
>>>
>>> This depends on whether we want it on top of [1].
>>>
>>> Current upstream doesn't have [1], so we would need to fix it in two places.
>>>
>>> Andrew mentioned he prefers a fixup version in [2].
>>>
>>> [1]: lkml.kernel.org/r/20251106034155.21398-1-richard.weiyang@gmail.com
>>> [2]: lkml.kernel.org/r/20251118140658.9078de6aab719b2308996387@linux-foundation.org
>>
>> As we will want to backport this patch, likely we want to have it apply on current master.
>>
>> But Andrew can comment on what he prefers in this case of a stable fix.
>
>That could mess up with mm-new tree[1] based on Andrew's recent feedback.
>
>[1] https://lore.kernel.org/all/20251118140658.9078de6aab719b2308996387@linux-foundation.org/
>
>--
>Best Regards,
>Yan, Zi

-- 
Wei Yang
Help you, Help me



* Re: [PATCH] mm/huge_memory: fix NULL pointer dereference when splitting shmem folio in swap cache
  2025-11-19 13:41         ` Wei Yang
@ 2025-11-19 13:58           ` David Hildenbrand (Red Hat)
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-19 13:58 UTC (permalink / raw)
  To: Wei Yang, Zi Yan
  Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, linux-mm, stable

On 19.11.25 14:41, Wei Yang wrote:
> On Wed, Nov 19, 2025 at 08:08:01AM -0500, Zi Yan wrote:
>> On 19 Nov 2025, at 7:54, David Hildenbrand (Red Hat) wrote:
>>
>>>>
>>>>> So I think we should try to keep truncation return -EBUSY. For the shmem
>>>>> case, I think it's ok to return -EINVAL. I guess we can identify such folios
>>>>> by checking for folio_test_swapcache().
>>>>>
>>>>
>>>> Hmm... I don't see how to do this nicely.
>>>>
>>>> It looks like we can't do it in folio_split_supported().
>>>>
>>>> Or should folio_split_supported() return an error code directly?
>>>
>>>
>>> On upstream, I would do something like the following (untested):
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 2f2a521e5d683..33fc3590867e2 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -3524,6 +3524,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>>                                  "Cannot split to order-1 folio");
>>>                  if (new_order == 1)
>>>                          return false;
>>> +       } else if (folio_test_swapcache(folio)) {
>>> +               /* TODO: support shmem folios that are in the swapcache. */
>>> +               return false;
> 
> Hmm... we are filtering out all swapcache folios instead of just shmem folios in the swapcache?
> 
> Is it possible for (folio->mapping && folio_test_swapcache(folio)) to reach here?
> Looks like the logic is a little different, but maybe I missed something.
> 
> OK, my brain is running out of steam. Hope I don't make a stupid mistake.

It's tricky. folio_test_swapcache() only applies to anon and shmem.

But looking at it, we have

	PG_swapcache = PG_owner_priv_1,

PG_owner_priv_1 is also used for
* XEN stuff
* vmemmap_self_hosted

Which is not important for us IIRC.

But we have

	/* Some filesystems */
	PG_checked = PG_owner_priv_1

So maybe we could indeed have false positives here.


So I guess we cannot rely on folio_test_swapcache() ... here. What a mess.
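
To make the aliasing concrete, here is a toy userspace model (illustration
only: the bit values are invented, and only the sharing of PG_owner_priv_1
mirrors include/linux/page-flags.h):

#include <stdbool.h>
#include <stdio.h>

/* Invented bit values; only the aliasing matters. */
#define PG_owner_priv_1	(1u << 0)
#define PG_swapcache	PG_owner_priv_1	/* anon/shmem meaning */
#define PG_checked	PG_owner_priv_1	/* some filesystems */
#define PG_swapbacked	(1u << 1)

struct toy_folio { unsigned int flags; };

/* Ambiguous: also true for a PG_checked user of the shared bit. */
static bool toy_test_swapcache(const struct toy_folio *f)
{
	return f->flags & PG_swapcache;
}

/* Disambiguated: only anon/shmem folios are swap-backed. */
static bool toy_test_shmem_swapcache(const struct toy_folio *f)
{
	return (f->flags & PG_swapbacked) && (f->flags & PG_swapcache);
}

int main(void)
{
	struct toy_folio shmem = { PG_swapbacked | PG_swapcache };
	struct toy_folio fs = { PG_checked };

	/* 1/1: the bare test false-positives on the fs folio. */
	printf("swapcache only:          shmem=%d fs=%d\n",
	       toy_test_swapcache(&shmem), toy_test_swapcache(&fs));
	/* 1/0: the combined test stays unambiguous. */
	printf("swapbacked && swapcache: shmem=%d fs=%d\n",
	       toy_test_shmem_swapcache(&shmem),
	       toy_test_shmem_swapcache(&fs));
	return 0;
}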

-- 
Cheers

David


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/huge_memory: fix NULL pointer deference when splitting shmem folio in swap cache
  2025-11-19 13:08       ` Zi Yan
  2025-11-19 13:41         ` Wei Yang
@ 2025-11-19 14:09         ` David Hildenbrand (Red Hat)
  2025-11-19 14:29           ` Zi Yan
  1 sibling, 1 reply; 22+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-19 14:09 UTC (permalink / raw)
  To: Zi Yan
  Cc: Wei Yang, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, linux-mm,
	stable

On 19.11.25 14:08, Zi Yan wrote:
> On 19 Nov 2025, at 7:54, David Hildenbrand (Red Hat) wrote:
> 
>>>
>>>> So I think we should try to keep truncation returning -EBUSY. For the shmem
>>>> case, I think it's ok to return -EINVAL. I guess we can identify such folios
>>>> by checking for folio_test_swapcache().
>>>>
>>>
>>> Hmm... I don't get how to do this nicely.
>>>
>>> Looks like we can't do it in folio_split_supported().
>>>
>>> Or change folio_split_supported() to return an error code directly?
>>
>>
>> On upstream, I would do something like the following (untested):
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 2f2a521e5d683..33fc3590867e2 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3524,6 +3524,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>                                  "Cannot split to order-1 folio");
>>                  if (new_order == 1)
>>                          return false;
>> +       } else if (folio_test_swapcache(folio)) {
>> +               /* TODO: support shmem folios that are in the swapcache. */
>> +               return false;
>>          } else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>>              !mapping_large_folio_support(folio->mapping)) {
>>                  /*
>> @@ -3556,6 +3559,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order,
>>                                  "Cannot split to order-1 folio");
>>                  if (new_order == 1)
>>                          return false;
>> +       } else if (folio_test_swapcache(folio)) {
>> +               /* TODO: support shmem folios that are in the swapcache. */
>> +               return false;
> You are splitting the truncate case into the shmem one and the page cache one.
> This is only for shmem in the swap cache and ...
> 
>>          } else  if (new_order) {
>>                  if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>>                      !mapping_large_folio_support(folio->mapping)) {
>> @@ -3619,6 +3625,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>          if (folio != page_folio(split_at) || folio != page_folio(lock_at))
>>                  return -EINVAL;
>>   +       /*
>> +        * Folios that just got truncated cannot get split. Signal to the
>> +        * caller that there was a race.
>> +        *
>> +        * TODO: support shmem folios that are in the swapcache.
> 
>> this is for the page cache one. So this TODO is not needed.

I added the TODO because we'd like to detect truncation there as well, I think. Hm.

> 
>> +        */
>> +       if (!is_anon && !folio->mapping && !folio_test_swapcache(folio))
>> +               return -EBUSY;
>> +

Given folio_test_swapcache() might have false positives,
I assume we'd need a

	folio_test_swapbacked() && folio_test_swapcache(folio)

to detect large shmem folios in the swapcache in all cases here.

Something like the following would hopefully do:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2f2a521e5d683..57aab66bedbea 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
         return ret;
  }
  
+static bool folio_test_shmem_swapcache(struct folio *folio)
+{
+       VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);
+       /* These folios do not have folio->mapping set. */
+       return folio_test_swapbacked(folio) && folio_test_swapcache(folio);
+}
+
  bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
                 bool warns)
  {
@@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
                                 "Cannot split to order-1 folio");
                 if (new_order == 1)
                         return false;
+       } else if (folio_test_shmem_swapcache(folio)) {
+               /* TODO: support shmem folios that are in the swapcache. */
+               return false;
         } else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
             !mapping_large_folio_support(folio->mapping)) {
                 /*
@@ -3556,6 +3566,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order,
                                 "Cannot split to order-1 folio");
                 if (new_order == 1)
                         return false;
+       } else if (folio_test_shmem_swapcache(folio)) {
+               /* TODO: support shmem folios that are in the swapcache. */
+               return false;
         } else  if (new_order) {
                 if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
                     !mapping_large_folio_support(folio->mapping)) {
@@ -3619,6 +3632,13 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
         if (folio != page_folio(split_at) || folio != page_folio(lock_at))
                 return -EINVAL;
  
+       /*
+        * Folios that just got truncated cannot get split. Signal to the
+        * caller that there was a race.
+        */
+       if (!is_anon && !folio->mapping && !folio_test_shmem_swapcache(folio))
+               return -EBUSY;
+
         if (new_order >= folio_order(folio))
                 return -EINVAL;
  
@@ -3659,17 +3679,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
                 gfp_t gfp;
  
                 mapping = folio->mapping;
-
-               /* Truncated ? */
-               /*
-                * TODO: add support for large shmem folio in swap cache.
-                * When shmem is in swap cache, mapping is NULL and
-                * folio_test_swapcache() is true.
-                */
-               if (!mapping) {
-                       ret = -EBUSY;
-                       goto out;
-               }
+               VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
  
                 min_order = mapping_min_folio_order(folio->mapping);
                 if (new_order < min_order) {

>>>
>>>>
>>>> Probably worth mentioning that this was identified by code inspection?
>>>>
>>>
>>> Agree.
>>>
>>>>>
>>>>> Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
>>>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>>> Cc: <stable@vger.kernel.org>
>>>>
>>>> Hmm, what would this patch look like when based on current upstream? We'd
>>>> likely want to get that upstream asap.
>>>>
>>>
>>> This depends on whether we want it on top of [1].
>>>
>>> Current upstream doesn't have it [1], so we need to fix it in two places.
>>>
>>> Andrew mentioned he prefers a fixup version in [2].
>>>
>>> [1]: lkml.kernel.org/r/20251106034155.21398-1-richard.weiyang@gmail.com
>>> [2]: lkml.kernel.org/r/20251118140658.9078de6aab719b2308996387@linux-foundation.org
>>
>> As we will want to backport this patch, we likely want it to apply to current master.
>>
>> But Andrew can comment on what he prefers in this case of a stable fix.
> 
> That could mess up the mm-new tree [1], based on Andrew's recent feedback.

Right, a similar fix could be had against mm-new.

-- 
Cheers

David


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/huge_memory: fix NULL pointer deference when splitting shmem folio in swap cache
  2025-11-19 12:42   ` Wei Yang
@ 2025-11-19 14:13     ` David Hildenbrand (Red Hat)
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-19 14:13 UTC (permalink / raw)
  To: Wei Yang
  Cc: akpm, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, linux-mm, stable

On 19.11.25 13:42, Wei Yang wrote:
> On Wed, Nov 19, 2025 at 09:57:58AM +0100, David Hildenbrand (Red Hat) wrote:
>> On 19.11.25 02:26, Wei Yang wrote:
>>> Commit c010d47f107f ("mm: thp: split huge page to any lower order
>>> pages") introduced an early check on the folio's order via
>>> mapping->flags before proceeding with the split work.
>>>
>>> This check introduced a bug: for shmem folios in the swap cache, the
>>> mapping pointer can be NULL. Accessing mapping->flags in this state
>>> leads directly to a NULL pointer dereference.
>>
>> Under which circumstances would that be the case? Only for large shmem folios
>> in the swapcache or also for truncated folios? So I'd assume this
>> would also affect truncated folios and we should spell that out here?
>>
>>>
>>> This commit fixes the issue by moving the check for mapping != NULL
>>> before any attempt to access mapping->flags.
>>>
>>> This fix necessarily changes the return value from -EBUSY to -EINVAL
>>> when mapping is NULL. After reviewing current callers, they do not
>>> differentiate between these two error codes, making this change safe.
>>
>> The doc of __split_huge_page_to_list_to_order() would now be outdated and has
>> to be updated.
>>
>> Also, take a look at s390_wiggle_split_folio(): returning -EINVAL instead of
>> -EBUSY will make a difference on concurrent truncation. -EINVAL will be
>> propagated and make the operation fail, while -EBUSY will be translated to
>> -EAGAIN and the caller will simply lookup the folio again and retry.
>>
>> So I think we should try to keep truncation returning -EBUSY. For the shmem
>> case, I think it's ok to return -EINVAL. I guess we can identify such folios
>> by checking for folio_test_swapcache().
>>
> 
> I came up with a draft:
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 7c69572b6c3f..3e140fa1ca13 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3696,6 +3696,18 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order,
>                                  "Cannot split to order-1 folio");
>                  if (new_order == 1)
>                          return -EINVAL;
> +       } else if (!folio->mapping) {
> +               /*
> +                * If there is no mapping, the folio was truncated and we
> +                * cannot split.
> +                *
> +                * TODO: large shmem folios in the swap cache also don't
> +                * currently have a mapping, but folio_test_swapcache() is true
> +                * for them.
> +                */
> +               if (folio_test_swapcache(folio))

As per discussions, that would likely have to be

	folio_test_swapbacked() && folio_test_swapcache()


> +                       return -EINVAL;
> +               return -EBUSY;
>          } else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
>                  if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>                      !mapping_large_folio_support(folio->mapping)) {
> @@ -3931,8 +3943,9 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>          if (new_order >= old_order)
>                  return -EINVAL;
>   
> -       if (!folio_split_supported(folio, new_order, split_type, /* warn = */ true))
> -               return -EINVAL;
> +       ret = folio_split_supported(folio, new_order, split_type, /* warn = */ true);
> +       if (ret)

I'd prefer it if folio_split_supported() kept returning a boolean,
such that we detect the truncation case earlier and just return -EBUSY.

But I have no strong opinion. The important part is that truncation handling
is not changed without taking a lot of care.

-- 
Cheers

David


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/huge_memory: fix NULL pointer deference when splitting shmem folio in swap cache
  2025-11-19 14:09         ` David Hildenbrand (Red Hat)
@ 2025-11-19 14:29           ` Zi Yan
  2025-11-19 14:37             ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 22+ messages in thread
From: Zi Yan @ 2025-11-19 14:29 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Wei Yang, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, linux-mm,
	stable

On 19 Nov 2025, at 9:09, David Hildenbrand (Red Hat) wrote:

> On 19.11.25 14:08, Zi Yan wrote:
>> On 19 Nov 2025, at 7:54, David Hildenbrand (Red Hat) wrote:
>>
>>>>
>>>>> So I think we should try to keep truncation returning -EBUSY. For the shmem
>>>>> case, I think it's ok to return -EINVAL. I guess we can identify such folios
>>>>> by checking for folio_test_swapcache().
>>>>>
>>>>
>>>> Hmm... I don't get how to do this nicely.
>>>>
>>>> Looks like we can't do it in folio_split_supported().
>>>>
>>>> Or change folio_split_supported() to return an error code directly?
>>>
>>>
>>> On upstream, I would do something like the following (untested):
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 2f2a521e5d683..33fc3590867e2 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -3524,6 +3524,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>>                                  "Cannot split to order-1 folio");
>>>                  if (new_order == 1)
>>>                          return false;
>>> +       } else if (folio_test_swapcache(folio)) {
>>> +               /* TODO: support shmem folios that are in the swapcache. */
>>> +               return false;
>>>          } else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>>>              !mapping_large_folio_support(folio->mapping)) {
>>>                  /*
>>> @@ -3556,6 +3559,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order,
>>>                                  "Cannot split to order-1 folio");
>>>                  if (new_order == 1)
>>>                          return false;
>>> +       } else if (folio_test_swapcache(folio)) {
>>> +               /* TODO: support shmem folios that are in the swapcache. */
>>> +               return false;
>> You are splitting the truncate case into the shmem one and the page cache one.
>> This is only for shmem in the swap cache and ...
>>
>>>          } else  if (new_order) {
>>>                  if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>>>                      !mapping_large_folio_support(folio->mapping)) {
>>> @@ -3619,6 +3625,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>          if (folio != page_folio(split_at) || folio != page_folio(lock_at))
>>>                  return -EINVAL;
>>>   +       /*
>>> +        * Folios that just got truncated cannot get split. Signal to the
>>> +        * caller that there was a race.
>>> +        *
>>> +        * TODO: support shmem folios that are in the swapcache.
>>
>> this is for the page cache one. So this TODO is not needed.
>
> I added the TODO because we'd like to detect truncation there as well, I think. Hm.

OK, got it. Here you mean shmem in the swapcache is not checked, and once
shmem in the swapcache is supported, folio_test_swapcache() can be removed
here along with the TODO. Now it makes sense.

>>
>>> +        */
>>> +       if (!is_anon && !folio->mapping && !folio_test_swapcache(folio))
>>> +               return -EBUSY;
>>> +
>
> Given folio_test_swapcache() might have false positives,
> I assume we'd need a
>
> 	folio_test_swapbacked() && folio_test_swapcache(folio)
>
> to detect large shmem folios in the swapcache in all cases here.
>
> Something like the following would hopefully do:
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 2f2a521e5d683..57aab66bedbea 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>         return ret;
>  }
>  +static bool folio_test_shmem_swapcache(struct folio *folio)
> +{
> +       VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);
> +       /* These folios do not have folio->mapping set. */
> +       return folio_test_swapbacked(folio) && folio_test_swapcache(folio);
> +}
> +
>  bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>                 bool warns)
>  {
> @@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>                                 "Cannot split to order-1 folio");
>                 if (new_order == 1)
>                         return false;
> +       } else if (folio_test_shmem_swapcache(folio)) {
> +               /* TODO: support shmem folios that are in the swapcache. */
> +               return false;

With this, truncated shmem returns -EINVAL instead of -EBUSY now.
Can s390_wiggle_split_folio() handle such folios?

>         } else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>             !mapping_large_folio_support(folio->mapping)) {
>                 /*
> @@ -3556,6 +3566,9 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order,
>                                 "Cannot split to order-1 folio");
>                 if (new_order == 1)
>                         return false;
> +       } else if (folio_test_shmem_swapcache(folio)) {
> +               /* TODO: support shmem folios that are in the swapcache. */
> +               return false;
>         } else  if (new_order) {
>                 if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>                     !mapping_large_folio_support(folio->mapping)) {
> @@ -3619,6 +3632,13 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>         if (folio != page_folio(split_at) || folio != page_folio(lock_at))
>                 return -EINVAL;
>  +       /*
> +        * Folios that just got truncated cannot get split. Signal to the
> +        * caller that there was a race.
> +        */
> +       if (!is_anon && !folio->mapping && !folio_test_shmem_swapcache(folio))
> +               return -EBUSY;
> +
>         if (new_order >= folio_order(folio))
>                 return -EINVAL;
>  @@ -3659,17 +3679,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>                 gfp_t gfp;
>                  mapping = folio->mapping;
> -
> -               /* Truncated ? */
> -               /*
> -                * TODO: add support for large shmem folio in swap cache.
> -                * When shmem is in swap cache, mapping is NULL and
> -                * folio_test_swapcache() is true.
> -                */
> -               if (!mapping) {
> -                       ret = -EBUSY;
> -                       goto out;
> -               }
> +               VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
>                  min_order = mapping_min_folio_order(folio->mapping);
>                 if (new_order < min_order) {
>

I think it works if there is no impact on s390_wiggle_split_folio().
It also clarifies the two truncated cases.

For backporting, maybe just move "if (!mapping)" up for simplicity?

>>>>
>>>>>
>>>>> Probably worth mentioning that this was identified by code inspection?
>>>>>
>>>>
>>>> Agree.
>>>>
>>>>>>
>>>>>> Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
>>>>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>>>> Cc: <stable@vger.kernel.org>
>>>>>
>>>>> Hmm, what would this patch look like when based on current upstream? We'd
>>>>> likely want to get that upstream asap.
>>>>>
>>>>
>>>> This depends on whether we want it on top of [1].
>>>>
>>>> Current upstream doesn't have it [1], so we need to fix it in two places.
>>>>
>>>> Andrew mentioned he prefers a fixup version in [2].
>>>>
>>>> [1]: lkml.kernel.org/r/20251106034155.21398-1-richard.weiyang@gmail.com
>>>> [2]: lkml.kernel.org/r/20251118140658.9078de6aab719b2308996387@linux-foundation.org
>>>
>>> As we will want to backport this patch, we likely want it to apply to current master.
>>>
>>> But Andrew can comment on what he prefers in this case of a stable fix.
>>
>> That could mess up the mm-new tree [1], based on Andrew's recent feedback.
>
> Right, a similar fix could be had against mm-new.
>
> -- 
> Cheers
>
> David


--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/huge_memory: fix NULL pointer deference when splitting shmem folio in swap cache
  2025-11-19 14:29           ` Zi Yan
@ 2025-11-19 14:37             ` David Hildenbrand (Red Hat)
  2025-11-19 14:46               ` David Hildenbrand (Red Hat)
  2025-11-19 14:47               ` Zi Yan
  0 siblings, 2 replies; 22+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-19 14:37 UTC (permalink / raw)
  To: Zi Yan
  Cc: Wei Yang, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, linux-mm,
	stable

>> Given folio_test_swapcache() might have false positives,
>> I assume we'd need a
>>
>> 	folio_test_swapbacked() && folio_test_swapcache(folio)
>>
>> to detect large shmem folios in the swapcache in all cases here.
>>
>> Something like the following would hopefully do:
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 2f2a521e5d683..57aab66bedbea 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>>          return ret;
>>   }
>>   +static bool folio_test_shmem_swapcache(struct folio *folio)
>> +{
>> +       VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);
>> +       /* These folios do not have folio->mapping set. */
>> +       return folio_test_swapbacked(folio) && folio_test_swapcache(folio);
>> +}
>> +
>>   bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>                  bool warns)
>>   {
>> @@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>                                  "Cannot split to order-1 folio");
>>                  if (new_order == 1)
>>                          return false;
>> +       } else if (folio_test_shmem_swapcache(folio)) {
>> +               /* TODO: support shmem folios that are in the swapcache. */
>> +               return false;
> 
> With this, truncated shmem returns -EINVAL instead of -EBUSY now.
> Can s390_wiggle_split_folio() handle such folios?

[noting that s390_wiggle_split_folio() was just one caller where I knew
the return value differs. I suspect there might be more.]

I am still not clear on that one.

s390x obtains the folio while walking the page tables. In case it gets 
-EBUSY it simply retries to obtain the folio from the page tables.

So assuming there was concurrent truncation and we returned -EBUSY, it 
would just retry walking the page tables (trigger a fault to map a 
folio) and retry with that one.
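
As a minimal sketch of that retry pattern (toy code with invented names,
not the actual s390 implementation):

#include <errno.h>
#include <stdio.h>

struct toy_folio { int busy_rounds; };

/* Stand-in for re-walking the page tables to re-obtain the folio. */
static struct toy_folio *toy_lookup(struct toy_folio *f)
{
	return f;
}

/* Stand-in for the split: -EBUSY while a truncation race is ongoing. */
static int toy_split(struct toy_folio *f)
{
	return f->busy_rounds-- > 0 ? -EBUSY : 0;
}

int main(void)
{
	struct toy_folio folio = { .busy_rounds = 2 };
	int rc;

	for (;;) {
		rc = toy_split(toy_lookup(&folio));
		if (rc == -EBUSY)
			continue;	/* transient: re-walk and retry */
		break;			/* 0, or a hard error like -EINVAL */
	}
	printf("split rc=%d\n", rc);	/* 0 after the -EBUSY rounds */
	return 0;
}

The distinction matters because -EBUSY keeps that loop alive, while -EINVAL
fails the whole operation.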

I would assume that the shmem folio in the swapcache could never have 
worked before, and that there is no way to make progress really.

In other words: do we know how we can end up with a shmem folio that is 
in the swapcache and does not have folio->mapping set?

Could that thing still be mapped into the page tables? (I hope not, but
right now I am confused about how that could happen.)

-- 
Cheers

David


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/huge_memory: fix NULL pointer deference when splitting shmem folio in swap cache
  2025-11-19 14:37             ` David Hildenbrand (Red Hat)
@ 2025-11-19 14:46               ` David Hildenbrand (Red Hat)
  2025-11-19 14:48                 ` Zi Yan
                                   ` (2 more replies)
  2025-11-19 14:47               ` Zi Yan
  1 sibling, 3 replies; 22+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-19 14:46 UTC (permalink / raw)
  To: Zi Yan
  Cc: Wei Yang, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, linux-mm,
	stable

On 19.11.25 15:37, David Hildenbrand (Red Hat) wrote:
>>> Given folio_test_swapcache() might have false positives,
>>> I assume we'd need a
>>>
>>> 	folio_test_swapbacked() && folio_test_swapcache(folio)
>>>
>>> to detect large shmem folios in the swapcache in all cases here.
>>>
>>> Something like the following would hopefully do:
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 2f2a521e5d683..57aab66bedbea 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>>>           return ret;
>>>    }
>>>    +static bool folio_test_shmem_swapcache(struct folio *folio)
>>> +{
>>> +       VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);
>>> +       /* These folios do not have folio->mapping set. */
>>> +       return folio_test_swapbacked(folio) && folio_test_swapcache(folio);
>>> +}
>>> +
>>>    bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>>                   bool warns)
>>>    {
>>> @@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>>                                   "Cannot split to order-1 folio");
>>>                   if (new_order == 1)
>>>                           return false;
>>> +       } else if (folio_test_shmem_swapcache(folio)) {
>>> +               /* TODO: support shmem folios that are in the swapcache. */
>>> +               return false;
>>
>> With this, truncated shmem returns -EINVAL instead of -EBUSY now.
>> Can s390_wiggle_split_folio() handle such folios?
> 
> [noting that s390_wiggle_split_folio() was just one caller where I knew
> the return value differs. I suspect there might be more.]
> 
> I am still not clear on that one.
> 
> s390x obtains the folio while walking the page tables. In case it gets
> -EBUSY it simply retries to obtain the folio from the page tables.
> 
> So assuming there was concurrent truncation and we returned -EBUSY, it
> would just retry walking the page tables (trigger a fault to map a
> folio) and retry with that one.
> 
> I would assume that the shmem folio in the swapcache could never have
> worked before, and that there is no way to make progress really.
> 
> In other words: do we know how we can end up with a shmem folio that is
> in the swapcache and does not have folio->mapping set?
> 
> Could that thing still be mapped into the page tables? (I hope not, but
> right now I am confused about how that could happen.)
> 

Ah, my memory comes back.

vmscan triggers shmem_writeout() after unmapping the folio and after making sure that there are no unexpected folio references.

shmem_writeout() will do the shmem_delete_from_page_cache() where we set folio->mapping = NULL.

So anything walking the page tables (like s390x) could never find it.


Such shmem folios really cannot get split right now until we either reclaim them (-> freed) or shmem_swapin_folio() re-obtains them from the swapcache to re-add them to the page cache through shmem_add_to_page_cache().
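
As a toy model of the state that leaves behind (invented names; the real
transitions are shmem_writeout() / shmem_delete_from_page_cache() as above):

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_folio {
	void *mapping;		/* shmem page cache identity, or NULL */
	bool swapbacked;
	bool swapcache;
};

/* Stand-in for writeout: the folio moves from the shmem page cache
 * into the swapcache, losing folio->mapping on the way. */
static void toy_shmem_writeout(struct toy_folio *f)
{
	f->swapcache = true;
	f->mapping = NULL;
}

int main(void)
{
	int dummy_mapping;
	struct toy_folio f = { &dummy_mapping, true, false };

	toy_shmem_writeout(&f);
	/* Exactly the state the NULL-pointer dereference trips over. */
	assert(!f.mapping && f.swapbacked && f.swapcache);
	return 0;
}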

So maybe we can make our life easy and just keep returning -EBUSY for this scenario for the time being?

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2f2a521e5d683..5ce86882b2727 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3619,6 +3619,16 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
         if (folio != page_folio(split_at) || folio != page_folio(lock_at))
                 return -EINVAL;
  
+       /*
+        * Folios that just got truncated cannot get split. Signal to the
+        * caller that there was a race.
+        *
+        * TODO: this will also currently refuse shmem folios that are in
+        * the swapcache.
+        */
+       if (!is_anon && !folio->mapping)
+               return -EBUSY;
+
         if (new_order >= folio_order(folio))
                 return -EINVAL;
  
@@ -3659,17 +3669,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
                 gfp_t gfp;
  
                 mapping = folio->mapping;
-
-               /* Truncated ? */
-               /*
-                * TODO: add support for large shmem folio in swap cache.
-                * When shmem is in swap cache, mapping is NULL and
-                * folio_test_swapcache() is true.
-                */
-               if (!mapping) {
-                       ret = -EBUSY;
-                       goto out;
-               }
+               VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
  
                 min_order = mapping_min_folio_order(folio->mapping);
                 if (new_order < min_order) {


-- 
Cheers

David


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/huge_memory: fix NULL pointer deference when splitting shmem folio in swap cache
  2025-11-19 14:37             ` David Hildenbrand (Red Hat)
  2025-11-19 14:46               ` David Hildenbrand (Red Hat)
@ 2025-11-19 14:47               ` Zi Yan
  1 sibling, 0 replies; 22+ messages in thread
From: Zi Yan @ 2025-11-19 14:47 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Wei Yang, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, linux-mm,
	stable

On 19 Nov 2025, at 9:37, David Hildenbrand (Red Hat) wrote:

>>> Given folio_test_swapcache() might have false positives,
>>> I assume we'd need a
>>>
>>> 	folio_test_swapbacked() && folio_test_swapcache(folio)
>>>
>>> to detect large shmem folios in the swapcache in all cases here.
>>>
>>> Something like the following would hopefully do:
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 2f2a521e5d683..57aab66bedbea 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>>>          return ret;
>>>   }
>>>   +static bool folio_test_shmem_swapcache(struct folio *folio)
>>> +{
>>> +       VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);
>>> +       /* These folios do not have folio->mapping set. */
>>> +       return folio_test_swapbacked(folio) && folio_test_swapcache(folio);
>>> +}
>>> +
>>>   bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>>                  bool warns)
>>>   {
>>> @@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>>                                  "Cannot split to order-1 folio");
>>>                  if (new_order == 1)
>>>                          return false;
>>> +       } else if (folio_test_shmem_swapcache(folio)) {
>>> +               /* TODO: support shmem folios that are in the swapcache. */
>>> +               return false;
>>
>> With this, truncated shmem returns -EINVAL instead of -EBUSY now.
>> Can s390_wiggle_split_folio() handle such folios?
>
> [noting that s390_wiggle_split_folio() was just one caller where I knew the return value differs. I suspect there might be more.]
>
> I am still not clear on that one.
>
> s390x obtains the folio while walking the page tables. In case it gets -EBUSY it simply retries to obtain the folio from the page tables.
>
> So assuming there was concurrent truncation and we returned -EBUSY, it would just retry walking the page tables (trigger a fault to map a folio) and retry with that one.
>
> I would assume that the shmem folio in the swapcache could never have worked before, and that there is no way to make progress really.
>
> In other words: do we know how we can end up with a shmem folio that is in the swapcache and does not have folio->mapping set?
>
> Could that thing still be mapped into the page tables? (I hope not, but right now I am confused about how that could happen.)

IIUC, in shrink_folio_list(), pageout()[1] calls writeout(), which calls
shmem_writeout(). shmem_writeout() allocates a swapcache entry and removes the
folio from the pagecache[2]. Between pageout() and the point the folio is
freed, folio->mapping is NULL. Before pageout(), the folio should already be
unmapped.


[1] https://elixir.bootlin.com/linux/v6.17.8/source/mm/vmscan.c#L1452
[2] https://elixir.bootlin.com/linux/v6.17.8/source/mm/shmem.c#L963

--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/huge_memory: fix NULL pointer deference when splitting shmem folio in swap cache
  2025-11-19 14:46               ` David Hildenbrand (Red Hat)
@ 2025-11-19 14:48                 ` Zi Yan
  2025-11-19 14:50                   ` David Hildenbrand (Red Hat)
  2025-11-19 23:18                 ` Wei Yang
  2025-11-20  0:47                 ` Wei Yang
  2 siblings, 1 reply; 22+ messages in thread
From: Zi Yan @ 2025-11-19 14:48 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Wei Yang, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, linux-mm,
	stable

On 19 Nov 2025, at 9:46, David Hildenbrand (Red Hat) wrote:

> On 19.11.25 15:37, David Hildenbrand (Red Hat) wrote:
>>>> Given folio_test_swapcache() might have false positives,
>>>> I assume we'd need a
>>>>
>>>> 	folio_test_swapbacked() && folio_test_swapcache(folio)
>>>>
>>>> to detect large shmem folios in the swapcache in all cases here.
>>>>
>>>> Something like the following would hopefully do:
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 2f2a521e5d683..57aab66bedbea 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>>>>           return ret;
>>>>    }
>>>>    +static bool folio_test_shmem_swapcache(struct folio *folio)
>>>> +{
>>>> +       VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);
>>>> +       /* These folios do not have folio->mapping set. */
>>>> +       return folio_test_swapbacked(folio) && folio_test_swapcache(folio);
>>>> +}
>>>> +
>>>>    bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>>>                   bool warns)
>>>>    {
>>>> @@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>>>                                   "Cannot split to order-1 folio");
>>>>                   if (new_order == 1)
>>>>                           return false;
>>>> +       } else if (folio_test_shmem_swapcache(folio)) {
>>>> +               /* TODO: support shmem folios that are in the swapcache. */
>>>> +               return false;
>>>
>>> With this, truncated shmem returns -EINVAL instead of -EBUSY now.
>>> Can s390_wiggle_split_folio() handle such folios?
>>
>> [noting that s390_wiggle_split_folio() was just one caller where I knew
>> the return value differs. I suspect there might be more.]
>>
>> I am still not clear on that one.
>>
>> s390x obtains the folio while walking the page tables. In case it gets
>> -EBUSY it simply retries to obtain the folio from the page tables.
>>
>> So assuming there was concurrent truncation and we returned -EBUSY, it
>> would just retry walking the page tables (trigger a fault to map a
>> folio) and retry with that one.
>>
>> I would assume that the shmem folio in the swapcache could never have
>> worked before, and that there is no way to make progress really.
>>
>> In other words: do we know how we can end up with a shmem folio that is
>> in the swapcache and does not have folio->mapping set?
>>
>> Could that thing still be mapped into the page tables? (I hope not, but
>> right now I am confused about how that could happen.)
>>
>
> Ah, my memory comes back.
>
> vmscan triggers shmem_writeout() after unmapping the folio and after making sure that there are no unexpected folio references.
>
> shmem_writeout() will do the shmem_delete_from_page_cache() where we set folio->mapping = NULL.
>
> So anything walking the page tables (like s390x) could never find it.
>
>
> Such shmem folios really cannot get split right now until we either reclaim them (-> freed) or shmem_swapin_folio() re-obtains them from the swapcache to re-add them to the page cache through shmem_add_to_page_cache().
>
> So maybe we can make our life easy and just keep returning -EBUSY for this scenario for the time being?

Exactly, also an easy backport.

>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 2f2a521e5d683..5ce86882b2727 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3619,6 +3619,16 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>         if (folio != page_folio(split_at) || folio != page_folio(lock_at))
>                 return -EINVAL;
>  +       /*
> +        * Folios that just got truncated cannot get split. Signal to the
> +        * caller that there was a race.
> +        *
> +        * TODO: this will also currently refuse shmem folios that are in
> +        * the swapcache.
> +        */
> +       if (!is_anon && !folio->mapping)
> +               return -EBUSY;
> +
>         if (new_order >= folio_order(folio))
>                 return -EINVAL;
>  @@ -3659,17 +3669,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>                 gfp_t gfp;
>                  mapping = folio->mapping;
> -
> -               /* Truncated ? */
> -               /*
> -                * TODO: add support for large shmem folio in swap cache.
> -                * When shmem is in swap cache, mapping is NULL and
> -                * folio_test_swapcache() is true.
> -                */
> -               if (!mapping) {
> -                       ret = -EBUSY;
> -                       goto out;
> -               }
> +               VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
>                  min_order = mapping_min_folio_order(folio->mapping);
>                 if (new_order < min_order) {
>
>
> -- 
> Cheers
>
> David


Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/huge_memory: fix NULL pointer deference when splitting shmem folio in swap cache
  2025-11-19 14:48                 ` Zi Yan
@ 2025-11-19 14:50                   ` David Hildenbrand (Red Hat)
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-19 14:50 UTC (permalink / raw)
  To: Zi Yan
  Cc: Wei Yang, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, linux-mm,
	stable

On 19.11.25 15:48, Zi Yan wrote:
> On 19 Nov 2025, at 9:46, David Hildenbrand (Red Hat) wrote:
> 
>> On 19.11.25 15:37, David Hildenbrand (Red Hat) wrote:
>>>>> Given folio_test_swapcache() might have false positives,
>>>>> I assume we'd need a
>>>>>
>>>>> 	folio_test_swapbacked() && folio_test_swapcache(folio)
>>>>>
>>>>> to detect large shmem folios in the swapcache in all cases here.
>>>>>
>>>>> Something like the following would hopefully do:
>>>>>
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 2f2a521e5d683..57aab66bedbea 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>>>>>            return ret;
>>>>>     }
>>>>>     +static bool folio_test_shmem_swapcache(struct folio *folio)
>>>>> +{
>>>>> +       VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);
>>>>> +       /* These folios do not have folio->mapping set. */
>>>>> +       return folio_test_swapbacked(folio) && folio_test_swapcache(folio);
>>>>> +}
>>>>> +
>>>>>     bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>>>>                    bool warns)
>>>>>     {
>>>>> @@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>>>>                                    "Cannot split to order-1 folio");
>>>>>                    if (new_order == 1)
>>>>>                            return false;
>>>>> +       } else if (folio_test_shmem_swapcache(folio)) {
>>>>> +               /* TODO: support shmem folios that are in the swapcache. */
>>>>> +               return false;
>>>>
>>>> With this, truncated shmem returns -EINVAL instead of -EBUSY now.
>>>> Can s390_wiggle_split_folio() handle such folios?
>>>
>>> [noting that s390_wiggle_split_folio() was just one caller where I knew
>>> the return value differs. I suspect there might be more.]
>>>
>>> I am still not clear on that one.
>>>
>>> s390x obtains the folio while walking the page tables. In case it gets
>>> -EBUSY it simply retries to obtain the folio from the page tables.
>>>
>>> So assuming there was concurrent truncation and we returned -EBUSY, it
>>> would just retry walking the page tables (trigger a fault to map a
>>> folio) and retry with that one.
>>>
>>> I would assume that the shmem folio in the swapcache could never have
>>> worked before, and that there is no way to make progress really.
>>>
>>> In other words: do we know how we can end up with a shmem folio that is
>>> in the swapcache and does not have folio->mapping set?
>>>
>>> Could that thing still be mapped into the page tables? (I hope not, but
>>> right now I am confused about how that could happen.)
>>>
>>
>> Ah, my memory comes back.
>>
>> vmscan triggers shmem_writeout() after unmapping the folio and after making sure that there are no unexpected folio references.
>>
>> shmem_writeout() will do the shmem_delete_from_page_cache() where we set folio->mapping = NULL.
>>
>> So anything walking the page tables (like s390x) could never find it.
>>
>>
>> Such shmem folios really cannot get split right now until we either reclaim them (-> freed) or shmem_swapin_folio() re-obtains them from the swapcache to re-add them to the page cache through shmem_add_to_page_cache().
>>
>> So maybe we can make our life easy and just keep returning -EBUSY for this scenario for the time being?
> 
> Exactly, also an easy backport.

Okay, let's do that then.

@Wei can you send a v2?

-- 
Cheers

David


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/huge_memory: fix NULL pointer deference when splitting shmem folio in swap cache
  2025-11-19 14:46               ` David Hildenbrand (Red Hat)
  2025-11-19 14:48                 ` Zi Yan
@ 2025-11-19 23:18                 ` Wei Yang
  2025-11-20  0:47                 ` Wei Yang
  2 siblings, 0 replies; 22+ messages in thread
From: Wei Yang @ 2025-11-19 23:18 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Zi Yan, Wei Yang, akpm, lorenzo.stoakes, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang,
	linux-mm, stable

On Wed, Nov 19, 2025 at 03:46:14PM +0100, David Hildenbrand (Red Hat) wrote:
>On 19.11.25 15:37, David Hildenbrand (Red Hat) wrote:
>> > > Given folio_test_swapcache() might have false positives,
>> > > I assume we'd need a
>> > > 
>> > > 	folio_test_swapbacked() && folio_test_swapcache(folio)
>> > > 
>> > > to detect large shmem folios in the swapcache in all cases here.
>> > > 
>> > > Something like the following would hopefully do:
>> > > 
>> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> > > index 2f2a521e5d683..57aab66bedbea 100644
>> > > --- a/mm/huge_memory.c
>> > > +++ b/mm/huge_memory.c
>> > > @@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>> > >           return ret;
>> > >    }
>> > >    +static bool folio_test_shmem_swapcache(struct folio *folio)
>> > > +{
>> > > +       VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);
>> > > +       /* These folios do not have folio->mapping set. */
>> > > +       return folio_test_swapbacked(folio) && folio_test_swapcache(folio);
>> > > +}
>> > > +
>> > >    bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>> > >                   bool warns)
>> > >    {
>> > > @@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>> > >                                   "Cannot split to order-1 folio");
>> > >                   if (new_order == 1)
>> > >                           return false;
>> > > +       } else if (folio_test_shmem_swapcache(folio)) {
>> > > +               /* TODO: support shmem folios that are in the swapcache. */
>> > > +               return false;
>> > 
>> > With this, truncated shmem returns -EINVAL instead of -EBUSY now.
>> > Can s390_wiggle_split_folio() handle such folios?
>> 
>> [noting that s390_wiggle_split_folio() was just one caller where I knew
>> the return value differs. I suspect there might be more.]
>> 
>> I am still not clear on that one.
>> 
>> s390x obtains the folio while walking the page tables. In case it gets
>> -EBUSY it simply retries to obtain the folio from the page tables.
>> 
>> So assuming there was concurrent truncation and we returned -EBUSY, it
>> would just retry walking the page tables (trigger a fault to map a
>> folio) and retry with that one.
>> 
>> I would assume that the shmem folio in the swapcache could never have
>> worked before, and that there is no way to make progress really.
>> 
>> In other words: do we know how we can end up with a shmem folio that is
>> in the swapcache and does not have folio->mapping set?
>> 
>> Could that thing still be mapped into the page tables? (I hope not, but
>> right now I am confused about how that could happen.)
>> 
>
>Ah, my memory comes back.
>
>vmscan triggers shmem_writeout() after unmapping the folio and after making sure that there are no unexpected folio references.
>
>shmem_writeout() will do the shmem_delete_from_page_cache() where we set folio->mapping = NULL.
>
>So anything walking the page tables (like s390x) could never find it.
>
>
>Such shmem folios really cannot get split right now until we either reclaim them (-> freed) or shmem_swapin_folio() re-obtains them from the swapcache to re-add them to the page cache through shmem_add_to_page_cache().
>
>So maybe we can make our life easy and just keep returning -EBUSY for this scenario for the time being?
>
>diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>index 2f2a521e5d683..5ce86882b2727 100644
>--- a/mm/huge_memory.c
>+++ b/mm/huge_memory.c
>@@ -3619,6 +3619,16 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>        if (folio != page_folio(split_at) || folio != page_folio(lock_at))
>                return -EINVAL;
>+       /*
>+        * Folios that just got truncated cannot get split. Signal to the
>+        * caller that there was a race.
>+        *
>+        * TODO: this will also currently refuse shmem folios that are in
>+        * the swapcache.
>+        */
>+       if (!is_anon && !folio->mapping)
>+               return -EBUSY;
>+
>        if (new_order >= folio_order(folio))
>                return -EINVAL;
>@@ -3659,17 +3669,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>                gfp_t gfp;
>                mapping = folio->mapping;
>-
>-               /* Truncated ? */
>-               /*
>-                * TODO: add support for large shmem folio in swap cache.
>-                * When shmem is in swap cache, mapping is NULL and
>-                * folio_test_swapcache() is true.
>-                */
>-               if (!mapping) {
>-                       ret = -EBUSY;
>-                       goto out;
>-               }
>+               VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
>                min_order = mapping_min_folio_order(folio->mapping);
>                if (new_order < min_order) {
>

Thanks, will prepare a v2 with this.

>
>-- 
>Cheers
>
>David

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/huge_memory: fix NULL pointer deference when splitting shmem folio in swap cache
  2025-11-19 14:46               ` David Hildenbrand (Red Hat)
  2025-11-19 14:48                 ` Zi Yan
  2025-11-19 23:18                 ` Wei Yang
@ 2025-11-20  0:47                 ` Wei Yang
  2025-11-20  3:00                   ` Zi Yan
  2 siblings, 1 reply; 22+ messages in thread
From: Wei Yang @ 2025-11-20  0:47 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Zi Yan, Wei Yang, akpm, lorenzo.stoakes, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang,
	linux-mm, stable

On Wed, Nov 19, 2025 at 03:46:14PM +0100, David Hildenbrand (Red Hat) wrote:
>On 19.11.25 15:37, David Hildenbrand (Red Hat) wrote:
>> > > Given folio_test_swapcache() might have false positives,
>> > > I assume we'd need a
>> > > 
>> > > 	folio_test_swapbacked() && folio_test_swapcache(folio)
>> > > 
>> > > to detect large shmem folios in the swapcache in all cases here.
>> > > 
>> > > Something like the following would hopefully do:
>> > > 
>> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> > > index 2f2a521e5d683..57aab66bedbea 100644
>> > > --- a/mm/huge_memory.c
>> > > +++ b/mm/huge_memory.c
>> > > @@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>> > >           return ret;
>> > >    }
>> > >    +static bool folio_test_shmem_swapcache(struct folio *folio)
>> > > +{
>> > > +       VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);
>> > > +       /* These folios do not have folio->mapping set. */
>> > > +       return folio_test_swapbacked(folio) && folio_test_swapcache(folio);
>> > > +}
>> > > +
>> > >    bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>> > >                   bool warns)
>> > >    {
>> > > @@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>> > >                                   "Cannot split to order-1 folio");
>> > >                   if (new_order == 1)
>> > >                           return false;
>> > > +       } else if (folio_test_shmem_swapcache(folio)) {
>> > > +               /* TODO: support shmem folios that are in the swapcache. */
>> > > +               return false;
>> > 
>> > With this, truncated shmem returns -EINVAL instead of -EBUSY now.
>> > Can s390_wiggle_split_folio() handle such folios?
>> 
>> [noting that s390_wiggle_split_folio() was just one caller where I knew
>> the return value differs. I suspect there might be more.]
>> 
>> I am still not clear on that one.
>> 
>> s390x obtains the folio while walking the page tables. In case it gets
>> -EBUSY it simply retries to obtain the folio from the page tables.
>> 
>> So assuming there was concurrent truncation and we returned -EBUSY, it
>> would just retry walking the page tables (trigger a fault to map a
>> folio) and retry with that one.
>> 
>> I would assume that the shmem folio in the swapcache could never have
>> worked before, and that there is no way to make progress really.
>> 
>> In other words: do we know how we can end up with a shmem folio that is
>> in the swapcache and does not have folio->mapping set?
>> 
>> Could that thing still be mapped into the page tables? (I hope not, but
>> right now I am confused about how that could happen.)
>> 
>
>Ah, my memory comes back.
>
>vmscan triggers shmem_writeout() after unmapping the folio and after making sure that there are no unexpected folio references.
>
>shmem_writeout() will do the shmem_delete_from_page_cache() where we set folio->mapping = NULL.
>
>So anything walking the page tables (like s390x) could never find it.
>
>
>> Such shmem folios really cannot get split right now until we either reclaim them (-> freed) or shmem_swapin_folio() re-obtains them from the swapcache to re-add them to the page cache through shmem_add_to_page_cache().
>
>> So maybe we can make our life easy and just keep returning -EBUSY for this scenario for the time being?
>
>diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>index 2f2a521e5d683..5ce86882b2727 100644
>--- a/mm/huge_memory.c
>+++ b/mm/huge_memory.c
>@@ -3619,6 +3619,16 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>        if (folio != page_folio(split_at) || folio != page_folio(lock_at))
>                return -EINVAL;
>+       /*
>+        * Folios that just got truncated cannot get split. Signal to the
>+        * caller that there was a race.
>+        *
>+        * TODO: this will also currently refuse shmem folios that are in
>+        * the swapcache.
>+        */
>+       if (!is_anon && !folio->mapping)
>+               return -EBUSY;
>+
>        if (new_order >= folio_order(folio))
>                return -EINVAL;
>@@ -3659,17 +3669,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>                gfp_t gfp;
>                mapping = folio->mapping;
>-
>-               /* Truncated ? */
>-               /*
>-                * TODO: add support for large shmem folio in swap cache.
>-                * When shmem is in swap cache, mapping is NULL and
>-                * folio_test_swapcache() is true.
>-                */
>-               if (!mapping) {
>-                       ret = -EBUSY;
>-                       goto out;
>-               }
>+               VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
>                min_order = mapping_min_folio_order(folio->mapping);
>                if (new_order < min_order) {
>

One more thing came to my mind.

Currently, folio_split_supported() is used in try_folio_split_to_order().

Here are related commits:

[1] commit 7460b470a131f985a70302a322617121efdd7caa
    Author: Zi Yan <ziy@nvidia.com>
    Date:   Fri Mar 7 12:40:00 2025 -0500

        mm/truncate: use folio_split() in truncate operation

[2] commit 77008e1b2ef73249bceb078a321a3ff6bc087afb
    Author: Zi Yan <ziy@nvidia.com>
    Date:   Thu Oct 16 21:36:30 2025 -0400
    
        mm/huge_memory: do not change split_huge_page*() target order silently

[1] looks fine, because before calling folio_split_supported(),
min_order_for_split() would return a negative value if !folio->mapping.

But [2] moves min_order_for_split() from try_folio_split_to_order() to its
caller.

Currently it looks good, but I'm not sure whether it leaves room for potential misuse.
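
To spell out the calling convention I mean, a sketch with toy stand-ins
(assumed shapes, not the upstream signatures):

#include <errno.h>
#include <stddef.h>

struct toy_folio { void *mapping; };

/* Reports truncation via a negative errno, as described above. */
static int toy_min_order_for_split(struct toy_folio *folio)
{
	if (!folio->mapping)
		return -EBUSY;	/* truncated: caller must bail out */
	return 0;		/* otherwise the minimum split order */
}

/* [1] kept this guard next to the split; after [2] every caller has to
 * perform it before consulting folio_split_supported(). */
static int toy_try_split(struct toy_folio *folio)
{
	int min_order = toy_min_order_for_split(folio);

	if (min_order < 0)
		return min_order;
	/* ... folio_split_supported() and the actual split follow ... */
	return 0;
}

int main(void)
{
	struct toy_folio truncated = { NULL };

	return toy_try_split(&truncated) == -EBUSY ? 0 : 1;
}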

>
>-- 
>Cheers
>
>David

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/huge_memory: fix NULL pointer deference when splitting shmem folio in swap cache
  2025-11-20  0:47                 ` Wei Yang
@ 2025-11-20  3:00                   ` Zi Yan
  0 siblings, 0 replies; 22+ messages in thread
From: Zi Yan @ 2025-11-20  3:00 UTC (permalink / raw)
  To: Wei Yang
  Cc: David Hildenbrand (Red Hat),
	akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, linux-mm, stable

On 19 Nov 2025, at 19:47, Wei Yang wrote:

> On Wed, Nov 19, 2025 at 03:46:14PM +0100, David Hildenbrand (Red Hat) wrote:
>> On 19.11.25 15:37, David Hildenbrand (Red Hat) wrote:
>>>>> Given folio_test_swapcache() might have false positives,
>>>>> I assume we'd need a
>>>>>
>>>>> 	folio_test_swapbacked() && folio_test_swapcache(folio)
>>>>>
>>>>> to detect large shmem folios in the swapcache in all cases here.
>>>>>
>>>>> Something like the following would hopefully do:
>>>>>
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 2f2a521e5d683..57aab66bedbea 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -3515,6 +3515,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>>>>>           return ret;
>>>>>    }
>>>>>    +static bool folio_test_shmem_swapcache(struct folio *folio)
>>>>> +{
>>>>> +       VM_WARN_ON_ONCE_FOLIO(folio_test_anon(folio), folio);
>>>>> +       /* These folios do not have folio->mapping set. */
>>>>> +       return folio_test_swapbacked(folio) && folio_test_swapcache(folio);
>>>>> +}
>>>>> +
>>>>>    bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>>>>                   bool warns)
>>>>>    {
>>>>> @@ -3524,6 +3531,9 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
>>>>>                                   "Cannot split to order-1 folio");
>>>>>                   if (new_order == 1)
>>>>>                           return false;
>>>>> +       } else if (folio_test_shmem_swapcache(folio)) {
>>>>> +               /* TODO: support shmem folios that are in the swapcache. */
>>>>> +               return false;
>>>>
>>>> With this, truncated shmem returns -EINVAL instead of -EBUSY now.
>>>> Can s390_wiggle_split_folio() handle such folios?
>>>
>>> [noting that s390_wiggle_split_folio() was just one caller where I knew
>>> the return value differs. I suspect there might be more.]
>>>
>>> I am still not clear on that one.
>>>
>>> s390x obtains the folio while walking the page tables. In case it gets
>>> -EBUSY, it simply retries obtaining the folio from the page tables.
>>>
>>> So assuming there was concurrent truncation and we returned -EBUSY, it
>>> would just retry walking the page tables (triggering a fault to map a
>>> folio) and retry with that one, roughly as sketched below.
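>>>
>>> An illustrative sketch of that retry pattern (pseudo-code only, not
>>> the actual s390x code):
>>>
>>> 	do {
>>> 		/* (re)walk the page tables, faulting a folio in if needed */
>>> 		folio = <folio obtained from the page-table walk>;
>>> 		ret = split_folio(folio);
>>> 	} while (ret == -EBUSY);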
>>>
>>> I would assume that splitting the shmem folio in the swapcache could
>>> never have worked before, and that there is really no way to make
>>> progress.
>>>
>>> In other words: do we know how we can end up with a shmem folio that is
>>> in the swapcache and does not have folio->mapping set?
>>>
>>> Could that thing still be mapped into the page tables? (I hope not, but
>>> right now I am confused about how that could happen.)
>>>
>>
>> Ah, my memory comes back.
>>
>> vmscan triggers shmem_writeout() after unmapping the folio and after making sure that there are no unexpected folio references.
>>
>> shmem_writeout() will then call shmem_delete_from_page_cache(), which sets folio->mapping = NULL.
>>
>> So anything walking the page tables (like s390x) could never find it.
>>
>>
>> Such shmem folios really cannot get split right now until they either get reclaimed (-> freed) or shmem_swapin_folio() re-obtains them from the swapcache to re-add them to the page cache through shmem_add_to_page_cache().
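>>
>> Concretely, such a folio is left in the following state (a simplified
>> sketch of the relevant checks, not lifted from the code):
>>
>> 	/* after shmem_delete_from_page_cache() in the writeout path */
>> 	folio->mapping == NULL;			/* gone from the page cache */
>> 	folio_test_swapbacked(folio);		/* still swap-backed */
>> 	folio_test_swapcache(folio);		/* still in the swapcache */
>> 	/* ... and no longer mapped into any page tables */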
>>
>> So maybe we can make our life easy and just keep returning -EBUSY for this scenario for the time being?
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 2f2a521e5d683..5ce86882b2727 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3619,6 +3619,16 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>        if (folio != page_folio(split_at) || folio != page_folio(lock_at))
>>                return -EINVAL;
>> +       /*
>> +        * Folios that just got truncated cannot get split. Signal to the
>> +        * caller that there was a race.
>> +        *
>> +        * TODO: this will also currently refuse shmem folios that are in
>> +        * the swapcache.
>> +        */
>> +       if (!is_anon && !folio->mapping)
>> +               return -EBUSY;
>> +
>>        if (new_order >= folio_order(folio))
>>                return -EINVAL;
>> @@ -3659,17 +3669,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>                gfp_t gfp;
>>                mapping = folio->mapping;
>> -
>> -               /* Truncated ? */
>> -               /*
>> -                * TODO: add support for large shmem folio in swap cache.
>> -                * When shmem is in swap cache, mapping is NULL and
>> -                * folio_test_swapcache() is true.
>> -                */
>> -               if (!mapping) {
>> -                       ret = -EBUSY;
>> -                       goto out;
>> -               }
>> +               VM_WARN_ON_ONCE_FOLIO(!mapping, folio);
>>                min_order = mapping_min_folio_order(folio->mapping);
>>                if (new_order < min_order) {
>>
>
> One more thing comes to my mind.
>
> Currently, folio_split_supported() is used in try_folio_split_to_order().
>
> Here are related commits:
>
> [1] commit 7460b470a131f985a70302a322617121efdd7caa
>     Author: Zi Yan <ziy@nvidia.com>
>     Date:   Fri Mar 7 12:40:00 2025 -0500
>
>         mm/truncate: use folio_split() in truncate operation
>
> [2] commit 77008e1b2ef73249bceb078a321a3ff6bc087afb
>     Author: Zi Yan <ziy@nvidia.com>
>     Date:   Thu Oct 16 21:36:30 2025 -0400
>
>         mm/huge_memory: do not change split_huge_page*() target order silently
>
> [1] looks fine, because before calling folio_split_supported(),
> min_order_for_split() would return a negative value when !folio->mapping.
>
> But [2] moves min_order_for_split() from try_folio_split_to_order() to its
> caller.
>
> Currently it looks fine, but I am not sure whether this leaves room for
> potential misuse.

I am sending patches to handle it. Thank you for spotting it.
Best Regards,
Yan, Zi




Thread overview: 22+ messages
2025-11-19  1:26 [PATCH] mm/huge_memory: fix NULL pointer dereference when splitting shmem folio in swap cache Wei Yang
2025-11-19  2:32 ` Zi Yan
2025-11-19  2:56   ` Wei Yang
2025-11-19  8:57 ` David Hildenbrand (Red Hat)
2025-11-19 12:23   ` Wei Yang
2025-11-19 12:54     ` David Hildenbrand (Red Hat)
2025-11-19 13:08       ` Zi Yan
2025-11-19 13:41         ` Wei Yang
2025-11-19 13:58           ` David Hildenbrand (Red Hat)
2025-11-19 14:09         ` David Hildenbrand (Red Hat)
2025-11-19 14:29           ` Zi Yan
2025-11-19 14:37             ` David Hildenbrand (Red Hat)
2025-11-19 14:46               ` David Hildenbrand (Red Hat)
2025-11-19 14:48                 ` Zi Yan
2025-11-19 14:50                   ` David Hildenbrand (Red Hat)
2025-11-19 23:18                 ` Wei Yang
2025-11-20  0:47                 ` Wei Yang
2025-11-20  3:00                   ` Zi Yan
2025-11-19 14:47               ` Zi Yan
2025-11-19 13:14       ` Wei Yang
2025-11-19 12:42   ` Wei Yang
2025-11-19 14:13     ` David Hildenbrand (Red Hat)
