* [RFC PATCH 0/3] folio->mapping == NULL check issue
@ 2025-11-20 3:59 Zi Yan
2025-11-20 3:59 ` [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order() Zi Yan
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Zi Yan @ 2025-11-20 3:59 UTC (permalink / raw)
To: David Hildenbrand, Lorenzo Stoakes
Cc: Andrew Morton, Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Miaohe Lin,
Naoya Horiguchi, linux-mm, linux-kernel
Hi all,
Based on Wei's observation[1], this patchset is trying to solve several
potential issues:
1. Patch 1, 2: dereferencing NULL folio->mapping in
try_folio_split_to_order() for future users,
2. Patch 3: improper handling of negative return value of
min_order_for_split() in mm/memory-failure.c
No bug is present for the existing code.
For 1, try_folio_split_to_order() is used in
truncate_inode_partial_folio() and it is fine since folio->mapping is
always not NULL there.
For 2, mm/memory-failure.c code does not check for -EBUSY return value of
min_order_for_split() and directly passes it to a folio split function.
The code works by accident but needs to be fixed.
folio_split_supported() does not check if folio->mapping is NULL, but
can dereference it in some cases. My current fix is adding a kernel-doc
to clarify this requirement. An alternative is to add folio->mapping
check and return -EBUSY and return -EINVAL for the other checks. But
folio_split_supported() will no longer checks for "supported",
folio_split_can_split() might be a better name. Then, the existing
can_split_folio() might better be renamed to
folio_split_refcount_check(). In addition, try_folio_split_to_order(),
which calls folio_split_supported(), might want to handle -EBUSY separately
to avoid calling split_huge_page_to_order() when the folio is truncated
for a known -EBUSY error.
Let me know your thoughts. Thanks.
Link: https://lore.kernel.org/all/20251120004735.52z7r4xmogw7mbsj@master/ [1]
Zi Yan (3):
mm/huge_memory: prevent NULL pointer dereference in
try_folio_split_to_order()
mm/huge_memory: add kernel-doc for folio_split_supported()
mm/memory-failure: handle min_order_for_split() error code properly
include/linux/huge_mm.h | 7 +++++++
mm/huge_memory.c | 17 +++++++++++++++++
mm/memory-failure.c | 8 ++++++--
3 files changed, 30 insertions(+), 2 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 20+ messages in thread
* [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()
2025-11-20 3:59 [RFC PATCH 0/3] folio->mapping == NULL check issue Zi Yan
@ 2025-11-20 3:59 ` Zi Yan
2025-11-20 4:28 ` Balbir Singh
2025-11-20 9:25 ` David Hildenbrand (Red Hat)
2025-11-20 3:59 ` [RFC PATCH 2/3] mm/huge_memory: add kernel-doc for folio_split_supported() Zi Yan
2025-11-20 3:59 ` [RFC PATCH 3/3] mm/memory-failure: handle min_order_for_split() error code properly Zi Yan
2 siblings, 2 replies; 20+ messages in thread
From: Zi Yan @ 2025-11-20 3:59 UTC (permalink / raw)
To: David Hildenbrand, Lorenzo Stoakes
Cc: Andrew Morton, Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Miaohe Lin,
Naoya Horiguchi, linux-mm, linux-kernel
folio_split_supported() used in try_folio_split_to_order() requires
folio->mapping to be non NULL, but current try_folio_split_to_order() does
not check it. Add the check to prevent NULL pointer dereference.
There is no issue in the current code, since try_folio_split_to_order() is
only used in truncate_inode_partial_folio(), where folio->mapping is not
NULL.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
include/linux/huge_mm.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 1d439de1ca2c..0d55354e3a34 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -407,6 +407,13 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
static inline int try_folio_split_to_order(struct folio *folio,
struct page *page, unsigned int new_order)
{
+ /*
+ * Folios that just got truncated cannot get split. Signal to the
+ * caller that there was a race.
+ */
+ if (!folio_test_anon(folio) && !folio->mapping)
+ return -EBUSY;
+
if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
return split_huge_page_to_order(&folio->page, new_order);
return folio_split(folio, new_order, page, NULL);
--
2.51.0
^ permalink raw reply [flat|nested] 20+ messages in thread
* [RFC PATCH 2/3] mm/huge_memory: add kernel-doc for folio_split_supported()
2025-11-20 3:59 [RFC PATCH 0/3] folio->mapping == NULL check issue Zi Yan
2025-11-20 3:59 ` [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order() Zi Yan
@ 2025-11-20 3:59 ` Zi Yan
2025-11-20 4:37 ` Balbir Singh
2025-11-20 9:27 ` David Hildenbrand (Red Hat)
2025-11-20 3:59 ` [RFC PATCH 3/3] mm/memory-failure: handle min_order_for_split() error code properly Zi Yan
2 siblings, 2 replies; 20+ messages in thread
From: Zi Yan @ 2025-11-20 3:59 UTC (permalink / raw)
To: David Hildenbrand, Lorenzo Stoakes
Cc: Andrew Morton, Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Miaohe Lin,
Naoya Horiguchi, linux-mm, linux-kernel
It clarifies that folio_split_supported() does not check folio->mapping and
can dereference it.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
mm/huge_memory.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index efea42d68157..15e555f1b85d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3688,6 +3688,23 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
return 0;
}
+/**
+ * folio_split_supported() - check if a folio can be split to a given order
+ * @folio: folio to be split
+ * @new_order: the smallest order of the after split folios (since buddy
+ * allocator like split generates folios with orders from @folio's
+ * order - 1 to new_order).
+ * @split_type: uniform or non-uniform split
+ * @warns: whether gives warnings or not for the checks in the function
+ *
+ * folio_split_supported() checks if @folio can be split to @new_order using
+ * @split_type method.
+ *
+ * Context: Caller must make sure folio->mapping is not NULL, since the
+ * function does not check it and can dereference folio->mapping
+ * Return: true - @folio can be split to @new_order, false - @folio cannot be
+ * split
+ */
bool folio_split_supported(struct folio *folio, unsigned int new_order,
enum split_type split_type, bool warns)
{
--
2.51.0
^ permalink raw reply [flat|nested] 20+ messages in thread
* [RFC PATCH 3/3] mm/memory-failure: handle min_order_for_split() error code properly
2025-11-20 3:59 [RFC PATCH 0/3] folio->mapping == NULL check issue Zi Yan
2025-11-20 3:59 ` [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order() Zi Yan
2025-11-20 3:59 ` [RFC PATCH 2/3] mm/huge_memory: add kernel-doc for folio_split_supported() Zi Yan
@ 2025-11-20 3:59 ` Zi Yan
2025-11-20 4:45 ` Balbir Singh
2025-11-20 9:37 ` David Hildenbrand (Red Hat)
2 siblings, 2 replies; 20+ messages in thread
From: Zi Yan @ 2025-11-20 3:59 UTC (permalink / raw)
To: David Hildenbrand, Lorenzo Stoakes
Cc: Andrew Morton, Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Miaohe Lin,
Naoya Horiguchi, linux-mm, linux-kernel
min_order_for_split() returns -EBUSY when the folio is truncated and cannot
be split. In commit 77008e1b2ef7 ("mm/huge_memory: do not change
split_huge_page*() target order silently"), memory_failure() does not
handle it and pass -EBUSY to try_to_split_thp_page() directly.
try_to_split_thp_page() returns -EINVAL since -EBUSY becomes 0xfffffff0 as
new_order is unsigned int in __folio_split() and this large new_order is
rejected as an invalid input. The code does not cause a bug.
soft_offline_in_use_page() also uses min_order_for_split() but it always
passes 0 as new_order for split.
Handle it properly by checking min_order_for_split() return value and not
calling try_to_split_thp_page() if the value is negative. Add a comment
in soft_offline_in_use_page() to clarify the possible negative new_order
value.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
mm/memory-failure.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 7f908ad795ad..86582f030159 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2437,8 +2437,11 @@ int memory_failure(unsigned long pfn, int flags)
* or unhandlable page. The refcount is bumped iff the
* page is a valid handlable page.
*/
- folio_set_has_hwpoisoned(folio);
- err = try_to_split_thp_page(p, new_order, /* release= */ false);
+ if (new_order >= 0) {
+ folio_set_has_hwpoisoned(folio);
+ err = try_to_split_thp_page(p, new_order, /* release= */ false);
+ } else
+ err = new_order;
/*
* If splitting a folio to order-0 fails, kill the process.
* Split the folio regardless to minimize unusable pages.
@@ -2779,6 +2782,7 @@ static int soft_offline_in_use_page(struct page *page)
/*
* If new_order (target split order) is not 0, do not split the
* folio at all to retain the still accessible large folio.
+ * new_order can be -EBUSY, meaning the folio cannot be split.
* NOTE: if minimizing the number of soft offline pages is
* preferred, split it to non-zero new_order like it is done in
* memory_failure().
--
2.51.0
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()
2025-11-20 3:59 ` [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order() Zi Yan
@ 2025-11-20 4:28 ` Balbir Singh
2025-11-20 14:45 ` Zi Yan
2025-11-20 9:25 ` David Hildenbrand (Red Hat)
1 sibling, 1 reply; 20+ messages in thread
From: Balbir Singh @ 2025-11-20 4:28 UTC (permalink / raw)
To: Zi Yan, David Hildenbrand, Lorenzo Stoakes
Cc: Andrew Morton, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Miaohe Lin,
Naoya Horiguchi, linux-mm, linux-kernel
On 11/20/25 14:59, Zi Yan wrote:
> folio_split_supported() used in try_folio_split_to_order() requires
> folio->mapping to be non NULL, but current try_folio_split_to_order() does
> not check it. Add the check to prevent NULL pointer dereference.
>
> There is no issue in the current code, since try_folio_split_to_order() is
> only used in truncate_inode_partial_folio(), where folio->mapping is not
> NULL.
>
Just reading through the description does not clarify one thing
What is the race between just truncated and trying to split them - is there a common lock
that needs to be held? Is it the subsequent call in truncate_inode_partial_folio()
that causes the race?
IOW, if a folio is not anonymous and does not have a mapping, how is it
being passed to this function?
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> include/linux/huge_mm.h | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 1d439de1ca2c..0d55354e3a34 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -407,6 +407,13 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
> static inline int try_folio_split_to_order(struct folio *folio,
> struct page *page, unsigned int new_order)
> {
> + /*
> + * Folios that just got truncated cannot get split. Signal to the
> + * caller that there was a race.
> + */
> + if (!folio_test_anon(folio) && !folio->mapping)
> + return -EBUSY;
> +
> if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
> return split_huge_page_to_order(&folio->page, new_order);
> return folio_split(folio, new_order, page, NULL);
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 2/3] mm/huge_memory: add kernel-doc for folio_split_supported()
2025-11-20 3:59 ` [RFC PATCH 2/3] mm/huge_memory: add kernel-doc for folio_split_supported() Zi Yan
@ 2025-11-20 4:37 ` Balbir Singh
2025-11-20 9:27 ` David Hildenbrand (Red Hat)
1 sibling, 0 replies; 20+ messages in thread
From: Balbir Singh @ 2025-11-20 4:37 UTC (permalink / raw)
To: Zi Yan, David Hildenbrand, Lorenzo Stoakes
Cc: Andrew Morton, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Miaohe Lin,
Naoya Horiguchi, linux-mm, linux-kernel
On 11/20/25 14:59, Zi Yan wrote:
> It clarifies that folio_split_supported() does not check folio->mapping and
> can dereference it.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> mm/huge_memory.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index efea42d68157..15e555f1b85d 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3688,6 +3688,23 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
> return 0;
> }
>
> +/**
> + * folio_split_supported() - check if a folio can be split to a given order
> + * @folio: folio to be split
> + * @new_order: the smallest order of the after split folios (since buddy
> + * allocator like split generates folios with orders from @folio's
> + * order - 1 to new_order).
> + * @split_type: uniform or non-uniform split
> + * @warns: whether gives warnings or not for the checks in the function
> + *
> + * folio_split_supported() checks if @folio can be split to @new_order using
> + * @split_type method.
> + *
> + * Context: Caller must make sure folio->mapping is not NULL, since the
> + * function does not check it and can dereference folio->mapping
> + * Return: true - @folio can be split to @new_order, false - @folio cannot be
> + * split
> + */
> bool folio_split_supported(struct folio *folio, unsigned int new_order,
> enum split_type split_type, bool warns)
> {
Looks good!
Reviewed-by: Balbir Singh <balbirs@nvidia.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 3/3] mm/memory-failure: handle min_order_for_split() error code properly
2025-11-20 3:59 ` [RFC PATCH 3/3] mm/memory-failure: handle min_order_for_split() error code properly Zi Yan
@ 2025-11-20 4:45 ` Balbir Singh
2025-11-20 15:00 ` Zi Yan
2025-11-20 9:37 ` David Hildenbrand (Red Hat)
1 sibling, 1 reply; 20+ messages in thread
From: Balbir Singh @ 2025-11-20 4:45 UTC (permalink / raw)
To: Zi Yan, David Hildenbrand, Lorenzo Stoakes
Cc: Andrew Morton, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Miaohe Lin,
Naoya Horiguchi, linux-mm, linux-kernel
On 11/20/25 14:59, Zi Yan wrote:
> min_order_for_split() returns -EBUSY when the folio is truncated and cannot
> be split. In commit 77008e1b2ef7 ("mm/huge_memory: do not change
> split_huge_page*() target order silently"), memory_failure() does not
> handle it and pass -EBUSY to try_to_split_thp_page() directly.
> try_to_split_thp_page() returns -EINVAL since -EBUSY becomes 0xfffffff0 as
> new_order is unsigned int in __folio_split() and this large new_order is
> rejected as an invalid input. The code does not cause a bug.
> soft_offline_in_use_page() also uses min_order_for_split() but it always
> passes 0 as new_order for split.
>
> Handle it properly by checking min_order_for_split() return value and not
> calling try_to_split_thp_page() if the value is negative. Add a comment
> in soft_offline_in_use_page() to clarify the possible negative new_order
> value.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> mm/memory-failure.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 7f908ad795ad..86582f030159 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2437,8 +2437,11 @@ int memory_failure(unsigned long pfn, int flags)
> * or unhandlable page. The refcount is bumped iff the
> * page is a valid handlable page.
> */
> - folio_set_has_hwpoisoned(folio);
> - err = try_to_split_thp_page(p, new_order, /* release= */ false);
> + if (new_order >= 0) {
> + folio_set_has_hwpoisoned(folio);
if new_order < 0, do we skip setting hwpoisioned bit on the folio?
> + err = try_to_split_thp_page(p, new_order, /* release= */ false);
> + } else
> + err = new_order;
> /*
> * If splitting a folio to order-0 fails, kill the process.
> * Split the folio regardless to minimize unusable pages.
> @@ -2779,6 +2782,7 @@ static int soft_offline_in_use_page(struct page *page)
> /*
> * If new_order (target split order) is not 0, do not split the
> * folio at all to retain the still accessible large folio.
> + * new_order can be -EBUSY, meaning the folio cannot be split.
> * NOTE: if minimizing the number of soft offline pages is
> * preferred, split it to non-zero new_order like it is done in
> * memory_failure().
Balbir
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()
2025-11-20 3:59 ` [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order() Zi Yan
2025-11-20 4:28 ` Balbir Singh
@ 2025-11-20 9:25 ` David Hildenbrand (Red Hat)
2025-11-20 14:41 ` Zi Yan
1 sibling, 1 reply; 20+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-20 9:25 UTC (permalink / raw)
To: Zi Yan, Lorenzo Stoakes
Cc: Andrew Morton, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Miaohe Lin,
Naoya Horiguchi, linux-mm, linux-kernel
On 11/20/25 04:59, Zi Yan wrote:
> folio_split_supported() used in try_folio_split_to_order() requires
> folio->mapping to be non NULL, but current try_folio_split_to_order() does
> not check it. Add the check to prevent NULL pointer dereference.
>
> There is no issue in the current code, since try_folio_split_to_order() is
> only used in truncate_inode_partial_folio(), where folio->mapping is not
> NULL.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> include/linux/huge_mm.h | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 1d439de1ca2c..0d55354e3a34 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -407,6 +407,13 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
> static inline int try_folio_split_to_order(struct folio *folio,
> struct page *page, unsigned int new_order)
> {
> + /*
> + * Folios that just got truncated cannot get split. Signal to the
> + * caller that there was a race.
> + */
> + if (!folio_test_anon(folio) && !folio->mapping)
> + return -EBUSY;
> +
> if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
> return split_huge_page_to_order(&folio->page, new_order);
> return folio_split(folio, new_order, page, NULL);
I guess we'll take the one from Wei
https://lkml.kernel.org/r/20251119235302.24773-1-richard.weiyang@gmail.com
right?
--
Cheers
David
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 2/3] mm/huge_memory: add kernel-doc for folio_split_supported()
2025-11-20 3:59 ` [RFC PATCH 2/3] mm/huge_memory: add kernel-doc for folio_split_supported() Zi Yan
2025-11-20 4:37 ` Balbir Singh
@ 2025-11-20 9:27 ` David Hildenbrand (Red Hat)
2025-11-20 14:48 ` Zi Yan
1 sibling, 1 reply; 20+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-20 9:27 UTC (permalink / raw)
To: Zi Yan, Lorenzo Stoakes
Cc: Andrew Morton, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Miaohe Lin,
Naoya Horiguchi, linux-mm, linux-kernel
On 11/20/25 04:59, Zi Yan wrote:
> It clarifies that folio_split_supported() does not check folio->mapping and
> can dereference it.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> mm/huge_memory.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index efea42d68157..15e555f1b85d 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3688,6 +3688,23 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
> return 0;
> }
>
> +/**
> + * folio_split_supported() - check if a folio can be split to a given order
> + * @folio: folio to be split
> + * @new_order: the smallest order of the after split folios (since buddy
> + * allocator like split generates folios with orders from @folio's
> + * order - 1 to new_order).
> + * @split_type: uniform or non-uniform split
> + * @warns: whether gives warnings or not for the checks in the function
> + *
> + * folio_split_supported() checks if @folio can be split to @new_order using
> + * @split_type method.
> + *
> + * Context: Caller must make sure folio->mapping is not NULL, since the
> + * function does not check it and can dereference folio->mapping
Only for anon folios. Also, I would drop the detail about dereference.
I guess we really need the folio lock to prevent concurrent truncation.
Maybe something like:
"The folio must be locked. For non-anon folios, the caller must make
sure that folio->mapping is not NULL (e.g., not truncated)."
--
Cheers
David
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 3/3] mm/memory-failure: handle min_order_for_split() error code properly
2025-11-20 3:59 ` [RFC PATCH 3/3] mm/memory-failure: handle min_order_for_split() error code properly Zi Yan
2025-11-20 4:45 ` Balbir Singh
@ 2025-11-20 9:37 ` David Hildenbrand (Red Hat)
2025-11-20 14:59 ` Zi Yan
1 sibling, 1 reply; 20+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-20 9:37 UTC (permalink / raw)
To: Zi Yan, Lorenzo Stoakes
Cc: Andrew Morton, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Miaohe Lin,
Naoya Horiguchi, linux-mm, linux-kernel
On 11/20/25 04:59, Zi Yan wrote:
> min_order_for_split() returns -EBUSY when the folio is truncated and cannot
> be split. In commit 77008e1b2ef7 ("mm/huge_memory: do not change
> split_huge_page*() target order silently"), memory_failure() does not
> handle it and pass -EBUSY to try_to_split_thp_page() directly.
> try_to_split_thp_page() returns -EINVAL since -EBUSY becomes 0xfffffff0 as
I'm wondering whether we should change min_order_for_split() to something like:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7c69572b6c3f5..34eb6fec9a059 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4210,16 +4210,19 @@ int folio_split(struct folio *folio, unsigned int new_order,
SPLIT_TYPE_NON_UNIFORM);
}
-int min_order_for_split(struct folio *folio)
+unsigned int min_order_for_split(struct folio *folio)
{
if (folio_test_anon(folio))
return 0;
+ /*
+ * If the folio got truncated, we don't know the previous mapping and
+ * consequently the old min order. But it doesn't matter, as any split
+ * attempt will immediately fail with -EBUSY as the folio cannot get
+ * split until freed.
+ */
if (!folio->mapping) {
- if (folio_test_pmd_mappable(folio))
- count_vm_event(THP_SPLIT_PAGE_FAILED);
- return -EBUSY;
- }
+ return 0;
return mapping_min_folio_order(folio->mapping);
}
--
Cheers
David
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()
2025-11-20 9:25 ` David Hildenbrand (Red Hat)
@ 2025-11-20 14:41 ` Zi Yan
2025-11-20 19:56 ` David Hildenbrand (Red Hat)
0 siblings, 1 reply; 20+ messages in thread
From: Zi Yan @ 2025-11-20 14:41 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: Lorenzo Stoakes, Andrew Morton, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Miaohe Lin, Naoya Horiguchi, linux-mm, linux-kernel
On 20 Nov 2025, at 4:25, David Hildenbrand (Red Hat) wrote:
> On 11/20/25 04:59, Zi Yan wrote:
>> folio_split_supported() used in try_folio_split_to_order() requires
>> folio->mapping to be non NULL, but current try_folio_split_to_order() does
>> not check it. Add the check to prevent NULL pointer dereference.
>>
>> There is no issue in the current code, since try_folio_split_to_order() is
>> only used in truncate_inode_partial_folio(), where folio->mapping is not
>> NULL.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>> include/linux/huge_mm.h | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 1d439de1ca2c..0d55354e3a34 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -407,6 +407,13 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
>> static inline int try_folio_split_to_order(struct folio *folio,
>> struct page *page, unsigned int new_order)
>> {
>> + /*
>> + * Folios that just got truncated cannot get split. Signal to the
>> + * caller that there was a race.
>> + */
>> + if (!folio_test_anon(folio) && !folio->mapping)
>> + return -EBUSY;
>> +
>> if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
>> return split_huge_page_to_order(&folio->page, new_order);
>> return folio_split(folio, new_order, page, NULL);
>
> I guess we'll take the one from Wei
>
> https://lkml.kernel.org/r/20251119235302.24773-1-richard.weiyang@gmail.com
>
> right?
This is different. Wei’s fix is to __folio_split(), but mine is to
try_folio_split_to_order(). Both call folio_split_supported(), thus
both need the folio->mapping check.
That is also my question in the cover letter on whether we should
move folio->mapping check to folio_split_supported() and return
error code instead of bool. Otherwise, any folio_split_supported()
caller needs to check folio->mapping.
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()
2025-11-20 4:28 ` Balbir Singh
@ 2025-11-20 14:45 ` Zi Yan
0 siblings, 0 replies; 20+ messages in thread
From: Zi Yan @ 2025-11-20 14:45 UTC (permalink / raw)
To: Balbir Singh
Cc: David Hildenbrand, Lorenzo Stoakes, Andrew Morton, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Miaohe Lin, Naoya Horiguchi, linux-mm, linux-kernel
On 19 Nov 2025, at 23:28, Balbir Singh wrote:
> On 11/20/25 14:59, Zi Yan wrote:
>> folio_split_supported() used in try_folio_split_to_order() requires
>> folio->mapping to be non NULL, but current try_folio_split_to_order() does
>> not check it. Add the check to prevent NULL pointer dereference.
>>
>> There is no issue in the current code, since try_folio_split_to_order() is
>> only used in truncate_inode_partial_folio(), where folio->mapping is not
>> NULL.
>>
>
> Just reading through the description does not clarify one thing
>
> What is the race between just truncated and trying to split them - is there a common lock
> that needs to be held? Is it the subsequent call in truncate_inode_partial_folio()
> that causes the race?
>
> IOW, if a folio is not anonymous and does not have a mapping, how is it
> being passed to this function?
Quote David’s explanation[1] (note: shmem is not anonymous):
vmscan triggers shmem_writeout() after unmapping the folio and after making
sure that there are no unexpected folio references.
shmem_writeout() will do the shmem_delete_from_page_cache() where we set
folio->mapping = NULL.
So anything walking the page tables (like s390x) could never find it.
Such shmem folios really cannot get split right now until we either
reclaimed them (-> freed) or until shmem_swapin_folio() re-obtained them
from the swapcache to re-add them to the swapcache through
shmem_add_to_page_cache().
[1] https://lore.kernel.org/all/14253d62-0a85-4f61-aed6-72da17bcef77@kernel.org/
>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>> include/linux/huge_mm.h | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 1d439de1ca2c..0d55354e3a34 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -407,6 +407,13 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
>> static inline int try_folio_split_to_order(struct folio *folio,
>> struct page *page, unsigned int new_order)
>> {
>> + /*
>> + * Folios that just got truncated cannot get split. Signal to the
>> + * caller that there was a race.
>> + */
>> + if (!folio_test_anon(folio) && !folio->mapping)
>> + return -EBUSY;
>> +
>> if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
>> return split_huge_page_to_order(&folio->page, new_order);
>> return folio_split(folio, new_order, page, NULL);
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 2/3] mm/huge_memory: add kernel-doc for folio_split_supported()
2025-11-20 9:27 ` David Hildenbrand (Red Hat)
@ 2025-11-20 14:48 ` Zi Yan
2025-11-20 20:01 ` David Hildenbrand (Red Hat)
0 siblings, 1 reply; 20+ messages in thread
From: Zi Yan @ 2025-11-20 14:48 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: Lorenzo Stoakes, Andrew Morton, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Miaohe Lin, Naoya Horiguchi, linux-mm, linux-kernel
On 20 Nov 2025, at 4:27, David Hildenbrand (Red Hat) wrote:
> On 11/20/25 04:59, Zi Yan wrote:
>> It clarifies that folio_split_supported() does not check folio->mapping and
>> can dereference it.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>> mm/huge_memory.c | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index efea42d68157..15e555f1b85d 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3688,6 +3688,23 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>> return 0;
>> }
>> +/**
>> + * folio_split_supported() - check if a folio can be split to a given order
>> + * @folio: folio to be split
>> + * @new_order: the smallest order of the after split folios (since buddy
>> + * allocator like split generates folios with orders from @folio's
>> + * order - 1 to new_order).
>> + * @split_type: uniform or non-uniform split
>> + * @warns: whether gives warnings or not for the checks in the function
>> + *
>> + * folio_split_supported() checks if @folio can be split to @new_order using
>> + * @split_type method.
>> + *
>> + * Context: Caller must make sure folio->mapping is not NULL, since the
>> + * function does not check it and can dereference folio->mapping
>
> Only for anon folios. Also, I would drop the detail about dereference.
OK.
>
> I guess we really need the folio lock to prevent concurrent truncation.
>
> Maybe something like:
>
> "The folio must be locked. For non-anon folios, the caller must make sure that folio->mapping is not NULL (e.g., not truncated)."
Sure. Do you think it is worth adding VM_WARN_ONCE_ON(!folio_test_locked);
and VM_WARN_ONCE_ON(!folio->mapping); ?
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 3/3] mm/memory-failure: handle min_order_for_split() error code properly
2025-11-20 9:37 ` David Hildenbrand (Red Hat)
@ 2025-11-20 14:59 ` Zi Yan
0 siblings, 0 replies; 20+ messages in thread
From: Zi Yan @ 2025-11-20 14:59 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: Lorenzo Stoakes, Andrew Morton, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Miaohe Lin, Naoya Horiguchi, linux-mm, linux-kernel
On 20 Nov 2025, at 4:37, David Hildenbrand (Red Hat) wrote:
> On 11/20/25 04:59, Zi Yan wrote:
>> min_order_for_split() returns -EBUSY when the folio is truncated and cannot
>> be split. In commit 77008e1b2ef7 ("mm/huge_memory: do not change
>> split_huge_page*() target order silently"), memory_failure() does not
>> handle it and pass -EBUSY to try_to_split_thp_page() directly.
>> try_to_split_thp_page() returns -EINVAL since -EBUSY becomes 0xfffffff0 as
>
> I'm wondering whether we should change min_order_for_split() to something like:
>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 7c69572b6c3f5..34eb6fec9a059 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4210,16 +4210,19 @@ int folio_split(struct folio *folio, unsigned int new_order,
> SPLIT_TYPE_NON_UNIFORM);
> }
> -int min_order_for_split(struct folio *folio)
> +unsigned int min_order_for_split(struct folio *folio)
> {
> if (folio_test_anon(folio))
> return 0;
> + /*
> + * If the folio got truncated, we don't know the previous mapping and
> + * consequently the old min order. But it doesn't matter, as any split
> + * attempt will immediately fail with -EBUSY as the folio cannot get
> + * split until freed.
> + */
> if (!folio->mapping) {
> - if (folio_test_pmd_mappable(folio))
> - count_vm_event(THP_SPLIT_PAGE_FAILED);
> - return -EBUSY;
> - }
> + return 0;
> return mapping_min_folio_order(folio->mapping);
> }
I thought about it. My concern was that what if the returned order is not
immediately used for split, maybe for some calculation. I might think too much.
Your approach is much simpler.
I am also going to add a kernel-doc and change the return type to unsigned int:
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 0d55354e3a34..e0731e01df27 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -373,7 +373,7 @@ bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
unsigned int new_order);
int folio_split_unmapped(struct folio *folio, unsigned int new_order);
-int min_order_for_split(struct folio *folio);
+unsigned int min_order_for_split(struct folio *folio);
int split_folio_to_list(struct folio *folio, struct list_head *list);
bool folio_split_supported(struct folio *folio, unsigned int new_order,
enum split_type split_type, bool warns);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 23239c19b36e..f45560b53210 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4238,7 +4238,18 @@ int folio_split(struct folio *folio, unsigned int new_order,
SPLIT_TYPE_NON_UNIFORM);
}
-int min_order_for_split(struct folio *folio)
+/**
+ * min_order_for_split() - get the minimum order @folio can be split to
+ * @folio: folio to split
+ *
+ * min_order_for_split() tells the minimum order @folio can be split to.
+ * Anonymous folios can be split to order 0, file-backed folios might have
+ * limitations at file system level. If the folio is truncated, 0 will be
+ * returned and any split attempt will get -EBUSY.
+ *
+ * Return: @folio's minimum order
+ */
+unsigned int min_order_for_split(struct folio *folio)
{
if (folio_test_anon(folio))
return 0;
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 3/3] mm/memory-failure: handle min_order_for_split() error code properly
2025-11-20 4:45 ` Balbir Singh
@ 2025-11-20 15:00 ` Zi Yan
0 siblings, 0 replies; 20+ messages in thread
From: Zi Yan @ 2025-11-20 15:00 UTC (permalink / raw)
To: Balbir Singh
Cc: David Hildenbrand, Lorenzo Stoakes, Andrew Morton, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Miaohe Lin, Naoya Horiguchi, linux-mm, linux-kernel
On 19 Nov 2025, at 23:45, Balbir Singh wrote:
> On 11/20/25 14:59, Zi Yan wrote:
>> min_order_for_split() returns -EBUSY when the folio is truncated and cannot
>> be split. In commit 77008e1b2ef7 ("mm/huge_memory: do not change
>> split_huge_page*() target order silently"), memory_failure() does not
>> handle it and pass -EBUSY to try_to_split_thp_page() directly.
>> try_to_split_thp_page() returns -EINVAL since -EBUSY becomes 0xfffffff0 as
>> new_order is unsigned int in __folio_split() and this large new_order is
>> rejected as an invalid input. The code does not cause a bug.
>> soft_offline_in_use_page() also uses min_order_for_split() but it always
>> passes 0 as new_order for split.
>>
>> Handle it properly by checking min_order_for_split() return value and not
>> calling try_to_split_thp_page() if the value is negative. Add a comment
>> in soft_offline_in_use_page() to clarify the possible negative new_order
>> value.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>> mm/memory-failure.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 7f908ad795ad..86582f030159 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2437,8 +2437,11 @@ int memory_failure(unsigned long pfn, int flags)
>> * or unhandlable page. The refcount is bumped iff the
>> * page is a valid handlable page.
>> */
>> - folio_set_has_hwpoisoned(folio);
>> - err = try_to_split_thp_page(p, new_order, /* release= */ false);
>> + if (new_order >= 0) {
>> + folio_set_has_hwpoisoned(folio);
>
> if new_order < 0, do we skip setting hwpoisioned bit on the folio?
The bit should be set. Anyway, I am going to take David’s approach to
change min_order_for_split().
Thanks.
>
>> + err = try_to_split_thp_page(p, new_order, /* release= */ false);
>> + } else
>> + err = new_order;
>> /*
>> * If splitting a folio to order-0 fails, kill the process.
>> * Split the folio regardless to minimize unusable pages.
>> @@ -2779,6 +2782,7 @@ static int soft_offline_in_use_page(struct page *page)
>> /*
>> * If new_order (target split order) is not 0, do not split the
>> * folio at all to retain the still accessible large folio.
>> + * new_order can be -EBUSY, meaning the folio cannot be split.
>> * NOTE: if minimizing the number of soft offline pages is
>> * preferred, split it to non-zero new_order like it is done in
>> * memory_failure().
>
> Balbir
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()
2025-11-20 14:41 ` Zi Yan
@ 2025-11-20 19:56 ` David Hildenbrand (Red Hat)
2025-11-21 16:41 ` Zi Yan
0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-20 19:56 UTC (permalink / raw)
To: Zi Yan
Cc: Lorenzo Stoakes, Andrew Morton, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Miaohe Lin, Naoya Horiguchi, linux-mm, linux-kernel
On 11/20/25 15:41, Zi Yan wrote:
> On 20 Nov 2025, at 4:25, David Hildenbrand (Red Hat) wrote:
>
>> On 11/20/25 04:59, Zi Yan wrote:
>>> folio_split_supported() used in try_folio_split_to_order() requires
>>> folio->mapping to be non NULL, but current try_folio_split_to_order() does
>>> not check it. Add the check to prevent NULL pointer dereference.
>>>
>>> There is no issue in the current code, since try_folio_split_to_order() is
>>> only used in truncate_inode_partial_folio(), where folio->mapping is not
>>> NULL.
>>>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> ---
>>> include/linux/huge_mm.h | 7 +++++++
>>> 1 file changed, 7 insertions(+)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 1d439de1ca2c..0d55354e3a34 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -407,6 +407,13 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
>>> static inline int try_folio_split_to_order(struct folio *folio,
>>> struct page *page, unsigned int new_order)
>>> {
>>> + /*
>>> + * Folios that just got truncated cannot get split. Signal to the
>>> + * caller that there was a race.
>>> + */
>>> + if (!folio_test_anon(folio) && !folio->mapping)
>>> + return -EBUSY;
>>> +
>>> if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
>>> return split_huge_page_to_order(&folio->page, new_order);
>>> return folio_split(folio, new_order, page, NULL);
>>
>> I guess we'll take the one from Wei
>>
>> https://lkml.kernel.org/r/20251119235302.24773-1-richard.weiyang@gmail.com
>>
>> right?
>
> This is different. Wei’s fix is to __folio_split(), but mine is to
> try_folio_split_to_order(). Both call folio_split_supported(), thus
> both need the folio->mapping check.
Ah, good that I double-checked :)
>
> That is also my question in the cover letter on whether we should
> move folio->mapping check to folio_split_supported() and return
> error code instead of bool. Otherwise, any folio_split_supported()
> caller needs to check folio->mapping.
I think the situation with truncation (-that shmem swapcache thing,
let's ignore that for now) is that the folio cannot be split until fully
freed. But we don't want to return -EINVAL to the caller, the assumption
is that the folio will soon get resolved -- folio freed -- and the
caller will be able to make progress. So it's not really expected to be
persistent.
-EINVAL rather signals "this cannot possibly work, so fail whatever you
are trying".
We rather want to indicate "there was some race situation, if you try
again later it might work or might have resolved itself".
Not sure I like returning an error from folio_split_supported(), as it's
rather a boolean check (supported vs. not supported).
Likely we could just return "false" for truncated folios in
folio_split_supported(), but then state that that case must be handled
upfront.
We could provide another helper to wrap the truncation check, hmmm
BTW, I wonder if the is_huge_zero_folio() check should go into
folio_split_supported() and just return in -EINVAL. (we shouldn't really
trigger that). Similarly we could add a hugetlb sanity check.
--
Cheers
David
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 2/3] mm/huge_memory: add kernel-doc for folio_split_supported()
2025-11-20 14:48 ` Zi Yan
@ 2025-11-20 20:01 ` David Hildenbrand (Red Hat)
0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-20 20:01 UTC (permalink / raw)
To: Zi Yan
Cc: Lorenzo Stoakes, Andrew Morton, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Miaohe Lin, Naoya Horiguchi, linux-mm, linux-kernel
On 11/20/25 15:48, Zi Yan wrote:
> On 20 Nov 2025, at 4:27, David Hildenbrand (Red Hat) wrote:
>
>> On 11/20/25 04:59, Zi Yan wrote:
>>> It clarifies that folio_split_supported() does not check folio->mapping and
>>> can dereference it.
>>>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> ---
>>> mm/huge_memory.c | 17 +++++++++++++++++
>>> 1 file changed, 17 insertions(+)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index efea42d68157..15e555f1b85d 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -3688,6 +3688,23 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>>> return 0;
>>> }
>>> +/**
>>> + * folio_split_supported() - check if a folio can be split to a given order
>>> + * @folio: folio to be split
>>> + * @new_order: the smallest order of the after split folios (since buddy
>>> + * allocator like split generates folios with orders from @folio's
>>> + * order - 1 to new_order).
>>> + * @split_type: uniform or non-uniform split
>>> + * @warns: whether gives warnings or not for the checks in the function
>>> + *
>>> + * folio_split_supported() checks if @folio can be split to @new_order using
>>> + * @split_type method.
>>> + *
>>> + * Context: Caller must make sure folio->mapping is not NULL, since the
>>> + * function does not check it and can dereference folio->mapping
>>
>> Only for anon folios. Also, I would drop the detail about dereference.
>
> OK.
>
>>
>> I guess we really need the folio lock to prevent concurrent truncation.
>>
>> Maybe something like:
>>
>> "The folio must be locked. For non-anon folios, the caller must make sure that folio->mapping is not NULL (e.g., not truncated)."
>
> Sure. Do you think it is worth adding VM_WARN_ONCE_ON(!folio_test_locked);
> and VM_WARN_ONCE_ON(!folio->mapping); ?
Makes sense. Or we allow !folio->mapping, return false and do something
like the following. Still wondering how we could handle that case better.
if (!folio_split_supported(folio)) {
if (folio_split_temporarily_unsupported(folio))
return -EBUSY;
return -EINVAL;
}
hmmmm
--
Cheers
David
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()
2025-11-20 19:56 ` David Hildenbrand (Red Hat)
@ 2025-11-21 16:41 ` Zi Yan
2025-11-21 17:09 ` David Hildenbrand (Red Hat)
0 siblings, 1 reply; 20+ messages in thread
From: Zi Yan @ 2025-11-21 16:41 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: Lorenzo Stoakes, Andrew Morton, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Miaohe Lin, Naoya Horiguchi, linux-mm, linux-kernel
On 20 Nov 2025, at 14:56, David Hildenbrand (Red Hat) wrote:
> On 11/20/25 15:41, Zi Yan wrote:
>> On 20 Nov 2025, at 4:25, David Hildenbrand (Red Hat) wrote:
>>
>>> On 11/20/25 04:59, Zi Yan wrote:
>>>> folio_split_supported() used in try_folio_split_to_order() requires
>>>> folio->mapping to be non NULL, but current try_folio_split_to_order() does
>>>> not check it. Add the check to prevent NULL pointer dereference.
>>>>
>>>> There is no issue in the current code, since try_folio_split_to_order() is
>>>> only used in truncate_inode_partial_folio(), where folio->mapping is not
>>>> NULL.
>>>>
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>> ---
>>>> include/linux/huge_mm.h | 7 +++++++
>>>> 1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>> index 1d439de1ca2c..0d55354e3a34 100644
>>>> --- a/include/linux/huge_mm.h
>>>> +++ b/include/linux/huge_mm.h
>>>> @@ -407,6 +407,13 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
>>>> static inline int try_folio_split_to_order(struct folio *folio,
>>>> struct page *page, unsigned int new_order)
>>>> {
>>>> + /*
>>>> + * Folios that just got truncated cannot get split. Signal to the
>>>> + * caller that there was a race.
>>>> + */
>>>> + if (!folio_test_anon(folio) && !folio->mapping)
>>>> + return -EBUSY;
>>>> +
>>>> if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
>>>> return split_huge_page_to_order(&folio->page, new_order);
>>>> return folio_split(folio, new_order, page, NULL);
>>>
>>> I guess we'll take the one from Wei
>>>
>>> https://lkml.kernel.org/r/20251119235302.24773-1-richard.weiyang@gmail.com
>>>
>>> right?
>>
>> This is different. Wei’s fix is to __folio_split(), but mine is to
>> try_folio_split_to_order(). Both call folio_split_supported(), thus
>> both need the folio->mapping check.
>
> Ah, good that I double-checked :)
>
>>
>> That is also my question in the cover letter on whether we should
>> move folio->mapping check to folio_split_supported() and return
>> error code instead of bool. Otherwise, any folio_split_supported()
>> caller needs to check folio->mapping.
>
> I think the situation with truncation (-that shmem swapcache thing, let's ignore that for now) is that the folio cannot be split until fully freed. But we don't want to return -EINVAL to the caller, the assumption is that the folio will soon get resolved -- folio freed -- and the caller will be able to make progress. So it's not really expected to be persistent.
>
> -EINVAL rather signals "this cannot possibly work, so fail whatever you are trying".
>
> We rather want to indicate "there was some race situation, if you try again later it might work or might have resolved itself".
>
> Not sure I like returning an error from folio_split_supported(), as it's rather a boolean check (supported vs. not supported).
Right. My current idea (from the cover letter) is to rename it to
folio_split_can_split (or folio_split_check, so that it does not sound like
a bool return is needed).
>
> Likely we could just return "false" for truncated folios in folio_split_supported(), but then state that that case must be handled upfront.
>
> We could provide another helper to wrap the truncation check, hmmm
Yeah, that sounds complicated too. Putting this truncated folio check outside
of folio_split_supported() looks really error prone and anonying.
>
>
> BTW, I wonder if the is_huge_zero_folio() check should go into folio_split_supported() and just return in -EINVAL. (we shouldn't really trigger that). Similarly we could add a hugetlb sanity check.
Yeah, is_huge_zero_folio() should return -EINVAL not -EBUSY, except
the case the split happens before a process writes 0 to a zero large folio
and gets a new writable large folio, in which we can kinda say it looks like
-EBUSY. But it is still a stretch.
Ack on adding hugetlb sanity check.
OK, just to reiterate my above idea on renaming folio_split_supported().
Are you OK with renaming it to folio_split_check(), so that returning -EBUSY
and -EINVAL looks more reasonable? The benefit is that we no longer need
to worry about we need to always do folio->mapping check before
folio_split_supported(). (In addition, I would rename can_split_folio()
to folio_split_refcount_check() for clarification)
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()
2025-11-21 16:41 ` Zi Yan
@ 2025-11-21 17:09 ` David Hildenbrand (Red Hat)
2025-11-21 17:24 ` Zi Yan
0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-21 17:09 UTC (permalink / raw)
To: Zi Yan
Cc: Lorenzo Stoakes, Andrew Morton, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Miaohe Lin, Naoya Horiguchi, linux-mm, linux-kernel
>>
>> BTW, I wonder if the is_huge_zero_folio() check should go into folio_split_supported() and just return in -EINVAL. (we shouldn't really trigger that). Similarly we could add a hugetlb sanity check.
>
> Yeah, is_huge_zero_folio() should return -EINVAL not -EBUSY, except
> the case the split happens before a process writes 0 to a zero large folio
> and gets a new writable large folio, in which we can kinda say it looks like
> -EBUSY. But it is still a stretch.
I see what you mean, but I think this has less to do with actual races.
SO yeah, -EINVAL is likely the tight thing.
>
> Ack on adding hugetlb sanity check.
>
> OK, just to reiterate my above idea on renaming folio_split_supported().
> Are you OK with renaming it to folio_split_check(), so that returning -EBUSY
> and -EINVAL looks more reasonable? The benefit is that we no longer need
> to worry about we need to always do folio->mapping check before
> folio_split_supported(). (In addition, I would rename can_split_folio()
> to folio_split_refcount_check() for clarification)
I guess having some function that tells you "I performed all checks I
could without taking locks/references (like anon_vma) and starting with
the real magic" is what you have in mind.
For these we don't have to prefix with "folio_split" if it sounds weird.
folio_check_splittable() ?
Regarding can_split_folio(), I was wondering whether we can just get rid
of it and use folio_expect_ref_count() instead?
For the two callers that need extra_pins, we could just have something
simple helper in huge_memory.c like
/* Number of folio references from the pagecache or the swapcache. */
unsigned int folio_cache_references(const struct folio *folio)
{
if (folio_test_anon(folio) && !folio_test_swapcache(folio))
return 0;
return folio_nr_pages(folio);
}
--
Cheers
David
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()
2025-11-21 17:09 ` David Hildenbrand (Red Hat)
@ 2025-11-21 17:24 ` Zi Yan
0 siblings, 0 replies; 20+ messages in thread
From: Zi Yan @ 2025-11-21 17:24 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: Lorenzo Stoakes, Andrew Morton, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Miaohe Lin, Naoya Horiguchi, linux-mm, linux-kernel
On 21 Nov 2025, at 12:09, David Hildenbrand (Red Hat) wrote:
>>>
>>> BTW, I wonder if the is_huge_zero_folio() check should go into folio_split_supported() and just return in -EINVAL. (we shouldn't really trigger that). Similarly we could add a hugetlb sanity check.
>>
>> Yeah, is_huge_zero_folio() should return -EINVAL not -EBUSY, except
>> the case the split happens before a process writes 0 to a zero large folio
>> and gets a new writable large folio, in which we can kinda say it looks like
>> -EBUSY. But it is still a stretch.
>
> I see what you mean, but I think this has less to do with actual races. SO yeah, -EINVAL is likely the tight thing.
>
Sure. Will move it and use -EINVAL.
>>
>> Ack on adding hugetlb sanity check.
>>
>> OK, just to reiterate my above idea on renaming folio_split_supported().
>> Are you OK with renaming it to folio_split_check(), so that returning -EBUSY
>> and -EINVAL looks more reasonable? The benefit is that we no longer need
>> to worry about we need to always do folio->mapping check before
>> folio_split_supported(). (In addition, I would rename can_split_folio()
>> to folio_split_refcount_check() for clarification)
> I guess having some function that tells you "I performed all checks I could without taking locks/references (like anon_vma) and starting with the real magic" is what you have in mind.
Yes.
>
> For these we don't have to prefix with "folio_split" if it sounds weird.
>
> folio_check_splittable() ?
Sounds good to me.
>
> Regarding can_split_folio(), I was wondering whether we can just get rid of it and use folio_expect_ref_count() instead?
>
> For the two callers that need extra_pins, we could just have something simple helper in huge_memory.c like
>
> /* Number of folio references from the pagecache or the swapcache. */
> unsigned int folio_cache_references(const struct folio *folio)
> {
> if (folio_test_anon(folio) && !folio_test_swapcache(folio))
> return 0;
> return folio_nr_pages(folio);
> }
OK, I will give this a try in a separate patch in an updated version of this
series.
Thank you for the feedback.
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-11-21 17:25 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-20 3:59 [RFC PATCH 0/3] folio->mapping == NULL check issue Zi Yan
2025-11-20 3:59 ` [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order() Zi Yan
2025-11-20 4:28 ` Balbir Singh
2025-11-20 14:45 ` Zi Yan
2025-11-20 9:25 ` David Hildenbrand (Red Hat)
2025-11-20 14:41 ` Zi Yan
2025-11-20 19:56 ` David Hildenbrand (Red Hat)
2025-11-21 16:41 ` Zi Yan
2025-11-21 17:09 ` David Hildenbrand (Red Hat)
2025-11-21 17:24 ` Zi Yan
2025-11-20 3:59 ` [RFC PATCH 2/3] mm/huge_memory: add kernel-doc for folio_split_supported() Zi Yan
2025-11-20 4:37 ` Balbir Singh
2025-11-20 9:27 ` David Hildenbrand (Red Hat)
2025-11-20 14:48 ` Zi Yan
2025-11-20 20:01 ` David Hildenbrand (Red Hat)
2025-11-20 3:59 ` [RFC PATCH 3/3] mm/memory-failure: handle min_order_for_split() error code properly Zi Yan
2025-11-20 4:45 ` Balbir Singh
2025-11-20 15:00 ` Zi Yan
2025-11-20 9:37 ` David Hildenbrand (Red Hat)
2025-11-20 14:59 ` Zi Yan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox