* [PATCH 0/2] Do not change split folio target order
@ 2025-10-10 17:39 Zi Yan
2025-10-10 17:39 ` [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently Zi Yan
2025-10-10 17:39 ` [PATCH 2/2] mm/memory-failure: improve large block size folio handling Zi Yan
0 siblings, 2 replies; 22+ messages in thread
From: Zi Yan @ 2025-10-10 17:39 UTC (permalink / raw)
To: linmiaohe, david, jane.chu, kernel, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs
Cc: ziy, akpm, mcgrof, nao.horiguchi, Lorenzo Stoakes, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
Hi all,
Currently, the huge page and large folio split APIs silently bump the
target order when the folio has min_order_for_split() > 0 and report
success if the split succeeds. Callers that expect order-0 folios after
a successful split then receive higher-order folios they might not be
able to handle, even though they called the split APIs precisely to get
order-0 folios. This issue appears in a recent report on
memory_failure()[1], where memory_failure() used split_huge_page() to
split a large folio to order-0 but, after a successful split, got non
order-0 folios. Because memory_failure() can only handle order-0 folios,
this caused a WARNING.
Fix the issue by not changing the split target order and by failing the
split if min_order_for_split() is greater than the target order.
In addition, to avoid wasting memory in memory failure handling, a second
patch is added to always split a large folio to min_order_for_split(),
even if it is not 0, so that folios not containing the poisoned page can
be freed for reuse. For soft offline, since the folio is still accessible,
do not split it if min_order_for_split() is not zero, to avoid a potential
performance loss.
[1] https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
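[A minimal sketch of the caller-side pattern the two patches establish,
based on the description above. This is an assumption for illustration,
not code from the series, and handle_poisoned_large_folio() is a
hypothetical helper.]

static int handle_poisoned_large_folio(struct page *page, bool soft_offline)
{
	struct folio *folio = page_folio(page);
	int min_order = min_order_for_split(folio);
	int ret;

	if (min_order < 0)
		return min_order;

	/* soft offline keeps the still-accessible large folio intact */
	if (soft_offline && min_order)
		return -EBUSY;

	/* memory failure splits anyway, to min_order if order-0 is impossible */
	lock_page(page);
	ret = split_huge_page_to_list_to_order(page, NULL, min_order);
	unlock_page(page);

	if (ret)
		return ret;

	/* a non-zero min_order means the poisoned folio is still large */
	return min_order ? -EHWPOISON : 0;
}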
Zi Yan (2):
mm/huge_memory: do not change split_huge_page*() target order
silently.
mm/memory-failure: improve large block size folio handling.
include/linux/huge_mm.h | 28 +++++-----------------------
mm/huge_memory.c | 9 +--------
mm/memory-failure.c | 25 +++++++++++++++++++++----
mm/truncate.c | 6 ++++--
4 files changed, 31 insertions(+), 37 deletions(-)
--
2.51.0
* [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
2025-10-10 17:39 [PATCH 0/2] Do not change split folio target order Zi Yan
@ 2025-10-10 17:39 ` Zi Yan
2025-10-10 18:02 ` Luis Chamberlain
` (5 more replies)
2025-10-10 17:39 ` [PATCH 2/2] mm/memory-failure: improve large block size folio handling Zi Yan
1 sibling, 6 replies; 22+ messages in thread
From: Zi Yan @ 2025-10-10 17:39 UTC (permalink / raw)
To: linmiaohe, david, jane.chu, kernel, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs
Cc: ziy, akpm, mcgrof, nao.horiguchi, Lorenzo Stoakes, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
Page cache folios from a file system that supports large block size (LBS)
can have a minimum folio order greater than 0, thus a high order folio
might not be able to be split down to order-0. Commit e220917fa507 ("mm:
split a folio in minimum folio order chunks") bumps the target order of
split_huge_page*() to the minimum allowed order when splitting an LBS
folio. This causes confusion for some split_huge_page*() callers like the
memory failure handling code, since they expect all after-split folios to
have order-0 when the split succeeds but actually get folios of
min_order_for_split() order.
Fix it by failing a split if the folio cannot be split to the target order.
Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
[The test poisons LBS folios, which cannot be split to order-0 folios, and
also tries to poison all memory. The non-split LBS folios take more memory
than the test anticipated, leading to OOM. This patch fixes the kernel
warning; the test needs some changes to avoid the OOM.]
Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
include/linux/huge_mm.h | 28 +++++-----------------------
mm/huge_memory.c | 9 +--------
mm/truncate.c | 6 ++++--
3 files changed, 10 insertions(+), 33 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 8eec7a2a977b..9950cda1526a 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -394,34 +394,16 @@ static inline int split_huge_page_to_list_to_order(struct page *page, struct lis
* Return: 0: split is successful, otherwise split failed.
*/
static inline int try_folio_split(struct folio *folio, struct page *page,
- struct list_head *list)
+ struct list_head *list, unsigned int order)
{
- int ret = min_order_for_split(folio);
-
- if (ret < 0)
- return ret;
-
- if (!non_uniform_split_supported(folio, 0, false))
+ if (!non_uniform_split_supported(folio, order, false))
return split_huge_page_to_list_to_order(&folio->page, list,
- ret);
- return folio_split(folio, ret, page, list);
+ order);
+ return folio_split(folio, order, page, list);
}
static inline int split_huge_page(struct page *page)
{
- struct folio *folio = page_folio(page);
- int ret = min_order_for_split(folio);
-
- if (ret < 0)
- return ret;
-
- /*
- * split_huge_page() locks the page before splitting and
- * expects the same page that has been split to be locked when
- * returned. split_folio(page_folio(page)) cannot be used here
- * because it converts the page to folio and passes the head
- * page to be split.
- */
- return split_huge_page_to_list_to_order(page, NULL, ret);
+ return split_huge_page_to_list_to_order(page, NULL, 0);
}
void deferred_split_folio(struct folio *folio, bool partially_mapped);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0fb4af604657..af06ee6d2206 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3829,8 +3829,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
min_order = mapping_min_folio_order(folio->mapping);
if (new_order < min_order) {
- VM_WARN_ONCE(1, "Cannot split mapped folio below min-order: %u",
- min_order);
ret = -EINVAL;
goto out;
}
@@ -4173,12 +4171,7 @@ int min_order_for_split(struct folio *folio)
int split_folio_to_list(struct folio *folio, struct list_head *list)
{
- int ret = min_order_for_split(folio);
-
- if (ret < 0)
- return ret;
-
- return split_huge_page_to_list_to_order(&folio->page, list, ret);
+ return split_huge_page_to_list_to_order(&folio->page, list, 0);
}
/*
diff --git a/mm/truncate.c b/mm/truncate.c
index 91eb92a5ce4f..1c15149ae8e9 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -194,6 +194,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
size_t size = folio_size(folio);
unsigned int offset, length;
struct page *split_at, *split_at2;
+ unsigned int min_order;
if (pos < start)
offset = start - pos;
@@ -223,8 +224,9 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
if (!folio_test_large(folio))
return true;
+ min_order = mapping_min_folio_order(folio->mapping);
split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
- if (!try_folio_split(folio, split_at, NULL)) {
+ if (!try_folio_split(folio, split_at, NULL, min_order)) {
/*
* try to split at offset + length to make sure folios within
* the range can be dropped, especially to avoid memory waste
@@ -254,7 +256,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
*/
if (folio_test_large(folio2) &&
folio2->mapping == folio->mapping)
- try_folio_split(folio2, split_at2, NULL);
+ try_folio_split(folio2, split_at2, NULL, min_order);
folio_unlock(folio2);
out:
--
2.51.0
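[A minimal sketch of what the new contract means for a caller that needs
order-0 folios, assuming split_huge_page() now returns an error (e.g.
-EINVAL) for an LBS folio instead of silently splitting to a larger order;
handle_order0_only() is a hypothetical caller, not code from this patch.]

static int handle_order0_only(struct page *page)
{
	int ret;

	lock_page(page);
	ret = split_huge_page(page);	/* no silent min-order bump any more */
	unlock_page(page);

	if (ret)
		return ret;	/* could not get order-0 folios, give up */

	/* every after-split folio is order-0 here */
	return 0;
}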
* [PATCH 2/2] mm/memory-failure: improve large block size folio handling.
2025-10-10 17:39 [PATCH 0/2] Do not change split folio target order Zi Yan
2025-10-10 17:39 ` [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently Zi Yan
@ 2025-10-10 17:39 ` Zi Yan
2025-10-10 18:05 ` Luis Chamberlain
` (2 more replies)
1 sibling, 3 replies; 22+ messages in thread
From: Zi Yan @ 2025-10-10 17:39 UTC (permalink / raw)
To: linmiaohe, david, jane.chu, kernel, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs
Cc: ziy, akpm, mcgrof, nao.horiguchi, Lorenzo Stoakes, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
Large block size (LBS) folios cannot be split to order-0 folios, only down
to min_order_for_split(). The current code fails such a split outright, but
that is not optimal. Split the folio to min_order_for_split() instead, so
that, after the split, only the folio containing the poisoned page becomes
unusable.
For soft offline, do not split the large folio if it cannot be split to
order-0, since the folio is still accessible from userspace and a premature
split might lead to a performance loss.
Suggested-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
mm/memory-failure.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f698df156bf8..443df9581c24 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1656,12 +1656,13 @@ static int identify_page_state(unsigned long pfn, struct page *p,
* there is still more to do, hence the page refcount we took earlier
* is still needed.
*/
-static int try_to_split_thp_page(struct page *page, bool release)
+static int try_to_split_thp_page(struct page *page, unsigned int new_order,
+ bool release)
{
int ret;
lock_page(page);
- ret = split_huge_page(page);
+ ret = split_huge_page_to_list_to_order(page, NULL, new_order);
unlock_page(page);
if (ret && release)
@@ -2280,6 +2281,7 @@ int memory_failure(unsigned long pfn, int flags)
folio_unlock(folio);
if (folio_test_large(folio)) {
+ int new_order = min_order_for_split(folio);
/*
* The flag must be set after the refcount is bumped
* otherwise it may race with THP split.
@@ -2294,7 +2296,14 @@ int memory_failure(unsigned long pfn, int flags)
* page is a valid handlable page.
*/
folio_set_has_hwpoisoned(folio);
- if (try_to_split_thp_page(p, false) < 0) {
+ /*
+ * If the folio cannot be split to order-0, kill the process,
+ * but split the folio anyway to minimize the amount of unusable
+ * pages.
+ */
+ if (try_to_split_thp_page(p, new_order, false) || new_order) {
+ /* get folio again in case the original one is split */
+ folio = page_folio(p);
res = -EHWPOISON;
kill_procs_now(p, pfn, flags, folio);
put_page(p);
@@ -2621,7 +2630,15 @@ static int soft_offline_in_use_page(struct page *page)
};
if (!huge && folio_test_large(folio)) {
- if (try_to_split_thp_page(page, true)) {
+ int new_order = min_order_for_split(folio);
+
+ /*
+ * If the folio cannot be split to order-0, do not split it at
+ * all to retain the still accessible large folio.
+	 * NOTE: if getting free memory is preferred, split it like it
+ * is done in memory_failure().
+ */
+ if (new_order || try_to_split_thp_page(page, new_order, true)) {
pr_info("%#lx: thp split failed\n", pfn);
return -EBUSY;
}
--
2.51.0
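[A standalone arithmetic illustration of why splitting to
min_order_for_split() helps memory failure handling; the 2MB folio and the
16KB minimum order are assumed example numbers, not taken from the patch.]

#include <stdio.h>

int main(void)
{
	unsigned int folio_order = 9;	/* e.g. a 2MB folio of 4KB pages */
	unsigned int min_order = 2;	/* e.g. a 16KB LBS minimum order */

	/* pages left unusable if the folio stays unsplit vs. after a
	 * min-order split where only one small folio keeps the poison */
	unsigned long unsplit = 1UL << folio_order;
	unsigned long split = 1UL << min_order;

	printf("unusable pages: %lu without split, %lu after min-order split\n",
	       unsplit, split);
	return 0;
}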
* Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
2025-10-10 17:39 ` [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently Zi Yan
@ 2025-10-10 18:02 ` Luis Chamberlain
2025-10-13 17:11 ` Zi Yan
2025-10-11 2:25 ` Lance Yang
` (4 subsequent siblings)
5 siblings, 1 reply; 22+ messages in thread
From: Luis Chamberlain @ 2025-10-10 18:02 UTC (permalink / raw)
To: Zi Yan
Cc: linmiaohe, david, jane.chu, kernel, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs, akpm, nao.horiguchi, Lorenzo Stoakes,
Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
Barry Song, Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
> Page cache folios from a file system that support large block size (LBS)
> can have minimal folio order greater than 0, thus a high order folio might
> not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
> folio in minimum folio order chunks") bumps the target order of
> split_huge_page*() to the minimum allowed order when splitting a LBS folio.
> This causes confusion for some split_huge_page*() callers like memory
> failure handling code, since they expect after-split folios all have
> order-0 when split succeeds but in really get min_order_for_split() order
> folios.
>
> Fix it by failing a split if the folio cannot be split to the target order.
>
> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
> [The test poisons LBS folios, which cannot be split to order-0 folios, and
> also tries to poison all memory. The non split LBS folios take more memory
> than the test anticipated, leading to OOM. The patch fixed the kernel
> warning and the test needs some change to avoid OOM.]
> Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
> Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Luis
* Re: [PATCH 2/2] mm/memory-failure: improve large block size folio handling.
2025-10-10 17:39 ` [PATCH 2/2] mm/memory-failure: improve large block size folio handling Zi Yan
@ 2025-10-10 18:05 ` Luis Chamberlain
2025-10-11 4:12 ` Miaohe Lin
2025-10-11 10:23 ` kernel test robot
2 siblings, 0 replies; 22+ messages in thread
From: Luis Chamberlain @ 2025-10-10 18:05 UTC (permalink / raw)
To: Zi Yan
Cc: linmiaohe, david, jane.chu, kernel, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs, akpm, nao.horiguchi, Lorenzo Stoakes,
Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
Barry Song, Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
On Fri, Oct 10, 2025 at 01:39:06PM -0400, Zi Yan wrote:
> Large block size (LBS) folios cannot be split to order-0 folios but
> min_order_for_folio(). Current split fails directly, but that is not
> optimal. Split the folio to min_order_for_folio(), so that, after split,
> only the folio containing the poisoned page becomes unusable instead.
>
> For soft offline, do not split the large folio if it cannot be split to
> order-0. Since the folio is still accessible from userspace and premature
> split might lead to potential performance loss.
>
> Suggested-by: Jane Chu <jane.chu@oracle.com>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Luis
* Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
2025-10-10 17:39 ` [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently Zi Yan
2025-10-10 18:02 ` Luis Chamberlain
@ 2025-10-11 2:25 ` Lance Yang
2025-10-13 17:06 ` Zi Yan
2025-10-11 9:00 ` kernel test robot
` (3 subsequent siblings)
5 siblings, 1 reply; 22+ messages in thread
From: Lance Yang @ 2025-10-11 2:25 UTC (permalink / raw)
To: Zi Yan
Cc: akpm, syzkaller-bugs, mcgrof, nao.horiguchi, Lorenzo Stoakes,
kernel, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
jane.chu, Dev Jain, Barry Song, Matthew Wilcox (Oracle),
linux-fsdevel, david, linux-kernel, linux-mm, linmiaohe,
syzbot+e6367ea2fdab6ed46056
On 2025/10/11 01:39, Zi Yan wrote:
> Page cache folios from a file system that support large block size (LBS)
> can have minimal folio order greater than 0, thus a high order folio might
> not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
> folio in minimum folio order chunks") bumps the target order of
> split_huge_page*() to the minimum allowed order when splitting a LBS folio.
> This causes confusion for some split_huge_page*() callers like memory
> failure handling code, since they expect after-split folios all have
> order-0 when split succeeds but in really get min_order_for_split() order
> folios.
>
> Fix it by failing a split if the folio cannot be split to the target order.
>
> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
> [The test poisons LBS folios, which cannot be split to order-0 folios, and
> also tries to poison all memory. The non split LBS folios take more memory
> than the test anticipated, leading to OOM. The patch fixed the kernel
> warning and the test needs some change to avoid OOM.]
> Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> include/linux/huge_mm.h | 28 +++++-----------------------
> mm/huge_memory.c | 9 +--------
> mm/truncate.c | 6 ++++--
> 3 files changed, 10 insertions(+), 33 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 8eec7a2a977b..9950cda1526a 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -394,34 +394,16 @@ static inline int split_huge_page_to_list_to_order(struct page *page, struct lis
> * Return: 0: split is successful, otherwise split failed.
> */
> static inline int try_folio_split(struct folio *folio, struct page *page,
> - struct list_head *list)
> + struct list_head *list, unsigned int order)
Seems like we need to add the order parameter to the stub for
try_folio_split() as well?
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
...
#else /* CONFIG_TRANSPARENT_HUGEPAGE */
static inline int try_folio_split(struct folio *folio, struct page *page,
struct list_head *list)
{
VM_WARN_ON_ONCE_FOLIO(1, folio);
return -EINVAL;
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
Cheers,
Lance
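[One possible shape for the updated !CONFIG_TRANSPARENT_HUGEPAGE stub,
mirroring the existing one quoted above with the new parameter added; an
assumption, not the fix that was actually posted.]

static inline int try_folio_split(struct folio *folio, struct page *page,
		struct list_head *list, unsigned int order)
{
	VM_WARN_ON_ONCE_FOLIO(1, folio);
	return -EINVAL;
}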
* Re: [PATCH 2/2] mm/memory-failure: improve large block size folio handling.
2025-10-10 17:39 ` [PATCH 2/2] mm/memory-failure: improve large block size folio handling Zi Yan
2025-10-10 18:05 ` Luis Chamberlain
@ 2025-10-11 4:12 ` Miaohe Lin
2025-10-11 5:00 ` Matthew Wilcox
2025-10-11 10:23 ` kernel test robot
2 siblings, 1 reply; 22+ messages in thread
From: Miaohe Lin @ 2025-10-11 4:12 UTC (permalink / raw)
To: Zi Yan
Cc: akpm, mcgrof, nao.horiguchi, Lorenzo Stoakes, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm, david, jane.chu, kernel,
syzbot+e6367ea2fdab6ed46056, syzkaller-bugs
On 2025/10/11 1:39, Zi Yan wrote:
> Large block size (LBS) folios cannot be split to order-0 folios but
> min_order_for_folio(). Current split fails directly, but that is not
> optimal. Split the folio to min_order_for_folio(), so that, after split,
> only the folio containing the poisoned page becomes unusable instead.
>
> For soft offline, do not split the large folio if it cannot be split to
> order-0. Since the folio is still accessible from userspace and premature
> split might lead to potential performance loss.
Thanks for your patch.
>
> Suggested-by: Jane Chu <jane.chu@oracle.com>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> mm/memory-failure.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index f698df156bf8..443df9581c24 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1656,12 +1656,13 @@ static int identify_page_state(unsigned long pfn, struct page *p,
> * there is still more to do, hence the page refcount we took earlier
> * is still needed.
> */
> -static int try_to_split_thp_page(struct page *page, bool release)
> +static int try_to_split_thp_page(struct page *page, unsigned int new_order,
> + bool release)
> {
> int ret;
>
> lock_page(page);
> - ret = split_huge_page(page);
> + ret = split_huge_page_to_list_to_order(page, NULL, new_order);
> unlock_page(page);
>
> if (ret && release)
> @@ -2280,6 +2281,7 @@ int memory_failure(unsigned long pfn, int flags)
> folio_unlock(folio);
>
> if (folio_test_large(folio)) {
> + int new_order = min_order_for_split(folio);
> /*
> * The flag must be set after the refcount is bumped
> * otherwise it may race with THP split.
> @@ -2294,7 +2296,14 @@ int memory_failure(unsigned long pfn, int flags)
> * page is a valid handlable page.
> */
> folio_set_has_hwpoisoned(folio);
> - if (try_to_split_thp_page(p, false) < 0) {
> + /*
> + * If the folio cannot be split to order-0, kill the process,
> + * but split the folio anyway to minimize the amount of unusable
> + * pages.
> + */
> + if (try_to_split_thp_page(p, new_order, false) || new_order) {
> + /* get folio again in case the original one is split */
> + folio = page_folio(p);
If original folio A is split and the after-split new folio is B (A != B), will the
refcnt of folio A held above be missing? I.e. get_hwpoison_page() held the extra refcnt
of folio A, but we put the refcnt of folio B below. Is this a problem or am I
missing something?
Thanks.
.
* Re: [PATCH 2/2] mm/memory-failure: improve large block size folio handling.
2025-10-11 4:12 ` Miaohe Lin
@ 2025-10-11 5:00 ` Matthew Wilcox
2025-10-11 9:07 ` Miaohe Lin
2025-10-13 17:04 ` Zi Yan
0 siblings, 2 replies; 22+ messages in thread
From: Matthew Wilcox @ 2025-10-11 5:00 UTC (permalink / raw)
To: Miaohe Lin
Cc: Zi Yan, akpm, mcgrof, nao.horiguchi, Lorenzo Stoakes,
Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
Barry Song, Lance Yang, linux-fsdevel, linux-kernel, linux-mm,
david, jane.chu, kernel, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs
On Sat, Oct 11, 2025 at 12:12:12PM +0800, Miaohe Lin wrote:
> > folio_set_has_hwpoisoned(folio);
> > - if (try_to_split_thp_page(p, false) < 0) {
> > + /*
> > + * If the folio cannot be split to order-0, kill the process,
> > + * but split the folio anyway to minimize the amount of unusable
> > + * pages.
> > + */
> > + if (try_to_split_thp_page(p, new_order, false) || new_order) {
> > + /* get folio again in case the original one is split */
> > + folio = page_folio(p);
>
> If original folio A is split and the after-split new folio is B (A != B), will the
> refcnt of folio A held above be missing? I.e. get_hwpoison_page() held the extra refcnt
> of folio A, but we put the refcnt of folio B below. Is this a problem or am I miss
> something?
That's how split works.
Zi Yan, the kernel-doc for folio_split() could use some attention.
First, it's not kernel-doc; the comment opens with /* instead of /**.
Second, it says:
* After split, folio is left locked for caller.
which isn't actually true, right? The folio which contains
@split_at will be locked. Also, it will contain the additional
reference which was taken on @folio by the caller.
* Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
2025-10-10 17:39 ` [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently Zi Yan
2025-10-10 18:02 ` Luis Chamberlain
2025-10-11 2:25 ` Lance Yang
@ 2025-10-11 9:00 ` kernel test robot
2025-10-12 0:41 ` Wei Yang
` (2 subsequent siblings)
5 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2025-10-11 9:00 UTC (permalink / raw)
To: Zi Yan, linmiaohe, david, jane.chu, kernel,
syzbot+e6367ea2fdab6ed46056, syzkaller-bugs
Cc: oe-kbuild-all, ziy, akpm, mcgrof, nao.horiguchi, Lorenzo Stoakes,
Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
Barry Song, Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
Hi Zi,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.17 next-20251010]
[cannot apply to akpm-mm/mm-everything]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Zi-Yan/mm-huge_memory-do-not-change-split_huge_page-target-order-silently/20251011-014145
base: linus/master
patch link: https://lore.kernel.org/r/20251010173906.3128789-2-ziy%40nvidia.com
patch subject: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
config: parisc-allnoconfig (https://download.01.org/0day-ci/archive/20251011/202510111633.onu4Yaey-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251011/202510111633.onu4Yaey-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510111633.onu4Yaey-lkp@intel.com/
All errors (new ones prefixed by >>):
mm/truncate.c: In function 'truncate_inode_partial_folio':
>> mm/truncate.c:229:14: error: too many arguments to function 'try_folio_split'; expected 3, have 4
229 | if (!try_folio_split(folio, split_at, NULL, min_order)) {
| ^~~~~~~~~~~~~~~ ~~~~~~~~~
In file included from include/linux/mm.h:1081,
from arch/parisc/include/asm/cacheflush.h:5,
from include/linux/cacheflush.h:5,
from include/linux/highmem.h:8,
from include/linux/bvec.h:10,
from include/linux/blk_types.h:10,
from include/linux/writeback.h:13,
from include/linux/backing-dev.h:16,
from mm/truncate.c:12:
include/linux/huge_mm.h:588:19: note: declared here
588 | static inline int try_folio_split(struct folio *folio, struct page *page,
| ^~~~~~~~~~~~~~~
mm/truncate.c:259:25: error: too many arguments to function 'try_folio_split'; expected 3, have 4
259 | try_folio_split(folio2, split_at2, NULL, min_order);
| ^~~~~~~~~~~~~~~ ~~~~~~~~~
include/linux/huge_mm.h:588:19: note: declared here
588 | static inline int try_folio_split(struct folio *folio, struct page *page,
| ^~~~~~~~~~~~~~~
vim +/try_folio_split +229 mm/truncate.c
179
180 /*
181 * Handle partial folios. The folio may be entirely within the
182 * range if a split has raced with us. If not, we zero the part of the
183 * folio that's within the [start, end] range, and then split the folio if
184 * it's large. split_page_range() will discard pages which now lie beyond
185 * i_size, and we rely on the caller to discard pages which lie within a
186 * newly created hole.
187 *
188 * Returns false if splitting failed so the caller can avoid
189 * discarding the entire folio which is stubbornly unsplit.
190 */
191 bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
192 {
193 loff_t pos = folio_pos(folio);
194 size_t size = folio_size(folio);
195 unsigned int offset, length;
196 struct page *split_at, *split_at2;
197 unsigned int min_order;
198
199 if (pos < start)
200 offset = start - pos;
201 else
202 offset = 0;
203 if (pos + size <= (u64)end)
204 length = size - offset;
205 else
206 length = end + 1 - pos - offset;
207
208 folio_wait_writeback(folio);
209 if (length == size) {
210 truncate_inode_folio(folio->mapping, folio);
211 return true;
212 }
213
214 /*
215 * We may be zeroing pages we're about to discard, but it avoids
216 * doing a complex calculation here, and then doing the zeroing
217 * anyway if the page split fails.
218 */
219 if (!mapping_inaccessible(folio->mapping))
220 folio_zero_range(folio, offset, length);
221
222 if (folio_needs_release(folio))
223 folio_invalidate(folio, offset, length);
224 if (!folio_test_large(folio))
225 return true;
226
227 min_order = mapping_min_folio_order(folio->mapping);
228 split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
> 229 if (!try_folio_split(folio, split_at, NULL, min_order)) {
230 /*
231 * try to split at offset + length to make sure folios within
232 * the range can be dropped, especially to avoid memory waste
233 * for shmem truncate
234 */
235 struct folio *folio2;
236
237 if (offset + length == size)
238 goto no_split;
239
240 split_at2 = folio_page(folio,
241 PAGE_ALIGN_DOWN(offset + length) / PAGE_SIZE);
242 folio2 = page_folio(split_at2);
243
244 if (!folio_try_get(folio2))
245 goto no_split;
246
247 if (!folio_test_large(folio2))
248 goto out;
249
250 if (!folio_trylock(folio2))
251 goto out;
252
253 /*
254 * make sure folio2 is large and does not change its mapping.
255 * Its split result does not matter here.
256 */
257 if (folio_test_large(folio2) &&
258 folio2->mapping == folio->mapping)
259 try_folio_split(folio2, split_at2, NULL, min_order);
260
261 folio_unlock(folio2);
262 out:
263 folio_put(folio2);
264 no_split:
265 return true;
266 }
267 if (folio_test_dirty(folio))
268 return false;
269 truncate_inode_folio(folio->mapping, folio);
270 return true;
271 }
272
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 2/2] mm/memory-failure: improve large block size folio handling.
2025-10-11 5:00 ` Matthew Wilcox
@ 2025-10-11 9:07 ` Miaohe Lin
2025-10-13 17:04 ` Zi Yan
1 sibling, 0 replies; 22+ messages in thread
From: Miaohe Lin @ 2025-10-11 9:07 UTC (permalink / raw)
To: Matthew Wilcox, Zi Yan
Cc: akpm, mcgrof, nao.horiguchi, Lorenzo Stoakes, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, linux-fsdevel, linux-kernel, linux-mm, david,
jane.chu, kernel, syzbot+e6367ea2fdab6ed46056, syzkaller-bugs
On 2025/10/11 13:00, Matthew Wilcox wrote:
> On Sat, Oct 11, 2025 at 12:12:12PM +0800, Miaohe Lin wrote:
>>> folio_set_has_hwpoisoned(folio);
>>> - if (try_to_split_thp_page(p, false) < 0) {
>>> + /*
>>> + * If the folio cannot be split to order-0, kill the process,
>>> + * but split the folio anyway to minimize the amount of unusable
>>> + * pages.
>>> + */
>>> + if (try_to_split_thp_page(p, new_order, false) || new_order) {
>>> + /* get folio again in case the original one is split */
>>> + folio = page_folio(p);
>>
>> If original folio A is split and the after-split new folio is B (A != B), will the
>> refcnt of folio A held above be missing? I.e. get_hwpoison_page() held the extra refcnt
>> of folio A, but we put the refcnt of folio B below. Is this a problem or am I miss
>> something?
>
> That's how split works.
I read the code and see how split works. Thanks for pointing this out.
>
> Zi Yan, the kernel-doc for folio_split() could use some attention.
That would be really helpful.
Thanks.
.
> First, it's not kernel-doc; the comment opens with /* instead of /**.
> Second, it says:
>
> * After split, folio is left locked for caller.
>
> which isn't actually true, right? The folio which contains
> @split_at will be locked. Also, it will contain the additional
> reference which was taken on @folio by the caller.
>
> .
>
* Re: [PATCH 2/2] mm/memory-failure: improve large block size folio handling.
2025-10-10 17:39 ` [PATCH 2/2] mm/memory-failure: improve large block size folio handling Zi Yan
2025-10-10 18:05 ` Luis Chamberlain
2025-10-11 4:12 ` Miaohe Lin
@ 2025-10-11 10:23 ` kernel test robot
2025-10-13 17:08 ` Zi Yan
2 siblings, 1 reply; 22+ messages in thread
From: kernel test robot @ 2025-10-11 10:23 UTC (permalink / raw)
To: Zi Yan, linmiaohe, david, jane.chu, kernel,
syzbot+e6367ea2fdab6ed46056, syzkaller-bugs
Cc: oe-kbuild-all, ziy, akpm, mcgrof, nao.horiguchi, Lorenzo Stoakes,
Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
Barry Song, Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
Hi Zi,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.17 next-20251010]
[cannot apply to akpm-mm/mm-everything]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Zi-Yan/mm-huge_memory-do-not-change-split_huge_page-target-order-silently/20251011-014145
base: linus/master
patch link: https://lore.kernel.org/r/20251010173906.3128789-3-ziy%40nvidia.com
patch subject: [PATCH 2/2] mm/memory-failure: improve large block size folio handling.
config: parisc-allmodconfig (https://download.01.org/0day-ci/archive/20251011/202510111805.rg0AewVk-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251011/202510111805.rg0AewVk-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510111805.rg0AewVk-lkp@intel.com/
All errors (new ones prefixed by >>):
mm/memory-failure.c: In function 'memory_failure':
>> mm/memory-failure.c:2278:33: error: implicit declaration of function 'min_order_for_split' [-Wimplicit-function-declaration]
2278 | int new_order = min_order_for_split(folio);
| ^~~~~~~~~~~~~~~~~~~
vim +/min_order_for_split +2278 mm/memory-failure.c
2147
2148 /**
2149 * memory_failure - Handle memory failure of a page.
2150 * @pfn: Page Number of the corrupted page
2151 * @flags: fine tune action taken
2152 *
2153 * This function is called by the low level machine check code
2154 * of an architecture when it detects hardware memory corruption
2155 * of a page. It tries its best to recover, which includes
2156 * dropping pages, killing processes etc.
2157 *
2158 * The function is primarily of use for corruptions that
2159 * happen outside the current execution context (e.g. when
2160 * detected by a background scrubber)
2161 *
2162 * Must run in process context (e.g. a work queue) with interrupts
2163 * enabled and no spinlocks held.
2164 *
2165 * Return:
2166 * 0 - success,
2167 * -ENXIO - memory not managed by the kernel
2168 * -EOPNOTSUPP - hwpoison_filter() filtered the error event,
2169 * -EHWPOISON - the page was already poisoned, potentially
2170 * kill process,
2171 * other negative values - failure.
2172 */
2173 int memory_failure(unsigned long pfn, int flags)
2174 {
2175 struct page *p;
2176 struct folio *folio;
2177 struct dev_pagemap *pgmap;
2178 int res = 0;
2179 unsigned long page_flags;
2180 bool retry = true;
2181 int hugetlb = 0;
2182
2183 if (!sysctl_memory_failure_recovery)
2184 panic("Memory failure on page %lx", pfn);
2185
2186 mutex_lock(&mf_mutex);
2187
2188 if (!(flags & MF_SW_SIMULATED))
2189 hw_memory_failure = true;
2190
2191 p = pfn_to_online_page(pfn);
2192 if (!p) {
2193 res = arch_memory_failure(pfn, flags);
2194 if (res == 0)
2195 goto unlock_mutex;
2196
2197 if (pfn_valid(pfn)) {
2198 pgmap = get_dev_pagemap(pfn);
2199 put_ref_page(pfn, flags);
2200 if (pgmap) {
2201 res = memory_failure_dev_pagemap(pfn, flags,
2202 pgmap);
2203 goto unlock_mutex;
2204 }
2205 }
2206 pr_err("%#lx: memory outside kernel control\n", pfn);
2207 res = -ENXIO;
2208 goto unlock_mutex;
2209 }
2210
2211 try_again:
2212 res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
2213 if (hugetlb)
2214 goto unlock_mutex;
2215
2216 if (TestSetPageHWPoison(p)) {
2217 res = -EHWPOISON;
2218 if (flags & MF_ACTION_REQUIRED)
2219 res = kill_accessing_process(current, pfn, flags);
2220 if (flags & MF_COUNT_INCREASED)
2221 put_page(p);
2222 action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
2223 goto unlock_mutex;
2224 }
2225
2226 /*
2227 * We need/can do nothing about count=0 pages.
2228 * 1) it's a free page, and therefore in safe hand:
2229 * check_new_page() will be the gate keeper.
2230 * 2) it's part of a non-compound high order page.
2231 * Implies some kernel user: cannot stop them from
2232 * R/W the page; let's pray that the page has been
2233 * used and will be freed some time later.
2234 * In fact it's dangerous to directly bump up page count from 0,
2235 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
2236 */
2237 if (!(flags & MF_COUNT_INCREASED)) {
2238 res = get_hwpoison_page(p, flags);
2239 if (!res) {
2240 if (is_free_buddy_page(p)) {
2241 if (take_page_off_buddy(p)) {
2242 page_ref_inc(p);
2243 res = MF_RECOVERED;
2244 } else {
2245 /* We lost the race, try again */
2246 if (retry) {
2247 ClearPageHWPoison(p);
2248 retry = false;
2249 goto try_again;
2250 }
2251 res = MF_FAILED;
2252 }
2253 res = action_result(pfn, MF_MSG_BUDDY, res);
2254 } else {
2255 res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
2256 }
2257 goto unlock_mutex;
2258 } else if (res < 0) {
2259 res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
2260 goto unlock_mutex;
2261 }
2262 }
2263
2264 folio = page_folio(p);
2265
2266 /* filter pages that are protected from hwpoison test by users */
2267 folio_lock(folio);
2268 if (hwpoison_filter(p)) {
2269 ClearPageHWPoison(p);
2270 folio_unlock(folio);
2271 folio_put(folio);
2272 res = -EOPNOTSUPP;
2273 goto unlock_mutex;
2274 }
2275 folio_unlock(folio);
2276
2277 if (folio_test_large(folio)) {
> 2278 int new_order = min_order_for_split(folio);
2279 /*
2280 * The flag must be set after the refcount is bumped
2281 * otherwise it may race with THP split.
2282 * And the flag can't be set in get_hwpoison_page() since
2283 * it is called by soft offline too and it is just called
2284 * for !MF_COUNT_INCREASED. So here seems to be the best
2285 * place.
2286 *
2287 * Don't need care about the above error handling paths for
2288 * get_hwpoison_page() since they handle either free page
2289 * or unhandlable page. The refcount is bumped iff the
2290 * page is a valid handlable page.
2291 */
2292 folio_set_has_hwpoisoned(folio);
2293 /*
2294 * If the folio cannot be split to order-0, kill the process,
2295 * but split the folio anyway to minimize the amount of unusable
2296 * pages.
2297 */
2298 if (try_to_split_thp_page(p, new_order, false) || new_order) {
2299 /* get folio again in case the original one is split */
2300 folio = page_folio(p);
2301 res = -EHWPOISON;
2302 kill_procs_now(p, pfn, flags, folio);
2303 put_page(p);
2304 action_result(pfn, MF_MSG_UNSPLIT_THP, MF_FAILED);
2305 goto unlock_mutex;
2306 }
2307 VM_BUG_ON_PAGE(!page_count(p), p);
2308 folio = page_folio(p);
2309 }
2310
2311 /*
2312 * We ignore non-LRU pages for good reasons.
2313 * - PG_locked is only well defined for LRU pages and a few others
2314 * - to avoid races with __SetPageLocked()
2315 * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
2316 * The check (unnecessarily) ignores LRU pages being isolated and
2317 * walked by the page reclaim code, however that's not a big loss.
2318 */
2319 shake_folio(folio);
2320
2321 folio_lock(folio);
2322
2323 /*
2324 * We're only intended to deal with the non-Compound page here.
2325 * The page cannot become compound pages again as folio has been
2326 * splited and extra refcnt is held.
2327 */
2328 WARN_ON(folio_test_large(folio));
2329
2330 /*
2331 * We use page flags to determine what action should be taken, but
2332 * the flags can be modified by the error containment action. One
2333 * example is an mlocked page, where PG_mlocked is cleared by
2334 * folio_remove_rmap_*() in try_to_unmap_one(). So to determine page
2335 * status correctly, we save a copy of the page flags at this time.
2336 */
2337 page_flags = folio->flags.f;
2338
2339 /*
2340 * __munlock_folio() may clear a writeback folio's LRU flag without
2341 * the folio lock. We need to wait for writeback completion for this
2342 * folio or it may trigger a vfs BUG while evicting inode.
2343 */
2344 if (!folio_test_lru(folio) && !folio_test_writeback(folio))
2345 goto identify_page_state;
2346
2347 /*
2348 * It's very difficult to mess with pages currently under IO
2349 * and in many cases impossible, so we just avoid it here.
2350 */
2351 folio_wait_writeback(folio);
2352
2353 /*
2354 * Now take care of user space mappings.
2355 * Abort on fail: __filemap_remove_folio() assumes unmapped page.
2356 */
2357 if (!hwpoison_user_mappings(folio, p, pfn, flags)) {
2358 res = action_result(pfn, MF_MSG_UNMAP_FAILED, MF_FAILED);
2359 goto unlock_page;
2360 }
2361
2362 /*
2363 * Torn down by someone else?
2364 */
2365 if (folio_test_lru(folio) && !folio_test_swapcache(folio) &&
2366 folio->mapping == NULL) {
2367 res = action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
2368 goto unlock_page;
2369 }
2370
2371 identify_page_state:
2372 res = identify_page_state(pfn, p, page_flags);
2373 mutex_unlock(&mf_mutex);
2374 return res;
2375 unlock_page:
2376 folio_unlock(folio);
2377 unlock_mutex:
2378 mutex_unlock(&mf_mutex);
2379 return res;
2380 }
2381 EXPORT_SYMBOL_GPL(memory_failure);
2382
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
2025-10-10 17:39 ` [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently Zi Yan
` (2 preceding siblings ...)
2025-10-11 9:00 ` kernel test robot
@ 2025-10-12 0:41 ` Wei Yang
2025-10-13 17:07 ` Zi Yan
2025-10-12 8:24 ` Pankaj Raghav (Samsung)
2025-10-15 14:25 ` Lorenzo Stoakes
5 siblings, 1 reply; 22+ messages in thread
From: Wei Yang @ 2025-10-12 0:41 UTC (permalink / raw)
To: Zi Yan
Cc: linmiaohe, david, jane.chu, kernel, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs, akpm, mcgrof, nao.horiguchi, Lorenzo Stoakes,
Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
Barry Song, Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
>Page cache folios from a file system that support large block size (LBS)
>can have minimal folio order greater than 0, thus a high order folio might
>not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
>folio in minimum folio order chunks") bumps the target order of
>split_huge_page*() to the minimum allowed order when splitting a LBS folio.
>This causes confusion for some split_huge_page*() callers like memory
>failure handling code, since they expect after-split folios all have
>order-0 when split succeeds but in really get min_order_for_split() order
>folios.
>
>Fix it by failing a split if the folio cannot be split to the target order.
>
>Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
>[The test poisons LBS folios, which cannot be split to order-0 folios, and
>also tries to poison all memory. The non split LBS folios take more memory
>than the test anticipated, leading to OOM. The patch fixed the kernel
>warning and the test needs some change to avoid OOM.]
>Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com
>Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
>Signed-off-by: Zi Yan <ziy@nvidia.com>
>---
> include/linux/huge_mm.h | 28 +++++-----------------------
> mm/huge_memory.c | 9 +--------
> mm/truncate.c | 6 ++++--
> 3 files changed, 10 insertions(+), 33 deletions(-)
>
>diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>index 8eec7a2a977b..9950cda1526a 100644
>--- a/include/linux/huge_mm.h
>+++ b/include/linux/huge_mm.h
>@@ -394,34 +394,16 @@ static inline int split_huge_page_to_list_to_order(struct page *page, struct lis
> * Return: 0: split is successful, otherwise split failed.
> */
It would be better to update the documentation of try_folio_split() as well.
> static inline int try_folio_split(struct folio *folio, struct page *page,
>- struct list_head *list)
>+ struct list_head *list, unsigned int order)
> {
>- int ret = min_order_for_split(folio);
>-
>- if (ret < 0)
>- return ret;
>-
>- if (!non_uniform_split_supported(folio, 0, false))
>+ if (!non_uniform_split_supported(folio, order, false))
> return split_huge_page_to_list_to_order(&folio->page, list,
>- ret);
>- return folio_split(folio, ret, page, list);
>+ order);
>+ return folio_split(folio, order, page, list);
> }
--
Wei Yang
Help you, Help me
* Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
2025-10-10 17:39 ` [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently Zi Yan
` (3 preceding siblings ...)
2025-10-12 0:41 ` Wei Yang
@ 2025-10-12 8:24 ` Pankaj Raghav (Samsung)
2025-10-13 17:11 ` Zi Yan
2025-10-15 14:25 ` Lorenzo Stoakes
5 siblings, 1 reply; 22+ messages in thread
From: Pankaj Raghav (Samsung) @ 2025-10-12 8:24 UTC (permalink / raw)
To: Zi Yan
Cc: linmiaohe, david, jane.chu, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs, akpm, mcgrof, nao.horiguchi, Lorenzo Stoakes,
Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
Barry Song, Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
> Page cache folios from a file system that support large block size (LBS)
> can have minimal folio order greater than 0, thus a high order folio might
> not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
> folio in minimum folio order chunks") bumps the target order of
> split_huge_page*() to the minimum allowed order when splitting a LBS folio.
> This causes confusion for some split_huge_page*() callers like memory
> failure handling code, since they expect after-split folios all have
> order-0 when split succeeds but in really get min_order_for_split() order
> folios.
>
> Fix it by failing a split if the folio cannot be split to the target order.
>
> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
> [The test poisons LBS folios, which cannot be split to order-0 folios, and
> also tries to poison all memory. The non split LBS folios take more memory
> than the test anticipated, leading to OOM. The patch fixed the kernel
> warning and the test needs some change to avoid OOM.]
> Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
LGTM with the suggested changes to the !CONFIG_THP try_folio_split().
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
* Re: [PATCH 2/2] mm/memory-failure: improve large block size folio handling.
2025-10-11 5:00 ` Matthew Wilcox
2025-10-11 9:07 ` Miaohe Lin
@ 2025-10-13 17:04 ` Zi Yan
1 sibling, 0 replies; 22+ messages in thread
From: Zi Yan @ 2025-10-13 17:04 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Miaohe Lin, akpm, mcgrof, nao.horiguchi, Lorenzo Stoakes,
Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
Barry Song, Lance Yang, linux-fsdevel, linux-kernel, linux-mm,
david, jane.chu, kernel, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs
On 11 Oct 2025, at 1:00, Matthew Wilcox wrote:
> On Sat, Oct 11, 2025 at 12:12:12PM +0800, Miaohe Lin wrote:
>>> folio_set_has_hwpoisoned(folio);
>>> - if (try_to_split_thp_page(p, false) < 0) {
>>> + /*
>>> + * If the folio cannot be split to order-0, kill the process,
>>> + * but split the folio anyway to minimize the amount of unusable
>>> + * pages.
>>> + */
>>> + if (try_to_split_thp_page(p, new_order, false) || new_order) {
>>> + /* get folio again in case the original one is split */
>>> + folio = page_folio(p);
>>
>> If original folio A is split and the after-split new folio is B (A != B), will the
>> refcnt of folio A held above be missing? I.e. get_hwpoison_page() held the extra refcnt
>> of folio A, but we put the refcnt of folio B below. Is this a problem or am I miss
>> something?
>
> That's how split works.
>
> Zi Yan, the kernel-doc for folio_split() could use some attention.
> First, it's not kernel-doc; the comment opens with /* instead of /**.
Got it.
> Second, it says:
>
> * After split, folio is left locked for caller.
>
> which isn't actually true, right? The folio which contains
No, folio is indeed left locked. Currently folio_split() is
used by truncate_inode_partial_folio() via try_folio_split()
and the folio passed into truncate_inode_partial_folio() is
already locked by the caller and is unlocked by the caller as well.
The caller does not know anything about @split_at, thus
cannot unlock the folio containing @split_at.
> @split_at will be locked. Also, it will contain the additional
> reference which was taken on @folio by the caller.
The same for the folio reference.
That is the reason we have @split_at and @lock_at for __folio_split().
I can see it is counter-intuitive. To change it, I might need
your help on how to change truncate_inode_partial_folio() callers,
since all of them use @folio afterwards; without a reference,
I am not sure their uses would still be safe.
--
Best Regards,
Yan, Zi
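[A short sketch of the locking/refcount contract described above, as
explained in this thread (not authoritative kernel-doc): the caller's lock
and reference stay with the folio it passed in, so a truncate-style caller
can keep using @folio whether or not the split succeeded.
example_split_and_release() is a hypothetical illustration.]

static void example_split_and_release(struct folio *folio,
		struct page *split_at, unsigned int new_order)
{
	/* this caller already holds a reference on @folio */
	folio_lock(folio);
	folio_split(folio, new_order, split_at, NULL);
	/* success or failure, @folio is still the folio we locked and it
	 * still carries our reference */
	folio_unlock(folio);
	folio_put(folio);
}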
* Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
2025-10-11 2:25 ` Lance Yang
@ 2025-10-13 17:06 ` Zi Yan
0 siblings, 0 replies; 22+ messages in thread
From: Zi Yan @ 2025-10-13 17:06 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, syzkaller-bugs, mcgrof, nao.horiguchi, Lorenzo Stoakes,
kernel, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
jane.chu, Dev Jain, Barry Song, Matthew Wilcox (Oracle),
linux-fsdevel, david, linux-kernel, linux-mm, linmiaohe,
syzbot+e6367ea2fdab6ed46056
On 10 Oct 2025, at 22:25, Lance Yang wrote:
> On 2025/10/11 01:39, Zi Yan wrote:
>> Page cache folios from a file system that support large block size (LBS)
>> can have minimal folio order greater than 0, thus a high order folio might
>> not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
>> folio in minimum folio order chunks") bumps the target order of
>> split_huge_page*() to the minimum allowed order when splitting a LBS folio.
>> This causes confusion for some split_huge_page*() callers like memory
>> failure handling code, since they expect after-split folios all have
>> order-0 when split succeeds but in really get min_order_for_split() order
>> folios.
>>
>> Fix it by failing a split if the folio cannot be split to the target order.
>>
>> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
>> [The test poisons LBS folios, which cannot be split to order-0 folios, and
>> also tries to poison all memory. The non split LBS folios take more memory
>> than the test anticipated, leading to OOM. The patch fixed the kernel
>> warning and the test needs some change to avoid OOM.]
>> Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>> include/linux/huge_mm.h | 28 +++++-----------------------
>> mm/huge_memory.c | 9 +--------
>> mm/truncate.c | 6 ++++--
>> 3 files changed, 10 insertions(+), 33 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 8eec7a2a977b..9950cda1526a 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -394,34 +394,16 @@ static inline int split_huge_page_to_list_to_order(struct page *page, struct lis
>> * Return: 0: split is successful, otherwise split failed.
>> */
>> static inline int try_folio_split(struct folio *folio, struct page *page,
>> - struct list_head *list)
>> + struct list_head *list, unsigned int order)
>
> Seems like we need to add the order parameter to the stub for try_folio_split() as well?
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>
> ...
>
> #else /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> static inline int try_folio_split(struct folio *folio, struct page *page,
> struct list_head *list)
> {
> VM_WARN_ON_ONCE_FOLIO(1, folio);
> return -EINVAL;
> }
>
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
Thanks. It was also reported by the lkp robot. Will fix it.
--
Best Regards,
Yan, Zi
* Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
2025-10-12 0:41 ` Wei Yang
@ 2025-10-13 17:07 ` Zi Yan
0 siblings, 0 replies; 22+ messages in thread
From: Zi Yan @ 2025-10-13 17:07 UTC (permalink / raw)
To: Wei Yang
Cc: linmiaohe, david, jane.chu, kernel, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs, akpm, mcgrof, nao.horiguchi, Lorenzo Stoakes,
Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
Barry Song, Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
On 11 Oct 2025, at 20:41, Wei Yang wrote:
> On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
>> Page cache folios from a file system that support large block size (LBS)
>> can have minimal folio order greater than 0, thus a high order folio might
>> not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
>> folio in minimum folio order chunks") bumps the target order of
>> split_huge_page*() to the minimum allowed order when splitting a LBS folio.
>> This causes confusion for some split_huge_page*() callers like memory
>> failure handling code, since they expect after-split folios all have
>> order-0 when split succeeds but in really get min_order_for_split() order
>> folios.
>>
>> Fix it by failing a split if the folio cannot be split to the target order.
>>
>> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
>> [The test poisons LBS folios, which cannot be split to order-0 folios, and
>> also tries to poison all memory. The non split LBS folios take more memory
>> than the test anticipated, leading to OOM. The patch fixed the kernel
>> warning and the test needs some change to avoid OOM.]
>> Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>> include/linux/huge_mm.h | 28 +++++-----------------------
>> mm/huge_memory.c | 9 +--------
>> mm/truncate.c | 6 ++++--
>> 3 files changed, 10 insertions(+), 33 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 8eec7a2a977b..9950cda1526a 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -394,34 +394,16 @@ static inline int split_huge_page_to_list_to_order(struct page *page, struct lis
>> * Return: 0: split is successful, otherwise split failed.
>> */
>
> It is better to update the document of try_folio_split()
Sure. Will do.
--
Best Regards,
Yan, Zi
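[A rough sketch of what the refreshed comment could look like once the new
@order parameter is documented; the wording is an assumption, not the update
that was actually posted.]

/**
 * try_folio_split - try to split a @folio at @page to @order using
 *	non-uniform split, falling back to uniform split when that is not
 *	supported.
 * @folio: folio to be split
 * @page: page within @folio; after a successful split the folio containing
 *	@page has @order
 * @list: after-split folios are placed on @list when it is not NULL
 * @order: the target split order
 *
 * Return: 0: split is successful, otherwise split failed.
 */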
* Re: [PATCH 2/2] mm/memory-failure: improve large block size folio handling.
2025-10-11 10:23 ` kernel test robot
@ 2025-10-13 17:08 ` Zi Yan
0 siblings, 0 replies; 22+ messages in thread
From: Zi Yan @ 2025-10-13 17:08 UTC (permalink / raw)
To: kernel test robot
Cc: linmiaohe, david, jane.chu, kernel, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs, oe-kbuild-all, akpm, mcgrof, nao.horiguchi,
Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
On 11 Oct 2025, at 6:23, kernel test robot wrote:
> Hi Zi,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on linus/master]
> [also build test ERROR on v6.17 next-20251010]
> [cannot apply to akpm-mm/mm-everything]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Zi-Yan/mm-huge_memory-do-not-change-split_huge_page-target-order-silently/20251011-014145
> base: linus/master
> patch link: https://lore.kernel.org/r/20251010173906.3128789-3-ziy%40nvidia.com
> patch subject: [PATCH 2/2] mm/memory-failure: improve large block size folio handling.
> config: parisc-allmodconfig (https://download.01.org/0day-ci/archive/20251011/202510111805.rg0AewVk-lkp@intel.com/config)
> compiler: hppa-linux-gcc (GCC) 15.1.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251011/202510111805.rg0AewVk-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202510111805.rg0AewVk-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> mm/memory-failure.c: In function 'memory_failure':
>>> mm/memory-failure.c:2278:33: error: implicit declaration of function 'min_order_for_split' [-Wimplicit-function-declaration]
> 2278 | int new_order = min_order_for_split(folio);
> | ^~~~~~~~~~~~~~~~~~~
>
min_order_for_split() is missing in the !CONFIG_TRANSPARENT_HUGEPAGE case. Will add one
to get rid of this error.
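Roughly something like the following, assuming a !THP build never sees large
page cache folios (just a sketch; whether the stub should return 0 or an error
is a judgement call for the next version):

static inline int min_order_for_split(struct folio *folio)
{
	/*
	 * Sketch only: without CONFIG_TRANSPARENT_HUGEPAGE the page cache
	 * holds order-0 folios, so splitting to order-0 is always allowed.
	 */
	return 0;
}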
Thanks.
--
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
2025-10-10 18:02 ` Luis Chamberlain
@ 2025-10-13 17:11 ` Zi Yan
0 siblings, 0 replies; 22+ messages in thread
From: Zi Yan @ 2025-10-13 17:11 UTC (permalink / raw)
To: Luis Chamberlain
Cc: linmiaohe, david, jane.chu, kernel, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs, akpm, nao.horiguchi, Lorenzo Stoakes,
Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
Barry Song, Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
On 10 Oct 2025, at 14:02, Luis Chamberlain wrote:
> On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
>> Page cache folios from a file system that support large block size (LBS)
>> can have minimal folio order greater than 0, thus a high order folio might
>> not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
>> folio in minimum folio order chunks") bumps the target order of
>> split_huge_page*() to the minimum allowed order when splitting a LBS folio.
>> This causes confusion for some split_huge_page*() callers like memory
>> failure handling code, since they expect after-split folios all have
>> order-0 when split succeeds but in reality get min_order_for_split() order
>> folios.
>>
>> Fix it by failing a split if the folio cannot be split to the target order.
>>
>> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
>> [The test poisons LBS folios, which cannot be split to order-0 folios, and
>> also tries to poison all memory. The non split LBS folios take more memory
>> than the test anticipated, leading to OOM. The patch fixed the kernel
>> warning and the test needs some change to avoid OOM.]
>> Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>
> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Thanks.
--
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
2025-10-12 8:24 ` Pankaj Raghav (Samsung)
@ 2025-10-13 17:11 ` Zi Yan
0 siblings, 0 replies; 22+ messages in thread
From: Zi Yan @ 2025-10-13 17:11 UTC (permalink / raw)
To: Pankaj Raghav (Samsung)
Cc: linmiaohe, david, jane.chu, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs, akpm, mcgrof, nao.horiguchi, Lorenzo Stoakes,
Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
Barry Song, Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
On 12 Oct 2025, at 4:24, Pankaj Raghav (Samsung) wrote:
> On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
>> Page cache folios from a file system that support large block size (LBS)
>> can have minimal folio order greater than 0, thus a high order folio might
>> not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
>> folio in minimum folio order chunks") bumps the target order of
>> split_huge_page*() to the minimum allowed order when splitting a LBS folio.
>> This causes confusion for some split_huge_page*() callers like memory
>> failure handling code, since they expect after-split folios all have
>> order-0 when split succeeds but in reality get min_order_for_split() order
>> folios.
>>
>> Fix it by failing a split if the folio cannot be split to the target order.
>>
>> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
>> [The test poisons LBS folios, which cannot be split to order-0 folios, and
>> also tries to poison all memory. The non split LBS folios take more memory
>> than the test anticipated, leading to OOM. The patch fixed the kernel
>> warning and the test needs some change to avoid OOM.]
>> Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
> LGTM with the suggested changes to the !CONFIG_THP try_folio_split().
>
> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Thanks.
--
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
2025-10-10 17:39 ` [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently Zi Yan
` (4 preceding siblings ...)
2025-10-12 8:24 ` Pankaj Raghav (Samsung)
@ 2025-10-15 14:25 ` Lorenzo Stoakes
2025-10-15 22:57 ` Zi Yan
5 siblings, 1 reply; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-10-15 14:25 UTC (permalink / raw)
To: Zi Yan
Cc: linmiaohe, david, jane.chu, kernel, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs, akpm, mcgrof, nao.horiguchi, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
> Page cache folios from a file system that support large block size (LBS)
> can have minimal folio order greater than 0, thus a high order folio might
> not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
> folio in minimum folio order chunks") bumps the target order of
> split_huge_page*() to the minimum allowed order when splitting a LBS folio.
> This causes confusion for some split_huge_page*() callers like memory
> failure handling code, since they expect after-split folios all have
> order-0 when split succeeds but in reality get min_order_for_split() order
> folios.
>
> Fix it by failing a split if the folio cannot be split to the target order.
>
> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
> [The test poisons LBS folios, which cannot be split to order-0 folios, and
> also tries to poison all memory. The non split LBS folios take more memory
> than the test anticipated, leading to OOM. The patch fixed the kernel
> warning and the test needs some change to avoid OOM.]
> Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
> Signed-off-by: Zi Yan <ziy@nvidia.com>
Generally OK with the patch, but a bunch of comments below!
> ---
> include/linux/huge_mm.h | 28 +++++-----------------------
> mm/huge_memory.c | 9 +--------
> mm/truncate.c | 6 ++++--
> 3 files changed, 10 insertions(+), 33 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 8eec7a2a977b..9950cda1526a 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -394,34 +394,16 @@ static inline int split_huge_page_to_list_to_order(struct page *page, struct lis
> * Return: 0: split is successful, otherwise split failed.
> */
You need to update the kdoc too.
Also can you mention there this is the function you should use if you want
to specify an order?
Maybe we should rename this function to try_folio_split_to_order() to make
that completely explicit now that we're making other splitting logic always
split to order-0?
> static inline int try_folio_split(struct folio *folio, struct page *page,
> - struct list_head *list)
> + struct list_head *list, unsigned int order)
Is this target order? I see non_uniform_split_supported() calls this
new_order so maybe let's use the same naming so as not to confuse it with
the current folio order?
Also - nitty one, but should we put the order as 3rd arg rather than 4th?
As it seems it's normal to pass NULL list, and it's a bit weird to see a
NULL in the middle of the args.
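i.e., a signature sketch of what I have in mind (names purely illustrative):

static inline int try_folio_split_to_order(struct folio *folio,
		struct page *page, unsigned int new_order,
		struct list_head *list);

so the common call site would read try_folio_split_to_order(folio, split_at,
min_order, NULL), with the usually-NULL list trailing.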
> {
> - int ret = min_order_for_split(folio);
> -
> - if (ret < 0)
> - return ret;
OK so the point of removing this is that we assume in truncate (the only
user) that we already have this information (i.e. from
mapping_min_folio_order()) right?
> -
> - if (!non_uniform_split_supported(folio, 0, false))
> + if (!non_uniform_split_supported(folio, order, false))
While we're here can we make the mystery meat last param commented like:
if (!non_uniform_split_supported(folio, order, /* warns= */false))
> return split_huge_page_to_list_to_order(&folio->page, list,
> - ret);
> - return folio_split(folio, ret, page, list);
> + order);
> + return folio_split(folio, order, page, list);
> }
> static inline int split_huge_page(struct page *page)
> {
> - struct folio *folio = page_folio(page);
> - int ret = min_order_for_split(folio);
> -
> - if (ret < 0)
> - return ret;
> -
> - /*
> - * split_huge_page() locks the page before splitting and
> - * expects the same page that has been split to be locked when
> - * returned. split_folio(page_folio(page)) cannot be used here
> - * because it converts the page to folio and passes the head
> - * page to be split.
> - */
> - return split_huge_page_to_list_to_order(page, NULL, ret);
> + return split_huge_page_to_list_to_order(page, NULL, 0);
OK so the idea here is that callers would expect to split to 0 and the
> specific instance where we would actually want this behaviour of splitting
to a minimum order is now limited only to try_folio_split() (or
try_folio_split_to_order() if you rename)?
> }
> void deferred_split_folio(struct folio *folio, bool partially_mapped);
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 0fb4af604657..af06ee6d2206 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3829,8 +3829,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>
> min_order = mapping_min_folio_order(folio->mapping);
> if (new_order < min_order) {
> - VM_WARN_ONCE(1, "Cannot split mapped folio below min-order: %u",
> - min_order);
Why are we dropping this?
> ret = -EINVAL;
> goto out;
> }
> @@ -4173,12 +4171,7 @@ int min_order_for_split(struct folio *folio)
>
> int split_folio_to_list(struct folio *folio, struct list_head *list)
> {
> - int ret = min_order_for_split(folio);
> -
> - if (ret < 0)
> - return ret;
> -
> - return split_huge_page_to_list_to_order(&folio->page, list, ret);
> + return split_huge_page_to_list_to_order(&folio->page, list, 0);
> }
>
> /*
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 91eb92a5ce4f..1c15149ae8e9 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -194,6 +194,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
> size_t size = folio_size(folio);
> unsigned int offset, length;
> struct page *split_at, *split_at2;
> + unsigned int min_order;
>
> if (pos < start)
> offset = start - pos;
> @@ -223,8 +224,9 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
> if (!folio_test_large(folio))
> return true;
>
> + min_order = mapping_min_folio_order(folio->mapping);
> split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
> - if (!try_folio_split(folio, split_at, NULL)) {
> + if (!try_folio_split(folio, split_at, NULL, min_order)) {
> /*
> * try to split at offset + length to make sure folios within
> * the range can be dropped, especially to avoid memory waste
> @@ -254,7 +256,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
> */
> if (folio_test_large(folio2) &&
> folio2->mapping == folio->mapping)
> - try_folio_split(folio2, split_at2, NULL);
> + try_folio_split(folio2, split_at2, NULL, min_order);
>
> folio_unlock(folio2);
> out:
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
2025-10-15 14:25 ` Lorenzo Stoakes
@ 2025-10-15 22:57 ` Zi Yan
2025-10-16 8:34 ` Lorenzo Stoakes
0 siblings, 1 reply; 22+ messages in thread
From: Zi Yan @ 2025-10-15 22:57 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linmiaohe, david, jane.chu, kernel, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs, akpm, mcgrof, nao.horiguchi, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
On 15 Oct 2025, at 10:25, Lorenzo Stoakes wrote:
> On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
>> Page cache folios from a file system that support large block size (LBS)
>> can have minimal folio order greater than 0, thus a high order folio might
>> not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
>> folio in minimum folio order chunks") bumps the target order of
>> split_huge_page*() to the minimum allowed order when splitting a LBS folio.
>> This causes confusion for some split_huge_page*() callers like memory
>> failure handling code, since they expect after-split folios all have
>> order-0 when split succeeds but in reality get min_order_for_split() order
>> folios.
>>
>> Fix it by failing a split if the folio cannot be split to the target order.
>>
>> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
>> [The test poisons LBS folios, which cannot be split to order-0 folios, and
>> also tries to poison all memory. The non split LBS folios take more memory
>> than the test anticipated, leading to OOM. The patch fixed the kernel
>> warning and the test needs some change to avoid OOM.]
>> Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>
> Generally OK with the patch, but a bunch of comments below!
>
>> ---
>> include/linux/huge_mm.h | 28 +++++-----------------------
>> mm/huge_memory.c | 9 +--------
>> mm/truncate.c | 6 ++++--
>> 3 files changed, 10 insertions(+), 33 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 8eec7a2a977b..9950cda1526a 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -394,34 +394,16 @@ static inline int split_huge_page_to_list_to_order(struct page *page, struct lis
>> * Return: 0: split is successful, otherwise split failed.
>> */
>
> You need to update the kdoc too.
Done it locally.
>
> Also can you mention there this is the function you should use if you want
> to specify an order?
You mean min_order_for_split()? Sure.
>
> Maybe we should rename this function to try_folio_split_to_order() to make
> that completely explicit now that we're making other splitting logic always
> split to order-0?
Sure.
>
>> static inline int try_folio_split(struct folio *folio, struct page *page,
>> - struct list_head *list)
>> + struct list_head *list, unsigned int order)
>
> Is this target order? I see non_uniform_split_supported() calls this
> new_order so maybe let's use the same naming so as not to confuse it with
> the current folio order?
Sure, will rename it to new_order.
>
> Also - nitty one, but should we put the order as 3rd arg rather than 4th?
>
> As it seems it's normal to pass NULL list, and it's a bit weird to see a
> NULL in the middle of the args.
OK, will reorder the args.
>
>> {
>> - int ret = min_order_for_split(folio);
>> -
>> - if (ret < 0)
>> - return ret;
>
> OK so the point of removing this is that we assume in truncate (the only
> user) that we already have this information (i.e. from
> mapping_min_folio_order()) right?
Right.
>
>> -
>> - if (!non_uniform_split_supported(folio, 0, false))
>> + if (!non_uniform_split_supported(folio, order, false))
>
> While we're here can we make the mystery meat last param commented like:
>
> if (!non_uniform_split_supported(folio, order, /* warns= */false))
Sure.
>
>> return split_huge_page_to_list_to_order(&folio->page, list,
>> - ret);
>> - return folio_split(folio, ret, page, list);
>> + order);
>> + return folio_split(folio, order, page, list);
>> }
>> static inline int split_huge_page(struct page *page)
>> {
>> - struct folio *folio = page_folio(page);
>> - int ret = min_order_for_split(folio);
>> -
>> - if (ret < 0)
>> - return ret;
>> -
>> - /*
>> - * split_huge_page() locks the page before splitting and
>> - * expects the same page that has been split to be locked when
>> - * returned. split_folio(page_folio(page)) cannot be used here
>> - * because it converts the page to folio and passes the head
>> - * page to be split.
>> - */
>> - return split_huge_page_to_list_to_order(page, NULL, ret);
>> + return split_huge_page_to_list_to_order(page, NULL, 0);
>
> OK so the idea here is that callers would expect to split to 0 and the
> specific instance where we would actually want this behaviour of splittnig
> to a minimum order is now limited only to try_folio_split() (or
> try_folio_split_to_order() if you rename)?
>
Before commit e220917fa507 (the one to be fixed), split_huge_page() always
split @page to order-0; this patch just restores the original behavior.
If a caller wants to split to a different order, they should use
split_huge_page_to_list_to_order() (currently there is no such user except
the debugfs test code).
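e.g., purely illustrative (no in-tree user outside the debugfs test does this):

	/* explicitly ask for order-2 after-split folios instead of order-0 */
	ret = split_huge_page_to_list_to_order(page, NULL, 2);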
>> }
>> void deferred_split_folio(struct folio *folio, bool partially_mapped);
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 0fb4af604657..af06ee6d2206 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3829,8 +3829,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>
>> min_order = mapping_min_folio_order(folio->mapping);
>> if (new_order < min_order) {
>> - VM_WARN_ONCE(1, "Cannot split mapped folio below min-order: %u",
>> - min_order);
>
> Why are we dropping this?
This is used to catch “misuse” of split_huge_page_to_list_to_order(), when a
caller wants to split a LBS folio to an order smaller than
mapping_min_folio_order(). It is based on the assumption that split code
should never fail on a LBS folio. But that assumption is causing problems
like the reported memory failure one. So it is removed to allow split code
to fail without a warning if a LBS folio cannot be split to the new_order.
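With the warning gone, a caller that wants the smallest after-split folios the
mapping allows can do something like this (sketch only, not the exact hunk in
patch 2):

	int new_order = min_order_for_split(folio);
	int err;

	if (new_order < 0)
		return new_order;

	err = split_huge_page_to_list_to_order(page, NULL, new_order);
	if (err)
		return err;	/* folio stays large, handle it as such */

	/* after-split folios have new_order, which may still be > 0 */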
>
>> ret = -EINVAL;
>> goto out;
>> }
>
>> @@ -4173,12 +4171,7 @@ int min_order_for_split(struct folio *folio)
>>
>> int split_folio_to_list(struct folio *folio, struct list_head *list)
>> {
>> - int ret = min_order_for_split(folio);
>> -
>> - if (ret < 0)
>> - return ret;
>> -
>> - return split_huge_page_to_list_to_order(&folio->page, list, ret);
>> + return split_huge_page_to_list_to_order(&folio->page, list, 0);
>> }
>>
>> /*
>> diff --git a/mm/truncate.c b/mm/truncate.c
>> index 91eb92a5ce4f..1c15149ae8e9 100644
>> --- a/mm/truncate.c
>> +++ b/mm/truncate.c
>> @@ -194,6 +194,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>> size_t size = folio_size(folio);
>> unsigned int offset, length;
>> struct page *split_at, *split_at2;
>> + unsigned int min_order;
>>
>> if (pos < start)
>> offset = start - pos;
>> @@ -223,8 +224,9 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>> if (!folio_test_large(folio))
>> return true;
>>
>> + min_order = mapping_min_folio_order(folio->mapping);
>> split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
>> - if (!try_folio_split(folio, split_at, NULL)) {
>> + if (!try_folio_split(folio, split_at, NULL, min_order)) {
>> /*
>> * try to split at offset + length to make sure folios within
>> * the range can be dropped, especially to avoid memory waste
>> @@ -254,7 +256,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>> */
>> if (folio_test_large(folio2) &&
>> folio2->mapping == folio->mapping)
>> - try_folio_split(folio2, split_at2, NULL);
>> + try_folio_split(folio2, split_at2, NULL, min_order);
>>
>> folio_unlock(folio2);
>> out:
>> --
>> 2.51.0
>>
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
2025-10-15 22:57 ` Zi Yan
@ 2025-10-16 8:34 ` Lorenzo Stoakes
0 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-10-16 8:34 UTC (permalink / raw)
To: Zi Yan
Cc: linmiaohe, david, jane.chu, kernel, syzbot+e6367ea2fdab6ed46056,
syzkaller-bugs, akpm, mcgrof, nao.horiguchi, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Matthew Wilcox (Oracle),
linux-fsdevel, linux-kernel, linux-mm
On Wed, Oct 15, 2025 at 06:57:37PM -0400, Zi Yan wrote:
> On 15 Oct 2025, at 10:25, Lorenzo Stoakes wrote:
>
> > On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
> >> Page cache folios from a file system that support large block size (LBS)
> >> can have minimal folio order greater than 0, thus a high order folio might
> >> not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
> >> folio in minimum folio order chunks") bumps the target order of
> >> split_huge_page*() to the minimum allowed order when splitting a LBS folio.
> >> This causes confusion for some split_huge_page*() callers like memory
> >> failure handling code, since they expect after-split folios all have
> >> order-0 when split succeeds but in reality get min_order_for_split() order
> >> folios.
> >>
> >> Fix it by failing a split if the folio cannot be split to the target order.
> >>
> >> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
> >> [The test poisons LBS folios, which cannot be split to order-0 folios, and
> >> also tries to poison all memory. The non split LBS folios take more memory
> >> than the test anticipated, leading to OOM. The patch fixed the kernel
> >> warning and the test needs some change to avoid OOM.]
> >> Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com
> >> Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
> >> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >
> > Generally OK with the patch, but a bunch of comments below!
> >
> >> ---
> >> include/linux/huge_mm.h | 28 +++++-----------------------
> >> mm/huge_memory.c | 9 +--------
> >> mm/truncate.c | 6 ++++--
> >> 3 files changed, 10 insertions(+), 33 deletions(-)
> >>
> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >> index 8eec7a2a977b..9950cda1526a 100644
> >> --- a/include/linux/huge_mm.h
> >> +++ b/include/linux/huge_mm.h
> >> @@ -394,34 +394,16 @@ static inline int split_huge_page_to_list_to_order(struct page *page, struct lis
> >> * Return: 0: split is successful, otherwise split failed.
> >> */
> >
> > You need to update the kdoc too.
>
> Done it locally.
Thanks!
>
> >
> > Also can you mention there this is the function you should use if you want
> > to specify an order?
>
> You mean min_order_for_split()? Sure.
No, I mean try_folio_split_to_order() :)
But ofc this applies to min_order_for_split() also
>
> >
> > Maybe we should rename this function to try_folio_split_to_order() to make
> > that completely explicit now that we're making other splitting logic always
> > split to order-0?
>
> Sure.
Thanks
> >
> >> static inline int try_folio_split(struct folio *folio, struct page *page,
> >> - struct list_head *list)
> >> + struct list_head *list, unsigned int order)
> >
> > Is this target order? I see non_uniform_split_supported() calls this
> > new_order so maybe let's use the same naming so as not to confuse it with
> > the current folio order?
>
> Sure, will rename it to new_order.
Thanks
>
> >
> > Also - nitty one, but should we put the order as 3rd arg rather than 4th?
> >
> > As it seems it's normal to pass NULL list, and it's a bit weird to see a
> > NULL in the middle of the args.
>
> OK, will reorder the args.
Thanks
>
> >
> >> {
> >> - int ret = min_order_for_split(folio);
> >> -
> >> - if (ret < 0)
> >> - return ret;
> >
> > OK so the point of removing this is that we assume in truncate (the only
> > user) that we already have this information (i.e. from
> > mapping_min_folio_order()) right?
>
> Right.
>
> >
> >> -
> >> - if (!non_uniform_split_supported(folio, 0, false))
> >> + if (!non_uniform_split_supported(folio, order, false))
> >
> > While we're here can we make the mystery meat last param commented like:
> >
> > if (!non_uniform_split_supported(folio, order, /* warns= */false))
>
> Sure.
Thanks
>
> >
> >> return split_huge_page_to_list_to_order(&folio->page, list,
> >> - ret);
> >> - return folio_split(folio, ret, page, list);
> >> + order);
> >> + return folio_split(folio, order, page, list);
> >> }
> >> static inline int split_huge_page(struct page *page)
> >> {
> >> - struct folio *folio = page_folio(page);
> >> - int ret = min_order_for_split(folio);
> >> -
> >> - if (ret < 0)
> >> - return ret;
> >> -
> >> - /*
> >> - * split_huge_page() locks the page before splitting and
> >> - * expects the same page that has been split to be locked when
> >> - * returned. split_folio(page_folio(page)) cannot be used here
> >> - * because it converts the page to folio and passes the head
> >> - * page to be split.
> >> - */
> >> - return split_huge_page_to_list_to_order(page, NULL, ret);
> >> + return split_huge_page_to_list_to_order(page, NULL, 0);
> >
> > OK so the idea here is that callers would expect to split to 0 and the
> > specific instance where we would actually want this behaviour of splitting
> > to a minimum order is now limited only to try_folio_split() (or
> > try_folio_split_to_order() if you rename)?
> >
>
> Before commit e220917fa507 (the one to be fixed), split_huge_page() always
> split @page to order-0; this patch just restores the original behavior.
> If a caller wants to split to a different order, they should use
> split_huge_page_to_list_to_order() (currently there is no such user except
> the debugfs test code).
Yeah makes sense, though now they can also use try_folio_split_to_order() of
course!
>
> >> }
> >> void deferred_split_folio(struct folio *folio, bool partially_mapped);
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 0fb4af604657..af06ee6d2206 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -3829,8 +3829,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> >>
> >> min_order = mapping_min_folio_order(folio->mapping);
> >> if (new_order < min_order) {
> >> - VM_WARN_ONCE(1, "Cannot split mapped folio below min-order: %u",
> >> - min_order);
> >
> > Why are we dropping this?
>
> This is used to catch “misuse” of split_huge_page_to_list_to_order(), when a
> caller wants to split a LBS folio to an order smaller than
> mapping_min_folio_order(). It is based on the assumption that split code
> should never fail on a LBS folio. But that assumption is causing problems
> like the reported memory failure one. So it is removed to allow split code
> to fail without a warning if a LBS folio cannot be split to the new_order.
OK fair, we shouldn't be warning if this is something that can actually
reasonably happen.
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2025-10-16 8:34 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-10 17:39 [PATCH 0/2] Do not change split folio target order Zi Yan
2025-10-10 17:39 ` [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently Zi Yan
2025-10-10 18:02 ` Luis Chamberlain
2025-10-13 17:11 ` Zi Yan
2025-10-11 2:25 ` Lance Yang
2025-10-13 17:06 ` Zi Yan
2025-10-11 9:00 ` kernel test robot
2025-10-12 0:41 ` Wei Yang
2025-10-13 17:07 ` Zi Yan
2025-10-12 8:24 ` Pankaj Raghav (Samsung)
2025-10-13 17:11 ` Zi Yan
2025-10-15 14:25 ` Lorenzo Stoakes
2025-10-15 22:57 ` Zi Yan
2025-10-16 8:34 ` Lorenzo Stoakes
2025-10-10 17:39 ` [PATCH 2/2] mm/memory-failure: improve large block size folio handling Zi Yan
2025-10-10 18:05 ` Luis Chamberlain
2025-10-11 4:12 ` Miaohe Lin
2025-10-11 5:00 ` Matthew Wilcox
2025-10-11 9:07 ` Miaohe Lin
2025-10-13 17:04 ` Zi Yan
2025-10-11 10:23 ` kernel test robot
2025-10-13 17:08 ` Zi Yan