linux-mm.kvack.org archive mirror
* [PATCH v4 0/3] Only free healthy pages in high-order has_hwpoisoned folio
@ 2026-02-02 19:41 Jiaqi Yan
  2026-02-02 19:41 ` [PATCH v4 1/3] mm/page_alloc: only " Jiaqi Yan
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Jiaqi Yan @ 2026-02-02 19:41 UTC (permalink / raw)
  To: jackmanb, hannes, linmiaohe, ziy, harry.yoo, willy
  Cc: nao.horiguchi, david, lorenzo.stoakes, william.roche, tony.luck,
	wangkefeng.wang, jane.chu, akpm, osalvador, muchun.song,
	rientjes, duenwen, jthoughton, linux-mm, linux-kernel,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, boudewijn, Jiaqi Yan

At the end of dissolve_free_hugetlb_folio(), a free HugeTLB
folio becomes non-HugeTLB and is released to the buddy allocator
as a high-order folio, e.g. a folio that contains 262144 pages
if it was a 1G HugeTLB hugepage.

This is problematic if the HugeTLB hugepage contained HWPoison
subpages. In that case, since the buddy allocator does not check
HWPoison for non-zero-order folios, the raw HWPoison page can
be given out together with its buddy pages and be reused by
either the kernel or userspace.

Memory failure recovery (MFR) in the kernel does attempt to take
the raw HWPoison page off the buddy allocator after
dissolve_free_hugetlb_folio(). However, there is always a time
window between the moment dissolve_free_hugetlb_folio() frees a
HWPoison high-order folio to the buddy allocator and the moment
MFR takes the HWPoison raw page off it.

Another similar situation is when a transparent huge page (THP)
is handled by MFR but splitting failed. Such THP will eventually
be released to buddy allocator when owning userspace processes
are gone, but with certain subpages having HWPoison [9].

One obvious way to avoid both problems is to add page sanity
checks in the page allocation or free path. However, that goes
against past efforts to reduce sanity-check overhead [1,2,3].

Introduce free_has_hwpoisoned() to free only the healthy pages
and exclude the HWPoison ones in the high-order folio.
free_has_hwpoisoned() happens at the end of free_pages_prepare(),
which already deals with both decomposing the original compound
page and updating page metadata like the alloc tag and page owner.
For performance reasons, it is only applied when PG_has_hwpoisoned
indicates the folio contains HWPoison page(s).
The idea is to iterate through the sub-pages of the folio to
identify contiguous ranges of healthy pages. Instead of freeing
pages one by one, decompose healthy ranges into the largest
possible blocks. Each block is freed via free_one_page() directly.

free_has_hwpoisoned() has linear time complexity wrt the number
of pages in the folio. While the power-of-two decomposition
ensures that the number of calls to the buddy allocator is
logarithmic for each contiguous healthy range, the mandatory
linear scan of pages to identify PageHWPoison defines the
overall time complexity.
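
As a rough illustration of the decomposition (not part of this series):
the userspace-only sketch below mimics the order selection, using
compiler builtins instead of the kernel's ffs()/fls_long(), and printing
blocks instead of calling free_one_page(). The decompose() helper and
the example pfn values are made up for this sketch and assume a 64-bit
unsigned long.

#include <stdio.h>

/*
 * Toy model of the block decomposition: at each step pick the largest
 * power-of-two block that is both aligned at the current pfn and no
 * larger than the remaining range, then advance past it.
 */
static void decompose(unsigned long pfn, unsigned long nr_pages)
{
	while (nr_pages) {
		/* largest order allowed by the pfn alignment */
		unsigned int align_order = pfn ? __builtin_ctzl(pfn) : 63;
		/* largest order that still fits in the remaining range */
		unsigned int size_order = 63 - __builtin_clzl(nr_pages);
		unsigned int order = align_order < size_order ?
				     align_order : size_order;

		printf("free pfn %#lx order %u\n", pfn, order);
		pfn += 1UL << order;
		nr_pages -= 1UL << order;
	}
}

int main(void)
{
	/*
	 * Example: a 2M hugepage (512 pages at pfn 0x200) with pfn 0x207
	 * HWPoisoned. The healthy range before the poisoned page is freed
	 * as orders 2,1,0; the range after it as orders 3,4,5,6,7,8.
	 */
	decompose(0x200, 7);
	decompose(0x208, 504);
	return 0;
}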

I tested with some test-only code [4] and hugetlb-mfr [5], by
checking the status of the pcplist and freelist immediately after
dissolve_free_hugetlb_folio() dissolves a free 2M or 1G hugetlb
page that contains 1~8 HWPoison raw pages:

- HWPoison pages are excluded by free_has_hwpoisoned().

- Some healthy pages can be in zone->per_cpu_pageset (pcplist)
  because pcp_count is not high enough. Many healthy pages are
  in some order's zone->free_area[order].free_list (freelist).

- In rare cases, some healthy pages are in neither the pcplist
  nor the freelist. My best guess is that they were allocated
  before the test checked.

To illustrate the latency free_has_hwpoisoned() adds to the
memory freeing path, I measured its time cost with 8 HWPoison
pages using the instrumentation code in [4], over 20 sample runs:

- Has HWPoison path: mean=1448us, stdev=174us

- No HWPoison path: mean=66us, stdev=6us

free_has_hwpoisoned() costs around 22x the baseline. That is far
from triggering a soft lockup, and the cost is acceptable for
handling exceptional hardware memory errors.

With free_has_hwpoisoned() ensuring HWPoison pages never make it
into the buddy allocator, MFR no longer needs take_page_off_buddy()
after dissolving HWPoison hugepages. So replace __page_handle_poison()
with a new __hugepage_handle_poison() for HugeTLB-specific call sites.

Based on commit 8dfce8991b95d ("Merge tag 'pinctrl-v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl")

Changelog

v3 [8] -> v4:

- Address comments from Zi Yan, Miaohe Lin, Harry Yoo.

- Set has_hwpoisoned flag after introducing free_has_hwpoisoned().

- Unwrap free_pages_prepare_has_hwpoisoned() into free_pages_prepare().

- If the folio has HWPoison, its healthy pages will be freed with FPI_NONE
  right in free_pages_prepare(), which returns false to indicate the caller
  should not proceed with its own freeing action.

- Rework the commit on __page_handle_poison(). Only change the handling
  for HWPoison HugeTLB page, leaving free buddy page and soft offline
  handling alone.

v2 [7] -> v3:

- Address comments from Matthew Wilcox, Harry Yoo, Miaohe Lin.

- Let free_has_hwpoisoned() happen after free_pages_prepare(),
  which helps deal with decomposing the original compound page
  and with page metadata like the alloc tag and page owner.

- Tested with "page_owner=on" and CONFIG_MEM_ALLOC_PROFILING*=y.

- Wrap checking PG_has_hwpoisoned and free_has_hwpoisoned() into
  free_pages_prepare_has_hwpoisoned(), which replaces
  free_pages_prepare() calls in free_frozen_pages().

- Rename free_has_hwpoison_page() to free_has_hwpoisoned().

- Measure latency added by free_has_hwpoisoned().

- Ensure struct page *end is only used for pointer arithmetic,
  instead of being accessed as a page.

- Refactor page_handle_poison() instead of just __page_handle_poison().

v1 [6] -> v2:

- Total reimplementation based on discussions with Matthew Wilcox,
  Harry Yoo, Zi Yan, etc.

- hugetlb_free_hwpoison_folio() => free_has_hwpoison_pages().

- Utilize has_hwpoisoned flag to tell buddy allocator a high-order
  folio contains HWPoison.

- Simplify __page_handle_poison() given that the HWPoison page(s)
  won't be freed within high-order folio.

[1] https://lore.kernel.org/linux-mm/1460711275-1130-15-git-send-email-mgorman@techsingularity.net
[2] https://lore.kernel.org/linux-mm/1460711275-1130-16-git-send-email-mgorman@techsingularity.net
[3] https://lore.kernel.org/all/20230216095131.17336-1-vbabka@suse.cz
[4] https://drive.google.com/file/d/1CzJn1Cc4wCCm183Y77h244fyZIkTLzCt/view?usp=sharing
[5] https://lore.kernel.org/linux-mm/20251116013223.1557158-3-jiaqiyan@google.com
[6] https://lore.kernel.org/linux-mm/20251116014721.1561456-1-jiaqiyan@google.com
[7] https://lore.kernel.org/linux-mm/20251219183346.3627510-1-jiaqiyan@google.com
[8] https://lore.kernel.org/linux-mm/20260112004923.888429-1-jiaqiyan@google.com
[9] https://lore.kernel.org/linux-mm/20260113205441.506897-1-boudewijn@delta-utec.com

Jiaqi Yan (3):
  mm/page_alloc: only free healthy pages in high-order has_hwpoisoned
    folio
  mm/memory-failure: set has_hwpoisoned flags on dissolved HugeTLB folio
  mm/memory-failure: skip take_page_off_buddy after dissolving HWPoison
    HugeTLB page

 include/linux/page-flags.h |   2 +-
 mm/memory-failure.c        |  37 +++++++++--
 mm/page_alloc.c            | 133 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 163 insertions(+), 9 deletions(-)

-- 
2.53.0.rc2.204.g2597b5adb4-goog




* [PATCH v4 1/3] mm/page_alloc: only free healthy pages in high-order has_hwpoisoned folio
  2026-02-02 19:41 [PATCH v4 0/3] Only free healthy pages in high-order has_hwpoisoned folio Jiaqi Yan
@ 2026-02-02 19:41 ` Jiaqi Yan
  2026-02-02 19:41 ` [PATCH v4 2/3] mm/memory-failure: set has_hwpoisoned flags on dissolved HugeTLB folio Jiaqi Yan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Jiaqi Yan @ 2026-02-02 19:41 UTC (permalink / raw)
  To: jackmanb, hannes, linmiaohe, ziy, harry.yoo, willy
  Cc: nao.horiguchi, david, lorenzo.stoakes, william.roche, tony.luck,
	wangkefeng.wang, jane.chu, akpm, osalvador, muchun.song,
	rientjes, duenwen, jthoughton, linux-mm, linux-kernel,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, boudewijn, Jiaqi Yan

At the end of dissolve_free_hugetlb_folio(), a free HugeTLB folio
becomes non-HugeTLB, and it is released to the buddy allocator
as a high-order folio, e.g. a folio that contains 262144 pages
if the folio was a 1G HugeTLB hugepage.

This is problematic if the HugeTLB hugepage contained HWPoison
subpages. In that case, since the buddy allocator does not check
HWPoison for non-zero-order folios, the raw HWPoison page can
be given out together with its buddy pages and be reused by
either the kernel or userspace.

Memory failure recovery (MFR) in the kernel does attempt to take
the raw HWPoison page off the buddy allocator after
dissolve_free_hugetlb_folio(). However, there is always a time
window between the moment dissolve_free_hugetlb_folio() frees a
HWPoison high-order folio to the buddy allocator and the moment
MFR takes the HWPoison raw page off it.

Another similar situation is when a transparent huge page (THP)
runs into memory failure but splitting fails. Such a THP will
eventually be released to the buddy allocator when the owning
userspace processes are gone, but with certain subpages still
HWPoison.

One obvious way to avoid both problems is to add page sanity
checks in the page allocation or free path. However, that goes
against past efforts to reduce sanity-check overhead [1,2,3].

Introduce free_has_hwpoisoned() to only free the healthy pages
and to exclude the HWPoison ones in the high-order folio.
The idea is to iterate through the sub-pages of the folio to
identify contiguous ranges of healthy pages. Instead of freeing
pages one by one, decompose healthy ranges into the largest
possible blocks having different orders. Every block meets the
requirements to be freed via __free_one_page().

free_has_hwpoisoned() has linear time complexity wrt the number
of pages in the folio. While the power-of-two decomposition
ensures that the number of calls to the buddy allocator is
logarithmic for each contiguous healthy range, the mandatory
linear scan of pages to identify PageHWPoison() defines the
overall time complexity. For a 1G hugepage having several
HWPoison pages, free_has_hwpoisoned() takes around 2ms on
average.

Since free_has_hwpoisoned() has nontrivial overhead, it is
added to free_pages_prepare() as a shortcut and is only done
when PG_has_hwpoisoned indicates a HWPoison page exists and
after all checks and preparations have succeeded.

[1] https://lore.kernel.org/linux-mm/1460711275-1130-15-git-send-email-mgorman@techsingularity.net
[2] https://lore.kernel.org/linux-mm/1460711275-1130-16-git-send-email-mgorman@techsingularity.net
[3] https://lore.kernel.org/all/20230216095131.17336-1-vbabka@suse.cz

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 mm/page_alloc.c | 133 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 131 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cbf758e27aa2c..d6883f1b17d95 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -242,6 +242,7 @@ gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
 unsigned int pageblock_order __read_mostly;
 #endif
 
+static void free_has_hwpoisoned(struct page *page, unsigned int order);
 static void __free_pages_ok(struct page *page, unsigned int order,
 			    fpi_t fpi_flags);
 
@@ -1340,14 +1341,30 @@ static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
 
-__always_inline bool free_pages_prepare(struct page *page,
-			unsigned int order)
+/*
+ * Returns
+ * - true: checks and preparations all good, caller can proceed freeing.
+ * - false: do not proceed freeing for one of the two reasons:
+ *   1. Some check failed so it is not safe to proceed freeing.
+ *   2. A compound page having some HWPoison pages. The healthy pages
+ *      are already safely freed, and HWPoison ones isolated.
+ */
+__always_inline bool free_pages_prepare(struct page *page, unsigned int order)
 {
 	int bad = 0;
 	bool skip_kasan_poison = should_skip_kasan_poison(page);
 	bool init = want_init_on_free();
 	bool compound = PageCompound(page);
 	struct folio *folio = page_folio(page);
+	/*
+	 * When dealing with a compound page, PG_has_hwpoisoned is cleared
+	 * along with PAGE_FLAGS_SECOND, so the check must be done first.
+	 *
+	 * Note we can't exclude PG_has_hwpoisoned from PAGE_FLAGS_SECOND:
+	 * because PG_has_hwpoisoned == PG_active, free_page_is_bad() would
+	 * get confused and complain that the first tail page is still active.
+	 */
+	bool should_fhh = compound && folio_test_has_hwpoisoned(folio);
 
 	VM_BUG_ON_PAGE(PageTail(page), page);
 
@@ -1470,6 +1487,16 @@ __always_inline bool free_pages_prepare(struct page *page,
 
 	debug_pagealloc_unmap_pages(page, 1 << order);
 
+	/*
+	 * After breaking down the compound page and dealing with page metadata
+	 * (e.g. page owner and page alloc tags), take a shortcut if this
+	 * was a compound page containing some HWPoison subpages.
+	 */
+	if (should_fhh) {
+		free_has_hwpoisoned(page, order);
+		return false;
+	}
+
 	return true;
 }
 
@@ -2953,6 +2980,108 @@ static bool free_frozen_page_commit(struct zone *zone,
 	return ret;
 }
 
+/*
+ * Given a range of physically contiguous pages, efficiently free them
+ * block by block. Block order is chosen to meet the PFN alignment
+ * requirement in __free_one_page().
+ */
+static void free_contiguous_pages(struct page *curr,
+				  unsigned long nr_pages)
+{
+	unsigned int order;
+	unsigned int align_order;
+	unsigned int size_order;
+	unsigned long remaining;
+	unsigned long pfn = page_to_pfn(curr);
+	const unsigned long end_pfn = pfn + nr_pages;
+	struct zone *zone = page_zone(curr);
+
+	/*
+	 * This decomposition algorithm at every iteration chooses the
+	 * order to be the minimum of two constraints:
+	 * - Alignment: the largest power-of-two that divides current pfn.
+	 * - Size: the largest power-of-two that fits in the current
+	 *   remaining number of pages.
+	 */
+	while (pfn < end_pfn) {
+		remaining = end_pfn - pfn;
+		align_order = ffs(pfn) - 1;
+		size_order = fls_long(remaining) - 1;
+		order = min(align_order, size_order);
+
+		free_one_page(zone, curr, pfn, order, FPI_NONE);
+		curr += (1UL << order);
+		pfn += (1UL << order);
+	}
+
+	VM_WARN_ON(pfn != end_pfn);
+}
+
+/*
+ * Given a high-order compound page containing a certain number of HWPoison
+ * pages, free only the healthy ones assuming FPI_NONE.
+ *
+ * Pages must have passed free_pages_prepare(). Even with HWPoison pages
+ * present, breaking down the compound page and updating metadata (e.g. page
+ * owner, alloc tag) can be done together during free_pages_prepare(),
+ * which simplifies the splitting here: unlike __split_unmapped_folio(),
+ * there is no need to turn split pages into a compound page or to carry
+ * metadata.
+ *
+ * It calls free_one_page() O(2^order) times and causes nontrivial overhead.
+ * So only use this when the compound page really contains HWPoison.
+ *
+ * This implementation doesn't work in memdesc world.
+ */
+static void free_has_hwpoisoned(struct page *page, unsigned int order)
+{
+	struct page *curr = page;
+	struct page *next;
+	unsigned long nr_pages;
+	/*
+	 * Don't assume end points to a valid page. It is only used
+	 * here for pointer arithmetic.
+	 */
+	struct page *end = page + (1 << order);
+	unsigned long total_freed = 0;
+	unsigned long total_hwp = 0;
+
+	VM_WARN_ON(order == 0);
+	VM_WARN_ON(page->flags.f & PAGE_FLAGS_CHECK_AT_PREP);
+
+	while (curr < end) {
+		next = curr;
+		nr_pages = 0;
+
+		while (next < end && !PageHWPoison(next)) {
+			++next;
+			++nr_pages;
+		}
+
+		if (next != end && PageHWPoison(next)) {
+			/*
+			 * Avoid accounting error when the page is freed
+			 * by unpoison_memory().
+			 */
+			clear_page_tag_ref(next);
+			++total_hwp;
+		}
+
+		free_contiguous_pages(curr, nr_pages);
+		total_freed += nr_pages;
+
+		if (next == end)
+			break;
+
+		VM_WARN_ON(!PageHWPoison(next));
+		curr = next + 1;
+	}
+
+	VM_WARN_ON(total_freed + total_hwp != (1 << order));
+	pr_info("Freed %#lx pages, excluded %lu hwpoison pages\n",
+		total_freed, total_hwp);
+}
+
 /*
  * Free a pcp page
  */
-- 
2.53.0.rc2.204.g2597b5adb4-goog




* [PATCH v4 2/3] mm/memory-failure: set has_hwpoisoned flags on dissolved HugeTLB folio
  2026-02-02 19:41 [PATCH v4 0/3] Only free healthy pages in high-order has_hwpoisoned folio Jiaqi Yan
  2026-02-02 19:41 ` [PATCH v4 1/3] mm/page_alloc: only " Jiaqi Yan
@ 2026-02-02 19:41 ` Jiaqi Yan
  2026-02-02 19:41 ` [PATCH v4 3/3] mm/memory-failure: skip take_page_off_buddy after dissolving HWPoison HugeTLB page Jiaqi Yan
  2026-02-04 15:23 ` [PATCH v4 0/3] Only free healthy pages in high-order has_hwpoisoned folio Boudewijn van der Heide
  3 siblings, 0 replies; 7+ messages in thread
From: Jiaqi Yan @ 2026-02-02 19:41 UTC (permalink / raw)
  To: jackmanb, hannes, linmiaohe, ziy, harry.yoo, willy
  Cc: nao.horiguchi, david, lorenzo.stoakes, william.roche, tony.luck,
	wangkefeng.wang, jane.chu, akpm, osalvador, muchun.song,
	rientjes, duenwen, jthoughton, linux-mm, linux-kernel,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, boudewijn, Jiaqi Yan

When a free HWPoison HugeTLB folio is dissolved, it becomes
non-HugeTLB and is released to buddy allocator as a high-order
folio.

Set has_hwpoisoned flags on the high-order folio so that the buddy
allocator can tell that it contains some HWPoison page(s) and can
handle it specially with free_has_hwpoisoned().

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 include/linux/page-flags.h | 2 +-
 mm/memory-failure.c        | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f7a0e4af0c734..d13835e265952 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -904,7 +904,7 @@ static inline int PageTransCompound(const struct page *page)
 TESTPAGEFLAG_FALSE(TransCompound, transcompound)
 #endif
 
-#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+#if defined(CONFIG_MEMORY_FAILURE) && (defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLB_PAGE))
 /*
  * PageHasHWPoisoned indicates that at least one subpage is hwpoisoned in the
  * compound page.
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c80c2907da333..529a83a325740 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1952,6 +1952,7 @@ void folio_clear_hugetlb_hwpoison(struct folio *folio)
 	if (folio_test_hugetlb_vmemmap_optimized(folio))
 		return;
 	folio_clear_hwpoison(folio);
+	folio_set_has_hwpoisoned(folio);
 	folio_free_raw_hwp(folio, true);
 }
 
-- 
2.53.0.rc2.204.g2597b5adb4-goog




* [PATCH v4 3/3] mm/memory-failure: skip take_page_off_buddy after dissolving HWPoison HugeTLB page
  2026-02-02 19:41 [PATCH v4 0/3] Only free healthy pages in high-order has_hwpoisoned folio Jiaqi Yan
  2026-02-02 19:41 ` [PATCH v4 1/3] mm/page_alloc: only " Jiaqi Yan
  2026-02-02 19:41 ` [PATCH v4 2/3] mm/memory-failure: set has_hwpoisoned flags on dissolved HugeTLB folio Jiaqi Yan
@ 2026-02-02 19:41 ` Jiaqi Yan
  2026-02-04 15:23 ` [PATCH v4 0/3] Only free healthy pages in high-order has_hwpoisoned folio Boudewijn van der Heide
  3 siblings, 0 replies; 7+ messages in thread
From: Jiaqi Yan @ 2026-02-02 19:41 UTC (permalink / raw)
  To: jackmanb, hannes, linmiaohe, ziy, harry.yoo, willy
  Cc: nao.horiguchi, david, lorenzo.stoakes, william.roche, tony.luck,
	wangkefeng.wang, jane.chu, akpm, osalvador, muchun.song,
	rientjes, duenwen, jthoughton, linux-mm, linux-kernel,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, boudewijn, Jiaqi Yan

Now that HWPoison subpage(s) within a HugeTLB page will be rejected
by the buddy allocator during dissolve_free_hugetlb_folio(), there is
no need to drain_all_pages() and take_page_off_buddy() anymore. In
fact, calling take_page_off_buddy() after dissolve_free_hugetlb_folio()
succeeded returns false, making the caller think __page_handle_poison()
failed.

Add __hugepage_handle_poison() and replace __page_handle_poison() at
HugeTLB-specific call sites. The HugeTLB page being handled is either
free at the moment of try_memory_failure_hugetlb(), or becomes free
at the moment of me_huge_page().

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 529a83a325740..58b34f5d2c05d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -163,6 +163,30 @@ static struct rb_root_cached pfn_space_itree = RB_ROOT_CACHED;
 static DEFINE_MUTEX(pfn_space_lock);
 
 /*
+ * Only for a HugeTLB page being handled by memory_failure(). The key
+ * difference from soft_offline() is that no HWPoison subpage will make it
+ * into the buddy allocator after a successful dissolve_free_hugetlb_folio(),
+ * so take_page_off_buddy() is unnecessary.
+ */
+static int __hugepage_handle_poison(struct page *page)
+{
+	struct folio *folio = page_folio(page);
+
+	VM_WARN_ON_FOLIO(!folio_test_hwpoison(folio), folio);
+
+	/*
+	 * Can't use dissolve_free_hugetlb_folio() without a reliable
+	 * raw_hwp_list telling which subpage is HWPoison.
+	 */
+	if (folio_test_hugetlb_raw_hwp_unreliable(folio))
+		/* raw_hwp_list becomes unreliable when kmalloc() fails. */
+		return -ENOMEM;
+
+	return dissolve_free_hugetlb_folio(folio);
+}
+
+/*
+ * Only for a free or HugeTLB page being handled by soft_offline().
  * Return values:
  *   1:   the page is dissolved (if needed) and taken off from buddy,
  *   0:   the page is dissolved (if needed) and not taken off from buddy,
@@ -1174,11 +1198,11 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 		 * subpages.
 		 */
 		folio_put(folio);
-		if (__page_handle_poison(p) > 0) {
+		if (__hugepage_handle_poison(p)) {
+			res = MF_FAILED;
+		} else {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
-		} else {
-			res = MF_FAILED;
 		}
 	}
 
@@ -2067,11 +2091,11 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 	 */
 	if (res == 0) {
 		folio_unlock(folio);
-		if (__page_handle_poison(p) > 0) {
+		if (__hugepage_handle_poison(p)) {
+			res = MF_FAILED;
+		} else {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
-		} else {
-			res = MF_FAILED;
 		}
 		return action_result(pfn, MF_MSG_FREE_HUGE, res);
 	}
-- 
2.53.0.rc2.204.g2597b5adb4-goog




* Re: [PATCH v4 0/3] Only free healthy pages in high-order has_hwpoisoned folio
  2026-02-02 19:41 [PATCH v4 0/3] Only free healthy pages in high-order has_hwpoisoned folio Jiaqi Yan
                   ` (2 preceding siblings ...)
  2026-02-02 19:41 ` [PATCH v4 3/3] mm/memory-failure: skip take_page_off_buddy after dissolving HWPoison HugeTLB page Jiaqi Yan
@ 2026-02-04 15:23 ` Boudewijn van der Heide
  2026-02-04 15:48   ` Zi Yan
  3 siblings, 1 reply; 7+ messages in thread
From: Boudewijn van der Heide @ 2026-02-04 15:23 UTC (permalink / raw)
  To: jiaqiyan
  Cc: Liam.Howlett, akpm, boudewijn, david, duenwen, hannes, harry.yoo,
	jackmanb, jane.chu, jthoughton, linmiaohe, linux-kernel,
	linux-mm, lorenzo.stoakes, mhocko, muchun.song, nao.horiguchi,
	osalvador, rientjes, rppt, surenb, tony.luck, vbabka,
	wangkefeng.wang, william.roche, willy, ziy

Hi Jiaqi,
Thanks for including the THP scenario.

> Another similar situation is when a transparent huge page (THP)
> is handled by MFR but splitting failed. Such THP will eventually
> be released to buddy allocator when owning userspace processes
> are gone, but with certain subpages having HWPoison [9].

I think for failed-split THP, we need to do the following to support them:

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index cf0d526e6d41..3f727038f400 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2479,6 +2479,7 @@ int memory_failure(unsigned long pfn, int flags)
 		if (err || new_order) {
 			/* get folio again in case the original one is split */
 			folio = page_folio(p);
+			folio_set_has_hwpoisoned(folio);
 			res = -EHWPOISON;
 			kill_procs_now(p, pfn, flags, folio);
 			put_page(p);

We set the PG_has_hwpoisoned flag here again,
because when the split partially succeeds (new_order > 0),
page_folio(p) returns a new smaller-order folio that doesn't have the flag set. 
Without this, when the THP is eventually freed, 
free_pages_prepare() won't see the flag 
and HWPoison subpages could enter the buddy allocator. 

This aligns with what Miaohe mentioned in the earlier discussion [1]:

<quote Miaohe>

IMHO, it's enough to handle poisoned sub-pages when in-use or split-failed THP
eventually be released to the buddy.

</quote Miaohe>

Would you prefer to add this to your series,
or should I send a separate follow-up patch?

[1] https://lore.kernel.org/linux-mm/20260113205441.506897-1-boudewijn@delta-utec.com/

Kind regards,
Boudewijn



* Re: [PATCH v4 0/3] Only free healthy pages in high-order has_hwpoisoned folio
  2026-02-04 15:23 ` [PATCH v4 0/3] Only free healthy pages in high-order has_hwpoisoned folio Boudewijn van der Heide
@ 2026-02-04 15:48   ` Zi Yan
  2026-02-06 16:16     ` Boudewijn van der Heide
  0 siblings, 1 reply; 7+ messages in thread
From: Zi Yan @ 2026-02-04 15:48 UTC (permalink / raw)
  To: Boudewijn van der Heide
  Cc: jiaqiyan, Liam.Howlett, akpm, david, duenwen, hannes, harry.yoo,
	jackmanb, jane.chu, jthoughton, linmiaohe, linux-kernel,
	linux-mm, lorenzo.stoakes, mhocko, muchun.song, nao.horiguchi,
	osalvador, rientjes, rppt, surenb, tony.luck, vbabka,
	wangkefeng.wang, william.roche, willy

On 4 Feb 2026, at 10:23, Boudewijn van der Heide wrote:

> Hi Jiaqi,
> Thanks for including the THP scenario.
>
>> Another similar situation is when a transparent huge page (THP)
>> is handled by MFR but splitting failed. Such THP will eventually
>> be released to buddy allocator when owning userspace processes
>> are gone, but with certain subpages having HWPoison [9].
>
> I think for failed-split THP, we need to do the following to support them:
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index cf0d526e6d41..3f727038f400 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2479,6 +2479,7 @@ int memory_failure(unsigned long pfn, int flags)
>  		if (err || new_order) {
>  			/* get folio again in case the original one is split */
>  			folio = page_folio(p);
> +			folio_set_has_hwpoisoned(folio);
>  			res = -EHWPOISON;
>  			kill_procs_now(p, pfn, flags, folio);
>  			put_page(p);
>
> We set the PG_has_hwpoison flag here again,
> because when the split partially succeeds (new_order > 0),
> page_folio(p) returns a new smaller-order folio that doesn't have the flag set.

No, the split code [1] sets the has_hwpoisoned flag when it splits a folio
into non-0 order folios. If you do not see it in your kernel code, you might
need to update your kernel to the latest stable one, or to one newer than or
equal to 6.18.


[1] https://elixir.bootlin.com/linux/v6.18/source/mm/huge_memory.c#L3343


Best Regards,
Yan, Zi



* Re: [PATCH v4 0/3] Only free healthy pages in high-order has_hwpoisoned folio
  2026-02-04 15:48   ` Zi Yan
@ 2026-02-06 16:16     ` Boudewijn van der Heide
  0 siblings, 0 replies; 7+ messages in thread
From: Boudewijn van der Heide @ 2026-02-06 16:16 UTC (permalink / raw)
  To: ziy
  Cc: Liam.Howlett, akpm, boudewijn, david, duenwen, hannes, harry.yoo,
	jackmanb, jane.chu, jiaqiyan, jthoughton, linmiaohe,
	linux-kernel, linux-mm, lorenzo.stoakes, mhocko, muchun.song,
	nao.horiguchi, osalvador, rientjes, rppt, surenb, tony.luck,
	vbabka, wangkefeng.wang, william.roche, willy

Hi Yan, Zi,

> No, the split code [1] sets the has_hwpoisoned flag when it splits a folio
> into non-0 order folios. If you do not see it in your kernel code, you might
> need to update your kernel to the latest stable one, or to one newer than or
> equal to 6.18.

Thanks for pointing out the split code. I went through the callstacks
again and I agree it propagates the flag correctly.
I missed this initially.

Kind regards,
Boudewijn


