* [PATCH v3 0/3] Only free healthy pages in high-order has_hwpoisoned folio
@ 2026-01-12  0:49 Jiaqi Yan
  2026-01-12  0:49 ` [PATCH v3 1/3] mm/memory-failure: set has_hwpoisoned flags on HugeTLB folio Jiaqi Yan
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Jiaqi Yan @ 2026-01-12  0:49 UTC (permalink / raw)
  To: jackmanb, hannes, linmiaohe, ziy, harry.yoo, willy
  Cc: nao.horiguchi, david, lorenzo.stoakes, william.roche, tony.luck,
	wangkefeng.wang, jane.chu, akpm, osalvador, muchun.song,
	rientjes, duenwen, jthoughton, linux-mm, linux-kernel,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, Jiaqi Yan

At the end of dissolve_free_hugetlb_folio(), a free HugeTLB
folio becomes non-HugeTLB and is released to the buddy allocator
as a high-order folio, e.g. a folio that contains 262144 pages
if the folio was a 1G HugeTLB hugepage.

This is problematic if the HugeTLB hugepage contained HWPoison
subpages. In that case, since the buddy allocator does not check
HWPoison for non-zero-order folios, the raw HWPoison page can
be handed out together with its buddy pages and be re-used by
either the kernel or userspace.

Memory failure recovery (MFR) in the kernel does attempt to take
the raw HWPoison page off the buddy allocator after
dissolve_free_hugetlb_folio(). However, there is always a time
window between dissolve_free_hugetlb_folio() freeing a HWPoison
high-order folio to the buddy allocator and MFR taking the
HWPoison raw page off it.

One obvious way to avoid this problem is to add page sanity
checks in the page allocation or free path. However, that goes
against past efforts to reduce sanity check overhead [1,2,3].

Introduce free_has_hwpoisoned() to "salvage" the healthy pages
and exclude the HWPoison ones in the high-order folio.
free_has_hwpoisoned() happens after free_pages_prepare(),
which already handles both decomposing the original compound
page and updating page metadata like the alloc tag and page owner.
The idea is to iterate through the sub-pages of the folio to
identify contiguous ranges of healthy pages. Instead of freeing
pages one by one, decompose healthy ranges into the largest
possible blocks. Each block is freed directly via free_one_page().

free_has_hwpoisoned() has linear time complexity with respect to
the number of pages in the folio. While the power-of-two
decomposition ensures that the number of calls to the buddy
allocator is logarithmic for each contiguous healthy range, the
mandatory linear scan of pages to identify PageHWPoison determines
the overall time complexity.
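
To make the decomposition concrete, below is a minimal userspace
sketch of the per-range loop (the kernel helper
free_contiguous_pages() in patch 2 is the authoritative version;
the starting pfn and the ffs_long()/fls_long() stand-ins here are
illustrative only):

#include <stdio.h>

/* Userspace stand-ins for the kernel's ffs()/fls_long(). */
static unsigned int ffs_long(unsigned long x)
{
	return __builtin_ffsl(x);
}

static unsigned int fls_long(unsigned long x)
{
	return x ? 8 * sizeof(long) - __builtin_clzl(x) : 0;
}

int main(void)
{
	/* Hypothetical healthy range: 7 pages starting at pfn 0x40001. */
	unsigned long pfn = 0x40001;
	const unsigned long end_pfn = pfn + 7;

	while (pfn < end_pfn) {
		unsigned long remaining = end_pfn - pfn;
		unsigned int align_order = ffs_long(pfn) - 1;
		unsigned int size_order = fls_long(remaining) - 1;
		unsigned int order = align_order < size_order ?
				     align_order : size_order;

		/* The kernel would call free_one_page() here. */
		printf("free block at pfn %#lx, order %u\n", pfn, order);
		pfn += 1UL << order;
	}

	return 0;
}

For this range the sketch emits blocks of order 0, 1 and 2
(1 + 2 + 4 pages), i.e. a logarithmic number of buddy calls for
one contiguous healthy range.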

I tested with some test-only code [4] and hugetlb-mfr [5], by
checking the status of the pcplist and freelist immediately after
dissolve_free_hugetlb_folio() on a free 2M or 1G hugetlb page that
contains 1~8 HWPoison raw pages:

- HWPoison pages are excluded by free_has_hwpoisoned().

- Some healthy pages can be in zone->per_cpu_pageset (pcplist)
  because pcp_count is not high enough. Many healthy pages are
  in some order's zone->free_area[order].free_list (freelist).

- In rare cases, some healthy pages are in neither the pcplist
  nor the freelist. My best guess is that they were allocated
  before the test checked.

To illustrate the latency free_has_hwpoisoned() adds to the
memory freeing path, I measured its time cost with 8 HWPoison
pages using the instrumentation code in [4], over 20 sample runs:

- Has HWPoison path: mean=2.02ms, stdev=0.14ms

- No HWPoison path: mean=66us, stdev=6us

free_has_hwpoisoned() is around 30x the baseline. This is far from
triggering a soft lockup, and the cost is fair for handling
exceptional hardware memory errors.

Given this nontrivial overhead, checking PG_has_hwpoisoned, doing
the normal free_pages_prepare(), and doing free_has_hwpoisoned() when
necessary are wrapped in free_pages_prepare_has_hwpoisoned(), which
replaces the free_pages_prepare() calls in free_frozen_pages().

With free_has_hwpoisoned() ensuring HWPoison pages never make it
into the buddy allocator, MFR no longer needs take_page_off_buddy()
after dissolving HWPoison hugepages. So refactor page_handle_poison()
to drop take_page_off_buddy() in the hugepage case, while keeping
take_page_off_buddy() in the free buddy page case.

Based on commit ccd1cdca5cd4 ("Merge tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux")

Changelog

v2 [7] -> v3:

- Address comments from Matthew Wilcox, Harry Yoo, and Miaohe Lin.

- Let free_has_hwpoisoned() happen after free_pages_prepare(),
  which helps with decomposing the original compound page and
  with page metadata like the alloc tag and page owner.

- Tested with "page_owner=on" and CONFIG_MEM_ALLOC_PROFILING*=y.

- Wrap checking PG_has_hwpoisoned and free_has_hwpoisoned() into
  free_pages_prepare_has_hwpoisoned(), which replaces
  free_pages_prepare() calls in free_frozen_pages().

- Rename free_has_hwpoison_page() to free_has_hwpoisoned().

- Measure latency added by free_has_hwpoisoned().

- Ensure struct page *end is only used for pointer arithmetic,
  instead of accessed as page.

- Refactor page_handle_poison() instead of just __page_handle_poison().

v1 [6] -> v2:

- Total reimplementation based on discussions with Matthew Wilcox,
  Harry Yoo, Zi Yan, etc.

- hugetlb_free_hwpoison_folio => free_has_hwpoison_pages.

- Utilize has_hwpoisoned flag to tell buddy allocator a high-order
  folio contains HWPoison.

- Simplify __page_handle_poison given that the HWPoison page(s)
  won't be freed within high-order folio.

[1] https://lore.kernel.org/linux-mm/1460711275-1130-15-git-send-email-mgorman@techsingularity.net
[2] https://lore.kernel.org/linux-mm/1460711275-1130-16-git-send-email-mgorman@techsingularity.net
[3] https://lore.kernel.org/all/20230216095131.17336-1-vbabka@suse.cz
[4] https://drive.google.com/file/d/1CzJn1Cc4wCCm183Y77h244fyZIkTLzCt/view?usp=sharing
[5] https://lore.kernel.org/linux-mm/20251116013223.1557158-3-jiaqiyan@google.com
[6] https://lore.kernel.org/linux-mm/20251116014721.1561456-1-jiaqiyan@google.com
[7] https://lore.kernel.org/linux-mm/20251219183346.3627510-1-jiaqiyan@google.com

Jiaqi Yan (3):
  mm/memory-failure: set has_hwpoisoned flags on HugeTLB folio
  mm/page_alloc: only free healthy pages in high-order has_hwpoisoned
    folio
  mm/memory-failure: refactor page_handle_poison()

 include/linux/page-flags.h |   2 +-
 mm/memory-failure.c        |  85 ++++++++++----------
 mm/page_alloc.c            | 157 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 197 insertions(+), 47 deletions(-)

-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v3 1/3] mm/memory-failure: set has_hwpoisoned flags on HugeTLB folio
  2026-01-12  0:49 [PATCH v3 0/3] Only free healthy pages in high-order has_hwpoisoned folio Jiaqi Yan
@ 2026-01-12  0:49 ` Jiaqi Yan
  2026-01-12  2:50   ` Zi Yan
  2026-01-12  0:49 ` [PATCH v3 2/3] mm/page_alloc: only free healthy pages in high-order has_hwpoisoned folio Jiaqi Yan
  2026-01-12  0:49 ` [PATCH v3 3/3] mm/memory-failure: refactor page_handle_poison() Jiaqi Yan
  2 siblings, 1 reply; 5+ messages in thread
From: Jiaqi Yan @ 2026-01-12  0:49 UTC (permalink / raw)
  To: jackmanb, hannes, linmiaohe, ziy, harry.yoo, willy
  Cc: nao.horiguchi, david, lorenzo.stoakes, william.roche, tony.luck,
	wangkefeng.wang, jane.chu, akpm, osalvador, muchun.song,
	rientjes, duenwen, jthoughton, linux-mm, linux-kernel,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, Jiaqi Yan

When a free HWPoison HugeTLB folio is dissolved, it becomes
non-HugeTLB and is released to the buddy allocator as a high-order
folio.

Set the has_hwpoisoned flag on the high-order folio so that the
buddy allocator can tell that it contains some HWPoison page(s).
This is a preparatory change for the buddy allocator to handle
such high-order HWPoison folios differently.

This cannot be done with the per-page hwpoison flag, because then
users could not tell this case apart from the case where the page
carrying hwpoison is itself hardware corrupted.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 include/linux/page-flags.h | 2 +-
 mm/memory-failure.c        | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f7a0e4af0c734..d13835e265952 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -904,7 +904,7 @@ static inline int PageTransCompound(const struct page *page)
 TESTPAGEFLAG_FALSE(TransCompound, transcompound)
 #endif
 
-#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+#if defined(CONFIG_MEMORY_FAILURE) && (defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLB_PAGE))
 /*
  * PageHasHWPoisoned indicates that at least one subpage is hwpoisoned in the
  * compound page.
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fbc5a01260c89..d204de6c9792a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1952,6 +1952,7 @@ void folio_clear_hugetlb_hwpoison(struct folio *folio)
 	if (folio_test_hugetlb_vmemmap_optimized(folio))
 		return;
 	folio_clear_hwpoison(folio);
+	folio_set_has_hwpoisoned(folio);
 	folio_free_raw_hwp(folio, true);
 }
 
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v3 2/3] mm/page_alloc: only free healthy pages in high-order has_hwpoisoned folio
  2026-01-12  0:49 [PATCH v3 0/3] Only free healthy pages in high-order has_hwpoisoned folio Jiaqi Yan
  2026-01-12  0:49 ` [PATCH v3 1/3] mm/memory-failure: set has_hwpoisoned flags on HugeTLB folio Jiaqi Yan
@ 2026-01-12  0:49 ` Jiaqi Yan
  2026-01-12  0:49 ` [PATCH v3 3/3] mm/memory-failure: refactor page_handle_poison() Jiaqi Yan
  2 siblings, 0 replies; 5+ messages in thread
From: Jiaqi Yan @ 2026-01-12  0:49 UTC (permalink / raw)
  To: jackmanb, hannes, linmiaohe, ziy, harry.yoo, willy
  Cc: nao.horiguchi, david, lorenzo.stoakes, william.roche, tony.luck,
	wangkefeng.wang, jane.chu, akpm, osalvador, muchun.song,
	rientjes, duenwen, jthoughton, linux-mm, linux-kernel,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, Jiaqi Yan

At the end of dissolve_free_hugetlb_folio(), a free HugeTLB folio
becomes non-HugeTLB and is released to the buddy allocator
as a high-order folio, e.g. a folio that contains 262144 pages
if the folio was a 1G HugeTLB hugepage.

This is problematic if the HugeTLB hugepage contained HWPoison
subpages. In that case, since the buddy allocator does not check
HWPoison for non-zero-order folios, the raw HWPoison page can
be handed out together with its buddy pages and be re-used by
either the kernel or userspace.

Memory failure recovery (MFR) in the kernel does attempt to take
the raw HWPoison page off the buddy allocator after
dissolve_free_hugetlb_folio(). However, there is always a time
window between dissolve_free_hugetlb_folio() freeing a HWPoison
high-order folio to the buddy allocator and MFR taking the
HWPoison raw page off it.

One obvious way to avoid this problem is to add page sanity
checks in the page allocation or free path. However, that goes
against past efforts to reduce sanity check overhead [1,2,3].

Introduce free_has_hwpoisoned() to only free the healthy pages
and to exclude the HWPoison ones in the high-order folio.
The idea is to iterate through the sub-pages of the folio to
identify contiguous ranges of healthy pages. Instead of freeing
pages one by one, decompose healthy ranges into the largest
possible blocks having different orders. Every block meets the
requirements to be freed via __free_one_page().

free_has_hwpoisoned() has linear time complexity with respect to
the number of pages in the folio. While the power-of-two
decomposition ensures that the number of calls to the buddy
allocator is logarithmic for each contiguous healthy range, the
mandatory linear scan of pages to identify PageHWPoison()
determines the overall time complexity. For a 1G hugepage with
several HWPoison pages, free_has_hwpoisoned() takes around 2ms on
average.

Since free_has_hwpoisoned() has nontrivial overhead, it is
wrapped inside free_pages_prepare_has_hwpoisoned() and done
only when PG_has_hwpoisoned indicates a HWPoison page exists,
and only after free_pages_prepare() has succeeded.

[1] https://lore.kernel.org/linux-mm/1460711275-1130-15-git-send-email-mgorman@techsingularity.net
[2] https://lore.kernel.org/linux-mm/1460711275-1130-16-git-send-email-mgorman@techsingularity.net
[3] https://lore.kernel.org/all/20230216095131.17336-1-vbabka@suse.cz

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 mm/page_alloc.c | 157 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 154 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 822e05f1a9646..9393589118604 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -215,6 +215,9 @@ gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
 unsigned int pageblock_order __read_mostly;
 #endif
 
+static bool free_pages_prepare_has_hwpoisoned(struct page *page,
+					      unsigned int order,
+					      fpi_t fpi_flags);
 static void __free_pages_ok(struct page *page, unsigned int order,
 			    fpi_t fpi_flags);
 
@@ -1568,8 +1571,10 @@ static void __free_pages_ok(struct page *page, unsigned int order,
 	unsigned long pfn = page_to_pfn(page);
 	struct zone *zone = page_zone(page);
 
-	if (free_pages_prepare(page, order))
-		free_one_page(zone, page, pfn, order, fpi_flags);
+	if (!free_pages_prepare_has_hwpoisoned(page, order, fpi_flags))
+		return;
+
+	free_one_page(zone, page, pfn, order, fpi_flags);
 }
 
 void __meminit __free_pages_core(struct page *page, unsigned int order,
@@ -2923,6 +2928,152 @@ static bool free_frozen_page_commit(struct zone *zone,
 	return ret;
 }
 
+/*
+ * Given a range of physically contiguous pages, efficiently free them
+ * block by block. Block order is chosen to meet the PFN alignment
+ * requirement in __free_one_page().
+ */
+static void free_contiguous_pages(struct page *curr, unsigned long nr_pages,
+				  fpi_t fpi_flags)
+{
+	unsigned int order;
+	unsigned int align_order;
+	unsigned int size_order;
+	unsigned long remaining;
+	unsigned long pfn = page_to_pfn(curr);
+	const unsigned long end_pfn = pfn + nr_pages;
+	struct zone *zone = page_zone(curr);
+
+	/*
+	 * This decomposition algorithm at every iteration chooses the
+	 * order to be the minimum of two constraints:
+	 * - Alignment: the largest power-of-two that divides current pfn.
+	 * - Size: the largest power-of-two that fits in the current
+	 *   remaining number of pages.
+	 */
+	while (pfn < end_pfn) {
+		remaining = end_pfn - pfn;
+		align_order = ffs(pfn) - 1;
+		size_order = fls_long(remaining) - 1;
+		order = min(align_order, size_order);
+
+		free_one_page(zone, curr, pfn, order, fpi_flags);
+		curr += (1UL << order);
+		pfn += (1UL << order);
+	}
+
+	VM_WARN_ON(pfn != end_pfn);
+}
+
+/*
+ * Given a high-order compound page containing a certain number of
+ * HWPoison pages, free only the healthy ones to the buddy allocator.
+ *
+ * Pages must have passed free_pages_prepare(). Even with HWPoison
+ * pages present, breaking down the compound page and updating metadata
+ * (e.g. page owner, alloc tag) can be done together during
+ * free_pages_prepare(), which simplifies the splitting here: unlike
+ * __split_unmapped_folio(), there is no need to turn split pages into
+ * compound pages or to carry over metadata.
+ *
+ * It may call free_one_page() up to O(2^order) times, causing nontrivial
+ * overhead. So only use this when the compound page really contains HWPoison.
+ *
+ * This implementation doesn't work in memdesc world.
+ */
+static void free_has_hwpoisoned(struct page *page, unsigned int order,
+				fpi_t fpi_flags)
+{
+	struct page *curr = page;
+	struct page *next;
+	unsigned long nr_pages;
+	/*
+	 * Don't assume end points to a valid page. It is only used
+	 * here for pointer arithmetic.
+	 */
+	struct page *end = page + (1 << order);
+	unsigned long total_freed = 0;
+	unsigned long total_hwp = 0;
+
+	VM_WARN_ON(order == 0);
+	VM_WARN_ON(page->flags.f & PAGE_FLAGS_CHECK_AT_PREP);
+
+	while (curr < end) {
+		next = curr;
+		nr_pages = 0;
+
+		while (next < end && !PageHWPoison(next)) {
+			++next;
+			++nr_pages;
+		}
+
+		if (next != end && PageHWPoison(next)) {
+			clear_page_tag_ref(next);
+			++total_hwp;
+		}
+
+		free_contiguous_pages(curr, nr_pages, fpi_flags);
+		total_freed += nr_pages;
+		if (next == end)
+			break;
+
+		curr = PageHWPoison(next) ? next + 1 : next;
+	}
+
+	VM_WARN_ON(total_freed + total_hwp != (1 << order));
+	pr_info("Freed %#lx pages, excluded %lu hwpoison pages\n",
+		total_freed, total_hwp);
+}
+
+static bool compound_has_hwpoisoned(struct page *page, unsigned int order)
+{
+	if (order == 0 || !PageCompound(page))
+		return false;
+
+	return folio_test_has_hwpoisoned(page_folio(page));
+}
+
+/*
+ * Do free_has_hwpoisoned() when needed after free_pages_prepare().
+ * Returns
+ * - true: free_pages_prepare() succeeded and the caller can proceed freeing.
+ * - false: caller should not free pages for one of the two reasons:
+ *   1. free_pages_prepare() failed so it is not safe to proceed freeing.
+ *   2. this is a compound page with some HWPoison pages, and the healthy
+ *      pages have already been freed safely.
+ */
+static bool free_pages_prepare_has_hwpoisoned(struct page *page,
+					      unsigned int order,
+					      fpi_t fpi_flags)
+{
+	/*
+	 * free_pages_prepare() clears PAGE_FLAGS_SECOND flags on the
+	 * first tail page of a compound page, which clears PG_has_hwpoisoned.
+	 * So this call must be before free_pages_prepare().
+	 *
+	 * Note we can't exclude PG_has_hwpoisoned from PAGE_FLAGS_SECOND.
+	 * Because PG_has_hwpoisoned == PG_active, free_page_is_bad() would
+	 * get confused and complain that the first tail page is still active.
+	 */
+	bool should_fhh = compound_has_hwpoisoned(page, order);
+
+	if (!free_pages_prepare(page, order))
+		return false;
+
+	/*
+	 * After free_pages_prepare() breaks down compound page and deals
+	 * with page metadata (e.g. page owner and page alloc tags),
+	 * free_has_hwpoisoned() can directly use free_one_page() whenever
+	 * it knows the appropriate orders of page blocks to free.
+	 */
+	if (should_fhh) {
+		free_has_hwpoisoned(page, order, fpi_flags);
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * Free a pcp page
  */
@@ -2940,7 +3091,7 @@ static void __free_frozen_pages(struct page *page, unsigned int order,
 		return;
 	}
 
-	if (!free_pages_prepare(page, order))
+	if (!free_pages_prepare_has_hwpoisoned(page, order, fpi_flags))
 		return;
 
 	/*
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v3 3/3] mm/memory-failure: refactor page_handle_poison()
  2026-01-12  0:49 [PATCH v3 0/3] Only free healthy pages in high-order has_hwpoisoned folio Jiaqi Yan
  2026-01-12  0:49 ` [PATCH v3 1/3] mm/memory-failure: set has_hwpoisoned flags on HugeTLB folio Jiaqi Yan
  2026-01-12  0:49 ` [PATCH v3 2/3] mm/page_alloc: only free healthy pages in high-order has_hwpoisoned folio Jiaqi Yan
@ 2026-01-12  0:49 ` Jiaqi Yan
  2 siblings, 0 replies; 5+ messages in thread
From: Jiaqi Yan @ 2026-01-12  0:49 UTC (permalink / raw)
  To: jackmanb, hannes, linmiaohe, ziy, harry.yoo, willy
  Cc: nao.horiguchi, david, lorenzo.stoakes, william.roche, tony.luck,
	wangkefeng.wang, jane.chu, akpm, osalvador, muchun.song,
	rientjes, duenwen, jthoughton, linux-mm, linux-kernel,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, Jiaqi Yan

Now that HWPoison page(s) within a HugeTLB page are rejected by the
buddy allocator during dissolve_free_hugetlb_folio(), there is no
need to drain_all_pages() and take_page_off_buddy() anymore. In fact,
calling take_page_off_buddy() after dissolve_free_hugetlb_folio()
succeeds returns false, making callers think page_handle_poison() failed.

On the other hand, for hardware corrupted pages in buddy allocator,
take_page_off_buddy() is still a must-have.

Given that hugepages and free buddy pages should be treated differently,
refactor page_handle_poison() and __page_handle_poison():

- __page_handle_poison() is unwound into page_handle_poison().

- Callers of page_handle_poison() now need to explicitly tell whether
  the page is a HugeTLB hugepage or a free buddy page.

- Add helper hugepage_handle_poison() for several existing HugeTLB
  specific callsites.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 mm/memory-failure.c | 84 ++++++++++++++++++++++-----------------------
 1 file changed, 41 insertions(+), 43 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d204de6c9792a..1fdaee1e48bb8 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -162,54 +162,48 @@ static struct rb_root_cached pfn_space_itree = RB_ROOT_CACHED;
 
 static DEFINE_MUTEX(pfn_space_lock);
 
-/*
- * Return values:
- *   1:   the page is dissolved (if needed) and taken off from buddy,
- *   0:   the page is dissolved (if needed) and not taken off from buddy,
- *   < 0: failed to dissolve.
+/**
+ * Handle the HugeTLB hugepage that @page belongs to. Return values:
+ *   = 0: the hugepage was a free hugepage and has been dissolved.
+ *   < 0: the hugepage is in use or it failed to dissolve.
  */
-static int __page_handle_poison(struct page *page)
+static int hugepage_handle_poison(struct page *page)
 {
-	int ret;
+	return dissolve_free_hugetlb_folio(page_folio(page));
+}
+
+/**
+ * Helper at the end of handling @page having hardware errors.
+ * @huge: @page is part of a HugeTLB hugepage.
+ * @free: @page is free buddy page.
+ * @release: memory-failure module should release a pending refcount.
+ */
+static bool page_handle_poison(struct page *page, bool huge, bool free,
+			       bool release)
+{
+	int ret = 0;
 
 	/*
-	 * zone_pcp_disable() can't be used here. It will
-	 * hold pcp_batch_high_lock and dissolve_free_hugetlb_folio() might hold
-	 * cpu_hotplug_lock via static_key_slow_dec() when hugetlb vmemmap
-	 * optimization is enabled. This will break current lock dependency
-	 * chain and leads to deadlock.
-	 * Disabling pcp before dissolving the page was a deterministic
-	 * approach because we made sure that those pages cannot end up in any
-	 * PCP list. Draining PCP lists expels those pages to the buddy system,
-	 * but nothing guarantees that those pages do not get back to a PCP
-	 * queue if we need to refill those.
+	 * The buddy allocator will exclude the HWPoison page after the
+	 * hugepage is successfully dissolved.
 	 */
-	ret = dissolve_free_hugetlb_folio(page_folio(page));
-	if (!ret) {
+	if (huge)
+		ret = hugepage_handle_poison(page);
+
+	if (free) {
 		drain_all_pages(page_zone(page));
-		ret = take_page_off_buddy(page);
+		ret = take_page_off_buddy(page) ? 0 : -1;
 	}
 
-	return ret;
-}
-
-static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
-{
-	if (hugepage_or_freepage) {
+	if ((huge || free) && ret < 0)
 		/*
-		 * Doing this check for free pages is also fine since
-		 * dissolve_free_hugetlb_folio() returns 0 for non-hugetlb folios as well.
+		 * We could fail to take off the target page from buddy
+		 * for example due to racy page allocation, but that's
+		 * acceptable because soft-offlined page is not broken
+		 * and if someone really want to use it, they should
+		 * take it.
 		 */
-		if (__page_handle_poison(page) <= 0)
-			/*
-			 * We could fail to take off the target page from buddy
-			 * for example due to racy page allocation, but that's
-			 * acceptable because soft-offlined page is not broken
-			 * and if someone really want to use it, they should
-			 * take it.
-			 */
-			return false;
-	}
+		return false;
 
 	SetPageHWPoison(page);
 	if (release)
@@ -1174,7 +1168,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 		 * subpages.
 		 */
 		folio_put(folio);
-		if (__page_handle_poison(p) > 0) {
+		if (!hugepage_handle_poison(p)) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		} else {
@@ -2067,7 +2061,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 	 */
 	if (res == 0) {
 		folio_unlock(folio);
-		if (__page_handle_poison(p) > 0) {
+		if (!hugepage_handle_poison(p)) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		} else {
@@ -2815,7 +2809,7 @@ static int soft_offline_in_use_page(struct page *page)
 
 	if (ret) {
 		pr_info("%#lx: invalidated\n", pfn);
-		page_handle_poison(page, false, true);
+		page_handle_poison(page, false, false, true);
 		return 0;
 	}
 
@@ -2836,7 +2830,7 @@ static int soft_offline_in_use_page(struct page *page)
 		if (!ret) {
 			bool release = !huge;
 
-			if (!page_handle_poison(page, huge, release))
+			if (!page_handle_poison(page, huge, false, release))
 				ret = -EBUSY;
 		} else {
 			if (!list_empty(&pagelist))
@@ -2884,6 +2878,8 @@ int soft_offline_page(unsigned long pfn, int flags)
 {
 	int ret;
 	bool try_again = true;
+	bool huge;
+	bool free;
 	struct page *page;
 
 	if (!pfn_valid(pfn)) {
@@ -2929,7 +2925,9 @@ int soft_offline_page(unsigned long pfn, int flags)
 	if (ret > 0) {
 		ret = soft_offline_in_use_page(page);
 	} else if (ret == 0) {
-		if (!page_handle_poison(page, true, false)) {
+		huge = folio_test_hugetlb(page_folio(page));
+		free = is_free_buddy_page(page);
+		if (!page_handle_poison(page, huge, free, false)) {
 			if (try_again) {
 				try_again = false;
 				flags &= ~MF_COUNT_INCREASED;
-- 
2.52.0.457.g6b5491de43-goog




* Re: [PATCH v3 1/3] mm/memory-failure: set has_hwpoisoned flags on HugeTLB folio
  2026-01-12  0:49 ` [PATCH v3 1/3] mm/memory-failure: set has_hwpoisoned flags on HugeTLB folio Jiaqi Yan
@ 2026-01-12  2:50   ` Zi Yan
  0 siblings, 0 replies; 5+ messages in thread
From: Zi Yan @ 2026-01-12  2:50 UTC (permalink / raw)
  To: Jiaqi Yan
  Cc: jackmanb, hannes, linmiaohe, harry.yoo, willy, nao.horiguchi,
	david, lorenzo.stoakes, william.roche, tony.luck,
	wangkefeng.wang, jane.chu, akpm, osalvador, muchun.song,
	rientjes, duenwen, jthoughton, linux-mm, linux-kernel,
	Liam.Howlett, vbabka, rppt, surenb, mhocko

On 11 Jan 2026, at 19:49, Jiaqi Yan wrote:

> When a free HWPoison HugeTLB folio is dissolved, it becomes
> non-HugeTLB and is released to the buddy allocator as a high-order
> folio.
>
> Set the has_hwpoisoned flag on the high-order folio so that the
> buddy allocator can tell that it contains some HWPoison page(s).
> This is a preparatory change for the buddy allocator to handle
> such high-order HWPoison folios differently.
>
> This cannot be done with the per-page hwpoison flag, because then
> users could not tell this case apart from the case where the page
> carrying hwpoison is itself hardware corrupted.
>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> ---
>  include/linux/page-flags.h | 2 +-
>  mm/memory-failure.c        | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index f7a0e4af0c734..d13835e265952 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -904,7 +904,7 @@ static inline int PageTransCompound(const struct page *page)
>  TESTPAGEFLAG_FALSE(TransCompound, transcompound)
>  #endif
>
> -#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
> +#if defined(CONFIG_MEMORY_FAILURE) && (defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLB_PAGE))
>  /*
>   * PageHasHWPoisoned indicates that at least one subpage is hwpoisoned in the
>   * compound page.
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index fbc5a01260c89..d204de6c9792a 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1952,6 +1952,7 @@ void folio_clear_hugetlb_hwpoison(struct folio *folio)
>  	if (folio_test_hugetlb_vmemmap_optimized(folio))
>  		return;
>  	folio_clear_hwpoison(folio);
> +	folio_set_has_hwpoisoned(folio);
>  	folio_free_raw_hwp(folio, true);
>  }

Should this patch go after Patch 2 where has_hwpoisoned folio handling code
is added?

--
Best Regards,
Yan, Zi


