From: Jiaqi Yan <jiaqiyan@google.com>
To: jackmanb@google.com, hannes@cmpxchg.org, linmiaohe@huawei.com,
ziy@nvidia.com, harry.yoo@oracle.com, willy@infradead.org
Cc: nao.horiguchi@gmail.com, david@redhat.com,
lorenzo.stoakes@oracle.com, william.roche@oracle.com,
tony.luck@intel.com, wangkefeng.wang@huawei.com,
jane.chu@oracle.com, akpm@linux-foundation.org,
osalvador@suse.de, muchun.song@linux.dev, rientjes@google.com,
duenwen@google.com, jthoughton@google.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Liam.Howlett@oracle.com,
vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, Jiaqi Yan <jiaqiyan@google.com>
Subject: [PATCH v2 2/3] mm/page_alloc: only free healthy pages in high-order HWPoison folio
Date: Fri, 19 Dec 2025 18:33:45 +0000 [thread overview]
Message-ID: <20251219183346.3627510-3-jiaqiyan@google.com> (raw)
In-Reply-To: <20251219183346.3627510-1-jiaqiyan@google.com>
At the end of dissolve_free_hugetlb_folio that a free HugeTLB
folio becomes non-HugeTLB, it is released to buddy allocator
as a high-order folio, e.g. a folio that contains 262144 pages
if the folio was a 1G HugeTLB hugepage.
This is problematic if the HugeTLB hugepage contained HWPoison
subpages. In that case, since buddy allocator does not check
HWPoison for non-zero-order folio, the raw HWPoison page can
be given out with its buddy page and be re-used by either
kernel or userspace.
Memory failure recovery (MFR) in kernel does attempt to take
raw HWPoison page off buddy allocator after
dissolve_free_hugetlb_folio. However, there is always a time
window between dissolve_free_hugetlb_folio frees a HWPoison
high-order folio to buddy allocator and MFR takes HWPoison
raw page off buddy allocator.
One obvious way to avoid this problem is to add page sanity
checks in page allocate or free path. However, it is against
the past efforts to reduce sanity check overhead [1,2,3].
Introduce free_has_hwpoison_pages to only free the healthy
pages and excludes the HWPoison ones in the high-order folio.
The idea is to iterate through the sub-pages of the folio to
identify contiguous ranges of healthy pages. Instead of freeing
pages one by one, decompose healthy ranges into the largest
possible blocks. Each block meets the requirements to be freed
to buddy allocator (__free_frozen_pages).
free_has_hwpoison_pages has linear time complexity O(N) wrt the
number of pages in the folio. While the power-of-two decomposition
ensures that the number of calls to the buddy allocator is
logarithmic for each contiguous healthy range, the mandatory
linear scan of pages to identify PageHWPoison defines the
overall time complexity.
[1] https://lore.kernel.org/linux-mm/1460711275-1130-15-git-send-email-mgorman@techsingularity.net/
[2] https://lore.kernel.org/linux-mm/1460711275-1130-16-git-send-email-mgorman@techsingularity.net/
[3] https://lore.kernel.org/all/20230216095131.17336-1-vbabka@suse.cz
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
mm/page_alloc.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 101 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 822e05f1a9646..20c8862ce594e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2976,8 +2976,109 @@ static void __free_frozen_pages(struct page *page, unsigned int order,
}
}
+static void prepare_compound_page_to_free(struct page *new_head,
+ unsigned int order,
+ unsigned long flags)
+{
+ new_head->flags.f = flags & (~PAGE_FLAGS_CHECK_AT_FREE);
+ new_head->mapping = NULL;
+ new_head->private = 0;
+
+ clear_compound_head(new_head);
+ if (order)
+ prep_compound_page(new_head, order);
+}
+
+/*
+ * Given a range of pages physically contiguous physical, efficiently
+ * free them in blocks that meet __free_frozen_pages's requirements.
+ */
+static void free_contiguous_pages(struct page *curr, struct page *next,
+ unsigned long flags)
+{
+ unsigned int order;
+ unsigned int align_order;
+ unsigned int size_order;
+ unsigned long pfn;
+ unsigned long end_pfn = page_to_pfn(next);
+ unsigned long remaining;
+
+ /*
+ * This decomposition algorithm at every iteration chooses the
+ * order to be the minimum of two constraints:
+ * - Alignment: the largest power-of-two that divides current pfn.
+ * - Size: the largest power-of-two that fits in the
+ * current remaining number of pages.
+ */
+ while (curr < next) {
+ pfn = page_to_pfn(curr);
+ remaining = end_pfn - pfn;
+
+ align_order = ffs(pfn) - 1;
+ size_order = fls_long(remaining) - 1;
+ order = min(align_order, size_order);
+
+ prepare_compound_page_to_free(curr, order, flags);
+ __free_frozen_pages(curr, order, FPI_NONE);
+ curr += (1UL << order);
+ }
+
+ VM_WARN_ON(curr != next);
+}
+
+/*
+ * Given a high-order compound page containing certain number of HWPoison
+ * pages, free only the healthy ones to buddy allocator.
+ *
+ * It calls __free_frozen_pages O(2^order) times and cause nontrivial
+ * overhead. So only use this when compound page really contains HWPoison.
+ *
+ * This implementation doesn't work in memdesc world.
+ */
+static void free_has_hwpoison_pages(struct page *page, unsigned int order)
+{
+ struct page *curr = page;
+ struct page *end = page + (1 << order);
+ struct page *next;
+ unsigned long flags = page->flags.f;
+ unsigned long nr_pages;
+ unsigned long total_freed = 0;
+ unsigned long total_hwp = 0;
+
+ VM_WARN_ON(flags & PAGE_FLAGS_CHECK_AT_FREE);
+
+ while (curr < end) {
+ next = curr;
+ nr_pages = 0;
+
+ while (next < end && !PageHWPoison(next)) {
+ ++next;
+ ++nr_pages;
+ }
+
+ if (PageHWPoison(next))
+ ++total_hwp;
+
+ free_contiguous_pages(curr, next, flags);
+
+ total_freed += nr_pages;
+ curr = PageHWPoison(next) ? next + 1 : next;
+ }
+
+ pr_info("Excluded %lu hwpoison pages from folio\n", total_hwp);
+ pr_info("Freed %#lx pages from folio\n", total_freed);
+}
+
void free_frozen_pages(struct page *page, unsigned int order)
{
+ struct folio *folio = page_folio(page);
+
+ if (order > 0 && unlikely(folio_test_has_hwpoisoned(folio))) {
+ folio_clear_has_hwpoisoned(folio);
+ free_has_hwpoison_pages(page, order);
+ return;
+ }
+
__free_frozen_pages(page, order, FPI_NONE);
}
--
2.52.0.322.g1dd061c0dc-goog
next prev parent reply other threads:[~2025-12-19 18:34 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-19 18:33 [PATCH v2 0/3] Only " Jiaqi Yan
2025-12-19 18:33 ` [PATCH v2 1/3] mm/memory-failure: set has_hwpoisoned flags on HugeTLB folio Jiaqi Yan
2025-12-19 18:33 ` Jiaqi Yan [this message]
2025-12-23 5:14 ` [PATCH v2 2/3] mm/page_alloc: only free healthy pages in high-order HWPoison folio Harry Yoo
2025-12-27 1:50 ` Jiaqi Yan
2025-12-29 1:15 ` Harry Yoo
2025-12-31 0:19 ` Jiaqi Yan
2025-12-31 4:37 ` Harry Yoo
2025-12-23 7:45 ` Miaohe Lin
2025-12-27 1:50 ` Jiaqi Yan
2025-12-19 18:33 ` [PATCH v2 3/3] mm/memory-failure: simplify __page_handle_poison Jiaqi Yan
2025-12-22 22:12 ` Matthew Wilcox
2025-12-22 23:13 ` Jiaqi Yan
2025-12-22 22:06 ` [PATCH v2 0/3] Only free healthy pages in high-order HWPoison folio Jiaqi Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251219183346.3627510-3-jiaqiyan@google.com \
--to=jiaqiyan@google.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=duenwen@google.com \
--cc=hannes@cmpxchg.org \
--cc=harry.yoo@oracle.com \
--cc=jackmanb@google.com \
--cc=jane.chu@oracle.com \
--cc=jthoughton@google.com \
--cc=linmiaohe@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=muchun.song@linux.dev \
--cc=nao.horiguchi@gmail.com \
--cc=osalvador@suse.de \
--cc=rientjes@google.com \
--cc=rppt@kernel.org \
--cc=surenb@google.com \
--cc=tony.luck@intel.com \
--cc=vbabka@suse.cz \
--cc=wangkefeng.wang@huawei.com \
--cc=william.roche@oracle.com \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox