From: Jiaqi Yan <jiaqiyan@google.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: nao.horiguchi@gmail.com, lorenzo.stoakes@oracle.com,
william.roche@oracle.com, tony.luck@intel.com,
wangkefeng.wang@huawei.com, jane.chu@oracle.com,
akpm@linux-foundation.org, osalvador@suse.de,
muchun.song@linux.dev, rientjes@google.com, duenwen@google.com,
jthoughton@google.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Liam.Howlett@oracle.com,
rppt@kernel.org, surenb@google.com, mhocko@suse.com,
boudewijn@delta-utec.com, ziy@nvidia.com, harry.yoo@oracle.com,
willy@infradead.org, linmiaohe@huawei.com, hannes@cmpxchg.org,
jackmanb@google.com
Subject: Re: [PATCH v4 0/3] Only free healthy pages in high-order has_hwpoisoned folio
Date: Mon, 23 Feb 2026 15:17:57 -0800
Message-ID: <CACw3F53U8xNTHJkdJMo-74_t0a2dX2cJtQOjpjizU8=UNLQRqg@mail.gmail.com>
In-Reply-To: <20260202194125.2191216-1-jiaqiyan@google.com>
Hi Vlastimil,
Could you and other page_alloc.c reviewers share your thoughts on this
patchset? Thanks!
On Mon, Feb 2, 2026 at 11:41 AM Jiaqi Yan <jiaqiyan@google.com> wrote:
>
> At the end of dissolve_free_hugetlb_folio(), when a free HugeTLB
> folio becomes non-HugeTLB, it is released to the buddy allocator
> as a high-order folio, e.g. a folio that contains 262144 pages
> if the folio was a 1G HugeTLB hugepage.
>
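For reference, the page counts above follow from the folio size, assuming 4 KiB base pages:

```latex
\frac{1\,\text{GiB}}{4\,\text{KiB}} = \frac{2^{30}}{2^{12}} = 2^{18} = 262144 \text{ pages (order 18)},
\qquad
\frac{2\,\text{MiB}}{4\,\text{KiB}} = \frac{2^{21}}{2^{12}} = 2^{9} = 512 \text{ pages (order 9)}
```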
> This is problematic if the HugeTLB hugepage contained HWPoison
> subpages. In that case, since the buddy allocator does not check
> HWPoison for non-zero-order folios, the raw HWPoison page can
> be handed out together with its buddy pages and be reused by
> either the kernel or userspace.
>
> Memory failure recovery (MFR) in the kernel does attempt to take
> the raw HWPoison page off the buddy allocator after
> dissolve_free_hugetlb_folio(). However, there is always a time
> window between dissolve_free_hugetlb_folio() freeing a HWPoison
> high-order folio to the buddy allocator and MFR taking the raw
> HWPoison page off the buddy allocator.
>
> Another similar situation is when a transparent huge page (THP)
> is handled by MFR but fails to split. Such a THP will eventually
> be released to the buddy allocator when the owning userspace
> processes are gone, but with some subpages still HWPoisoned [9].
>
> One obvious way to avoid both problems is to add page sanity
> checks in the page allocation or free path. However, that goes
> against past efforts to reduce sanity check overhead [1,2,3].
>
> Introduce free_has_hwpoisoned() to free only the healthy pages
> and exclude the HWPoison ones in the high-order folio.
> free_has_hwpoisoned() runs at the end of free_pages_prepare(),
> which already handles both decomposing the original compound
> page and updating page metadata like alloc tag and page owner.
> For performance reasons, it is only applied when PG_has_hwpoisoned
> indicates that the folio contains HWPoison page(s).
> It iterates through the sub-pages of the folio to identify
> contiguous ranges of healthy pages. Instead of freeing pages
> one by one, it decomposes each healthy range into the largest
> possible blocks. Each block is freed via free_one_page() directly.
>
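A minimal userspace sketch of the range decomposition described above, assuming the buddy rule that an order-N block must start at a pfn aligned to 2^N, plus a hypothetical maximum order of 10. All names here (sketch_free_has_hwpoisoned(), sketch_free_one_page(), SKETCH_MAX_ORDER) are hypothetical and this is not the patch's code: a bool array stands in for PageHWPoison(), and freed blocks are only recorded rather than handed to a real allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap standing in for the buddy allocator's maximum order. */
#define SKETCH_MAX_ORDER 10
#define SKETCH_MAX_BLOCKS 512

/* One freed block: first pfn and order. */
struct freed_block {
	unsigned long pfn;
	unsigned int order;
};

static struct freed_block freed[SKETCH_MAX_BLOCKS];
static size_t nr_freed;

/* Hypothetical stand-in for free_one_page(): just records the block. */
static void sketch_free_one_page(unsigned long pfn, unsigned int order)
{
	assert(nr_freed < SKETCH_MAX_BLOCKS);
	freed[nr_freed].pfn = pfn;
	freed[nr_freed].order = order;
	nr_freed++;
}

/*
 * Largest order such that pfn is naturally aligned to 1 << order,
 * the block fits inside [pfn, end), and order <= SKETCH_MAX_ORDER.
 */
static unsigned int sketch_block_order(unsigned long pfn, unsigned long end)
{
	unsigned int order = 0;

	while (order < SKETCH_MAX_ORDER) {
		unsigned long size = 1UL << (order + 1);

		if ((pfn & (size - 1)) || pfn + size > end)
			break;
		order++;
	}
	return order;
}

/* Free one contiguous healthy range [start, end) as aligned power-of-two blocks. */
static void sketch_free_healthy_range(unsigned long start, unsigned long end)
{
	while (start < end) {
		unsigned int order = sketch_block_order(start, end);

		assert((start & ((1UL << order) - 1)) == 0);
		sketch_free_one_page(start, order);
		start += 1UL << order;
	}
}

/* Linear scan over nr_pages pages; poisoned[i] marks page base_pfn + i as HWPoison. */
static void sketch_free_has_hwpoisoned(unsigned long base_pfn,
				       const bool *poisoned,
				       unsigned long nr_pages)
{
	unsigned long i = 0;

	while (i < nr_pages) {
		unsigned long run_start;

		if (poisoned[i]) {
			i++;	/* never free the HWPoison page itself */
			continue;
		}
		run_start = i;
		while (i < nr_pages && !poisoned[i])
			i++;
		sketch_free_healthy_range(base_pfn + run_start, base_pfn + i);
	}
}
```

With one HWPoison page at index 5 of a 16-page folio at pfn 0, the healthy pages come out as blocks of order 2, 0, 1 and 3 (15 pages in total), and page 5 is never freed. The alignment check is what keeps each block a valid buddy; a plain "largest power of two that fits" would produce misaligned blocks.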
> free_has_hwpoisoned() has linear time complexity with respect to
> the number of pages in the folio. While the power-of-two
> decomposition ensures that the number of calls to the buddy
> allocator is logarithmic for each contiguous healthy range, the
> mandatory linear scan of pages to check PageHWPoison determines
> the overall time complexity.
>
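A back-of-envelope bound for the complexity claim above, assuming p HWPoison pages in an N-page folio and a greedy decomposition into naturally aligned blocks (an illustrative derivation, not taken from the patch). Within one range the block orders first strictly increase and then strictly decrease, so each order appears at most twice:

```latex
% p HWPoison pages split the folio into R <= p + 1 healthy ranges of lengths L_1..L_R
T(N) = \Theta(N) + O\Big(\sum_{r=1}^{R} B(L_r)\Big),
\qquad B(L) \le 2\left(\lfloor \log_2 L \rfloor + 1\right)
\\
\sum_{r=1}^{R} B(L_r) \le 2(p + 1)\left(\log_2 N + 1\right)
\\
% e.g. 1G folio, N = 2^{18}, p = 8: at most 2 \cdot 9 \cdot 19 = 342 blocks vs. 262144 page checks
% with a maximum block order K: B_K(L) \le 2(K + 1) + \lfloor L / 2^K \rfloor
```

If block orders are capped at some maximum K, each range additionally contributes up to floor(L / 2^K) blocks of order K, so the block count is no longer purely logarithmic for large folios, though it stays well below the N page checks of the scan.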
> I tested with some test-only code [4] and hugetlb-mfr [5] by
> checking the status of pcplist and freelist immediately after
> dissolve_free_hugetlb_folio() dissolves a free 2M or 1G HugeTLB
> page that contains 1~8 raw HWPoison pages:
>
> - HWPoison pages are excluded by free_has_hwpoisoned().
>
> - Some healthy pages can be in zone->per_cpu_pageset (pcplist)
> because pcp_count is not high enough. Many healthy pages are
> in some order's zone->free_area[order].free_list (freelist).
>
> - In rare cases, some healthy pages are in neither pcplist
> nor freelist. My best guess is that they were allocated before
> the test checked.
>
> To illustrate the latency free_has_hwpoisoned() adds to the
> memory freeing path, I measured its time cost with 8 HWPoison
> pages using the instrumentation code in [4] over 20 sample runs:
>
> - Has HWPoison path: mean=1448us, stdev=174us
>
> - No HWPoison path: mean=66us, stdev=6us
>
> free_has_hwpoisoned() takes around 22x the baseline. That is far
> from triggering a soft lockup, and the cost is acceptable for
> handling exceptional hardware memory errors.
>
> With free_has_hwpoisoned() ensuring HWPoison pages never make it
> into the buddy allocator, MFR no longer needs take_page_off_buddy()
> after dissolving HWPoison hugepages. So replace __page_handle_poison()
> with a new __hugepage_handle_poison() at HugeTLB-specific call sites.
>
> Based on commit 8dfce8991b95d ("Merge tag 'pinctrl-v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl")
>
> Changelog
>
> v3 [8] -> v4:
>
> - Address comments from Zi Yan, Miaohe Lin, Harry Yoo.
>
> - Set has_hwpoisoned flag after introducing free_has_hwpoisoned().
>
> - Unwrap free_pages_prepare_has_hwpoisoned() into free_pages_prepare().
>
> - If the folio has HWPoison, its healthy pages are freed with FPI_NONE
> right in free_pages_prepare(), which returns false to indicate that
> the caller should not proceed with its own freeing.
>
> - Rework the commit on __page_handle_poison(). Only change the handling
> for HWPoison HugeTLB page, leaving free buddy page and soft offline
> handling alone.
>
> v2 [7] -> v3:
>
> - Address comments from Matthew Wilcox, Harry Yoo, Miaohe Lin.
>
> - Let free_has_hwpoisoned() happen after free_pages_prepare(),
> which helps with decomposing the original compound page
> and with page metadata like alloc tag and page owner.
>
> - Tested with "page_owner=on" and CONFIG_MEM_ALLOC_PROFILING*=y.
>
> - Wrap checking PG_has_hwpoisoned and free_has_hwpoisoned() into
> free_pages_prepare_has_hwpoisoned(), which replaces
> free_pages_prepare() calls in free_frozen_pages().
>
> - Rename free_has_hwpoison_page() to free_has_hwpoisoned().
>
> - Measure latency added by free_has_hwpoisoned().
>
> - Ensure struct page *end is only used for pointer arithmetic,
> instead of being accessed as a page.
>
> - Refactor page_handle_poison() instead of just __page_handle_poison().
>
> v1 [6] -> v2:
>
> - Total reimplementation based on discussions with Matthew Wilcox,
> Harry Yoo, Zi Yan, etc.
>
> - hugetlb_free_hwpoison_folio() => free_has_hwpoison_pages().
>
> - Utilize has_hwpoisoned flag to tell buddy allocator a high-order
> folio contains HWPoison.
>
> - Simplify __page_handle_poison() given that the HWPoison page(s)
> won't be freed within high-order folio.
>
> [1] https://lore.kernel.org/linux-mm/1460711275-1130-15-git-send-email-mgorman@techsingularity.net
> [2] https://lore.kernel.org/linux-mm/1460711275-1130-16-git-send-email-mgorman@techsingularity.net
> [3] https://lore.kernel.org/all/20230216095131.17336-1-vbabka@suse.cz
> [4] https://drive.google.com/file/d/1CzJn1Cc4wCCm183Y77h244fyZIkTLzCt/view?usp=sharing
> [5] https://lore.kernel.org/linux-mm/20251116013223.1557158-3-jiaqiyan@google.com
> [6] https://lore.kernel.org/linux-mm/20251116014721.1561456-1-jiaqiyan@google.com
> [7] https://lore.kernel.org/linux-mm/20251219183346.3627510-1-jiaqiyan@google.com
> [8] https://lore.kernel.org/linux-mm/20260112004923.888429-1-jiaqiyan@google.com
> [9] https://lore.kernel.org/linux-mm/20260113205441.506897-1-boudewijn@delta-utec.com
>
> Jiaqi Yan (3):
> mm/page_alloc: only free healthy pages in high-order has_hwpoisoned
> folio
> mm/memory-failure: set has_hwpoisoned flags on dissolved HugeTLB folio
> mm/memory-failure: skip take_page_off_buddy after dissolving HWPoison
> HugeTLB page
>
> include/linux/page-flags.h | 2 +-
> mm/memory-failure.c | 37 +++++++++--
> mm/page_alloc.c | 133 ++++++++++++++++++++++++++++++++++++-
> 3 files changed, 163 insertions(+), 9 deletions(-)
>
> --
> 2.53.0.rc2.204.g2597b5adb4-goog
>