linux-mm.kvack.org archive mirror
From: Jiaqi Yan <jiaqiyan@google.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: nao.horiguchi@gmail.com, lorenzo.stoakes@oracle.com,
	 william.roche@oracle.com, tony.luck@intel.com,
	wangkefeng.wang@huawei.com,  jane.chu@oracle.com,
	akpm@linux-foundation.org, osalvador@suse.de,
	 muchun.song@linux.dev, rientjes@google.com, duenwen@google.com,
	 jthoughton@google.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,  Liam.Howlett@oracle.com,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	 boudewijn@delta-utec.com, ziy@nvidia.com, harry.yoo@oracle.com,
	 willy@infradead.org, linmiaohe@huawei.com, hannes@cmpxchg.org,
	 jackmanb@google.com
Subject: Re: [PATCH v4 0/3] Only free healthy pages in high-order has_hwpoisoned folio
Date: Mon, 23 Feb 2026 15:17:57 -0800
Message-ID: <CACw3F53U8xNTHJkdJMo-74_t0a2dX2cJtQOjpjizU8=UNLQRqg@mail.gmail.com>
In-Reply-To: <20260202194125.2191216-1-jiaqiyan@google.com>

Hi Vlastimil,

Could you and other page_alloc.c reviewers share your thoughts on this
patchset? Thanks!

On Mon, Feb 2, 2026 at 11:41 AM Jiaqi Yan <jiaqiyan@google.com> wrote:
>
> At the end of dissolve_free_hugetlb_folio(), when a free HugeTLB
> folio becomes non-HugeTLB, it is released to buddy allocator
> as a high-order folio, e.g. a folio that contains 262144 pages
> if the folio was a 1G HugeTLB hugepage.
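
For reference, the page count quoted above follows from simple
arithmetic. A minimal sketch, assuming 4 KiB base pages (the page size
is not stated in the cover letter) and using a hypothetical helper name:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def folio_geometry(size_bytes, page_size=PAGE_SIZE):
    """Return (number of base pages, buddy order) for a power-of-two folio size."""
    nr_pages = size_bytes // page_size
    order = nr_pages.bit_length() - 1  # log2 for a power of two
    return nr_pages, order


print(folio_geometry(1 << 30))  # 1G hugepage -> (262144, 18)
print(folio_geometry(2 << 20))  # 2M hugepage -> (512, 9)
```
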
>
> This is problematic if the HugeTLB hugepage contained HWPoison
> subpages. In that case, since buddy allocator does not check
> HWPoison for non-zero-order folio, the raw HWPoison page can
> be given out with its buddy page and be re-used by either
> kernel or userspace.
>
> Memory failure recovery (MFR) in kernel does attempt to take
> raw HWPoison page off buddy allocator after
> dissolve_free_hugetlb_folio(). However, there is always a time
> window between dissolve_free_hugetlb_folio() freeing a HWPoison
> high-order folio to buddy allocator and MFR taking the HWPoison
> raw page off buddy allocator.
>
> Another similar situation is when a transparent huge page (THP)
> is handled by MFR but splitting failed. Such THP will eventually
> be released to buddy allocator when owning userspace processes
> are gone, but with certain subpages having HWPoison [9].
>
> One obvious way to avoid both problems is to add page sanity
> checks in the page allocation or free path. However, that goes
> against past efforts to reduce sanity check overhead [1,2,3].
>
> Introduce free_has_hwpoisoned() to free only the healthy pages
> and exclude the HWPoison ones in the high-order folio.
> free_has_hwpoisoned() runs at the end of free_pages_prepare(),
> which already deals with both decomposing the original compound
> page and updating page metadata like alloc tag and page owner.
> For performance reasons, it is applied only when
> PG_has_hwpoisoned indicates that the folio contains HWPoison
> page(s). The idea is to iterate through the sub-pages of the
> folio to identify contiguous ranges of healthy pages. Instead of
> freeing pages one by one, decompose each healthy range into the
> largest possible blocks. Each block is freed via free_one_page()
> directly.
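
The healthy-range decomposition described above can be modelled in
userspace roughly as follows. This is only a sketch, not the code in
the patch: healthy_blocks(), the `poisoned` set of PFNs, and the
max_order=10 cap are assumptions made for illustration. Each emitted
block is naturally aligned (pfn is a multiple of 1 << order), which is
what the buddy allocator expects of a block it is handed.

```python
def healthy_blocks(start_pfn, nr_pages, poisoned, max_order=10):
    """Return (pfn, order) blocks covering every healthy page in the folio.

    poisoned: set of PFNs standing in for PageHWPoison() (assumption).
    max_order: illustrative cap on block order (assumption).
    """
    blocks = []
    end = start_pfn + nr_pages
    pfn = start_pfn
    while pfn < end:
        if pfn in poisoned:
            pfn += 1  # skip the poisoned page, never free it
            continue
        # Find the end of the contiguous healthy range starting at pfn.
        run_end = pfn
        while run_end < end and run_end not in poisoned:
            run_end += 1
        # Split the range into the largest aligned power-of-two blocks.
        while pfn < run_end:
            align = (pfn & -pfn).bit_length() - 1 if pfn else max_order
            fit = (run_end - pfn).bit_length() - 1
            order = min(align, fit, max_order)
            blocks.append((pfn, order))
            pfn += 1 << order
    return blocks


# A 16-page folio at PFN 16 with one poisoned page at PFN 21.
print(healthy_blocks(16, 16, {21}))
# -> [(16, 2), (20, 0), (22, 1), (24, 3)]
```

The four blocks cover 4 + 1 + 2 + 8 = 15 pages, i.e. everything except
PFN 21.
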
>
> free_has_hwpoisoned() has linear time complexity wrt the number
> of pages in the folio. While the power-of-two decomposition
> ensures that the number of calls to the buddy allocator is
> logarithmic for each contiguous healthy range, the mandatory
> linear scan of pages to identify PageHWPoison defines the
> overall time complexity.
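
To make the complexity argument concrete, here is a rough userspace
count of pages scanned versus buddy free calls. It is a sketch under
assumptions: scan_cost() and count_blocks() are hypothetical names, the
`poisoned` set stands in for the PageHWPoison check, and max_order=10
is an illustrative cap, none of which come from the patch.

```python
def count_blocks(start, end, max_order):
    """Number of aligned power-of-two blocks needed to cover [start, end)."""
    n = 0
    while start < end:
        align = (start & -start).bit_length() - 1 if start else max_order
        fit = (end - start).bit_length() - 1
        start += 1 << min(align, fit, max_order)
        n += 1
    return n


def scan_cost(start_pfn, nr_pages, poisoned, max_order=10):
    """Return (pages scanned, free calls) for one folio in this model."""
    end = start_pfn + nr_pages
    scanned = 0
    free_calls = 0
    run_start = None
    for pfn in range(start_pfn, end + 1):
        at_end = pfn == end
        if not at_end:
            scanned += 1  # every page is inspected exactly once
        if at_end or pfn in poisoned:
            if run_start is not None:
                free_calls += count_blocks(run_start, pfn, max_order)
                run_start = None
        elif run_start is None:
            run_start = pfn
    return scanned, free_calls


# 2M folio (512 pages) at PFN 512 with one poisoned page at PFN 517.
print(scan_cost(512, 512, {517}))  # -> (512, 9)
```

In this model the scan touches all 512 pages while only 9 blocks are
handed to the allocator, so the scan dominates the cost.
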
>
> I tested with some test-only code [4] and hugetlb-mfr [5], by
> checking the status of pcplist and freelist immediately after
> dissolve_free_hugetlb_folio() dissolves a free 2M or 1G hugetlb
> page that contains 1~8 HWPoison raw pages:
>
> - HWPoison pages are excluded by free_has_hwpoisoned().
>
> - Some healthy pages can be in zone->per_cpu_pageset (pcplist)
>   because pcp_count is not high enough. Many healthy pages are
>   in some order's zone->free_area[order].free_list (freelist).
>
> - In rare cases, some healthy pages are in neither pcplist
>   nor freelist. My best guess is they are allocated before
>   the test checks.
>
> To illustrate the latency free_has_hwpoisoned() added to the
> memory freeing path, I tested its time cost with 8 HWPoison
> pages with instrument code in [4] for 20 sample runs:
>
> - Has HWPoison path: mean=1448us, stdev=174ms
>
> - No HWPoison path: mean=66us, stdev=6us
>
> free_has_hwpoisoned() is around 22x the baseline. It is far from
> triggering soft lockup, and the cost is fair for handling
> exceptional hardware memory errors.
>
> With free_has_hwpoisoned() ensuring HWPoison pages never make it
> into buddy allocator, MFR no longer needs to call
> take_page_off_buddy() after dissolving HWPoison hugepages. So
> replace __page_handle_poison() with new __hugepage_handle_poison()
> for HugeTLB specific call sites.
>
> Based on commit 8dfce8991b95d ("Merge tag 'pinctrl-v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl")
>
> Changelog
>
> v3 [8] -> v4
>
> - Address comments from Zi Yan, Miaohe Lin, Harry Yoo.
>
> - Set has_hwpoisoned flag after introducing free_has_hwpoisoned().
>
> - Unwrap free_pages_prepare_has_hwpoisoned() into free_pages_prepare().
>
> - If folio has HWPoison, its healthy pages will be freed with FPI_NONE
>   right in free_pages_prepare(), which returns false to indicate that
>   the caller should not proceed with its own freeing action.
>
> - Rework the commit on __page_handle_poison(). Only change the handling
>   for HWPoison HugeTLB page, leaving free buddy page and soft offline
>   handling alone.
>
> v2 [7] -> v3:
>
> - Address comments from Matthew Wilcox, Harry Yoo, Miaohe Lin.
>
> - Let free_has_hwpoisoned() happen after free_pages_prepare(),
>   which helps to deal with decomposing the original compound page,
>   and with page metadata like alloc tag and page owner.
>
> - Tested with "page_owner=on" and CONFIG_MEM_ALLOC_PROFILING*=y.
>
> - Wrap checking PG_has_hwpoisoned and free_has_hwpoisoned() into
>   free_pages_prepare_has_hwpoisoned(), which replaces
>   free_pages_prepare() calls in free_frozen_pages().
>
> - Rename free_has_hwpoison_page() to free_has_hwpoisoned().
>
> - Measure latency added by free_has_hwpoisoned().
>
> - Ensure struct page *end is only used for pointer arithmetic,
>   instead of accessed as page.
>
> - Refactor page_handle_poison() instead of just __page_handle_poison().
>
> v1 [6] -> v2:
>
> - Total reimplementation based on discussions with Matthew Wilcox,
>   Harry Yoo, Zi Yan etc.
>
> - hugetlb_free_hwpoison_folio() => free_has_hwpoison_pages().
>
> - Utilize has_hwpoisoned flag to tell buddy allocator a high-order
>   folio contains HWPoison.
>
> - Simplify __page_handle_poison() given that the HWPoison page(s)
>   won't be freed within high-order folio.
>
> [1] https://lore.kernel.org/linux-mm/1460711275-1130-15-git-send-email-mgorman@techsingularity.net
> [2] https://lore.kernel.org/linux-mm/1460711275-1130-16-git-send-email-mgorman@techsingularity.net
> [3] https://lore.kernel.org/all/20230216095131.17336-1-vbabka@suse.cz
> [4] https://drive.google.com/file/d/1CzJn1Cc4wCCm183Y77h244fyZIkTLzCt/view?usp=sharing
> [5] https://lore.kernel.org/linux-mm/20251116013223.1557158-3-jiaqiyan@google.com
> [6] https://lore.kernel.org/linux-mm/20251116014721.1561456-1-jiaqiyan@google.com
> [7] https://lore.kernel.org/linux-mm/20251219183346.3627510-1-jiaqiyan@google.com
> [8] https://lore.kernel.org/linux-mm/20260112004923.888429-1-jiaqiyan@google.com
> [9] https://lore.kernel.org/linux-mm/20260113205441.506897-1-boudewijn@delta-utec.com
>
> Jiaqi Yan (3):
>   mm/page_alloc: only free healthy pages in high-order has_hwpoisoned
>     folio
>   mm/memory-failure: set has_hwpoisoned flags on dissolved HugeTLB folio
>   mm/memory-failure: skip take_page_off_buddy after dissolving HWPoison
>     HugeTLB page
>
>  include/linux/page-flags.h |   2 +-
>  mm/memory-failure.c        |  37 +++++++++--
>  mm/page_alloc.c            | 133 ++++++++++++++++++++++++++++++++++++-
>  3 files changed, 163 insertions(+), 9 deletions(-)
>
> --
> 2.53.0.rc2.204.g2597b5adb4-goog
>


