* [PATCH] mm/page_alloc: Fix freeing of failed-split poisoned compound pages
@ 2026-01-13 20:54 Boudewijn van der Heide
2026-01-13 21:05 ` Zi Yan
0 siblings, 1 reply; 8+ messages in thread
From: Boudewijn van der Heide @ 2026-01-13 20:54 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, Naoya Horiguchi,
Oscar Salvador, linux-mm, linux-kernel, boudewijn
free_pages_prepare() only handles poisoned order-0 pages.
In memory_failure() (hard offline), pages
are poisoned before attempting to split huge pages. If the split fails,
the page remains a compound (order > 0) but is already poisoned. However,
Soft-offline pages are always poisoned as order-0 after migration, so
they are unaffected.
The '!order' check causes these poisoned compound pages to skip
poison handling, leaving them in the buddy allocator.
Worst case, a poisoned compound page could be reallocated,
potentially leading to crashes, silent data corruption,
or unwanted memory containment actions before the poison bit is detected.
This patch removes the '&& !order' restriction. Cleanup functions in the
poison-handling block correctly handle non-zero order pages, making
this change safe.
Fixes: 79f5f8fab482 ("mm,hwpoison: rework soft offline for in-use pages")
Signed-off-by: Boudewijn van der Heide <boudewijn@delta-utec.com>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c380f063e8b7..64d15e56706c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1344,7 +1344,7 @@ __always_inline bool free_pages_prepare(struct page *page,
 		count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
 	}
 
-	if (unlikely(PageHWPoison(page)) && !order) {
+	if (unlikely(PageHWPoison(page))) {
 		/* Do not let hwpoison pages hit pcplists/buddy */
 		reset_page_owner(page, order);
 		page_table_check_free(page, order);
--
2.47.3
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/page_alloc: Fix freeing of failed-split poisoned compound pages
2026-01-13 20:54 [PATCH] mm/page_alloc: Fix freeing of failed-split poisoned compound pages Boudewijn van der Heide
@ 2026-01-13 21:05 ` Zi Yan
2026-01-14 14:48 ` Boudewijn van der Heide
0 siblings, 1 reply; 8+ messages in thread
From: Zi Yan @ 2026-01-13 21:05 UTC (permalink / raw)
To: Boudewijn van der Heide
Cc: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Naoya Horiguchi,
Oscar Salvador, linux-mm, linux-kernel, Miaohe Lin
Add Miaohe (memory failure maintainer)
On 13 Jan 2026, at 15:54, Boudewijn van der Heide wrote:
> free_pages_prepare() only handles poisoned order-0 pages.
> In memory_failure() (hard offline), pages
> are poisoned before attempting to split huge pages. If the split fails,
> the page remains a compound (order > 0) but is already poisoned. However,
> Soft-offline pages are always poisoned as order-0 after migration, so
> they are unaffected.
>
> The '!order' check causes these poisoned compound pages to skip
> poison handling, leaving them in the buddy allocator.
>
> Worst case, a poisoned compound page could be reallocated,
> potentially leading to crashes, silent data corruption,
> or unwanted memory containment actions before the poison bit is detected.
>
> This patch removes the '&& !order' restriction. Cleanup functions in the
> poison-handling block correctly handle non-zero order pages, making
> this change safe.
This is not a fix. IIUC, for >0 order free pages, memory failure uses
take_page_off_buddy() in a different code path.
Miaohe (cc’d) should be able to elaborate more on it.
>
> Fixes: 79f5f8fab482 ("mm,hwpoison: rework soft offline for in-use pages")
> Signed-off-by: Boudewijn van der Heide <boudewijn@delta-utec.com>
> ---
> mm/page_alloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c380f063e8b7..64d15e56706c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1344,7 +1344,7 @@ __always_inline bool free_pages_prepare(struct page *page,
> count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
> }
>
> - if (unlikely(PageHWPoison(page)) && !order) {
> + if (unlikely(PageHWPoison(page))) {
> /* Do not let hwpoison pages hit pcplists/buddy */
> reset_page_owner(page, order);
> page_table_check_free(page, order);
> --
> 2.47.3
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/page_alloc: Fix freeing of failed-split poisoned compound pages
2026-01-13 21:05 ` Zi Yan
@ 2026-01-14 14:48 ` Boudewijn van der Heide
2026-01-15 7:55 ` Miaohe Lin
0 siblings, 1 reply; 8+ messages in thread
From: Boudewijn van der Heide @ 2026-01-14 14:48 UTC (permalink / raw)
To: ziy
Cc: akpm, boudewijn, hannes, jackmanb, linmiaohe, linux-kernel,
linux-mm, mhocko, nao.horiguchi, osalvador, surenb, vbabka
> > free_pages_prepare() only handles poisoned order-0 pages.
> > In memory_failure() (hard offline), pages
> > are poisoned before attempting to split huge pages. If the split fails,
> > the page remains a compound (order > 0) but is already poisoned. However,
> > Soft-offline pages are always poisoned as order-0 after migration, so
> > they are unaffected.
> >
> > The '!order' check causes these poisoned compound pages to skip
> > poison handling, leaving them in the buddy allocator.
> >
> > Worst case, a poisoned compound page could be reallocated,
> > potentially leading to crashes, silent data corruption,
> > or unwanted memory containment actions before the poison bit is detected.
> >
> > This patch removes the '&& !order' restriction. Cleanup functions in the
> > poison-handling block correctly handle non-zero order pages, making
> > this change safe.
> This is not a fix. IIUC, for >0 order free pages, memory failure uses
> take_page_off_buddy() in a different code path.
>
Thanks again for the quick response and clarification!
From my understanding,
you correctly noted that take_page_off_buddy() handles already-free pages,
removing them from the buddy lists and setting SetPageHWPoisonTakenOff().
This prevents those pages from re-entering the buddy allocator.
My concern is about in-use THP-backed compound pages:
1. A compound page is in use.
2. memory_failure() marks it poisoned (TestSetPageHWPoison).
3. try_to_split_thp_page() fails.
4. The process using the THP may be killed;
the page remains compound and poisoned.
5. Later, when the page is finally freed, it reaches free_pages_prepare();
'take_page_off_buddy()' is not invoked in this path.
At this point, the current check:
'if (unlikely(PageHWPoison(page)) && !order)'
will not trigger, because the order > 0.
> Miaohe (cc’d) should be able to elaborate more on it.
Thanks for Cc'ing Miaohe, hopefully Miaohe can provide some more insights!
Thanks,
Boudewijn
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/page_alloc: Fix freeing of failed-split poisoned compound pages
2026-01-14 14:48 ` Boudewijn van der Heide
@ 2026-01-15 7:55 ` Miaohe Lin
2026-01-15 17:11 ` Jiaqi Yan
0 siblings, 1 reply; 8+ messages in thread
From: Miaohe Lin @ 2026-01-15 7:55 UTC (permalink / raw)
To: Boudewijn van der Heide, ziy, Jiaqi Yan
Cc: akpm, hannes, jackmanb, linux-kernel, linux-mm, mhocko,
nao.horiguchi, osalvador, surenb, vbabka
On 2026/1/14 22:48, Boudewijn van der Heide wrote:
>>> free_pages_prepare() only handles poisoned order-0 pages.
>>> In memory_failure() (hard offline), pages
>>> are poisoned before attempting to split huge pages. If the split fails,
>>> the page remains a compound (order > 0) but is already poisoned. However,
>>> Soft-offline pages are always poisoned as order-0 after migration, so
>>> they are unaffected.
>>>
>>> The '!order' check causes these poisoned compound pages to skip
>>> poison handling, leaving them in the buddy allocator.
>>>
>>> Worst case, a poisoned compound page could be reallocated,
>>> potentially leading to crashes, silent data corruption,
>>> or unwanted memory containment actions before the poison bit is detected.
>>>
>>> This patch removes the '&& !order' restriction. Cleanup functions in the
>>> poison-handling block correctly handle non-zero order pages, making
>>> this change safe.
>
>> This is not a fix. IIUC, for >0 order free pages, memory failure uses
>> take_page_off_buddy() in a different code path.
>>
>
> Thanks again for the quick response and clarification!
> From my understanding,
> you correctly noted that take_page_off_buddy() handles already-free pages,
> removing them from the buddy lists and setting SetPageHWPoisonTakenOff().
> This prevents those pages from re-entering the buddy allocator.
Thanks both.
>
> My concern is about in-use THP-backed compound pages:
> 1. A compound page is in use.
> 2. memory_failure() marks it poisoned (TestSetPageHWPoison).
> 3. try_to_split_thp_page() fails.
> 4. The process using the THP may be killed;
> the page remains compound and poisoned.
> 5. Later, when the page is finally freed, it reaches free_pages_prepare();
> 'take_page_off_buddy()' is not invoked in this path.
Yes, this is also a problematic scenario for Hugetlb HugePage. And Jiaqi works on
it now [1]. I think Jiaqi's patches might apply to THP scenario too. Add @Jiaqi to
verify this.
[1]: https://lore.kernel.org/all/20260112004923.888429-1-jiaqiyan@google.com/
Thanks.
.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/page_alloc: Fix freeing of failed-split poisoned compound pages
2026-01-15 7:55 ` Miaohe Lin
@ 2026-01-15 17:11 ` Jiaqi Yan
2026-01-16 14:11 ` Boudewijn van der Heide
0 siblings, 1 reply; 8+ messages in thread
From: Jiaqi Yan @ 2026-01-15 17:11 UTC (permalink / raw)
To: Miaohe Lin, ziy, Boudewijn van der Heide
Cc: akpm, hannes, jackmanb, linux-kernel, linux-mm, mhocko,
nao.horiguchi, osalvador, surenb, vbabka
On Wed, Jan 14, 2026 at 11:55 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2026/1/14 22:48, Boudewijn van der Heide wrote:
> >>> free_pages_prepare() only handles poisoned order-0 pages.
> >>> In memory_failure() (hard offline), pages
> >>> are poisoned before attempting to split huge pages. If the split fails,
> >>> the page remains a compound (order > 0) but is already poisoned. However,
> >>> Soft-offline pages are always poisoned as order-0 after migration, so
> >>> they are unaffected.
> >>>
> >>> The '!order' check causes these poisoned compound pages to skip
> >>> poison handling, leaving them in the buddy allocator.
> >>>
> >>> Worst case, a poisoned compound page could be reallocated,
> >>> potentially leading to crashes, silent data corruption,
> >>> or unwanted memory containment actions before the poison bit is detected.
> >>>
> >>> This patch removes the '&& !order' restriction. Cleanup functions in the
> >>> poison-handling block correctly handle non-zero order pages, making
> >>> this change safe.
> >
> >> This is not a fix. IIUC, for >0 order free pages, memory failure uses
> >> take_page_off_buddy() in a different code path.
> >>
> >
> > Thanks again for the quick response and clarification!
> > From my understanding,
> > you correctly noted that take_page_off_buddy() handles already-free pages,
> > removing them from the buddy lists and setting SetPageHWPoisonTakenOff().
> > This prevents those pages from re-entering the buddy allocator.
>
> Thanks both.
>
> >
> > My concern is about in-use THP-backed compound pages:
> > 1. A compound page is in use.
> > 2. memory_failure() marks it poisoned (TestSetPageHWPoison).
> > 3. try_to_split_thp_page() fails.
> > 4. The process using the THP may be killed;
> > the page remains compound and poisoned.
> > 5. Later, when the page is finally freed, it reaches free_pages_prepare();
> > 'take_page_off_buddy()' is not invoked in this path.
I agree that Boudewijn's concern is valid when try_to_split_thp_page() fails.
However, I don't think the fix here really works. For a compound / THP
page, memory_failure() sets the PG_HWPoison flag on the exact subpage
within the compound page. I believe the page passed to free_pages_prepare()
is almost always (if not always) the head of the compound page. So
removing "!order" won't really help unless the head of the THP
happens to be the HWPoison page.
>
> Yes, this is also a problematic scenario for Hugetlb HugePage. And Jiaqi works on
> it now [1]. I think Jiaqi's patches might apply to THP scenario too. Add @Jiaqi to
> verify this.
Yep, I think my work will also help solve the concern when
try_to_split_thp_page() fails.
>
> [1]: https://lore.kernel.org/all/20260112004923.888429-1-jiaqiyan@google.com/
>
> Thanks.
> .
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/page_alloc: Fix freeing of failed-split poisoned compound pages
2026-01-15 17:11 ` Jiaqi Yan
@ 2026-01-16 14:11 ` Boudewijn van der Heide
2026-01-24 4:42 ` Jiaqi Yan
0 siblings, 1 reply; 8+ messages in thread
From: Boudewijn van der Heide @ 2026-01-16 14:11 UTC (permalink / raw)
To: jiaqiyan
Cc: akpm, boudewijn, hannes, jackmanb, linmiaohe, linux-kernel,
linux-mm, mhocko, nao.horiguchi, osalvador, surenb, vbabka, ziy
Thanks Jiaqi for the feedback, that is very helpful.
(and thanks Miaohe for connecting the issues.)
After going through the memory_failure(),
I can see it indeed puts the PG_HWPoison flag on the specific subpage pointer,
and therefore my fix won't work as-is.
> >
> > Yes, this is also a problematic scenario for Hugetlb HugePage. And Jiaqi works on
> > it now [1]. I think Jiaqi's patches might apply to THP scenario too. Add @Jiaqi to
> > verify this.
>
> Yep, I think my work will also help solve the concern when
> try_to_split_thp_page() fails.
Your fix makes a lot of sense for hugetlb,
as it linearly scans through all the pages.
From my understanding,
your fix also provides the perfect architecture for also checking THP,
though it doesn't yet cover the in-use THP case outlined.
For THP I would need to trace the failed-split paths more carefully,
to check where the equivalent path for THP would be.
If there is work needed for THP, I'm happy to help.
Would you prefer I work on THP support as a separate follow-up patch,
after yours is merged,
or do you prefer to integrate it in your patch series?
> >
> > [1]: https://lore.kernel.org/all/20260112004923.888429-1-jiaqiyan@google.com/
> >
> > Thanks.
> > .
Thanks,
Boudewijn
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/page_alloc: Fix freeing of failed-split poisoned compound pages
2026-01-16 14:11 ` Boudewijn van der Heide
@ 2026-01-24 4:42 ` Jiaqi Yan
2026-01-28 2:45 ` Miaohe Lin
0 siblings, 1 reply; 8+ messages in thread
From: Jiaqi Yan @ 2026-01-24 4:42 UTC (permalink / raw)
To: Boudewijn van der Heide, ziy, linmiaohe
Cc: akpm, hannes, jackmanb, linux-kernel, linux-mm, mhocko,
nao.horiguchi, osalvador, surenb, vbabka
On Fri, Jan 16, 2026 at 6:12 AM Boudewijn van der Heide
<boudewijn@delta-utec.com> wrote:
>
> Thanks Jiaqi for the feedback, that is very helpful.
> (and thanks Miaohe for connecting the issues.)
>
> After going through the memory_failure(),
> I can see it indeed puts the PG_HWPoison flag on the specific subpage pointer,
> and therefore my fix won't work as-is.
>
> > >
> > > Yes, this is also a problematic scenario for Hugetlb HugePage. And Jiaqi works on
> > > it now [1]. I think Jiaqi's patches might apply to THP scenario too. Add @Jiaqi to
> > > verify this.
> >
> > Yep, I think my work will also help solve the concern when
> > try_to_split_thp_page() fails.
>
> Your fix makes a lot of sense for hugetlb,
> as it linearly scans through all the pages.
> From my understanding,
> your fix also provides the perfect architecture for also checking THP,
> though it doesn't yet cover the in-use THP case outlined.
Oh, sorry, I got ahead of myself and assumed the split-failed folio would
eventually be released to the buddy allocator at some point, when the
userspace processes that own/map this THP are killed or exit.
Zi and Miaohe, am I right about this? Or do we need to explicitly handle
in-use and split-failed THPs?
>
> For THP I would need to trace the failed-split paths more carefully,
> to check where the equivalent path for THP would be.
>
> If there is work needed for THP, I'm happy to help.
> Would you prefer I work on THP support as a separate follow-up patch,
> after yours is merged,
> or do you prefer to integrate it in your patch series?
>
> > >
> > > [1]: https://lore.kernel.org/all/20260112004923.888429-1-jiaqiyan@google.com/
> > >
> > > Thanks.
> > > .
>
> Thanks,
> Boudewijn
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/page_alloc: Fix freeing of failed-split poisoned compound pages
2026-01-24 4:42 ` Jiaqi Yan
@ 2026-01-28 2:45 ` Miaohe Lin
0 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2026-01-28 2:45 UTC (permalink / raw)
To: Jiaqi Yan, Boudewijn van der Heide, ziy
Cc: akpm, hannes, jackmanb, linux-kernel, linux-mm, mhocko,
nao.horiguchi, osalvador, surenb, vbabka
On 2026/1/24 12:42, Jiaqi Yan wrote:
> On Fri, Jan 16, 2026 at 6:12 AM Boudewijn van der Heide
> <boudewijn@delta-utec.com> wrote:
>>
>> Thanks Jiaqi for the feedback, that is very helpful.
>> (and thanks Miaohe for connecting the issues.)
>>
>> After going through the memory_failure(),
>> I can see it indeed puts the PG_HWPoison flag on the specific subpage pointer,
>> and therefore my fix won't work as-is.
>>
>>>>
>>>> Yes, this is also a problematic scenario for Hugetlb HugePage. And Jiaqi works on
>>>> it now [1]. I think Jiaqi's patches might apply to THP scenario too. Add @Jiaqi to
>>>> verify this.
>>>
>>> Yep, I think my work will also help solve the concern when
>>> try_to_split_thp_page() fails.
>>
>> Your fix makes a lot of sense for hugetlb,
>> as it linearly scans through all the pages.
>> From my understanding,
>> your fix also provides the perfect architecture for also checking THP,
>> though it doesn't yet cover the in-use THP case outlined.
>
> Oh, sorry, I got ahead of myself and assumed the split-failed folio would
> eventually be released to the buddy allocator at some point, when the
> userspace processes that own/map this THP are killed or exit.
>
> Zi and Miaohe, am I right about this? Or do we need to explicitly handle
> in-use and split-failed THPs?
IMHO, it's enough to handle the poisoned sub-pages when an in-use or
split-failed THP is eventually released to the buddy allocator.
Thanks.
.
^ permalink raw reply [flat|nested] 8+ messages in thread