* [PATCH v6 1/2] mm/memory-failure: fix missing ->mf_stats count in hugetlb poison
@ 2026-01-16 20:38 Jane Chu
2026-01-16 20:38 ` [PATCH v6 2/2] mm/memory-failure: teach kill_accessing_process to accept hugetlb tail page pfn Jane Chu
2026-01-20 11:54 ` [PATCH v6 1/2] mm/memory-failure: fix missing ->mf_stats count in hugetlb poison Miaohe Lin
0 siblings, 2 replies; 4+ messages in thread
From: Jane Chu @ 2026-01-16 20:38 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, stable, muchun.song, osalvador, david, linmiaohe,
jiaqiyan, william.roche, rientjes, akpm, lorenzo.stoakes,
Liam.Howlett, rppt, surenb, mhocko, willy, clm
When a newly poisoned subpage ends up in an already poisoned hugetlb
folio, 'num_poisoned_pages' is incremented, but the per node ->mf_stats
is not. Fix the inconsistency by designating action_result() to update
them both.
While at it, define __get_huge_page_for_hwpoison() return values in terms
of symbol names for better readability. Also rename
folio_set_hugetlb_hwpoison() to hugetlb_update_hwpoison() since the
function does more than the conventional bit setting and three possible
return values are expected.
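As a reading aid (not part of the patch), the symbolic return values and the way the caller tells the two pre-poisoned cases apart can be modeled in user space. The MF_HUGETLB_* values are copied from the diff in this mail; enum model_msg, model_is_pre_poisoned() and model_msg_for() are hypothetical names invented for this sketch.

```c
#include <stdbool.h>

/* Values copied from the patch. */
#define MF_HUGETLB_FREED              0 /* freed hugepage */
#define MF_HUGETLB_IN_USED            1 /* in-use hugepage */
#define MF_HUGETLB_NON_HUGEPAGE       2 /* not a hugepage */
#define MF_HUGETLB_FOLIO_PRE_POISONED 3 /* folio already poisoned */
#define MF_HUGETLB_PAGE_PRE_POISONED  4 /* exact page already poisoned */
#define MF_HUGETLB_RETRY              5 /* hugepage is busy, retry */

/* Hypothetical labels standing in for the kernel's MF_MSG_* types. */
enum model_msg {
	MODEL_MSG_NONE,
	MODEL_MSG_HUGE,
	MODEL_MSG_ALREADY_POISONED,
};

/*
 * Models the range check applied to the result of
 * hugetlb_update_hwpoison(). The check is only meaningful for values that
 * function can return: MF_HUGETLB_RETRY (5) would also pass it, but the
 * patch never feeds that value into the comparison.
 */
static bool model_is_pre_poisoned(int rc)
{
	return rc >= MF_HUGETLB_FOLIO_PRE_POISONED;
}

/* Models which message each pre-poisoned case reports in the caller. */
static enum model_msg model_msg_for(int res)
{
	switch (res) {
	case MF_HUGETLB_PAGE_PRE_POISONED:
		return MODEL_MSG_ALREADY_POISONED;
	case MF_HUGETLB_FOLIO_PRE_POISONED:
		return MODEL_MSG_HUGE;
	default:
		return MODEL_MSG_NONE;
	}
}
```

In this model, a new subpage error in an already poisoned folio maps to MODEL_MSG_HUGE, while a repeat error on the same subpage maps to MODEL_MSG_ALREADY_POISONED.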
Fixes: 18f41fa616ee ("mm: memory-failure: bump memory failure stats to pglist_data")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
v5 -> v6:
comments from Miaohe.
v4 -> v5:
fix a bug pointed out by William and Chris, add comment.
v3 -> v4:
incorporate/adapt David's suggestions.
v2 -> v3:
No change.
v1 -> v2:
adapted David's and Liam's comments: define __get_huge_page_for_hwpoison()
return values as symbolic names instead of naked integers for better
readability. #define is used instead of enum since the function has a
footprint outside MF and the MF specifics should stay local.
Also renamed folio_set_hugetlb_hwpoison() to hugetlb_update_hwpoison()
since the function does more than the conventional bit setting and
three possible return values are expected.
---
mm/memory-failure.c | 91 +++++++++++++++++++++++++++------------------
1 file changed, 54 insertions(+), 37 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c80c2907da33..49ced16e9c1a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1883,12 +1883,22 @@ static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
return count;
}
-static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page)
+#define MF_HUGETLB_FREED 0 /* freed hugepage */
+#define MF_HUGETLB_IN_USED 1 /* in-use hugepage */
+#define MF_HUGETLB_NON_HUGEPAGE 2 /* not a hugepage */
+#define MF_HUGETLB_FOLIO_PRE_POISONED 3 /* folio already poisoned */
+#define MF_HUGETLB_PAGE_PRE_POISONED 4 /* exact page already poisoned */
+#define MF_HUGETLB_RETRY 5 /* hugepage is busy, retry */
+/*
+ * Set hugetlb folio as hwpoisoned, update folio private raw hwpoison list
+ * to keep track of the poisoned pages.
+ */
+static int hugetlb_update_hwpoison(struct folio *folio, struct page *page)
{
struct llist_head *head;
struct raw_hwp_page *raw_hwp;
struct raw_hwp_page *p;
- int ret = folio_test_set_hwpoison(folio) ? -EHWPOISON : 0;
+ int ret = folio_test_set_hwpoison(folio) ? MF_HUGETLB_FOLIO_PRE_POISONED : 0;
/*
* Once the hwpoison hugepage has lost reliable raw error info,
@@ -1896,20 +1906,17 @@ static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page)
* so skip to add additional raw error info.
*/
if (folio_test_hugetlb_raw_hwp_unreliable(folio))
- return -EHWPOISON;
+ return MF_HUGETLB_FOLIO_PRE_POISONED;
head = raw_hwp_list_head(folio);
llist_for_each_entry(p, head->first, node) {
if (p->page == page)
- return -EHWPOISON;
+ return MF_HUGETLB_PAGE_PRE_POISONED;
}
raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC);
if (raw_hwp) {
raw_hwp->page = page;
llist_add(&raw_hwp->node, head);
- /* the first error event will be counted in action_result(). */
- if (ret)
- num_poisoned_pages_inc(page_to_pfn(page));
} else {
/*
* Failed to save raw error info. We no longer trace all
@@ -1957,42 +1964,38 @@ void folio_clear_hugetlb_hwpoison(struct folio *folio)
/*
* Called from hugetlb code with hugetlb_lock held.
- *
- * Return values:
- * 0 - free hugepage
- * 1 - in-use hugepage
- * 2 - not a hugepage
- * -EBUSY - the hugepage is busy (try to retry)
- * -EHWPOISON - the hugepage is already hwpoisoned
*/
int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
bool *migratable_cleared)
{
struct page *page = pfn_to_page(pfn);
struct folio *folio = page_folio(page);
- int ret = 2; /* fallback to normal page handling */
bool count_increased = false;
+ int ret, rc;
- if (!folio_test_hugetlb(folio))
+ if (!folio_test_hugetlb(folio)) {
+ ret = MF_HUGETLB_NON_HUGEPAGE;
goto out;
-
- if (flags & MF_COUNT_INCREASED) {
- ret = 1;
+ } else if (flags & MF_COUNT_INCREASED) {
+ ret = MF_HUGETLB_IN_USED;
count_increased = true;
} else if (folio_test_hugetlb_freed(folio)) {
- ret = 0;
+ ret = MF_HUGETLB_FREED;
} else if (folio_test_hugetlb_migratable(folio)) {
- ret = folio_try_get(folio);
- if (ret)
+ if (folio_try_get(folio)) {
+ ret = MF_HUGETLB_IN_USED;
count_increased = true;
+ } else
+ ret = MF_HUGETLB_FREED;
} else {
- ret = -EBUSY;
+ ret = MF_HUGETLB_RETRY;
if (!(flags & MF_NO_RETRY))
goto out;
}
- if (folio_set_hugetlb_hwpoison(folio, page)) {
- ret = -EHWPOISON;
+ rc = hugetlb_update_hwpoison(folio, page);
+ if (rc >= MF_HUGETLB_FOLIO_PRE_POISONED) {
+ ret = rc;
goto out;
}
@@ -2017,10 +2020,15 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
* with basic operations like hugepage allocation/free/demotion.
* So some of prechecks for hwpoison (pinning, and testing/setting
* PageHWPoison) should be done in single hugetlb_lock range.
+ * Returns:
+ * 0 - not hugetlb, or recovered
+ * -EBUSY - not recovered
+ * -EOPNOTSUPP - hwpoison_filter'ed
+ * -EHWPOISON - folio or exact page already poisoned
*/
static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb)
{
- int res;
+ int res, rv;
struct page *p = pfn_to_page(pfn);
struct folio *folio;
unsigned long page_flags;
@@ -2029,22 +2037,31 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
*hugetlb = 1;
retry:
res = get_huge_page_for_hwpoison(pfn, flags, &migratable_cleared);
- if (res == 2) { /* fallback to normal page handling */
+ switch (res) {
+ case MF_HUGETLB_NON_HUGEPAGE: /* fallback to normal page handling */
*hugetlb = 0;
return 0;
- } else if (res == -EHWPOISON) {
- if (flags & MF_ACTION_REQUIRED) {
- folio = page_folio(p);
- res = kill_accessing_process(current, folio_pfn(folio), flags);
- }
- action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
- return res;
- } else if (res == -EBUSY) {
+ case MF_HUGETLB_RETRY:
if (!(flags & MF_NO_RETRY)) {
flags |= MF_NO_RETRY;
goto retry;
}
return action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
+ case MF_HUGETLB_FOLIO_PRE_POISONED:
+ case MF_HUGETLB_PAGE_PRE_POISONED:
+ rv = -EHWPOISON;
+ if (flags & MF_ACTION_REQUIRED) {
+ folio = page_folio(p);
+ rv = kill_accessing_process(current, folio_pfn(folio), flags);
+ }
+ if (res == MF_HUGETLB_PAGE_PRE_POISONED)
+ action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
+ else
+ action_result(pfn, MF_MSG_HUGE, MF_FAILED);
+ return rv;
+ default:
+ WARN_ON((res != MF_HUGETLB_FREED) && (res != MF_HUGETLB_IN_USED));
+ break;
}
folio = page_folio(p);
@@ -2055,7 +2072,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
if (migratable_cleared)
folio_set_hugetlb_migratable(folio);
folio_unlock(folio);
- if (res == 1)
+ if (res == MF_HUGETLB_IN_USED)
folio_put(folio);
return -EOPNOTSUPP;
}
@@ -2064,7 +2081,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
* Handling free hugepage. The possible race with hugepage allocation
* or demotion can be prevented by PageHWPoison flag.
*/
- if (res == 0) {
+ if (res == MF_HUGETLB_FREED) {
folio_unlock(folio);
if (__page_handle_poison(p) > 0) {
page_ref_inc(p);
--
2.43.5
* [PATCH v6 2/2] mm/memory-failure: teach kill_accessing_process to accept hugetlb tail page pfn
2026-01-16 20:38 [PATCH v6 1/2] mm/memory-failure: fix missing ->mf_stats count in hugetlb poison Jane Chu
@ 2026-01-16 20:38 ` Jane Chu
2026-01-20 11:54 ` [PATCH v6 1/2] mm/memory-failure: fix missing ->mf_stats count in hugetlb poison Miaohe Lin
1 sibling, 0 replies; 4+ messages in thread
From: Jane Chu @ 2026-01-16 20:38 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, stable, muchun.song, osalvador, david, linmiaohe,
jiaqiyan, william.roche, rientjes, akpm, lorenzo.stoakes,
Liam.Howlett, rppt, surenb, mhocko, willy, clm
When a hugetlb folio is being poisoned again, try_memory_failure_hugetlb()
passes the head pfn to kill_accessing_process(), which is not right.
The precise pfn of the poisoned page should be used in order to
determine the precise vaddr as the SIGBUS payload.
This issue has already been taken care of in the normal path, that is,
hwpoison_user_mappings(), see [1][2]. Furthermore, for [3] to work
correctly in the hugetlb repoisoning case, it's essential to inform
the VM of the precise poisoned page, not the head page.
[1] https://lkml.kernel.org/r/20231218135837.3310403-1-willy@infradead.org
[2] https://lkml.kernel.org/r/20250224211445.2663312-1-jane.chu@oracle.com
[3] https://lore.kernel.org/lkml/20251116013223.1557158-1-jiaqiyan@google.com/
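A minimal user-space sketch of the matching arithmetic this patch adds to check_hwpoisoned_entry(), for illustration only. It assumes 4 KiB base pages (PAGE_SHIFT of 12) and that the pfn found in the page table entry is the head pfn of the mapping; match_poisoned_pfn() and its vaddr output parameter are hypothetical names, not kernel API.

```c
#include <assert.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

/*
 * Model of the pfn match and address fix-up.
 *
 * entry_pfn:    pfn read from the page table entry (head pfn for hugetlb)
 * addr:         virtual address the entry maps
 * shift:        log2 of the mapping size (12 for 4 KiB, 21 for 2 MiB, ...)
 * poisoned_pfn: pfn of the poisoned page, possibly a tail page
 *
 * Returns 1 and stores the address of the poisoned page in *vaddr when
 * poisoned_pfn lies inside the mapping, 0 otherwise.
 */
static int match_poisoned_pfn(unsigned long entry_pfn, unsigned long addr,
			      short shift, unsigned long poisoned_pfn,
			      unsigned long *vaddr)
{
	unsigned long mask;

	assert(shift >= PAGE_SHIFT);

	/* Clear the low bits that index base pages within the mapping. */
	mask = ~((1UL << (shift - PAGE_SHIFT)) - 1);
	if (!entry_pfn || entry_pfn != (poisoned_pfn & mask))
		return 0;

	/* Offset the mapping address by the tail page's distance from head. */
	*vaddr = addr + ((poisoned_pfn - entry_pfn) << PAGE_SHIFT);
	return 1;
}
```

For a 2 MiB mapping (shift 21) whose head pfn is 0x1200 mapped at 0x40000000, a poisoned tail pfn of 0x1203 matches and yields 0x40003000 as the reported address. With shift equal to PAGE_SHIFT the mask keeps every bit, so the check reduces to the exact pfn comparison used before the patch.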
Cc: <stable@vger.kernel.org>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
---
v5 -> v6:
comment from Miaohe, add an acked-by.
v5, v4: No change.
v2 -> v3:
incorporated suggestions from Miaohe and Matthew.
v1 -> v2:
pickup R-B, add stable to cc list.
---
mm/memory-failure.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 49ced16e9c1a..2d330176364a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -692,6 +692,8 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
unsigned long poisoned_pfn, struct to_kill *tk)
{
unsigned long pfn = 0;
+ unsigned long hwpoison_vaddr;
+ unsigned long mask;
if (pte_present(pte)) {
pfn = pte_pfn(pte);
@@ -702,10 +704,12 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
pfn = softleaf_to_pfn(entry);
}
- if (!pfn || pfn != poisoned_pfn)
+ mask = ~((1UL << (shift - PAGE_SHIFT)) - 1);
+ if (!pfn || pfn != (poisoned_pfn & mask))
return 0;
- set_to_kill(tk, addr, shift);
+ hwpoison_vaddr = addr + ((poisoned_pfn - pfn) << PAGE_SHIFT);
+ set_to_kill(tk, hwpoison_vaddr, shift);
return 1;
}
@@ -2050,10 +2054,8 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
case MF_HUGETLB_FOLIO_PRE_POISONED:
case MF_HUGETLB_PAGE_PRE_POISONED:
rv = -EHWPOISON;
- if (flags & MF_ACTION_REQUIRED) {
- folio = page_folio(p);
- rv = kill_accessing_process(current, folio_pfn(folio), flags);
- }
+ if (flags & MF_ACTION_REQUIRED)
+ rv = kill_accessing_process(current, pfn, flags);
if (res == MF_HUGETLB_PAGE_PRE_POISONED)
action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
else
--
2.43.5
* Re: [PATCH v6 1/2] mm/memory-failure: fix missing ->mf_stats count in hugetlb poison
2026-01-16 20:38 [PATCH v6 1/2] mm/memory-failure: fix missing ->mf_stats count in hugetlb poison Jane Chu
2026-01-16 20:38 ` [PATCH v6 2/2] mm/memory-failure: teach kill_accessing_process to accept hugetlb tail page pfn Jane Chu
@ 2026-01-20 11:54 ` Miaohe Lin
2026-01-20 23:23 ` jane.chu
1 sibling, 1 reply; 4+ messages in thread
From: Miaohe Lin @ 2026-01-20 11:54 UTC (permalink / raw)
To: Jane Chu
Cc: linux-mm, stable, muchun.song, osalvador, david, jiaqiyan,
william.roche, rientjes, akpm, lorenzo.stoakes, Liam.Howlett,
rppt, surenb, mhocko, willy, clm, linux-kernel
On 2026/1/17 4:38, Jane Chu wrote:
> When a newly poisoned subpage ends up in an already poisoned hugetlb
> folio, 'num_poisoned_pages' is incremented, but the per node ->mf_stats
> is not. Fix the inconsistency by designating action_result() to update
> them both.
>
> While at it, define __get_huge_page_for_hwpoison() return values in terms
> of symbol names for better readability. Also rename
> folio_set_hugetlb_hwpoison() to hugetlb_update_hwpoison() since the
> function does more than the conventional bit setting and the fact
> three possible return values are expected.
>
> Fixes: 18f41fa616ee ("mm: memory-failure: bump memory failure stats to pglist_data")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
This patch looks good to me with some nits below.
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
> v5 -> v6:
> comments from Miaohe.
> v4 -> v5:
> fix a bug pointed out by William and Chris, add comment.
> v3 -> v4:
> incorporate/adapt David's suggestions.
> v2 -> v3:
> No change.
> v1 -> v2:
> adapted David and Liam's comment, define __get_huge_page_for_hwpoison()
> return values in terms of symbol names instead of naked integers for better
> readability. #define instead of enum is used since the function has footprint
> outside MF, just try to limit the MF specifics local.
> also renamed folio_set_hugetlb_hwpoison() to hugetlb_update_hwpoison()
> since the function does more than the conventional bit setting and the
> fact three possible return values are expected.
>
> ---
> mm/memory-failure.c | 91 +++++++++++++++++++++++++++------------------
> 1 file changed, 54 insertions(+), 37 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index c80c2907da33..49ced16e9c1a 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1883,12 +1883,22 @@ static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
> return count;
> }
>
> -static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page)
> +#define MF_HUGETLB_FREED 0 /* freed hugepage */
> +#define MF_HUGETLB_IN_USED 1 /* in-use hugepage */
> +#define MF_HUGETLB_NON_HUGEPAGE 2 /* not a hugepage */
> +#define MF_HUGETLB_FOLIO_PRE_POISONED 3 /* folio already poisoned */
> +#define MF_HUGETLB_PAGE_PRE_POISONED 4 /* exact page already poisoned */
> +#define MF_HUGETLB_RETRY 5 /* hugepage is busy, retry */
> +/*
> + * Set hugetlb folio as hwpoisoned, update folio private raw hwpoison list
> + * to keep track of the poisoned pages.
> + */
> +static int hugetlb_update_hwpoison(struct folio *folio, struct page *page)
> {
> struct llist_head *head;
> struct raw_hwp_page *raw_hwp;
> struct raw_hwp_page *p;
> - int ret = folio_test_set_hwpoison(folio) ? -EHWPOISON : 0;
> + int ret = folio_test_set_hwpoison(folio) ? MF_HUGETLB_FOLIO_PRE_POISONED : 0;
>
> /*
> * Once the hwpoison hugepage has lost reliable raw error info,
> @@ -1896,20 +1906,17 @@ static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page)
> * so skip to add additional raw error info.
> */
> if (folio_test_hugetlb_raw_hwp_unreliable(folio))
> - return -EHWPOISON;
> + return MF_HUGETLB_FOLIO_PRE_POISONED;
> head = raw_hwp_list_head(folio);
> llist_for_each_entry(p, head->first, node) {
> if (p->page == page)
> - return -EHWPOISON;
> + return MF_HUGETLB_PAGE_PRE_POISONED;
> }
>
> raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC);
> if (raw_hwp) {
> raw_hwp->page = page;
> llist_add(&raw_hwp->node, head);
> - /* the first error event will be counted in action_result(). */
> - if (ret)
> - num_poisoned_pages_inc(page_to_pfn(page));
> } else {
> /*
> * Failed to save raw error info. We no longer trace all
> @@ -1957,42 +1964,38 @@ void folio_clear_hugetlb_hwpoison(struct folio *folio)
>
> /*
> * Called from hugetlb code with hugetlb_lock held.
> - *
> - * Return values:
> - * 0 - free hugepage
> - * 1 - in-use hugepage
> - * 2 - not a hugepage
> - * -EBUSY - the hugepage is busy (try to retry)
> - * -EHWPOISON - the hugepage is already hwpoisoned
> */
> int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> bool *migratable_cleared)
> {
> struct page *page = pfn_to_page(pfn);
> struct folio *folio = page_folio(page);
> - int ret = 2; /* fallback to normal page handling */
> bool count_increased = false;
> + int ret, rc;
>
> - if (!folio_test_hugetlb(folio))
> + if (!folio_test_hugetlb(folio)) {
> + ret = MF_HUGETLB_NON_HUGEPAGE;
> goto out;
> -
> - if (flags & MF_COUNT_INCREASED) {
> - ret = 1;
> + } else if (flags & MF_COUNT_INCREASED) {
> + ret = MF_HUGETLB_IN_USED;
> count_increased = true;
> } else if (folio_test_hugetlb_freed(folio)) {
> - ret = 0;
> + ret = MF_HUGETLB_FREED;
> } else if (folio_test_hugetlb_migratable(folio)) {
> - ret = folio_try_get(folio);
> - if (ret)
> + if (folio_try_get(folio)) {
> + ret = MF_HUGETLB_IN_USED;
> count_increased = true;
> + } else
> + ret = MF_HUGETLB_FREED;
IIRC, code style requires {} here, i.e.
if (folio_try_get(folio)) {
ret = MF_HUGETLB_IN_USED;
count_increased = true;
} else {
ret = MF_HUGETLB_FREED;
}
> } else {
> - ret = -EBUSY;
> + ret = MF_HUGETLB_RETRY;
> if (!(flags & MF_NO_RETRY))
> goto out;
> }
>
> - if (folio_set_hugetlb_hwpoison(folio, page)) {
> - ret = -EHWPOISON;
> + rc = hugetlb_update_hwpoison(folio, page);
> + if (rc >= MF_HUGETLB_FOLIO_PRE_POISONED) {
> + ret = rc;
> goto out;
> }
>
> @@ -2017,10 +2020,15 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
> * with basic operations like hugepage allocation/free/demotion.
> * So some of prechecks for hwpoison (pinning, and testing/setting
> * PageHWPoison) should be done in single hugetlb_lock range.
> + * Returns:
> + * 0 - not hugetlb, or recovered
> + * -EBUSY - not recovered
> + * -EOPNOTSUPP - hwpoison_filter'ed
> + * -EHWPOISON - folio or exact page already poisoned
-EFAULT can be returned when kill_accessing_process() finds that p->mm is NULL, so it
might be better to document the -EFAULT case too.
Thanks.
.
* Re: [PATCH v6 1/2] mm/memory-failure: fix missing ->mf_stats count in hugetlb poison
2026-01-20 11:54 ` [PATCH v6 1/2] mm/memory-failure: fix missing ->mf_stats count in hugetlb poison Miaohe Lin
@ 2026-01-20 23:23 ` jane.chu
0 siblings, 0 replies; 4+ messages in thread
From: jane.chu @ 2026-01-20 23:23 UTC (permalink / raw)
To: Miaohe Lin
Cc: linux-mm, stable, muchun.song, osalvador, david, jiaqiyan,
william.roche, rientjes, akpm, lorenzo.stoakes, Liam.Howlett,
rppt, surenb, mhocko, willy, clm, linux-kernel
On 1/20/2026 3:54 AM, Miaohe Lin wrote:
> On 2026/1/17 4:38, Jane Chu wrote:
>> When a newly poisoned subpage ends up in an already poisoned hugetlb
>> folio, 'num_poisoned_pages' is incremented, but the per node ->mf_stats
>> is not. Fix the inconsistency by designating action_result() to update
>> them both.
>>
>> While at it, define __get_huge_page_for_hwpoison() return values in terms
>> of symbol names for better readability. Also rename
>> folio_set_hugetlb_hwpoison() to hugetlb_update_hwpoison() since the
>> function does more than the conventional bit setting and the fact
>> three possible return values are expected.
>>
>> Fixes: 18f41fa616ee ("mm: memory-failure: bump memory failure stats to pglist_data")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>
> This patch looks good to me with some nits below.
>
> Acked-by: Miaohe Lin <linmiaohe@huawei.com>
>
>> ---
>> v5 -> v6:
>> comments from Miaohe.
>> v4 -> v5:
>> fix a bug pointed out by William and Chris, add comment.
>> v3 -> v4:
>> incorporate/adapt David's suggestions.
>> v2 -> v3:
>> No change.
>> v1 -> v2:
>> adapted David and Liam's comment, define __get_huge_page_for_hwpoison()
>> return values in terms of symbol names instead of naked integers for better
>> readability. #define instead of enum is used since the function has footprint
>> outside MF, just try to limit the MF specifics local.
>> also renamed folio_set_hugetlb_hwpoison() to hugetlb_update_hwpoison()
>> since the function does more than the conventional bit setting and the
>> fact three possible return values are expected.
>>
>> ---
>> mm/memory-failure.c | 91 +++++++++++++++++++++++++++------------------
>> 1 file changed, 54 insertions(+), 37 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index c80c2907da33..49ced16e9c1a 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1883,12 +1883,22 @@ static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
>> return count;
>> }
>>
>> -static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page)
>> +#define MF_HUGETLB_FREED 0 /* freed hugepage */
>> +#define MF_HUGETLB_IN_USED 1 /* in-use hugepage */
>> +#define MF_HUGETLB_NON_HUGEPAGE 2 /* not a hugepage */
>> +#define MF_HUGETLB_FOLIO_PRE_POISONED 3 /* folio already poisoned */
>> +#define MF_HUGETLB_PAGE_PRE_POISONED 4 /* exact page already poisoned */
>> +#define MF_HUGETLB_RETRY 5 /* hugepage is busy, retry */
>> +/*
>> + * Set hugetlb folio as hwpoisoned, update folio private raw hwpoison list
>> + * to keep track of the poisoned pages.
>> + */
>> +static int hugetlb_update_hwpoison(struct folio *folio, struct page *page)
>> {
>> struct llist_head *head;
>> struct raw_hwp_page *raw_hwp;
>> struct raw_hwp_page *p;
>> - int ret = folio_test_set_hwpoison(folio) ? -EHWPOISON : 0;
>> + int ret = folio_test_set_hwpoison(folio) ? MF_HUGETLB_FOLIO_PRE_POISONED : 0;
>>
>> /*
>> * Once the hwpoison hugepage has lost reliable raw error info,
>> @@ -1896,20 +1906,17 @@ static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page)
>> * so skip to add additional raw error info.
>> */
>> if (folio_test_hugetlb_raw_hwp_unreliable(folio))
>> - return -EHWPOISON;
>> + return MF_HUGETLB_FOLIO_PRE_POISONED;
>> head = raw_hwp_list_head(folio);
>> llist_for_each_entry(p, head->first, node) {
>> if (p->page == page)
>> - return -EHWPOISON;
>> + return MF_HUGETLB_PAGE_PRE_POISONED;
>> }
>>
>> raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC);
>> if (raw_hwp) {
>> raw_hwp->page = page;
>> llist_add(&raw_hwp->node, head);
>> - /* the first error event will be counted in action_result(). */
>> - if (ret)
>> - num_poisoned_pages_inc(page_to_pfn(page));
>> } else {
>> /*
>> * Failed to save raw error info. We no longer trace all
>> @@ -1957,42 +1964,38 @@ void folio_clear_hugetlb_hwpoison(struct folio *folio)
>>
>> /*
>> * Called from hugetlb code with hugetlb_lock held.
>> - *
>> - * Return values:
>> - * 0 - free hugepage
>> - * 1 - in-use hugepage
>> - * 2 - not a hugepage
>> - * -EBUSY - the hugepage is busy (try to retry)
>> - * -EHWPOISON - the hugepage is already hwpoisoned
>> */
>> int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
>> bool *migratable_cleared)
>> {
>> struct page *page = pfn_to_page(pfn);
>> struct folio *folio = page_folio(page);
>> - int ret = 2; /* fallback to normal page handling */
>> bool count_increased = false;
>> + int ret, rc;
>>
>> - if (!folio_test_hugetlb(folio))
>> + if (!folio_test_hugetlb(folio)) {
>> + ret = MF_HUGETLB_NON_HUGEPAGE;
>> goto out;
>> -
>> - if (flags & MF_COUNT_INCREASED) {
>> - ret = 1;
>> + } else if (flags & MF_COUNT_INCREASED) {
>> + ret = MF_HUGETLB_IN_USED;
>> count_increased = true;
>> } else if (folio_test_hugetlb_freed(folio)) {
>> - ret = 0;
>> + ret = MF_HUGETLB_FREED;
>> } else if (folio_test_hugetlb_migratable(folio)) {
>> - ret = folio_try_get(folio);
>> - if (ret)
>> + if (folio_try_get(folio)) {
>> + ret = MF_HUGETLB_IN_USED;
>> count_increased = true;
>> + } else
>> + ret = MF_HUGETLB_FREED;
>
> IIRC, code style requires {} here, i.e.
>
> if (folio_try_get(folio)) {
> ret = MF_HUGETLB_IN_USED;
> count_increased = true;
> } else {
> ret = MF_HUGETLB_FREED;
> }
>
>> } else {
>> - ret = -EBUSY;
>> + ret = MF_HUGETLB_RETRY;
>> if (!(flags & MF_NO_RETRY))
>> goto out;
>> }
>>
>> - if (folio_set_hugetlb_hwpoison(folio, page)) {
>> - ret = -EHWPOISON;
>> + rc = hugetlb_update_hwpoison(folio, page);
>> + if (rc >= MF_HUGETLB_FOLIO_PRE_POISONED) {
>> + ret = rc;
>> goto out;
>> }
>>
>> @@ -2017,10 +2020,15 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
>> * with basic operations like hugepage allocation/free/demotion.
>> * So some of prechecks for hwpoison (pinning, and testing/setting
>> * PageHWPoison) should be done in single hugetlb_lock range.
>> + * Returns:
>> + * 0 - not hugetlb, or recovered
>> + * -EBUSY - not recovered
>> + * -EOPNOTSUPP - hwpoison_filter'ed
>> + * -EHWPOISON - folio or exact page already poisoned
>
> -EFAULT can be returned when kill_accessing_process() finds that p->mm is NULL, so it
> might be better to document the -EFAULT case too.
>
> Thanks.
Thanks a lot! v7 sent out.
-jane
> .