On 8/10/2025 9:33 PM, Jinjiang Tu wrote:
When memory_failure() is called for an already hwpoisoned pfn backed by
a struct page, kill_accessing_process() conditionally sends a SIGBUS to
the current (triggering) process if it maps the page.

However, if the page is not ordinarily mapped but was mapped through
remap_pfn_range(), kill_accessing_process() fails to identify it as mapped,
even though hwpoison_pte_range() would be prepared to handle it, because
walk_page_range() skips VM_PFNMAP VMAs by default in walk_page_test(). As
a result, walk_page_range() returns 0, which is taken to mean "not mapped",
and the SIGBUS is skipped. The user task then triggers the UCE indefinitely,
because it never receives a SIGBUS on access and simply retries.
Before commit aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to processes
with recovered clean pages"), kill_accessing_process() would return -EFAULT
in this case. On x86, the current task would then be killed in kill_me_maybe().
To fix it, add a .test_walk callback to hwpoison_walk_ops so that
VM_PFNMAP VMAs are processed too.
Fixes: aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to processes with recovered clean pages")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
Changelog since v1:
* update patch description, suggested by David Hildenbrand
 mm/memory-failure.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e2e685b971bb..fa6a8f2cdebc 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -853,9 +853,17 @@ static int hwpoison_hugetlb_range(pte_t *ptep, unsigned long hmask,
 #define hwpoison_hugetlb_range NULL
 #endif
+static int hwpoison_test_walk(unsigned long start, unsigned long end,
+		struct mm_walk *walk)
+{
+	/* Walk every VMA, including the VM_PFNMAP ones skipped by default. */
+	return 0;
+}
+
static const struct mm_walk_ops hwpoison_walk_ops = {
.pmd_entry = hwpoison_pte_range,
.hugetlb_entry = hwpoison_hugetlb_range,
+ .test_walk = hwpoison_test_walk,
.walk_lock = PGWALK_RDLOCK,
};
Looks good. Could you add this to stable?
Yes, I will.
Reviewed-by: Jane Chu <jane.chu@oracle.com>
thanks,
-jane