From: Shuai Xue <xueshuai@linux.alibaba.com>
To: tony.luck@intel.com, bp@alien8.de, nao.horiguchi@gmail.com
Cc: tglx@linutronix.de, mingo@redhat.com,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
linmiaohe@huawei.com, akpm@linux-foundation.org,
linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, baolin.wang@linux.alibaba.com,
tianruidong@linux.alibaba.com
Subject: [PATCH v1 4/4] mm/hwpoison: Fix incorrect "not recovered" report for recovered clean pages
Date: Tue, 11 Feb 2025 14:02:00 +0800
Message-ID: <20250211060200.33845-5-xueshuai@linux.alibaba.com>
In-Reply-To: <20250211060200.33845-1-xueshuai@linux.alibaba.com>

When an uncorrected memory error is consumed, there is a race between
the CMCI from the memory controller reporting an uncorrected error
with a UCNA signature, and the core reporting an SRAR signature
machine check when the data is about to be consumed.
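A minimal user-space sketch of how that race resolves, assuming the page's
PG_hwpoison bit can be modelled as a C11 atomic_flag. memory_failure() does an
atomic test-and-set of that bit (TestSetPageHWPoison()), so whichever path gets
there first handles the page and the other one takes the "already hardware
poisoned" branch. The names page_hwpoison and model_memory_failure() are
hypothetical, and nothing else of memory_failure() is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for the PG_hwpoison flag of the affected page. */
static atomic_flag page_hwpoison = ATOMIC_FLAG_INIT;

enum mf_path {
	MF_PATH_FIRST,            /* this caller marked the page poisoned */
	MF_PATH_ALREADY_POISONED, /* another caller got there first */
};

/*
 * Models only the poison check in memory_failure(): the test-and-set
 * returns the previous value, so exactly one caller sees "not yet
 * poisoned", however the CMCI and SRAR paths interleave.
 */
static enum mf_path model_memory_failure(const char *who)
{
	if (atomic_flag_test_and_set(&page_hwpoison)) {
		printf("%s: already hardware poisoned\n", who);
		return MF_PATH_ALREADY_POISONED;
	}
	printf("%s: page marked poisoned\n", who);
	return MF_PATH_FIRST;
}
```

If the CMCI wins, model_memory_failure("CMCI") returns MF_PATH_FIRST and the
later model_memory_failure("SRAR") returns MF_PATH_ALREADY_POISONED.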

If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure(). For dirty pages,
memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
converting the PTE to a hwpoison entry. However, for clean pages, the
TTU_HWPOISON flag is cleared, leaving the PTE unchanged and not converted
to a hwpoison entry. When the SRAR machine check is then handled,
memory_failure() finds the page already poisoned and calls
kill_accessing_process(). For an unmapped dirty page, the PTE is a
hwpoison entry, which allows kill_accessing_process() to:
- call walk_page_range(), which returns 1
- call kill_proc() to make sure a SIGBUS is sent
- return -EHWPOISON to indicate that SIGBUS has already been sent to the
  process and kill_me_maybe() does not have to send it again.
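The dirty/clean split above can be sketched as two small functions. This is a
simplified model, not kernel code: enum pte_kind, model_unmap() and
model_walk() are hypothetical names, model_unmap() reduces the TTU_HWPOISON
decision to the dirty bit (the kernel's real condition checks more than that),
and model_walk() stands in for walk_page_range() with the hwpoison walk
callbacks, returning 1 when it finds a hwpoison entry and 0 otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified view of a PTE after memory_failure() has unmapped the page. */
enum pte_kind {
	PTE_NOT_HWPOISON, /* not a hwpoison entry, whatever else it holds */
	PTE_HWPOISON,     /* converted to a hwpoison entry */
};

/*
 * Hypothetical model of the TTU_HWPOISON decision: the flag is kept
 * for dirty pages and cleared for clean ones, so only a dirty page
 * ends up with a hwpoison entry.
 */
static enum pte_kind model_unmap(bool page_dirty)
{
	bool ttu_hwpoison = page_dirty;

	return ttu_hwpoison ? PTE_HWPOISON : PTE_NOT_HWPOISON;
}

/* Returns 1 if any PTE in the range is a hwpoison entry, else 0. */
static int model_walk(const enum pte_kind *ptes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (ptes[i] == PTE_HWPOISON)
			return 1;
	return 0;
}
```

A dirty page therefore makes the walk return 1, while a clean page makes it
return 0, which is the case that ends up as -EFAULT.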
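
Conversely, for clean pages, whose PTEs are not converted to hwpoison
entries, walk_page_range() returns 0 and kill_accessing_process()
returns -EFAULT. kill_me_maybe() then treats the error as unrecovered
and sends a SIGBUS, even though the clean page was already recovered.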
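How the SRAR handler reacts to that return value can be sketched as follows.
This is a reduced model assuming only the outcomes described in this message;
model_kill_me_maybe() and enum mce_action are hypothetical names, and the real
kill_me_maybe() handles additional cases. The printed line matches the
"mce: Memory error not recovered" message in the console log.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* Linux value, in case libc does not expose it */
#endif

enum mce_action {
	MCE_RECOVERED,           /* nothing more to do */
	MCE_SIGBUS_ALREADY_SENT, /* kill_proc() already signalled the task */
	MCE_SEND_SIGBUS,         /* report failure and signal the task now */
};

/* Hypothetical model of the decision taken on memory_failure()'s result. */
static enum mce_action model_kill_me_maybe(int ret)
{
	if (ret == 0)
		return MCE_RECOVERED;
	if (ret == -EHWPOISON)
		return MCE_SIGBUS_ALREADY_SENT;
	printf("mce: Memory error not recovered\n");
	return MCE_SEND_SIGBUS;
}
```

With -EFAULT coming back for a recovered clean page, the model takes the last
branch, which is exactly the misleading report shown in the console log.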

The console log looks like this:
Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
Memory failure: 0x827ca68: already hardware poisoned
mce: Memory error not recovered

To fix it, return -EHWPOISON also when walk_page_range() finds no
hwpoison PTE entry (returns 0), and keep -EFAULT only for a failed walk.
This prevents an unnecessary SIGBUS.

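A sketch of the tail of kill_accessing_process() before and after the patch,
written as plain functions of the walk_page_range() result so the two versions
can be compared. Assumptions: walk_ret is 1 when a hwpoison entry was found, 0
when none was found and negative when the walk failed; have_addr stands in for
priv.tk.addr, and *killed records whether kill_proc() would run. old_tail() and
new_tail() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* Linux value, in case libc does not expose it */
#endif

/* Before the patch: anything but "found, with an address" became 0. */
static int old_tail(int walk_ret, bool have_addr, bool *killed)
{
	int ret = walk_ret;

	*killed = false;
	if (ret == 1 && have_addr)
		*killed = true; /* kill_proc() would be called here */
	else
		ret = 0;
	return ret > 0 ? -EHWPOISON : -EFAULT;
}

/* After the patch: only a failed (negative) walk yields -EFAULT. */
static int new_tail(int walk_ret, bool have_addr, bool *killed)
{
	int ret = walk_ret;

	*killed = false;
	if (ret == 1 && have_addr)
		*killed = true; /* kill_proc() would be called here */
	return ret >= 0 ? -EHWPOISON : -EFAULT;
}
```

For a recovered clean page (walk_ret == 0), old_tail() returns -EFAULT and
new_tail() returns -EHWPOISON, so kill_me_maybe() no longer reports the error
as unrecovered. A failed walk still maps to -EFAULT in both versions.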
Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
mm/memory-failure.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 995a15eb67e2..f9a6b136a6f0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -883,10 +883,9 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
 			      (void *)&priv);
 	if (ret == 1 && priv.tk.addr)
 		kill_proc(&priv.tk, pfn, flags);
-	else
-		ret = 0;
 	mmap_read_unlock(p->mm);
-	return ret > 0 ? -EHWPOISON : -EFAULT;
+
+	return ret >= 0 ? -EHWPOISON : -EFAULT;
 }
 
 /*
--
2.39.3
Thread overview: 17+ messages
2025-02-11 6:01 [PATCH v1 0/4] fmm/hwpoison: Fix regressions in memory failure handling Shuai Xue
2025-02-11 6:01 ` [PATCH v1 1/4] x86/mce: Collect error message for severities below MCE_PANIC_SEVERITY Shuai Xue
2025-02-11 16:51 ` Luck, Tony
2025-02-12 1:51 ` Shuai Xue
2025-02-11 6:01 ` [PATCH v1 2/4] x86/mce: dump error msg from severities Shuai Xue
2025-02-11 16:44 ` Luck, Tony
2025-02-14 9:29 ` Shuai Xue
2025-02-14 16:57 ` Luck, Tony
2025-02-11 6:01 ` [PATCH v1 3/4] x86/mce: add EX_TYPE_EFAULT_REG as in-kernel recovery context to fix copy-from-user operations regression Shuai Xue
2025-02-11 6:02 ` Shuai Xue [this message]
2025-02-12 8:09 ` [PATCH v1 4/4] mm/hwpoison: Fix incorrect "not recovered" report for recovered clean pages Miaohe Lin
2025-02-12 13:55 ` Shuai Xue
2025-02-13 3:20 ` Miaohe Lin
2025-02-13 6:59 ` Shuai Xue
2025-02-14 6:54 ` Miaohe Lin
2025-02-14 7:59 ` Shuai Xue
2025-02-14 16:51 ` Luck, Tony