* [PATCH v3 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
2025-03-06 2:10 [PATCH v3 0/3] mm/hwpoison: Fix regressions in memory failure handling Shuai Xue
@ 2025-03-06 2:10 ` Shuai Xue
2025-03-06 18:15 ` Luck, Tony
` (2 more replies)
2025-03-06 2:10 ` [PATCH v3 2/3] mm/hwpoison: Do not send SIGBUS to processes with recovered clean pages Shuai Xue
2025-03-06 2:10 ` [PATCH v3 3/3] mm: memory-failure: Enhance comments for return value of memory_failure() Shuai Xue
2 siblings, 3 replies; 9+ messages in thread
From: Shuai Xue @ 2025-03-06 2:10 UTC (permalink / raw)
To: tony.luck, bp, peterz, catalin.marinas, yazen.ghannam, akpm,
linmiaohe, nao.horiguchi
Cc: tglx, mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac,
linux-kernel, linux-mm, baolin.wang, tianruidong
Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
extable fixup type, EX_TYPE_EFAULT_REG, and commit 4c132d1d844a
("x86/futex: Remove .fixup usage") updated the extable fixup type for
copy-from-user operations, changing it from EX_TYPE_UACCESS to
EX_TYPE_EFAULT_REG. Consequently, the error context for copy-from-user
operations no longer functions as an in-kernel recovery context,
resulting in kernel panics with the message: "Machine check: Data load in
unrecoverable area of kernel."
The critical aspect is identifying whether the error context involves a
read from user memory. We do not care about the ex-type if we know it's a
MOV reading from userspace. is_copy_from_user() returns true when both of
the following conditions are met:
- the current instruction is a copy
- the source address is user memory
So, use is_copy_from_user() to determine directly whether a context is a
copy from user.
Fixes: 4c132d1d844a ("x86/futex: Remove .fixup usage")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
arch/x86/kernel/cpu/mce/severity.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index dac4d64dfb2a..cb021058165f 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -300,13 +300,12 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
copy_user = is_copy_from_user(regs);
instrumentation_end();
- switch (fixup_type) {
- case EX_TYPE_UACCESS:
- if (!copy_user)
- return IN_KERNEL;
- m->kflags |= MCE_IN_KERNEL_COPYIN;
- fallthrough;
+ if (copy_user) {
+ m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_COPYIN;
+ return IN_KERNEL_RECOV
+ }
+ switch (fixup_type) {
case EX_TYPE_FAULT_MCE_SAFE:
case EX_TYPE_DEFAULT_MCE_SAFE:
m->kflags |= MCE_IN_KERNEL_RECOV;
--
2.39.3
* RE: [PATCH v3 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
2025-03-06 2:10 ` [PATCH v3 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context Shuai Xue
@ 2025-03-06 18:15 ` Luck, Tony
2025-03-07 1:40 ` Shuai Xue
2025-03-07 5:47 ` Shuai Xue
2025-03-07 3:17 ` kernel test robot
2025-03-07 3:39 ` kernel test robot
2 siblings, 2 replies; 9+ messages in thread
From: Luck, Tony @ 2025-03-06 18:15 UTC (permalink / raw)
To: Shuai Xue, bp, peterz, catalin.marinas, yazen.ghannam, akpm,
linmiaohe, nao.horiguchi
Cc: tglx, mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac,
linux-kernel, linux-mm, baolin.wang, tianruidong
> diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
> index dac4d64dfb2a..cb021058165f 100644
> --- a/arch/x86/kernel/cpu/mce/severity.c
> +++ b/arch/x86/kernel/cpu/mce/severity.c
> @@ -300,13 +300,12 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
> copy_user = is_copy_from_user(regs);
> instrumentation_end();
>
> - switch (fixup_type) {
> - case EX_TYPE_UACCESS:
> - if (!copy_user)
> - return IN_KERNEL;
> - m->kflags |= MCE_IN_KERNEL_COPYIN;
> - fallthrough;
> + if (copy_user) {
> + m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_COPYIN;
You have " MCE_IN_KERNEL_COPYIN" twice here.
> + return IN_KERNEL_RECOV
> + }
>
> + switch (fixup_type) {
> case EX_TYPE_FAULT_MCE_SAFE:
> case EX_TYPE_DEFAULT_MCE_SAFE:
> m->kflags |= MCE_IN_KERNEL_RECOV;
> --
-Tony
* Re: [PATCH v3 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
2025-03-06 18:15 ` Luck, Tony
@ 2025-03-07 1:40 ` Shuai Xue
2025-03-07 5:47 ` Shuai Xue
1 sibling, 0 replies; 9+ messages in thread
From: Shuai Xue @ 2025-03-07 1:40 UTC (permalink / raw)
To: Luck, Tony, bp, peterz, catalin.marinas, yazen.ghannam, akpm,
linmiaohe, nao.horiguchi
Cc: tglx, mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac,
linux-kernel, linux-mm, baolin.wang, tianruidong
On 2025/3/7 02:15, Luck, Tony wrote:
>> diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
>> index dac4d64dfb2a..cb021058165f 100644
>> --- a/arch/x86/kernel/cpu/mce/severity.c
>> +++ b/arch/x86/kernel/cpu/mce/severity.c
>> @@ -300,13 +300,12 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
>> copy_user = is_copy_from_user(regs);
>> instrumentation_end();
>>
>> - switch (fixup_type) {
>> - case EX_TYPE_UACCESS:
>> - if (!copy_user)
>> - return IN_KERNEL;
>> - m->kflags |= MCE_IN_KERNEL_COPYIN;
>> - fallthrough;
>> + if (copy_user) {
>> + m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_COPYIN;
>
> You have " MCE_IN_KERNEL_COPYIN" twice here.
Sorry, I forgot to format a new patch and sent an old version.
The corrected one:
---
arch/x86/kernel/cpu/mce/severity.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index dac4d64dfb2a..2235a7477436 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -300,13 +300,12 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
copy_user = is_copy_from_user(regs);
instrumentation_end();
- switch (fixup_type) {
- case EX_TYPE_UACCESS:
- if (!copy_user)
- return IN_KERNEL;
- m->kflags |= MCE_IN_KERNEL_COPYIN;
- fallthrough;
+ if (copy_user) {
+ m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
+ return IN_KERNEL_RECOV;
+ }
+ switch (fixup_type) {
case EX_TYPE_FAULT_MCE_SAFE:
case EX_TYPE_DEFAULT_MCE_SAFE:
m->kflags |= MCE_IN_KERNEL_RECOV;
Will fix it in the next version.
Thanks.
Shuai
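
The decision order in the corrected hunk can be modelled in user space as
follows. This is a minimal sketch, not kernel code: struct mce_model,
error_context_model(), and the enum and flag values are hypothetical
stand-ins, and the results of mc_recoverable(), ex_get_fixup_type() and
is_copy_from_user() are assumed to be passed in as plain arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the numeric values are not the kernel's. */
enum context_model { IN_KERNEL, IN_KERNEL_RECOV, IN_USER };
enum fixup_model {
	EX_TYPE_NONE,
	EX_TYPE_EFAULT_REG,
	EX_TYPE_FAULT_MCE_SAFE,
	EX_TYPE_DEFAULT_MCE_SAFE,
};

#define MCE_IN_KERNEL_RECOV	(1u << 0)
#define MCE_IN_KERNEL_COPYIN	(1u << 1)

struct mce_model {
	unsigned int cs;	/* code segment selector at the time of the MCE */
	unsigned int kflags;	/* recovery hints set by the classifier */
};

/*
 * Classify the error context. The copy-from-user check runs before the
 * fixup-type switch, so the extable type no longer matters for copy-in.
 */
int error_context_model(struct mce_model *m, bool recoverable,
			enum fixup_model fixup_type, bool copy_user)
{
	assert(m != NULL);

	if ((m->cs & 3) == 3)
		return IN_USER;

	if (!recoverable)
		return IN_KERNEL;

	if (copy_user) {
		m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
		return IN_KERNEL_RECOV;
	}

	switch (fixup_type) {
	case EX_TYPE_FAULT_MCE_SAFE:
	case EX_TYPE_DEFAULT_MCE_SAFE:
		m->kflags |= MCE_IN_KERNEL_RECOV;
		return IN_KERNEL_RECOV;
	default:
		return IN_KERNEL;
	}
}
```

With copy_user set, the model returns IN_KERNEL_RECOV and sets both flags
even when the fixup type is EX_TYPE_EFAULT_REG; with copy_user clear, the
same fixup type falls through to IN_KERNEL.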
* Re: [PATCH v3 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
2025-03-06 18:15 ` Luck, Tony
2025-03-07 1:40 ` Shuai Xue
@ 2025-03-07 5:47 ` Shuai Xue
1 sibling, 0 replies; 9+ messages in thread
From: Shuai Xue @ 2025-03-07 5:47 UTC (permalink / raw)
To: Luck, Tony, bp, peterz, catalin.marinas, yazen.ghannam, akpm,
linmiaohe, nao.horiguchi
Cc: tglx, mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac,
linux-kernel, linux-mm, baolin.wang, tianruidong
On 2025/3/7 02:15, Luck, Tony wrote:
>> diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
>> index dac4d64dfb2a..cb021058165f 100644
>> --- a/arch/x86/kernel/cpu/mce/severity.c
>> +++ b/arch/x86/kernel/cpu/mce/severity.c
>> @@ -300,13 +300,12 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
>> copy_user = is_copy_from_user(regs);
>> instrumentation_end();
>>
>> - switch (fixup_type) {
>> - case EX_TYPE_UACCESS:
>> - if (!copy_user)
>> - return IN_KERNEL;
>> - m->kflags |= MCE_IN_KERNEL_COPYIN;
>> - fallthrough;
>> + if (copy_user) {
>> + m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_COPYIN;
>
> You have " MCE_IN_KERNEL_COPYIN" twice here.
Sorry for the noise, please ignore this version.
I have resent a new, ready version, please see:
https://lore.kernel.org/linux-mm/20250307054404.73877-1-xueshuai@linux.alibaba.com/
Thanks
Shuai
* Re: [PATCH v3 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
2025-03-06 2:10 ` [PATCH v3 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context Shuai Xue
2025-03-06 18:15 ` Luck, Tony
@ 2025-03-07 3:17 ` kernel test robot
2025-03-07 3:39 ` kernel test robot
2 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2025-03-07 3:17 UTC (permalink / raw)
To: Shuai Xue, tony.luck, bp, peterz, catalin.marinas, yazen.ghannam,
akpm, linmiaohe, nao.horiguchi
Cc: llvm, oe-kbuild-all, tglx, mingo, dave.hansen, x86, hpa,
jpoimboe, linux-edac, linux-kernel, linux-mm, baolin.wang,
tianruidong
Hi Shuai,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Shuai-Xue/x86-mce-Use-is_copy_from_user-to-determine-copy-from-user-context/20250306-101505
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20250306021031.5538-2-xueshuai%40linux.alibaba.com
patch subject: [PATCH v3 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
config: i386-buildonly-randconfig-002-20250307 (https://download.01.org/0day-ci/archive/20250307/202503071154.xQpKARjN-lkp@intel.com/config)
compiler: clang version 19.1.7 (https://github.com/llvm/llvm-project cd708029e0b2869e80abe31ddb175f7c35361f90)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250307/202503071154.xQpKARjN-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202503071154.xQpKARjN-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from arch/x86/kernel/cpu/mce/severity.c:16:
In file included from arch/x86/include/asm/traps.h:6:
In file included from include/linux/kprobes.h:28:
In file included from include/linux/ftrace.h:13:
In file included from include/linux/kallsyms.h:13:
In file included from include/linux/mm.h:2321:
include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
>> arch/x86/kernel/cpu/mce/severity.c:305:25: error: expected ';' after return statement
305 | return IN_KERNEL_RECOV
| ^
| ;
1 warning and 1 error generated.
vim +305 arch/x86/kernel/cpu/mce/severity.c
274
275 /*
276 * If mcgstatus indicated that ip/cs on the stack were
277 * no good, then "m->cs" will be zero and we will have
278 * to assume the worst case (IN_KERNEL) as we actually
279 * have no idea what we were executing when the machine
280 * check hit.
281 * If we do have a good "m->cs" (or a faked one in the
282 * case we were executing in VM86 mode) we can use it to
283 * distinguish an exception taken in user from from one
284 * taken in the kernel.
285 */
286 static noinstr int error_context(struct mce *m, struct pt_regs *regs)
287 {
288 int fixup_type;
289 bool copy_user;
290
291 if ((m->cs & 3) == 3)
292 return IN_USER;
293
294 if (!mc_recoverable(m->mcgstatus))
295 return IN_KERNEL;
296
297 /* Allow instrumentation around external facilities usage. */
298 instrumentation_begin();
299 fixup_type = ex_get_fixup_type(m->ip);
300 copy_user = is_copy_from_user(regs);
301 instrumentation_end();
302
303 if (copy_user) {
304 m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_COPYIN;
> 305 return IN_KERNEL_RECOV
306 }
307
308 switch (fixup_type) {
309 case EX_TYPE_FAULT_MCE_SAFE:
310 case EX_TYPE_DEFAULT_MCE_SAFE:
311 m->kflags |= MCE_IN_KERNEL_RECOV;
312 return IN_KERNEL_RECOV;
313
314 default:
315 return IN_KERNEL;
316 }
317 }
318
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v3 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
2025-03-06 2:10 ` [PATCH v3 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context Shuai Xue
2025-03-06 18:15 ` Luck, Tony
2025-03-07 3:17 ` kernel test robot
@ 2025-03-07 3:39 ` kernel test robot
2 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2025-03-07 3:39 UTC (permalink / raw)
To: Shuai Xue, tony.luck, bp, peterz, catalin.marinas, yazen.ghannam,
akpm, linmiaohe, nao.horiguchi
Cc: oe-kbuild-all, tglx, mingo, dave.hansen, x86, hpa, jpoimboe,
linux-edac, linux-kernel, linux-mm, baolin.wang, tianruidong
Hi Shuai,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Shuai-Xue/x86-mce-Use-is_copy_from_user-to-determine-copy-from-user-context/20250306-101505
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20250306021031.5538-2-xueshuai%40linux.alibaba.com
patch subject: [PATCH v3 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
config: i386-buildonly-randconfig-005-20250307 (https://download.01.org/0day-ci/archive/20250307/202503071115.uNkoVksh-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250307/202503071115.uNkoVksh-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202503071115.uNkoVksh-lkp@intel.com/
All errors (new ones prefixed by >>):
arch/x86/kernel/cpu/mce/severity.c: In function 'error_context':
>> arch/x86/kernel/cpu/mce/severity.c:305:39: error: expected ';' before '}' token
305 | return IN_KERNEL_RECOV
| ^
| ;
306 | }
| ~
vim +305 arch/x86/kernel/cpu/mce/severity.c
274
275 /*
276 * If mcgstatus indicated that ip/cs on the stack were
277 * no good, then "m->cs" will be zero and we will have
278 * to assume the worst case (IN_KERNEL) as we actually
279 * have no idea what we were executing when the machine
280 * check hit.
281 * If we do have a good "m->cs" (or a faked one in the
282 * case we were executing in VM86 mode) we can use it to
283 * distinguish an exception taken in user from from one
284 * taken in the kernel.
285 */
286 static noinstr int error_context(struct mce *m, struct pt_regs *regs)
287 {
288 int fixup_type;
289 bool copy_user;
290
291 if ((m->cs & 3) == 3)
292 return IN_USER;
293
294 if (!mc_recoverable(m->mcgstatus))
295 return IN_KERNEL;
296
297 /* Allow instrumentation around external facilities usage. */
298 instrumentation_begin();
299 fixup_type = ex_get_fixup_type(m->ip);
300 copy_user = is_copy_from_user(regs);
301 instrumentation_end();
302
303 if (copy_user) {
304 m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_COPYIN;
> 305 return IN_KERNEL_RECOV
306 }
307
308 switch (fixup_type) {
309 case EX_TYPE_FAULT_MCE_SAFE:
310 case EX_TYPE_DEFAULT_MCE_SAFE:
311 m->kflags |= MCE_IN_KERNEL_RECOV;
312 return IN_KERNEL_RECOV;
313
314 default:
315 return IN_KERNEL;
316 }
317 }
318
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* [PATCH v3 2/3] mm/hwpoison: Do not send SIGBUS to processes with recovered clean pages
2025-03-06 2:10 [PATCH v3 0/3] mm/hwpoison: Fix regressions in memory failure handling Shuai Xue
2025-03-06 2:10 ` [PATCH v3 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context Shuai Xue
@ 2025-03-06 2:10 ` Shuai Xue
2025-03-06 2:10 ` [PATCH v3 3/3] mm: memory-failure: Enhance comments for return value of memory_failure() Shuai Xue
2 siblings, 0 replies; 9+ messages in thread
From: Shuai Xue @ 2025-03-06 2:10 UTC (permalink / raw)
To: tony.luck, bp, peterz, catalin.marinas, yazen.ghannam, akpm,
linmiaohe, nao.horiguchi
Cc: tglx, mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac,
linux-kernel, linux-mm, baolin.wang, tianruidong
When an uncorrected memory error is consumed, there is a race between the
CMCI from the memory controller reporting an uncorrected error with a UCNA
signature, and the core reporting an SRAR signature machine check when the
data is about to be consumed.
- Background: why *UN*corrected errors are tied to *C*MCI on Intel platforms [1]
Prior to Icelake, memory controllers reported patrol scrub events that
detected a previously unseen uncorrected error in memory by signaling a
broadcast machine check with an SRAO (Software Recoverable Action Optional)
signature in the machine check bank. This was overkill because it is not an
urgent problem: no core is on the verge of consuming that bad data.
It was also found that multiple SRAO UCEs may cause nested MCE interrupts
and finally become an IERR.
Hence, Intel downgraded the machine check bank signature of patrol
scrub from SRAO to UCNA (Uncorrected, No Action required), and changed the
signal to #CMCI. Just to add to the confusion, Linux does take an action
(in uc_decode_notifier()) to try to offline the page despite the UC*NA*
signature name.
- Background: why #CMCI and #MCE race when poison is being consumed on Intel platforms [1]
Having decided that CMCI/UCNA is the best action for patrol scrub errors,
the memory controller uses it for reads too. But the memory controller is
executing asynchronously from the core, and can't tell the difference
between a "real" read and a speculative read. So it will do CMCI/UCNA if an
error is found in any read.
Thus:
1) Core is clever and thinks address A is needed soon, issues a speculative read.
2) Core finds it is going to use address A soon after sending the read request.
3) The CMCI from the memory controller is in a race with MCE from the core
that will soon try to retire the load from address A.
Quite often (because speculation has got better) the CMCI from the memory
controller is delivered before the core is committed to the instruction
reading address A, so the interrupt is taken, and Linux offlines the page
(marking it as poison).
- Why the user process is killed in the instr case
Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
"not recovered"") tried to fix the noisy message "Memory error not
recovered" and to skip duplicate SIGBUSs caused by the race. But it also
introduced a bug: kill_accessing_process() returns -EHWPOISON in the instr
case and, as a result, kill_me_maybe() sends a SIGBUS to the user process.
If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure(). For dirty pages,
memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
converting the PTE to a hwpoison entry. As a result,
kill_accessing_process():
- calls walk_page_range(), which returns 1 regardless of whether
try_to_unmap() succeeds or fails,
- calls kill_proc() to make sure a SIGBUS is sent,
- returns -EHWPOISON to indicate that SIGBUS is already sent to the
process and kill_me_maybe() doesn't have to send it again.
However, for clean pages, the TTU_HWPOISON flag is cleared, leaving the
PTE unchanged and not converted to a hwpoison entry. Consequently, for
clean pages where PTE entries are not marked as hwpoison,
kill_accessing_process() returns -EFAULT, causing kill_me_maybe() to
send a SIGBUS.
Console log looks like this:
Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
Memory failure: 0x827ca68: already hardware poisoned
mce: Memory error not recovered
To fix it, return 0 for "corrupted page was clean", preventing an
unnecessary SIGBUS to the user process.
[1] https://lore.kernel.org/lkml/20250217063335.22257-1-xueshuai@linux.alibaba.com/T/#mba94f1305b3009dd340ce4114d3221fe810d1871
Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: stable@vger.kernel.org
---
mm/memory-failure.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 995a15eb67e2..b037952565be 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -881,12 +881,17 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
mmap_read_lock(p->mm);
ret = walk_page_range(p->mm, 0, TASK_SIZE, &hwpoison_walk_ops,
(void *)&priv);
+ /*
+ * ret = 1 when CMCI wins, regardless of whether try_to_unmap()
+ * succeeds or fails, then kill the process with SIGBUS.
+ * ret = 0 when poison page is a clean page and it's dropped, no
+ * SIGBUS is needed.
+ */
if (ret == 1 && priv.tk.addr)
kill_proc(&priv.tk, pfn, flags);
- else
- ret = 0;
mmap_read_unlock(p->mm);
- return ret > 0 ? -EHWPOISON : -EFAULT;
+
+ return ret > 0 ? -EHWPOISON : 0;
}
/*
--
2.39.3
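
The effect of the changed return expression can be illustrated with a
small user-space model. This is a sketch under stated assumptions, not
kernel code: kill_accessing_result_old(), kill_accessing_result_new() and
caller_sends_sigbus() are hypothetical helpers, walk_ret stands for the
value returned by walk_page_range(), and the caller model only encodes
what the commit message says about kill_me_maybe() (no extra SIGBUS for 0
or -EHWPOISON). The fallback value for EHWPOISON is the asm-generic Linux
one and is only used where the libc does not define it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* asm-generic Linux value, fallback only */
#endif

/* Before the patch: a walk that found no hwpoison entry became -EFAULT. */
int kill_accessing_result_old(int walk_ret)
{
	return walk_ret > 0 ? -EHWPOISON : -EFAULT;
}

/* After the patch: a clean, already dropped page is reported as 0. */
int kill_accessing_result_new(int walk_ret)
{
	return walk_ret > 0 ? -EHWPOISON : 0;
}

/*
 * Simplified caller model: an extra SIGBUS is sent only when the result
 * is an error other than -EHWPOISON.
 */
bool caller_sends_sigbus(int ret)
{
	assert(ret <= 0);	/* results are 0 or a negative errno */
	return ret != 0 && ret != -EHWPOISON;
}
```

For a clean page (walk_ret of 0) the old mapping leads the modelled caller
to send a SIGBUS, while the new mapping does not; for a hwpoison entry
(walk_ret of 1) both mappings return -EHWPOISON and the caller stays quiet.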
* [PATCH v3 3/3] mm: memory-failure: Enhance comments for return value of memory_failure()
2025-03-06 2:10 [PATCH v3 0/3] mm/hwpoison: Fix regressions in memory failure handling Shuai Xue
2025-03-06 2:10 ` [PATCH v3 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context Shuai Xue
2025-03-06 2:10 ` [PATCH v3 2/3] mm/hwpoison: Do not send SIGBUS to processes with recovered clean pages Shuai Xue
@ 2025-03-06 2:10 ` Shuai Xue
2 siblings, 0 replies; 9+ messages in thread
From: Shuai Xue @ 2025-03-06 2:10 UTC (permalink / raw)
To: tony.luck, bp, peterz, catalin.marinas, yazen.ghannam, akpm,
linmiaohe, nao.horiguchi
Cc: tglx, mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac,
linux-kernel, linux-mm, baolin.wang, tianruidong
The comments for the return value of memory_failure() are incomplete;
supplement them.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
---
mm/memory-failure.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b037952565be..8649849bcdb4 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2216,9 +2216,13 @@ static void kill_procs_now(struct page *p, unsigned long pfn, int flags,
* Must run in process context (e.g. a work queue) with interrupts
* enabled and no spinlocks held.
*
- * Return: 0 for successfully handled the memory error,
- * -EOPNOTSUPP for hwpoison_filter() filtered the error event,
- * < 0(except -EOPNOTSUPP) on failure.
+ * Return:
+ * 0 - success,
+ * -ENXIO - memory not managed by the kernel
+ * -EOPNOTSUPP - hwpoison_filter() filtered the error event,
+ * -EHWPOISON - the page was already poisoned, potentially
+ * kill process,
+ * other negative values - failure.
*/
int memory_failure(unsigned long pfn, int flags)
{
--
2.39.3
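
A caller that wants to act on the documented return values could classify
them as sketched below. This is an illustration only: enum mf_outcome and
classify_memory_failure_ret() are hypothetical names that do not exist in
the kernel, and the fallback value for EHWPOISON is the asm-generic Linux
one, used only where the libc does not define it.

```c
#include <assert.h>
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* asm-generic Linux value, fallback only */
#endif

/* Hypothetical outcome classes mirroring the documented return values. */
enum mf_outcome {
	OUTCOME_SUCCESS,		/* 0 */
	OUTCOME_NOT_MANAGED,		/* -ENXIO */
	OUTCOME_FILTERED,		/* -EOPNOTSUPP */
	OUTCOME_ALREADY_POISONED,	/* -EHWPOISON */
	OUTCOME_FAILURE,		/* any other negative value */
};

enum mf_outcome classify_memory_failure_ret(int ret)
{
	assert(ret <= 0);	/* documented values are 0 or negative */

	switch (ret) {
	case 0:
		return OUTCOME_SUCCESS;
	case -ENXIO:
		return OUTCOME_NOT_MANAGED;
	case -EOPNOTSUPP:
		return OUTCOME_FILTERED;
	case -EHWPOISON:
		return OUTCOME_ALREADY_POISONED;
	default:
		return OUTCOME_FAILURE;
	}
}
```

Keeping the specific codes ahead of the default case matches the order of
the comment: only values not listed there are treated as plain failure.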