[PATCH v4 0/3] mm/hwpoison: Fix regressions in memory failure handling

linux-mm.kvack.org archive mirror

* [PATCH v4 0/3] mm/hwpoison: Fix regressions in memory failure handling
@ 2025-03-07  5:44 Shuai Xue
  2025-03-07  5:44 ` [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context Shuai Xue
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Shuai Xue @ 2025-03-07  5:44 UTC (permalink / raw)
  To: tony.luck, bp, peterz, catalin.marinas, yazen.ghannam, akpm,
	linmiaohe, nao.horiguchi
  Cc: tglx, mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac,
	linux-kernel, linux-mm, baolin.wang, tianruidong, xueshuai

changes since v3:
- drop the outdated version per Tony

changes since v2:
- drop debug error message for non-fatal case per Borislav's strong objections
- rewrite cover letter, adding the MCE and CMCI race background from Tony [1]
- rewrite commit log
- use is_copy_from_user() to determine copy-from-user context per Peter
- keep comments in kill_me_maybe() per Catalin
- add Acked-by tag for patch 3 from Miaohe Lin
- Link: https://lore.kernel.org/lkml/20250217063335.22257-1-xueshuai@linux.alibaba.com/T/#m4d87f152a67e26f2aabb4cdf81e451a1e4c70094

changes since v1:
- Patch 1: Fix cur_sev and sev type to `int` per Tony
- Patch 4: Fix return value to 0 for clean pages per Miaohe
- Patch 5: pick up return value comments of memory_failure()

## 1. What am I trying to do:

This patch set resolves two critical regressions in memory failure
handling that have appeared in the upstream kernel since v5.17, compared
to v5.10 LTS:

    - copyin case: poison found in a user page while the kernel is copying
      from user space
    - instr case: poison found while fetching instructions in user space

## 2. What is the expected outcome and why

- For copyin case:

The kernel can recover from poison found while it is executing get_user()
or copy_from_user(): those call sites already handle an error return, so
the kernel can return -EFAULT to the process instead of crashing. More
specifically, the MCE handler checks the fixup handler type to decide
whether an in-kernel #MC can be recovered. When EX_TYPE_UACCESS is found,
the PC jumps to the recovery code specified in _ASM_EXTABLE_FAULT() and
-EFAULT is returned to user space.
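To make the EX_TYPE_UACCESS path concrete, here is a minimal userspace sketch of an exception-table lookup. It is a simplified model, not kernel code: the table layout, the linear search, and the names `struct extable_entry`, `FIXUP_UACCESS` and `mce_try_uaccess_fixup()` are hypothetical, and the real handler splits this work between severity grading and fixup_exception().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixup types; the kernel's real EX_TYPE_* values differ. */
enum fixup_type { FIXUP_OTHER, FIXUP_UACCESS };

/* Toy exception-table entry: faulting IP -> recovery IP. */
struct extable_entry {
	uintptr_t insn;		/* address of an instruction that may fault */
	uintptr_t fixup;	/* where execution resumes after the fault */
	enum fixup_type type;	/* how the #MC handler treats this entry */
};

/* Linear search for the entry covering the faulting IP, or NULL. */
static const struct extable_entry *
search_extable(const struct extable_entry *tbl, size_t n, uintptr_t ip)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == ip)
			return &tbl[i];
	return NULL;
}

/*
 * Model of the decision described above: an in-kernel #MC at *ip is
 * recoverable only if its entry is a user-access fixup. On recovery, *ip
 * is redirected to the fixup code and -EFAULT is returned; 0 means "not
 * recoverable" (the kernel would panic).
 */
static int mce_try_uaccess_fixup(const struct extable_entry *tbl, size_t n,
				 uintptr_t *ip)
{
	const struct extable_entry *e = search_extable(tbl, n, *ip);

	if (!e || e->type != FIXUP_UACCESS)
		return 0;
	*ip = e->fixup;
	return -EFAULT;
}
```

The real kernel keeps its exception table sorted and searches it by address. A linear scan is used here only to keep the sketch short.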

- For instr case:

If poison is found while fetching instructions in user space, full
recovery is possible: the user process takes a #PF, and Linux allocates a
new page and fills it by reading from storage.

## 3. What actually happens and why

- For copyin case: kernel panic since v5.17

Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
extable fixup type, EX_TYPE_EFAULT_REG, and later patches changed the
extable fixup type for copy-from-user operations from EX_TYPE_UACCESS to
EX_TYPE_EFAULT_REG. This broke the existing EX_TYPE_UACCESS handling when
poison is found in get_user() or copy_from_user().

- For instr case: the user process is killed by a SIGBUS signal due to a race between #CMCI and #MCE

When an uncorrected memory error is consumed, there is a race between the
CMCI from the memory controller reporting an uncorrected error with a UCNA
signature, and the core reporting an SRAR signature machine check when the
data is about to be consumed.

### Background: why *UN*corrected errors are tied to *C*MCI on Intel platforms [1]

Prior to Icelake, memory controllers reported patrol scrub events that
detected a previously unseen uncorrected error in memory by signaling a
broadcast machine check with an SRAO (Software Recoverable Action Optional)
signature in the machine check bank. This was overkill because it is not an
urgent problem: no core is on the verge of consuming that bad data.
It was also found that multiple SRAO UCEs may cause nested MCE interrupts
and eventually escalate into an IERR.

Hence, Intel downgraded the machine check bank signature of patrol scrub
errors from SRAO to UCNA (Uncorrected, No Action required) and changed the
signal to #CMCI. Just to add to the confusion, Linux does take an action
(in uc_decode_notifier()) to try to offline the page despite the UC*NA*
signature name.

### Background: why #CMCI and #MCE race when poison is consumed on Intel platforms [1]

Having decided that CMCI/UCNA is the best action for patrol scrub errors,
the memory controller uses it for reads too. But the memory controller is
executing asynchronously from the core, and can't tell the difference
between a "real" read and a speculative read, so it signals CMCI/UCNA if an
error is found in any read.

Thus:

1) The core is clever, thinks address A will be needed soon, and issues a
   speculative read.
2) The core finds it is going to use address A soon after sending the read
   request.
3) The CMCI from the memory controller races with the MCE from the core,
   which will soon try to retire the load from address A.

Quite often (because speculation has got better) the CMCI from the memory
controller is delivered before the core is committed to the instruction
reading address A, so the interrupt is taken, and Linux offlines the page
(marking it as poison).
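Only one side of the race does the real handling. memory_failure() atomically tests and sets the page's HWPoison flag (TestSetPageHWPoison()), so the later caller sees the page as already poisoned. The toy types and names below are assumptions used only to illustrate that ordering.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy page with a poison flag, standing in for the page's HWPoison bit. */
struct toy_page {
	atomic_bool hwpoison;
};

enum mf_path { MF_HANDLED_FIRST, MF_ALREADY_POISONED };

/*
 * Whichever path sets the flag first (usually the CMCI/UCNA handler)
 * performs the real handling, such as offlining the page. The SRAR #MC
 * that arrives later for the same page only sees "already hardware
 * poisoned" and must decide what to tell the consuming task.
 */
static enum mf_path toy_memory_failure(struct toy_page *page)
{
	if (atomic_exchange(&page->hwpoison, true))
		return MF_ALREADY_POISONED;
	return MF_HANDLED_FIRST;
}
```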

### Why the user process is killed in the instr case

Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
"not recovered"") tried to silence the noisy "Memory error not recovered"
message and to skip duplicate SIGBUSes caused by the race. However, it also
introduced a bug: in the instr case, when the poisoned page is clean,
kill_accessing_process() returns -EFAULT, and as a result kill_me_maybe()
sends a SIGBUS to the user process.
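The decision in kill_me_maybe() that turns this into a SIGBUS can be summarized as a small function of memory_failure()'s return value. This is a simplified model under stated assumptions. The function name is hypothetical. EHWPOISON is defined as 133, its Linux value, only if the C library lacks it. The -EOPNOTSUPP case reflects the hwpoison_filter() behavior documented in patch 3.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux value; assumed if the libc lacks it */
#endif

/*
 * Simplified model of kill_me_maybe(): returns true when the task ends up
 * with a SIGBUS from this path, given memory_failure()'s return value.
 */
static bool kill_me_maybe_sends_sigbus(int mf_ret)
{
	if (mf_ret == 0)
		return false;	/* recovered */
	if (mf_ret == -EHWPOISON || mf_ret == -EOPNOTSUPP)
		return false;	/* SIGBUS already sent, or event filtered */
	return true;		/* "Memory error not recovered", then SIGBUS */
}
```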

## 4. The fix, in my opinion, should be:

- For copyin case:

The key point is whether the error context is a read from user memory.
The extable fixup type does not matter if the faulting instruction is
known to be a MOV reading from user space.

is_copy_from_user() returns true when both of the following conditions
hold:

    - the current instruction is a copy (MOV, MOVZ or MOVS)
    - the source address is in user memory
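The two conditions above can be sketched as follows. The real is_copy_from_user() decodes the instruction bytes at the faulting IP. This model instead takes an already-decoded opcode and source address. MODEL_TASK_SIZE_MAX assumes x86-64 with 4-level paging. Writing the two-byte MOVZX opcodes as 0x0FB6 and 0x0FB7 is a convention of this sketch only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed user/kernel split for x86-64 with 4-level paging:
 * TASK_SIZE_MAX = 2^47 - PAGE_SIZE. Illustrative only.
 */
#define MODEL_TASK_SIZE_MAX ((UINT64_C(1) << 47) - 4096)

/* Toy is_copy_from_user(): known load opcode AND user source address. */
static bool model_is_copy_from_user(uint32_t opcode, uint64_t src)
{
	switch (opcode) {
	case 0x8A: case 0x8B:		/* MOV reg, mem */
	case 0x0FB6: case 0x0FB7:	/* MOVZX reg, mem8 / mem16 */
	case 0xA4: case 0xA5:		/* MOVS, source address in RSI */
		break;
	default:
		return false;		/* not a load this model recognizes */
	}
	return src < MODEL_TASK_SIZE_MAX;	/* source in user space? */
}
```

Any instruction outside the recognized set returns false. This matches the "no false positives, possible false negatives" property discussed later in the thread.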

If copy_user is true, error_context() sets

m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;

Then do_machine_check() will try fixup_exception() first.
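The resulting order of checks in error_context() can be sketched as follows. The flag values are hypothetical simplifications. The `mce_safe_fixup` parameter stands in for the EX_TYPE_*_MCE_SAFE cases of the switch in the patch.

```c
#include <stdbool.h>

/* Stand-ins for the kflags bits named in the text; values are arbitrary. */
#define MCE_IN_KERNEL_RECOV	(1u << 0)
#define MCE_IN_KERNEL_COPYIN	(1u << 1)

enum mce_ctx { IN_KERNEL, IN_KERNEL_RECOV };

/*
 * Order of checks after the fix: a recognized read from user memory is
 * recoverable whatever its fixup type; otherwise only MCE-safe fixups are.
 */
static enum mce_ctx new_error_context(bool copy_user, bool mce_safe_fixup,
				      unsigned int *kflags)
{
	if (copy_user) {
		*kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
		return IN_KERNEL_RECOV;
	}
	if (mce_safe_fixup) {
		*kflags |= MCE_IN_KERNEL_RECOV;
		return IN_KERNEL_RECOV;
	}
	return IN_KERNEL;
}
```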

- For instr case: let kill_accessing_process() return 0 for a recovered clean page to prevent a SIGBUS.
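The effect of this fix on the return value of kill_accessing_process() can be shown side by side. In this sketch, `walk_ret` is an assumed stand-in for the page-table walk result. It is 1 if the walk found the poisoned pfn in the task's page tables and 0 otherwise, as for a clean page that was dropped. Both function names are hypothetical.

```c
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux value; assumed if the libc lacks it */
#endif

/* Before the fix: a walk that found nothing was reported as -EFAULT. */
static int old_kill_accessing_ret(int walk_ret)
{
	return walk_ret > 0 ? -EHWPOISON : -EFAULT;
}

/* After the fix: a walk that found nothing is reported as success (0). */
static int new_kill_accessing_ret(int walk_ret)
{
	return walk_ret > 0 ? -EHWPOISON : 0;
}
```

memory_failure() passes this value back to kill_me_maybe(). The old -EFAULT leads to "Memory error not recovered" and a SIGBUS. The new 0 is treated as recovered.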

- For patch 3:

The return value of memory_failure() turned out to be important while
discussing the instr case regression with Tony and Miaohe for patch 4, so
add comments describing the return value.

Shuai Xue (3):
  x86/mce: Use is_copy_from_user() to determine copy-from-user context
  mm/hwpoison: Do not send SIGBUS to processes with recovered clean
    pages
  mm: memory-failure: Enhance comments for return value of
    memory_failure()

 arch/x86/kernel/cpu/mce/severity.c | 11 +++++------
 mm/memory-failure.c                | 21 +++++++++++++++------
 2 files changed, 20 insertions(+), 12 deletions(-)

-- 
2.39.3


* [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
  2025-03-07  5:44 [PATCH v4 0/3] mm/hwpoison: Fix regressions in memory failure handling Shuai Xue
@ 2025-03-07  5:44 ` Shuai Xue
  2025-03-07 20:40   ` Borislav Petkov
  2025-03-07  5:44 ` [PATCH v4 2/3] mm/hwpoison: Do not send SIGBUS to processes with recovered clean pages Shuai Xue
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Shuai Xue @ 2025-03-07  5:44 UTC (permalink / raw)
  To: tony.luck, bp, peterz, catalin.marinas, yazen.ghannam, akpm,
	linmiaohe, nao.horiguchi
  Cc: tglx, mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac,
	linux-kernel, linux-mm, baolin.wang, tianruidong, xueshuai

Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
extable fixup type, EX_TYPE_EFAULT_REG, and later patches changed the
extable fixup type for copy-from-user operations from EX_TYPE_UACCESS to
EX_TYPE_EFAULT_REG. As a result, the copy-from-user error context no
longer qualifies as an in-kernel recovery context, and poison consumed
there causes a kernel panic with the message: "Machine check: Data load
in unrecoverable area of kernel."

The critical aspect is whether the error context involves a read from
user memory. The extable fixup type is irrelevant once the operation is
known to be a MOV reading from user space. is_copy_from_user() returns
true when both of the following conditions are met:

    - the current operation is a copy (MOV, MOVZ or MOVS)
    - the source address is in user memory

So, use is_copy_from_user() directly to determine whether the context is
a copy from user.

Fixes: 4c132d1d844a ("x86/futex: Remove .fixup usage")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 arch/x86/kernel/cpu/mce/severity.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index dac4d64dfb2a..2235a7477436 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -300,13 +300,12 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
 	copy_user  = is_copy_from_user(regs);
 	instrumentation_end();

-	switch (fixup_type) {
-	case EX_TYPE_UACCESS:
-		if (!copy_user)
-			return IN_KERNEL;
-		m->kflags |= MCE_IN_KERNEL_COPYIN;
-		fallthrough;
+	if (copy_user) {
+		m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
+		return IN_KERNEL_RECOV;
+	}

+	switch (fixup_type) {
 	case EX_TYPE_FAULT_MCE_SAFE:
 	case EX_TYPE_DEFAULT_MCE_SAFE:
 		m->kflags |= MCE_IN_KERNEL_RECOV;
-- 
2.39.3


* [PATCH v4 2/3] mm/hwpoison: Do not send SIGBUS to processes with recovered clean pages
  2025-03-07  5:44 [PATCH v4 0/3] mm/hwpoison: Fix regressions in memory failure handling Shuai Xue
  2025-03-07  5:44 ` [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context Shuai Xue
@ 2025-03-07  5:44 ` Shuai Xue
  2025-03-12  6:39   ` Miaohe Lin
  2025-03-07  5:44 ` [PATCH v4 3/3] mm: memory-failure: Enhance comments for return value of memory_failure() Shuai Xue
  2025-03-07 17:20 ` [PATCH v4 0/3] mm/hwpoison: Fix regressions in memory failure handling Luck, Tony
  3 siblings, 1 reply; 16+ messages in thread
From: Shuai Xue @ 2025-03-07  5:44 UTC (permalink / raw)
  To: tony.luck, bp, peterz, catalin.marinas, yazen.ghannam, akpm,
	linmiaohe, nao.horiguchi
  Cc: tglx, mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac,
	linux-kernel, linux-mm, baolin.wang, tianruidong, xueshuai

When an uncorrected memory error is consumed, there is a race between the
CMCI from the memory controller reporting an uncorrected error with a UCNA
signature, and the core reporting an SRAR signature machine check when the
data is about to be consumed.

- Background: why *UN*corrected errors are tied to *C*MCI on Intel platforms [1]

Prior to Icelake, memory controllers reported patrol scrub events that
detected a previously unseen uncorrected error in memory by signaling a
broadcast machine check with an SRAO (Software Recoverable Action Optional)
signature in the machine check bank. This was overkill because it is not an
urgent problem: no core is on the verge of consuming that bad data.
It was also found that multiple SRAO UCEs may cause nested MCE interrupts
and eventually escalate into an IERR.

Hence, Intel downgraded the machine check bank signature of patrol scrub
errors from SRAO to UCNA (Uncorrected, No Action required) and changed the
signal to #CMCI. Just to add to the confusion, Linux does take an action
(in uc_decode_notifier()) to try to offline the page despite the UC*NA*
signature name.

- Background: why #CMCI and #MCE race when poison is consumed on Intel platforms [1]

Having decided that CMCI/UCNA is the best action for patrol scrub errors,
the memory controller uses it for reads too. But the memory controller is
executing asynchronously from the core, and can't tell the difference
between a "real" read and a speculative read, so it signals CMCI/UCNA if an
error is found in any read.

Thus:

1) The core is clever, thinks address A will be needed soon, and issues a
   speculative read.
2) The core finds it is going to use address A soon after sending the read
   request.
3) The CMCI from the memory controller races with the MCE from the core,
   which will soon try to retire the load from address A.

Quite often (because speculation has got better) the CMCI from the memory
controller is delivered before the core is committed to the instruction
reading address A, so the interrupt is taken, and Linux offlines the page
(marking it as poison).

- Why the user process is killed in the instr case

Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
"not recovered"") tried to silence the noisy "Memory error not recovered"
message and to skip duplicate SIGBUSes caused by the race. However, it also
introduced a bug: in the instr case, when the poisoned page is clean,
kill_accessing_process() returns -EFAULT, and as a result kill_me_maybe()
sends a SIGBUS to the user process.

If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure(). For dirty pages,
memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
converting the PTE to a hwpoison entry. As a result,
kill_accessing_process():

- calls walk_page_range(), which returns 1 regardless of whether
  try_to_unmap() succeeded or failed,
- calls kill_proc() to make sure a SIGBUS is sent,
- returns -EHWPOISON to indicate that a SIGBUS has already been sent to
  the process, so kill_me_maybe() does not need to send it again.

However, for clean pages, the TTU_HWPOISON flag is cleared, so
try_to_unmap() does not convert the PTE to a hwpoison entry.
Consequently, walk_page_range() does not find the poisoned page,
kill_accessing_process() returns -EFAULT, and kill_me_maybe() sends a
SIGBUS.

Console log looks like this:

    Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
    Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
    Memory failure: 0x827ca68: already hardware poisoned
    mce: Memory error not recovered

To fix this, return 0 when the corrupted page was clean, preventing an
unnecessary SIGBUS to the user process.

[1] https://lore.kernel.org/lkml/20250217063335.22257-1-xueshuai@linux.alibaba.com/T/#mba94f1305b3009dd340ce4114d3221fe810d1871
Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: stable@vger.kernel.org
---
 mm/memory-failure.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 995a15eb67e2..b037952565be 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -881,12 +881,17 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
 	mmap_read_lock(p->mm);
 	ret = walk_page_range(p->mm, 0, TASK_SIZE, &hwpoison_walk_ops,
 			      (void *)&priv);
+	/*
+	 * ret = 1 when CMCI wins, regardless of whether try_to_unmap()
+	 * succeeds or fails, then kill the process with SIGBUS.
+	 * ret = 0 when poison page is a clean page and it's dropped, no
+	 * SIGBUS is needed.
+	 */
 	if (ret == 1 && priv.tk.addr)
 		kill_proc(&priv.tk, pfn, flags);
-	else
-		ret = 0;
 	mmap_read_unlock(p->mm);
-	return ret > 0 ? -EHWPOISON : -EFAULT;
+
+	return ret > 0 ? -EHWPOISON : 0;
 }

 /*
-- 
2.39.3
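This is a toy model of the page-table walk described in patch 2's commit message. It assumes that a dirty page leaves a hwpoison entry for the pfn, while a dropped clean page leaves no entry referring to it. The PTE encoding and function names are hypothetical. The real walk, walk_page_range() with hwpoison_walk_ops, also handles huge pages and other cases.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy PTE states; the real encoding is architecture-specific. */
enum toy_pte_kind { TOY_PTE_NONE, TOY_PTE_PRESENT, TOY_PTE_HWPOISON };

struct toy_pte {
	enum toy_pte_kind kind;
	uint64_t pfn;		/* ignored when kind == TOY_PTE_NONE */
};

/* Per-entry check: 1 if this entry still refers to the poisoned pfn. */
static int toy_check_entry(struct toy_pte pte, uint64_t poisoned_pfn)
{
	if (pte.kind == TOY_PTE_NONE)
		return 0;
	return pte.pfn == poisoned_pfn;
}

/* Walk a flat array of entries, stopping at the first match. */
static int toy_walk(const struct toy_pte *ptes, size_t n, uint64_t pfn)
{
	for (size_t i = 0; i < n; i++)
		if (toy_check_entry(ptes[i], pfn))
			return 1;
	return 0;
}
```

With the pfn 0x827ca68 from the console log, a dirty-page table containing a hwpoison entry for that pfn makes toy_walk() return 1. A clean-page table with no entry for it returns 0. That 0 is the case the patch now maps to a return value of 0 instead of -EFAULT.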


* [PATCH v4 3/3] mm: memory-failure: Enhance comments for return value of memory_failure()
  2025-03-07  5:44 [PATCH v4 0/3] mm/hwpoison: Fix regressions in memory failure handling Shuai Xue
  2025-03-07  5:44 ` [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context Shuai Xue
  2025-03-07  5:44 ` [PATCH v4 2/3] mm/hwpoison: Do not send SIGBUS to processes with recovered clean pages Shuai Xue
@ 2025-03-07  5:44 ` Shuai Xue
  2025-03-07 17:20 ` [PATCH v4 0/3] mm/hwpoison: Fix regressions in memory failure handling Luck, Tony
  3 siblings, 0 replies; 16+ messages in thread
From: Shuai Xue @ 2025-03-07  5:44 UTC (permalink / raw)
  To: tony.luck, bp, peterz, catalin.marinas, yazen.ghannam, akpm,
	linmiaohe, nao.horiguchi
  Cc: tglx, mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac,
	linux-kernel, linux-mm, baolin.wang, tianruidong, xueshuai

The comments describing the return value of memory_failure() are
incomplete; supplement them.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memory-failure.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b037952565be..8649849bcdb4 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2216,9 +2216,13 @@ static void kill_procs_now(struct page *p, unsigned long pfn, int flags,
  * Must run in process context (e.g. a work queue) with interrupts
  * enabled and no spinlocks held.
  *
- * Return: 0 for successfully handled the memory error,
- *         -EOPNOTSUPP for hwpoison_filter() filtered the error event,
- *         < 0(except -EOPNOTSUPP) on failure.
+ * Return:
+ *   0             - success,
+ *   -ENXIO        - memory not managed by the kernel
+ *   -EOPNOTSUPP   - hwpoison_filter() filtered the error event,
+ *   -EHWPOISON    - the page was already poisoned, potentially
+ *                   kill process,
+ *   other negative values - failure.
  */
 int memory_failure(unsigned long pfn, int flags)
 {
-- 
2.39.3
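To illustrate the return values documented by patch 3, here is a hypothetical helper that maps each value to its documented meaning. mf_ret_str() is not a kernel function. EHWPOISON is defined as 133, its Linux value, only if the C library lacks it.

```c
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux value; assumed if the libc lacks it */
#endif

/* Hypothetical helper: describe a memory_failure() return value. */
static const char *mf_ret_str(int ret)
{
	switch (ret) {
	case 0:
		return "success";
	case -ENXIO:
		return "memory not managed by the kernel";
	case -EOPNOTSUPP:
		return "filtered by hwpoison_filter()";
	case -EHWPOISON:
		return "page already poisoned";
	default:
		return ret < 0 ? "failure" : "unexpected";
	}
}
```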




* RE: [PATCH v4 0/3] mm/hwpoison: Fix regressions in memory failure handling
  2025-03-07  5:44 [PATCH v4 0/3] mm/hwpoison: Fix regressions in memory failure handling Shuai Xue
                   ` (2 preceding siblings ...)
  2025-03-07  5:44 ` [PATCH v4 3/3] mm: memory-failure: Enhance comments for return value of memory_failure() Shuai Xue
@ 2025-03-07 17:20 ` Luck, Tony
  2025-03-08 11:36   ` Shuai Xue
  3 siblings, 1 reply; 16+ messages in thread
From: Luck, Tony @ 2025-03-07 17:20 UTC (permalink / raw)
  To: Shuai Xue, bp, peterz, catalin.marinas, yazen.ghannam, akpm,
	linmiaohe, nao.horiguchi
  Cc: tglx, mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac,
	linux-kernel, linux-mm, baolin.wang, tianruidong

> ## 1. What am I trying to do:
>
> This patch resolves two critical regressions related to memory failure
> handling that have appeared in the upstream kernel since version 5.17, as
> compared to 5.10 LTS.
>
>     - copyin case: poison found in user page while kernel copying from user space
>     - instr case: poison found while instruction fetching in user space

Tested the instruction, copyin, and futex cases. They all pass now.

Thanks!

Tested-by: Tony Luck <tony.luck@intel.com>

-Tony



* Re: [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
  2025-03-07  5:44 ` [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context Shuai Xue
@ 2025-03-07 20:40   ` Borislav Petkov
  2025-03-07 22:05     ` Luck, Tony
  2025-03-08 11:25     ` Shuai Xue
  0 siblings, 2 replies; 16+ messages in thread
From: Borislav Petkov @ 2025-03-07 20:40 UTC (permalink / raw)
  To: Shuai Xue
  Cc: tony.luck, peterz, catalin.marinas, yazen.ghannam, akpm,
	linmiaohe, nao.horiguchi, tglx, mingo, dave.hansen, x86, hpa,
	jpoimboe, linux-edac, linux-kernel, linux-mm, baolin.wang,
	tianruidong

On Fri, Mar 07, 2025 at 01:44:02PM +0800, Shuai Xue wrote:
> Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
> extable fixup type, EX_TYPE_EFAULT_REG, and commit 4c132d1d844a
> ("x86/futex: Remove .fixup usage") updated the extable fixup type for
> copy-from-user operations, changing it from EX_TYPE_UACCESS to
> EX_TYPE_EFAULT_REG. The error context for copy-from-user operations no
> longer functions as an in-kernel recovery context. Consequently, the error
> context for copy-from-user operations no longer functions as an in-kernel
> recovery context, resulting in kernel panics with the message: "Machine
> check: Data load in unrecoverable area of kernel."
> 
> The critical aspect is identifying whether the error context involves a
> read from user memory. We do not care about the ex-type if we know its a

Please use passive voice in your commit message: no "we" or "I", etc,
and describe your changes in imperative mood.

Also, pls read section "2) Describe your changes" in
Documentation/process/submitting-patches.rst for more details.

Also, see section "Changelog" in
Documentation/process/maintainer-tip.rst

Bottom line is: personal pronouns are ambiguous in text, especially with
so many parties/companies/etc developing the kernel so let's avoid them
please.

"ex-type"?

Please write in plain English - not in a programming language.

> MOV reading from userspace. is_copy_from_user() return true when both of
> the following conditions are met:
> 
>     - the current instruction is copy

There is no "copy instruction". You mean the "current operation".

>     - source address is user memory

So you can simply say "when reading user memory". Simple.
> 
> So, use is_copy_from_user() to determin if a context is copy user directly.

Unknown word [determin] in commit message.
Suggestions: ['determine',

Please introduce a spellchecker into your patch creation workflow.

Also, run your commit messages through AI to correct the grammar and
formulations in them.

The more important part which I asked for already is, is is_copy_from_user()
exhaustive in determining the that the operation really is a copy from user?

The EX_TYPE_UACCESS things *explicitly* marked such places in the code. Does
is_copy_from_user() guarantee the same, without false positives?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* RE: [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
  2025-03-07 20:40   ` Borislav Petkov
@ 2025-03-07 22:05     ` Luck, Tony
  2025-03-07 22:46       ` Borislav Petkov
  2025-03-08 11:25     ` Shuai Xue
  1 sibling, 1 reply; 16+ messages in thread
From: Luck, Tony @ 2025-03-07 22:05 UTC (permalink / raw)
  To: Borislav Petkov, Shuai Xue
  Cc: peterz, catalin.marinas, yazen.ghannam, akpm, linmiaohe,
	nao.horiguchi, tglx, mingo, dave.hansen, x86, hpa, jpoimboe,
	linux-edac, linux-kernel, linux-mm, baolin.wang, tianruidong

> The more important part which I asked for already is, is is_copy_from_user()
> exhaustive in determining the that the operation really is a copy from user?
>
> The EX_TYPE_UACCESS things *explicitly* marked such places in the code. Does
> is_copy_from_user() guarantee the same, without false positives?

is_copy_from_user() decodes the instruction that took the trap. It looks for
MOV, MOVZ and MOVS instructions to find the source address, and then
checks whether that's user (< TASK_SIZE_MAX) or kernel.

So no false positives.

There could be some false negatives if some other instruction is doing
the "load" operation.

-Tony


* Re: [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
  2025-03-07 22:05     ` Luck, Tony
@ 2025-03-07 22:46       ` Borislav Petkov
  2025-03-07 23:11         ` Luck, Tony
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2025-03-07 22:46 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Shuai Xue, peterz, catalin.marinas, yazen.ghannam, akpm,
	linmiaohe, nao.horiguchi, tglx, mingo, dave.hansen, x86, hpa,
	jpoimboe, linux-edac, linux-kernel, linux-mm, baolin.wang,
	tianruidong

On Fri, Mar 07, 2025 at 10:05:12PM +0000, Luck, Tony wrote:
> is_copy_from_user() decodes the instruction that took the trap. It looks for
> MOV, MOVZ and MOVS instructions to find the source address, and then
> checks whether that's user (< TASK_SIZE_MAX) or kernel.

You mean there's absolutely nothing else like, say, some epbf or some other
hackery we tend to do in the kernel (or we will do in the future) which won't
create the exact same two conditions:

- one of the three insns
- user mem read

and it would cause a recovery action.

Perhaps it still might be the proper thing to do even then but it does sound
fishy and unclean to me.

Nothing beats the explicit markup we had until recently...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette



* RE: [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
  2025-03-07 22:46       ` Borislav Petkov
@ 2025-03-07 23:11         ` Luck, Tony
  2025-03-07 23:22           ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Luck, Tony @ 2025-03-07 23:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Shuai Xue, peterz, catalin.marinas, yazen.ghannam, akpm,
	linmiaohe, nao.horiguchi, tglx, mingo, dave.hansen, x86, hpa,
	jpoimboe, linux-edac, linux-kernel, linux-mm, baolin.wang,
	tianruidong

> > is_copy_from_user() decodes the instruction that took the trap. It looks for
> > MOV, MOVZ and MOVS instructions to find the source address, and then
> > checks whether that's user (< TASK_SIZE_MAX) or kernel.
>
> You mean there's absolutely nothing else like, say, some epbf or some other
> hackery we tend to do in the kernel (or we will do in the future) which won't
> create the exact same two conditions:
>
> - one of the three insns
> - user mem read
>
> and it would cause a recovery action.
>
> Perhaps it still might be the proper thing to do even then but it does sound
> fishy and unclean to me.
>
> Nothing beats the explicit markup we had until recently...

Every "user mem read" needs to have an extable[] recovery entry
attached to the IP of the instruction  (to handle the much more common
#PF for page-not-present). All those places already have to deal with
the possibility that the #PF can't be recovered. The #MC handling is
really just a small extension.

As for "explicit markup" I don't think it would be better to decorate
every get_user() and copy_from_user() with some "this one can
recover from #MC" 

Note also that "what we had recently" was fragile, broke, and resulted
in this regression.

-Tony


* Re: [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
  2025-03-07 23:11         ` Luck, Tony
@ 2025-03-07 23:22           ` Borislav Petkov
  2025-03-08 11:27             ` Shuai Xue
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2025-03-07 23:22 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Shuai Xue, peterz, catalin.marinas, yazen.ghannam, akpm,
	linmiaohe, nao.horiguchi, tglx, mingo, dave.hansen, x86, hpa,
	jpoimboe, linux-edac, linux-kernel, linux-mm, baolin.wang,
	tianruidong

On Fri, Mar 07, 2025 at 11:11:26PM +0000, Luck, Tony wrote:
> As for "explicit markup" I don't think it would be better to decorate
> every get_user() and copy_from_user() with some "this one can
> recover from #MC" 

I don't mean every function - I mean what we had there with EX_TYPE_UACCESS.
That is explicit and unambiguous. Proving that is_copy_from_user() is always
correct is a lot harder.

> Note also that "what we had recently" was fragile, broke, and resulted
> in this regression.

Because those exception types got renamed? Oh well, that should've been
reverted actually but no one involved realized that MCE is using those.

And I'm not saying this is the only way to solve this. We could do something
like collecting all addresses on which an MCE can be recoverable, for example.
We haven't considered it that important... yet.

Looks like we're going to try this new is_copy_from_user() thing now and then
see where it gets us.

So, after the commit message has been fixed:

Acked-by: Borislav Petkov (AMD) <bp@alien8.de>

I'm presuming, this is going through akpm...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
  2025-03-07 20:40   ` Borislav Petkov
  2025-03-07 22:05     ` Luck, Tony
@ 2025-03-08 11:25     ` Shuai Xue
  1 sibling, 0 replies; 16+ messages in thread
From: Shuai Xue @ 2025-03-08 11:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tony.luck, peterz, catalin.marinas, yazen.ghannam, akpm,
	linmiaohe, nao.horiguchi, tglx, mingo, dave.hansen, x86, hpa,
	jpoimboe, linux-edac, linux-kernel, linux-mm, baolin.wang,
	tianruidong



On 2025/3/8 04:40, Borislav Petkov wrote:
> On Fri, Mar 07, 2025 at 01:44:02PM +0800, Shuai Xue wrote:
>> Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
>> extable fixup type, EX_TYPE_EFAULT_REG, and commit 4c132d1d844a
>> ("x86/futex: Remove .fixup usage") updated the extable fixup type for
>> copy-from-user operations, changing it from EX_TYPE_UACCESS to
>> EX_TYPE_EFAULT_REG. The error context for copy-from-user operations no
>> longer functions as an in-kernel recovery context. Consequently, the error
>> context for copy-from-user operations no longer functions as an in-kernel
>> recovery context, resulting in kernel panics with the message: "Machine
>> check: Data load in unrecoverable area of kernel."
>>
>> The critical aspect is identifying whether the error context involves a
>> read from user memory. We do not care about the ex-type if we know its a
> 
> Please use passive voice in your commit message: no "we" or "I", etc,
> and describe your changes in imperative mood.
> 
> Also, pls read section "2) Describe your changes" in
> Documentation/process/submitting-patches.rst for more details.
> 
> Also, see section "Changelog" in
> Documentation/process/maintainer-tip.rst
> 
> Bottom line is: personal pronouns are ambiguous in text, especially with
> so many parties/companies/etc developing the kernel so let's avoid them
> please.
> 
> "ex-type"?
> 
> Please write in plain English - not in a programming language.
> 
>> MOV reading from userspace. is_copy_from_user() return true when both of
>> the following conditions are met:
>>
>>      - the current instruction is copy
> 
> There is no "copy instruction". You mean the "current operation".
> 
>>      - source address is user memory
> 
> So you can simply say "when reading user memory". Simple.
>>
>> So, use is_copy_from_user() to determin if a context is copy user directly.
> 
> Unknown word [determin] in commit message.
> Suggestions: ['determine',
> 
> Please introduce a spellchecker into your patch creation workflow.
> 
> Also, run your commit messages through AI to correct the grammar and
> formulations in them.

Certainly, thank you for bringing that to my attention.
I will refine the commit log accordingly.

> 
> The more important part which I asked for already is, is is_copy_from_user()
> exhaustive in determining the that the operation really is a copy from user?
> 
> The EX_TYPE_UACCESS things *explicitly* marked such places in the code. Does
> is_copy_from_user() guarantee the same, without false positives?
> 

Following your discussion with Tony, it seems that we have reached a conclusion.

Thanks.
Best Regards,
Shuai




* Re: [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
  2025-03-07 23:22           ` Borislav Petkov
@ 2025-03-08 11:27             ` Shuai Xue
  0 siblings, 0 replies; 16+ messages in thread
From: Shuai Xue @ 2025-03-08 11:27 UTC (permalink / raw)
  To: Borislav Petkov, Luck, Tony
  Cc: peterz, catalin.marinas, yazen.ghannam, akpm, linmiaohe,
	nao.horiguchi, tglx, mingo, dave.hansen, x86, hpa, jpoimboe,
	linux-edac, linux-kernel, linux-mm, baolin.wang, tianruidong



On 2025/3/8 07:22, Borislav Petkov wrote:
> On Fri, Mar 07, 2025 at 11:11:26PM +0000, Luck, Tony wrote:
>> As for "explicit markup" I don't think it would be better to decorate
>> every get_user() and copy_from_user() with some "this one can
>> recover from #MC"
> 
> I don't mean every function - I mean what we had there with EX_TYPE_UACCESS.
> That is explicit and unambiguous. Proving that is_copy_from_user() is always
> correct is a lot harder.
> 
>> Note also that "what we had recently" was fragile, broke, and resulted
>> in this regression.
> 
> Because those exception types got renamed? Oh well, that should've been
> reverted actually but no one involved realized that MCE is using those.
> 
> And I'm not saying this is the only way to solve this. We could do something
> like collecting all addresses on which an MCE can be recoverable, for example.
> We haven't considered it that important... yet.
> 
> Looks like we're going to try this new is_copy_from_user() thing now and then
> see where it gets us.
> 
> So, after the commit message has been fixed:
> 
> Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
> 
> I'm presuming, this is going through akpm...
> 

Thanks.
Shuai
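Borislav's preference for explicit markup can be pictured with a simplified model of an exception table: each instruction that may fault is recorded at build time together with a fixup type, and a faulting instruction address is classified purely by looking it up. The sketch below is a user-space illustration only; `struct fixup_entry`, `enum fixup_kind` and `lookup_fixup()` are hypothetical names, and the real x86 `struct exception_table_entry` uses a different (relative-offset) layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixup kinds, loosely named after the x86 EX_TYPE_* values. */
enum fixup_kind {
	FIXUP_NONE = 0,		/* no entry: not a recoverable site */
	FIXUP_UACCESS,		/* explicitly marked user access */
	FIXUP_MCE_SAFE,		/* explicitly marked machine-check-safe copy */
};

/* Simplified entry: absolute address of an instruction that may fault. */
struct fixup_entry {
	uintptr_t insn;
	enum fixup_kind kind;
};

/* bsearch() comparator: key is a uintptr_t address, elem is an entry. */
static int cmp_entry(const void *key, const void *elem)
{
	uintptr_t ip = *(const uintptr_t *)key;
	const struct fixup_entry *e = elem;

	if (ip < e->insn)
		return -1;
	return ip > e->insn;	/* 1 if greater, 0 if equal */
}

/* Return the kind recorded for @ip; @table must be sorted by insn. */
static enum fixup_kind lookup_fixup(const struct fixup_entry *table,
				    size_t n, uintptr_t ip)
{
	const struct fixup_entry *e;

	e = bsearch(&ip, table, n, sizeof(*table), cmp_entry);
	return e ? e->kind : FIXUP_NONE;
}
```

An address is treated as recoverable only if it was listed, so there are no false positives by construction. The cost is that every site must carry the right annotation, and a change to those annotations silently changes the classification, which is how the regression discussed in this thread came about.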



* Re: [PATCH v4 0/3] mm/hwpoison: Fix regressions in memory failure handling
  2025-03-07 17:20 ` [PATCH v4 0/3] mm/hwpoison: Fix regressions in memory failure handling Luck, Tony
@ 2025-03-08 11:36   ` Shuai Xue
  0 siblings, 0 replies; 16+ messages in thread
From: Shuai Xue @ 2025-03-08 11:36 UTC (permalink / raw)
  To: Luck, Tony, bp, peterz, catalin.marinas, yazen.ghannam, akpm,
	linmiaohe, nao.horiguchi
  Cc: tglx, mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac,
	linux-kernel, linux-mm, baolin.wang, tianruidong



On 2025/3/8 01:20, Luck, Tony wrote:
>> ## 1. What am I trying to do:
>>
>> This patch resolves two critical regressions related to memory failure
>> handling that have appeared in the upstream kernel since version 5.17, as
>> compared to 5.10 LTS.
>>
>>      - copyin case: poison found in user page while kernel copying from user space
>>      - instr case: poison found while instruction fetching in user space
> 
> Tested the instruction, copyin, and futex cases. They all pass now.
> 
> Thanks!
> 
> Tested-by: Tony Luck <tony.luck@intel.com>
> 
> -Tony

Thanks for your help.

Best Regards,
Shuai



* Re: [PATCH v4 2/3] mm/hwpoison: Do not send SIGBUS to processes with recovered clean pages
  2025-03-07  5:44 ` [PATCH v4 2/3] mm/hwpoison: Do not send SIGBUS to processes with recovered clean pages Shuai Xue
@ 2025-03-12  6:39   ` Miaohe Lin
  2025-03-12  8:03     ` Shuai Xue
  0 siblings, 1 reply; 16+ messages in thread
From: Miaohe Lin @ 2025-03-12  6:39 UTC (permalink / raw)
  To: Shuai Xue
  Cc: tglx, mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac,
	linux-kernel, linux-mm, baolin.wang, tianruidong, tony.luck, bp,
	peterz, catalin.marinas, yazen.ghannam, akpm, nao.horiguchi

On 2025/3/7 13:44, Shuai Xue wrote:
> When an uncorrected memory error is consumed, there is a race between the
> CMCI from the memory controller, which reports an uncorrected error with a
> UCNA signature, and the machine check with an SRAR signature that the core
> reports when the data is about to be consumed.
> 
> - Background: why *UN*corrected errors are tied to *C*MCI on Intel platforms [1]
> 
> Prior to Icelake, memory controllers reported patrol scrub events that
> detected a previously unseen uncorrected error in memory by signaling a
> broadcast machine check with an SRAO (Software Recoverable Action Optional)
> signature in the machine check bank. This was overkill: such an error is
> not urgent, because no core is on the verge of consuming the bad data.
> It was also found that multiple SRAO UCEs may cause nested MCE interrupts
> and eventually escalate to an IERR.
> 
> Hence, Intel downgraded the machine check bank signature of patrol scrub
> errors from SRAO to UCNA (Uncorrected, No Action required) and changed the
> signal to #CMCI. Just to add to the confusion, Linux does take an action
> (in uc_decode_notifier()) to try to offline the page despite the UC*NA*
> signature name.
> 
> - Background: why #CMCI and #MCE race when poison is being consumed on Intel platforms [1]
> 
> Having decided that CMCI/UCNA is the best action for patrol scrub errors,
> the memory controller uses it for reads too. But the memory controller is
> executing asynchronously from the core, and can't tell the difference
> between a "real" read and a speculative read. So it will do CMCI/UCNA if an
> error is found in any read.
> 
> Thus:
> 
> 1) The core is clever, predicts that address A will be needed soon, and
>    issues a speculative read.
> 2) Soon after sending the read request, the core finds that it is going to
>    use address A.
> 3) The CMCI from the memory controller races with the MCE from the core,
>    which will soon try to retire the load from address A.
> 
> Quite often (because speculation has got better) the CMCI from the memory
> controller is delivered before the core is committed to the instruction
> reading address A, so the interrupt is taken, and Linux offlines the page
> (marking it as poison).
> 
> - Why the user process is killed in the instr case
> 
> Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
> "not recovered"") tried to fix the noisy message "Memory error not
> recovered" and to skip duplicate SIGBUSes caused by the race. But it also
> introduced a bug: in the instr case, kill_accessing_process() returns
> -EFAULT for a clean page, and as a result kill_me_maybe() sends a SIGBUS
> to the user process.
> 
> If the CMCI wins that race, the page is marked poisoned when
> uc_decode_notifier() calls memory_failure(). For dirty pages,
> memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
> converting the PTE to a hwpoison entry. As a result,
> kill_accessing_process():
> 
> - calls walk_page_range() and returns 1 regardless of whether
>   try_to_unmap() succeeds or fails,
> - calls kill_proc() to make sure a SIGBUS is sent,
> - returns -EHWPOISON to indicate that a SIGBUS has already been sent to
>   the process and kill_me_maybe() doesn't have to send it again.
> 
> However, for clean pages, the TTU_HWPOISON flag is cleared, leaving the
> PTE unchanged and not converted to a hwpoison entry. Because the PTE is
> not marked as hwpoison, kill_accessing_process() returns -EFAULT, causing
> kill_me_maybe() to send a SIGBUS.
> 
> The console log looks like this:
> 
>     Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
>     Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
>     Memory failure: 0x827ca68: already hardware poisoned
>     mce: Memory error not recovered
> 
> To fix it, return 0 in the "corrupted page was clean" case, preventing an
> unnecessary SIGBUS to the user process.
> 
> [1] https://lore.kernel.org/lkml/20250217063335.22257-1-xueshuai@linux.alibaba.com/T/#mba94f1305b3009dd340ce4114d3221fe810d1871
> Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"")
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Cc: stable@vger.kernel.org

Thanks for your detailed commit log. This patch looks good to me.

Acked-by: Miaohe Lin <linmiaohe@huawei.com>

Thanks.
.
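The return-value chain described in the commit message quoted above can be modelled in a few lines. In this sketch, `walk_result` stands in for the result of the page-table walk (1 when a hwpoison entry was found for a dirty page, 0 when a clean page's PTE was left unchanged). The two functions are hypothetical simplifications of the decisions the commit message attributes to kill_accessing_process() and kill_me_maybe(), not the kernel code: kill_me_maybe() is modelled as treating only 0 and -EHWPOISON specially, and the EHWPOISON fallback value is an assumption for C libraries that do not define it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux errno value; assumed if libc lacks it */
#endif

/*
 * Hypothetical model of the tail of kill_accessing_process().
 * @walk_result: 1 if a hwpoison entry was found (dirty page already
 *               unmapped by memory_failure()), 0 otherwise (clean page).
 * @fixed:       false models the behaviour before the fix, true after.
 */
static int model_kill_accessing_process(int walk_result, bool fixed)
{
	if (walk_result == 1)
		return -EHWPOISON;	/* SIGBUS already sent via kill_proc() */
	return fixed ? 0 : -EFAULT;	/* clean page: dropped, nothing to kill */
}

/* Hypothetical model of kill_me_maybe(): true if it would force a SIGBUS. */
static bool model_kill_me_maybe_sends_sigbus(int mf_ret)
{
	if (mf_ret == 0)
		return false;		/* recovered */
	if (mf_ret == -EHWPOISON)
		return false;		/* SIGBUS was already delivered */
	return true;			/* "Memory error not recovered" */
}
```

With `fixed == false`, a clean page yields -EFAULT and the model reports a SIGBUS, matching the "Memory error not recovered" line in the console log; with `fixed == true` it yields 0 and no signal is sent, while the dirty-page path is unchanged.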



* Re: [PATCH v4 2/3] mm/hwpoison: Do not send SIGBUS to processes with recovered clean pages
  2025-03-12  6:39   ` Miaohe Lin
@ 2025-03-12  8:03     ` Shuai Xue
  0 siblings, 0 replies; 16+ messages in thread
From: Shuai Xue @ 2025-03-12  8:03 UTC (permalink / raw)
  To: Miaohe Lin, tony.luck, Borislav Petkov
  Cc: tglx, mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac,
	linux-kernel, linux-mm, baolin.wang, tianruidong, bp, peterz,
	catalin.marinas, yazen.ghannam, akpm, nao.horiguchi



On 2025/3/12 14:39, Miaohe Lin wrote:
> Thanks for your detailed commit log. This patch looks good to me.
> 
> Acked-by: Miaohe Lin <linmiaohe@huawei.com>
> 
> Thanks.
> .

Thanks.

Most of it is borrowed from the discussion with Tony and Borislav.
Thanks to them :)

Best Regards,
Shuai



* [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context
  2025-03-12 11:28 Shuai Xue
@ 2025-03-12 11:28 ` Shuai Xue
  0 siblings, 0 replies; 16+ messages in thread
From: Shuai Xue @ 2025-03-12 11:28 UTC (permalink / raw)
  To: akpm, linmiaohe, nao.horiguchi
  Cc: tony.luck, bp, peterz, catalin.marinas, yazen.ghannam, tglx,
	mingo, dave.hansen, x86, hpa, jpoimboe, linux-edac, linux-kernel,
	linux-mm, baolin.wang, tianruidong, xueshuai

Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
extable fixup type, EX_TYPE_EFAULT_REG, and the extable fixup type for
copy-from-user operations was then changed from EX_TYPE_UACCESS to
EX_TYPE_EFAULT_REG. As a result, the error context for copy-from-user
operations is no longer treated as an in-kernel recovery context, and
consuming poison there panics the kernel with the message: "Machine
check: Data load in unrecoverable area of kernel."

To address this, the MCE code needs to determine whether an error context
involves reading from user memory. is_copy_from_user() returns true when
both of the following conditions are met:

    - the current operation is a copy
    - the source is user memory

This check classifies such errors correctly without relying on the extable
fixup types that previously enabled in-kernel recovery.

So, use is_copy_from_user() to determine whether the context is a copy
from user memory.

Fixes: 4c132d1d844a ("x86/futex: Remove .fixup usage")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/cpu/mce/severity.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)
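The two conditions listed in the commit message above can be sketched as a classifier over an instruction that has already been decoded. This is a user-space model, not the kernel's is_copy_from_user(): decoding is assumed to be done, the set of copy opcodes (MOV and MOVZX loads, plus MOVS with the source in RSI) is an assumption for illustration, and `USER_ADDR_LIMIT` and `struct decoded_insn` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user/kernel split: addresses below this count as user. */
#define USER_ADDR_LIMIT 0x00007ffffffff000ULL

/* Hypothetical, already-decoded view of the faulting instruction. */
struct decoded_insn {
	uint32_t opcode;	/* opcode bytes, low byte first: 0xB60F = 0F B6 */
	uint64_t mem_src;	/* effective address of a memory source operand */
	uint64_t rsi;		/* source pointer used by string moves */
};

/* True if the instruction is a copy whose source lies in user memory. */
static bool model_is_copy_from_user(const struct decoded_insn *insn)
{
	uint64_t src;

	switch (insn->opcode) {
	case 0x8A: case 0x8B:		/* MOV reg, mem */
	case 0xB60F: case 0xB70F:	/* MOVZX reg, mem */
		src = insn->mem_src;
		break;
	case 0xA4: case 0xA5:		/* (REP) MOVS: source is [RSI] */
		src = insn->rsi;
		break;
	default:
		return false;		/* condition 1 fails: not a copy */
	}
	return src < USER_ADDR_LIMIT;	/* condition 2: source is user memory */
}
```

Because the decision is made from the instruction itself, it does not depend on which extable fixup type the call site carries; the flip side, as raised in the review, is that the opcode list has to cover the ways the kernel actually reads user memory.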

diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index dac4d64dfb2a..2235a7477436 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -300,13 +300,12 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
 	copy_user  = is_copy_from_user(regs);
 	instrumentation_end();

-	switch (fixup_type) {
-	case EX_TYPE_UACCESS:
-		if (!copy_user)
-			return IN_KERNEL;
-		m->kflags |= MCE_IN_KERNEL_COPYIN;
-		fallthrough;
+	if (copy_user) {
+		m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
+		return IN_KERNEL_RECOV;
+	}

+	switch (fixup_type) {
 	case EX_TYPE_FAULT_MCE_SAFE:
 	case EX_TYPE_DEFAULT_MCE_SAFE:
 		m->kflags |= MCE_IN_KERNEL_RECOV;
-- 
2.39.3
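The reordering in the severity.c hunk above can be modelled with local enums and flags. The names mirror those in the diff, but the bit values are hypothetical, and the `return` statements after the MCE_SAFE cases and the `IN_KERNEL` default are assumed because they fall outside the hunk's context lines. The model contrasts the old switch, where only sites tagged EX_TYPE_UACCESS could be classified as copy-in, with the new order, where the instruction-based check runs first.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the names used in the diff. */
enum ctx { IN_KERNEL, IN_KERNEL_RECOV };
enum fixup { EX_TYPE_NONE, EX_TYPE_UACCESS, EX_TYPE_EFAULT_REG,
	     EX_TYPE_FAULT_MCE_SAFE, EX_TYPE_DEFAULT_MCE_SAFE };

#define MCE_IN_KERNEL_RECOV	(1u << 0)	/* hypothetical bit values */
#define MCE_IN_KERNEL_COPYIN	(1u << 1)

/* Old order: copy-in recovery only for sites tagged EX_TYPE_UACCESS. */
static enum ctx old_error_context(enum fixup type, bool copy_user,
				  unsigned int *kflags)
{
	switch (type) {
	case EX_TYPE_UACCESS:
		if (!copy_user)
			return IN_KERNEL;
		*kflags |= MCE_IN_KERNEL_COPYIN;
		/* fall through */
	case EX_TYPE_FAULT_MCE_SAFE:
	case EX_TYPE_DEFAULT_MCE_SAFE:
		*kflags |= MCE_IN_KERNEL_RECOV;
		return IN_KERNEL_RECOV;
	default:
		return IN_KERNEL;
	}
}

/* New order: the instruction-based check runs first, whatever the tag. */
static enum ctx new_error_context(enum fixup type, bool copy_user,
				  unsigned int *kflags)
{
	if (copy_user) {
		*kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
		return IN_KERNEL_RECOV;
	}

	switch (type) {
	case EX_TYPE_FAULT_MCE_SAFE:
	case EX_TYPE_DEFAULT_MCE_SAFE:
		*kflags |= MCE_IN_KERNEL_RECOV;
		return IN_KERNEL_RECOV;
	default:
		return IN_KERNEL;
	}
}
```

For a site tagged EX_TYPE_EFAULT_REG that reads user memory, the old logic returns IN_KERNEL (the panic case described in the commit message), while the new logic returns IN_KERNEL_RECOV with MCE_IN_KERNEL_COPYIN set.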


Thread overview: 16+ messages
2025-03-07  5:44 [PATCH v4 0/3] mm/hwpoison: Fix regressions in memory failure handling Shuai Xue
2025-03-07  5:44 ` [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context Shuai Xue
2025-03-07 20:40   ` Borislav Petkov
2025-03-07 22:05     ` Luck, Tony
2025-03-07 22:46       ` Borislav Petkov
2025-03-07 23:11         ` Luck, Tony
2025-03-07 23:22           ` Borislav Petkov
2025-03-08 11:27             ` Shuai Xue
2025-03-08 11:25     ` Shuai Xue
2025-03-07  5:44 ` [PATCH v4 2/3] mm/hwpoison: Do not send SIGBUS to processes with recovered clean pages Shuai Xue
2025-03-12  6:39   ` Miaohe Lin
2025-03-12  8:03     ` Shuai Xue
2025-03-07  5:44 ` [PATCH v4 3/3] mm: memory-failure: Enhance comments for return value of memory_failure() Shuai Xue
2025-03-07 17:20 ` [PATCH v4 0/3] mm/hwpoison: Fix regressions in memory failure handling Luck, Tony
2025-03-08 11:36   ` Shuai Xue
2025-03-12 11:28 Shuai Xue
2025-03-12 11:28 ` [PATCH v4 1/3] x86/mce: Use is_copy_from_user() to determine copy-from-user context Shuai Xue
