* [PATCH v3 0/3] mm/memory-failure: add panic option for unrecoverable pages
@ 2026-04-13 13:26 Breno Leitao
  2026-04-13 13:26 ` [PATCH v3 1/3] mm/memory-failure: report MF_MSG_KERNEL for reserved pages Breno Leitao
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Breno Leitao @ 2026-04-13 13:26 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, Breno Leitao, kernel-team

When the memory failure handler encounters an in-use kernel page that it
cannot recover (slab, page tables, kernel stacks, vmalloc, etc.), it
currently logs the error as "Ignored" and continues operation.

This leaves corrupted data accessible to the kernel, which will inevitably
cause either silent data corruption or a delayed crash when the poisoned memory
is next accessed.

This is a common problem on large fleets. We frequently observe multi-bit ECC
errors hitting kernel slab pages, where memory_failure() fails to recover them
and the system crashes later at an unrelated code path, making root cause
analysis unnecessarily difficult.

Here is one specific example from production on an arm64 server: a multi-bit
ECC error hit a dentry cache slab page, memory_failure() failed to recover it
(slab pages are not supported by the hwpoison recovery mechanism), and 67
seconds later d_lookup() accessed the poisoned cache line causing a synchronous
external abort:

    [88690.479680] [Hardware Error]: error_type: 3, multi-bit ECC
    [88690.498473] Memory failure: 0x40272d: unhandlable page.
    [88690.498619] Memory failure: 0x40272d: recovery action for
                   get hwpoison page: Ignored
    ...
    [88757.847126] Internal error: synchronous external abort:
                   0000000096000410 [#1] SMP
    [88758.061075] pc : d_lookup+0x5c/0x220

This series adds a new sysctl vm.panic_on_unrecoverable_memory_failure
(default 0) that, when enabled, panics immediately on unrecoverable
memory failures. This provides a clean crash dump at the time of the
error, which is far more useful for diagnosis than a random crash later
at an unrelated code path.

This also categorizes reserved pages as MF_MSG_KERNEL, and panics on
unknown page types (MF_MSG_UNKNOWN), so all unrecoverable failure cases
are covered.
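
As a rough illustration of the panic condition described above, here is a
userspace sketch. The enums are trimmed stand-ins for the kernel ones, and
panic_sysctl, should_panic() and demo_should_panic() are hypothetical names
used only for this example; this is not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed userspace stand-ins for the kernel enums (illustration only). */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

enum mf_action_page_type {
	MF_MSG_KERNEL,
	MF_MSG_KERNEL_HIGH_ORDER,
	MF_MSG_GET_HWPOISON,
	MF_MSG_UNKNOWN,
};

/* Hypothetical stand-in for the sysctl value: 0 = continue, 1 = panic. */
static int panic_sysctl;

/*
 * True only when the sysctl is set, the recovery action was ignored, and
 * the page type is one of the three unrecoverable categories.
 */
static bool should_panic(enum mf_action_page_type type, enum mf_result result)
{
	return panic_sysctl &&
	       result == MF_IGNORED &&
	       (type == MF_MSG_KERNEL ||
		type == MF_MSG_KERNEL_HIGH_ORDER ||
		type == MF_MSG_UNKNOWN);
}

/* Usage example: the same event with the sysctl off and then on. */
void demo_should_panic(void)
{
	panic_sysctl = 0;
	assert(!should_panic(MF_MSG_KERNEL, MF_IGNORED));

	panic_sysctl = 1;
	assert(should_panic(MF_MSG_KERNEL, MF_IGNORED));
	/* A failed get_hwpoison_page() on a non-reserved page does not panic. */
	assert(!should_panic(MF_MSG_GET_HWPOISON, MF_IGNORED));
}
```

In this sketch, MF_MSG_GET_HWPOISON never triggers a panic, which matches
the v2 change listed below.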

A CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option is
also provided, similar to CONFIG_BOOTPARAM_HARDLOCKUP_PANIC, allowing
the sysctl to be enabled at build time for systems that always want to
panic on unrecoverable memory failures without requiring runtime
configuration.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v3:
- Rename is_unrecoverable_memory_failure() to panic_on_unrecoverable_mf()
  as suggested by maintainer.
- Add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option,
  similar to CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.
- Add documentation for the sysctl and CONFIG option.
- Add code comments documenting the panic condition design rationale and
  how the retry mechanism mitigates false positives from buddy allocator
  races.
- Link to v2: https://patch.msgid.link/20260331-ecc_panic-v2-0-9e40d0f64f7a@debian.org

Changes in v2:
- Panic on MF_MSG_KERNEL, MF_MSG_KERNEL_HIGH_ORDER and MF_MSG_UNKNOWN
  instead of MF_MSG_GET_HWPOISON.
- Report MF_MSG_KERNEL for reserved pages when get_hwpoison_page() fails
  instead of MF_MSG_GET_HWPOISON.
- Link to v1: https://patch.msgid.link/20260323-ecc_panic-v1-0-72a1921726c5@debian.org

---
Breno Leitao (3):
      mm/memory-failure: report MF_MSG_KERNEL for reserved pages
      mm/memory-failure: add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC option
      Documentation: document panic_on_unrecoverable_memory_failure sysctl

 Documentation/admin-guide/sysctl/vm.rst | 46 ++++++++++++++++++++++++++++++
 mm/Kconfig                              |  9 ++++++
 mm/memory-failure.c                     | 50 ++++++++++++++++++++++++++++++++-
 3 files changed, 104 insertions(+), 1 deletion(-)
---
base-commit: 028ef9c96e96197026887c0f092424679298aae8
change-id: 20260323-ecc_panic-4e473b83087c

Best regards,
--  
Breno Leitao <leitao@debian.org>




* [PATCH v3 1/3] mm/memory-failure: report MF_MSG_KERNEL for reserved pages
  2026-04-13 13:26 [PATCH v3 0/3] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
@ 2026-04-13 13:26 ` Breno Leitao
  2026-04-13 13:26 ` [PATCH v3 2/3] mm/memory-failure: add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC option Breno Leitao
  2026-04-13 13:26 ` [PATCH v3 3/3] Documentation: document panic_on_unrecoverable_memory_failure sysctl Breno Leitao
  2 siblings, 0 replies; 4+ messages in thread
From: Breno Leitao @ 2026-04-13 13:26 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, Breno Leitao, kernel-team

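When get_hwpoison_page() returns a negative value, distinguish
reserved pages from other failure cases by reporting MF_MSG_KERNEL
instead of MF_MSG_GET_HWPOISON. Reserved pages belong to the kernel
and should be classified accordingly for proper handling by the
panic_on_unrecoverable_memory_failure mechanism.

Also add that mechanism: the vm.panic_on_unrecoverable_memory_failure
sysctl (default 0) and the panic_on_unrecoverable_mf() helper, which
action_result() uses to panic when an MF_MSG_KERNEL,
MF_MSG_KERNEL_HIGH_ORDER or MF_MSG_UNKNOWN failure is ignored.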
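
The reserved-page classification can be sketched in userspace as follows.
This is only a model under stated assumptions: struct fake_page,
classify_failed_get() and demo_classify() are hypothetical names, and the
bool field stands in for the kernel's PageReserved() test.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed stand-in for the kernel enum (illustration only). */
enum mf_action_page_type { MF_MSG_KERNEL, MF_MSG_GET_HWPOISON };

/* Hypothetical page model: only the reserved flag matters here. */
struct fake_page {
	bool reserved;
};

/*
 * Pick the report type for the case where get_hwpoison_page() returned a
 * negative value: reserved pages are reported as kernel pages, everything
 * else keeps the generic MF_MSG_GET_HWPOISON type.
 */
static enum mf_action_page_type classify_failed_get(const struct fake_page *p)
{
	return p->reserved ? MF_MSG_KERNEL : MF_MSG_GET_HWPOISON;
}

/* Usage example: one reserved page and one non-reserved page. */
void demo_classify(void)
{
	struct fake_page reserved = { .reserved = true };
	struct fake_page other = { .reserved = false };

	assert(classify_failed_get(&reserved) == MF_MSG_KERNEL);
	assert(classify_failed_get(&other) == MF_MSG_GET_HWPOISON);
}
```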

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 mm/memory-failure.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d43613097..852c595aff108 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;
 
 static int sysctl_enable_soft_offline __read_mostly = 1;
 
+static int sysctl_panic_on_unrecoverable_mf __read_mostly;
+
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
@@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
+	},
+	{
+		.procname	= "panic_on_unrecoverable_memory_failure",
+		.data		= &sysctl_panic_on_unrecoverable_mf,
+		.maxlen		= sizeof(sysctl_panic_on_unrecoverable_mf),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
 	}
 };
 
@@ -1281,6 +1292,35 @@ static void update_per_node_mf_stats(unsigned long pfn,
 	++mf_stats->total;
 }
 
+/*
+ * Determine whether to panic on an unrecoverable memory failure.
+ *
+ * Design rationale: panic immediately on unrecoverable kernel memory failures
+ * for a clean crash dump, instead of a random crash later on an ignored page.
+ *
+ * Panic when the result is MF_IGNORED and the page type is one of:
+ * - MF_MSG_KERNEL: Reserved pages that cannot be recovered
+ * - MF_MSG_KERNEL_HIGH_ORDER: High-order kernel pages that cannot be recovered
+ * - MF_MSG_UNKNOWN: Pages with unknown state that cannot be classified as
+ *   recoverable
+ *
+ * Note: Transient races are mitigated by memory_failure()'s retry mechanism.
+ * When a buddy allocator race is detected (take_page_off_buddy() fails), the
+ * code clears PageHWPoison and retries the entire memory_failure() flow,
+ * allowing pages to be properly reclassified with updated flags. This keeps
+ * transient races from being misclassified as unrecoverable.
+ *
+ */
+static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
+				       enum mf_result result)
+{
+	return sysctl_panic_on_unrecoverable_mf &&
+	       result == MF_IGNORED &&
+	       (type == MF_MSG_KERNEL ||
+		type == MF_MSG_KERNEL_HIGH_ORDER ||
+		type == MF_MSG_UNKNOWN);
+}
+
 /*
  * "Dirty/Clean" indication is not 100% accurate due to the possibility of
  * setting PG_dirty outside page lock. See also comment above set_page_dirty().
@@ -1298,6 +1338,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
 	pr_err("%#lx: recovery action for %s: %s\n",
 		pfn, action_page_types[type], action_name[result]);
 
+	if (panic_on_unrecoverable_mf(type, result))
+		panic("Memory failure: %#lx: unrecoverable page", pfn);
+
 	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }
 
@@ -2432,7 +2475,11 @@ int memory_failure(unsigned long pfn, int flags)
 		}
 		goto unlock_mutex;
 	} else if (res < 0) {
-		res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
+		if (PageReserved(p))
+			res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
+		else
+			res = action_result(pfn, MF_MSG_GET_HWPOISON,
+					    MF_IGNORED);
 		goto unlock_mutex;
 	}
 

-- 
2.52.0




* [PATCH v3 2/3] mm/memory-failure: add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC option
  2026-04-13 13:26 [PATCH v3 0/3] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
  2026-04-13 13:26 ` [PATCH v3 1/3] mm/memory-failure: report MF_MSG_KERNEL for reserved pages Breno Leitao
@ 2026-04-13 13:26 ` Breno Leitao
  2026-04-13 13:26 ` [PATCH v3 3/3] Documentation: document panic_on_unrecoverable_memory_failure sysctl Breno Leitao
  2 siblings, 0 replies; 4+ messages in thread
From: Breno Leitao @ 2026-04-13 13:26 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, Breno Leitao, kernel-team

Add a kernel configuration option to enable panic on unrecoverable
memory failures at boot time, similar to CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC
and CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.

This allows systems that prefer failing fast over continuing to run
with corrupted kernel memory to panic automatically when they hit an
unrecoverable kernel memory failure. The behavior can still be
controlled at runtime via the panic_on_unrecoverable_memory_failure
sysctl.

When enabled, the kernel will panic if:
 * A memory failure affects kernel pages that cannot be recovered
 * A memory failure affects high-order kernel pages
 * A memory failure affects unknown page types that cannot be recovered

Examples of BOOTPARAM configuration usage:

1. Building with the panic option enabled by default:
   CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC=y

2. Disabling at runtime even when compiled in:
   echo 0 > /proc/sys/vm/panic_on_unrecoverable_memory_failure

3. Enabling at runtime when not compiled in by default:
   echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure

Similar to other BOOTPARAM options, this provides a balance between:
 - Safe defaults (disabled by default without CONFIG option)
 - Production flexibility (can be enabled at build time)
 - Runtime control (can be toggled via sysctl)

This is consistent with the kernel's approach to other panic-on-error
options that allow systems to choose between attempting recovery or
failing fast when critical errors are detected.
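
The build-time default with a runtime override can be sketched in
userspace as follows. This is a simplified model, not the kernel
implementation: DEMO_MEMORY_FAILURE_PANIC, DEMO_DEFAULT,
set_panic_on_unrecoverable_mf() and demo_runtime_override() are
hypothetical names, the #ifdef is a simplified stand-in for IS_ENABLED(),
and the -1 return is a placeholder for a rejected out-of-range write.

```c
#include <assert.h>

/*
 * Simplified stand-in for IS_ENABLED(): 1 when the hypothetical
 * DEMO_MEMORY_FAILURE_PANIC macro is defined at build time, else 0.
 */
#ifdef DEMO_MEMORY_FAILURE_PANIC
#define DEMO_DEFAULT 1
#else
#define DEMO_DEFAULT 0
#endif

/* The build configuration only picks the initial value. */
static int panic_on_unrecoverable_mf = DEMO_DEFAULT;

/*
 * Runtime override, modelling a sysctl write restricted to 0..1.
 * Returns 0 on success and -1 (placeholder error) when out of range.
 */
static int set_panic_on_unrecoverable_mf(int val)
{
	if (val < 0 || val > 1)
		return -1;
	panic_on_unrecoverable_mf = val;
	return 0;
}

/* Usage example: toggle at runtime regardless of the build default. */
void demo_runtime_override(void)
{
	assert(set_panic_on_unrecoverable_mf(1) == 0);
	assert(panic_on_unrecoverable_mf == 1);
	assert(set_panic_on_unrecoverable_mf(0) == 0);
	assert(panic_on_unrecoverable_mf == 0);
	/* Out-of-range values are rejected and leave the setting unchanged. */
	assert(set_panic_on_unrecoverable_mf(2) == -1);
	assert(panic_on_unrecoverable_mf == 0);
}
```

Building the sketch with -DDEMO_MEMORY_FAILURE_PANIC changes only the
initial value; the setter behaves the same in both builds.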

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 mm/Kconfig          | 9 +++++++++
 mm/memory-failure.c | 3 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index ebd8ea353687e..596f24a872ff6 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -733,6 +733,15 @@ config MEMORY_FAILURE
 	  even when some of its memory has uncorrected errors. This requires
 	  special hardware support and typically ECC memory.
 
+config BOOTPARAM_MEMORY_FAILURE_PANIC
+	bool "Panic on unrecoverable memory failure"
+	depends on MEMORY_FAILURE
+	help
+	  Say Y here to panic when an unrecoverable memory failure is
+	  detected. This covers kernel pages, high-order kernel pages,
+	  and unknown page types that cannot be recovered. Can be disabled
+	  at runtime via the panic_on_unrecoverable_memory_failure sysctl.
+
 config HWPOISON_INJECT
 	tristate "HWPoison pages injector"
 	depends on MEMORY_FAILURE && DEBUG_KERNEL && PROC_FS
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 852c595aff108..cf06960b4d069 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,7 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;
 
 static int sysctl_enable_soft_offline __read_mostly = 1;
 
-static int sysctl_panic_on_unrecoverable_mf __read_mostly;
+static int sysctl_panic_on_unrecoverable_mf __read_mostly =
+			IS_ENABLED(CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC);
 
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 

-- 
2.52.0




* [PATCH v3 3/3] Documentation: document panic_on_unrecoverable_memory_failure sysctl
  2026-04-13 13:26 [PATCH v3 0/3] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
  2026-04-13 13:26 ` [PATCH v3 1/3] mm/memory-failure: report MF_MSG_KERNEL for reserved pages Breno Leitao
  2026-04-13 13:26 ` [PATCH v3 2/3] mm/memory-failure: add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC option Breno Leitao
@ 2026-04-13 13:26 ` Breno Leitao
  2 siblings, 0 replies; 4+ messages in thread
From: Breno Leitao @ 2026-04-13 13:26 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, Breno Leitao, kernel-team

Document the vm.panic_on_unrecoverable_memory_failure sysctl in the
admin guide, including the CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel
configuration option that allows enabling this behavior at build time.

This follows the same format as panic_on_unrecovered_nmi and other
panic-on-error documentation, providing clear examples of:
 - Enabling panic at build time via CONFIG option
 - Disabling at runtime via sysctl
 - Enabling at runtime via sysctl

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 Documentation/admin-guide/sysctl/vm.rst | 46 +++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 97e12359775c9..af545869bc1b4 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm:
 - page-cluster
 - page_lock_unfairness
 - panic_on_oom
+- panic_on_unrecoverable_memory_failure
 - percpu_pagelist_high_fraction
 - stat_interval
 - stat_refresh
@@ -925,6 +926,51 @@ panic_on_oom=2+kdump gives you very strong tool to investigate
 why oom happens. You can get snapshot.
 
 
+panic_on_unrecoverable_memory_failure
+======================================
+
+When a hardware memory error (e.g. multi-bit ECC) hits an in-use kernel
+page that cannot be recovered by the memory failure handler, the default
+behaviour is to ignore the error and continue operation.  This is
+dangerous because the corrupted data remains accessible to the kernel,
+risking silent data corruption or a delayed crash when the poisoned
+memory is next accessed.
+
+Pages that reach this path include slab objects (dentry cache, inode
+cache, etc.), page tables, kernel stacks, and other kernel allocations
+that lack the reverse mapping needed to isolate all references.
+
+For many environments it is preferable to panic immediately with a clean
+crash dump that captures the original error context, rather than to
+continue and face a random crash later whose cause is difficult to
+diagnose.
+
+= =====================================================================
+0 Try to continue operation (default).
+1 Panic immediately.  If the ``panic`` sysctl is also non-zero then the
+  machine will be rebooted.
+= =====================================================================
+
+This sysctl defaults to 1 when the kernel is built with the
+``CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC`` configuration option enabled.
+This lets systems enforce panic-on-error behavior from the kernel build,
+without requiring runtime sysctl configuration.
+
+Examples:
+
+1. Enable panic on unrecoverable memory failure at kernel build time::
+
+     CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC=y
+
+2. Disable at runtime even when compiled in::
+
+     echo 0 > /proc/sys/vm/panic_on_unrecoverable_memory_failure
+
+3. Enable at runtime when not enabled at build time::
+
+     echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure
+
+
 percpu_pagelist_high_fraction
 =============================
 

-- 
2.52.0


