[PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM


* [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
@ 2025-11-04  7:23 Xie Yuanbin
  2025-11-04  7:23 ` [PATCH v2 1/2] " Xie Yuanbin
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Xie Yuanbin @ 2025-11-04  7:23 UTC (permalink / raw)
  To: david, dave.hansen, bp, tglx, mingo, dave.hansen, hpa, akpm,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	linmiaohe, nao.horiguchi, luto, peterz, tony.luck
  Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4,
	lilinjie8, Xie Yuanbin

Memory bit flips are among the most common hardware errors in the server
and embedded fields. Many hardware components have memory verification
mechanisms, for example ECC. When an error is detected, some hardware or
architectures report the information to software (OS/BIOS), for example,
the MCE (Machine Check Exception) on x86.

Common errors include CE (Correctable Errors) and UE (Uncorrectable
Errors). When the kernel receives memory error information, if it has the
memory-failure feature, it can better handle memory errors without a reboot.
For example, the kernel can attempt to offline the affected memory by
migrating it or killing the process. Therefore, this feature is widely
used in servers and embedded fields.

In historical versions, memory-failure could not be enabled with x86_32 &&
SPARSEMEM because there were not enough page flags. However, this
issue has been resolved in the current version, and this series allows
SPARSEMEM and memory-failure to be enabled together on x86_32.
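
As a rough illustration of the constraint (not code from this series): on
32-bit, the individual page flags share the 32-bit page->flags word with
fields such as the section, node and zone numbers, so a configuration is
only usable when everything fits. The sketch below is a userspace model;
the struct, the function names and all example widths are assumptions
chosen for illustration, not values taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Width of page->flags on a 32-bit kernel. */
#define DEMO_BITS_PER_LONG 32u

/* Hypothetical description of how the flags word is carved up. */
struct demo_flags_layout {
	unsigned int nr_pageflags;   /* individual PG_* flag bits */
	unsigned int sections_width; /* bits for the SPARSEMEM section number */
	unsigned int nodes_width;    /* bits for the NUMA node id */
	unsigned int zones_width;    /* bits for the zone index */
};

/* Total number of bits the layout needs. */
static unsigned int demo_bits_used(const struct demo_flags_layout *l)
{
	assert(l != NULL);
	return l->nr_pageflags + l->sections_width +
	       l->nodes_width + l->zones_width;
}

/* True when the layout fits into a 32-bit flags word. */
static bool demo_layout_fits(const struct demo_flags_layout *l)
{
	return demo_bits_used(l) <= DEMO_BITS_PER_LONG;
}
```

With made-up widths of 22 flag bits, 7 section bits, 0 node bits and 2 zone
bits, the total is 31 and the layout fits; two more flag bits would push it
to 33, which no longer fits.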

By the way, due to increased demand, DRAM prices have recently
skyrocketed, making memory-failure potentially even more valuable in the
coming years.

v1-v2: https://lore.kernel.org/20251103033536.52234-1-xieyuanbin1@huawei.com
  - Describe the purpose of these patches in the cover letter.

  - Correct the description of historical changes to page flags.

  - Move the memory-failure tracing code from ras_event.h to
    memory-failure.h

Xie Yuanbin (2):
  x86/mm: support memory-failure on 32-bits with SPARSEMEM
  mm/memory-failure: remove the selection of RAS

 arch/x86/Kconfig                      |  3 -
 include/ras/ras_event.h               | 86 ------------------------
 include/trace/events/memory-failure.h | 97 +++++++++++++++++++++++++++
 mm/Kconfig                            |  1 -
 mm/memory-failure.c                   |  5 +-
 5 files changed, 101 insertions(+), 91 deletions(-)
 create mode 100644 include/trace/events/memory-failure.h

-- 
2.51.0


* [PATCH v2 1/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
  2025-11-04  7:23 [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM Xie Yuanbin
@ 2025-11-04  7:23 ` Xie Yuanbin
  2025-11-04  7:23 ` [PATCH v2 2/2] mm/memory-failure: remove the selection of RAS Xie Yuanbin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Xie Yuanbin @ 2025-11-04  7:23 UTC (permalink / raw)
  To: david, dave.hansen, bp, tglx, mingo, dave.hansen, hpa, akpm,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	linmiaohe, nao.horiguchi, luto, peterz, tony.luck
  Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4,
	lilinjie8, Xie Yuanbin

The historical commit d949f36f1865c60239d4 ("x86: Fix hwpoison code
related build failure on 32-bit NUMAQ") disabled x86_32's
memory-failure when SPARSEMEM is enabled, because there were not
enough page flags.

The commit 46df8e73a4a3f1445f2a ("mm: free up PG_slab") removes the PG_slab
flag, which allows MEMORY_FAILURE to be enabled from that point on.

The commit 09022bc196d23484a7a5 ("mm: remove PG_error") removes the PG_error
flag.

The commit cceba6f7e46c48deca43 ("mm: add PG_dropbehind folio flag") adds
the PG_dropbehind flag, but MEMORY_FAILURE can still be enabled.

In the current version, on x86_32 with SPARSEMEM && HIGHMEM && X86_PAE
&& X86_PAT, the number of page flags reaches its maximum value,
which is 31. Therefore, MEMORY_FAILURE can be safely enabled.

Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
---
 arch/x86/Kconfig | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d55c01efd7c2..f9ee57a55500 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -750,13 +750,10 @@ config IOSF_MBI_DEBUG

 config X86_SUPPORTS_MEMORY_FAILURE
 	def_bool y
 	# MCE code calls memory_failure():
 	depends on X86_MCE
-	# On 32-bit this adds too big of NODES_SHIFT and we run out of page flags:
-	# On 32-bit SPARSEMEM adds too big of SECTIONS_WIDTH:
-	depends on X86_64 || !SPARSEMEM
 	select ARCH_SUPPORTS_MEMORY_FAILURE

 config X86_32_IRIS
 	tristate "Eurobraille/Iris poweroff module"
 	depends on X86_32
-- 
2.51.0


* [PATCH v2 2/2] mm/memory-failure: remove the selection of RAS
  2025-11-04  7:23 [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM Xie Yuanbin
  2025-11-04  7:23 ` [PATCH v2 1/2] " Xie Yuanbin
@ 2025-11-04  7:23 ` Xie Yuanbin
  2025-11-04  9:38   ` David Hildenbrand (Red Hat)
  2025-11-04  9:33 ` [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM David Hildenbrand (Red Hat)
  2025-11-04 14:26 ` Dave Hansen
  3 siblings, 1 reply; 15+ messages in thread
From: Xie Yuanbin @ 2025-11-04  7:23 UTC (permalink / raw)
  To: david, dave.hansen, bp, tglx, mingo, dave.hansen, hpa, akpm,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	linmiaohe, nao.horiguchi, luto, peterz, tony.luck
  Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4,
	lilinjie8, Xie Yuanbin

The commit 97f0b13452198290799f ("tracing: add trace event for
memory-failure") introduced the selection of RAS in memory-failure.
That commit only adds a tracing feature; in reality, there is no dependency
between memory-failure and RAS. Selecting RAS increases the size of the
bzImage by 8k, which matters for embedded devices.

Move the memory-failure tracing code from ras_event.h to
memory-failure.h and remove the selection of RAS.

Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
---
 include/ras/ras_event.h               | 86 ------------------------
 include/trace/events/memory-failure.h | 97 +++++++++++++++++++++++++++
 mm/Kconfig                            |  1 -
 mm/memory-failure.c                   |  5 +-
 4 files changed, 101 insertions(+), 88 deletions(-)
 create mode 100644 include/trace/events/memory-failure.h

diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index c8cd0f00c845..1e5e87020eef 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -10,11 +10,10 @@
 #include <linux/edac.h>
 #include <linux/ktime.h>
 #include <linux/pci.h>
 #include <linux/aer.h>
 #include <linux/cper.h>
-#include <linux/mm.h>
 
 /*
  * MCE Extended Error Log trace event
  *
  * These events are generated when hardware detects a corrected or
@@ -337,94 +336,9 @@ TRACE_EVENT(aer_event,
 		__entry->tlp_header_valid ?
 			__print_array(__entry->tlp_header, PCIE_STD_MAX_TLP_HEADERLOG, 4) :
 			"Not available")
 );
 #endif /* CONFIG_PCIEAER */
-
-/*
- * memory-failure recovery action result event
- *
- * unsigned long pfn -	Page Frame Number of the corrupted page
- * int type	-	Page types of the corrupted page
- * int result	-	Result of recovery action
- */
-
-#ifdef CONFIG_MEMORY_FAILURE
-#define MF_ACTION_RESULT	\
-	EM ( MF_IGNORED, "Ignored" )	\
-	EM ( MF_FAILED,  "Failed" )	\
-	EM ( MF_DELAYED, "Delayed" )	\
-	EMe ( MF_RECOVERED, "Recovered" )
-
-#define MF_PAGE_TYPE		\
-	EM ( MF_MSG_KERNEL, "reserved kernel page" )			\
-	EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" )	\
-	EM ( MF_MSG_HUGE, "huge page" )					\
-	EM ( MF_MSG_FREE_HUGE, "free huge page" )			\
-	EM ( MF_MSG_GET_HWPOISON, "get hwpoison page" )			\
-	EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" )		\
-	EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" )		\
-	EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" )		\
-	EM ( MF_MSG_DIRTY_MLOCKED_LRU, "dirty mlocked LRU page" )	\
-	EM ( MF_MSG_CLEAN_MLOCKED_LRU, "clean mlocked LRU page" )	\
-	EM ( MF_MSG_DIRTY_UNEVICTABLE_LRU, "dirty unevictable LRU page" )	\
-	EM ( MF_MSG_CLEAN_UNEVICTABLE_LRU, "clean unevictable LRU page" )	\
-	EM ( MF_MSG_DIRTY_LRU, "dirty LRU page" )			\
-	EM ( MF_MSG_CLEAN_LRU, "clean LRU page" )			\
-	EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" )	\
-	EM ( MF_MSG_BUDDY, "free buddy page" )				\
-	EM ( MF_MSG_DAX, "dax page" )					\
-	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
-	EM ( MF_MSG_ALREADY_POISONED, "already poisoned" )		\
-	EMe ( MF_MSG_UNKNOWN, "unknown page" )
-
-/*
- * First define the enums in MM_ACTION_RESULT to be exported to userspace
- * via TRACE_DEFINE_ENUM().
- */
-#undef EM
-#undef EMe
-#define EM(a, b) TRACE_DEFINE_ENUM(a);
-#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
-
-MF_ACTION_RESULT
-MF_PAGE_TYPE
-
-/*
- * Now redefine the EM() and EMe() macros to map the enums to the strings
- * that will be printed in the output.
- */
-#undef EM
-#undef EMe
-#define EM(a, b)		{ a, b },
-#define EMe(a, b)	{ a, b }
-
-TRACE_EVENT(memory_failure_event,
-	TP_PROTO(unsigned long pfn,
-		 int type,
-		 int result),
-
-	TP_ARGS(pfn, type, result),
-
-	TP_STRUCT__entry(
-		__field(unsigned long, pfn)
-		__field(int, type)
-		__field(int, result)
-	),
-
-	TP_fast_assign(
-		__entry->pfn	= pfn;
-		__entry->type	= type;
-		__entry->result	= result;
-	),
-
-	TP_printk("pfn %#lx: recovery action for %s: %s",
-		__entry->pfn,
-		__print_symbolic(__entry->type, MF_PAGE_TYPE),
-		__print_symbolic(__entry->result, MF_ACTION_RESULT)
-	)
-);
-#endif /* CONFIG_MEMORY_FAILURE */
 #endif /* _TRACE_HW_EVENT_MC_H */
 
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/include/trace/events/memory-failure.h b/include/trace/events/memory-failure.h
new file mode 100644
index 000000000000..6c88fb624bd7
--- /dev/null
+++ b/include/trace/events/memory-failure.h
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ras
+#define TRACE_INCLUDE_FILE memory-failure
+
+#if !defined(_TRACE_MEMORY_FAILURE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MEMORY_FAILURE_H
+
+#include <linux/tracepoint.h>
+#include <linux/mm.h>
+
+/*
+ * memory-failure recovery action result event
+ *
+ * unsigned long pfn -	Page Frame Number of the corrupted page
+ * int type	-	Page types of the corrupted page
+ * int result	-	Result of recovery action
+ */
+
+#define MF_ACTION_RESULT	\
+	EM ( MF_IGNORED, "Ignored" )	\
+	EM ( MF_FAILED,  "Failed" )	\
+	EM ( MF_DELAYED, "Delayed" )	\
+	EMe ( MF_RECOVERED, "Recovered" )
+
+#define MF_PAGE_TYPE		\
+	EM ( MF_MSG_KERNEL, "reserved kernel page" )			\
+	EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" )	\
+	EM ( MF_MSG_HUGE, "huge page" )					\
+	EM ( MF_MSG_FREE_HUGE, "free huge page" )			\
+	EM ( MF_MSG_GET_HWPOISON, "get hwpoison page" )			\
+	EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" )		\
+	EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" )		\
+	EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" )		\
+	EM ( MF_MSG_DIRTY_MLOCKED_LRU, "dirty mlocked LRU page" )	\
+	EM ( MF_MSG_CLEAN_MLOCKED_LRU, "clean mlocked LRU page" )	\
+	EM ( MF_MSG_DIRTY_UNEVICTABLE_LRU, "dirty unevictable LRU page" )	\
+	EM ( MF_MSG_CLEAN_UNEVICTABLE_LRU, "clean unevictable LRU page" )	\
+	EM ( MF_MSG_DIRTY_LRU, "dirty LRU page" )			\
+	EM ( MF_MSG_CLEAN_LRU, "clean LRU page" )			\
+	EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" )	\
+	EM ( MF_MSG_BUDDY, "free buddy page" )				\
+	EM ( MF_MSG_DAX, "dax page" )					\
+	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
+	EM ( MF_MSG_ALREADY_POISONED, "already poisoned" )		\
+	EMe ( MF_MSG_UNKNOWN, "unknown page" )
+
+/*
+ * First define the enums in MF_ACTION_RESULT to be exported to userspace
+ * via TRACE_DEFINE_ENUM().
+ */
+#undef EM
+#undef EMe
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
+
+MF_ACTION_RESULT
+MF_PAGE_TYPE
+
+/*
+ * Now redefine the EM() and EMe() macros to map the enums to the strings
+ * that will be printed in the output.
+ */
+#undef EM
+#undef EMe
+#define EM(a, b)		{ a, b },
+#define EMe(a, b)	{ a, b }
+
+TRACE_EVENT(memory_failure_event,
+	TP_PROTO(unsigned long pfn,
+		 int type,
+		 int result),
+
+	TP_ARGS(pfn, type, result),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, pfn)
+		__field(int, type)
+		__field(int, result)
+	),
+
+	TP_fast_assign(
+		__entry->pfn	= pfn;
+		__entry->type	= type;
+		__entry->result	= result;
+	),
+
+	TP_printk("pfn %#lx: recovery action for %s: %s",
+		__entry->pfn,
+		__print_symbolic(__entry->type, MF_PAGE_TYPE),
+		__print_symbolic(__entry->result, MF_ACTION_RESULT)
+	)
+);
+#endif /* _TRACE_MEMORY_FAILURE_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/Kconfig b/mm/Kconfig
index a5a90b169435..c3a8e0ba1ac1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -738,11 +738,10 @@ config ARCH_SUPPORTS_MEMORY_FAILURE
 
 config MEMORY_FAILURE
 	depends on MMU
 	depends on ARCH_SUPPORTS_MEMORY_FAILURE
 	bool "Enable recovery from hardware memory errors"
-	select RAS
 	help
 	  Enables code to recover from some memory failures on systems
 	  with MCA recovery. This allows a system to continue running
 	  even when some of its memory has uncorrected errors. This requires
 	  special hardware support and typically ECC memory.
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f698df156bf8..a1fe6d760983 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -58,13 +58,16 @@
 #include <linux/kfifo.h>
 #include <linux/ratelimit.h>
 #include <linux/pagewalk.h>
 #include <linux/shmem_fs.h>
 #include <linux/sysctl.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/memory-failure.h>
+
 #include "swap.h"
 #include "internal.h"
-#include "ras/ras_event.h"
 
 #define SOFT_OFFLINE_ENABLED		BIT(0)
 #define SOFT_OFFLINE_SKIP_HUGETLB	BIT(1)
 
 static int sysctl_memory_failure_early_kill __read_mostly;
-- 
2.51.0




* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
  2025-11-04  7:23 [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM Xie Yuanbin
  2025-11-04  7:23 ` [PATCH v2 1/2] " Xie Yuanbin
  2025-11-04  7:23 ` [PATCH v2 2/2] mm/memory-failure: remove the selection of RAS Xie Yuanbin
@ 2025-11-04  9:33 ` David Hildenbrand (Red Hat)
  2025-11-04 13:29   ` Xie Yuanbin
  2025-11-04 13:32   ` Xie Yuanbin
  2025-11-04 14:26 ` Dave Hansen
  3 siblings, 2 replies; 15+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-04  9:33 UTC (permalink / raw)
  To: Xie Yuanbin, david, dave.hansen, bp, tglx, mingo, dave.hansen,
	hpa, akpm, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
	mhocko, linmiaohe, nao.horiguchi, luto, peterz, tony.luck
  Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4, lilinjie8

On 04.11.25 08:23, Xie Yuanbin wrote:
> Memory bit flips are among the most common hardware errors in the server
> and embedded fields, many hardware components have memory verification
> mechanisms, for example ECC. When an error is detected, some hardware or
> architectures report the information to software (OS/BIOS), for example,
> the MCE (Machine Check Exception) on x86.
> 
> Common errors include CE (Correctable Errors) and UE (Uncorrectable
> Errors). When the kernel receives memory error information, if it has the
> memory-failure feature, it can better handle memory errors without reboot.
> For example, kernel can attempt to offline the affected memory by
> migrating it or killing the process. Therefore, this feature is widely
> used in servers and embedded fields.

This is a pretty generic description of MCEs.

I think what we are missing is: who runs 32bit OSes on MCE-capable 
hardware (or VMs?) and needs this to work.

What's the use case?

-- 
Cheers

David



* Re: [PATCH v2 2/2] mm/memory-failure: remove the selection of RAS
  2025-11-04  7:23 ` [PATCH v2 2/2] mm/memory-failure: remove the selection of RAS Xie Yuanbin
@ 2025-11-04  9:38   ` David Hildenbrand (Red Hat)
  2025-11-04  9:50     ` Xie Yuanbin
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-04  9:38 UTC (permalink / raw)
  To: Xie Yuanbin, david, dave.hansen, bp, tglx, mingo, dave.hansen,
	hpa, akpm, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
	mhocko, linmiaohe, nao.horiguchi, luto, peterz, tony.luck
  Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4, lilinjie8

On 04.11.25 08:23, Xie Yuanbin wrote:
> The commit 97f0b13452198290799f ("tracing: add trace event for
> memory-failure") introduces the selection of RAS in memory-failure.
> This commit is just a tracing feature; in reality, there is no dependency
> between memory-failure and RAS. RAS increases the size of the bzImage
> image by 8k, which is very valuable for embedded devices.
> 
> Move the memory-failure traceing code from ras_event.h to
> memory-failure.h and remove the selection of RAS.
> 
> Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> ---

[...]

> +++ b/include/trace/events/memory-failure.h
> @@ -0,0 +1,97 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM ras

This trace system should not be called "ras". All RAS terminology should 
be removed here.

#define TRACE_SYSTEM memory_failure

> +#define TRACE_INCLUDE_FILE memory-failure
> +
> +#if !defined(_TRACE_MEMORY_FAILURE_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_MEMORY_FAILURE_H
> +
> +#include <linux/tracepoint.h>
> +#include <linux/mm.h>
> +
> +/*
> + * memory-failure recovery action result event
> + *
> + * unsigned long pfn -	Page Frame Number of the corrupted page
> + * int type	-	Page types of the corrupted page
> + * int result	-	Result of recovery action
> + */
> +
> +#define MF_ACTION_RESULT	\
> +	EM ( MF_IGNORED, "Ignored" )	\
> +	EM ( MF_FAILED,  "Failed" )	\
> +	EM ( MF_DELAYED, "Delayed" )	\
> +	EMe ( MF_RECOVERED, "Recovered" )
> +
> +#define MF_PAGE_TYPE		\
> +	EM ( MF_MSG_KERNEL, "reserved kernel page" )			\
> +	EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" )	\
> +	EM ( MF_MSG_HUGE, "huge page" )					\
> +	EM ( MF_MSG_FREE_HUGE, "free huge page" )			\
> +	EM ( MF_MSG_GET_HWPOISON, "get hwpoison page" )			\
> +	EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" )		\
> +	EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" )		\
> +	EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" )		\
> +	EM ( MF_MSG_DIRTY_MLOCKED_LRU, "dirty mlocked LRU page" )	\
> +	EM ( MF_MSG_CLEAN_MLOCKED_LRU, "clean mlocked LRU page" )	\
> +	EM ( MF_MSG_DIRTY_UNEVICTABLE_LRU, "dirty unevictable LRU page" )	\
> +	EM ( MF_MSG_CLEAN_UNEVICTABLE_LRU, "clean unevictable LRU page" )	\
> +	EM ( MF_MSG_DIRTY_LRU, "dirty LRU page" )			\
> +	EM ( MF_MSG_CLEAN_LRU, "clean LRU page" )			\
> +	EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" )	\
> +	EM ( MF_MSG_BUDDY, "free buddy page" )				\
> +	EM ( MF_MSG_DAX, "dax page" )					\
> +	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
> +	EM ( MF_MSG_ALREADY_POISONED, "already poisoned" )		\
> +	EMe ( MF_MSG_UNKNOWN, "unknown page" )
> +
> +/*
> + * First define the enums in MM_ACTION_RESULT to be exported to userspace
> + * via TRACE_DEFINE_ENUM().
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
> +#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
> +
> +MF_ACTION_RESULT
> +MF_PAGE_TYPE
> +
> +/*
> + * Now redefine the EM() and EMe() macros to map the enums to the strings
> + * that will be printed in the output.
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b)		{ a, b },
> +#define EMe(a, b)	{ a, b }
> +
> +TRACE_EVENT(memory_failure_event,
> +	TP_PROTO(unsigned long pfn,
> +		 int type,
> +		 int result),
> +
> +	TP_ARGS(pfn, type, result),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned long, pfn)
> +		__field(int, type)
> +		__field(int, result)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->pfn	= pfn;
> +		__entry->type	= type;
> +		__entry->result	= result;
> +	),
> +
> +	TP_printk("pfn %#lx: recovery action for %s: %s",
> +		__entry->pfn,
> +		__print_symbolic(__entry->type, MF_PAGE_TYPE),
> +		__print_symbolic(__entry->result, MF_ACTION_RESULT)
> +	)
> +);
> +#endif /* _TRACE_MEMORY_FAILURE_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>


We want to add that new file to the "HWPOISON MEMORY FAILURE HANDLING" 
section in MAINTAINERS.

Nothing else jumped at me.
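
For readers unfamiliar with the EM()/EMe() pattern in the quoted header:
the same list macro is expanded more than once, each time with a different
definition of EM() and EMe(). Below is a minimal userspace sketch of the
string-table expansion. The DEMO_* names, struct demo_sym and
demo_symbolic() are hypothetical, and the hand-written lookup only stands
in for __print_symbolic(); it is not the kernel implementation (returning
"unknown" for unmatched values is a simplification).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum standing in for the MF_* result codes. */
enum demo_result { DEMO_IGNORED, DEMO_FAILED, DEMO_DELAYED, DEMO_RECOVERED };

/* One list, written once; EMe() marks the last entry (no trailing comma). */
#define DEMO_ACTION_RESULT		\
	EM(DEMO_IGNORED, "Ignored")	\
	EM(DEMO_FAILED, "Failed")	\
	EM(DEMO_DELAYED, "Delayed")	\
	EMe(DEMO_RECOVERED, "Recovered")

struct demo_sym {
	int value;
	const char *name;
};

/* Expand the list into { value, "string" } initializers. */
#define EM(a, b)	{ a, b },
#define EMe(a, b)	{ a, b }
static const struct demo_sym demo_result_syms[] = { DEMO_ACTION_RESULT };
#undef EM
#undef EMe

#define DEMO_NSYMS (sizeof(demo_result_syms) / sizeof(demo_result_syms[0]))

/* Linear lookup of the string that belongs to a value. */
static const char *demo_symbolic(int value, const struct demo_sym *tbl,
				 size_t n)
{
	assert(tbl != NULL);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].value == value)
			return tbl[i].name;
	return "unknown";
}
```

Adding a new result code then only requires one new EM() line in the list;
the table picks it up automatically.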

-- 
Cheers

David



* Re: [PATCH v2 2/2] mm/memory-failure: remove the selection of RAS
  2025-11-04  9:38   ` David Hildenbrand (Red Hat)
@ 2025-11-04  9:50     ` Xie Yuanbin
  0 siblings, 0 replies; 15+ messages in thread
From: Xie Yuanbin @ 2025-11-04  9:50 UTC (permalink / raw)
  To: david
  Cc: Liam.Howlett, akpm, bp, dave.hansen, dave.hansen, david, hpa,
	liaohua4, lilinjie8, linmiaohe, linux-edac, linux-kernel,
	linux-mm, lorenzo.stoakes, luto, mhocko, mingo, nao.horiguchi,
	peterz, rppt, surenb, tglx, tony.luck, vbabka, will, x86,
	xieyuanbin1

> This trace system should not be called "ras". All RAS terminology should 
> be removed here.
>
> #define TRACE_SYSTEM memory_failure
>
> We want to add that new file to the "HWPOISON MEMORY FAILURE HANDLING" 
> section in MAINTAINERS.
>
> Nothing else jumped at me.

Thanks, I will modify it in the v3 patches.

> Cheers
>
> David

Xie Yuanbin



* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
  2025-11-04  9:33 ` [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM David Hildenbrand (Red Hat)
@ 2025-11-04 13:29   ` Xie Yuanbin
  2025-11-04 13:32   ` Xie Yuanbin
  1 sibling, 0 replies; 15+ messages in thread
From: Xie Yuanbin @ 2025-11-04 13:29 UTC (permalink / raw)
  To: david, dave.hansen, bp, tglx, mingo, dave.hansen, hpa, akpm,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	linmiaohe, nao.horiguchi, luto, peterz, tony.luck
  Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4,
	lilinjie8, Xie Yuanbin

> This is a pretty generic description of MCEs.
>
> I think what we are missing is: who runs 32bit OSes on MCE-capable 
> hardware (or VMs?) and needs this to work.
>
> What's the use case?

Now, let me try to explain it. From what I understand, the demand mainly
comes from two sources:
1. Although almost all new CPUs are 64-bit, there are still many existing
32-bit x86 devices in use.
2. On some embedded devices, in order to save memory overhead, even with
64-bit CPU hardware, a 32-bit kernel may still be used. You might wonder
why embedded devices need SPARSEMEM. This is because they need the
MEMORY_HOTPLUG feature, which depends on SPARSEMEM, rather than SPARSEMEM
itself.

On all of the above devices, the memory-failure feature may be used to
provide reliable memory error handling and to minimize service
interruptions as much as possible.

> Cheers
>
> David

Thanks!

Xie Yuanbin


* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
  2025-11-04  9:33 ` [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM David Hildenbrand (Red Hat)
  2025-11-04 13:29   ` Xie Yuanbin
@ 2025-11-04 13:32   ` Xie Yuanbin
  1 sibling, 0 replies; 15+ messages in thread
From: Xie Yuanbin @ 2025-11-04 13:32 UTC (permalink / raw)
  To: david, dave.hansen, bp, tglx, mingo, dave.hansen, hpa, akpm,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	linmiaohe, nao.horiguchi, luto, peterz, tony.luck
  Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4,
	lilinjie8, Xie Yuanbin

The previous email was corrupted; please ignore it.
I'm very sorry about this.

On Tue, 4 Nov 2025 10:33:39 +0100, David Hildenbrand wrote:
> This is a pretty generic description of MCEs.
>
> I think what we are missing is: who runs 32bit OSes on MCE-capable 
> hardware (or VMs?) and needs this to work.
>
> What's the use case?

I did indeed miss this part in my description, and I apologize for that.
Ever since the memory-failure feature was introduced by
commit 6a46079cf57a7f7758e8 ("HWPOISON: The high level memory error
handler in the VM v7"), it could be enabled on x86_32. I am submitting these
patches only because MEMORY_FAILURE cannot be enabled together with
SPARSEMEM on x86_32. Memory-failure was introduced in 2009, when
64-bit hardware was not even very popular yet, and the first caller of
`memory_failure()` was x86's MCE code.
Even in the latest version, with the default i386_defconfig, MEMORY_FAILURE
can be enabled directly on x86_32, because i386_defconfig does not enable
SPARSEMEM by default.
Therefore, I did not consider the need to explain why MEMORY_FAILURE needs
to be enabled on x86_32.

Now, let me try to explain it. From what I understand, the demand mainly
comes from two sources:
1. Although almost all new CPUs are 64-bit, there are still many existing
32-bit x86 devices in use.
2. On some embedded devices, in order to save memory overhead, even with
64-bit CPU hardware, a 32-bit kernel may still be used. You might wonder
why embedded devices need SPARSEMEM. This is because they need the
MEMORY_HOTPLUG feature, which depends on SPARSEMEM, rather than SPARSEMEM
itself.

On all of the above devices, the memory-failure feature may be used to
provide reliable memory error handling and to minimize service
interruptions as much as possible.

> Cheers
>
> David

Thanks!

Xie Yuanbin


* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
  2025-11-04  7:23 [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM Xie Yuanbin
                   ` (2 preceding siblings ...)
  2025-11-04  9:33 ` [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM David Hildenbrand (Red Hat)
@ 2025-11-04 14:26 ` Dave Hansen
  2025-11-05  2:45   ` Xie Yuanbin
  3 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2025-11-04 14:26 UTC (permalink / raw)
  To: Xie Yuanbin, david, bp, tglx, mingo, dave.hansen, hpa, akpm,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	linmiaohe, nao.horiguchi, luto, peterz, tony.luck
  Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4, lilinjie8

On 11/3/25 23:23, Xie Yuanbin wrote:
> Memory bit flips are among the most common hardware errors in the server
> and embedded fields, many hardware components have memory verification
> mechanisms, for example ECC. When an error is detected, some hardware or
> architectures report the information to software (OS/BIOS), for example,
> the MCE (Machine Check Exception) on x86.
> 
> Common errors include CE (Correctable Errors) and UE (Uncorrectable
> Errors). When the kernel receives memory error information, if it has the
> memory-failure feature, it can better handle memory errors without reboot.
> For example, kernel can attempt to offline the affected memory by
> migrating it or killing the process. Therefore, this feature is widely
> used in servers and embedded fields.
> 
> For historical versions, memory-failure cannot be enabled with x86_32 &&
> SPARSEMEM because the number of page-flags are insufficient. However, this
> issue has been resolved in the current version, and this patch will allow
> SPARSEMEM and memory-failure to be enabled together on x86_32.
> 
> By the way, due to increased demand, DRAM prices have recently
> skyrocketed, making memory-failure potentially even more valuable in the
> coming years.

Which LLM generated that for you, btw?

I wanted to know _specifically_ what kind of hardware or 32-bit
environment you wanted to support with this series, though.




* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
  2025-11-04 14:26 ` Dave Hansen
@ 2025-11-05  2:45   ` Xie Yuanbin
  2025-11-05  8:12     ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 15+ messages in thread
From: Xie Yuanbin @ 2025-11-05  2:45 UTC (permalink / raw)
  To: dave.hansen, david
  Cc: Liam.Howlett, akpm, bp, dave.hansen, david, hpa, liaohua4,
	lilinjie8, linmiaohe, linux-edac, linux-kernel, linux-mm,
	lorenzo.stoakes, luto, mhocko, mingo, nao.horiguchi, peterz,
	rppt, surenb, tglx, tony.luck, vbabka, will, x86, xieyuanbin1

On Tue, 4 Nov 2025 06:26:58 -0800, Dave Hansen wrote:
> Which LLM generated that for you, btw?

I wrote this myself; LLM just helped me with the translation. My English
isn't very good, so I apologize for any mistakes.

> I wanted to know _specifically_ what kind of hardware or 32-bit
> environment you wanted to support with this series, though.

I think I have explained it clearly enough in this email:
Link: https://lore.kernel.org/20251104133254.145660-1-xieyuanbin1@huawei.com

In simple terms, it refers to some old existing equipment and some
embedded devices. More specifically, it includes some routers, switches,
and similar devices. As far as I know, no VM environment uses it.
If you are asking about a specific CPU chip model, I'm sorry, but I may
not be able to provide that information for you.

Btw, why do you only ask about which x86_32 devices use memory-failure,
but not which x86_32 devices use sparsemem? This patch just allows both
to coexist, and perhaps both are important?

Thanks!

Xie Yuanbin


* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
  2025-11-05  2:45   ` Xie Yuanbin
@ 2025-11-05  8:12     ` David Hildenbrand (Red Hat)
  2025-11-05  9:05       ` Xie Yuanbin
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-05  8:12 UTC (permalink / raw)
  To: Xie Yuanbin, dave.hansen
  Cc: Liam.Howlett, akpm, bp, dave.hansen, hpa, liaohua4, lilinjie8,
	linmiaohe, linux-edac, linux-kernel, linux-mm, lorenzo.stoakes,
	luto, mhocko, mingo, nao.horiguchi, peterz, rppt, surenb, tglx,
	tony.luck, vbabka, will, x86

On 05.11.25 03:45, Xie Yuanbin wrote:
> On Tue, 4 Nov 2025 06:26:58 -0800, Dave Hansen wrote:
>> Which LLM generated that for you, btw?
> 
> I wrote this myself; LLM just helped me with the translation. My English
> isn't very good, so I apologize for any mistakes.
> 
>> I wanted to know _specifically_ what kind of hardware or 32-bit
>> environment you wanted to support with this series, though.
> 
> I think I have explained it clearly enough in this email:
> Link: https://lore.kernel.org/20251104133254.145660-1-xieyuanbin1@huawei.com
> 
> In simple terms, it refers to some old existing equipment and some
> embedded devices. More specifically, it includes some routers, switches,
> and similar devices. From what I know, there is no VM environment that
> using it.
> If you are asking about a specific CPU chip model, I'm sorry, but I may
> not be able to provide that information for you.
> 
> Btw, why do you only ask about which x86_32 devices use memory-failure,
> but not which x86_32 devices use sparsemem? This patch just allows both
> to coexist, and perhaps both are important?

Let me clarify what we need to know:

Will you (or your employer) be running such updated 32bit kernels on
hardware that supports MCEs?

In other words: is this change driven by *real demand* or just by "oh
look, we can enable that now, I can come up with a theoretical use case
but I don't know if anybody would actually care"?

-- 
Cheers

David



* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
  2025-11-05  8:12     ` David Hildenbrand (Red Hat)
@ 2025-11-05  9:05       ` Xie Yuanbin
  2025-11-17  2:09         ` Xie Yuanbin
  0 siblings, 1 reply; 15+ messages in thread
From: Xie Yuanbin @ 2025-11-05  9:05 UTC (permalink / raw)
  To: david, dave.hansen
  Cc: Liam.Howlett, akpm, bp, dave.hansen, hpa, liaohua4, lilinjie8,
	linmiaohe, linux-edac, linux-kernel, linux-mm, lorenzo.stoakes,
	luto, mhocko, mingo, nao.horiguchi, peterz, rppt, surenb, tglx,
	tony.luck, vbabka, will, x86, xieyuanbin1

On Wed, 5 Nov 2025 09:12:04 +0100, David Hildenbrand wrote:
> Let me clarify what we need to know:
>
> Will you (or your employer) be running such updated 32bit kernels on
> hardware that supports MCEs?
>
> In other words: is this change driven by *real demand*

Thanks! Now that you put it this way, I understand completely.

We won't directly upgrade the kernel to 6.18.x (or later versions) to use
this feature, but if the Linux community approves these patches, we will
backport them to 5.10.x and use them. I know that the page flags in 5.10.x
have been exhausted, but we can work around that by adjusting
SECTION_SIZE_BITS/MAX_PHYSMEM_BITS to free up a page flag.
Another patch I submitted for arm32:
Link: https://lore.kernel.org/20250922021453.3939-1-xieyuanbin1@huawei.com
, follows the same logic.
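
As a rough illustration of the arithmetic behind that workaround (a sketch
under stated assumptions, not the kernel's actual code): with classic
SPARSEMEM, where the section number is stored in page->flags, the section
field takes MAX_PHYSMEM_BITS - SECTION_SIZE_BITS bits. Raising
SECTION_SIZE_BITS by one, or lowering MAX_PHYSMEM_BITS by one, therefore
shrinks that field by one bit. The helper names below and the concrete
widths in the note that follows are hypothetical.

```c
#include <assert.h>

/*
 * Width of the section-number field in page->flags for classic SPARSEMEM
 * (no VMEMMAP): one bit per doubling of the number of sections.
 * Hypothetical helper, mirroring SECTIONS_SHIFT in spirit only.
 */
int sections_shift(int max_physmem_bits, int section_size_bits)
{
	assert(max_physmem_bits >= section_size_bits);
	return max_physmem_bits - section_size_bits;
}

/*
 * Bits that remain for real page flags once the section, node and zone
 * fields have been carved out of an unsigned long. The widths passed in
 * are assumptions supplied by the caller, not values read from a kernel.
 */
int bits_left_for_flags(int bits_per_long, int sections_width,
			int nodes_width, int zones_width)
{
	return bits_per_long - sections_width - nodes_width - zones_width;
}
```

For example, sections_shift(36, 29) gives 7 while sections_shift(36, 30)
gives 6. With an assumed zone width of 2 and node width of 0 on a 32-bit
long, bits_left_for_flags() goes from 23 to 24, which is the one page flag
being freed.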

Currently, there is a clear demand for ARM32, while the demand for x86 is
still under discussion.

> or just by "oh
> look, we can enable that now, I can come up with a theoretical use case
> but I don't know if anybody would actually care"?

You could also put it that way. In fact, while working on the requirement
"support MEMORY_FAILURE for 32-bit OS" on version 5.10.x, I found that the
latest version already supported this feature, so I submitted these
patches in the hope that others can benefit from them as well.

> Cheers
>
> David

Thanks!

Xie Yuanbin


* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
  2025-11-05  9:05       ` Xie Yuanbin
@ 2025-11-17  2:09         ` Xie Yuanbin
  2025-11-17 13:03           ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 15+ messages in thread
From: Xie Yuanbin @ 2025-11-17  2:09 UTC (permalink / raw)
  To: xieyuanbin1, david, dave.hansen, david
  Cc: Liam.Howlett, akpm, bp, dave.hansen, hpa, liaohua4, lilinjie8,
	linmiaohe, linux-edac, linux-kernel, linux-mm, lorenzo.stoakes,
	luto, mhocko, mingo, nao.horiguchi, peterz, rppt, surenb, tglx,
	tony.luck, vbabka, will, x86

On Wed, 5 Nov 2025 17:05:36 +0800, Xie Yuanbin wrote:
> On Wed, 5 Nov 2025 09:12:04 +0100, David Hildenbrand wrote:
>> Let me clarify what we need to know:
>>
>> Will you (or your employer) be running such updated 32bit kernels on
>> hardware that supports MCEs?
>>
>> In other words: is this change driven by *real demand*
>
> Thanks! Now that you put it this way, I understand completely.
>
> We won't directly upgrade the kernel to 6.18.x (or later versions) to use
> this feature, but if the Linux community approves these patches, we will
> backport them to 5.10.x and use them. I know that the page flags in 5.10.x
> have been exhausted, but we can work around that by adjusting
> SECTION_SIZE_BITS/MAX_PHYSMEM_BITS to free up a page flag.
> Another patch I submitted for arm32:
> Link: https://lore.kernel.org/20250922021453.3939-1-xieyuanbin1@huawei.com
> , follows the same logic.
>
> Currently, there is a clear demand for ARM32, while the demand for x86 is
> still under discussion.
>
>> or just by "oh
>> look, we can enable that now, I can come up with a theoretical use case
>> but I don't know if anybody would actually care"?
>
> You could also put it that way. In fact, while working on the requirement
> "support MEMORY_FAILURE for 32-bit OS" on version 5.10.x, I found that the
> latest version already supported this feature, so I submitted these
> patches in the hope that others can benefit from them as well.

Hello, David Hildenbrand and Dave Hansen!

Do you have any other comments on this patch? If you think that
supporting memory-failure on x86_32 is meaningless, I will only submit
patch 2 in the v3 patches.

Thank you very much!

Xie Yuanbin



* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
  2025-11-17  2:09         ` Xie Yuanbin
@ 2025-11-17 13:03           ` David Hildenbrand (Red Hat)
  2025-11-18  8:09             ` Xie Yuanbin
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-17 13:03 UTC (permalink / raw)
  To: Xie Yuanbin, dave.hansen
  Cc: Liam.Howlett, akpm, bp, dave.hansen, hpa, liaohua4, lilinjie8,
	linmiaohe, linux-edac, linux-kernel, linux-mm, lorenzo.stoakes,
	luto, mhocko, mingo, nao.horiguchi, peterz, rppt, surenb, tglx,
	tony.luck, vbabka, will, x86

On 17.11.25 03:09, Xie Yuanbin wrote:
> On Wed, 5 Nov 2025 17:05:36 +0800, Xie Yuanbin wrote:
>> On Wed, 5 Nov 2025 09:12:04 +0100, David Hildenbrand wrote:
>>> Let me clarify what we need to know:
>>>
>>> Will you (or your employer) be running such updated 32bit kernels on
>>> hardware that supports MCEs?
>>>
>>> In other words: is this change driven by *real demand*
>>
>> Thanks! Now that you put it this way, I understand completely.
>>
>> We won't directly upgrade the kernel to 6.18.x (or later versions) to use
>> this feature, but if the Linux community approves these patches, we will
>> backport them to 5.10.x and use them. I know that the page flags in 5.10.x
>> have been exhausted, but we can work around that by adjusting
>> SECTION_SIZE_BITS/MAX_PHYSMEM_BITS to free up a page flag.
>> Another patch I submitted for arm32:
>> Link: https://lore.kernel.org/20250922021453.3939-1-xieyuanbin1@huawei.com
>> , follows the same logic.
>>
>> Currently, there is a clear demand for ARM32, while the demand for x86 is
>> still under discussion.
>>
>>> or just by "oh
>>> look, we can enable that now, I can come up with a theoretical use case
>>> but I don't know if anybody would actually care"?
>>
>> You could also put it that way. In fact, while working on the requirement
>> "support MEMORY_FAILURE for 32-bit OS" on version 5.10.x, I found that the
>> latest version already supported this feature, so I submitted these
>> patches in the hope that others can benefit from them as well.
> 
> Hello, David Hildenbrand and Dave Hansen!
> 
> Do you have any other comments on this patch? If you think that
> supporting memory-failure on x86_32 is meaningless, I will only submit
> patch 2 in the v3 patches.

I'd say, if nobody will really make use of that right now (customer 
request etc), just leave x86 alone for now.

-- 
Cheers

David



* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
  2025-11-17 13:03           ` David Hildenbrand (Red Hat)
@ 2025-11-18  8:09             ` Xie Yuanbin
  0 siblings, 0 replies; 15+ messages in thread
From: Xie Yuanbin @ 2025-11-18  8:09 UTC (permalink / raw)
  To: david
  Cc: Liam.Howlett, akpm, bp, dave.hansen, dave.hansen, hpa, liaohua4,
	lilinjie8, linmiaohe, linux-edac, linux-kernel, linux-mm,
	lorenzo.stoakes, luto, mhocko, mingo, nao.horiguchi, peterz,
	rppt, surenb, tglx, tony.luck, vbabka, will, x86, xieyuanbin1

On Mon, 17 Nov 2025 14:03:46 +0100, David Hildenbrand wrote:
> I'd say, if nobody will really make use of that right now (customer 
> request etc), just leave x86 alone for now.

Okay, thanks, I will only submit patch 2 in the V3 patches.

On Tue, 4 Nov 2025 10:38:54 +0100, David Hildenbrand wrote:
Link: https://lore.kernel.org/01b44e0f-ea2e-406f-9f65-b698b5504f42@kernel.org
> This trace system should not be called "ras". All RAS terminology should 
> be removed here.
>
> #define TRACE_SYSTEM memory_failure
>
> We want to add that new file to the "HWPOISON MEMORY FAILURE HANDLING"
> section in MAINTAINERS.
>
> Nothing else jumped at me.

Can I add an
"Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>"
to patch 2?

The full patch will be:
```patch
From: Xie Yuanbin <xieyuanbin1@huawei.com>
Subject: [PATCH v3] mm/memory-failure: remove the selection of RAS

Commit 97f0b13452198290799f ("tracing: add trace event for
memory-failure") introduced the selection of RAS in memory-failure.
That commit only adds a tracing feature; in reality, there is no dependency
between memory-failure and RAS. Selecting RAS increases the size of the
bzImage by 8k, and that space is very valuable for embedded devices.

Move the memory-failure tracing code from ras_event.h to
memory-failure.h and remove the selection of RAS.

v2->v3: https://lore.kernel.org/20251104072306.100738-3-xieyuanbin1@huawei.com
  - Change define TRACE_SYSTEM from ras to memory_failure
  - Add include/trace/events/memory-failure.h to
    "HWPOISON MEMORY FAILURE HANDLING" section in MAINTAINERS
  - Rebase to latest linux-next source

v1->v2: https://lore.kernel.org/20251103033536.52234-2-xieyuanbin1@huawei.com
  - Move the memory-failure tracing code from ras_event.h to
    memory-failure.h

Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
---
 MAINTAINERS                           |  1 +
 include/ras/ras_event.h               | 87 ------------------------
 include/trace/events/memory-failure.h | 98 +++++++++++++++++++++++++++
 mm/Kconfig                            |  1 -
 mm/memory-failure.c                   |  5 +-
 5 files changed, 103 insertions(+), 89 deletions(-)
 create mode 100644 include/trace/events/memory-failure.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 7310d9ca0370..43d6eb95fb05 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11631,10 +11631,11 @@ R:	Naoya Horiguchi <nao.horiguchi@gmail.com>
 L:	linux-mm@kvack.org
 S:	Maintained
 F:	include/linux/memory-failure.h
 F:	mm/hwpoison-inject.c
 F:	mm/memory-failure.c
+F:	include/trace/events/memory-failure.h
 
 HYCON HY46XX TOUCHSCREEN SUPPORT
 M:	Giulio Benetti <giulio.benetti@benettiengineering.com>
 L:	linux-input@vger.kernel.org
 S:	Maintained
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index fecfeb7c8be7..1e5e87020eef 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -10,11 +10,10 @@
 #include <linux/edac.h>
 #include <linux/ktime.h>
 #include <linux/pci.h>
 #include <linux/aer.h>
 #include <linux/cper.h>
-#include <linux/mm.h>
 
 /*
  * MCE Extended Error Log trace event
  *
  * These events are generated when hardware detects a corrected or
@@ -337,95 +336,9 @@ TRACE_EVENT(aer_event,
 		__entry->tlp_header_valid ?
 			__print_array(__entry->tlp_header, PCIE_STD_MAX_TLP_HEADERLOG, 4) :
 			"Not available")
 );
 #endif /* CONFIG_PCIEAER */
-
-/*
- * memory-failure recovery action result event
- *
- * unsigned long pfn -	Page Frame Number of the corrupted page
- * int type	-	Page types of the corrupted page
- * int result	-	Result of recovery action
- */
-
-#ifdef CONFIG_MEMORY_FAILURE
-#define MF_ACTION_RESULT	\
-	EM ( MF_IGNORED, "Ignored" )	\
-	EM ( MF_FAILED,  "Failed" )	\
-	EM ( MF_DELAYED, "Delayed" )	\
-	EMe ( MF_RECOVERED, "Recovered" )
-
-#define MF_PAGE_TYPE		\
-	EM ( MF_MSG_KERNEL, "reserved kernel page" )			\
-	EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" )	\
-	EM ( MF_MSG_HUGE, "huge page" )					\
-	EM ( MF_MSG_FREE_HUGE, "free huge page" )			\
-	EM ( MF_MSG_GET_HWPOISON, "get hwpoison page" )			\
-	EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" )		\
-	EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" )		\
-	EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" )		\
-	EM ( MF_MSG_DIRTY_MLOCKED_LRU, "dirty mlocked LRU page" )	\
-	EM ( MF_MSG_CLEAN_MLOCKED_LRU, "clean mlocked LRU page" )	\
-	EM ( MF_MSG_DIRTY_UNEVICTABLE_LRU, "dirty unevictable LRU page" )	\
-	EM ( MF_MSG_CLEAN_UNEVICTABLE_LRU, "clean unevictable LRU page" )	\
-	EM ( MF_MSG_DIRTY_LRU, "dirty LRU page" )			\
-	EM ( MF_MSG_CLEAN_LRU, "clean LRU page" )			\
-	EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" )	\
-	EM ( MF_MSG_BUDDY, "free buddy page" )				\
-	EM ( MF_MSG_DAX, "dax page" )					\
-	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
-	EM ( MF_MSG_ALREADY_POISONED, "already poisoned" )		\
-	EM ( MF_MSG_PFN_MAP, "non struct page pfn" )                    \
-	EMe ( MF_MSG_UNKNOWN, "unknown page" )
-
-/*
- * First define the enums in MM_ACTION_RESULT to be exported to userspace
- * via TRACE_DEFINE_ENUM().
- */
-#undef EM
-#undef EMe
-#define EM(a, b) TRACE_DEFINE_ENUM(a);
-#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
-
-MF_ACTION_RESULT
-MF_PAGE_TYPE
-
-/*
- * Now redefine the EM() and EMe() macros to map the enums to the strings
- * that will be printed in the output.
- */
-#undef EM
-#undef EMe
-#define EM(a, b)		{ a, b },
-#define EMe(a, b)	{ a, b }
-
-TRACE_EVENT(memory_failure_event,
-	TP_PROTO(unsigned long pfn,
-		 int type,
-		 int result),
-
-	TP_ARGS(pfn, type, result),
-
-	TP_STRUCT__entry(
-		__field(unsigned long, pfn)
-		__field(int, type)
-		__field(int, result)
-	),
-
-	TP_fast_assign(
-		__entry->pfn	= pfn;
-		__entry->type	= type;
-		__entry->result	= result;
-	),
-
-	TP_printk("pfn %#lx: recovery action for %s: %s",
-		__entry->pfn,
-		__print_symbolic(__entry->type, MF_PAGE_TYPE),
-		__print_symbolic(__entry->result, MF_ACTION_RESULT)
-	)
-);
-#endif /* CONFIG_MEMORY_FAILURE */
 #endif /* _TRACE_HW_EVENT_MC_H */
 
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/include/trace/events/memory-failure.h b/include/trace/events/memory-failure.h
new file mode 100644
index 000000000000..aa57cc8f896b
--- /dev/null
+++ b/include/trace/events/memory-failure.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM memory_failure
+#define TRACE_INCLUDE_FILE memory-failure
+
+#if !defined(_TRACE_MEMORY_FAILURE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MEMORY_FAILURE_H
+
+#include <linux/tracepoint.h>
+#include <linux/mm.h>
+
+/*
+ * memory-failure recovery action result event
+ *
+ * unsigned long pfn -	Page Frame Number of the corrupted page
+ * int type	-	Page types of the corrupted page
+ * int result	-	Result of recovery action
+ */
+
+#define MF_ACTION_RESULT	\
+	EM ( MF_IGNORED, "Ignored" )	\
+	EM ( MF_FAILED,  "Failed" )	\
+	EM ( MF_DELAYED, "Delayed" )	\
+	EMe ( MF_RECOVERED, "Recovered" )
+
+#define MF_PAGE_TYPE		\
+	EM ( MF_MSG_KERNEL, "reserved kernel page" )			\
+	EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" )	\
+	EM ( MF_MSG_HUGE, "huge page" )					\
+	EM ( MF_MSG_FREE_HUGE, "free huge page" )			\
+	EM ( MF_MSG_GET_HWPOISON, "get hwpoison page" )			\
+	EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" )		\
+	EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" )		\
+	EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" )		\
+	EM ( MF_MSG_DIRTY_MLOCKED_LRU, "dirty mlocked LRU page" )	\
+	EM ( MF_MSG_CLEAN_MLOCKED_LRU, "clean mlocked LRU page" )	\
+	EM ( MF_MSG_DIRTY_UNEVICTABLE_LRU, "dirty unevictable LRU page" )	\
+	EM ( MF_MSG_CLEAN_UNEVICTABLE_LRU, "clean unevictable LRU page" )	\
+	EM ( MF_MSG_DIRTY_LRU, "dirty LRU page" )			\
+	EM ( MF_MSG_CLEAN_LRU, "clean LRU page" )			\
+	EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" )	\
+	EM ( MF_MSG_BUDDY, "free buddy page" )				\
+	EM ( MF_MSG_DAX, "dax page" )					\
+	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
+	EM ( MF_MSG_ALREADY_POISONED, "already poisoned" )		\
+	EM ( MF_MSG_PFN_MAP, "non struct page pfn" )                    \
+	EMe ( MF_MSG_UNKNOWN, "unknown page" )
+
+/*
+ * First define the enums in MF_ACTION_RESULT to be exported to userspace
+ * via TRACE_DEFINE_ENUM().
+ */
+#undef EM
+#undef EMe
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
+
+MF_ACTION_RESULT
+MF_PAGE_TYPE
+
+/*
+ * Now redefine the EM() and EMe() macros to map the enums to the strings
+ * that will be printed in the output.
+ */
+#undef EM
+#undef EMe
+#define EM(a, b)		{ a, b },
+#define EMe(a, b)	{ a, b }
+
+TRACE_EVENT(memory_failure_event,
+	TP_PROTO(unsigned long pfn,
+		 int type,
+		 int result),
+
+	TP_ARGS(pfn, type, result),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, pfn)
+		__field(int, type)
+		__field(int, result)
+	),
+
+	TP_fast_assign(
+		__entry->pfn	= pfn;
+		__entry->type	= type;
+		__entry->result	= result;
+	),
+
+	TP_printk("pfn %#lx: recovery action for %s: %s",
+		__entry->pfn,
+		__print_symbolic(__entry->type, MF_PAGE_TYPE),
+		__print_symbolic(__entry->result, MF_ACTION_RESULT)
+	)
+);
+#endif /* _TRACE_MEMORY_FAILURE_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/Kconfig b/mm/Kconfig
index d548976d0e0a..bd0ea5454af8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -738,11 +738,10 @@ config ARCH_SUPPORTS_MEMORY_FAILURE
 
 config MEMORY_FAILURE
 	depends on MMU
 	depends on ARCH_SUPPORTS_MEMORY_FAILURE
 	bool "Enable recovery from hardware memory errors"
-	select RAS
 	select INTERVAL_TREE
 	help
 	  Enables code to recover from some memory failures on systems
 	  with MCA recovery. This allows a system to continue running
 	  even when some of its memory has uncorrected errors. This requires
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 7f908ad795ad..fbc5a01260c8 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -59,13 +59,16 @@
 #include <linux/kfifo.h>
 #include <linux/ratelimit.h>
 #include <linux/pagewalk.h>
 #include <linux/shmem_fs.h>
 #include <linux/sysctl.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/memory-failure.h>
+
 #include "swap.h"
 #include "internal.h"
-#include "ras/ras_event.h"
 
 static int sysctl_memory_failure_early_kill __read_mostly;
 
 static int sysctl_memory_failure_recovery __read_mostly = 1;
 
-- 
2.51.0
```
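
For readers unfamiliar with the EM()/EMe() pattern in the new header, here
is a minimal user-space sketch of the same X-macro idea: one list of
(enum, string) pairs is expanded twice with different definitions of EM()
and EMe(). It is only an illustration. All DEMO_* and demo_* names are
hypothetical, and the first expansion counts entries instead of calling
TRACE_DEFINE_ENUM(), which exists only inside the kernel's tracing
infrastructure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's recovery result enum. */
enum demo_result { DEMO_IGNORED, DEMO_FAILED, DEMO_DELAYED, DEMO_RECOVERED };

/* One list of (value, string) pairs; EMe() marks the last entry. */
#define DEMO_ACTION_RESULT			\
	EM(DEMO_IGNORED, "Ignored")		\
	EM(DEMO_FAILED, "Failed")		\
	EM(DEMO_DELAYED, "Delayed")		\
	EMe(DEMO_RECOVERED, "Recovered")

/* First expansion: each entry contributes "+1", so the list is counted. */
#define EM(a, b)	+1
#define EMe(a, b)	+1
enum { DEMO_RESULT_COUNT = 0 DEMO_ACTION_RESULT };

/* Second expansion: redefine the macros to build a value-to-string table. */
#undef EM
#undef EMe
#define EM(a, b)	{ a, b },
#define EMe(a, b)	{ a, b }

struct demo_sym {
	int value;
	const char *name;
};

static const struct demo_sym demo_result_syms[] = { DEMO_ACTION_RESULT };

#undef EM
#undef EMe

/* Both expansions come from the same list, so they must agree in size. */
static_assert(sizeof(demo_result_syms) / sizeof(demo_result_syms[0]) ==
	      DEMO_RESULT_COUNT, "table and count must match");

/* Linear lookup, similar in spirit to what __print_symbolic() does. */
const char *demo_result_name(int value)
{
	size_t i;

	for (i = 0; i < DEMO_RESULT_COUNT; i++)
		if (demo_result_syms[i].value == value)
			return demo_result_syms[i].name;
	return "unknown";
}
```

The benefit of the pattern is that adding a new entry to the list updates
every expansion at once, so the enum export and the string table cannot
drift apart.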

Thanks very much.

Xie Yuanbin



end of thread, other threads:[~2025-11-18  8:09 UTC | newest]

Thread overview: 15+ messages
2025-11-04  7:23 [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM Xie Yuanbin
2025-11-04  7:23 ` [PATCH v2 1/2] " Xie Yuanbin
2025-11-04  7:23 ` [PATCH v2 2/2] mm/memory-failure: remove the selection of RAS Xie Yuanbin
2025-11-04  9:38   ` David Hildenbrand (Red Hat)
2025-11-04  9:50     ` Xie Yuanbin
2025-11-04  9:33 ` [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM David Hildenbrand (Red Hat)
2025-11-04 13:29   ` Xie Yuanbin
2025-11-04 13:32   ` Xie Yuanbin
2025-11-04 14:26 ` Dave Hansen
2025-11-05  2:45   ` Xie Yuanbin
2025-11-05  8:12     ` David Hildenbrand (Red Hat)
2025-11-05  9:05       ` Xie Yuanbin
2025-11-17  2:09         ` Xie Yuanbin
2025-11-17 13:03           ` David Hildenbrand (Red Hat)
2025-11-18  8:09             ` Xie Yuanbin
