* [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
From: Xie Yuanbin @ 2025-11-04 7:23 UTC
To: david, dave.hansen, bp, tglx, mingo, dave.hansen, hpa, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
linmiaohe, nao.horiguchi, luto, peterz, tony.luck
Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4,
lilinjie8, Xie Yuanbin

Memory bit flips are among the most common hardware errors in servers
and embedded systems. Many hardware components have memory verification
mechanisms, for example ECC. When an error is detected, some hardware or
architectures report the information to software (OS/BIOS), for example
through the MCE (Machine Check Exception) on x86.

Common errors include CE (Correctable Errors) and UE (Uncorrectable
Errors). When the kernel receives memory error information and has the
memory-failure feature enabled, it can handle the error without a reboot.
For example, the kernel can attempt to offline the affected memory by
migrating it or by killing the affected process. This feature is therefore
widely used in servers and embedded systems.

In older kernel versions, memory-failure could not be enabled with
x86_32 && SPARSEMEM because there were not enough page flags. This
issue has been resolved in the current version, and this series allows
SPARSEMEM and memory-failure to be enabled together on x86_32.

By the way, DRAM prices have recently skyrocketed due to increased
demand, which may make memory-failure even more valuable in the coming
years.

v1-v2: https://lore.kernel.org/20251103033536.52234-1-xieyuanbin1@huawei.com
- Describe the purpose of these patches in the cover letter.
- Correct the description of historical changes to page flags.
- Move the memory-failure tracing code from ras_event.h to
  memory-failure.h
Xie Yuanbin (2):
x86/mm: support memory-failure on 32-bits with SPARSEMEM
mm/memory-failure: remove the selection of RAS
arch/x86/Kconfig | 3 -
include/ras/ras_event.h | 86 ------------------------
include/trace/events/memory-failure.h | 97 +++++++++++++++++++++++++++
mm/Kconfig | 1 -
mm/memory-failure.c | 5 +-
5 files changed, 101 insertions(+), 91 deletions(-)
create mode 100644 include/trace/events/memory-failure.h
--
2.51.0

* [PATCH v2 1/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
From: Xie Yuanbin @ 2025-11-04 7:23 UTC
To: david, dave.hansen, bp, tglx, mingo, dave.hansen, hpa, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
linmiaohe, nao.horiguchi, luto, peterz, tony.luck
Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4,
lilinjie8, Xie Yuanbin

The historical commit d949f36f1865c60239d4 ("x86: Fix hwpoison code related
build failure on 32-bit NUMAQ") disabled memory-failure on x86_32 when
SPARSEMEM is enabled, because there were not enough page flags.

The commit 46df8e73a4a3f1445f2a ("mm: free up PG_slab") removed the PG_slab
flag, which allows MEMORY_FAILURE to be enabled from that point on.
The commit 09022bc196d23484a7a5 ("mm: remove PG_error") removed the PG_error
flag.
The commit cceba6f7e46c48deca43 ("mm: add PG_dropbehind folio flag") added
the PG_dropbehind flag, but MEMORY_FAILURE can still be enabled.

In the current version, on x86_32 with
SPARSEMEM && HIGHMEM && X86_PAE && X86_PAT, the number of page flags
reaches its maximum value, which is 31. Therefore, MEMORY_FAILURE can be
safely enabled.

Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
---
 arch/x86/Kconfig | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d55c01efd7c2..f9ee57a55500 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -750,13 +750,10 @@ config IOSF_MBI_DEBUG
 
 config X86_SUPPORTS_MEMORY_FAILURE
 	def_bool y
 	# MCE code calls memory_failure():
 	depends on X86_MCE
-	# On 32-bit this adds too big of NODES_SHIFT and we run out of page flags:
-	# On 32-bit SPARSEMEM adds too big of SECTIONS_WIDTH:
-	depends on X86_64 || !SPARSEMEM
 	select ARCH_SUPPORTS_MEMORY_FAILURE
 
 config X86_32_IRIS
 	tristate "Eurobraille/Iris poweroff module"
 	depends on X86_32
--
2.51.0
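
The bit-budget argument in the commit message above can be sketched in user space. This is a minimal illustration under a simplified model, not kernel code: it assumes that the individual page flags, the SPARSEMEM section number, the zone number and the node number must all fit into the single `unsigned long` of `page->flags`. The struct, the helper names and the sample widths used with them are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of what has to be packed into page->flags. */
struct flags_layout {
	unsigned int nr_pageflags;   /* individual PG_* flag bits */
	unsigned int sections_width; /* bits for the SPARSEMEM section number */
	unsigned int zones_width;    /* bits for the zone number */
	unsigned int nodes_width;    /* bits for the NUMA node number */
};

/* Total number of bits the layout consumes. */
static unsigned int layout_bits_used(const struct flags_layout *l)
{
	return l->nr_pageflags + l->sections_width +
	       l->zones_width + l->nodes_width;
}

/* The layout is usable only if it fits into one machine word. */
static bool layout_fits(const struct flags_layout *l, unsigned int bits_per_long)
{
	assert(bits_per_long == 32 || bits_per_long == 64);
	return layout_bits_used(l) <= bits_per_long;
}
```

With made-up widths of 22 flags, 7 section bits, 2 zone bits and 0 node bits, `layout_bits_used()` returns 31 and the layout fits a 32-bit word; two more flag bits would make it overflow. The real split between these fields depends on the kernel configuration and is not spelled out in this thread.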

* [PATCH v2 2/2] mm/memory-failure: remove the selection of RAS
From: Xie Yuanbin @ 2025-11-04 7:23 UTC
To: david, dave.hansen, bp, tglx, mingo, dave.hansen, hpa, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
linmiaohe, nao.horiguchi, luto, peterz, tony.luck
Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4,
lilinjie8, Xie Yuanbin

The commit 97f0b13452198290799f ("tracing: add trace event for
memory-failure") introduced the selection of RAS in memory-failure.
That commit only adds a tracing feature; in reality, there is no dependency
between memory-failure and RAS. Selecting RAS increases the size of the
bzImage by 8k, which matters on embedded devices.

Move the memory-failure tracing code from ras_event.h to
memory-failure.h and remove the selection of RAS.

Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
---
 include/ras/ras_event.h               | 86 ------------------------
 include/trace/events/memory-failure.h | 97 +++++++++++++++++++++++++++
 mm/Kconfig                            |  1 -
 mm/memory-failure.c                   |  5 +-
 4 files changed, 101 insertions(+), 88 deletions(-)
 create mode 100644 include/trace/events/memory-failure.h

diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index c8cd0f00c845..1e5e87020eef 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -10,11 +10,10 @@
 #include <linux/edac.h>
 #include <linux/ktime.h>
 #include <linux/pci.h>
 #include <linux/aer.h>
 #include <linux/cper.h>
-#include <linux/mm.h>
 
 /*
  * MCE Extended Error Log trace event
  *
  * These events are generated when hardware detects a corrected or
@@ -337,94 +336,9 @@ TRACE_EVENT(aer_event,
 		__entry->tlp_header_valid ?
 			__print_array(__entry->tlp_header, PCIE_STD_MAX_TLP_HEADERLOG, 4) :
 			"Not available")
 );
 #endif /* CONFIG_PCIEAER */
-
-/*
- * memory-failure recovery action result event
- *
- * unsigned long pfn - Page Frame Number of the corrupted page
- * int type - Page types of the corrupted page
- * int result - Result of recovery action
- */
-
-#ifdef CONFIG_MEMORY_FAILURE
-#define MF_ACTION_RESULT \
-	EM ( MF_IGNORED, "Ignored" ) \
-	EM ( MF_FAILED, "Failed" ) \
-	EM ( MF_DELAYED, "Delayed" ) \
-	EMe ( MF_RECOVERED, "Recovered" )
-
-#define MF_PAGE_TYPE \
-	EM ( MF_MSG_KERNEL, "reserved kernel page" ) \
-	EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" ) \
-	EM ( MF_MSG_HUGE, "huge page" ) \
-	EM ( MF_MSG_FREE_HUGE, "free huge page" ) \
-	EM ( MF_MSG_GET_HWPOISON, "get hwpoison page" ) \
-	EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" ) \
-	EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" ) \
-	EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" ) \
-	EM ( MF_MSG_DIRTY_MLOCKED_LRU, "dirty mlocked LRU page" ) \
-	EM ( MF_MSG_CLEAN_MLOCKED_LRU, "clean mlocked LRU page" ) \
-	EM ( MF_MSG_DIRTY_UNEVICTABLE_LRU, "dirty unevictable LRU page" ) \
-	EM ( MF_MSG_CLEAN_UNEVICTABLE_LRU, "clean unevictable LRU page" ) \
-	EM ( MF_MSG_DIRTY_LRU, "dirty LRU page" ) \
-	EM ( MF_MSG_CLEAN_LRU, "clean LRU page" ) \
-	EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" ) \
-	EM ( MF_MSG_BUDDY, "free buddy page" ) \
-	EM ( MF_MSG_DAX, "dax page" ) \
-	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \
-	EM ( MF_MSG_ALREADY_POISONED, "already poisoned" ) \
-	EMe ( MF_MSG_UNKNOWN, "unknown page" )
-
-/*
- * First define the enums in MM_ACTION_RESULT to be exported to userspace
- * via TRACE_DEFINE_ENUM().
- */
-#undef EM
-#undef EMe
-#define EM(a, b) TRACE_DEFINE_ENUM(a);
-#define EMe(a, b) TRACE_DEFINE_ENUM(a);
-
-MF_ACTION_RESULT
-MF_PAGE_TYPE
-
-/*
- * Now redefine the EM() and EMe() macros to map the enums to the strings
- * that will be printed in the output.
- */
-#undef EM
-#undef EMe
-#define EM(a, b) { a, b },
-#define EMe(a, b) { a, b }
-
-TRACE_EVENT(memory_failure_event,
-	TP_PROTO(unsigned long pfn,
-		 int type,
-		 int result),
-
-	TP_ARGS(pfn, type, result),
-
-	TP_STRUCT__entry(
-		__field(unsigned long, pfn)
-		__field(int, type)
-		__field(int, result)
-	),
-
-	TP_fast_assign(
-		__entry->pfn = pfn;
-		__entry->type = type;
-		__entry->result = result;
-	),
-
-	TP_printk("pfn %#lx: recovery action for %s: %s",
-		__entry->pfn,
-		__print_symbolic(__entry->type, MF_PAGE_TYPE),
-		__print_symbolic(__entry->result, MF_ACTION_RESULT)
-	)
-);
-#endif /* CONFIG_MEMORY_FAILURE */
 #endif /* _TRACE_HW_EVENT_MC_H */
 
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/include/trace/events/memory-failure.h b/include/trace/events/memory-failure.h
new file mode 100644
index 000000000000..6c88fb624bd7
--- /dev/null
+++ b/include/trace/events/memory-failure.h
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ras
+#define TRACE_INCLUDE_FILE memory-failure
+
+#if !defined(_TRACE_MEMORY_FAILURE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MEMORY_FAILURE_H
+
+#include <linux/tracepoint.h>
+#include <linux/mm.h>
+
+/*
+ * memory-failure recovery action result event
+ *
+ * unsigned long pfn - Page Frame Number of the corrupted page
+ * int type - Page types of the corrupted page
+ * int result - Result of recovery action
+ */
+
+#define MF_ACTION_RESULT \
+	EM ( MF_IGNORED, "Ignored" ) \
+	EM ( MF_FAILED, "Failed" ) \
+	EM ( MF_DELAYED, "Delayed" ) \
+	EMe ( MF_RECOVERED, "Recovered" )
+
+#define MF_PAGE_TYPE \
+	EM ( MF_MSG_KERNEL, "reserved kernel page" ) \
+	EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" ) \
+	EM ( MF_MSG_HUGE, "huge page" ) \
+	EM ( MF_MSG_FREE_HUGE, "free huge page" ) \
+	EM ( MF_MSG_GET_HWPOISON, "get hwpoison page" ) \
+	EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" ) \
+	EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" ) \
+	EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" ) \
+	EM ( MF_MSG_DIRTY_MLOCKED_LRU, "dirty mlocked LRU page" ) \
+	EM ( MF_MSG_CLEAN_MLOCKED_LRU, "clean mlocked LRU page" ) \
+	EM ( MF_MSG_DIRTY_UNEVICTABLE_LRU, "dirty unevictable LRU page" ) \
+	EM ( MF_MSG_CLEAN_UNEVICTABLE_LRU, "clean unevictable LRU page" ) \
+	EM ( MF_MSG_DIRTY_LRU, "dirty LRU page" ) \
+	EM ( MF_MSG_CLEAN_LRU, "clean LRU page" ) \
+	EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" ) \
+	EM ( MF_MSG_BUDDY, "free buddy page" ) \
+	EM ( MF_MSG_DAX, "dax page" ) \
+	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \
+	EM ( MF_MSG_ALREADY_POISONED, "already poisoned" ) \
+	EMe ( MF_MSG_UNKNOWN, "unknown page" )
+
+/*
+ * First define the enums in MM_ACTION_RESULT to be exported to userspace
+ * via TRACE_DEFINE_ENUM().
+ */
+#undef EM
+#undef EMe
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define EMe(a, b) TRACE_DEFINE_ENUM(a);
+
+MF_ACTION_RESULT
+MF_PAGE_TYPE
+
+/*
+ * Now redefine the EM() and EMe() macros to map the enums to the strings
+ * that will be printed in the output.
+ */
+#undef EM
+#undef EMe
+#define EM(a, b) { a, b },
+#define EMe(a, b) { a, b }
+
+TRACE_EVENT(memory_failure_event,
+	TP_PROTO(unsigned long pfn,
+		 int type,
+		 int result),
+
+	TP_ARGS(pfn, type, result),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, pfn)
+		__field(int, type)
+		__field(int, result)
+	),
+
+	TP_fast_assign(
+		__entry->pfn = pfn;
+		__entry->type = type;
+		__entry->result = result;
+	),
+
+	TP_printk("pfn %#lx: recovery action for %s: %s",
+		__entry->pfn,
+		__print_symbolic(__entry->type, MF_PAGE_TYPE),
+		__print_symbolic(__entry->result, MF_ACTION_RESULT)
+	)
+);
+#endif /* _TRACE_MEMORY_FAILURE_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/Kconfig b/mm/Kconfig
index a5a90b169435..c3a8e0ba1ac1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -738,11 +738,10 @@ config ARCH_SUPPORTS_MEMORY_FAILURE
 
 config MEMORY_FAILURE
 	depends on MMU
 	depends on ARCH_SUPPORTS_MEMORY_FAILURE
 	bool "Enable recovery from hardware memory errors"
-	select RAS
 	help
 	  Enables code to recover from some memory failures on systems
 	  with MCA recovery. This allows a system to continue running
 	  even when some of its memory has uncorrected errors. This requires
 	  special hardware support and typically ECC memory.
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f698df156bf8..a1fe6d760983 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -58,13 +58,16 @@
 #include <linux/kfifo.h>
 #include <linux/ratelimit.h>
 #include <linux/pagewalk.h>
 #include <linux/shmem_fs.h>
 #include <linux/sysctl.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/memory-failure.h>
+
 #include "swap.h"
 #include "internal.h"
-#include "ras/ras_event.h"
 
 #define SOFT_OFFLINE_ENABLED BIT(0)
 #define SOFT_OFFLINE_SKIP_HUGETLB BIT(1)
 
 static int sysctl_memory_failure_early_kill __read_mostly;
--
2.51.0
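
The EM()/EMe() lists in the moved header rely on the X-macro pattern: one list is expanded several times with different macro definitions. The user-space sketch below illustrates only that pattern. It is not the kernel mechanism: in the patch the first expansion feeds TRACE_DEFINE_ENUM() and the enums themselves come from elsewhere, whereas here the first expansion defines the enum directly so that the example is self-contained. All `DEMO_*` and `demo_*` names are hypothetical.

```c
#include <stddef.h>

/* Illustrative list in the style of MF_ACTION_RESULT. */
#define DEMO_ACTION_RESULT \
	EM(DEMO_IGNORED, "Ignored") \
	EM(DEMO_FAILED, "Failed") \
	EM(DEMO_DELAYED, "Delayed") \
	EMe(DEMO_RECOVERED, "Recovered")

/* First expansion: every entry becomes an enumerator. */
#define EM(a, b) a,
#define EMe(a, b) a
enum demo_result { DEMO_ACTION_RESULT };
#undef EM
#undef EMe

/* Second expansion: every entry becomes a { value, string } pair. */
struct demo_symbol {
	int value;
	const char *name;
};

#define EM(a, b) { a, b },
#define EMe(a, b) { a, b }
static const struct demo_symbol demo_symbols[] = { DEMO_ACTION_RESULT };
#undef EM
#undef EMe

/* Linear lookup, standing in for a symbolic print of the enum value. */
static const char *demo_result_name(int value)
{
	size_t i;

	for (i = 0; i < sizeof(demo_symbols) / sizeof(demo_symbols[0]); i++)
		if (demo_symbols[i].value == value)
			return demo_symbols[i].name;
	return "unknown";
}
```

The separate EMe() form exists so that the last entry can omit the trailing comma; keeping the list in a single macro means the enumerators and their strings cannot drift apart.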

* Re: [PATCH v2 2/2] mm/memory-failure: remove the selection of RAS
From: David Hildenbrand (Red Hat) @ 2025-11-04 9:38 UTC
To: Xie Yuanbin, david, dave.hansen, bp, tglx, mingo, dave.hansen, hpa, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
linmiaohe, nao.horiguchi, luto, peterz, tony.luck
Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4,
lilinjie8

On 04.11.25 08:23, Xie Yuanbin wrote:
> The commit 97f0b13452198290799f ("tracing: add trace event for
> memory-failure") introduced the selection of RAS in memory-failure.
> That commit only adds a tracing feature; in reality, there is no dependency
> between memory-failure and RAS. Selecting RAS increases the size of the
> bzImage by 8k, which matters on embedded devices.
>
> Move the memory-failure tracing code from ras_event.h to
> memory-failure.h and remove the selection of RAS.
>
> Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> ---

[...]

> +++ b/include/trace/events/memory-failure.h
> @@ -0,0 +1,97 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM ras

This trace system should not be called "ras". All RAS terminology should
be removed here.

#define TRACE_SYSTEM memory_failure

> +#define TRACE_INCLUDE_FILE memory-failure
> +
> +#if !defined(_TRACE_MEMORY_FAILURE_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_MEMORY_FAILURE_H
> +
> +#include <linux/tracepoint.h>
> +#include <linux/mm.h>
> +
> +/*
> + * memory-failure recovery action result event
> + *
> + * unsigned long pfn - Page Frame Number of the corrupted page
> + * int type - Page types of the corrupted page
> + * int result - Result of recovery action
> + */
> +
> +#define MF_ACTION_RESULT \
> +	EM ( MF_IGNORED, "Ignored" ) \
> +	EM ( MF_FAILED, "Failed" ) \
> +	EM ( MF_DELAYED, "Delayed" ) \
> +	EMe ( MF_RECOVERED, "Recovered" )
> +
> +#define MF_PAGE_TYPE \
> +	EM ( MF_MSG_KERNEL, "reserved kernel page" ) \
> +	EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" ) \
> +	EM ( MF_MSG_HUGE, "huge page" ) \
> +	EM ( MF_MSG_FREE_HUGE, "free huge page" ) \
> +	EM ( MF_MSG_GET_HWPOISON, "get hwpoison page" ) \
> +	EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" ) \
> +	EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" ) \
> +	EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" ) \
> +	EM ( MF_MSG_DIRTY_MLOCKED_LRU, "dirty mlocked LRU page" ) \
> +	EM ( MF_MSG_CLEAN_MLOCKED_LRU, "clean mlocked LRU page" ) \
> +	EM ( MF_MSG_DIRTY_UNEVICTABLE_LRU, "dirty unevictable LRU page" ) \
> +	EM ( MF_MSG_CLEAN_UNEVICTABLE_LRU, "clean unevictable LRU page" ) \
> +	EM ( MF_MSG_DIRTY_LRU, "dirty LRU page" ) \
> +	EM ( MF_MSG_CLEAN_LRU, "clean LRU page" ) \
> +	EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" ) \
> +	EM ( MF_MSG_BUDDY, "free buddy page" ) \
> +	EM ( MF_MSG_DAX, "dax page" ) \
> +	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \
> +	EM ( MF_MSG_ALREADY_POISONED, "already poisoned" ) \
> +	EMe ( MF_MSG_UNKNOWN, "unknown page" )
> +
> +/*
> + * First define the enums in MM_ACTION_RESULT to be exported to userspace
> + * via TRACE_DEFINE_ENUM().
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
> +
> +MF_ACTION_RESULT
> +MF_PAGE_TYPE
> +
> +/*
> + * Now redefine the EM() and EMe() macros to map the enums to the strings
> + * that will be printed in the output.
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b) { a, b },
> +#define EMe(a, b) { a, b }
> +
> +TRACE_EVENT(memory_failure_event,
> +	TP_PROTO(unsigned long pfn,
> +		 int type,
> +		 int result),
> +
> +	TP_ARGS(pfn, type, result),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned long, pfn)
> +		__field(int, type)
> +		__field(int, result)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->pfn = pfn;
> +		__entry->type = type;
> +		__entry->result = result;
> +	),
> +
> +	TP_printk("pfn %#lx: recovery action for %s: %s",
> +		__entry->pfn,
> +		__print_symbolic(__entry->type, MF_PAGE_TYPE),
> +		__print_symbolic(__entry->result, MF_ACTION_RESULT)
> +	)
> +);
> +#endif /* _TRACE_MEMORY_FAILURE_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>

We want to add that new file to the "HWPOISON MEMORY FAILURE HANDLING"
section in MAINTAINERS.

Nothing else jumped at me.

--
Cheers

David

* Re: [PATCH v2 2/2] mm/memory-failure: remove the selection of RAS
From: Xie Yuanbin @ 2025-11-04 9:50 UTC
To: david
Cc: Liam.Howlett, akpm, bp, dave.hansen, dave.hansen, david, hpa,
liaohua4, lilinjie8, linmiaohe, linux-edac, linux-kernel, linux-mm,
lorenzo.stoakes, luto, mhocko, mingo, nao.horiguchi, peterz, rppt,
surenb, tglx, tony.luck, vbabka, will, x86, xieyuanbin1

> This trace system should not be called "ras". All RAS terminology should
> be removed here.
>
> #define TRACE_SYSTEM memory_failure
>
> We want to add that new file to the "HWPOISON MEMORY FAILURE HANDLING"
> section in MAINTAINERS.
>
> Nothing else jumped at me.

Thanks, I will modify it in the v3 patches.

> Cheers
>
> David

Xie Yuanbin

* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
From: David Hildenbrand (Red Hat) @ 2025-11-04 9:33 UTC
To: Xie Yuanbin, david, dave.hansen, bp, tglx, mingo, dave.hansen, hpa, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
linmiaohe, nao.horiguchi, luto, peterz, tony.luck
Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4,
lilinjie8

On 04.11.25 08:23, Xie Yuanbin wrote:
> Memory bit flips are among the most common hardware errors in servers
> and embedded systems. Many hardware components have memory verification
> mechanisms, for example ECC. When an error is detected, some hardware or
> architectures report the information to software (OS/BIOS), for example
> through the MCE (Machine Check Exception) on x86.
>
> Common errors include CE (Correctable Errors) and UE (Uncorrectable
> Errors). When the kernel receives memory error information and has the
> memory-failure feature enabled, it can handle the error without a reboot.
> For example, the kernel can attempt to offline the affected memory by
> migrating it or by killing the affected process. This feature is therefore
> widely used in servers and embedded systems.

This is a pretty generic description of MCEs.

I think what we are missing is: who runs 32bit OSes on MCE-capable
hardware (or VMs?) and needs this to work.

What's the use case?

--
Cheers

David

* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
From: Xie Yuanbin @ 2025-11-04 13:29 UTC
To: david, dave.hansen, bp, tglx, mingo, dave.hansen, hpa, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
linmiaohe, nao.horiguchi, luto, peterz, tony.luck
Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4,
lilinjie8, Xie Yuanbin

> This is a pretty generic description of MCEs.
>
> I think what we are missing is: who runs 32bit OSes on MCE-capable
> hardware (or VMs?) and needs this to work.
>
> What's the use case?

Now, let me try to explain it. From what I understand, it mainly comes
from two aspects:
1. Although almost all new CPUs are 64-bit, there are still many existing
32-bit x86 devices in use.
2. On some embedded devices, in order to save memory overhead, even with
64-bit CPU hardware, a 32-bit kernel may still be used.

You might wonder why embedded devices need SPARSEMEM. They do not
necessarily need SPARSEMEM itself; they need MEMORY_HOTPLUG, which depends
on SPARSEMEM.

On all of the above devices, the memory-failure feature may be used to
provide reliable memory error handling and to minimize service
interruptions as much as possible.

> Cheers
>
> David

Thanks!

Xie Yuanbin

* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
From: Xie Yuanbin @ 2025-11-04 13:32 UTC
To: david, dave.hansen, bp, tglx, mingo, dave.hansen, hpa, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
linmiaohe, nao.horiguchi, luto, peterz, tony.luck
Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4,
lilinjie8, Xie Yuanbin

The previous email was corrupted; please ignore it. I'm very sorry about
this.

On Tue, 4 Nov 2025 10:33:39 +0100, David Hildenbrand wrote:
> This is a pretty generic description of MCEs.
>
> I think what we are missing is: who runs 32bit OSes on MCE-capable
> hardware (or VMs?) and needs this to work.
>
> What's the use case?

I did indeed miss this part in my description, and I apologize for that.

Ever since the memory-failure feature was introduced by commit
6a46079cf57a7f7758e8 ("HWPOISON: The high level memory error handler in
the VM v7"), it could be enabled on x86_32. I am submitting these patches
only because MEMORY_FAILURE cannot be enabled together with SPARSEMEM on
x86_32.

memory-failure was introduced in 2009, when 64-bit hardware was not yet
very popular, and the first caller of `memory_failure()` was x86's MCE
code. Even in the latest version, with the default i386_defconfig,
MEMORY_FAILURE can be enabled directly on x86_32, because i386_defconfig
does not enable SPARSEMEM by default. Therefore, I did not think I needed
to explain why MEMORY_FAILURE should be enabled on x86_32.

Now, let me try to explain it. From what I understand, it mainly comes
from two aspects:
1. Although almost all new CPUs are 64-bit, there are still many existing
32-bit x86 devices in use.
2. On some embedded devices, in order to save memory overhead, even with
64-bit CPU hardware, a 32-bit kernel may still be used.

You might wonder why embedded devices need SPARSEMEM. They do not
necessarily need SPARSEMEM itself; they need MEMORY_HOTPLUG, which depends
on SPARSEMEM.

On all of the above devices, the memory-failure feature may be used to
provide reliable memory error handling and to minimize service
interruptions as much as possible.

> Cheers
>
> David

Thanks!

Xie Yuanbin

* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
From: Dave Hansen @ 2025-11-04 14:26 UTC
To: Xie Yuanbin, david, bp, tglx, mingo, dave.hansen, hpa, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
linmiaohe, nao.horiguchi, luto, peterz, tony.luck
Cc: x86, linux-kernel, linux-mm, linux-edac, will, liaohua4,
lilinjie8

On 11/3/25 23:23, Xie Yuanbin wrote:
> Memory bit flips are among the most common hardware errors in servers
> and embedded systems. Many hardware components have memory verification
> mechanisms, for example ECC. When an error is detected, some hardware or
> architectures report the information to software (OS/BIOS), for example
> through the MCE (Machine Check Exception) on x86.
>
> Common errors include CE (Correctable Errors) and UE (Uncorrectable
> Errors). When the kernel receives memory error information and has the
> memory-failure feature enabled, it can handle the error without a reboot.
> For example, the kernel can attempt to offline the affected memory by
> migrating it or by killing the affected process. This feature is therefore
> widely used in servers and embedded systems.
>
> In older kernel versions, memory-failure could not be enabled with
> x86_32 && SPARSEMEM because there were not enough page flags. This
> issue has been resolved in the current version, and this series allows
> SPARSEMEM and memory-failure to be enabled together on x86_32.
>
> By the way, DRAM prices have recently skyrocketed due to increased
> demand, which may make memory-failure even more valuable in the coming
> years.

Which LLM generated that for you, btw?

I wanted to know _specifically_ what kind of hardware or 32-bit
environment you wanted to support with this series, though.

* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
From: Xie Yuanbin @ 2025-11-05 2:45 UTC
To: dave.hansen, david
Cc: Liam.Howlett, akpm, bp, dave.hansen, david, hpa, liaohua4,
lilinjie8, linmiaohe, linux-edac, linux-kernel, linux-mm,
lorenzo.stoakes, luto, mhocko, mingo, nao.horiguchi, peterz, rppt,
surenb, tglx, tony.luck, vbabka, will, x86, xieyuanbin1

On Tue, 4 Nov 2025 06:26:58 -0800, Dave Hansen wrote:
> Which LLM generated that for you, btw?

I wrote this myself; an LLM only helped me with the translation. My English
isn't very good, so I apologize for any mistakes.

> I wanted to know _specifically_ what kind of hardware or 32-bit
> environment you wanted to support with this series, though.

I think I have explained it clearly enough in this email:
Link: https://lore.kernel.org/20251104133254.145660-1-xieyuanbin1@huawei.com

In simple terms, it refers to some old existing equipment and some
embedded devices. More specifically, it includes some routers, switches,
and similar devices. As far as I know, no VM environment uses it.
If you are asking about a specific CPU chip model, I'm sorry, but I may
not be able to provide that information.

Btw, why do you only ask which x86_32 devices use memory-failure,
but not which x86_32 devices use sparsemem? This patch just allows both
to coexist, and perhaps both are important?

Thanks!

Xie Yuanbin

* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
From: David Hildenbrand (Red Hat) @ 2025-11-05 8:12 UTC
To: Xie Yuanbin, dave.hansen
Cc: Liam.Howlett, akpm, bp, dave.hansen, hpa, liaohua4, lilinjie8,
linmiaohe, linux-edac, linux-kernel, linux-mm, lorenzo.stoakes, luto,
mhocko, mingo, nao.horiguchi, peterz, rppt, surenb, tglx, tony.luck,
vbabka, will, x86

On 05.11.25 03:45, Xie Yuanbin wrote:
> On Tue, 4 Nov 2025 06:26:58 -0800, Dave Hansen wrote:
>> Which LLM generated that for you, btw?
>
> I wrote this myself; an LLM only helped me with the translation. My English
> isn't very good, so I apologize for any mistakes.
>
>> I wanted to know _specifically_ what kind of hardware or 32-bit
>> environment you wanted to support with this series, though.
>
> I think I have explained it clearly enough in this email:
> Link: https://lore.kernel.org/20251104133254.145660-1-xieyuanbin1@huawei.com
>
> In simple terms, it refers to some old existing equipment and some
> embedded devices. More specifically, it includes some routers, switches,
> and similar devices. As far as I know, no VM environment uses it.
> If you are asking about a specific CPU chip model, I'm sorry, but I may
> not be able to provide that information.
>
> Btw, why do you only ask which x86_32 devices use memory-failure,
> but not which x86_32 devices use sparsemem? This patch just allows both
> to coexist, and perhaps both are important?

Let me clarify what we need to know:

Will you (or your employer) be running such updated 32bit kernels on
hardware that supports MCEs?

In other words: is this change driven by *real demand* or just by "oh
look, we can enable that now, I can come up with a theoretical use case
but I don't know if anybody would actually care"?

--
Cheers

David

* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
From: Xie Yuanbin @ 2025-11-05 9:05 UTC
To: david, dave.hansen
Cc: Liam.Howlett, akpm, bp, dave.hansen, hpa, liaohua4, lilinjie8,
linmiaohe, linux-edac, linux-kernel, linux-mm, lorenzo.stoakes, luto,
mhocko, mingo, nao.horiguchi, peterz, rppt, surenb, tglx, tony.luck,
vbabka, will, x86, xieyuanbin1

On Wed, 5 Nov 2025 09:12:04 +0100, David Hildenbrand wrote:
> Let me clarify what we need to know:
>
> Will you (or your employer) be running such updated 32bit kernels on
> hardware that supports MCEs?
>
> In other words: is this change driven by *real demand*

Thanks! With the question put this way, I understand completely now.

We won't directly upgrade the kernel to 6.18.x (or later versions) to use
this feature, but if the Linux community approves these patches, we will
backport them to 5.10.x and use them. I know that the page flags in 5.10.x
have been exhausted, but we can work around that by adjusting
SECTION_SIZE_BITS/MAX_PHYSMEM_BITS to free up a page flag.
Another patch I submitted for arm32:
Link: https://lore.kernel.org/20250922021453.3939-1-xieyuanbin1@huawei.com
follows the same logic.

Currently, there is a clear demand for ARM32, while the demand for x86 is
still under discussion.

> or just by "oh
> look, we can enable that now, I can come up with a theoretical use case
> but I don't know if anybody would actually care"?

That is also a fair way to put it. In fact, while working on the
requirement "support MEMORY_FAILURE for 32-bit OS" on version 5.10.x, I
found that the latest version already supported this feature, so I
submitted these patches in the hope that others can benefit from them as
well.

> Cheers
>
> David

Thanks!

Xie Yuanbin
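
The workaround mentioned in this reply, adjusting SECTION_SIZE_BITS/MAX_PHYSMEM_BITS to free up a page flag, follows from how classic (non-vmemmap) SPARSEMEM sizes the section number kept in `page->flags`: its width is MAX_PHYSMEM_BITS minus SECTION_SIZE_BITS. A small user-space sketch of that arithmetic follows. The helper names are hypothetical, and the sample values 36 and 29 are assumed to be the x86_32 PAE settings; they are not quoted anywhere in this thread.

```c
#include <assert.h>

/*
 * Bits needed for the section number when one section covers
 * 2^section_size_bits bytes of a 2^max_physmem_bits address space.
 */
static unsigned int sections_shift(unsigned int max_physmem_bits,
				   unsigned int section_size_bits)
{
	assert(max_physmem_bits >= section_size_bits);
	return max_physmem_bits - section_size_bits;
}

/* Bits freed by moving from the old to the new pair (negative if lost). */
static int bits_freed(unsigned int old_max, unsigned int old_size,
		      unsigned int new_max, unsigned int new_size)
{
	return (int)sections_shift(old_max, old_size) -
	       (int)sections_shift(new_max, new_size);
}
```

Under these assumptions, doubling the section size (29 to 30) or halving the addressable physical range (36 to 35) each frees exactly one bit, at the cost of coarser hotplug granularity or less addressable memory respectively.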
* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
2025-11-05 9:05 ` Xie Yuanbin
@ 2025-11-17 2:09 ` Xie Yuanbin
2025-11-17 13:03 ` David Hildenbrand (Red Hat)
0 siblings, 1 reply; 15+ messages in thread
From: Xie Yuanbin @ 2025-11-17 2:09 UTC (permalink / raw)
To: xieyuanbin1, david, dave.hansen, david
Cc: Liam.Howlett, akpm, bp, dave.hansen, hpa, liaohua4, lilinjie8,
linmiaohe, linux-edac, linux-kernel, linux-mm, lorenzo.stoakes, luto,
mhocko, mingo, nao.horiguchi, peterz, rppt, surenb, tglx, tony.luck,
vbabka, will, x86

On Wed, 5 Nov 2025 17:05:36 +0800, Xie Yuanbin wrote:
> On Wed, 5 Nov 2025 09:12:04 +0100, David Hildenbrand wrote:
>> Let me clarify what we need to know:
>>
>> Will you (or your employer) be running such updated 32bit kernels on
>> hardware that supports MCEs?
>>
>> In other words: is this change driven by *real demand*
>
> Thanks! Asked like this, I completely understand now.
>
> We won't directly upgrade the kernel to 6.18.x (or later versions) to use
> this feature, but if the Linux community approves these patches, we will
> backport them to 5.10.x and use them there. I know that the page flags in
> 5.10.x have been exhausted, but we can work around that by adjusting
> SECTION_SIZE_BITS/MAX_PHYSMEM_BITS to free up a page flag.
> Another patch I submitted for arm32:
> Link: https://lore.kernel.org/20250922021453.3939-1-xieyuanbin1@huawei.com
> follows the same logic.
>
> Currently, there is clear demand for ARM32, while the demand for x86 is
> still under discussion.
>
>> or just by "oh
>> look, we can enable that now, I can come up with a theoretical use case
>> but I don't know if anybody would actually care"?
>
> You could also put it that way. In fact, while working on the requirement
> "support MEMORY_FAILURE for 32-bit OS" on 5.10.x, I found that the
> latest version already supported this feature, so I submitted these
> patches in the hope that others can benefit from them as well.

Hello, David Hildenbrand and Dave Hansen!

Do you have any other comments on this patch? If you think that
supporting memory-failure on x86_32 is not worthwhile, I will submit only
patch 2 in v3.

Thank you very much!

Xie Yuanbin

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
2025-11-17 2:09 ` Xie Yuanbin
@ 2025-11-17 13:03 ` David Hildenbrand (Red Hat)
2025-11-18 8:09 ` Xie Yuanbin
0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-17 13:03 UTC (permalink / raw)
To: Xie Yuanbin, dave.hansen
Cc: Liam.Howlett, akpm, bp, dave.hansen, hpa, liaohua4, lilinjie8,
linmiaohe, linux-edac, linux-kernel, linux-mm, lorenzo.stoakes, luto,
mhocko, mingo, nao.horiguchi, peterz, rppt, surenb, tglx, tony.luck,
vbabka, will, x86

On 17.11.25 03:09, Xie Yuanbin wrote:
> On Wed, 5 Nov 2025 17:05:36 +0800, Xie Yuanbin wrote:
>> On Wed, 5 Nov 2025 09:12:04 +0100, David Hildenbrand wrote:
>>> Let me clarify what we need to know:
>>>
>>> Will you (or your employer) be running such updated 32bit kernels on
>>> hardware that supports MCEs?
>>>
>>> In other words: is this change driven by *real demand*
>>
>> Thanks! Asked like this, I completely understand now.
>>
>> We won't directly upgrade the kernel to 6.18.x (or later versions) to use
>> this feature, but if the Linux community approves these patches, we will
>> backport them to 5.10.x and use them there. I know that the page flags in
>> 5.10.x have been exhausted, but we can work around that by adjusting
>> SECTION_SIZE_BITS/MAX_PHYSMEM_BITS to free up a page flag.
>> Another patch I submitted for arm32:
>> Link: https://lore.kernel.org/20250922021453.3939-1-xieyuanbin1@huawei.com
>> follows the same logic.
>>
>> Currently, there is clear demand for ARM32, while the demand for x86 is
>> still under discussion.
>>
>>> or just by "oh
>>> look, we can enable that now, I can come up with a theoretical use case
>>> but I don't know if anybody would actually care"?
>>
>> You could also put it that way. In fact, while working on the requirement
>> "support MEMORY_FAILURE for 32-bit OS" on 5.10.x, I found that the
>> latest version already supported this feature, so I submitted these
>> patches in the hope that others can benefit from them as well.
>
> Hello, David Hildenbrand and Dave Hansen!
>
> Do you have any other comments on this patch? If you think that
> supporting memory-failure on x86_32 is not worthwhile, I will submit only
> patch 2 in v3.

I'd say, if nobody will really make use of that right now (customer
request etc.), just leave x86 alone for now.

--
Cheers

David

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM
2025-11-17 13:03 ` David Hildenbrand (Red Hat)
@ 2025-11-18 8:09 ` Xie Yuanbin
0 siblings, 0 replies; 15+ messages in thread
From: Xie Yuanbin @ 2025-11-18 8:09 UTC (permalink / raw)
To: david
Cc: Liam.Howlett, akpm, bp, dave.hansen, dave.hansen, hpa, liaohua4,
lilinjie8, linmiaohe, linux-edac, linux-kernel, linux-mm,
lorenzo.stoakes, luto, mhocko, mingo, nao.horiguchi, peterz, rppt,
surenb, tglx, tony.luck, vbabka, will, x86, xieyuanbin1

On Mon, 17 Nov 2025 14:03:46 +0100, David Hildenbrand wrote:
> I'd say, if nobody will really make use of that right now (customer
> request etc), just leave x86 alone for now.

Okay, thanks. I will submit only patch 2 in v3.

On Tue, 4 Nov 2025 10:38:54 +0100, David Hildenbrand wrote:
Link: https://lore.kernel.org/01b44e0f-ea2e-406f-9f65-b698b5504f42@kernel.org
> This trace system should not be called "ras". All RAS terminology should
> be removed here.
>
> #define TRACE_SYSTEM memory_failure
>
> We want to add that new file to the "HWPOISON MEMORY FAILURE HANDLING"
> section in MAINTAINERS.
>
> Nothing else jumped at me.

Can I add "Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>"
to patch 2?

The full patch will be:
```patch
From: Xie Yuanbin <xieyuanbin1@huawei.com>
Subject: [PATCH v3] mm/memory-failure: remove the selection of RAS

Commit 97f0b13452198290799f ("tracing: add trace event for
memory-failure") introduced the selection of RAS in memory-failure.
That commit only adds a tracing feature; in reality, there is no
dependency between memory-failure and RAS. Selecting RAS increases the
size of the bzImage by 8k, which is significant for embedded devices.

Move the memory-failure tracing code from ras_event.h to
memory-failure.h and remove the selection of RAS.

v2->v3: https://lore.kernel.org/20251104072306.100738-3-xieyuanbin1@huawei.com
- Change define TRACE_SYSTEM from ras to memory_failure
- Add include/trace/events/memory-failure.h to "HWPOISON MEMORY FAILURE
  HANDLING" section in MAINTAINERS
- Rebase to latest linux-next source

v1->v2: https://lore.kernel.org/20251103033536.52234-2-xieyuanbin1@huawei.com
- Move the memory-failure tracing code from ras_event.h to
  memory-failure.h

Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
---
 MAINTAINERS                           |  1 +
 include/ras/ras_event.h               | 87 ------------------------
 include/trace/events/memory-failure.h | 98 +++++++++++++++++++++++++++
 mm/Kconfig                            |  1 -
 mm/memory-failure.c                   |  5 +-
 5 files changed, 103 insertions(+), 89 deletions(-)
 create mode 100644 include/trace/events/memory-failure.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 7310d9ca0370..43d6eb95fb05 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11631,10 +11631,11 @@ R:	Naoya Horiguchi <nao.horiguchi@gmail.com>
 L:	linux-mm@kvack.org
 S:	Maintained
 F:	include/linux/memory-failure.h
 F:	mm/hwpoison-inject.c
 F:	mm/memory-failure.c
+F:	include/trace/events/memory-failure.h
 
 HYCON HY46XX TOUCHSCREEN SUPPORT
 M:	Giulio Benetti <giulio.benetti@benettiengineering.com>
 L:	linux-input@vger.kernel.org
 S:	Maintained
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index fecfeb7c8be7..1e5e87020eef 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -10,11 +10,10 @@
 #include <linux/edac.h>
 #include <linux/ktime.h>
 #include <linux/pci.h>
 #include <linux/aer.h>
 #include <linux/cper.h>
-#include <linux/mm.h>
 
 /*
  * MCE Extended Error Log trace event
  *
  * These events are generated when hardware detects a corrected or
@@ -337,95 +336,9 @@ TRACE_EVENT(aer_event,
 		__entry->tlp_header_valid ?
 			__print_array(__entry->tlp_header, PCIE_STD_MAX_TLP_HEADERLOG, 4) :
 			"Not available")
 );
 #endif /* CONFIG_PCIEAER */
-
-/*
- * memory-failure recovery action result event
- *
- * unsigned long pfn -	Page Frame Number of the corrupted page
- * int type	-	Page types of the corrupted page
- * int result	-	Result of recovery action
- */
-
-#ifdef CONFIG_MEMORY_FAILURE
-#define MF_ACTION_RESULT	\
-	EM ( MF_IGNORED, "Ignored" )	\
-	EM ( MF_FAILED,  "Failed" )	\
-	EM ( MF_DELAYED, "Delayed" )	\
-	EMe ( MF_RECOVERED, "Recovered" )
-
-#define MF_PAGE_TYPE		\
-	EM ( MF_MSG_KERNEL, "reserved kernel page" )			\
-	EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" )	\
-	EM ( MF_MSG_HUGE, "huge page" )					\
-	EM ( MF_MSG_FREE_HUGE, "free huge page" )			\
-	EM ( MF_MSG_GET_HWPOISON, "get hwpoison page" )			\
-	EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" )		\
-	EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" )		\
-	EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" )		\
-	EM ( MF_MSG_DIRTY_MLOCKED_LRU, "dirty mlocked LRU page" )	\
-	EM ( MF_MSG_CLEAN_MLOCKED_LRU, "clean mlocked LRU page" )	\
-	EM ( MF_MSG_DIRTY_UNEVICTABLE_LRU, "dirty unevictable LRU page" )	\
-	EM ( MF_MSG_CLEAN_UNEVICTABLE_LRU, "clean unevictable LRU page" )	\
-	EM ( MF_MSG_DIRTY_LRU, "dirty LRU page" )			\
-	EM ( MF_MSG_CLEAN_LRU, "clean LRU page" )			\
-	EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" )	\
-	EM ( MF_MSG_BUDDY, "free buddy page" )				\
-	EM ( MF_MSG_DAX, "dax page" )					\
-	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
-	EM ( MF_MSG_ALREADY_POISONED, "already poisoned" )		\
-	EM ( MF_MSG_PFN_MAP, "non struct page pfn" )			\
-	EMe ( MF_MSG_UNKNOWN, "unknown page" )
-
-/*
- * First define the enums in MM_ACTION_RESULT to be exported to userspace
- * via TRACE_DEFINE_ENUM().
- */
-#undef EM
-#undef EMe
-#define EM(a, b) TRACE_DEFINE_ENUM(a);
-#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
-
-MF_ACTION_RESULT
-MF_PAGE_TYPE
-
-/*
- * Now redefine the EM() and EMe() macros to map the enums to the strings
- * that will be printed in the output.
- */
-#undef EM
-#undef EMe
-#define EM(a, b)		{ a, b },
-#define EMe(a, b)	{ a, b }
-
-TRACE_EVENT(memory_failure_event,
-	TP_PROTO(unsigned long pfn,
-		 int type,
-		 int result),
-
-	TP_ARGS(pfn, type, result),
-
-	TP_STRUCT__entry(
-		__field(unsigned long, pfn)
-		__field(int, type)
-		__field(int, result)
-	),
-
-	TP_fast_assign(
-		__entry->pfn	= pfn;
-		__entry->type	= type;
-		__entry->result	= result;
-	),
-
-	TP_printk("pfn %#lx: recovery action for %s: %s",
-		__entry->pfn,
-		__print_symbolic(__entry->type, MF_PAGE_TYPE),
-		__print_symbolic(__entry->result, MF_ACTION_RESULT)
-	)
-);
-#endif /* CONFIG_MEMORY_FAILURE */
 #endif /* _TRACE_HW_EVENT_MC_H */
 
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/include/trace/events/memory-failure.h b/include/trace/events/memory-failure.h
new file mode 100644
index 000000000000..aa57cc8f896b
--- /dev/null
+++ b/include/trace/events/memory-failure.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM memory_failure
+#define TRACE_INCLUDE_FILE memory-failure
+
+#if !defined(_TRACE_MEMORY_FAILURE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MEMORY_FAILURE_H
+
+#include <linux/tracepoint.h>
+#include <linux/mm.h>
+
+/*
+ * memory-failure recovery action result event
+ *
+ * unsigned long pfn -	Page Frame Number of the corrupted page
+ * int type	-	Page types of the corrupted page
+ * int result	-	Result of recovery action
+ */
+
+#define MF_ACTION_RESULT	\
+	EM ( MF_IGNORED, "Ignored" )	\
+	EM ( MF_FAILED,  "Failed" )	\
+	EM ( MF_DELAYED, "Delayed" )	\
+	EMe ( MF_RECOVERED, "Recovered" )
+
+#define MF_PAGE_TYPE		\
+	EM ( MF_MSG_KERNEL, "reserved kernel page" )			\
+	EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" )	\
+	EM ( MF_MSG_HUGE, "huge page" )					\
+	EM ( MF_MSG_FREE_HUGE, "free huge page" )			\
+	EM ( MF_MSG_GET_HWPOISON, "get hwpoison page" )			\
+	EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" )		\
+	EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" )		\
+	EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" )		\
+	EM ( MF_MSG_DIRTY_MLOCKED_LRU, "dirty mlocked LRU page" )	\
+	EM ( MF_MSG_CLEAN_MLOCKED_LRU, "clean mlocked LRU page" )	\
+	EM ( MF_MSG_DIRTY_UNEVICTABLE_LRU, "dirty unevictable LRU page" )	\
+	EM ( MF_MSG_CLEAN_UNEVICTABLE_LRU, "clean unevictable LRU page" )	\
+	EM ( MF_MSG_DIRTY_LRU, "dirty LRU page" )			\
+	EM ( MF_MSG_CLEAN_LRU, "clean LRU page" )			\
+	EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" )	\
+	EM ( MF_MSG_BUDDY, "free buddy page" )				\
+	EM ( MF_MSG_DAX, "dax page" )					\
+	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
+	EM ( MF_MSG_ALREADY_POISONED, "already poisoned" )		\
+	EM ( MF_MSG_PFN_MAP, "non struct page pfn" )			\
+	EMe ( MF_MSG_UNKNOWN, "unknown page" )
+
+/*
+ * First define the enums in MM_ACTION_RESULT to be exported to userspace
+ * via TRACE_DEFINE_ENUM().
+ */
+#undef EM
+#undef EMe
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
+
+MF_ACTION_RESULT
+MF_PAGE_TYPE
+
+/*
+ * Now redefine the EM() and EMe() macros to map the enums to the strings
+ * that will be printed in the output.
+ */
+#undef EM
+#undef EMe
+#define EM(a, b)		{ a, b },
+#define EMe(a, b)	{ a, b }
+
+TRACE_EVENT(memory_failure_event,
+	TP_PROTO(unsigned long pfn,
+		 int type,
+		 int result),
+
+	TP_ARGS(pfn, type, result),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, pfn)
+		__field(int, type)
+		__field(int, result)
+	),
+
+	TP_fast_assign(
+		__entry->pfn	= pfn;
+		__entry->type	= type;
+		__entry->result	= result;
+	),
+
+	TP_printk("pfn %#lx: recovery action for %s: %s",
+		__entry->pfn,
+		__print_symbolic(__entry->type, MF_PAGE_TYPE),
+		__print_symbolic(__entry->result, MF_ACTION_RESULT)
+	)
+);
+#endif /* _TRACE_MEMORY_FAILURE_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/Kconfig b/mm/Kconfig
index d548976d0e0a..bd0ea5454af8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -738,11 +738,10 @@ config ARCH_SUPPORTS_MEMORY_FAILURE
 
 config MEMORY_FAILURE
 	depends on MMU
 	depends on ARCH_SUPPORTS_MEMORY_FAILURE
 	bool "Enable recovery from hardware memory errors"
-	select RAS
 	select INTERVAL_TREE
 	help
 	  Enables code to recover from some memory failures on systems
 	  with MCA recovery. This allows a system to continue running
 	  even when some of its memory has uncorrected errors. This requires
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 7f908ad795ad..fbc5a01260c8 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -59,13 +59,16 @@
 #include <linux/kfifo.h>
 #include <linux/ratelimit.h>
 #include <linux/pagewalk.h>
 #include <linux/shmem_fs.h>
 #include <linux/sysctl.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/memory-failure.h>
+
 #include "swap.h"
 #include "internal.h"
-#include "ras/ras_event.h"
 
 static int sysctl_memory_failure_early_kill __read_mostly;
 
 static int sysctl_memory_failure_recovery __read_mostly = 1;
 
-- 
2.51.0
```

Thanks very much.

Xie Yuanbin

^ permalink raw reply [flat|nested] 15+ messages in thread
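The EM()/EMe() lists in the patch above rely on the "X-macro" idiom: one list is expanded several times under different macro definitions. A minimal userspace sketch of that idiom follows. It is not the kernel's tracing machinery: all demo_/DEMO_ names are hypothetical, and the first pass here defines the enum itself, whereas in the patch the first pass expands to TRACE_DEFINE_ENUM(a) for enums declared elsewhere. The second pass builds a { value, string } table, and the lookup helper is only a stand-in for what __print_symbolic() does with such a table.

```c
#include <stddef.h>

/* Single source of truth: every entry is listed exactly once. */
#define DEMO_ACTION_RESULT			\
	EM(DEMO_IGNORED,    "Ignored")		\
	EM(DEMO_FAILED,     "Failed")		\
	EM(DEMO_DELAYED,    "Delayed")		\
	EMe(DEMO_RECOVERED, "Recovered")

/* Pass 1: expand the list into enumerators (EMe is the last entry). */
#define EM(a, b)	a,
#define EMe(a, b)	a
enum demo_result { DEMO_ACTION_RESULT };
#undef EM
#undef EMe

/* Pass 2: expand the same list into a value-to-string table. */
struct demo_symbol {
	int value;
	const char *name;
};

#define EM(a, b)	{ a, b },
#define EMe(a, b)	{ a, b }
static const struct demo_symbol demo_symbols[] = { DEMO_ACTION_RESULT };
#undef EM
#undef EMe

/* Hypothetical stand-in for a symbolic lookup over the table. */
static const char *demo_print_symbolic(int value)
{
	size_t i;

	for (i = 0; i < sizeof(demo_symbols) / sizeof(demo_symbols[0]); i++)
		if (demo_symbols[i].value == value)
			return demo_symbols[i].name;
	return "unknown";
}
```

Because both expansions come from the same list, adding an entry updates the enum and the string table together, which is why the list could move from ras_event.h to memory-failure.h as one unit.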
end of thread, other threads:[~2025-11-18 8:09 UTC | newest]

Thread overview: 15+ messages
2025-11-04 7:23 [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM Xie Yuanbin
2025-11-04 7:23 ` [PATCH v2 1/2] " Xie Yuanbin
2025-11-04 7:23 ` [PATCH v2 2/2] mm/memory-failure: remove the selection of RAS Xie Yuanbin
2025-11-04 9:38 ` David Hildenbrand (Red Hat)
2025-11-04 9:50 ` Xie Yuanbin
2025-11-04 9:33 ` [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM David Hildenbrand (Red Hat)
2025-11-04 13:29 ` Xie Yuanbin
2025-11-04 13:32 ` Xie Yuanbin
2025-11-04 14:26 ` Dave Hansen
2025-11-05 2:45 ` Xie Yuanbin
2025-11-05 8:12 ` David Hildenbrand (Red Hat)
2025-11-05 9:05 ` Xie Yuanbin
2025-11-17 2:09 ` Xie Yuanbin
2025-11-17 13:03 ` David Hildenbrand (Red Hat)
2025-11-18 8:09 ` Xie Yuanbin