From: David Hildenbrand <david@redhat.com>
To: Longlong Xia <xialonglong2025@163.com>,
linmiaohe@huawei.com, lance.yang@linux.dev
Cc: markus.elfring@web.de, nao.horiguchi@gmail.com,
akpm@linux-foundation.org, wangkefeng.wang@huawei.com,
qiuxu.zhuo@intel.com, xu.xin16@zte.com.cn,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 0/1] mm/ksm: recover from memory failure on KSM page by migrating to healthy duplicate
Date: Thu, 16 Oct 2025 12:46:55 +0200
Message-ID: <7e533422-1707-4fea-9350-0e832cf24a83@redhat.com>
In-Reply-To: <20251016101813.484565-1-xialonglong2025@163.com>
On 16.10.25 12:18, Longlong Xia wrote:
> When a hardware memory error occurs on a KSM page, the current
> behavior is to kill all processes mapping that page. This can
> be overly aggressive when KSM has multiple duplicate pages in
> a chain where other duplicates are still healthy.
>
> This patch introduces a recovery mechanism that attempts to
> migrate mappings from the failing KSM page to a newly
> allocated KSM page or another healthy duplicate already
> present in the same chain, before falling back to the
> process-killing procedure.
>
> The recovery process works as follows:
> 1. Identify if the failing KSM page belongs to a stable node chain.
> 2. Locate a healthy duplicate KSM page within the same chain.
> 3. For each process mapping the failing page:
> a. Attempt to allocate a new KSM page copy from healthy duplicate
> KSM page. If successful, migrate the mapping to this new KSM page.
> b. If allocation fails, migrate the mapping to the existing healthy
> duplicate KSM page.
> 4. If all migrations succeed, remove the failing KSM page from the chain.
> 5. Only if recovery fails (e.g., no healthy duplicate found or migration
> error) does the kernel fall back to killing the affected processes.
>
> The original idea came from Naoya Horiguchi.
> https://lore.kernel.org/all/20230331054243.GB1435482@hori.linux.bs1.fc.nec.co.jp/
>
> I tested it with einj on a physical x86_64 machine (Intel(R) Xeon(R) Gold 6430 CPU).
>
> Test shell script:
> modprobe einj 2>/dev/null
> echo 0x10 > /sys/kernel/debug/apei/einj/error_type
> echo $ADDRESS > /sys/kernel/debug/apei/einj/param1
> echo 0xfffffffffffff000 > /sys/kernel/debug/apei/einj/param2
> echo 1 > /sys/kernel/debug/apei/einj/error_inject
>
> FIRST WAY: allocate a new KSM page copy from healthy duplicate
> 1. alloc 1024 page with same content and enable KSM to merge
> after merge (same phy_addr only print once)
> virtual addr = 0x71582be00000 phy_addr =0x124802000
> virtual addr = 0x71582bf2c000 phy_addr =0x124902000
> virtual addr = 0x71582c026000 phy_addr =0x125402000
> virtual addr = 0x71582c120000 phy_addr =0x125502000
>
>
> 2. echo 0x124802000 > /sys/kernel/debug/apei/einj/param1
> virtual addr = 0x71582be00000 phy_addr =0x1363b1000 (new allocated)
> virtual addr = 0x71582bf2c000 phy_addr =0x124902000
> virtual addr = 0x71582c026000 phy_addr =0x125402000
> virtual addr = 0x71582c120000 phy_addr =0x125502000
>
>
> 3. echo 0x124902000 > /sys/kernel/debug/apei/einj/param1
> virtual addr = 0x71582be00000 phy_addr =0x1363b1000
> virtual addr = 0x71582bf2c000 phy_addr =0x13099a000 (new allocated)
> virtual addr = 0x71582c026000 phy_addr =0x125402000
> virtual addr = 0x71582c120000 phy_addr =0x125502000
>
> kernel-log:
> mce: [Hardware Error]: Machine check events logged
> ksm: recovery successful, no need to kill processes
> Memory failure: 0x124802: recovery action for dirty LRU page: Recovered
> Memory failure: 0x124802: recovery action for already poisoned page: Failed
> ksm: recovery successful, no need to kill processes
> Memory failure: 0x124902: recovery action for dirty LRU page: Recovered
> Memory failure: 0x124902: recovery action for already poisoned page: Failed
>
>
> SECOND WAY: Migrate the mapping to the existing healthy duplicate KSM page
> 1. alloc 1024 page with same content and enable KSM to merge
> after merge (same phy_addr only print once)
> virtual addr = 0x79a172000000 phy_addr =0x141802000
> virtual addr = 0x79a17212c000 phy_addr =0x141902000
> virtual addr = 0x79a172226000 phy_addr =0x13cc02000
> virtual addr = 0x79a172320000 phy_addr =0x13cd02000
>
> 2. echo 0x141802000 > /sys/kernel/debug/apei/einj/param1
> a.virtual addr = 0x79a172000000 phy_addr =0x13cd02000
> b.virtual addr = 0x79a17212c000 phy_addr =0x141902000
> c.virtual addr = 0x79a172226000 phy_addr =0x13cc02000
> d.virtual addr = 0x79a172320000 phy_addr =0x13cd02000 (share with a)
>
> 3. echo 0x141902000 > /sys/kernel/debug/apei/einj/param1
> a.virtual addr = 0x79a172000000 phy_addr =0x13cd02000
> b.virtual addr = 0x79a172032000 phy_addr =0x13cd02000 (share with a)
> c.virtual addr = 0x79a172226000 phy_addr =0x13cc02000
> d.virtual addr = 0x79a172320000 phy_addr =0x13cd02000 (share with a)
>
> 4. echo 0x13cd02000 > /sys/kernel/debug/apei/einj/param1
> a.virtual addr = 0x79a172000000 phy_addr =0x13cc02000
> b.virtual addr = 0x79a172032000 phy_addr =0x13cc02000 (share with a)
> c.virtual addr = 0x79a172226000 phy_addr =0x13cc02000 (share with a)
> d.virtual addr = 0x79a172320000 phy_addr =0x13cc02000 (share with a)
>
> 5. echo 0x13cc02000 > /sys/kernel/debug/apei/einj/param1
> Bus error (core dumped)
>
> kernel-log:
> mce: [Hardware Error]: Machine check events logged
> ksm: recovery successful, no need to kill processes
> Memory failure: 0x141802: recovery action for dirty LRU page: Recovered
> Memory failure: 0x141802: recovery action for already poisoned page: Failed
> ksm: recovery successful, no need to kill processes
> Memory failure: 0x141902: recovery action for dirty LRU page: Recovered
> Memory failure: 0x141902: recovery action for already poisoned page: Failed
> ksm: recovery successful, no need to kill processes
> Memory failure: 0x13cd02: recovery action for dirty LRU page: Recovered
> Memory failure: 0x13cd02: recovery action for already poisoned page: Failed
> Memory failure: 0x13cc02: recovery action for dirty LRU page: Recovered
> Memory failure: 0x13cc02: recovery action for already poisoned page: Failed
> MCE: Killing ksm_addr:5221 due to hardware memory corruption fault at 79a172000000
>
> ZERO PAGE TEST:
> When tested on a physical x86_64 machine (Intel(R) Xeon(R) Gold 6430 CPU):
> [shell]# ./einj.sh 0x193f908000
> ./einj.sh: line 25: echo: write error: Address already in use
>
> When tested in qemu-x86_64:
> Injecting memory failure at pfn 0x3a9d0c
> Memory failure: 0x3a9d0c: unhandlable page.
> Memory failure: 0x3a9d0c: recovery action for get hwpoison page: Ignored
>
> It seems to return early, before entering this patch's functions.
>
> Thanks for review and comments!
>
> Changes in v2:
>
> - Implemented a two-tier recovery strategy: preferring newly allocated
> pages over existing duplicates to avoid concentrating mappings on a
> single page suggested by David Hildenbrand
I also asked how relevant this is in practice [1]:
"
But how realistic do we consider that in practice? We need quite a bunch
of processes to dedup the same page to end up getting duplicates in the
chain IIRC.
So isn't this rather an improvement only for less likely scenarios in
practice?
"
That applies in particular to your test "alloc 1024 page with same content".
It certainly adds complexity, so we should clarify whether this is really
worth it.
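
To put rough numbers on that question, here is a minimal sketch. It assumes
max_page_sharing is at its documented default of 256; the value on the test
machine is not stated in this thread. The helper names min_dups and
ksm_chain_stats are made up for illustration. The first helper computes the
smallest number of duplicates KSM must keep for a given count of identical
pages; the second prints the chain counters KSM exposes in sysfs, which can
show whether a real workload ends up with chains at all.

```shell
#!/bin/sh
# Smallest number of stable-node duplicates needed for PAGES identical
# pages when one KSM page may be shared by at most LIMIT mappings
# (integer ceiling division). LIMIT=256 below is an assumption: it is
# the documented default of /sys/kernel/mm/ksm/max_page_sharing.
min_dups() {
    pages=$1
    limit=$2
    echo $(( (pages + limit - 1) / limit ))
}

# Print the KSM chain-related counters, or a placeholder when the
# kernel does not expose them (KSM disabled or sysfs not mounted).
# The optional argument overrides the sysfs directory.
ksm_chain_stats() {
    dir=${1:-/sys/kernel/mm/ksm}
    for f in max_page_sharing stable_node_chains stable_node_dups; do
        if [ -r "$dir/$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "$dir/$f")"
        else
            printf '%s = (not available)\n' "$f"
        fi
    done
}

min_dups 1024 256
ksm_chain_stats
```

Under that assumption, 1024 identical pages need at least 4 duplicates,
which matches the 4 distinct physical addresses in the test output. With
fewer mappings than max_page_sharing there is a single KSM page and no
healthy duplicate to migrate to.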
[1] https://lore.kernel.org/all/8c4d8ebe-885e-40f0-a10e-7290067c7b96@redhat.com/
--
Cheers
David / dhildenb