* Re: [PATCH v4 0/2] Copy-on-write poison recovery
@ 2023-05-08 3:18 Albert E. Davies
0 siblings, 0 replies; 2+ messages in thread
From: Albert E. Davies @ 2023-05-08 3:18 UTC (permalink / raw)
To: tony.luck
Cc: akpm, christophe.leroy, dan.j.williams, glider, linmiaohe,
linux-kernel, linux-mm, linuxppc-dev, mpe, naoya.horiguchi,
npiggin, willy, xueshuai
^ permalink raw reply [flat|nested] 2+ messages in thread
* [PATCH v3 0/2] Copy-on-write poison recovery
@ 2022-10-21 20:01 Tony Luck
2022-10-31 20:10 ` [PATCH v4 " Tony Luck
0 siblings, 1 reply; 2+ messages in thread
From: Tony Luck @ 2022-10-21 20:01 UTC (permalink / raw)
To: Naoya Horiguchi, Andrew Morton
Cc: Miaohe Lin, Matthew Wilcox, Shuai Xue, Dan Williams,
Michael Ellerman, Nicholas Piggin, Christophe Leroy, linux-mm,
linux-kernel, linuxppc-dev, Tony Luck
Part 1 deals with the process that triggered the copy on write
fault with a store to a shared read-only page. That process is
sent a SIGBUS with the usual machine check decoration to specify
the virtual address of the lost page, together with the scope.
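For readers on the receiving end of that signal, here is a minimal user-space sketch of how the "machine check decoration" might be decoded. It assumes the decoration means what sigaction(2) documents for SIGBUS: si_code of BUS_MCEERR_AR or BUS_MCEERR_AO, si_addr holding the virtual address, and si_addr_lsb giving the scope as the least significant valid bit of that address. The struct mce_report type and the decode_sigbus() helper are hypothetical names made up for this illustration; they are not part of the patch series or of any library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record for this sketch; not a kernel or libc type. */
struct mce_report {
	int       action_required; /* 1 for BUS_MCEERR_AR, 0 for BUS_MCEERR_AO */
	uintptr_t addr;            /* si_addr as reported */
	uintptr_t start;           /* first byte of the lost range */
	size_t    len;             /* size of the lost range: 1 << si_addr_lsb */
};

/*
 * Decode a SIGBUS siginfo_t. Returns 1 and fills *out when the signal
 * carries a machine check report, 0 for any other signal or si_code.
 */
static int decode_sigbus(const siginfo_t *si, struct mce_report *out)
{
	unsigned int lsb;

	assert(si != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	if (si->si_signo != SIGBUS)
		return 0;
	if (si->si_code != BUS_MCEERR_AR && si->si_code != BUS_MCEERR_AO)
		return 0;
	/* Reject a scope that cannot be expressed as a shift of uintptr_t. */
	if (si->si_addr_lsb < 0 ||
	    si->si_addr_lsb >= (int)(sizeof(uintptr_t) * 8))
		return 0;

	lsb = (unsigned int)si->si_addr_lsb;
	out->action_required = (si->si_code == BUS_MCEERR_AR);
	out->addr = (uintptr_t)si->si_addr;
	out->len = (size_t)1 << lsb;
	/* Round the reported address down to the start of the lost range. */
	out->start = out->addr & ~(((uintptr_t)1 << lsb) - 1);
	return 1;
}
```

A real program would call something like this from a handler installed with sigaction() and SA_SIGINFO, and would then have to decide whether it can rebuild or abandon the data in [start, start + len).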
Part 2 sets up to asynchronously take the page with the uncorrected
error offline to prevent additional machine check faults. H/t to
Miaohe Lin <linmiaohe@huawei.com> and Shuai Xue <xueshuai@linux.alibaba.com>
for pointing me to the existing function to queue a call to
memory_failure().
On x86 there is some duplicate reporting (because the error is
signalled by the memory controller as well as by the core
that triggered the machine check). Console logs look like this:
[ 1647.723403] mce: [Hardware Error]: Machine check events logged
        Machine check from kernel copy routine
[ 1647.723414] MCE: Killing einj_mem_uc:3600 due to hardware memory corruption fault at 7f3309503400
        x86 fault handler sends SIGBUS to child process
[ 1647.735183] Memory failure: 0x905b92d: recovery action for dirty LRU page: Recovered
        Async call to memory_failure() from copy on write path
[ 1647.748397] Memory failure: 0x905b92d: already hardware poisoned
        uc_decode_notifier() processes memory controller report
[ 1647.761313] MCE: Killing einj_mem_uc:3599 due to hardware memory corruption fault at 7f3309503400
        Parent process tries to read poisoned page. Page has been unmapped, so
        #PF handler sends SIGBUS
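As a reading aid for the numbers in that log, the sketch below works through the address arithmetic. It assumes that the value after "Memory failure:" is a page frame number and that the page size is the usual 4 KiB on x86; the SKETCH_PAGE_SHIFT constant and the helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages, i.e. a shift of 12. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_MASK  ((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1)

/* Physical address of the first byte of a page frame. */
static uint64_t sketch_pfn_to_phys(uint64_t pfn)
{
	/* The shift must not push set bits out of the 64-bit value. */
	assert((pfn >> (64 - SKETCH_PAGE_SHIFT)) == 0);
	return pfn << SKETCH_PAGE_SHIFT;
}

/* Start of the page that contains a given address. */
static uint64_t sketch_page_base(uint64_t addr)
{
	return addr & ~SKETCH_PAGE_MASK;
}

/* Byte offset of an address within its page. */
static uint64_t sketch_page_offset(uint64_t addr)
{
	return addr & SKETCH_PAGE_MASK;
}
```

Under those assumptions, frame 0x905b92d starts at physical address 0x905b92d000, and the faulting virtual address 7f3309503400 lies at offset 0x400 in the page based at 7f3309503000. Both processes report the same virtual address, which fits a parent and child sharing one page until the copy on write.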
Tony Luck (2):
  mm, hwpoison: Try to recover from copy-on write faults
  mm, hwpoison: When copy-on-write hits poison, take page offline
 include/linux/highmem.h | 24 ++++++++++++++++++++++++
 include/linux/mm.h      |  5 ++++-
 mm/memory.c             | 32 ++++++++++++++++++++++----------
 3 files changed, 50 insertions(+), 11 deletions(-)
--
2.37.3
^ permalink raw reply [flat|nested] 2+ messages in thread
* [PATCH v4 0/2] Copy-on-write poison recovery
2022-10-21 20:01 [PATCH v3 " Tony Luck
@ 2022-10-31 20:10 ` Tony Luck
0 siblings, 0 replies; 2+ messages in thread
From: Tony Luck @ 2022-10-31 20:10 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Potapenko, Naoya Horiguchi, Miaohe Lin, Matthew Wilcox,
Shuai Xue, Dan Williams, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, linux-mm, linux-kernel, linuxppc-dev,
Tony Luck
Recover from poison consumption while copying pages
in the kernel for a copy-on-write fault.
Changes since v3:
1) Miaohe Lin <linmiaohe@huawei.com> pointed out that a recent change
by Alexander Potapenko <glider@google.com> to copy_user_highpage()
added a call to kmsan_unpoison_memory(). Same is needed in my cloned
copy_mc_user_highpage() ... at least in the successful case where the
page was copied with no machine checks.
2) Picked up some additional Reviewed-by and Tested-by tags.
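The point in item 1 can be illustrated with a small user-space model. This is not the kernel code from the patch: every name below (struct sketch_page, sketch_copy_mc(), sketch_copy_mc_user_highpage(), the poison_off and initialized fields) is hypothetical. The only assumptions taken from the kernel side are that a machine-check-safe copy reports how many bytes it failed to copy, with 0 meaning success, and that the destination should be marked as initialized only when the whole page was copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-in for a page; not a kernel structure. */
struct sketch_page {
	unsigned char data[SKETCH_PAGE_SIZE];
	long poison_off;  /* offset of a simulated uncorrected error, -1 if none */
	int  initialized; /* models the state a KMSAN unpoison would record */
};

/*
 * Model of a machine-check-safe copy: copy up to the poisoned byte and
 * return the number of bytes NOT copied (0 on full success).
 */
static size_t sketch_copy_mc(struct sketch_page *to,
			     const struct sketch_page *from)
{
	size_t good = SKETCH_PAGE_SIZE;

	assert(to != NULL && from != NULL);
	if (from->poison_off >= 0 && from->poison_off < SKETCH_PAGE_SIZE)
		good = (size_t)from->poison_off;
	memcpy(to->data, from->data, good);
	return SKETCH_PAGE_SIZE - good;
}

/*
 * Model of the cloned page copy: mark the destination initialized only
 * when nothing was left uncopied. Returns 0 on success, -1 on poison.
 */
static int sketch_copy_mc_user_highpage(struct sketch_page *to,
					const struct sketch_page *from)
{
	size_t left = sketch_copy_mc(to, from);

	if (left == 0)
		to->initialized = 1;
	return left ? -1 : 0;
}
```

In this model a caller that sees a nonzero return would discard the partially filled destination page rather than map it, which is why leaving it unmarked on failure does no harm.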
Tony Luck (2):
  mm, hwpoison: Try to recover from copy-on write faults
  mm, hwpoison: When copy-on-write hits poison, take page offline
 include/linux/highmem.h | 26 ++++++++++++++++++++++++++
 include/linux/mm.h      |  5 ++++-
 mm/memory.c             | 32 ++++++++++++++++++++++----------
 3 files changed, 52 insertions(+), 11 deletions(-)
base-commit: 30a0b95b1335e12efef89dd78518ed3e4a71a763
--
2.37.3
^ permalink raw reply [flat|nested] 2+ messages in thread