From: James Morse <james.morse@arm.com>
To: Borislav Petkov <bp@alien8.de>, Shuai Xue <xueshuai@linux.alibaba.com>
Cc: rafael@kernel.org, wangkefeng.wang@huawei.com,
tanxiaofei@huawei.com, mawupeng1@huawei.com, tony.luck@intel.com,
linmiaohe@huawei.com, naoya.horiguchi@nec.com,
gregkh@linuxfoundation.org, will@kernel.org, jarkko@kernel.org,
linux-acpi@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
linux-edac@vger.kernel.org,
acpica-devel@lists.linuxfoundation.org, stable@vger.kernel.org,
x86@kernel.org, justin.he@arm.com, ardb@kernel.org,
ying.huang@intel.com, ashish.kalra@amd.com,
baolin.wang@linux.alibaba.com, tglx@linutronix.de,
mingo@redhat.com, dave.hansen@linux.intel.com, lenb@kernel.org,
hpa@zytor.com, robert.moore@intel.com, lvying6@huawei.com,
xiexiuqi@huawei.com, zhuo.song@linux.alibaba.com
Subject: Re: [PATCH v9 0/2] ACPI: APEI: handle synchronous errors in task work with proper si_code
Date: Thu, 30 Nov 2023 17:43:38 +0000
Message-ID: <d83545f0-af15-10bc-0f5d-9b531b54b9dd@arm.com>
In-Reply-To: <20231130144001.GGZWiewYtvMSJir62f@fat_crate.local>
Hi Boris,
On 30/11/2023 14:40, Borislav Petkov wrote:
> FTR, this is starting to make sense, thanks for explaining.
>
> Replying only to this one for now:
>
> On Thu, Nov 30, 2023 at 10:58:53AM +0800, Shuai Xue wrote:
>> To reproduce this problem:
>>
>> # STEP1: enable early kill mode
>> #sysctl -w vm.memory_failure_early_kill=1
>> vm.memory_failure_early_kill = 1
>>
>> # STEP2: inject an UCE error and consume it to trigger a synchronous error
>
> So this is for ARM folks to deal with, BUT:
>
> A consumed uncorrectable error on x86 means panic. On some hw like on
> AMD, that error doesn't even get seen by the OS but the hw does
> something called syncflood to prevent further error propagation. So
> there's no any action required - the hw does that.
>
> But I'd like to hear from ARM folks whether consuming an uncorrectable
> error even lets software run. Dunno.
I think we mean different things by 'consume' here.

I'd assume Shuai's test is poisoning a cache-line. When the CPU tries to access that
cache-line it will get an 'external abort' signal back from the memory system. Shuai - is
this what you mean by 'consume' - the CPU received an external abort from the poisoned cache
line?

It's then up to the CPU whether it can put the world back in order to take this as a
synchronous-external-abort or an asynchronous-external-abort, which for arm64 are two
different interrupt/exception types.
The synchronous exceptions can't be masked, but the asynchronous one can.

If by the time the asynchronous-external-abort interrupt/exception has been unmasked, the
CPU has used the poisoned value in some calculation (which is what we usually mean by
consume) which has resulted in a memory access - it will report the error as 'uncontained'
because the error has been silently propagated. APEI should always report those as 'fatal',
and there is little point getting the OS involved at this point. Also in this category are
things like 'tag ram corruption', where you can no longer trust anything about memory.

Everything in this thread is about synchronous errors where this can't happen. The CPU
stops and takes an interrupt/exception instead.
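
For illustration only (this is not code from the series): the user-visible side of
the synchronous/asynchronous distinction is the si_code delivered with SIGBUS. Linux
defines BUS_MCEERR_AR ("action required", the task touched the poisoned location
itself) and BUS_MCEERR_AO ("action optional", the error was found out of band). A
minimal userspace sketch of telling them apart might look like the block below. The
names classify_sigbus(), on_sigbus(), install_sigbus_handler() and last_kind are
hypothetical and exist only for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Fallbacks in case the libc headers do not expose these si_codes;
 * the values match the Linux UAPI definitions. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical classification used only by this sketch. */
enum mem_error_kind {
	MEM_ERR_NONE,            /* SIGBUS for some other reason */
	MEM_ERR_ACTION_REQUIRED, /* synchronous: this task hit the poison */
	MEM_ERR_ACTION_OPTIONAL  /* asynchronous: reported out of band */
};

static enum mem_error_kind classify_sigbus(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return MEM_ERR_ACTION_REQUIRED;
	case BUS_MCEERR_AO:
		return MEM_ERR_ACTION_OPTIONAL;
	default:
		return MEM_ERR_NONE;
	}
}

/* Last classification seen by the handler. */
static volatile sig_atomic_t last_kind = MEM_ERR_NONE;

/* Only records the kind. A real handler for BUS_MCEERR_AR must not simply
 * return: the faulting access would be retried and fault again. */
static void on_sigbus(int signo, siginfo_t *info, void *ucontext)
{
	(void)signo;
	(void)ucontext;
	last_kind = classify_sigbus(info->si_code);
}

static void install_sigbus_handler(void)
{
	struct sigaction sa;
	int rc;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO; /* needed to receive siginfo_t */
	sigemptyset(&sa.sa_mask);
	rc = sigaction(SIGBUS, &sa, NULL);
	assert(rc == 0); /* installing the handler is not expected to fail */
	(void)rc;
}
```

A program would call install_sigbus_handler() once at start-up and, on
MEM_ERR_ACTION_REQUIRED, abandon or rebuild the data at info->si_addr rather than
retry the access.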
Thanks,
James