From: James Morse <james.morse@arm.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Xie XiuQi <xiexiuqi@huawei.com>,
linux-acpi@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
Marc Zyngier <marc.zyngier@arm.com>,
Christoffer Dall <christoffer.dall@arm.com>,
Will Deacon <will.deacon@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Rafael Wysocki <rjw@rjwysocki.net>, Len Brown <lenb@kernel.org>,
Tony Luck <tony.luck@intel.com>,
Dongjiu Geng <gengdongjiu@huawei.com>,
Fan Wu <wufan@codeaurora.org>,
Wang Xiongfeng <wangxiongfeng2@huawei.com>
Subject: Re: [PATCH v7 22/25] ACPI / APEI: Kick the memory_failure() queue for synchronous errors
Date: Wed, 23 Jan 2019 18:37:32 +0000 [thread overview]
Message-ID: <ae892690-fdf7-b326-1c76-5bf39c2c9bb5@arm.com> (raw)
In-Reply-To: <20190122105143.GB26587@zn.tnic>
Hi Boris,
On 22/01/2019 10:51, Borislav Petkov wrote:
> On Mon, Dec 10, 2018 at 07:15:13PM +0000, James Morse wrote:
>> What happens if we miss MF_ACTION_REQUIRED?
>
> AFAICU, the logic is to force-send a signal to the user process, i.e.,
> force_sig_info() which cannot be ignored. IOW, an "enlightened" process
> would know how to do recovery action from a memory error.
>
> VS the action optional thing which you can handle at your leisure.
> So the question boils down to what kind of severity do the errors
> reported through SEA have? I mean, if the hw would go the trouble to do
> the synchronous reporting, then something important must've happened and
> it wants us to know about it and handle it.
Before v8.2 we assumed these were fatal for the thread, it couldn't make progress.
Since v8.2 we get a value from the CPU, the severity values are, (the flippant
summary is obviously mine!):
* Recoverable: "You're about to step in it, fix it or die"
* Uncontainable: "It was here, but it escaped, we dont know where it went, panic!"
* Restartable/Corrected: "its fine, pretend this didn't happen"
Firmware should duplicate these values into the CPER severity fields.
>> Surely the page still gets unmapped as its PG_Poisoned, an AO signal
>> may be pending, but if user-space touches the page it will get an AR
>> signal. Is this just about removing an extra AO signal to user-space?
If we miss MF_ACTION_REQUIRED, the page still gets unmapped from user-space, and
user-space gets an AO signal. With this patch it takes that signal before it
continues. If it ignores it, the access gets a translation-fault->EHWPOISON->AR
signal from the arch code.
... so missing the flag gives us an extra signal. I'm not convinced this results
in any observable difference.
>> If we do need this, I'd like to pick it up from the CPER records, as x86's
>> NOTIFY_NMI looks like it covers both AO/AR cases. (as does NOTIFY_SDEI). The
>> Master/Target abort or Invalid-address types in the memory-error-section CPER
>> records look like the best bet.
>
> Right, and we do all kinds of severity mapping there aka ghes_severity()
> so that'll be a good start, methinks.
The options are those 'aborts' in the memory error. These must have been a
result of some request. If we get a CPU error structure as part of the same
block, it may have a cache/bus error structure, which has a precise bit that
tells us whether this is a co-incidence. (but linux doesn't support any of those
structures today)
Thanks,
James
next prev parent reply other threads:[~2019-01-23 18:37 UTC|newest]
Thread overview: 75+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-03 18:05 [PATCH v7 00/25] APEI in_nmi() rework and SDEI wire-up James Morse
2018-12-03 18:05 ` [PATCH v7 01/25] ACPI / APEI: Don't wait to serialise with oops messages when panic()ing James Morse
2018-12-03 18:05 ` [PATCH v7 02/25] ACPI / APEI: Remove silent flag from ghes_read_estatus() James Morse
2018-12-04 11:36 ` Borislav Petkov
2018-12-03 18:05 ` [PATCH v7 03/25] ACPI / APEI: Switch estatus pool to use vmalloc memory James Morse
2018-12-04 13:01 ` Borislav Petkov
2018-12-03 18:05 ` [PATCH v7 04/25] ACPI / APEI: Make hest.c manage the estatus memory pool James Morse
2018-12-11 16:48 ` Borislav Petkov
2018-12-14 13:56 ` James Morse
2018-12-19 14:42 ` Borislav Petkov
2019-01-10 18:20 ` James Morse
2018-12-03 18:05 ` [PATCH v7 05/25] ACPI / APEI: Make estatus pool allocation a static size James Morse
2018-12-11 16:54 ` Borislav Petkov
2018-12-03 18:05 ` [PATCH v7 06/25] ACPI / APEI: Don't store CPER records physical address in struct ghes James Morse
2018-12-11 17:04 ` Borislav Petkov
2018-12-03 18:05 ` [PATCH v7 07/25] ACPI / APEI: Remove spurious GHES_TO_CLEAR check James Morse
2018-12-11 17:18 ` Borislav Petkov
2018-12-03 18:05 ` [PATCH v7 08/25] ACPI / APEI: Don't update struct ghes' flags in read/clear estatus James Morse
2018-12-03 18:05 ` [PATCH v7 09/25] ACPI / APEI: Generalise the estatus queue's notify code James Morse
2018-12-11 17:44 ` Borislav Petkov
2019-01-10 18:21 ` James Morse
2019-01-11 11:46 ` Borislav Petkov
2018-12-03 18:05 ` [PATCH v7 10/25] ACPI / APEI: Tell firmware the estatus queue consumed the records James Morse
2018-12-11 18:36 ` Borislav Petkov
2019-01-10 18:22 ` James Morse
2019-01-10 21:01 ` Tyler Baicar
2019-01-10 21:01 ` Tyler Baicar
2019-01-11 12:03 ` Borislav Petkov
2019-01-11 15:32 ` Tyler Baicar
2019-01-11 15:32 ` Tyler Baicar
2019-01-11 17:45 ` Borislav Petkov
2019-01-11 18:25 ` James Morse
2019-01-11 19:58 ` Borislav Petkov
2019-01-23 18:36 ` James Morse
2019-01-29 11:49 ` Borislav Petkov
2019-01-29 18:48 ` James Morse
2019-01-31 13:29 ` Borislav Petkov
2019-01-11 18:09 ` James Morse
2019-01-11 20:01 ` Borislav Petkov
2019-01-11 20:53 ` Tyler Baicar
2019-01-11 20:53 ` Tyler Baicar
2019-01-29 18:48 ` James Morse
2018-12-03 18:05 ` [PATCH v7 11/25] ACPI / APEI: Move NOTIFY_SEA between the estatus-queue and NOTIFY_NMI James Morse
2019-01-21 13:01 ` Borislav Petkov
2018-12-03 18:06 ` [PATCH v7 12/25] ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue James Morse
2018-12-03 18:06 ` [PATCH v7 13/25] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing James Morse
2018-12-06 16:17 ` Catalin Marinas
2018-12-03 18:06 ` [PATCH v7 14/25] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface James Morse
2018-12-06 16:17 ` Catalin Marinas
2018-12-03 18:06 ` [PATCH v7 15/25] ACPI / APEI: Move locking to the notification helper James Morse
2018-12-03 18:06 ` [PATCH v7 16/25] ACPI / APEI: Let the notification helper specify the fixmap slot James Morse
2018-12-03 18:06 ` [PATCH v7 17/25] ACPI / APEI: Pass ghes and estatus separately to avoid a later copy James Morse
2019-01-21 13:35 ` Borislav Petkov
2018-12-03 18:06 ` [PATCH v7 18/25] ACPI / APEI: Split ghes_read_estatus() to allow a peek at the CPER length James Morse
2019-01-21 13:53 ` Borislav Petkov
2018-12-03 18:06 ` [PATCH v7 19/25] ACPI / APEI: Only use queued estatus entry during _in_nmi_notify_one() James Morse
2019-01-21 17:19 ` Borislav Petkov
2018-12-03 18:06 ` [PATCH v7 20/25] ACPI / APEI: Use separate fixmap pages for arm64 NMI-like notifications James Morse
2019-01-21 17:27 ` Borislav Petkov
2019-01-23 18:33 ` James Morse
2019-01-31 13:38 ` Borislav Petkov
2018-12-03 18:06 ` [PATCH v7 21/25] mm/memory-failure: Add memory_failure_queue_kick() James Morse
2018-12-03 18:06 ` [PATCH v7 22/25] ACPI / APEI: Kick the memory_failure() queue for synchronous errors James Morse
2018-12-05 2:02 ` Xie XiuQi
2018-12-10 19:15 ` James Morse
2019-01-22 10:51 ` Borislav Petkov
2019-01-23 18:37 ` James Morse [this message]
2019-01-21 17:58 ` Borislav Petkov
2019-01-23 18:40 ` James Morse
2019-01-31 14:04 ` Borislav Petkov
2018-12-03 18:06 ` [PATCH v7 23/25] arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work James Morse
2018-12-06 16:18 ` Catalin Marinas
2018-12-03 18:06 ` [PATCH v7 24/25] firmware: arm_sdei: Add ACPI GHES registration helper James Morse
2018-12-06 16:18 ` Catalin Marinas
2018-12-03 18:06 ` [PATCH v7 25/25] ACPI / APEI: Add support for the SDEI GHES Notification type James Morse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ae892690-fdf7-b326-1c76-5bf39c2c9bb5@arm.com \
--to=james.morse@arm.com \
--cc=bp@alien8.de \
--cc=catalin.marinas@arm.com \
--cc=christoffer.dall@arm.com \
--cc=gengdongjiu@huawei.com \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-mm@kvack.org \
--cc=marc.zyngier@arm.com \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=rjw@rjwysocki.net \
--cc=tony.luck@intel.com \
--cc=wangxiongfeng2@huawei.com \
--cc=will.deacon@arm.com \
--cc=wufan@codeaurora.org \
--cc=xiexiuqi@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox