From: "Jarkko Sakkinen" <jarkko@kernel.org>
To: "Jarkko Sakkinen" <jarkko@kernel.org>,
	"Shuai Xue" <xueshuai@linux.alibaba.com>, <bp@alien8.de>,
	<rafael@kernel.org>, <wangkefeng.wang@huawei.com>,
	<tanxiaofei@huawei.com>, <mawupeng1@huawei.com>,
	<tony.luck@intel.com>, <linmiaohe@huawei.com>,
	<naoya.horiguchi@nec.com>, <james.morse@arm.com>,
	<tongtiangen@huawei.com>, <gregkh@linuxfoundation.org>,
	<will@kernel.org>
Cc: <linux-acpi@vger.kernel.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <akpm@linux-foundation.org>,
	<linux-edac@vger.kernel.org>, <x86@kernel.org>,
	<justin.he@arm.com>, <ardb@kernel.org>, <ying.huang@intel.com>,
	<ashish.kalra@amd.com>, <baolin.wang@linux.alibaba.com>,
	<tglx@linutronix.de>, <mingo@redhat.com>,
	<dave.hansen@linux.intel.com>, <lenb@kernel.org>, <hpa@zytor.com>,
	<robert.moore@intel.com>, <lvying6@huawei.com>,
	<xiexiuqi@huawei.com>, <zhuo.song@linux.alibaba.com>
Subject: Re: [PATCH v12 1/3] ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered
Date: Thu, 05 Sep 2024 17:17:41 +0300
Message-ID: <D3YEYH69KMV4.13SX59Y2HT6D@kernel.org>
In-Reply-To: <D3YEWCUXEWY3.ALFECJPKZMMG@kernel.org>

On Thu Sep 5, 2024 at 5:14 PM EEST, Jarkko Sakkinen wrote:
> On Thu Sep 5, 2024 at 6:04 AM EEST, Shuai Xue wrote:
> >
> >
> > On 2024/9/4 00:09, Jarkko Sakkinen wrote:
> > > On Mon Sep 2, 2024 at 6:00 AM EEST, Shuai Xue wrote:
> > >> A synchronous error is detected when a user-space process accesses memory
> > >> with a 2-bit uncorrected error. The CPU will take a synchronous error exception
> > >> such as Synchronous External Abort (SEA) on Arm64. The kernel will queue a
> > >> memory_failure() work which poisons the related page, unmaps the page, and
> > >> then sends a SIGBUS to the process, so that a system wide panic can be
> > >> avoided.
> > >>
> > >> However, no memory_failure() work will be queued if any of the conditions
> > >> below is true:
> > >>
> > >> - `if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))` in ghes_handle_memory_failure()
> > >> - `if (flags == -1)` in ghes_handle_memory_failure()
> > >> - `if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))` in ghes_do_memory_failure()
> > >> - `if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr))` in ghes_do_memory_failure()
> > >>
> > >> In such a case, the user-space process will trigger SEA again. This loop
> > >> can potentially exceed the platform firmware threshold or even trigger a
> > >> kernel hard lockup, leading to a system reboot.
> > >>
> > >> Fix it by performing a force kill if no memory_failure() work is queued
> > >> for synchronous errors.
> > >>
> > >> Suggested-by: Xiaofei Tan <tanxiaofei@huawei.com>
> > >> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> > >>
> > >> ---
> > >>   drivers/acpi/apei/ghes.c | 10 ++++++++++
> > >>   1 file changed, 10 insertions(+)
> > >>
> > >> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> > >> index 623cc0cb4a65..b0b20ee533d9 100644
> > >> --- a/drivers/acpi/apei/ghes.c
> > >> +++ b/drivers/acpi/apei/ghes.c
> > >> @@ -801,6 +801,16 @@ static bool ghes_do_proc(struct ghes *ghes,
> > >>   		}
> > >>   	}
> > >>   
> > >> +	/*
> > >> +	 * If no memory failure work is queued for abnormal synchronous
> > >> +	 * errors, do a force kill.
> > >> +	 */
> > >> +	if (sync && !queued) {
> > >> +		pr_err("Sending SIGBUS to %s:%d due to hardware memory corruption\n",
> > >> +			current->comm, task_pid_nr(current));
> > > 
> > > Hmm... does this need "hardware" or would "memory corruption" be
> > > enough?
> > > 
> > > Also, does this need to say that it is sending SIGBUS when the signal
> > > itself tells that already?
> > > 
> > > I.e. could "%s:%d has memory corruption" be enough information?
> >
> > Hi, Jarkko,
> >
> > Thank you for your suggestion. Maybe it could.
> >
> > There are some similar error messages that use "hardware memory error", e.g.
>
> By tweaking my original suggestion just a bit:
>
> "%s:%d: hardware memory corruption"
>
> Can't get clearer than that, right?

And the obvious reason is that a shorter and more consistent klog message
is easier to spot and grep. It is simply less convoluted.

If you also want SIGBUS, I'd just put it as "%s:%d: hardware memory
corruption (SIGBUS)"
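
To make the two variants concrete, here is a minimal user-space sketch of
what the suggested format string would print. It is not kernel code and not
part of the patch: snprintf() stands in for pr_err(), and the helper name
format_memory_corruption_msg() and its mention_sigbus parameter are
hypothetical, introduced only for illustration.

```c
#include <stdio.h>

/*
 * Illustrative stand-in for the pr_err() call discussed above.
 * In the kernel, comm would be current->comm and pid would be
 * task_pid_nr(current); here both are plain arguments (assumption).
 *
 * mention_sigbus selects between the two proposed wordings:
 *   0 -> "%s:%d: hardware memory corruption"
 *   1 -> "%s:%d: hardware memory corruption (SIGBUS)"
 *
 * Returns what snprintf() returns: the length the full message needs.
 */
static int format_memory_corruption_msg(char *buf, size_t len,
					const char *comm, int pid,
					int mention_sigbus)
{
	return snprintf(buf, len, "%s:%d: hardware memory corruption%s",
			comm, pid, mention_sigbus ? " (SIGBUS)" : "");
}
```

Called with a made-up task name "stress-ng" and PID 1234, the helper yields
"stress-ng:1234: hardware memory corruption", or the same string followed by
" (SIGBUS)" when mention_sigbus is non-zero. Either way the line starts with
the task and PID and carries one fixed phrase that is easy to grep for.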

BR, Jarkko

