From: "Rafael J. Wysocki"
Date: Thu, 21 Dec 2023 14:55:33 +0100
Subject: Re: [PATCH v10 1/4] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events
To: Shuai Xue
Cc: bp@alien8.de, rafael@kernel.org, wangkefeng.wang@huawei.com,
    tanxiaofei@huawei.com, mawupeng1@huawei.com, tony.luck@intel.com,
    linmiaohe@huawei.com, naoya.horiguchi@nec.com, james.morse@arm.com,
    gregkh@linuxfoundation.org, will@kernel.org, jarkko@kernel.org,
    linux-acpi@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
    linux-edac@vger.kernel.org, acpica-devel@lists.linuxfoundation.org,
    stable@vger.kernel.org, x86@kernel.org, justin.he@arm.com,
    ardb@kernel.org, ying.huang@intel.com, ashish.kalra@amd.com,
    baolin.wang@linux.alibaba.com, tglx@linutronix.de, mingo@redhat.com,
    dave.hansen@linux.intel.com, lenb@kernel.org, hpa@zytor.com,
    robert.moore@intel.com, lvying6@huawei.com, xiexiuqi@huawei.com,
    zhuo.song@linux.alibaba.com
References: <20221027042445.60108-1-xueshuai@linux.alibaba.com> <20231218064521.37324-2-xueshuai@linux.alibaba.com>
In-Reply-To: <20231218064521.37324-2-xueshuai@linux.alibaba.com>

On Mon, Dec 18, 2023 at 7:45 AM Shuai Xue wrote:
>
> There are two major types of uncorrected recoverable (UCR) errors :
>
> - Synchronous error: The error is detected and raised at the point of the
>   consumption in the execution flow, e.g. when a CPU tries to access
>   a poisoned cache line.
>   The CPU will take a synchronous error exception
>   such as Synchronous External Abort (SEA) on Arm64 and Machine Check
>   Exception (MCE) on X86. OS requires to take action (for example, offline
>   failure page/kill failure thread) to recover this uncorrectable error.
>
> - Asynchronous error: The error is detected out of processor execution
>   context, e.g. when an error is detected by a background scrubber. Some data
>   in the memory are corrupted. But the data have not been consumed. OS is
>   optional to take action to recover this uncorrectable error.
>
> When APEI firmware first is enabled, a platform may describe one error
> source for the handling of synchronous errors (e.g. MCE or SEA
> notification), or for handling asynchronous errors (e.g. SCI or External
> Interrupt notification). In other words, we can distinguish synchronous
> errors by APEI notification. For synchronous errors, kernel will kill the
> current process which accessing the poisoned page by sending SIGBUS with
> BUS_MCEERR_AR. In addition, for asynchronous errors, kernel will notify the
> process who owns the poisoned page by sending SIGBUS with BUS_MCEERR_AO in
> early kill mode. However, the GHES driver always sets mf_flags to 0 so that
> all synchronous errors are handled as asynchronous errors in memory failure.
>
> To this end, set memory failure flags as MF_ACTION_REQUIRED on synchronous
> events.
>
> Signed-off-by: Shuai Xue
> Tested-by: Ma Wupeng
> Reviewed-by: Kefeng Wang
> Reviewed-by: Xiaofei Tan
> Reviewed-by: Baolin Wang
> Reviewed-by: James Morse

Applied as 6.8 material.

The other patches in the series still need to receive tags from the
APEI designated reviewers (as per MAINTAINERS).

Thanks!
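
For anyone following the BUS_MCEERR_AR / BUS_MCEERR_AO distinction in the
quoted changelog from the userspace side, here is a minimal sketch of how a
process might tell the two SIGBUS flavors apart. The helper names
(classify_sigbus, install_sigbus_handler, the mce_response enum) are
hypothetical and exist only for illustration; only sigaction(), SA_SIGINFO,
si_code and the two BUS_MCEERR_* codes are real interfaces. It is an
illustration of the signal semantics, not part of the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of what a SIGBUS si_code asks of the process. */
enum mce_response {
    MCE_NOT_MEMORY_FAILURE, /* SIGBUS for some unrelated reason */
    MCE_MUST_RECOVER_NOW,   /* BUS_MCEERR_AR: poisoned data was consumed */
    MCE_MAY_RECOVER_LATER   /* BUS_MCEERR_AO: poison found, not yet consumed */
};

/* Map the si_code delivered with SIGBUS to one of the responses above. */
enum mce_response classify_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return MCE_MUST_RECOVER_NOW;
    case BUS_MCEERR_AO:
        return MCE_MAY_RECOVER_LATER;
    default:
        return MCE_NOT_MEMORY_FAILURE;
    }
}

/* Last classification seen by the handler; sig_atomic_t keeps the store safe. */
volatile sig_atomic_t last_response = MCE_NOT_MEMORY_FAILURE;

/* Handler only records the classification; real recovery is application specific. */
void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    last_response = classify_sigbus(info->si_code);
}

/* Install the handler with SA_SIGINFO so that si_code is available. */
int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Small self-check showing the intended mapping. */
void sigbus_sketch_self_check(void)
{
    assert(classify_sigbus(BUS_MCEERR_AR) == MCE_MUST_RECOVER_NOW);
    assert(classify_sigbus(BUS_MCEERR_AO) == MCE_MAY_RECOVER_LATER);
}
```

With the GHES driver passing 0 as mf_flags, a consumer of poisoned data on an
SEA platform would be treated like the "action optional" case; with
MF_ACTION_REQUIRED it should see the "action required" code instead.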
> ---
>  drivers/acpi/apei/ghes.c | 29 +++++++++++++++++++++++------
>  1 file changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 63ad0541db38..ab2a82cb1b0b 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -101,6 +101,20 @@ static inline bool is_hest_type_generic_v2(struct ghes *ghes)
>         return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
>  }
>
> +/*
> + * A platform may describe one error source for the handling of synchronous
> + * errors (e.g. MCE or SEA), or for handling asynchronous errors (e.g. SCI
> + * or External Interrupt). On x86, the HEST notifications are always
> + * asynchronous, so only SEA on ARM is delivered as a synchronous
> + * notification.
> + */
> +static inline bool is_hest_sync_notify(struct ghes *ghes)
> +{
> +       u8 notify_type = ghes->generic->notify.type;
> +
> +       return notify_type == ACPI_HEST_NOTIFY_SEA;
> +}
> +
>  /*
>   * This driver isn't really modular, however for the time being,
>   * continuing to use module_param is the easiest way to remain
> @@ -489,7 +503,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
>  }
>
>  static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
> -                                      int sev)
> +                                      int sev, bool sync)
>  {
>         int flags = -1;
>         int sec_sev = ghes_severity(gdata->error_severity);
> @@ -503,7 +517,7 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
>             (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
>                 flags = MF_SOFT_OFFLINE;
>         if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
> -               flags = 0;
> +               flags = sync ? MF_ACTION_REQUIRED : 0;
>
>         if (flags != -1)
>                 return ghes_do_memory_failure(mem_err->physical_addr, flags);
> @@ -511,9 +525,11 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
>         return false;
>  }
>
> -static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int sev)
> +static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
> +                                      int sev, bool sync)
>  {
>         struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
> +       int flags = sync ? MF_ACTION_REQUIRED : 0;
>         bool queued = false;
>         int sec_sev, i;
>         char *p;
> @@ -538,7 +554,7 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int s
>                  * and don't filter out 'corrected' error here.
>                  */
>                 if (is_cache && has_pa) {
> -                       queued = ghes_do_memory_failure(err_info->physical_fault_addr, 0);
> +                       queued = ghes_do_memory_failure(err_info->physical_fault_addr, flags);
>                         p += err_info->length;
>                         continue;
>                 }
> @@ -666,6 +682,7 @@ static bool ghes_do_proc(struct ghes *ghes,
>         const guid_t *fru_id = &guid_null;
>         char *fru_text = "";
>         bool queued = false;
> +       bool sync = is_hest_sync_notify(ghes);
>
>         sev = ghes_severity(estatus->error_severity);
>         apei_estatus_for_each_section(estatus, gdata) {
> @@ -683,13 +700,13 @@ static bool ghes_do_proc(struct ghes *ghes,
>                         atomic_notifier_call_chain(&ghes_report_chain, sev, mem_err);
>
>                         arch_apei_report_mem_error(sev, mem_err);
> -                       queued = ghes_handle_memory_failure(gdata, sev);
> +                       queued = ghes_handle_memory_failure(gdata, sev, sync);
>                 }
>                 else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
>                         ghes_handle_aer(gdata);
>                 }
>                 else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
> -                       queued = ghes_handle_arm_hw_error(gdata, sev);
> +                       queued = ghes_handle_arm_hw_error(gdata, sev, sync);
>                 } else {
>                         void *err = acpi_hest_get_payload(gdata);
>
> --
> 2.39.3
>
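
As a reading aid for the hunks above, the flag selection can be modeled in
plain userspace C. This is only a sketch under stated assumptions: every
MODEL_* name and value below is a stand-in invented for illustration (the
real constants live in the kernel and ACPICA headers), and the
"corrected severity" half of the soft-offline condition is assumed, since
the quoted hunk shows only the threshold-exceeded half.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in notification types; values are arbitrary, not the ACPICA ones. */
enum model_notify { MODEL_NOTIFY_SCI, MODEL_NOTIFY_MCE, MODEL_NOTIFY_SEA };

/* Stand-in severities; values are arbitrary, not the GHES_SEV_* ones. */
enum model_sev {
    MODEL_SEV_NO,
    MODEL_SEV_CORRECTED,
    MODEL_SEV_RECOVERABLE,
    MODEL_SEV_PANIC
};

/* Stand-in flag bits for this sketch only. */
#define MODEL_MF_ACTION_REQUIRED 0x2
#define MODEL_MF_SOFT_OFFLINE    0x8
#define MODEL_NO_ACTION          (-1) /* mirrors "flags = -1" in the patch */

/* Mirrors is_hest_sync_notify(): only SEA counts as synchronous. */
bool model_is_sync_notify(enum model_notify type)
{
    return type == MODEL_NOTIFY_SEA;
}

/*
 * Mirrors the flag selection in ghes_handle_memory_failure() after the
 * patch. The sec_sev == CORRECTED test is an assumption (not visible in
 * the quoted hunk).
 */
int model_memory_failure_flags(enum model_sev sev, enum model_sev sec_sev,
                               bool threshold_exceeded, bool sync)
{
    int flags = MODEL_NO_ACTION;

    if (sec_sev == MODEL_SEV_CORRECTED && threshold_exceeded)
        flags = MODEL_MF_SOFT_OFFLINE;
    if (sev == MODEL_SEV_RECOVERABLE && sec_sev == MODEL_SEV_RECOVERABLE)
        flags = sync ? MODEL_MF_ACTION_REQUIRED : 0;

    return flags;
}

/* Small self-check: same recoverable error, different notification type. */
void model_self_check(void)
{
    bool sea = model_is_sync_notify(MODEL_NOTIFY_SEA);
    bool sci = model_is_sync_notify(MODEL_NOTIFY_SCI);

    assert(model_memory_failure_flags(MODEL_SEV_RECOVERABLE,
                                      MODEL_SEV_RECOVERABLE, false, sea)
           == MODEL_MF_ACTION_REQUIRED);
    assert(model_memory_failure_flags(MODEL_SEV_RECOVERABLE,
                                      MODEL_SEV_RECOVERABLE, false, sci) == 0);
}
```

The point of the model is that the same recoverable error yields different
flags depending only on the notification type of the error source, which is
the behavior the patch introduces.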