From: Jonathan Cameron
To: Dan Williams
CC: Shuai Xue, Borislav Petkov, Ira Weiny, "Luck, Tony", "james.morse@arm.com"
Subject: Re: [PATCH v11 1/3] ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered
Date: Fri, 23 Feb 2024 12:08:13 +0000
Message-ID: <20240223120813.00005d1f@Huawei.com>
In-Reply-To: <65d82c9352e78_24f3f294d5@dwillia2-mobl3.amr.corp.intel.com.notmuch>
References: <20221027042445.60108-1-xueshuai@linux.alibaba.com> <20240204080144.7977-2-xueshuai@linux.alibaba.com> <20240219092528.GTZdMeiDWIDz613VeT@fat_crate.local> <65d82c9352e78_24f3f294d5@dwillia2-mobl3.amr.corp.intel.com.notmuch>
Organization: Huawei Technologies Research and Development (UK) Ltd.

On Thu, 22 Feb 2024 21:26:43 -0800
Dan Williams wrote:

> Shuai Xue wrote:
> > 
> > 
> > On 2024/2/19 17:25, Borislav Petkov wrote:
> > > On Sun, Feb 04, 2024 at 04:01:42PM +0800, Shuai Xue wrote:
> > >> Synchronous error was detected as a result of user-space process accessing
> > >> a 2-bit uncorrected error. The CPU will take a synchronous error exception
> > >> such as Synchronous External Abort (SEA) on Arm64. The kernel will queue a
> > >> memory_failure() work which poisons the related page, unmaps the page, and
> > >> then sends a SIGBUS to the process, so that a system wide panic can be
> > >> avoided.
> > >>
> > >> However, no memory_failure() work will be queued when abnormal synchronous
> > >> errors occur. These errors can include situations such as invalid PA,
> > >> unexpected severity, no memory failure config support, invalid GUID
> > >> section, etc. In such case, the user-space process will trigger SEA again.
> > >> This loop can potentially exceed the platform firmware threshold or even
> > >> trigger a kernel hard lockup, leading to a system reboot.
> > >>
> > >> Fix it by performing a force kill if no memory_failure() work is queued
> > >> for synchronous errors.
> > >>
> > >> Signed-off-by: Shuai Xue
> > >> ---
> > >>  drivers/acpi/apei/ghes.c | 9 +++++++++
> > >>  1 file changed, 9 insertions(+)
> > >>
> > >> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> > >> index 7b7c605166e0..0892550732d4 100644
> > >> --- a/drivers/acpi/apei/ghes.c
> > >> +++ b/drivers/acpi/apei/ghes.c
> > >> @@ -806,6 +806,15 @@ static bool ghes_do_proc(struct ghes *ghes,
> > >>  		}
> > >>  	}
> > >>  
> > >> +	/*
> > >> +	 * If no memory failure work is queued for abnormal synchronous
> > >> +	 * errors, do a force kill.
> > >> +	 */
> > >> +	if (sync && !queued) {
> > >> +		pr_err("Sending SIGBUS to current task due to memory error not recovered");
> > >> +		force_sig(SIGBUS);
> > >> +	}
> > > 
> > > Except that there are a bunch of CXL GUIDs being handled there too and
> > > this will sigbus those processes now automatically.
> > 
> > Before the CXL GUIDs added, @Tony confirmed that the HEST notifications are always
> > asynchronous on x86 platform, so only Synchronous External Abort (SEA) on ARM is
> > delivered as a synchronous notification.
> > 
> > Will the CXL component trigger synchronous events for which we need to terminate the
> > current process by sending sigbus to process?
> 
> None of the CXL component errors should be handled as synchronous
> events. They are either asynchronous protocol errors, or effectively
> equivalent to CPER_SEC_PLATFORM_MEM notifications.

Not a good example, CPER_SEC_PLATFORM_MEM is sometimes signaled via SEA.
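
For anyone following the thread, here is a rough user-space sketch of the decision being discussed. It is only a model, not the ghes.c code: `struct section_model`, `enum action` and `decide_action()` are hypothetical names made up for illustration, and "recoverable" stands in for all the checks (valid PA, expected severity, config support) that decide whether memory_failure() work gets queued. It shows why a synchronous notification whose sections queue no work, including any non-memory section handled in the same loop, would end in a forced SIGBUS under the proposed check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one section of an error status block. */
struct section_model {
	bool is_memory_error;	/* models a platform memory error section */
	bool recoverable;	/* models "memory_failure() work can be queued" */
};

/* Hypothetical outcome of processing one notification. */
enum action {
	ACTION_NONE,		/* asynchronous, nothing queued */
	ACTION_RECOVERY_QUEUED,	/* memory_failure() work was queued */
	ACTION_FORCE_SIGBUS	/* synchronous and nothing queued */
};

/*
 * Models the proposed "if (sync && !queued)" check: walk the sections,
 * remember whether any of them queued recovery work, then decide.
 */
enum action decide_action(const struct section_model *secs, size_t n, bool sync)
{
	bool queued = false;

	assert(secs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (secs[i].is_memory_error && secs[i].recoverable)
			queued = true;
	}

	if (queued)
		return ACTION_RECOVERY_QUEUED;
	if (sync)
		return ACTION_FORCE_SIGBUS;
	return ACTION_NONE;
}
```

In this model a non-memory section delivered synchronously takes the ACTION_FORCE_SIGBUS path, which is the concern raised above about the CXL GUIDs; whether such sections can actually arrive via a synchronous notification is the open question in this thread.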