Subject: Re: [RESEND PATCH v18 1/2] ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered
From: Hanjun Guo
To: Shuai Xue
Date: Fri, 18 Apr 2025 15:48:00 +0800
Message-ID: <353809e7-5373-0d54-6ddb-767bc5af9e5f@huawei.com>
In-Reply-To: <709ee8d2-8969-424c-b32b-101c6a8220fb@linux.alibaba.com>
References: <20250404112050.42040-1-xueshuai@linux.alibaba.com> <20250404112050.42040-2-xueshuai@linux.alibaba.com> <0c0bc332-0323-4e43-a96b-dd5f5957ecc9@huawei.com> <709ee8d2-8969-424c-b32b-101c6a8220fb@linux.alibaba.com>

On 2025/4/14 23:02, Shuai Xue wrote:
>
> On 2025/4/14 22:37, Hanjun Guo wrote:
>> On 2025/4/4 19:20, Shuai Xue
>> wrote:
>>> Synchronous error was detected as a result of user-space process
>>> accessing a 2-bit uncorrected error. The CPU will take a synchronous
>>> error exception such as Synchronous External Abort (SEA) on Arm64.
>>> The kernel will queue a memory_failure() work which poisons the
>>> related page, unmaps the page, and then sends a SIGBUS to the
>>> process, so that a system wide panic can be avoided.
>>>
>>> However, no memory_failure() work will be queued when abnormal
>>> synchronous errors occur. These errors can include situations such as
>>> invalid PA, unexpected severity, no memory failure config support,
>>> invalid GUID section, etc. In such case, the user-space process will
>>> trigger SEA again. This loop can potentially exceed the platform
>>> firmware threshold or even trigger a kernel hard lockup, leading to a
>>> system reboot.
>>>
>>> Fix it by performing a force kill if no memory_failure() work is
>>> queued for synchronous errors.
>>>
>>> Signed-off-by: Shuai Xue
>>> Reviewed-by: Jarkko Sakkinen
>>> Reviewed-by: Jonathan Cameron
>>> Reviewed-by: Yazen Ghannam
>>> Reviewed-by: Jane Chu
>>> ---
>>>   drivers/acpi/apei/ghes.c | 11 +++++++++++
>>>   1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>> index b72772494655..50e4d924aa8b 100644
>>> --- a/drivers/acpi/apei/ghes.c
>>> +++ b/drivers/acpi/apei/ghes.c
>>> @@ -799,6 +799,17 @@ static bool ghes_do_proc(struct ghes *ghes,
>>>           }
>>>       }
>>> +    /*
>>> +     * If no memory failure work is queued for abnormal synchronous
>>> +     * errors, do a force kill.
>>> +     */
>>> +    if (sync && !queued) {
>>> +        dev_err(ghes->dev,
>>> +            HW_ERR GHES_PFX "%s:%d: synchronous unrecoverable error (SIGBUS)\n",
>>> +            current->comm, task_pid_nr(current));
>>> +        force_sig(SIGBUS);
>>> +    }
>>
>> I think it's reasonable to send a force kill to the task when the
>> synchronous memory error is not recovered.
>>
>> But I hope this code will not trigger some legacy firmware issues,
>> let's be careful for this, so can we just introduce arch specific
>> callbacks for this?
>
> Sorry, can you give more details? I am not sure I got your point.
>
> For x86, Tony confirmed that ghes will not dispatch x86 synchronous errors
> (a.k.a machine check exception), in a previous version.
> Sync is only used in arm64 platform, see is_hest_sync_notify().

Sorry for the late reply. From the code I can see that x86 will reuse
ghes_do_proc(); if Tony confirmed that x86 is OK, it's OK with me as well.

Thanks
Hanjun
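
For anyone following the thread, the decision the patch adds at the end of ghes_do_proc() can be modelled in user space roughly as below. This is only a sketch under assumptions, not kernel code: the enum names and decide_action() are hypothetical and exist purely for illustration, and the per-section handling is reduced to "queued recovery work or not". The only part taken from the patch is the final `sync && !queued` check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of handling one error section; not a kernel type. */
enum section_result {
    SECTION_WORK_QUEUED,  /* recovery work (memory_failure() in the patch) was queued */
    SECTION_NOT_HANDLED   /* e.g. invalid PA, unexpected severity, invalid GUID section */
};

/* Hypothetical overall action for one error record. */
enum record_action {
    ACTION_NONE,          /* asynchronous error and nothing was queued */
    ACTION_RECOVER,       /* queued work poisons/unmaps the page and signals the task */
    ACTION_FORCE_SIGBUS   /* synchronous error with no queued work: force kill */
};

/*
 * Model of the control flow: walk all sections, remember whether any of
 * them queued recovery work, then apply the patch's `sync && !queued` rule.
 */
enum record_action decide_action(bool sync,
                                 const enum section_result *sections,
                                 size_t n)
{
    bool queued = false;

    assert(sections != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (sections[i] == SECTION_WORK_QUEUED)
            queued = true;
    }

    if (queued)
        return ACTION_RECOVER;

    /* Without this branch the task would re-trigger the same exception. */
    return sync ? ACTION_FORCE_SIGBUS : ACTION_NONE;
}
```

In this model a synchronous record whose only section is SECTION_NOT_HANDLED yields ACTION_FORCE_SIGBUS, while the same record reported asynchronously yields ACTION_NONE, which matches the point that the new path only matters where sync is set (arm64, per is_hest_sync_notify()).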