Date: Tue, 3 Oct 2023 17:28:58 +0900
From: Naoya Horiguchi
To: Shuai Xue
Cc: rafael@kernel.org, wangkefeng.wang@huawei.com, tanxiaofei@huawei.com, mawupeng1@huawei.com, tony.luck@intel.com, linmiaohe@huawei.com, naoya.horiguchi@nec.com, james.morse@arm.com, gregkh@linuxfoundation.org, will@kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-edac@vger.kernel.org, acpica-devel@lists.linuxfoundation.org, stable@vger.kernel.org, x86@kernel.org, justin.he@arm.com, ardb@kernel.org, ying.huang@intel.com, ashish.kalra@amd.com, baolin.wang@linux.alibaba.com, bp@alien8.de, tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com, jarkko@kernel.org, lenb@kernel.org, hpa@zytor.com, robert.moore@intel.com, lvying6@huawei.com, xiexiuqi@huawei.com, zhuo.song@linux.alibaba.com
Subject: Re: [RESEND PATCH v8 2/2] ACPI: APEI: handle synchronous exceptions in task work
Message-ID: <20231003082858.GA750796@ik1-406-35019.vs.sakura.ne.jp>
References: <20221027042445.60108-1-xueshuai@linux.alibaba.com> <20230919022127.69732-3-xueshuai@linux.alibaba.com>
In-Reply-To: <20230919022127.69732-3-xueshuai@linux.alibaba.com>

On Tue, Sep 19, 2023 at 10:21:27AM +0800, Shuai Xue wrote:
> Hardware errors could be signaled by asynchronous interrupt, e.g. when an
> error is detected by a background scrubber, or signaled by synchronous
> exception, e.g. when an uncorrected error is consumed. Both synchronous and
> asynchronous errors are queued and handled by a dedicated kthread in a
> workqueue.
>
> commit 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for
> synchronous errors") keeps track of whether memory_failure() work was
> queued, and makes task_work pending to flush out the workqueue so that the
> work for a synchronous error is processed before returning to user-space.
> The trick ensures that the corrupted page is unmapped and poisoned.
> And after returning to user-space, the task restarts at the current
> instruction, which triggers a page fault in which the kernel sends SIGBUS
> to the current process due to VM_FAULT_HWPOISON.
>
> However, memory failure recovery for hwpoison-aware mechanisms does not
> work as expected. For example, hwpoison-aware user-space processes like
> QEMU register their customized SIGBUS handler and enable early kill mode by
> setting PF_MCE_EARLY at initialization. The kernel then directly notifies
> the process by sending a SIGBUS signal from memory failure, but with the
> wrong si_code: the current user-space process is the one accessing the
> corrupt memory location, yet its memory failure work is handled in a
> kthread context, so kill_proc() sends SIGBUS with si_code BUS_MCEERR_AO to
> that process instead of BUS_MCEERR_AR.
>
> To this end, separate synchronous and asynchronous error handling into
> different paths, as the x86 platform does:
>
> - valid synchronous errors: queue a task_work to synchronously send SIGBUS
>   before ret_to_user.
> - valid asynchronous errors: queue work on a workqueue to asynchronously
>   handle memory failure.
> - abnormal branches such as invalid PA, unexpected severity, no memory
>   failure config support, invalid GUID section, OOM, etc.
>
> Then, for valid synchronous errors, the current context in memory failure
> is exactly the task consuming the poisoned data, so it sends SIGBUS with
> the proper si_code.
>
> Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors")
> Signed-off-by: Shuai Xue
> Tested-by: Ma Wupeng
> Reviewed-by: Kefeng Wang
> Reviewed-by: Xiaofei Tan
> Reviewed-by: Baolin Wang
> ---
>  arch/x86/kernel/cpu/mce/core.c |  9 +---
>  drivers/acpi/apei/ghes.c       | 84 +++++++++++++++++++++-------------
>  include/acpi/ghes.h            |  3 --
>  mm/memory-failure.c            | 17 ++-----
>  4 files changed, 56 insertions(+), 57 deletions(-)
>
...
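For context on the user-space side of the si_code problem described above, here is a minimal sketch, assuming Linux with glibc. It shows how an hwpoison-aware process opts in to early kill (prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, ...), which sets PF_MCE_EARLY) and tells BUS_MCEERR_AR apart from BUS_MCEERR_AO in a SA_SIGINFO handler. This is not QEMU's code. classify_sigbus(), sigbus_handler(), setup_hwpoison_handling() and last_sigbus_code are hypothetical names used only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Hypothetical helper: label a SIGBUS si_code. */
static const char *classify_sigbus(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return "action-required";	/* this task consumed the poison */
	case BUS_MCEERR_AO:
		return "action-optional";	/* poison detected, not consumed by this task */
	default:
		return "other";
	}
}

/* Hypothetical: last si_code observed by the handler. */
static volatile sig_atomic_t last_sigbus_code;

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	last_sigbus_code = info->si_code;
	/*
	 * Returning from the handler after an action-required error would
	 * re-execute the faulting access, so this sketch simply exits; a real
	 * hwpoison-aware program would try to recover here instead.
	 */
	if (info->si_code == BUS_MCEERR_AR)
		_exit(1);
}

/* Install the SIGBUS handler and opt in to early kill (PF_MCE_EARLY). */
int setup_hwpoison_handling(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, NULL) != 0)
		return -1;
	/* Pass unsigned long arguments, as the kernel reads them as such. */
	return prctl(PR_MCE_KILL, (unsigned long)PR_MCE_KILL_SET,
		     (unsigned long)PR_MCE_KILL_EARLY, 0UL, 0UL);
}
```

With the bug described in the commit message, a process set up like this receives BUS_MCEERR_AO even though it is the task that consumed the poison. It therefore takes the "action-optional" path when it should take the "action-required" one.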
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 4d6e43c88489..80e1ea1cc56d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2163,7 +2163,9 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>   *
>   * Return: 0 for successfully handled the memory error,
>   *         -EOPNOTSUPP for hwpoison_filter() filtered the error event,
> - *         < 0(except -EOPNOTSUPP) on failure.
> + *         -EHWPOISON for already sent SIGBUS to the current process with
> + *         the proper error info,

I understand the meaning of this comment, but the sentence seems a
little too long. Could you sort this out with bullet points (like below)?

 * Return values:
 *   0                     - success
 *   -EOPNOTSUPP           - hwpoison_filter() filtered the error event.
 *   -EHWPOISON            - sent SIGBUS to the current process with the
 *                           proper error info by kill_accessing_process().
 *   other negative values - failure

> + *         other negative error code on failure.
>   */
>  int memory_failure(unsigned long pfn, int flags)
>  {
> @@ -2445,19 +2447,6 @@ static void memory_failure_work_func(struct work_struct *work)
>  	}
>  }
>  
> -/*
> - * Process memory_failure work queued on the specified CPU.
> - * Used to avoid return-to-userspace racing with the memory_failure workqueue.
> - */
> -void memory_failure_queue_kick(int cpu)
> -{
> -	struct memory_failure_cpu *mf_cpu;
> -
> -	mf_cpu = &per_cpu(memory_failure_cpu, cpu);
> -	cancel_work_sync(&mf_cpu->work);
> -	memory_failure_work_func(&mf_cpu->work);
> -}
> -

The declaration of memory_failure_queue_kick() still remains in
include/linux/mm.h, so you can remove it as well.

Thanks,
Naoya Horiguchi
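To illustrate why the return values in the suggested list need to be distinguished, here is a small user-space model, assuming Linux errno values. It describes each memory_failure() return code and decides whether a caller should fall back to a forced SIGBUS. mf_ret_str() and needs_fallback_sigbus() are hypothetical and are not kernel functions. The policy is an assumption: -EHWPOISON means SIGBUS was already delivered, and -EOPNOTSUPP (hwpoison_filter()) is treated as an intentionally ignored event.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical: describe a memory_failure() return value per the list above. */
static const char *mf_ret_str(int ret)
{
	if (ret == 0)
		return "success";
	if (ret == -EOPNOTSUPP)
		return "filtered by hwpoison_filter()";
	if (ret == -EHWPOISON)
		return "SIGBUS already sent to the current process";
	if (ret < 0)
		return "failure";
	return "unexpected positive value";
}

/*
 * Hypothetical caller policy: only a genuine failure calls for a fallback
 * SIGBUS. Sending another one after -EHWPOISON would be redundant.
 */
static bool needs_fallback_sigbus(int ret)
{
	return ret < 0 && ret != -EOPNOTSUPP && ret != -EHWPOISON;
}
```

If -EHWPOISON were folded into "other negative values", a caller following this policy would signal the task twice. That is one reason to document it as a separate bullet.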