Subject: Re: [RFC PATCH v2 1/1] ACPI: APEI: Make memory_failure() triggered by synchronization errors execute in the current context
From: Lv Ying
To: HORIGUCHI NAOYA(堀口 直也)
CC: "rafael@kernel.org", "lenb@kernel.org", "james.morse@arm.com",
    "tony.luck@intel.com", "bp@alien8.de", "linmiaohe@huawei.com",
    "akpm@linux-foundation.org", "xueshuai@linux.alibaba.com",
    "ashish.kalra@amd.com", "xiezhipeng1@huawei.com",
    "wangkefeng.wang@huawei.com", "xiexiuqi@huawei.com",
    "tanxiaofei@huawei.com", "cuibixuan@linux.alibaba.com",
    "linux-acpi@vger.kernel.org", "linux-kernel@vger.kernel.org",
    "linux-mm@kvack.org"
Date: Thu, 15 Dec 2022 10:45:05 +0800
Message-ID: <76038f5b-914c-7ae0-e89f-500bd0c7502f@huawei.com>
In-Reply-To: <20221215002520.GA2020717@hori.linux.bs1.fc.nec.co.jp>
References: <20221209095407.383211-1-lvying6@huawei.com>
    <20221209095407.383211-2-lvying6@huawei.com>
    <20221215002520.GA2020717@hori.linux.bs1.fc.nec.co.jp>

On 2022/12/15 8:26, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Fri, Dec 09, 2022 at 05:54:07PM +0800, Lv Ying wrote:
>> The memory uncorrected error which is detected by an external component and
>> notified via an IRQ, can be called asynchronization error.
>> If an error is
>> detected as a result of user-space process accessing a corrupt memory
>> location, the CPU may take an abort. On arm64 this is a
>> 'synchronous external abort', and on a firmware first system it is notified
>> via NOTIFY_SEA, this can be called synchronization error.
>>
>
> "synchronization error" in this context looks weird to me, maybe you mean
> "synchronous error" ? There're many places using "synchronization", so
> please use consistent wording.

"synchronization error" in this context means "synchronous error", e.g. SEA.
Thanks for your suggestion, I will use consistent wording - "synchronous error".

>
>> Currently, synchronization error and asynchronization error both use
>> memory_failure_queue to schedule memory_failure() exectute in kworker
>> context. Commit 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue
>> for synchronous errors") make task_work pending to flush out the queue,
>> cancel_work_sync() in memory_failure_queue_kick() will make
>> memory_failure() exectute in kworker context first which will get
>
> s/exectute/execute/

Thank you for your detailed review. I will check again and fix the typos and
grammatical errors in the patch.

>
>> synchronization error info from kfifo, so task_work later will get nothing
>> from kfifo which doesn't work as expected. Even worse, synchronization
>> error notification has NMI like properties, (it can interrupt IRQ-masked
>> code), task_work may get wrong kfifo entry from interrupted
>> asynchronization error which is notified by IRQ.
>>
>> Since the memory_failure() triggered by a synchronous exception is
>> executed in the kworker context, the early_kill mode of memory_failure()
>> will send wrong si_code by SIGBUS signal: current process is kworker
>> thread, the actual user-space process accessing the corrupt memory location
>> will be collected by find_early_kill_thread(), and then send SIGBUS with
>> BUS_MCEERR_AO si_code to the actual user-space process instead of
>> BUS_MCEERR_AR.
>> The machine-manager(kvm) use the si_code: BUS_MCEERR_AO for
>> 'action optional' early notifications, and BUS_MCEERR_AR for
>> 'action required' synchronous/late notifications.
>>
>> Make memory_failure() triggered by synchronization errors execute in the
>> current context, we do not need workqueue for synchronization error
>> anymore, use task_work handle synchronization errors directly. Since,
>> synchronization errors and asynchronization errors share the same kfifo,
>> use MF_ACTION_REQUIRED flag to distinguish them. And the asynchronization
>> error keeps the same as before.
>>
>> Currently, it's hard to distinguish synchronization error in APEI. It
>> can be determined that the SEA report synchronization error, so
>> currently only the synchronization error reported by SEA is distinguished
>> and handled in current context.
>>
>> Signed-off-by: Lv Ying
>> ---
>>  drivers/acpi/apei/ghes.c | 20 +++++++++-------
>>  mm/memory-failure.c      | 50 +++++++++++++++++++++++++++++-----------
>>  2 files changed, 48 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 9952f3a792ba..19d62ec2177f 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -423,8 +423,8 @@ static void ghes_clear_estatus(struct ghes *ghes,
>>
>> /*
>> * Called as task_work before returning to user-space.
>> - * Ensure any queued work has been done before we return to the context that
>> - * triggered the notification.
>> + * Ensure any queued corrupt page in synchronous errors has been handled before
>> + * we return to the user context that triggered the notification.
>> */
>> static void ghes_kick_task_work(struct callback_head *head)
>> {
>> @@ -461,7 +461,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
>> }
>>
>> static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
>> - int sev)
>> + int sev, int notify_type)
>> {
>> int flags = -1;
>> int sec_sev = ghes_severity(gdata->error_severity);
>> @@ -475,7 +475,7 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
>> (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
>> flags = MF_SOFT_OFFLINE;
>> if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
>> - flags = 0;
>> + flags = (notify_type == ACPI_HEST_NOTIFY_SEA) ? MF_ACTION_REQUIRED : 0;
>>
>> if (flags != -1)
>> return ghes_do_memory_failure(mem_err->physical_addr, flags);
>> @@ -483,7 +483,8 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
>> return false;
>> }
>>
>> -static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int sev)
>> +static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int sev,
>> + int notify_type)
>> {
>> struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
>> bool queued = false;
>> @@ -510,7 +511,9 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int s
>> * and don't filter out 'corrected' error here.
>> */
>> if (is_cache && has_pa) {
>> - queued = ghes_do_memory_failure(err_info->physical_fault_addr, 0);
>> + queued = ghes_do_memory_failure(err_info->physical_fault_addr,
>> + (notify_type == ACPI_HEST_NOTIFY_SEA) ?
>> + MF_ACTION_REQUIRED : 0);
>> p += err_info->length;
>> continue;
>> }
>> @@ -631,6 +634,7 @@ static bool ghes_do_proc(struct ghes *ghes,
>> const guid_t *fru_id = &guid_null;
>> char *fru_text = "";
>> bool queued = false;
>> + int notify_type = ghes->generic->notify.type;
>>
>> sev = ghes_severity(estatus->error_severity);
>> apei_estatus_for_each_section(estatus, gdata) {
>> @@ -648,13 +652,13 @@ static bool ghes_do_proc(struct ghes *ghes,
>> ghes_edac_report_mem_error(sev, mem_err);
>>
>> arch_apei_report_mem_error(sev, mem_err);
>> - queued = ghes_handle_memory_failure(gdata, sev);
>> + queued = ghes_handle_memory_failure(gdata, sev, notify_type);
>> }
>> else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
>> ghes_handle_aer(gdata);
>> }
>> else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
>> - queued = ghes_handle_arm_hw_error(gdata, sev);
>> + queued = ghes_handle_arm_hw_error(gdata, sev, notify_type);
>> } else {
>> void *err = acpi_hest_get_payload(gdata);
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index bead6bccc7f2..82238ec86acd 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2204,7 +2204,11 @@ struct memory_failure_cpu {
>> static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
>>
>> /**
>> - * memory_failure_queue - Schedule handling memory failure of a page.
>> + * memory_failure_queue
>> + * - Schedule handling memory failure of a page for asynchronous error, memory
>> + * failure page will be executed in kworker thread
>> + * - put corrupt memory info into kfifo for synchronous error, task_work will
>> + * handle them before returning to the user
>
> I think that the top description of kernel-doc function documentation needs
> to be brief, so could you move the above 2 items downward as details?
> Maybe the first line can be updated like below (scheduling is done conditionally
> with your change):
>
> /**
> * memory_failure_queue - Queue memory failure event
> * @pfn: Page Number of the corrupted page
> * @flags: Flags for memory failure handling
> *
> * ... (full details)
>
> And maybe existing comment in "full details" is obsolete since commit
> 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous
> errors"), so could you update the whole description to explain the new
> behavior with some background information as done in patch description?
>

Thanks, your description is concise and matches what the patch intends. I will
fix it in the next version of the patch, and I will update the whole
description to explain the new behavior.

>> * @pfn: Page Number of the corrupted page
>> * @flags: Flags for memory failure handling
>> *
>> @@ -2217,6 +2221,11 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
>> * happen outside the current execution context (e.g. when
>> * detected by a background scrubber)
>> *
>> + * This function can also be used in synchronous errors which was detected as a
>
> "... errors which was ..." seems unmatched in plurality.
>
>> + * result of user-space accessing a corrupt memory location, just put memory
>
> s/corrupt/corrupted/

The typo and the grammatical error will be fixed in the next version of the
patch.

>
>> + * error info into kfifo, and then, task_work get and handle it in current
>> + * execution context instead of scheduling kworker to handle it
>
> Please put a period at the end of sentence. kernel-doc comment is
> converted to auto-generated documentation, so it needs to look like
> natural English text.
> See https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
>

Thanks, it helps me a lot. I will update the function comments as per the
kernel-doc guide.

>> + *
>> * Can run in IRQ context.
>> */
>> @@ -2230,9 +2239,10 @@ void memory_failure_queue(unsigned long pfn, int flags)
>>
>> mf_cpu = &get_cpu_var(memory_failure_cpu);
>> spin_lock_irqsave(&mf_cpu->lock, proc_flags);
>> - if (kfifo_put(&mf_cpu->fifo, entry))
>> - schedule_work_on(smp_processor_id(), &mf_cpu->work);
>> - else
>> + if (kfifo_put(&mf_cpu->fifo, entry)) {
>> + if (!(entry.flags & MF_ACTION_REQUIRED))
>> + schedule_work_on(smp_processor_id(), &mf_cpu->work);
>> + } else
>> pr_err("buffer overflow when queuing memory failure at %#lx\n",
>> pfn);
>> spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
>> @@ -2240,12 +2250,15 @@ void memory_failure_queue(unsigned long pfn, int flags)
>> }
>> EXPORT_SYMBOL_GPL(memory_failure_queue);
>>
>> -static void memory_failure_work_func(struct work_struct *work)
>> +/*
>> + * (a)synchronous error info should be consumed by the corresponding handler
>> + */
>> +static void __memory_failure_work_func(struct work_struct *work, bool sync)
>> {
>> struct memory_failure_cpu *mf_cpu;
>> struct memory_failure_entry entry = { 0, };
>> unsigned long proc_flags;
>> - int gotten;
>> + int gotten, ret;
>>
>> mf_cpu = container_of(work, struct memory_failure_cpu, work);
>> for (;;) {
>> @@ -2256,22 +2269,31 @@ static void memory_failure_work_func(struct work_struct *work)
>> break;
>> if (entry.flags & MF_SOFT_OFFLINE)
>> soft_offline_page(entry.pfn, entry.flags);
>> - else
>> - memory_failure(entry.pfn, entry.flags);
>> + else {
>> + if (sync && (entry.flags & MF_ACTION_REQUIRED)) {
>> + ret = memory_failure(entry.pfn, entry.flags);
>> + if (ret == -EHWPOISON || ret == -EOPNOTSUPP)
>> + return;
>> +
>> + pr_err("Memory error not recovered");
>> + force_sig(SIGBUS);
>> + } else if (!sync && !(entry.flags & MF_ACTION_REQUIRED))
>> + memory_failure(entry.pfn, entry.flags);
>
> So if sync is true and MF_ACTION_REQUIRED is not set, memory_failure() is
> not called. Does that break something?
>
> Thanks,
> Naoya Horiguchi
>

sync is set to true only in the synchronous error handling path, and in that
path MF_ACTION_REQUIRED is expected to be set. The kfifo is shared by
synchronous and asynchronous errors, and asynchronous errors never set
MF_ACTION_REQUIRED. This check prevents the synchronous error path from
calling memory_failure() on asynchronous error entries in the kfifo. If
__memory_failure_work_func() in the synchronous path gets an asynchronous
error entry (sync is true and MF_ACTION_REQUIRED is not set), it just ignores
the entry, which breaks nothing.

However, currently we can only confirm that SEA is a synchronous error, so
MF_ACTION_REQUIRED is set only for SEA. Other synchronous errors that cannot
be identified as such will miss memory_failure().

-- 
Thanks!
Lv Ying
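
P.S. To make the check discussed above easier to follow, here is a small
user-space sketch of the decision only. It is not kernel code and not part of
the patch: the SK_* names, their values, and both function names are made up
for this illustration. Only the logic is meant to mirror the hunks quoted
above (the flag selection in ghes.c and the sync/MF_ACTION_REQUIRED test in
__memory_failure_work_func()).

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values are local to this sketch; the kernel defines its own. */
enum {
	SK_MF_ACTION_REQUIRED = 1 << 1,
	SK_MF_SOFT_OFFLINE    = 1 << 3,
};

/* Hypothetical notification kinds: SEA versus anything else (e.g. an IRQ). */
enum sk_notify { SK_NOTIFY_OTHER, SK_NOTIFY_SEA };

static_assert((SK_MF_ACTION_REQUIRED & SK_MF_SOFT_OFFLINE) == 0,
	      "sketch flags must be distinct bits");

/* Flag chosen when queuing a recoverable error: only SEA is known to be synchronous. */
int sk_flags_for_notify(enum sk_notify type)
{
	return type == SK_NOTIFY_SEA ? SK_MF_ACTION_REQUIRED : 0;
}

/*
 * Would a consumer call memory_failure() for an entry taken from the shared
 * kfifo? sync is true for the task_work (synchronous) consumer and false for
 * the kworker (asynchronous) consumer.
 */
bool sk_calls_memory_failure(bool sync, int flags)
{
	bool action_required = (flags & SK_MF_ACTION_REQUIRED) != 0;

	if (flags & SK_MF_SOFT_OFFLINE)
		return false;	/* such entries go to soft_offline_page() instead */

	/* Each consumer handles only the entries of its own kind. */
	return sync == action_required;
}
```

In this model the case Naoya asked about is sk_calls_memory_failure(true, 0),
which is false: the synchronous consumer skips an entry that was queued
without MF_ACTION_REQUIRED.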