From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [RFC PATCH v2 1/1] ACPI: APEI: Make memory_failure() triggered by
 synchronization errors execute in the current context
To: =?UTF-8?B?SE9SSUdVQ0hJIE5BT1lBKOWggOWPoyDnm7TkuZ8p?=
	<naoya.horiguchi@nec.com>
CC: "rafael@kernel.org" <rafael@kernel.org>, "lenb@kernel.org"
	<lenb@kernel.org>, "james.morse@arm.com" <james.morse@arm.com>,
	"tony.luck@intel.com" <tony.luck@intel.com>, "bp@alien8.de" <bp@alien8.de>,
	"linmiaohe@huawei.com" <linmiaohe@huawei.com>, "akpm@linux-foundation.org"
	<akpm@linux-foundation.org>, "xueshuai@linux.alibaba.com"
	<xueshuai@linux.alibaba.com>, "ashish.kalra@amd.com" <ashish.kalra@amd.com>,
	"xiezhipeng1@huawei.com" <xiezhipeng1@huawei.com>,
	"wangkefeng.wang@huawei.com" <wangkefeng.wang@huawei.com>,
	"xiexiuqi@huawei.com" <xiexiuqi@huawei.com>, "tanxiaofei@huawei.com"
	<tanxiaofei@huawei.com>, "cuibixuan@linux.alibaba.com"
	<cuibixuan@linux.alibaba.com>, "linux-acpi@vger.kernel.org"
	<linux-acpi@vger.kernel.org>, "linux-kernel@vger.kernel.org"
	<linux-kernel@vger.kernel.org>, "linux-mm@kvack.org" <linux-mm@kvack.org>
References: <20221209095407.383211-1-lvying6@huawei.com>
 <20221209095407.383211-2-lvying6@huawei.com>
 <20221215002520.GA2020717@hori.linux.bs1.fc.nec.co.jp>
From: Lv Ying <lvying6@huawei.com>
Message-ID: <76038f5b-914c-7ae0-e89f-500bd0c7502f@huawei.com>
Date: Thu, 15 Dec 2022 10:45:05 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
 Thunderbird/68.4.1
MIME-Version: 1.0
In-Reply-To: <20221215002520.GA2020717@hori.linux.bs1.fc.nec.co.jp>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On 2022/12/15 8:26, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Fri, Dec 09, 2022 at 05:54:07PM +0800, Lv Ying wrote:
>> The memory uncorrected error which is detected by an external component and
>> notified via an IRQ, can be called asynchronization error. If an error is
>> detected as a result of user-space process accessing a corrupt memory
>> location, the CPU may take an abort. On arm64 this is a
>> 'synchronous external abort', and on a firmware first system it is notified
>> via NOTIFY_SEA, this can be called synchronization error.
>>
> 
> "synchronization error" in this context looks weird to me, maybe you mean
> "synchronous error" ?  There're many places using "synchronization", so
> please use consistent wording.

Yes, "synchronization error" in this context means "synchronous error",
e.g. SEA. Thanks for the suggestion, I will use the consistent wording
"synchronous error" throughout.

> 
>> Currently, synchronization error and asynchronization error both use
>> memory_failure_queue to schedule memory_failure() exectute in kworker
>> context. Commit 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue
>> for synchronous errors") make task_work pending to flush out the queue,
>> cancel_work_sync() in memory_failure_queue_kick() will make
>> memory_failure() exectute in kworker context first which will get
> 
> s/exectute/execute/

Thank you for the detailed review. I will go through the patch again and
fix the typos and grammatical errors.

> 
>> synchronization error info from kfifo, so task_work later will get nothing
>> from kfifo which doesn't work as expected. Even worse, synchronization
>> error notification has NMI like properties, (it can interrupt IRQ-masked
>> code), task_work may get wrong kfifo entry from interrupted
>> asynchronization error which is notified by IRQ.
>>
>> Since the memory_failure() triggered by a synchronous exception is
>> executed in the kworker context, the early_kill mode of memory_failure()
>> will send wrong si_code by SIGBUS signal: current process is kworker
>> thread, the actual user-space process accessing the corrupt memory location
>> will be collected by find_early_kill_thread(), and then send SIGBUS with
>> BUS_MCEERR_AO si_code to the actual user-space process instead of
>> BUS_MCEERR_AR. The machine-manager(kvm) use the si_code: BUS_MCEERR_AO for
>> 'action optional' early notifications, and BUS_MCEERR_AR for
>> 'action required' synchronous/late notifications.
>>
>> Make memory_failure() triggered by synchronization errors execute in the
>> current context, we do not need workqueue for synchronization error
>> anymore, use task_work handle synchronization errors directly. Since,
>> synchronization errors and asynchronization errors share the same kfifo,
>> use MF_ACTION_REQUIRED flag to distinguish them. And the asynchronization
>> error keeps the same as before.
>>
>> Currently, it's hard to distinguish synchronization error in APEI. It
>> can be determined that the SEA report synchronization error, so
>> currently only the synchronization error reported by SEA is distinguished
>> and handled in current context.
>>
>> Signed-off-by: Lv Ying <lvying6@huawei.com>
>> ---
>>   drivers/acpi/apei/ghes.c | 20 +++++++++-------
>>   mm/memory-failure.c      | 50 +++++++++++++++++++++++++++++-----------
>>   2 files changed, 48 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 9952f3a792ba..19d62ec2177f 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -423,8 +423,8 @@ static void ghes_clear_estatus(struct ghes *ghes,
>>   
>>   /*
>>    * Called as task_work before returning to user-space.
>> - * Ensure any queued work has been done before we return to the context that
>> - * triggered the notification.
>> + * Ensure any queued corrupt page in synchronous errors has been handled before
>> + * we return to the user context that triggered the notification.
>>    */
>>   static void ghes_kick_task_work(struct callback_head *head)
>>   {
>> @@ -461,7 +461,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
>>   }
>>   
>>   static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
>> -				       int sev)
>> +				       int sev, int notify_type)
>>   {
>>   	int flags = -1;
>>   	int sec_sev = ghes_severity(gdata->error_severity);
>> @@ -475,7 +475,7 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
>>   	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
>>   		flags = MF_SOFT_OFFLINE;
>>   	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
>> -		flags = 0;
>> +		flags = (notify_type == ACPI_HEST_NOTIFY_SEA) ? MF_ACTION_REQUIRED : 0;
>>   
>>   	if (flags != -1)
>>   		return ghes_do_memory_failure(mem_err->physical_addr, flags);
>> @@ -483,7 +483,8 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
>>   	return false;
>>   }
>>   
>> -static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int sev)
>> +static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int sev,
>> +		int notify_type)
>>   {
>>   	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
>>   	bool queued = false;
>> @@ -510,7 +511,9 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int s
>>   		 * and don't filter out 'corrected' error here.
>>   		 */
>>   		if (is_cache && has_pa) {
>> -			queued = ghes_do_memory_failure(err_info->physical_fault_addr, 0);
>> +			queued = ghes_do_memory_failure(err_info->physical_fault_addr,
>> +					(notify_type == ACPI_HEST_NOTIFY_SEA) ?
>> +					MF_ACTION_REQUIRED : 0);
>>   			p += err_info->length;
>>   			continue;
>>   		}
>> @@ -631,6 +634,7 @@ static bool ghes_do_proc(struct ghes *ghes,
>>   	const guid_t *fru_id = &guid_null;
>>   	char *fru_text = "";
>>   	bool queued = false;
>> +	int notify_type = ghes->generic->notify.type;
>>   
>>   	sev = ghes_severity(estatus->error_severity);
>>   	apei_estatus_for_each_section(estatus, gdata) {
>> @@ -648,13 +652,13 @@ static bool ghes_do_proc(struct ghes *ghes,
>>   			ghes_edac_report_mem_error(sev, mem_err);
>>   
>>   			arch_apei_report_mem_error(sev, mem_err);
>> -			queued = ghes_handle_memory_failure(gdata, sev);
>> +			queued = ghes_handle_memory_failure(gdata, sev, notify_type);
>>   		}
>>   		else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
>>   			ghes_handle_aer(gdata);
>>   		}
>>   		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
>> -			queued = ghes_handle_arm_hw_error(gdata, sev);
>> +			queued = ghes_handle_arm_hw_error(gdata, sev, notify_type);
>>   		} else {
>>   			void *err = acpi_hest_get_payload(gdata);
>>   
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index bead6bccc7f2..82238ec86acd 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2204,7 +2204,11 @@ struct memory_failure_cpu {
>>   static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
>>   
>>   /**
>> - * memory_failure_queue - Schedule handling memory failure of a page.
>> + * memory_failure_queue
>> + * - Schedule handling memory failure of a page for asynchronous error, memory
>> + *   failure page will be executed in kworker thread
>> + * - put corrupt memory info into kfifo for synchronous error, task_work will
>> + *   handle them before returning to the user
> 
> I think that the top description of kernel-doc function documentation needs
> to be brief, so could you move the above 2 items downward as details?
> Maybe the first line can be updated like below (scheduling is done conditionally
> with your change):
> 
> /**
>   * memory_failure_queue - Queue memory failure event
>   * @pfn: Page Number of the corrupted page
>   * @flags: Flags for memory failure handling
>   *
>   * ... (full details)
> 
> And maybe existing comment in "full details" is obsolete since commit
> 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous
> errors"), so could you update the whole description to explain the new
> behavior with some background information as done in patch description?
> 

Thanks, your description is concise and matches what the patch intends.
I will use it in the next version and update the whole description to
explain the new behavior.

>>    * @pfn: Page Number of the corrupted page
>>    * @flags: Flags for memory failure handling
>>    *
>> @@ -2217,6 +2221,11 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
>>    * happen outside the current execution context (e.g. when
>>    * detected by a background scrubber)
>>    *
>> + * This function can also be used in synchronous errors which was detected as a
> 
> "... errors which was ..." seems unmatched in plurality.
> 
>> + * result of user-space accessing a corrupt memory location, just put memory
> 
> s/corrupt/corrupted/

The typo and the grammatical error will be fixed in the next version.

> 
>> + * error info into kfifo, and then, task_work get and handle it in current
>> + * execution context instead of scheduling kworker to handle it
> 
> Please put a period at the end of sentence. kernel-doc comment is
> converted to auto-generated documentation, so it needs to look like
> natural English text.
> See https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
> 

Thanks, that helps a lot. I will update the function comments to follow
the kernel-doc guidelines.

>> + *
>>    * Can run in IRQ context.
>>    */
>> @@ -2230,9 +2239,10 @@ void memory_failure_queue(unsigned long pfn, int flags)
>>   
>>   	mf_cpu = &get_cpu_var(memory_failure_cpu);
>>   	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
>> -	if (kfifo_put(&mf_cpu->fifo, entry))
>> -		schedule_work_on(smp_processor_id(), &mf_cpu->work);
>> -	else
>> +	if (kfifo_put(&mf_cpu->fifo, entry)) {
>> +		if (!(entry.flags & MF_ACTION_REQUIRED))
>> +			schedule_work_on(smp_processor_id(), &mf_cpu->work);
>> +	} else
>>   		pr_err("buffer overflow when queuing memory failure at %#lx\n",
>>   		       pfn);
>>   	spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
>> @@ -2240,12 +2250,15 @@ void memory_failure_queue(unsigned long pfn, int flags)
>>   }
>>   EXPORT_SYMBOL_GPL(memory_failure_queue);
>>   
>> -static void memory_failure_work_func(struct work_struct *work)
>> +/*
>> + * (a)synchronous error info should be consumed by the corresponding handler
>> + */
>> +static void __memory_failure_work_func(struct work_struct *work, bool sync)
>>   {
>>   	struct memory_failure_cpu *mf_cpu;
>>   	struct memory_failure_entry entry = { 0, };
>>   	unsigned long proc_flags;
>> -	int gotten;
>> +	int gotten, ret;
>>   
>>   	mf_cpu = container_of(work, struct memory_failure_cpu, work);
>>   	for (;;) {
>> @@ -2256,22 +2269,31 @@ static void memory_failure_work_func(struct work_struct *work)
>>   			break;
>>   		if (entry.flags & MF_SOFT_OFFLINE)
>>   			soft_offline_page(entry.pfn, entry.flags);
>> -		else
>> -			memory_failure(entry.pfn, entry.flags);
>> +		else {
>> +			if (sync && (entry.flags & MF_ACTION_REQUIRED)) {
>> +				ret = memory_failure(entry.pfn, entry.flags);
>> +				if (ret == -EHWPOISON || ret == -EOPNOTSUPP)
>> +					return;
>> +
>> +				pr_err("Memory error not recovered");
>> +				force_sig(SIGBUS);
>> +			} else if (!sync && !(entry.flags & MF_ACTION_REQUIRED))
>> +				memory_failure(entry.pfn, entry.flags);
> 
> So if sync is true and MF_ACTION_REQUIRED is not set, memory_failure() is
> not called.  Does that break something?
> 
> Thanks,
> Naoya Horiguchi
> 

sync is set to true only on the synchronous error handling path, and on
that path MF_ACTION_REQUIRED is expected to be set.

The kfifo is shared by synchronous and asynchronous errors, and
asynchronous errors never set MF_ACTION_REQUIRED. The check is there to
keep the synchronous path from calling memory_failure() on asynchronous
entries in the kfifo. If __memory_failure_work_func() on the synchronous
path gets an asynchronous entry (sync is true and MF_ACTION_REQUIRED is
not set), it just ignores it, which breaks nothing.
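
To make the check easier to follow, here is a rough userspace sketch of
the decision only. It is not the patch code: sketch_dispatch(), the
enum and SKETCH_MF_ACTION_REQUIRED are hypothetical names made up for
illustration, the flag value is not the kernel's, and the
MF_SOFT_OFFLINE branch that comes first in the real loop is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for MF_ACTION_REQUIRED; the value is illustrative. */
#define SKETCH_MF_ACTION_REQUIRED 0x1

enum sketch_action {
	SKETCH_HANDLE_IN_TASK_WORK,	/* sync consumer got a sync entry */
	SKETCH_HANDLE_IN_KWORKER,	/* async consumer got an async entry */
	SKETCH_SKIP,			/* entry belongs to the other consumer */
};

/*
 * Models the two branches that call memory_failure() in the patch:
 * each consumer only handles entries whose flag matches its own kind.
 */
enum sketch_action sketch_dispatch(bool sync, int flags)
{
	bool action_required = (flags & SKETCH_MF_ACTION_REQUIRED) != 0;

	if (sync && action_required)
		return SKETCH_HANDLE_IN_TASK_WORK;
	if (!sync && !action_required)
		return SKETCH_HANDLE_IN_KWORKER;
	return SKETCH_SKIP;
}
```

Under these assumptions, sketch_dispatch(true, 0) is the case you asked
about, and it ends up in SKETCH_SKIP.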

However, currently we can only confirm that SEA is a synchronous error,
so MF_ACTION_REQUIRED is set only for SEA. Other synchronous errors that
cannot be identified as such will miss memory_failure().
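
As a small standalone sketch of that limitation (again with
hypothetical names and illustrative values, not the kernel's
ACPI_HEST_NOTIFY_* or MF_* definitions), the flag choice for a
recoverable error depends only on whether the notification type is SEA:

```c
/* Hypothetical stand-ins for the HEST notification types. */
enum sketch_notify {
	SKETCH_NOTIFY_IRQ,
	SKETCH_NOTIFY_SEA,
	SKETCH_NOTIFY_OTHER,
};

/* Hypothetical stand-in for MF_ACTION_REQUIRED; the value is illustrative. */
#define SKETCH_MF_ACTION_REQUIRED 0x1

/* Only SEA is known to be synchronous, so only SEA gets the flag. */
int sketch_recoverable_flags(enum sketch_notify type)
{
	return type == SKETCH_NOTIFY_SEA ? SKETCH_MF_ACTION_REQUIRED : 0;
}
```

Any other notification type, even one that is in fact synchronous, gets
flags == 0 here and is therefore treated as asynchronous.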


-- 
Thanks!
Lv Ying