Message-ID: <84ed4048-606e-47bf-98ad-d850cf30d60d@linux.alibaba.com>
Date: Thu, 13 Feb 2025 14:59:57 +0800
Subject: Re: [PATCH v1 4/4] mm/hwpoison: Fix incorrect "not recovered" report for recovered clean pages
To: Miaohe Lin
Cc: tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com,
 x86@kernel.org, hpa@zytor.com, akpm@linux-foundation.org,
 linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 baolin.wang@linux.alibaba.com, tianruidong@linux.alibaba.com,
 tony.luck@intel.com, bp@alien8.de, "nao.horiguchi@gmail.com"
References: <20250211060200.33845-1-xueshuai@linux.alibaba.com>
 <20250211060200.33845-5-xueshuai@linux.alibaba.com>
 <5f116840-60df-c6d9-d7ff-dcf1dce7773f@huawei.com>
 <3820329d-20e3-49ee-a329-aac7393c6df3@linux.alibaba.com>
 <23251c74-cc50-012c-409f-c45117b52b16@huawei.com>
From: Shuai Xue
In-Reply-To: <23251c74-cc50-012c-409f-c45117b52b16@huawei.com>

On 2025/2/13 11:20, Miaohe Lin wrote:
> On 2025/2/12 21:55, Shuai Xue wrote:
>>
>>
>> On 2025/2/12 16:09, Miaohe Lin wrote:
>>> On 2025/2/11 14:02, Shuai Xue wrote:
>>>> When an uncorrected memory error is consumed there is a race between
>>>> the CMCI from the memory controller reporting an uncorrected error
>>>> with a UCNA signature, and the core reporting an SRAR signature
>>>> machine check when the data is about to be consumed.
>>>>
>>>> If the CMCI wins that race, the page is marked poisoned when
>>>> uc_decode_notifier() calls memory_failure(). For dirty pages,
>>>> memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
>>>> converting the PTE to a hwpoison entry. However, for clean pages, the
>>>> TTU_HWPOISON flag is cleared, leaving the PTE unchanged and not converted
>>>> to a hwpoison entry. Consequently, for an unmapped dirty page, the PTE is
>>>> marked as a hwpoison entry allowing kill_accessing_process() to:
>>>>
>>>> - call walk_page_range() and return 1
>>>> - call kill_proc() to make sure a SIGBUS is sent
>>>> - return -EHWPOISON to indicate that SIGBUS is already sent to the process
>>>>   and kill_me_maybe() doesn't have to send it again.
>>>>
>>>> Conversely, for clean pages where PTE entries are not marked as hwpoison,
>>>> kill_accessing_process() returns -EFAULT, causing kill_me_maybe() to send a
>>>> SIGBUS.
>>>>
>>>> Console log looks like this:
>>>>
>>>> Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
>>>> Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
>>>> Memory failure: 0x827ca68: already hardware poisoned
>>>> mce: Memory error not recovered
>>>>
>>>> To fix it, return -EHWPOISON if no hwpoison PTE entry is found, preventing
>>>> an unnecessary SIGBUS.
>>>
>>> Thanks for your patch.
>>>
>>>>
>>>> Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"")
>>>> Signed-off-by: Shuai Xue
>>>> ---
>>>>  mm/memory-failure.c | 5 ++---
>>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index 995a15eb67e2..f9a6b136a6f0 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -883,10 +883,9 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
>>>>  				      (void *)&priv);
>>>>  	if (ret == 1 && priv.tk.addr)
>>>>  		kill_proc(&priv.tk, pfn, flags);
>>>> -	else
>>>> -		ret = 0;
>>>>  	mmap_read_unlock(p->mm);
>>>> -	return ret > 0 ? -EHWPOISON : -EFAULT;
>>>> +
>>>> +	return ret >= 0 ? -EHWPOISON : -EFAULT;
>>>
>>> IIUC, kill_accessing_process() is supposed to return -EHWPOISON to notify that SIGBUS is already
>>> sent to the process and kill_me_maybe() doesn't have to send it again. But with your change,
>>> kill_accessing_process() will return -EHWPOISON even if SIGBUS is not sent. Does this break
>>> the semantics of -EHWPOISON?
>>
>> Yes, from the comment of kill_me_maybe(),
>>
>>      * -EHWPOISON from memory_failure() means that it already sent SIGBUS
>>      * to the current process with the proper error info,
>>      * -EOPNOTSUPP means hwpoison_filter() filtered the error event,
>>
>> this patch breaks the comment.
>>
>> But the definition of EHWPOISON is quite different from the comment.
>>
>> #define EHWPOISON 133 /* Memory page has hardware error */
>>
>> As for this issue, returning 0 or EHWPOISON can both prevent a SIGBUS signal
>> from being sent in kill_me_maybe().
>>
>> Which way do you prefer?
>>
>>>
>>> BTW I scanned the code of walk_page_range(). It seems with implementation of hwpoison_walk_ops
>>> walk_page_range() will only return 0 or 1, i.e. always >= 0. So kill_accessing_process() will always
>>> return -EHWPOISON if this patch is applied.
>>>
>>> Correct me if I miss something.
>>
>> Yes, you are right. Let's count the cases one by one:
>>
>> 1. clean page: try_to_remap(!TTU_HWPOISON), walk_page_range() will return 0 and
>
> Do you mean try_to_unmap?

Yes, sorry for the typo.

>
>> we should not send SIGBUS in kill_me_maybe().
>>
>> 2. dirty page:
>> 2.1 MCE wins race
>>     CMCI: w/o Action Required      MCE: w/ Action Required
>>                                    TestSetPageHWPoison
>>     TestSetPageHWPoison
>>     return -EHWPOISON
>>                                    try_to_unmap(TTU_HWPOISON)
>>                                    kill_proc in hwpoison_user_mappings()
>>
>> If MCE wins the race, because the flag of memory_failure() called by CMCI is
>> not set as MF_ACTION_REQUIRED, everything goes well, kill_proc() will send
>> SIGBUS in hwpoison_user_mappings().
>>
>> 2.2 CMCI wins race
>>     CMCI: w/o Action Required      MCE: w/ Action Required
>>     TestSetPageHWPoison
>>     try_to_unmap(TTU_HWPOISON)
>>                                    walk_page_range() returns 1 due to hwpoison PTE entry
>>                                    kill_proc in kill_accessing_process()
>>
>> If the CMCI wins the race, we need to kill the process in
>> kill_accessing_process(). And if try_to_unmap() succeeds, everything goes well,
>> kill_proc() will send SIGBUS in kill_accessing_process().
>>
>> But if try_to_unmap() fails, the PTE entry will not be marked as hwpoison, and
>> walk_page_range() returns 0 as in case 1 (clean page), so no SIGBUS will be sent.
>
> If try_to_unmap() fails, the PTE entry will still point to the dirty page.
> Then in check_hwpoisoned_entry(), we will have pfn == poisoned_pfn. So walk_page_range()
> will return 1 in this case. Or am I missing something?
>

You’re right; I overlooked the pte_present() branch. Therefore, in the
walk_page_range() function:

- It returns 0 when the poisoned page is a clean page.
- It returns 1 when CMCI wins, regardless of whether try_to_unmap() succeeds or fails.

Then the patch will be like:

@@ -883,10 +883,9 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
 				      (void *)&priv);
 	if (ret == 1 && priv.tk.addr)
 		kill_proc(&priv.tk, pfn, flags);
-	else
-		ret = 0;
 	mmap_read_unlock(p->mm);
-	return ret > 0 ? -EHWPOISON : -EFAULT;
+
+	return ret > 0 ? -EHWPOISON : 0;

Here, returning 0 indicates that memory_failure() successfully handled the error
by dropping the clean page.

>>
>> In summary, hwpoison_walk_ops cannot distinguish between try_to_unmap failing
>> and causing the PTE entry not to be set to hwpoison, and a clean page that
>> originally does not have the PTE entry set to hwpoison.
>
> Is it possible current process is not the one accessing the hwpoisoned page? E.g. memory_failure
> is deferred and called from kworker context or something like that. If it's possible, this is
> another scenario that needs to be considered.

Yes, it is possible.

But kill_accessing_process() will only be called with MF_ACTION_REQUIRED.
MF_ACTION_REQUIRED indicates that the current process is exactly the one
accessing the poisoned data.

On the x86 platform, the GHES driver may queue a kworker to defer
memory_failure() with flag=0. So kill_accessing_process() will not be called
in such a case.

Thanks.
Shuai
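
P.S. To make the return-value reasoning in this mail easier to check, here is a
minimal userspace sketch in C. It is not kernel code. All model_* names are
hypothetical, walk_ret stands in for the walk_page_range() result (0 or 1), and
the sketch assumes priv.tk.addr is set whenever the walk returns 1. The caller
model is a simplification based on the kill_me_maybe() comment quoted earlier
in the thread: it sends its own SIGBUS unless the result is 0, -EHWPOISON or
-EOPNOTSUPP.

```c
#include <assert.h>
#include <stdbool.h>

#define EFAULT      14 /* Bad address */
#define EOPNOTSUPP  95 /* Operation not supported */
#define EHWPOISON  133 /* Memory page has hardware error */

/*
 * Hypothetical model of the return-value mapping in
 * kill_accessing_process(). patched == false models the current
 * "ret > 0 ? -EHWPOISON : -EFAULT"; patched == true models the
 * "ret > 0 ? -EHWPOISON : 0" variant discussed in this mail.
 */
static int model_kill_accessing_process(int walk_ret, bool patched,
					bool *sigbus_sent)
{
	/* Stands in for kill_proc(); assumes priv.tk.addr is set. */
	*sigbus_sent = (walk_ret == 1);
	if (walk_ret > 0)
		return -EHWPOISON;
	return patched ? 0 : -EFAULT;
}

/* Simplified caller: true if it would send its own SIGBUS. */
static bool model_caller_sends_sigbus(int ret)
{
	return !(ret == 0 || ret == -EHWPOISON || ret == -EOPNOTSUPP);
}

/* Total number of SIGBUS signals the process would receive. */
static int model_sigbus_count(int walk_ret, bool patched)
{
	bool sent;
	int ret = model_kill_accessing_process(walk_ret, patched, &sent);

	return (int)sent + (int)model_caller_sends_sigbus(ret);
}

/* Walks through the cases from the thread. */
static void model_self_check(void)
{
	assert(model_sigbus_count(0, false) == 1); /* clean page, unpatched: spurious SIGBUS */
	assert(model_sigbus_count(0, true) == 0);  /* clean page, patched: no SIGBUS */
	assert(model_sigbus_count(1, false) == 1); /* PTE found: one SIGBUS from kill_proc() */
	assert(model_sigbus_count(1, true) == 1);  /* unchanged by the patch */
}
```

Calling model_self_check() from any small test program exercises the four
cases. In this model, only the clean-page result changes between the two
variants.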