From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17839C02198 for ; Fri, 14 Feb 2025 07:59:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 79E706B0082; Fri, 14 Feb 2025 02:59:41 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 74DCB6B0083; Fri, 14 Feb 2025 02:59:41 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 615D96B0085; Fri, 14 Feb 2025 02:59:41 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 423356B0082 for ; Fri, 14 Feb 2025 02:59:41 -0500 (EST) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id C421E161FEB for ; Fri, 14 Feb 2025 07:59:40 +0000 (UTC) X-FDA: 83117800920.21.302F470 Received: from out30-100.freemail.mail.aliyun.com (out30-100.freemail.mail.aliyun.com [115.124.30.100]) by imf18.hostedemail.com (Postfix) with ESMTP id D44AF1C000A for ; Fri, 14 Feb 2025 07:59:37 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=linux.alibaba.com header.s=default header.b=AIoVn8xO; spf=pass (imf18.hostedemail.com: domain of xueshuai@linux.alibaba.com designates 115.124.30.100 as permitted sender) smtp.mailfrom=xueshuai@linux.alibaba.com; dmarc=pass (policy=none) header.from=linux.alibaba.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1739519978; a=rsa-sha256; cv=none; b=cToIffFCb5FACihuXMTGsgkAjF49KNAahFCjaKixu6/ZOHe2/i8tlowUDS6mvSnCQ9Jbku 4tPYRZUia1/J4ciyW8MhtAoqUPEnErirqLT3PhycN16p69DY/DqD3p/FFy2oi4VElVT5JN 7nQa1Y7FmvjPC9u5MU7HefmvsKd7CmQ= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=linux.alibaba.com header.s=default header.b=AIoVn8xO; spf=pass (imf18.hostedemail.com: domain of xueshuai@linux.alibaba.com designates 115.124.30.100 as permitted sender) smtp.mailfrom=xueshuai@linux.alibaba.com; dmarc=pass (policy=none) header.from=linux.alibaba.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1739519978; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=7iw9G8i5r7l7ZNPI5yNsJDTBKaKr+iXP00E00D8TM0w=; b=OTXpcZLt7S9QHoPE7mOM8kLRT/MEy/Jla+smVFzuDRiQQIjUpAs12sAt6II5Ek89I4+6ol JwOGNPbGMYymDZQaS0aosAQfXGH8RzEPS9PVCjdKm7ypg9hFwATfQjWSa9ebk1gOUAA78k LuCsXvTgrRG4HriFjOHLxm6+FQ1ckfI= DKIM-Signature:v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1739519975; h=Message-ID:Date:MIME-Version:Subject:To:From:Content-Type; bh=7iw9G8i5r7l7ZNPI5yNsJDTBKaKr+iXP00E00D8TM0w=; b=AIoVn8xOtN0QA0+9SPqH+7TcQSp9Ho3BVxdLFyxxaKaFMQoXTGOLjNsUrVY4+HJZfJTZTFvl73roL+5PlAdVUedG/kJLw0MxlxiXrMowP8iuqAmznFf9DdfQEt2gWi9MJQnCoU6PsdqPFMr2kGVOvPguMexKCtJwTijlF85DQOg= Received: from 30.246.161.128(mailfrom:xueshuai@linux.alibaba.com fp:SMTPD_---0WPQ9Cb5_1739519972 cluster:ay36) by smtp.aliyun-inc.com; Fri, 14 Feb 2025 15:59:33 +0800 Message-ID: <9a9322df-bcd6-4ff5-bbec-1292bb5978d0@linux.alibaba.com> Date: Fri, 14 Feb 2025 15:59:31 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v1 4/4] mm/hwpoison: Fix incorrect "not recovered" report for recovered clean pages To: Miaohe Lin Cc: tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, akpm@linux-foundation.org, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, baolin.wang@linux.alibaba.com, tianruidong@linux.alibaba.com, tony.luck@intel.com, bp@alien8.de, "nao.horiguchi@gmail.com" References: <20250211060200.33845-1-xueshuai@linux.alibaba.com> <20250211060200.33845-5-xueshuai@linux.alibaba.com> <5f116840-60df-c6d9-d7ff-dcf1dce7773f@huawei.com> <3820329d-20e3-49ee-a329-aac7393c6df3@linux.alibaba.com> <23251c74-cc50-012c-409f-c45117b52b16@huawei.com> <84ed4048-606e-47bf-98ad-d850cf30d60d@linux.alibaba.com> <9f54f518-2be4-7e44-0d6e-c03c53149b97@huawei.com> From: Shuai Xue In-Reply-To: <9f54f518-2be4-7e44-0d6e-c03c53149b97@huawei.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: D44AF1C000A X-Stat-Signature: mzz1sp4847z6z6asd11htkxwmfew766a X-Rspam-User: X-Rspamd-Server: rspam10 X-HE-Tag: 1739519977-325177 X-HE-Meta: U2FsdGVkX1+txlsgye+sfTH2y54fPFIk09lSVkiYAP0xBbnWdBoNFqmWQGfx+/nALHTicVuKxX/tHG5aSNfmpXJZg/5vVD23dMWuyGOxr0Lh6nmJ5qgDxuHAeY0jAJZEcMDYSxUU+tPnjSRYwKk65s8NlXZeUwkaRU4UnqZauZnEEF5GmsTMSPbLito84onsORrDtBHER1aCBq6VTzgHezH9rOXywnzQdCwb6b74mMrB/hoXwzjkYDmQUzmc+QYK9JD+FXeHOW6ZsgyKBgxsT4QnUpaMJ7HEeGDg89oDIZJFh5VYy9/bhwg5mG9Rvu7sqaoQZVCYtcP1Mz+Gdqr5Yu8MUBxoYG3fBQ3zXmwwotSyTE7qXrUZMXUAykpEDge3zVo465osqziL5kZelQ8vQ+SxbSxVFs2Lh7XyhYkrhJtdPja5A8xxM+W7alstEKGqKtW0k/tmbzkO5AlvIeEAe7VHn8CGgnzAgJisEKK4FK6Ky7K5yZiObYCdvDlLyopsSvmEmLUXw8bVwqybkx3r0ik6rBGCjcMuZrDIzx7d+3eBzPyhgkNQqLcnKN3vIVlCSpJ26WZzFG3nl71IDh8Fh+jJ06erZNzlwJxpXYc/jQ6uOJW3JXi9KnK16YdNXjqfk/FzACY9tFkPDhe8cVuSmmp6ledsEekay7ybnL1uVj/n+rmUzW0LPS1M9A713Db/O8dE3aGfjhcAqQA4fFpa58+6DCxSuYID+uPcrjB8GX2IfY02J/oKwbMoxFR+V8ZAcnW8yVOlvGzifttoTy9IkhkCWO2RjmLmzD4BOqvcVhPbHIvG13+z4i5S0D3zYcQVB2KSDz8S3Vj+KTT48Gtg161Gu/KObhIjJ6oqfBV87XeDrnLWVqAcHkcwrxdmH0C/Mrs5UfCpS92nMKm3GUVeu6N/2ByBWduxOLtnIOLNQtbgvPKbi9PTpUoHKR8+f7c4T35WwsntQhAHa2lD3IG GC6TD2yN 3k4ZgwUmnDhzn+CSzYU6ykC62u6/EyhWh4LrAHS+bh+aSF0MfmyYF0IXQisD5heDwe7prhi1Gp4N0QuVZHmfOnhXT5tIub/g0cmH/Md4pxKncVE+aj7Nq0krbjlPzRl247LwSR2JMNA+rptgRhLo1SHfKTsOV3XR+dZb8bEdQVhh8+c4JljE8ChwanDUs3/7Ae2LNIihvA7Y+8iU/8HPgOFBjcg3ZeZynd1RRF09ilITvbFmGkyosn68vIIr4oE+rjRrFti8w6Rzb8tgBkL5i44mc2jAS/p9H1O7gyUaV3lsq9YB8/ZpOrTgOhGaAOPz4cfg1+eDvfemIIoMEFRjH9bSMlRvFOqrXSYdFlTCqi+zaPy0ROyP5T/aSyJhLTdCXWCYMmRAhanTtpqnwSCOqW3jdNVANQnnm5ZuHhN0Q/7iebEfIVVDcbxD34g== X-Bogosity: Ham, tests=bogofilter, spamicity=0.002329, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: 在 2025/2/14 14:54, Miaohe Lin 写道: > On 2025/2/13 14:59, Shuai Xue wrote: >> >> >> 在 2025/2/13 11:20, Miaohe Lin 写道: >>> On 2025/2/12 21:55, Shuai Xue wrote: >>>> >>>> >>>> 在 2025/2/12 16:09, Miaohe Lin 写道: >>>>> On 2025/2/11 14:02, Shuai Xue wrote: >>>>>> When an uncorrected memory error is consumed there is a race between >>>>>> the CMCI from the memory controller reporting an uncorrected error >>>>>> with a UCNA signature, and the core reporting and SRAR signature >>>>>> machine check when the data is about to be consumed. >>>>>> >>>>>> If the CMCI wins that race, the page is marked poisoned when >>>>>> uc_decode_notifier() calls memory_failure(). For dirty pages, >>>>>> memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag, >>>>>> converting the PTE to a hwpoison entry. However, for clean pages, the >>>>>> TTU_HWPOISON flag is cleared, leaving the PTE unchanged and not converted >>>>>> to a hwpoison entry. Consequently, for an unmapped dirty page, the PTE is >>>>>> marked as a hwpoison entry allowing kill_accessing_process() to: >>>>>> >>>>>> - call walk_page_range() and return 1 >>>>>> - call kill_proc() to make sure a SIGBUS is sent >>>>>> - return -EHWPOISON to indicate that SIGBUS is already sent to the process >>>>>>     and kill_me_maybe() doesn't have to send it again. >>>>>> >>>>>> Conversely, for clean pages where PTE entries are not marked as hwpoison, >>>>>> kill_accessing_process() returns -EFAULT, causing kill_me_maybe() to send a >>>>>> SIGBUS. >>>>>> >>>>>> Console log looks like this: >>>>>> >>>>>>       Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects >>>>>>       Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered >>>>>>       Memory failure: 0x827ca68: already hardware poisoned >>>>>>       mce: Memory error not recovered >>>>>> >>>>>> To fix it, return -EHWPOISON if no hwpoison PTE entry is found, preventing >>>>>> an unnecessary SIGBUS. >>>>> >>>>> Thanks for your patch. >>>>> >>>>>> >>>>>> Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"") >>>>>> Signed-off-by: Shuai Xue >>>>>> --- >>>>>>    mm/memory-failure.c | 5 ++--- >>>>>>    1 file changed, 2 insertions(+), 3 deletions(-) >>>>>> >>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>>>>> index 995a15eb67e2..f9a6b136a6f0 100644 >>>>>> --- a/mm/memory-failure.c >>>>>> +++ b/mm/memory-failure.c >>>>>> @@ -883,10 +883,9 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn, >>>>>>                      (void *)&priv); >>>>>>        if (ret == 1 && priv.tk.addr) >>>>>>            kill_proc(&priv.tk, pfn, flags); >>>>>> -    else >>>>>> -        ret = 0; >>>>>>        mmap_read_unlock(p->mm); >>>>>> -    return ret > 0 ? -EHWPOISON : -EFAULT; >>>>>> + >>>>>> +    return ret >= 0 ? -EHWPOISON : -EFAULT; >>>>> >>>>> IIUC, kill_accessing_process() is supposed to return -EHWPOISON to notify that SIGBUS is already >>>>> sent to the process and kill_me_maybe() doesn't have to send it again. But with your change, >>>>> kill_accessing_process() will return -EHWPOISON even if SIGBUS is not sent. Does this break >>>>> the semantics of -EHWPOISON? >>>> >>>> Yes, from the comment of kill_me_maybe(), >>>> >>>>       * -EHWPOISON from memory_failure() means that it already sent SIGBUS >>>>       * to the current process with the proper error info, >>>>       * -EOPNOTSUPP means hwpoison_filter() filtered the error event, >>>> >>>> this patch break the comment. >>>> >>>> But the defination of EHWPOISON is quite different from the comment. >>>> >>>>   #define EHWPOISON    133    /* Memory page has hardware error */ >>>> >>>> As for this issue, returning 0 or EHWPOISON can both prevent a SIGBUS signal >>>> from being sent in kill_me_maybe(). >>>> >>>> Which way do you prefer? >>>> >>>>> >>>>> BTW I scanned the code of walk_page_range(). It seems with implementation of hwpoison_walk_ops >>>>> walk_page_range() will only return 0 or 1, i.e. always >= 0. So kill_accessing_process() will always >>>>> return -EHWPOISON if this patch is applied. >>>>> >>>>> Correct me if I miss something. >>>> >>>> Yes, you are right. Let's count the cases one by one: >>>> >>>> 1. clean page: try_to_remap(!TTU_HWPOISON), walk_page_range() will return 0 and >>> >>> Do you mean try_to_unmap? >> >> Yes, sorry for the typo. >>> >>>> we should not send sigbus in kill_me_maybe(). >>>> >>>> 2. dirty page: >>>> 2.1 MCE wins race >>>>            CMCI:w/o Action Require         MCE: w/ Action Require >>>>                                        TestSetPageHWPoison >>>>        TestSetPageHWPoison >>>>        return -EHWPOISON >>>>                                        try_to_unmap(TTU_HWPOISON) >>>>                                        kill_proc in hwpoison_user_mappings() >>>> >>>> If MCE wins the race, because the flag of memory_fialure() called by CMCI is >>>> not set as MF_ACTION_REQUIRED, everything goes well, kill_proc() will send >>>> SIGBUS in hwpoison_user_mappings(). >>>> >>>> 2.2 CMCI win >>>>            CMCI:w/o Action Require         MCE: w/ Action Require >>>>      TestSetPageHWPoison >>>>      try_to_unmap(TTU_HWPOISON) >>>>                                         walk_page_range() return 1 due to hwpoison PTE entry >>>>                                         kill_proc in kill_accessing_process() >>>> >>>> If the CMCI wins the race, we need to kill the process in >>>> kill_accessing_process(). And if try_to_remap() success, everything goes well, >>>> kill_proc() will send SIGBUS in kill_accessing_process(). >>>> >>>> But if try_to_remap() fails, the PTE entry will not be marked as hwpoison, and >>>> walk_page_range() return 0 as case 1 clean page, NO SIGBUS will be sent. >>> >>> If try_to_unmap() fails, the PTE entry will still point to the dirty page. Then in >>> check_hwpoisoned_entry(), we will have pfn == poisoned_pfn. So walk_page_range() >>> will return 1 in this case. Or am I miss something? >>> >> >> You’re right; I overlooked the pte_present() branch. >> >> Therefore, in the walk_page_range() function: >> - It returns 0 when the poison page is a clean page. >> - It returns 1 when CMCI wins, regardless of whether try_to_unmap succeeds >>   or fails. >> >> Then the patch will be like: >> >> @@ -883,10 +883,9 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn, >>                    (void *)&priv); >>      if (ret == 1 && priv.tk.addr) >>          kill_proc(&priv.tk, pfn, flags); >> -    else >> -        ret = 0; >>      mmap_read_unlock(p->mm); >> -    return ret > 0 ? -EHWPOISON : -EFAULT; >> + >> +    return ret > 0 ? -EHWPOISON : 0; >> >> Here, returning 0 indicates that memory_failure() successfully handled the >> error by dropping the clean page. > > I'm not sure whether there's another scene that can make walk_page_range() returns 0. But if the > only reason for walk_page_range() returning 0 is the poison page is a clean page and it's dropped, > then this modification should be appropriate. With this change, the callers never send SIGBUS now. > They might need to be changed too. Yes, if memory_failure() successfully handled the error, the caller should be nothing. Thanks. Shuai