From: Miaohe Lin <linmiaohe@huawei.com>
To: Jiaqi Yan <jiaqiyan@google.com>, Kyle Meyer <kyle.meyer@hpe.com>
Cc: <akpm@linux-foundation.org>, <david@redhat.com>,
<tony.luck@intel.com>, <bp@alien8.de>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>, <linux-edac@vger.kernel.org>,
<lorenzo.stoakes@oracle.com>, <Liam.Howlett@oracle.com>,
<vbabka@suse.cz>, <rppt@kernel.org>, <surenb@google.com>,
<mhocko@suse.com>, <nao.horiguchi@gmail.com>,
<jane.chu@oracle.com>, <osalvador@suse.de>
Subject: Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
Date: Mon, 25 Aug 2025 11:04:43 +0800 [thread overview]
Message-ID: <14a0dd45-388d-7a32-5ee5-44e60277271a@huawei.com> (raw)
In-Reply-To: <CACw3F527M-x=UeB0tN_sjCgp_YP_8yLkVRCLP2dLO2bzXrqWsA@mail.gmail.com>
On 2025/8/22 8:24, Jiaqi Yan wrote:
> On Thu, Aug 21, 2025 at 12:36 PM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>>
>> On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
>>> On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>>>>
>>>> Calling action_result() on already poisoned pages causes issues:
>>>>
>>>> * The amount of hardware corrupted memory is incorrectly incremented.
>>>> * NUMA node memory failure statistics are incorrectly updated.
>>>> * Redundant "already poisoned" messages are printed.
>>>
>>> All agreed.
>>>
>>>>
>>>> Do not call action_result() on already poisoned pages and drop unused
>>>> MF_MSG_ALREADY_POISONED.
>>>
>>> Hi Kyle,
>>>
>>> Patch looks great to me, just one thought...
Thanks both.
>>>
>>> Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
>>> but changing action_result for MF_MSG_ALREADY_POISONED?
>>> - don't num_poisoned_pages_inc(pfn)
>>> - don't update_per_node_mf_stats(pfn, result)
>>> - still pr_err("%#lx: recovery action for %s: %s\n", ...)
>>> - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
>>> in memory_failure and try_memory_failure_hugetlb
>>
>> I did consider that approach but I was concerned about passing
>> MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The message is a
>> bit misleading.
>
> Based on my reading the documentation for MF_* in static const char
> *action_name[]...
>
> Yeah, for file mapped pages, kernel may not have hole-punched or
> truncated it from the file mapping (shmem and hugetlbfs for example)
> but that still considered as MF_RECOVERED, so touching a page with
> HWPoison flag doesn't mean that page was failed to be recovered
> previously.
>
> For pages intended to be taken out of the buddy system, touching a
> page with HWPoison flag does imply it isn't isolated and hence
> MF_FAILED.
There should be other cases that memory_failure failed to isolate the
hwpoisoned pages at first time due to various reasons.
>
> In summary, seeing the HWPoison flag again doesn't necessarily
> indicate what the recovery result was previously; it only indicate
> kernel won't re-attempt to recover?
Yes, kernel won't re-attempt to or just cannot recover.
>
>>
>> How about introducing a new MF action result? Maybe MF_NONE? The message could
>> look something like:
>
> Adding MF_NONE sounds fine to me, as long as we correctly document its
> meaning, which can be subtle.
Adding a new MF action result sounds good to me. But IMHO MF_NONE might not be that suitable
as kill_accessing_process might be called to kill proc in this case, so it's not "NONE".
Thanks.
.
next prev parent reply other threads:[~2025-08-25 3:04 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-21 16:44 Kyle Meyer
2025-08-21 18:23 ` Jiaqi Yan
2025-08-21 19:36 ` Kyle Meyer
2025-08-22 0:24 ` Jiaqi Yan
2025-08-25 3:04 ` Miaohe Lin [this message]
2025-08-25 16:09 ` Kyle Meyer
2025-08-25 22:36 ` jane.chu
2025-08-26 1:56 ` Kyle Meyer
2025-08-26 17:24 ` jane.chu
2025-08-26 19:27 ` Kyle Meyer
2025-08-26 21:22 ` Jiaqi Yan
2025-08-27 8:06 ` jane.chu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=14a0dd45-388d-7a32-5ee5-44e60277271a@huawei.com \
--to=linmiaohe@huawei.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=david@redhat.com \
--cc=jane.chu@oracle.com \
--cc=jiaqiyan@google.com \
--cc=kyle.meyer@hpe.com \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=nao.horiguchi@gmail.com \
--cc=osalvador@suse.de \
--cc=rppt@kernel.org \
--cc=surenb@google.com \
--cc=tony.luck@intel.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox