linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jiaqi Yan <jiaqiyan@google.com>
To: Kyle Meyer <kyle.meyer@hpe.com>, linmiaohe@huawei.com
Cc: akpm@linux-foundation.org, david@redhat.com, tony.luck@intel.com,
	 bp@alien8.de, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 linux-edac@vger.kernel.org, lorenzo.stoakes@oracle.com,
	 Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com,  mhocko@suse.com, nao.horiguchi@gmail.com,
	jane.chu@oracle.com,  osalvador@suse.de
Subject: Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
Date: Thu, 21 Aug 2025 17:24:13 -0700	[thread overview]
Message-ID: <CACw3F527M-x=UeB0tN_sjCgp_YP_8yLkVRCLP2dLO2bzXrqWsA@mail.gmail.com> (raw)
In-Reply-To: <aKd1K3ueTacGTf1W@hpe.com>

On Thu, Aug 21, 2025 at 12:36 PM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>
> On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
> > On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> > >
> > > Calling action_result() on already poisoned pages causes issues:
> > >
> > > * The amount of hardware corrupted memory is incorrectly incremented.
> > > * NUMA node memory failure statistics are incorrectly updated.
> > > * Redundant "already poisoned" messages are printed.
> >
> > All agreed.
> >
> > >
> > > Do not call action_result() on already poisoned pages and drop unused
> > > MF_MSG_ALREADY_POISONED.
> >
> > Hi Kyle,
> >
> > Patch looks great to me, just one thought...
> >
> > Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
> > but changing action_result for MF_MSG_ALREADY_POISONED?
> > - don't num_poisoned_pages_inc(pfn)
> > - don't update_per_node_mf_stats(pfn, result)
> > - still pr_err("%#lx: recovery action for %s: %s\n", ...)
> > - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
> > in memory_failure and try_memory_failure_hugetlb
>
> I did consider that approach but I was concerned about passing
> MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The message is a
> bit misleading.

Based on my reading the documentation for MF_* in static const char
*action_name[]...

Yeah, for file mapped pages, kernel may not have hole-punched or
truncated it from the file mapping (shmem and hugetlbfs for example)
but that still considered as MF_RECOVERED, so touching a page with
HWPoison flag doesn't mean that page was failed to be recovered
previously.

For pages intended to be taken out of the buddy system, touching a
page with HWPoison flag does imply it isn't isolated and hence
MF_FAILED.

In summary, seeing the HWPoison flag again doesn't necessarily
indicate what the recovery result was previously; it only indicate
kernel won't re-attempt to recover?

>
> How about introducing a new MF action result? Maybe MF_NONE? The message could
> look something like:

Adding MF_NONE sounds fine to me, as long as we correctly document its
meaning, which can be subtle.

Let's see what Miaohe's thoughts are.

>
> Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: None
>
> > This way, all the MF recovery result kernel logs out will be sitting
> > in one place, action_result, instead of scattering around all over the
> > place.
>
> That sounds better to me.
>
> > >
> > > Fixes: b8b9488d50b7 ("mm/memory-failure: improve memory failure action_result messages")
> > > Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> > > ---
> > >  include/linux/mm.h      | 1 -
> > >  include/ras/ras_event.h | 1 -
> > >  mm/memory-failure.c     | 3 ---
> > >  3 files changed, 5 deletions(-)
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 1ae97a0b8ec7..09ce81ef7afc 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -4005,7 +4005,6 @@ enum mf_action_page_type {
> > >         MF_MSG_BUDDY,
> > >         MF_MSG_DAX,
> > >         MF_MSG_UNSPLIT_THP,
> > > -       MF_MSG_ALREADY_POISONED,
> > >         MF_MSG_UNKNOWN,
> > >  };
> > >
> > > diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> > > index c8cd0f00c845..f62a52f5bd81 100644
> > > --- a/include/ras/ras_event.h
> > > +++ b/include/ras/ras_event.h
> > > @@ -374,7 +374,6 @@ TRACE_EVENT(aer_event,
> > >         EM ( MF_MSG_BUDDY, "free buddy page" )                          \
> > >         EM ( MF_MSG_DAX, "dax page" )                                   \
> > >         EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )                        \
> > > -       EM ( MF_MSG_ALREADY_POISONED, "already poisoned" )              \
> > >         EMe ( MF_MSG_UNKNOWN, "unknown page" )
> > >
> > >  /*
> > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > index e2e685b971bb..7839ec83bc1d 100644
> > > --- a/mm/memory-failure.c
> > > +++ b/mm/memory-failure.c
> > > @@ -948,7 +948,6 @@ static const char * const action_page_types[] = {
> > >         [MF_MSG_BUDDY]                  = "free buddy page",
> > >         [MF_MSG_DAX]                    = "dax page",
> > >         [MF_MSG_UNSPLIT_THP]            = "unsplit thp",
> > > -       [MF_MSG_ALREADY_POISONED]       = "already poisoned",
> > >         [MF_MSG_UNKNOWN]                = "unknown page",
> > >  };
> > >
> > > @@ -2090,7 +2089,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
> > >                 if (flags & MF_ACTION_REQUIRED) {
> > >                         folio = page_folio(p);
> > >                         res = kill_accessing_process(current, folio_pfn(folio), flags);
> > > -                       action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> > >                 }
> > >                 return res;
> > >         } else if (res == -EBUSY) {
> > > @@ -2283,7 +2281,6 @@ int memory_failure(unsigned long pfn, int flags)
> > >                         res = kill_accessing_process(current, pfn, flags);
> > >                 if (flags & MF_COUNT_INCREASED)
> > >                         put_page(p);
> > > -               action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> > >                 goto unlock_mutex;
> > >         }
> > >
> > > --
> > > 2.50.1
> > >
> > >


  reply	other threads:[~2025-08-22  0:24 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-21 16:44 Kyle Meyer
2025-08-21 18:23 ` Jiaqi Yan
2025-08-21 19:36   ` Kyle Meyer
2025-08-22  0:24     ` Jiaqi Yan [this message]
2025-08-25  3:04       ` Miaohe Lin
2025-08-25 16:09         ` Kyle Meyer
2025-08-25 22:36           ` jane.chu
2025-08-26  1:56             ` Kyle Meyer
2025-08-26 17:24               ` jane.chu
2025-08-26 19:27                 ` Kyle Meyer
2025-08-26 21:22                   ` Jiaqi Yan
2025-08-27  8:06                     ` jane.chu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACw3F527M-x=UeB0tN_sjCgp_YP_8yLkVRCLP2dLO2bzXrqWsA@mail.gmail.com' \
    --to=jiaqiyan@google.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=david@redhat.com \
    --cc=jane.chu@oracle.com \
    --cc=kyle.meyer@hpe.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=nao.horiguchi@gmail.com \
    --cc=osalvador@suse.de \
    --cc=rppt@kernel.org \
    --cc=surenb@google.com \
    --cc=tony.luck@intel.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox