Re: [RFC] Kernel Support of Memory Error Detection.

From: Yazen Ghannam <yazen.ghannam@amd.com>
To: Jiaqi Yan <jiaqiyan@google.com>
Cc: "Luck, Tony" <tony.luck@intel.com>,
	"HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"Vilas.Sridharan@amd.com" <vilas.sridharan@amd.com>,
	"David Rientjes" <rientjes@google.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"david@redhat.com" <david@redhat.com>,
	"Aktas, Erdem" <erdemaktas@google.com>,
	"pgonda@google.com" <pgonda@google.com>,
	"Hsiao, Duen-wen" <duenwen@google.com>,
	"Malvestuto, Mike" <mike.malvestuto@intel.com>,
	"gthelen@google.com" <gthelen@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"jthoughton@google.com" <jthoughton@google.com>
Subject: Re: [RFC] Kernel Support of Memory Error Detection.
Date: Wed, 14 Dec 2022 14:45:03 +0000
Message-ID: <Y5nhb3VQvLnLp8tP@yaz-fattaah>
In-Reply-To: <CACw3F52kBRn6fBHpZAof-0eCrNhQfMy65h3gGBeh5_w6eY68fw@mail.gmail.com>

On Tue, Dec 13, 2022 at 11:03:52AM -0800, Jiaqi Yan wrote:
> On Tue, Dec 13, 2022 at 10:10 AM Luck, Tony <tony.luck@intel.com> wrote:
> >
> > > I think that one point not mentioned yet is how the in-kernel scanner finds
> > > a broken page before the page is marked by PG_hwpoison.  Some mechanism
> > > similar to mcsafe-memcpy could be used, but maybe memcpy is not necessary
> > > because we just want to check the healthiness of pages.  So a core routine
> > > like mcsafe-read would be introduced in the first patchset (or we already
> > > have it)?
> >
> > I don’t think that there is an existing routine to do the mcsafe-read. But it should
> > be easy enough to write one.  If an architecture supports a way to do this without
> > evicting other data from caches, that would be a bonus. X86 has a non-temporal
> > read that could be interesting ... but I'm not sure that it would detect poison
> > synchronously. I could be wrong, but I expect that you won’t see a machine check,
> > but you should see the memory controller log a UCNA error reported by a CMCI.
> >
> > -Tony
> 
> To Naoya: yes, we will introduce a new scanning routine. It "touches"
> a page cacheline by cacheline to detect memory errors. Each "touch"
> is essentially an ANDQ of the loaded cacheline data with 0, so that
> no user data is left in the register.
> 
> To Tony: thanks. I think you are referring to PREFETCHNTA before ANDQ?
> (which we are using in our scanning routine to minimize cache
> pollution.) We tested the attached scanning draft on Intel Skylake,
> Cascade Lake, and Ice Lake CPUs, and the ANDQ instruction does raise
> an MC synchronously when it encounters an injected memory error.
> 
> To Yazen and Vilas: We haven't tested on any AMD hardware. Do you have
> any thoughts on PREFETCHNTA + MC?
>
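
For readers following along, the "touch" loop described above could look
roughly like the sketch below. This is not the draft attached to the thread:
the helper name touch_cachelines(), the 64-byte CACHELINE_SIZE, and the
user-space framing are all assumptions made for illustration. It issues
PREFETCHNTA and then ANDQ into a zeroed register for each full cacheline,
and falls back to a plain volatile read on non-x86-64 builds. It does not
handle the machine check itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SIZE 64 /* assumption: 64-byte cachelines */

/*
 * Hypothetical helper: touch every full cacheline in [buf, buf + len).
 * Returns the number of cachelines touched. A trailing partial line is
 * skipped so the 8-byte load never reads past the buffer.
 */
static size_t touch_cachelines(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t touched = 0;

    /* Pages are cacheline-aligned; reject anything else in this sketch. */
    assert(((uintptr_t)buf % CACHELINE_SIZE) == 0);

    for (size_t off = 0; off + CACHELINE_SIZE <= len; off += CACHELINE_SIZE) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
        uint64_t scratch = 0;

        /*
         * Non-temporal prefetch hint to limit cache pollution, then AND
         * the loaded quadword into a zeroed register. The result is
         * always 0, so no data from the page stays in the register.
         */
        __asm__ volatile("prefetchnta %1\n\t"
                         "andq %1, %0"
                         : "+r"(scratch)
                         : "m"(*(const uint64_t *)(p + off))
                         : "cc");
        (void)scratch;
#else
        /* Portable fallback: force one read per cacheline. */
        (void)*(const volatile unsigned char *)(p + off);
#endif
        touched++;
    }
    return touched;
}
```

On a 4096-byte page this touches 64 cachelines. Whether the load reports
poison synchronously is exactly the hardware-dependent question under
discussion; the sketch only shows the access pattern.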

Hi Jiaqi,

I'm not sure how PREFETCHNTA behaves with respect to machine checks on AMD
hardware. I think it'll require some experimentation. The AMD APM has the
following statement in the "PREFETCHlevel" description:

  "The operation of this instruction is implementation-dependent."

So the behavior may differ between products. Maybe this procedure should
be opt-in and apply only to products that have been verified to work?
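
One way to picture the opt-in idea is an allowlist that the scanner checks
before it runs, with everything unlisted disabled by default. The sketch
below is only an illustration: struct cpu_id, scan_verified[], and
scan_supported() are hypothetical names, and the family/model values in the
table are placeholders, not a record of which parts were actually tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a CPU product. */
struct cpu_id {
    const char *vendor;  /* CPUID vendor string, e.g. "GenuineIntel" */
    unsigned int family;
    unsigned int model;
};

/*
 * Placeholder allowlist. The entries are illustrative values only;
 * a real list would hold products verified by experiment.
 */
static const struct cpu_id scan_verified[] = {
    { "GenuineIntel", 6, 0x55 },
    { "GenuineIntel", 6, 0x6a },
};

/* Return true only if 'cpu' matches an entry in 'list' (opt-in). */
static bool scan_supported(const struct cpu_id *cpu,
                           const struct cpu_id *list, size_t n)
{
    assert(cpu != NULL);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(cpu->vendor, list[i].vendor) == 0 &&
            cpu->family == list[i].family &&
            cpu->model == list[i].model)
            return true;
    }
    return false; /* anything unverified stays disabled */
}
```

A product absent from the list, or an empty list, leaves the scanner off,
which matches the "only apply to verified products" default.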

Thanks,
Yazen
