linux-mm.kvack.org archive mirror
From: Shiyang Ruan <ruansy.fnst@fujitsu.com>
To: "Luck, Tony" <tony.luck@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@Huawei.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"dave@stgolabs.net" <dave@stgolabs.net>,
	"Weiny, Ira" <ira.weiny@intel.com>,
	"Schofield, Alison" <alison.schofield@intel.com>,
	"Jiang, Dave" <dave.jiang@intel.com>,
	"Verma, Vishal L" <vishal.l.verma@intel.com>,
	Borislav Petkov <bp@alien8.de>, James Morse <james.morse@arm.com>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	Robert Richter <rric@kernel.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Naoya Horiguchi <nao.horiguchi@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device
Date: Wed, 26 Jun 2024 14:03:03 +0800
Message-ID: <bc58d99a-785f-4bb3-a9c9-9cf50ea7e06d@fujitsu.com>
In-Reply-To: <SJ1PR11MB6083837A8588894E49FEBC7BFCC92@SJ1PR11MB6083.namprd11.prod.outlook.com>



On 2024/6/22 4:44, Luck, Tony wrote:
>> So who actually cares about recovering poisoned volatile memory?
>> I'd like to understand more on how significant a use case this is.
>> Whilst I can conjecture that it's an extreme case of wanting to avoid
>> losing the ability to create 1GiB or larger pages due to poison,
>> is that a real problem for anyone today?  Note this is just the case
>> where you've reached an actual uncorrectable error and probably
>> / possibly killed something, not the more common soft offlining
>> of memory due to correctable errors being detected.
> 
> I guess you really need a reply from someone with a data center
> with thousands of machines, since that's where this question
> may be important.
> 
> My humble opinion is that, outside of the huge page issue, nobody
> should try to recover a poisoned page. Systems that can report
> and recover from poison have tens, hundreds, or more GBytes
> of memory. Dropping 4K pages will not have any measurable
> impact on a system (even if there are hundreds of pages dropped).
> 
> There's no reliable way to determine whether the poisoned page
> was due to some transient issue, or a permanent defect. Recovering
> a poisoned page runs the risk that the poison will re-occur. Perhaps
> next use of the page will be in some unrecoverable (kernel) context.
> 
> So recovery has some risk, but very little upside benefit.

Since the hardware provides an instruction (CPU) or command (CXL) to clear
the poison, we could make the function work, at least as an optional
feature.  Users could then decide whether to use it after evaluating the
risks and benefits.

I think supporting recovery is a further improvement step, and it may need
a lot of discussion.  I'm not sure we can reach a conclusion in this thread.
I just hope for more comments on the original problem (duplicate reports)
that this patch is meant to solve.


--
Thanks,
Ruan.

> 
> -Tony


