From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Tony Luck <tony.luck@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Shiju Jose <shiju.jose@huawei.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"dave@stgolabs.net" <dave@stgolabs.net>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	"alison.schofield@intel.com" <alison.schofield@intel.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"david@redhat.com" <david@redhat.com>,
	"Vilas.Sridharan@amd.com" <Vilas.Sridharan@amd.com>,
	"leo.duran@amd.com" <leo.duran@amd.com>,
	"Yazen.Ghannam@amd.com" <Yazen.Ghannam@amd.com>,
	"rientjes@google.com" <rientjes@google.com>,
	"jiaqiyan@google.com" <jiaqiyan@google.com>,
	"Jon.Grimm@amd.com" <Jon.Grimm@amd.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"lenb@kernel.org" <lenb@kernel.org>,
	"naoya.horiguchi@nec.com" <naoya.horiguchi@nec.com>,
	"james.morse@arm.com" <james.morse@arm.com>,
	"jthoughton@google.com" <jthoughton@google.com>,
	"somasundaram.a@hpe.com" <somasundaram.a@hpe.com>,
	"erdemaktas@google.com" <erdemaktas@google.com>,
	"pgonda@google.com" <pgonda@google.com>,
	"duenwen@google.com" <duenwen@google.com>,
	"mike.malvestuto@intel.com" <mike.malvestuto@intel.com>,
	"gthelen@google.com" <gthelen@google.com>,
	"wschwartz@amperecomputing.com" <wschwartz@amperecomputing.com>,
	"dferguson@amperecomputing.com" <dferguson@amperecomputing.com>,
	tanxiaofei <tanxiaofei@huawei.com>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	"kangkang.shen@futurewei.com" <kangkang.shen@futurewei.com>,
	wanghuiqiang <wanghuiqiang@huawei.com>,
	Linuxarm <linuxarm@huawei.com>, John Groves <John@Groves.net>
Subject: Re: [RFC PATCH v6 00/12] cxl: Add support for CXL feature commands, CXL device patrol scrub control and DDR5 ECS control features
Date: Fri, 1 Mar 2024 13:19:31 +0000	[thread overview]
Message-ID: <20240301131931.000070c7@Huawei.com> (raw)
In-Reply-To: <ZeDsESXK5pMeialz@agluck-desk3>

On Thu, 29 Feb 2024 12:41:53 -0800
Tony Luck <tony.luck@intel.com> wrote:

> > Obviously I can't talk about who was involved in the definition of
> > this feature, but I have strong confidence it will get implemented,
> > for reasons I can point at on a public list.
> > a) There will be scrubbing on devices.
> > b) It will need control (evidence for this is the BIOS controls mentioned below
> >    for equivalent main memory).
> > c) Hotplug means that control must be done by an OS driver (or via very fiddly
> >    pre-hotplug hacks that I think we can all agree should not be necessary
> >    and aren't even an option on all platforms).
> > d) No one likes custom solutions.
> > This isn't a fancy feature with a high level of complexity, which helps.

Hi Tony,
> 
> But how will users know what are appropriate scrubbing
> parameters for these devices?
> 
> Car analogy: Fuel injection systems on internal combustion engines
> have tweakable controls. But no auto manufacturer wires them up to
> a user-accessible dashboard control.

Good analogy - I believe performance-tuning third parties will change
them for you. So the controls do get used - albeit not by every user.

> 
> Back to computers:
> 
> I'd expect the OEMs that produce memory devices to set appropriate
> scrubbing rates based on their internal knowledge of the components
> used in construction.

Absolutely agree that they will set a default / baseline value,
but the reality is that 'everyone' (for the first few OEMs I googled)
exposes tuning controls in their shipping BIOS menus to configure
this, because there are users who want to change it.  I'd expect
them to clamp the minimum scrub frequency at something that avoids
hardware getting returned en masse for reliability reasons, and the
maximum at whatever ensures the performance is good enough that they
sell hardware in the first place.  I'd also expect a BIOS menu to
allow cloud hosts etc. to turn off exposing RAS2 or similar.
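
Concretely, the sort of min/max clamping I mean - a throwaway sketch,
with invented bounds rather than anything a real device advertises:

#include <stdio.h>

/*
 * Illustrative only: the min/max clamp I'd expect a vendor to apply
 * to a requested scrub cycle.  The bounds are made-up numbers, not
 * from any real device or from this patch set.
 */
#define MIN_SCRUB_HOURS  1U   /* fastest full pass allowed (perf floor)        */
#define MAX_SCRUB_HOURS 48U   /* slowest pass allowed (reliability floor)      */

static unsigned int clamp_scrub_hours(unsigned int req)
{
    if (req < MIN_SCRUB_HOURS)
        return MIN_SCRUB_HOURS;
    if (req > MAX_SCRUB_HOURS)
        return MAX_SCRUB_HOURS;
    return req;
}

int main(void)
{
    printf("request 0h  -> %uh\n", clamp_scrub_hours(0));
    printf("request 24h -> %uh\n", clamp_scrub_hours(24));
    printf("request 96h -> %uh\n", clamp_scrub_hours(96));
    return 0;
}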

> 
> What is the use case where some user would need to override these
> parameters and scrub and a faster/slower rate than that set by the
> manufacturer?

It's a performance vs reliability trade-off.  If your larger-scale
architecture (many servers) requires a few nodes to be super stable, you
will pay pretty much any cost to keep them running. If a single node
failure makes little or no difference, you'll happily crank this down
(same with refresh) in order to save some power / get a small
performance lift.  Or if you care about latency tails more than
reliability, you'll turn this off.

For comedy value, some BIOS guides point out that leaving scrub on may
affect performance benchmarking. Obviously not a good data point, but
a hint at the sort of market that cares.  Same market that buys cheaper
RAM knowing they are going to have more system crashes.

There is probably a description gap. That might be a paperwork
question as part of system specification.
What is the relationship between scrub rate and error rate under
particular styles of workload (because you get a free scrub whenever
you access the memory)?  The RAM DIMMs themselves could in theory
provide inputs, but the workload dependence makes this hard. Probably
fall back on a test-and-tune loop over very long runs.  Single-bit
error rates could be used to detect when reliability drops below a
level people are happy with, for instance.
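
To make that loop concrete, something along these lines - purely
illustrative, with the sysfs paths and the threshold invented rather
than taken from this series:

#include <stdio.h>
#include <unistd.h>

/*
 * Illustrative only: a crude test-and-tune loop.  The paths and
 * threshold below are invented for the example, not a real ABI.
 */
#define CE_COUNT_PATH    "/sys/class/ras/mem0/corrected_errors"
#define SCRUB_CYCLE_PATH "/sys/class/ras/mem0/scrub_cycle_hours"
#define CE_PER_DAY_MAX   10   /* site-specific reliability target */

static long read_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long v = -1;

    if (f) {
        if (fscanf(f, "%ld", &v) != 1)
            v = -1;
        fclose(f);
    }
    return v;
}

static void write_long(const char *path, long v)
{
    FILE *f = fopen(path, "w");

    if (f) {
        fprintf(f, "%ld\n", v);
        fclose(f);
    }
}

int main(void)
{
    long prev = read_long(CE_COUNT_PATH);

    for (;;) {
        sleep(24 * 60 * 60);    /* sample once a day */
        long cur = read_long(CE_COUNT_PATH);
        long cycle = read_long(SCRUB_CYCLE_PATH);

        if (cur < 0 || cycle < 0)
            continue;
        if (cur - prev > CE_PER_DAY_MAX && cycle > 1)
            write_long(SCRUB_CYCLE_PATH, cycle - 1); /* scrub faster */
        else if (cur - prev == 0 && cycle < 48)
            write_long(SCRUB_CYCLE_PATH, cycle + 1); /* relax a bit */
        prev = cur;
    }
    return 0;
}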

With the fancier units that can be supported, you can play
more-reliable-memory games by scanning subsets of the memory more
frequently.

Though it was about a kernel daemon doing the scrub, Jiaqi's RFC document here
https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/
provided justification for on-demand scrub - there is some interesting
material in the section on hardware patrol scrubbing.  I see you
commented on that thread on the complexity of hardware solutions.

- Cheap memory makes this all more important.
- Need for configuration of how fast and when, depending on system state.
- Lack of flexibility in what is scanned (RAS2 provides some by association
  with a NUMA node + the option to request particular ranges; CXL provides
  per-endpoint controls - see the sketch below).
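
To illustrate that last point, roughly the shape of a per-range
request I have in mind - names and layout invented, not the actual
RAS2 or CXL interface:

#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative only: a per-range scrub request of the kind the
 * fancier scrub engines could honour.  Field names are invented.
 */
struct scrub_range_req {
    uint64_t base;        /* physical base address of the range */
    uint64_t size;        /* length of the range in bytes       */
    uint32_t cycle_hours; /* requested time for one full pass   */
};

int main(void)
{
    /* Scan a "hot" 4 GiB window four times as often as the bulk. */
    struct scrub_range_req reqs[] = {
        { 0x0000000000ULL, 0x400000000ULL, 24 }, /* bulk: daily    */
        { 0x0400000000ULL, 0x100000000ULL,  6 }, /* hot: every 6 h */
    };

    for (unsigned int i = 0; i < sizeof(reqs) / sizeof(reqs[0]); i++)
        printf("range %u: base=%#llx size=%#llx cycle=%uh\n", i,
               (unsigned long long)reqs[i].base,
               (unsigned long long)reqs[i].size,
               reqs[i].cycle_hours);
    return 0;
}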

There are some gaps in what hardware scrubbers offer, but offloading
this problem is definitely attractive.

So my understanding is that there is demand to tune this, but it won't
be exposed on every system.

Jonathan

 
> 
> -Tony



Thread overview: 36+ messages
2024-02-15 11:14 shiju.jose
2024-02-15 11:14 ` [RFC PATCH v6 01/12] cxl/mbox: Add GET_SUPPORTED_FEATURES mailbox command shiju.jose
2024-02-20 11:02   ` Jonathan Cameron
2024-03-05 23:57   ` fan
2024-02-15 11:14 ` [RFC PATCH v6 02/12] cxl/mbox: Add GET_FEATURE " shiju.jose
2024-02-20 11:14   ` Jonathan Cameron
2024-02-23 12:23     ` Shiju Jose
2024-02-15 11:14 ` [RFC PATCH v6 03/12] cxl/mbox: Add SET_FEATURE " shiju.jose
2024-02-20 11:27   ` Jonathan Cameron
2024-02-15 11:14 ` [RFC PATCH v6 04/12] cxl/memscrub: Add CXL device patrol scrub control feature shiju.jose
2024-02-20 11:59   ` Jonathan Cameron
2024-02-15 11:14 ` [RFC PATCH v6 05/12] cxl/memscrub: Add CXL device ECS " shiju.jose
2024-02-20 12:17   ` Jonathan Cameron
2024-02-15 11:14 ` [RFC PATCH v6 06/12] memory: scrub: Add scrub subsystem driver supports configuring memory scrubs in the system shiju.jose
2024-02-20 12:39   ` Jonathan Cameron
2024-02-15 11:14 ` [RFC PATCH v6 07/12] cxl/memscrub: Register CXL device patrol scrub with scrub configure driver shiju.jose
2024-02-20 12:43   ` Jonathan Cameron
2024-02-15 11:14 ` [RFC PATCH v6 08/12] cxl/memscrub: Register CXL device ECS " shiju.jose
2024-02-20 13:39   ` Jonathan Cameron
2024-02-15 11:14 ` [RFC PATCH v6 09/12] ACPI:RASF: Add common library for RASF and RAS2 PCC interfaces shiju.jose
2024-02-20 13:53   ` Jonathan Cameron
2024-02-15 11:14 ` [RFC PATCH v6 10/12] ACPICA: ACPI 6.5: Add support for RAS2 table shiju.jose
2024-02-20 13:58   ` Jonathan Cameron
2024-02-15 11:14 ` [RFC PATCH v6 11/12] ACPI:RAS2: Add driver for ACPI RAS2 feature table (RAS2) shiju.jose
2024-02-15 11:14 ` [RFC PATCH v6 12/12] memory: RAS2: Add memory RAS2 driver shiju.jose
2024-02-20 16:12   ` Jonathan Cameron
2024-02-22  0:20 ` [RFC PATCH v6 00/12] cxl: Add support for CXL feature commands, CXL device patrol scrub control and DDR5 ECS control features Dan Williams
2024-02-23 12:16   ` Shiju Jose
2024-02-23 19:42     ` Dan Williams
2024-02-26 10:29       ` Jonathan Cameron
2024-02-29 19:51         ` Dan Williams
2024-03-01 14:41           ` Jonathan Cameron
2024-03-05  0:52             ` Jiaqi Yan
2024-02-29 20:41         ` Tony Luck
2024-03-01 13:19           ` Jonathan Cameron [this message]
2024-02-28 22:26       ` John Groves
