Re: [PATCH v12 1/2] ACPI:RAS2: Add ACPI RAS2 driver

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Shiju Jose <shiju.jose@huawei.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"rppt@kernel.org" <rppt@kernel.org>,
	"dferguson@amperecomputing.com" <dferguson@amperecomputing.com>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"lenb@kernel.org" <lenb@kernel.org>,
	"Yazen.Ghannam@amd.com" <Yazen.Ghannam@amd.com>,
	"mchehab@kernel.org" <mchehab@kernel.org>,
	Linuxarm <linuxarm@huawei.com>,
	"rientjes@google.com" <rientjes@google.com>,
	"jiaqiyan@google.com" <jiaqiyan@google.com>,
	"Jon.Grimm@amd.com" <Jon.Grimm@amd.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"naoya.horiguchi@nec.com" <naoya.horiguchi@nec.com>,
	"james.morse@arm.com" <james.morse@arm.com>,
	"jthoughton@google.com" <jthoughton@google.com>,
	"somasundaram.a@hpe.com" <somasundaram.a@hpe.com>,
	"erdemaktas@google.com" <erdemaktas@google.com>,
	"pgonda@google.com" <pgonda@google.com>,
	"duenwen@google.com" <duenwen@google.com>,
	"gthelen@google.com" <gthelen@google.com>,
	"wschwartz@amperecomputing.com" <wschwartz@amperecomputing.com>,
	"wbs@os.amperecomputing.com" <wbs@os.amperecomputing.com>,
	"nifan.cxl@gmail.com" <nifan.cxl@gmail.com>,
	tanxiaofei <tanxiaofei@huawei.com>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	"Roberto Sassu" <roberto.sassu@huawei.com>,
	"kangkang.shen@futurewei.com" <kangkang.shen@futurewei.com>,
	wanghuiqiang <wanghuiqiang@huawei.com>
Subject: Re: [PATCH v12 1/2] ACPI:RAS2: Add ACPI RAS2 driver
Date: Wed, 17 Sep 2025 18:36:08 +0100	[thread overview]
Message-ID: <20250917183608.000038c4@huawei.com> (raw)
In-Reply-To: <20250917162253.GCaMrgXYXq2T4hFI0w@fat_crate.local>

On Wed, 17 Sep 2025 18:22:53 +0200
Borislav Petkov <bp@alien8.de> wrote:

> On Mon, Sep 15, 2025 at 11:50:16AM +0000, Shiju Jose wrote:
> > This has been added as suggested by Jonathan considering the interleaved NUMA node.
> > Link to the related discussion in V11:
> > https://lore.kernel.org/all/20250821100655.00003942@huawei.com/#t  
> 
> Sorry, this doesn't work this way.
> 
> If something in the code is being done which is not obvious and trivial, then
> the reason for it is written down in a prominent place so that it is clear to
> people.
> 
> Not pointing to a discussion or some funky place on the web where someone
> might've said something.
> 
> Your patch submission should contain that info and not have reviewers ask for
> it.
> 
> > | node 0 | node 1 | node 0 |   PA address map.
> > Can you give your suggestion what we should do about it?  
> 
> I don't know what the problem is to begin with...
> 
> > I think Option (2) seems better?  If so, can the EDAC scrub interface  be
> > updated to include attributes for publishing the supported PA range for the
> > memory device to scrub?  
> 
> The memory ranges should already be available somewhere in the NUMA/mm code or
> so and for starters, we should start a scrub for all ranges and do the
> single-range only when there really is a good reason for it.
> 
> Also, you don't have to expose any ranges to userspace in order to start
> a scrub activity - you can simply start the scrub in the affected range
> automatically.
> 
> Like I preached the last time, your aim should be to make as much of the
> variables that control the scrub automatic and not expose everything to
> userspace so that some userspace tool decides. The tool should simply start
> the scrub and the kernel should DTRT.

This 'first contiguous range' is an attempt to DTRT in a corner
case that is real but where there is not an obvious right thing due to spec limitations.

The problem is the ACPI specification ties a controller to a NUMA / _PXM but
then controls it via a single memory range with rules on whether that range can
include holes (that are actually covered by a different controller).  There
is no way to discover if two disconnected ranges may be scanned at once other
that trying that.

Without resolving this corner, there is no way we could come up with for the kernel
to DTRT.

Aim is to automatically establish the range that can be scrubbed. The corner
case is the hole problem.  Alternative as discussed in earlier versions of
this series was ignore holes.

The options discussed in earlier versions of this patch
=======================================================

So with Node / PA mapping (happens because people want low memory addresses
near to CPUs in different sockets - I simplified it here somewhat).

| Node 0 | Node 1 | Node 0 | Node 2|

1. Hole skipping approach
Have control parameters default to
|       Node 0             |
         | Node 1 |
                           | Node 2|
2. Present part of range that is at least not including a hole where it might
look like we were controlling memory scrub that we were not.
| Node 0 |
         | Node 1 |
                           | Node 2|

3. Just don't present anything and leave it up to general mm interfaces
   to provide what 'should' be set.
|
All 3 here, 0 size.

Whilst nice to support, I'm not seeing 'default' as a key use case and
changing scrub from a kernel driver without policy from userspace to
me would be a wrong thing to do. (I'm not sure if you are suggesting this)

The scrub should always be running pre linux (true today on all systems that I'm
aware of, but maybe not as universal as I think?).  If there is a need for
a default scrub setting on boot of Linux, then sure we can add it.

This interface all about tweaking the settings not defaults (unlike the CXL
case which does need the setting of defaults as well because of hotplug of
the devices and lack of firmware involvement in that).

We are fine with any of the options above.

This was an attempt to respond to review feedback from Daniel - it was not
something Huawei needs.

https://lore.kernel.org/all/547ed8fb-d6b7-4b6b-a38b-bf13223971b1@os.amperecomputing.com/

After a discussion of why 0, 0 defaults give an unexpected result...

"Proposed Solution:
What we propose, is to instead of zeroing out the base and size after an error,
use the full range of the current NUMA node. We believe that a superset of a
currently active scrub range can properly report all the relevant and correct
information."

The above Numa node pattern in PA space |0|1|0|2| etc is a thing
that happens on real systems so if was the best that had come up in earlier
discussion as an approximation of what Daniel asked for that should allow the
right values to be queried.

I'm not entirely sure this even matters now they have resolved the shared
PCC interface issue on their platforms.  Felt nice to provide meaningful
defaults but maybe this is a problem we don't need to solve and can go back to
just using 0, 0 until told to do something else.

Daniel, perhaps you can provide more info?

Thanks,

Jonathan

> 
> > This returns error on the first failure.
> > 
> > What if there was a success before? Does that aux_device need to be removed?
> > 
> > If not, then why return failure at all? Why not just try to add all devices? Some may fail and some may succeed.
> > ============================= 
> > 
> > We thought second option is a better because a successfully added aux dev for a memory device and corresponding
> > EDAC interface continue exist and support the scrub/a memory feature. 
> > We do not mind doing stop on a failure adding an aux_device and free previously crated aux devices, though
> > it may require some additional dynamically allocated memory space to store the successfully created aux devices
> > so that free them on a failure later. Hope that is acceptable?  
> 
> So how are you going to present to people a subset of devices loaded? And what
> is the point at all? 
> 
> Is there a valid use case where you can use only a subset of the devices to
> even try to support such nonsense?
>

next prev parent reply	other threads:[~2025-09-17 17:36 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-02 17:30 [PATCH v12 0/2] ACPI: Add support for ACPI RAS2 feature table shiju.jose
2025-09-02 17:30 ` [PATCH v12 1/2] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2025-09-09 16:24   ` Yazen Ghannam
2025-09-10  8:38     ` Shiju Jose
2025-09-10 19:27   ` Borislav Petkov
2025-09-12 12:04     ` Shiju Jose
2025-09-12 14:11       ` Borislav Petkov
2025-09-15 11:50         ` Shiju Jose
2025-09-17 16:22           ` Borislav Petkov
2025-09-17 17:36             ` Jonathan Cameron [this message]
2025-09-18 20:22               ` Daniel Ferguson
2025-09-19 10:39               ` Borislav Petkov
2025-10-06 10:37                 ` Shiju Jose
2025-10-16 10:30                   ` Borislav Petkov
2025-10-17 12:54                     ` Shiju Jose
2025-10-24 18:13                       ` Daniel Ferguson
2025-11-03 13:19                       ` Borislav Petkov
2025-11-04 12:55                         ` Shiju Jose
2025-11-22 11:36                           ` Borislav Petkov
2025-09-02 17:30 ` [PATCH v12 2/2] ras: mem: Add memory " shiju.jose

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250917183608.000038c4@huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=Jon.Grimm@amd.com \
    --cc=Yazen.Ghannam@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dferguson@amperecomputing.com \
    --cc=duenwen@google.com \
    --cc=erdemaktas@google.com \
    --cc=gthelen@google.com \
    --cc=james.morse@arm.com \
    --cc=jiaqiyan@google.com \
    --cc=jthoughton@google.com \
    --cc=kangkang.shen@futurewei.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxarm@huawei.com \
    --cc=mchehab@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=nifan.cxl@gmail.com \
    --cc=pgonda@google.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=rafael@kernel.org \
    --cc=rientjes@google.com \
    --cc=roberto.sassu@huawei.com \
    --cc=rppt@kernel.org \
    --cc=shiju.jose@huawei.com \
    --cc=somasundaram.a@hpe.com \
    --cc=tanxiaofei@huawei.com \
    --cc=tony.luck@intel.com \
    --cc=wanghuiqiang@huawei.com \
    --cc=wbs@os.amperecomputing.com \
    --cc=wschwartz@amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox