From: Dan Williams <dan.j.williams@intel.com>
To: Shiju Jose <shiju.jose@huawei.com>,
Dan Williams <dan.j.williams@intel.com>,
"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: "bp@alien8.de" <bp@alien8.de>,
"tony.luck@intel.com" <tony.luck@intel.com>,
"rafael@kernel.org" <rafael@kernel.org>,
"lenb@kernel.org" <lenb@kernel.org>,
"mchehab@kernel.org" <mchehab@kernel.org>,
"dave@stgolabs.net" <dave@stgolabs.net>,
"Jonathan Cameron" <jonathan.cameron@huawei.com>,
"dave.jiang@intel.com" <dave.jiang@intel.com>,
"alison.schofield@intel.com" <alison.schofield@intel.com>,
"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
"ira.weiny@intel.com" <ira.weiny@intel.com>,
"david@redhat.com" <david@redhat.com>,
"Vilas.Sridharan@amd.com" <Vilas.Sridharan@amd.com>,
"leo.duran@amd.com" <leo.duran@amd.com>,
"Yazen.Ghannam@amd.com" <Yazen.Ghannam@amd.com>,
"rientjes@google.com" <rientjes@google.com>,
"jiaqiyan@google.com" <jiaqiyan@google.com>,
"Jon.Grimm@amd.com" <Jon.Grimm@amd.com>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"naoya.horiguchi@nec.com" <naoya.horiguchi@nec.com>,
"james.morse@arm.com" <james.morse@arm.com>,
"jthoughton@google.com" <jthoughton@google.com>,
"somasundaram.a@hpe.com" <somasundaram.a@hpe.com>,
"erdemaktas@google.com" <erdemaktas@google.com>,
"pgonda@google.com" <pgonda@google.com>,
"duenwen@google.com" <duenwen@google.com>,
"gthelen@google.com" <gthelen@google.com>,
"wschwartz@amperecomputing.com" <wschwartz@amperecomputing.com>,
"dferguson@amperecomputing.com" <dferguson@amperecomputing.com>,
"wbs@os.amperecomputing.com" <wbs@os.amperecomputing.com>,
"nifan.cxl@gmail.com" <nifan.cxl@gmail.com>,
tanxiaofei <tanxiaofei@huawei.com>,
"Zengtao (B)" <prime.zeng@hisilicon.com>,
"Roberto Sassu" <roberto.sassu@huawei.com>,
"kangkang.shen@futurewei.com" <kangkang.shen@futurewei.com>,
wanghuiqiang <wanghuiqiang@huawei.com>,
Linuxarm <linuxarm@huawei.com>
Subject: RE: [PATCH v18 15/19] cxl/memfeature: Add CXL memory device patrol scrub control feature
Date: Mon, 27 Jan 2025 15:17:26 -0800 [thread overview]
Message-ID: <679814068d4d1_2d1e294c4@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <637fa0190fe64594954ee4d9e012c39c@huawei.com>
Shiju Jose wrote:
> Hi Dan,
>
> Thanks for the comments.
>
> Please find reply inline.
>
> Thanks,
> Shiju
> >-----Original Message-----
> >From: Dan Williams <dan.j.williams@intel.com>
> >Sent: 24 January 2025 20:39
> >To: Shiju Jose <shiju.jose@huawei.com>; linux-edac@vger.kernel.org; linux-
> >cxl@vger.kernel.org; linux-acpi@vger.kernel.org; linux-mm@kvack.org; linux-
> >kernel@vger.kernel.org
> >Cc: bp@alien8.de; tony.luck@intel.com; rafael@kernel.org; lenb@kernel.org;
> >mchehab@kernel.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
> >Cameron <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
> >alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
> >david@redhat.com; Vilas.Sridharan@amd.com; leo.duran@amd.com;
> >Yazen.Ghannam@amd.com; rientjes@google.com; jiaqiyan@google.com;
> >Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
> >naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
> >somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
> >duenwen@google.com; gthelen@google.com;
> >wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
> >wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
> ><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
> >Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
> >wanghuiqiang <wanghuiqiang@huawei.com>; Linuxarm
> ><linuxarm@huawei.com>; Shiju Jose <shiju.jose@huawei.com>
> >Subject: Re: [PATCH v18 15/19] cxl/memfeature: Add CXL memory device patrol
> >scrub control feature
> >
> >shiju.jose@ wrote:
> >> From: Shiju Jose <shiju.jose@huawei.com>
> >>
> >> CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub
> >> control feature. The device patrol scrub proactively locates and makes
> >> corrections to errors in regular cycle.
> >>
> >> Allow specifying the number of hours within which the patrol scrub
> >> must be completed, subject to minimum and maximum limits reported by the
> >device.
> >> Also allow disabling scrub allowing trade-off error rates against
> >> performance.
> >>
> >> Add support for patrol scrub control on CXL memory devices.
> >> Register with the EDAC device driver, which retrieves the scrub
> >> attribute descriptors from EDAC scrub and exposes the sysfs scrub
> >> control attributes to userspace. For example, scrub control for the
> >> CXL memory device "cxl_mem0" is exposed in
> >/sys/bus/edac/devices/cxl_mem0/scrubX/.
> >>
> >> Additionally, add support for region-based CXL memory patrol scrub control.
> >> CXL memory regions may be interleaved across one or more CXL memory
> >> devices. For example, region-based scrub control for "cxl_region1" is
> >> exposed in /sys/bus/edac/devices/cxl_region1/scrubX/.
> >>
> >> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> >> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> >> ---
> >> Documentation/edac/scrub.rst | 66 ++++++
> >> drivers/cxl/Kconfig | 17 ++
> >> drivers/cxl/core/Makefile | 1 +
> >> drivers/cxl/core/memfeature.c | 392
> >++++++++++++++++++++++++++++++++++
> >> drivers/cxl/core/region.c | 6 +
> >> drivers/cxl/cxlmem.h | 7 +
> >> drivers/cxl/mem.c | 5 +
> >> include/cxl/features.h | 16 ++
> >> 8 files changed, 510 insertions(+)
> >> create mode 100644 drivers/cxl/core/memfeature.c diff --git
> >> a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst index
> >> f86645c7f0af..80e986c57885 100644
> >> --- a/Documentation/edac/scrub.rst
> >> +++ b/Documentation/edac/scrub.rst
[..]
> >
> >What is this content-free blob of cat and echo statements? Please write actual
> >documentation with theory of operation, clarification of assumptions, rationale
> >for defaults, guidance on changing defaults...
>
> Jonathan already replied.
I disagree that any of that is useful to include without rationale, and
if the rationale is already somewhere else then delete the multiple
lines of showing how 'cat' and 'echo' work with sysfs.
[..]
> >> + depends on CXL_MEM
> >
> >Similar comment, and this also goes away if all of this just moves into the new
> >cxl_features driver.
>
> Agree with Jonathan told in reply. These are RAS specific features for CXL memory devices and
> thus added in memfeature.c
Apoligies for this comment, I had meant to delete it along with some
other commentary along this theme after thinking it through.
I am now advocating that Dave drop his cxl_features driver altogether
and mirror your approach. I.e. EDAC is registered from existing CXL
drivers, and FWCTL can be registered against a cxl_memdev just like the
fw_upload ABI.
There was a concern that CXL needed a separate FWCTL driver in case
distributions wanted to have a policy against FWCTL, but given CXL
already has CONFIG_CXL_MEM_RAW_COMMANDS at compile-time and a wide array
of CXL bus devices, a cxl_features device is an awkward fit.
[..]
> >> +static int cxl_ps_get_attrs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
> >> + struct cxl_memdev_ps_params *params) {
> >> + struct cxl_memdev *cxlmd;
> >> + u16 min_scrub_cycle = 0;
> >> + int i, ret;
> >> +
> >> + if (cxl_ps_ctx->cxlr) {
> >> + struct cxl_region *cxlr = cxl_ps_ctx->cxlr;
> >> + struct cxl_region_params *p = &cxlr->params;
> >> +
> >> + for (i = p->interleave_ways - 1; i >= 0; i--) {
> >> + struct cxl_endpoint_decoder *cxled = p->targets[i];
> >
> >It looks like this is called directly as a callback from EDAC. Where is the locking
> >that keeps cxl_ps_ctx->cxlr valid, or p->targets content stable?
> Jonathan already replied.
I could not find that comment? I *think* it's ok because when the region
is in the probe state changes will not be made to this list, but it
would be useful to at least have commentary to that effect. Protect
against someone copying this code in isolation and not consider the
context.
[..]
> >> +
> >> +int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct
> >> +cxl_region *cxlr)
> >
> >Please separate this into a memdev helper and a region helper. It is silly to have
> >two arguments to a function where one is expected to be NULL at all times, and
> >then have an if else statement inside that to effectively turn it back into 2 code
> >paths.
> >
> >If there is code to be shared amongst those, make *that* the shared helper.
> I added single function cxl_mem_ras_features_init() for both memdev and region based
> scrubbing to reduce code size as there were feedbacks try reduce code size.
"Succint" and "concise" does not necessarily mean less lines. I would
greatly prefer a few more lines if it mines not outsourcing complexity
to the calling context. Readable code means I do not need to wonder
what:
cxl_mem_ras_features_init(NULL, cxlr)
...means. I can just read devm_cxl_region_edac_register(cxlr), and know
exactly what is happening without needing to lose my train of thought to
go read what semantics cxl_mem_ras_features_init() is implementing.
Note that all the other _init() calls in drivers/cxl/ (outside of
module_init callbacks), are just purely init work, not object
registration. Please keep that local style.
> >> +{
> >> + struct edac_dev_feature ras_features[CXL_DEV_NUM_RAS_FEATURES];
> >> + char cxl_dev_name[CXL_DEV_NAME_LEN];
> >> + int num_ras_features = 0;
> >> + u8 scrub_inst = 0;
> >> + int rc;
> >> +
> >> + rc = cxl_memdev_scrub_init(cxlmd, cxlr,
> >&ras_features[num_ras_features],
> >> + scrub_inst);
> >> + if (rc < 0)
> >> + return rc;
> >> +
> >> + scrub_inst++;
> >> + num_ras_features++;
> >> +
> >> + if (cxlr)
> >> + snprintf(cxl_dev_name, sizeof(cxl_dev_name),
> >> + "cxl_region%d", cxlr->id);
> >
> >Why not pass dev_name(&cxlr->dev) directly?
> Jonathan already replied.
That was purely with the cxl_mem observation, cxlr can be passed
directly.
> >
> >> + else
> >> + snprintf(cxl_dev_name, sizeof(cxl_dev_name),
> >> + "%s_%s", "cxl", dev_name(&cxlmd->dev));
> >
> >Can a "cxl" directory be created so that the raw name can be used?
In fact we already do something similar for CONFIG_HMEM_REPORTING (i.e.
an "access%d" device to create a nameed directory of attributes) so it
is a question for Boris if he wants to tolerate a parent "cxl" device to
parent all CXL objects in EDAC.
> >
> >> +
> >> + return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
> >> + num_ras_features, ras_features);
> >
> >I'm so confused... a few lines down in this patch we have:
> >
> > rc = cxl_mem_ras_features_init(NULL, cxlr);
> >
> >...so how can this call to edac_dev_register() unconditionally de-reference
> >@cxlmd?
> Thanks for spotting this. It is a bug, need to fix.
[..]
> >> +EXPORT_SYMBOL_NS_GPL(cxl_mem_ras_features_init, "CXL");
> >> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> >> index b98b1ccffd1c..c2be70cd87f8 100644
> >> --- a/drivers/cxl/core/region.c
> >> +++ b/drivers/cxl/core/region.c
> >> @@ -3449,6 +3449,12 @@ static int cxl_region_probe(struct device *dev)
> >> p->res->start, p->res->end, cxlr,
> >> is_system_ram) > 0)
> >> return 0;
> >> +
> >> + rc = cxl_mem_ras_features_init(NULL, cxlr);
> >> + if (rc)
> >> + dev_warn(&cxlr->dev, "CXL RAS features init for
> >region_id=%d failed\n",
> >> + cxlr->id);
> >
> >There is more to RAS than EDAC memory scrub so this message is misleading. It
> >is also unnecessary because the driver continues to load and the admin, if they
> >care, will notice that the EDAC attributes are missing.
> This message was added for the debugging purpose in CXL driver. I will change to dev_dbg().
...but also stop calling this functionality with the blanket term "RAS".
It is "EDAC scrub and repair extensions to all the other RAS
functionality the CXL subsystem handles directly", name it accordingly.
next prev parent reply other threads:[~2025-01-27 23:17 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-06 12:09 [PATCH v18 00/19] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
2025-01-06 12:09 ` [PATCH v18 01/19] EDAC: Add support for EDAC device features control shiju.jose
2025-01-06 13:37 ` Borislav Petkov
2025-01-06 14:48 ` Shiju Jose
2025-01-13 15:06 ` Mauro Carvalho Chehab
2025-01-14 9:55 ` Jonathan Cameron
2025-01-14 10:08 ` Shiju Jose
2025-01-14 11:33 ` Mauro Carvalho Chehab
2025-01-30 19:18 ` Daniel Ferguson
2025-01-06 12:09 ` [PATCH v18 02/19] EDAC: Add scrub control feature shiju.jose
2025-01-06 15:57 ` Borislav Petkov
2025-01-06 19:34 ` Shiju Jose
2025-01-07 7:32 ` Borislav Petkov
2025-01-07 9:23 ` Shiju Jose
2025-01-08 15:47 ` Shiju Jose
2025-01-13 15:50 ` Mauro Carvalho Chehab
2025-01-30 19:18 ` Daniel Ferguson
2025-01-06 12:09 ` [PATCH v18 03/19] EDAC: Add ECS " shiju.jose
2025-01-13 16:09 ` Mauro Carvalho Chehab
2025-01-06 12:10 ` [PATCH v18 04/19] EDAC: Add memory repair " shiju.jose
2025-01-09 9:19 ` Borislav Petkov
2025-01-09 11:00 ` Shiju Jose
2025-01-09 12:32 ` Borislav Petkov
2025-01-09 14:24 ` Jonathan Cameron
2025-01-09 15:18 ` Borislav Petkov
2025-01-09 16:01 ` Jonathan Cameron
2025-01-09 16:19 ` Borislav Petkov
2025-01-09 18:34 ` Jonathan Cameron
2025-01-09 23:51 ` Dan Williams
2025-01-10 11:01 ` Jonathan Cameron
2025-01-10 22:49 ` Dan Williams
2025-01-13 11:40 ` Jonathan Cameron
2025-01-14 19:35 ` Dan Williams
2025-01-15 10:07 ` Jonathan Cameron
2025-01-15 11:35 ` Mauro Carvalho Chehab
2025-01-11 17:12 ` Borislav Petkov
2025-01-13 11:07 ` Jonathan Cameron
2025-01-21 16:16 ` Borislav Petkov
2025-01-21 18:16 ` Jonathan Cameron
2025-01-22 19:09 ` Borislav Petkov
2025-02-06 13:39 ` Jonathan Cameron
2025-02-17 13:23 ` Borislav Petkov
2025-02-18 16:51 ` Jonathan Cameron
2025-02-19 18:45 ` Borislav Petkov
2025-02-20 12:19 ` Jonathan Cameron
2025-01-14 13:10 ` Mauro Carvalho Chehab
2025-01-14 12:57 ` Mauro Carvalho Chehab
2025-01-14 12:38 ` Mauro Carvalho Chehab
2025-01-14 13:05 ` Jonathan Cameron
2025-01-14 14:39 ` Mauro Carvalho Chehab
2025-01-14 11:47 ` Mauro Carvalho Chehab
2025-01-14 12:31 ` Shiju Jose
2025-01-14 14:26 ` Mauro Carvalho Chehab
2025-01-14 13:47 ` Mauro Carvalho Chehab
2025-01-14 14:30 ` Shiju Jose
2025-01-15 12:03 ` Mauro Carvalho Chehab
2025-01-06 12:10 ` [PATCH v18 05/19] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2025-01-21 23:01 ` Daniel Ferguson
2025-01-22 15:38 ` Shiju Jose
2025-01-30 19:19 ` Daniel Ferguson
2025-01-06 12:10 ` [PATCH v18 06/19] ras: mem: Add memory " shiju.jose
2025-01-21 23:01 ` Daniel Ferguson
2025-01-30 19:19 ` Daniel Ferguson
2025-01-06 12:10 ` [PATCH v18 07/19] cxl: Refactor user ioctl command path from mds to mailbox shiju.jose
2025-01-06 12:10 ` [PATCH v18 08/19] cxl: Add skeletal features driver shiju.jose
2025-01-06 12:10 ` [PATCH v18 09/19] cxl: Enumerate feature commands shiju.jose
2025-01-06 12:10 ` [PATCH v18 10/19] cxl: Add Get Supported Features command for kernel usage shiju.jose
2025-01-06 12:10 ` [PATCH v18 11/19] cxl: Add features driver attribute to emit number of features supported shiju.jose
2025-01-06 12:10 ` [PATCH v18 12/19] cxl/mbox: Add GET_FEATURE mailbox command shiju.jose
2025-01-06 12:10 ` [PATCH v18 13/19] cxl/mbox: Add SET_FEATURE " shiju.jose
2025-01-06 12:10 ` [PATCH v18 14/19] cxl: Setup exclusive CXL features that are reserved for the kernel shiju.jose
2025-01-06 12:10 ` [PATCH v18 15/19] cxl/memfeature: Add CXL memory device patrol scrub control feature shiju.jose
2025-01-24 20:38 ` Dan Williams
2025-01-27 10:06 ` Jonathan Cameron
2025-01-27 12:53 ` Shiju Jose
2025-01-27 23:17 ` Dan Williams [this message]
2025-01-29 12:28 ` Shiju Jose
2025-01-06 12:10 ` [PATCH v18 16/19] cxl/memfeature: Add CXL memory device ECS " shiju.jose
2025-01-06 12:10 ` [PATCH v18 17/19] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
2025-01-06 12:10 ` [PATCH v18 18/19] cxl/memfeature: Add CXL memory device soft PPR control feature shiju.jose
2025-01-06 12:10 ` [PATCH v18 19/19] cxl/memfeature: Add CXL memory device memory sparing " shiju.jose
2025-01-13 14:46 ` [PATCH v18 00/19] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers Mauro Carvalho Chehab
2025-01-13 15:36 ` Jonathan Cameron
2025-01-14 14:06 ` Mauro Carvalho Chehab
2025-01-13 18:15 ` Shiju Jose
2025-01-30 19:18 ` Daniel Ferguson
2025-02-03 9:25 ` Shiju Jose
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=679814068d4d1_2d1e294c4@dwillia2-xfh.jf.intel.com.notmuch \
--to=dan.j.williams@intel.com \
--cc=Jon.Grimm@amd.com \
--cc=Vilas.Sridharan@amd.com \
--cc=Yazen.Ghannam@amd.com \
--cc=alison.schofield@intel.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=david@redhat.com \
--cc=dferguson@amperecomputing.com \
--cc=duenwen@google.com \
--cc=erdemaktas@google.com \
--cc=gthelen@google.com \
--cc=ira.weiny@intel.com \
--cc=james.morse@arm.com \
--cc=jiaqiyan@google.com \
--cc=jonathan.cameron@huawei.com \
--cc=jthoughton@google.com \
--cc=kangkang.shen@futurewei.com \
--cc=lenb@kernel.org \
--cc=leo.duran@amd.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxarm@huawei.com \
--cc=mchehab@kernel.org \
--cc=naoya.horiguchi@nec.com \
--cc=nifan.cxl@gmail.com \
--cc=pgonda@google.com \
--cc=prime.zeng@hisilicon.com \
--cc=rafael@kernel.org \
--cc=rientjes@google.com \
--cc=roberto.sassu@huawei.com \
--cc=shiju.jose@huawei.com \
--cc=somasundaram.a@hpe.com \
--cc=tanxiaofei@huawei.com \
--cc=tony.luck@intel.com \
--cc=vishal.l.verma@intel.com \
--cc=wanghuiqiang@huawei.com \
--cc=wbs@os.amperecomputing.com \
--cc=wschwartz@amperecomputing.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox