From: Fan Ni <nifan.cxl@gmail.com>
To: shiju.jose@huawei.com
Cc: linux-edac@vger.kernel.org, linux-cxl@vger.kernel.org,
linux-acpi@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
bp@alien8.de, tony.luck@intel.com, rafael@kernel.org,
lenb@kernel.org, mchehab@kernel.org, dan.j.williams@intel.com,
dave@stgolabs.net, jonathan.cameron@huawei.com,
dave.jiang@intel.com, alison.schofield@intel.com,
vishal.l.verma@intel.com, ira.weiny@intel.com, david@redhat.com,
Vilas.Sridharan@amd.com, leo.duran@amd.com,
Yazen.Ghannam@amd.com, rientjes@google.com, jiaqiyan@google.com,
Jon.Grimm@amd.com, dave.hansen@linux.intel.com,
naoya.horiguchi@nec.com, james.morse@arm.com,
jthoughton@google.com, somasundaram.a@hpe.com,
erdemaktas@google.com, pgonda@google.com, duenwen@google.com,
gthelen@google.com, wschwartz@amperecomputing.com,
dferguson@amperecomputing.com, wbs@os.amperecomputing.com,
nifan.cxl@gmail.com, tanxiaofei@huawei.com,
prime.zeng@hisilicon.com, roberto.sassu@huawei.com,
kangkang.shen@futurewei.com, wanghuiqiang@huawei.com,
linuxarm@huawei.com
Subject: Re: [PATCH v20 01/15] EDAC: Add support for EDAC device features control
Date: Thu, 13 Feb 2025 13:06:04 -0800 [thread overview]
Message-ID: <67ae5ec9.a70a0220.15bb91.94a5@mx.google.com> (raw)
In-Reply-To: <20250212143654.1893-2-shiju.jose@huawei.com>
On Wed, Feb 12, 2025 at 02:36:39PM +0000, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Add generic EDAC device feature controls supporting the registration
> of RAS features available in the system. The driver exposes control
> attributes for these features to userspace in
> /sys/bus/edac/devices/<dev-name>/<ras-feature>/
>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
> ---
> Documentation/edac/features.rst | 94 +++++++++++++++++++++++++++++
> Documentation/edac/index.rst | 10 ++++
> drivers/edac/edac_device.c | 102 ++++++++++++++++++++++++++++++++
> include/linux/edac.h | 26 ++++++++
> 4 files changed, 232 insertions(+)
> create mode 100644 Documentation/edac/features.rst
> create mode 100644 Documentation/edac/index.rst
>
> diff --git a/Documentation/edac/features.rst b/Documentation/edac/features.rst
> new file mode 100644
> index 000000000000..6b0fdc6f5d6e
> --- /dev/null
> +++ b/Documentation/edac/features.rst
> @@ -0,0 +1,94 @@
> +.. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-or-later
> +
> +============================================
> +Augmenting EDAC for controlling RAS features
> +============================================
> +
> +Copyright (c) 2024-2025 HiSilicon Limited.
> +
> +:Author: Shiju Jose <shiju.jose@huawei.com>
> +:License: The GNU Free Documentation License, Version 1.2 without
> + Invariant Sections, Front-Cover Texts nor Back-Cover Texts.
> + (dual licensed under the GPL v2)
> +
> +- Written for: 6.15
> +
> +Introduction
> +------------
> +The expansion of EDAC for controlling RAS features and exposing features
> +control attributes to userspace via sysfs. Some Examples:
> +
> +1. Scrub control
> +
> +2. Error Check Scrub (ECS) control
> +
> +3. ACPI RAS2 features
> +
> +4. Post Package Repair (PPR) control
> +
> +5. Memory Sparing Repair control etc.
> +
> +High level design is illustrated in the following diagram::
> +
> + +-----------------------------------------------+
> + | Userspace - Rasdaemon |
> + | +-------------+ |
> + | | RAS CXL mem | +---------------+ |
> + | |error handler|---->| | |
> + | +-------------+ | RAS dynamic | |
> + | +-------------+ | scrub, memory | |
> + | | RAS memory |---->| repair control| |
> + | |error handler| +----|----------+ |
> + | +-------------+ | |
> + +--------------------------|--------------------+
> + |
> + |
> + +-------------------------------|------------------------------+
> + | Kernel EDAC extension for | controlling RAS Features |
> + |+------------------------------|----------------------------+ |
> + || EDAC Core Sysfs EDAC| Bus | |
> + || +--------------------------|---------------------------+| |
> + || |/sys/bus/edac/devices/<dev>/scrubX/ | | EDAC device || |
> + || |/sys/bus/edac/devices/<dev>/ecsX/ |<->| EDAC MC || |
> + || |/sys/bus/edac/devices/<dev>/repairX | | EDAC sysfs || |
> + || +---------------------------|--------------------------+| |
> + || EDAC|Bus | |
> + || | | |
> + || +----------+ Get feature | Get feature | |
> + || | | desc +---------|------+ desc +----------+ | |
> + || |EDAC scrub|<-----| EDAC device | | | | |
> + || +----------+ | driver- RAS |----->| EDAC mem | | |
> + || +----------+ | feature control| | repair | | |
> + || | |<-----| | +----------+ | |
> + || |EDAC ECS | +---------|------+ | |
> + || +----------+ Register RAS|features | |
> + || ______________________|_____________ | |
> + |+---------|---------------|------------------|--------------+ |
> + | +-------|----+ +-------|-------+ +----|----------+ |
> + | | | | CXL mem driver| | Client driver | |
> + | | ACPI RAS2 | | scrub, ECS, | | memory repair | |
> + | | driver | | sparing, PPR | | features | |
> + | +-----|------+ +-------|-------+ +------|--------+ |
> + | | | | |
> + +--------|-----------------|--------------------|--------------+
> + | | |
> + +--------|-----------------|--------------------|--------------+
> + | +---|-----------------|--------------------|-------+ |
> + | | | |
> + | | Platform HW and Firmware | |
> + | +--------------------------------------------------+ |
> + +--------------------------------------------------------------+
> +
> +
> +1. EDAC Features components - Create feature specific descriptors.
> + For example, EDAC scrub, EDAC ECS, EDAC memory repair in the above
> + diagram.
> +
> +2. EDAC device driver for controlling RAS Features - Get feature's attribute
> + descriptors from EDAC RAS feature component and registers device's RAS
> + features with EDAC bus and exposes the features control attributes via
> + the sysfs EDAC bus. For example, /sys/bus/edac/devices/<dev-name>/<feature>X/
> +
> +3. RAS dynamic feature controller - Userspace sample modules in rasdaemon for
> + dynamic scrub/repair control to issue scrubbing/repair when excess number
> + of corrected memory errors are reported in a short span of time.
> diff --git a/Documentation/edac/index.rst b/Documentation/edac/index.rst
> new file mode 100644
> index 000000000000..de4a3aa452cb
> --- /dev/null
> +++ b/Documentation/edac/index.rst
> @@ -0,0 +1,10 @@
> +.. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-or-later
> +
> +==============
> +EDAC Subsystem
> +==============
> +
> +.. toctree::
> + :maxdepth: 1
> +
> + features
> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
> index 621dc2a5d034..142a661ff543 100644
> --- a/drivers/edac/edac_device.c
> +++ b/drivers/edac/edac_device.c
> @@ -570,3 +570,105 @@ void edac_device_handle_ue_count(struct edac_device_ctl_info *edac_dev,
> block ? block->name : "N/A", count, msg);
> }
> EXPORT_SYMBOL_GPL(edac_device_handle_ue_count);
> +
> +static void edac_dev_release(struct device *dev)
> +{
> + struct edac_dev_feat_ctx *ctx = container_of(dev, struct edac_dev_feat_ctx, dev);
> +
> + kfree(ctx->dev.groups);
> + kfree(ctx);
> +}
> +
> +const struct device_type edac_dev_type = {
> + .name = "edac_dev",
> + .release = edac_dev_release,
> +};
> +
> +static void edac_dev_unreg(void *data)
> +{
> + device_unregister(data);
> +}
> +
> +/**
> + * edac_dev_register - register device for RAS features with EDAC
> + * @parent: parent device.
> + * @name: name for the folder in the /sys/bus/edac/devices/,
> + * which is derived from the parent device.
> + * For eg. /sys/bus/edac/devices/cxl_mem0/
> + * @private: parent driver's data to store in the context if any.
> + * @num_features: number of RAS features to register.
> + * @ras_features: list of RAS features to register.
> + *
> + * Return:
> + * * %0 - Success.
> + * * %-EINVAL - Invalid parameters passed.
> + * * %-ENOMEM - Dynamic memory allocation failed.
> + *
> + */
> +int edac_dev_register(struct device *parent, char *name,
> + void *private, int num_features,
> + const struct edac_dev_feature *ras_features)
> +{
> + const struct attribute_group **ras_attr_groups;
> + struct edac_dev_feat_ctx *ctx;
> + int attr_gcnt = 0;
> + int ret, feat;
> +
> + if (!parent || !name || !num_features || !ras_features)
> + return -EINVAL;
> +
> + /* Double parse to make space for attributes */
> + for (feat = 0; feat < num_features; feat++) {
> + switch (ras_features[feat].ft_type) {
> + /* Add feature specific code */
> + default:
> + return -EINVAL;
> + }
> + }
> +
> + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> + if (!ctx)
> + return -ENOMEM;
> +
> + ras_attr_groups = kcalloc(attr_gcnt + 1, sizeof(*ras_attr_groups), GFP_KERNEL);
> + if (!ras_attr_groups) {
> + ret = -ENOMEM;
> + goto ctx_free;
> + }
> +
> + attr_gcnt = 0;
> + for (feat = 0; feat < num_features; feat++, ras_features++) {
> + switch (ras_features->ft_type) {
> + /* Add feature specific code */
> + default:
> + ret = -EINVAL;
> + goto groups_free;
> + }
> + }
> +
> + ctx->dev.parent = parent;
> + ctx->dev.bus = edac_get_sysfs_subsys();
> + ctx->dev.type = &edac_dev_type;
> + ctx->dev.groups = ras_attr_groups;
> + ctx->private = private;
> + dev_set_drvdata(&ctx->dev, ctx);
> +
> + ret = dev_set_name(&ctx->dev, name);
> + if (ret)
> + goto groups_free;
> +
> + ret = device_register(&ctx->dev);
> + if (ret) {
> + put_device(&ctx->dev);
> + return ret;
> + }
> +
> + return devm_add_action_or_reset(parent, edac_dev_unreg, &ctx->dev);
> +
> +groups_free:
> + kfree(ras_attr_groups);
> +ctx_free:
> + kfree(ctx);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(edac_dev_register);
> diff --git a/include/linux/edac.h b/include/linux/edac.h
> index b4ee8961e623..8c4b6ca2a994 100644
> --- a/include/linux/edac.h
> +++ b/include/linux/edac.h
> @@ -661,4 +661,30 @@ static inline struct dimm_info *edac_get_dimm(struct mem_ctl_info *mci,
>
> return mci->dimms[index];
> }
> +
> +/* RAS feature type */
> +enum edac_dev_feat {
> + RAS_FEAT_MAX
> +};
> +
> +/* EDAC device feature information structure */
> +struct edac_dev_data {
> + u8 instance;
> + void *private;
> +};
> +
> +struct edac_dev_feat_ctx {
> + struct device dev;
> + void *private;
> +};
> +
> +struct edac_dev_feature {
> + enum edac_dev_feat ft_type;
> + u8 instance;
> + void *ctx;
> +};
> +
> +int edac_dev_register(struct device *parent, char *dev_name,
> + void *parent_pvt_data, int num_features,
> + const struct edac_dev_feature *ras_features);
> #endif /* _LINUX_EDAC_H_ */
> --
> 2.43.0
>
next prev parent reply other threads:[~2025-02-13 21:06 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-12 14:36 [PATCH v20 00/15] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
2025-02-12 14:36 ` [PATCH v20 01/15] EDAC: Add support for EDAC device features control shiju.jose
2025-02-13 21:06 ` Fan Ni [this message]
2025-02-12 14:36 ` [PATCH v20 02/15] EDAC: Add scrub control feature shiju.jose
2025-02-13 21:34 ` Fan Ni
2025-02-14 10:49 ` Shiju Jose
2025-02-12 14:36 ` [PATCH v20 03/15] EDAC: Add ECS " shiju.jose
2025-02-13 21:54 ` Fan Ni
2025-02-12 14:36 ` [PATCH v20 04/15] EDAC: Add memory repair " shiju.jose
2025-02-12 14:36 ` [PATCH v20 05/15] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2025-02-12 14:36 ` [PATCH v20 06/15] ras: mem: Add memory " shiju.jose
2025-02-12 14:36 ` [PATCH v20 07/15] cxl: Add helper function to retrieve a feature entry shiju.jose
2025-02-12 14:36 ` [PATCH v20 08/15] cxl/memfeature: Add CXL memory device patrol scrub control feature shiju.jose
2025-02-12 14:36 ` [PATCH v20 09/15] cxl/memfeature: Add CXL memory device ECS " shiju.jose
2025-02-12 14:36 ` [PATCH v20 10/15] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
2025-02-12 14:36 ` [PATCH v20 11/15] cxl/region: Add helper function to determine memory is online shiju.jose
2025-02-12 14:36 ` [PATCH v20 12/15] cxl: Support for finding memory operation attributes from the current boot shiju.jose
2025-02-12 14:36 ` [PATCH v20 13/15] cxl/memfeature: Add CXL memory device soft PPR control feature shiju.jose
2025-02-12 14:36 ` [PATCH v20 14/15] EDAC: Update memory repair control interface for memory sparing feature shiju.jose
2025-02-12 14:36 ` [PATCH v20 15/15] cxl/memfeature: Add CXL memory device memory sparing control feature shiju.jose
2025-02-24 11:50 ` [PATCH v20 00/15] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers Borislav Petkov
2025-02-24 18:30 ` Shiju Jose
2025-02-24 19:36 ` Borislav Petkov
2025-02-25 11:20 ` Shiju Jose
2025-03-06 18:18 ` Daniel Ferguson
2025-03-10 10:16 ` Shiju Jose
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=67ae5ec9.a70a0220.15bb91.94a5@mx.google.com \
--to=nifan.cxl@gmail.com \
--cc=Jon.Grimm@amd.com \
--cc=Vilas.Sridharan@amd.com \
--cc=Yazen.Ghannam@amd.com \
--cc=alison.schofield@intel.com \
--cc=bp@alien8.de \
--cc=dan.j.williams@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=david@redhat.com \
--cc=dferguson@amperecomputing.com \
--cc=duenwen@google.com \
--cc=erdemaktas@google.com \
--cc=gthelen@google.com \
--cc=ira.weiny@intel.com \
--cc=james.morse@arm.com \
--cc=jiaqiyan@google.com \
--cc=jonathan.cameron@huawei.com \
--cc=jthoughton@google.com \
--cc=kangkang.shen@futurewei.com \
--cc=lenb@kernel.org \
--cc=leo.duran@amd.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxarm@huawei.com \
--cc=mchehab@kernel.org \
--cc=naoya.horiguchi@nec.com \
--cc=pgonda@google.com \
--cc=prime.zeng@hisilicon.com \
--cc=rafael@kernel.org \
--cc=rientjes@google.com \
--cc=roberto.sassu@huawei.com \
--cc=shiju.jose@huawei.com \
--cc=somasundaram.a@hpe.com \
--cc=tanxiaofei@huawei.com \
--cc=tony.luck@intel.com \
--cc=vishal.l.verma@intel.com \
--cc=wanghuiqiang@huawei.com \
--cc=wbs@os.amperecomputing.com \
--cc=wschwartz@amperecomputing.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox