From: Fan Ni <nifan.cxl@gmail.com>
To: shiju.jose@huawei.com
Cc: linux-edac@vger.kernel.org, linux-cxl@vger.kernel.org,
linux-acpi@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, bp@alien8.de, tony.luck@intel.com,
rafael@kernel.org, lenb@kernel.org, mchehab@kernel.org,
dan.j.williams@intel.com, dave@stgolabs.net,
jonathan.cameron@huawei.com, dave.jiang@intel.com,
alison.schofield@intel.com, vishal.l.verma@intel.com,
ira.weiny@intel.com, david@redhat.com, Vilas.Sridharan@amd.com,
leo.duran@amd.com, Yazen.Ghannam@amd.com, rientjes@google.com,
jiaqiyan@google.com, Jon.Grimm@amd.com,
dave.hansen@linux.intel.com, naoya.horiguchi@nec.com,
james.morse@arm.com, jthoughton@google.com,
somasundaram.a@hpe.com, erdemaktas@google.com, pgonda@google.com,
duenwen@google.com, mike.malvestuto@intel.com,
gthelen@google.com, wschwartz@amperecomputing.com,
dferguson@amperecomputing.com, wbs@os.amperecomputing.com,
nifan.cxl@gmail.com, jgroves@micron.com, vsalve@micron.com,
tanxiaofei@huawei.com, prime.zeng@hisilicon.com,
roberto.sassu@huawei.com, kangkang.shen@futurewei.com,
wanghuiqiang@huawei.com, linuxarm@huawei.com
Subject: Re: [PATCH v12 11/17] cxl/memfeature: Add CXL memory device ECS control feature
Date: Mon, 30 Sep 2024 11:12:36 -0700 [thread overview]
Message-ID: <ZvrqFHQ8TPv_c3vC@fan> (raw)
In-Reply-To: <20240911090447.751-12-shiju.jose@huawei.com>
On Wed, Sep 11, 2024 at 10:04:40AM +0100, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> CXL spec 3.1 section 8.2.9.9.11.2 describes the DDR5 ECS (Error Check
> Scrub) control feature.
> The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM
> Specification (JESD79-5) and allows the DRAM to internally read, correct
> single-bit errors, and write back corrected data bits to the DRAM array
> while providing transparency to error counts.
>
> The ECS control allows the requester to change the log entry type, the ECS
> threshold count provided that the request is within the definition
> specified in DDR5 mode registers, change mode between codeword mode and
> row count mode, and reset the ECS counter.
>
> Register with EDAC RAS control feature driver, which gets the ECS attr
> descriptors from the EDAC ECS and expose sysfs ECS control attributes
> to the userspace.
> For example ECS control for the memory media FRU 0 in CXL mem0 device is
> in /sys/bus/edac/devices/cxl_mem0/ecs_fru0/
>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> drivers/cxl/core/memfeature.c | 439 +++++++++++++++++++++++++++++++++-
> 1 file changed, 438 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/memfeature.c b/drivers/cxl/core/memfeature.c
> index 90c68d20b02b..5d4057fa304c 100644
> --- a/drivers/cxl/core/memfeature.c
> +++ b/drivers/cxl/core/memfeature.c
> @@ -19,7 +19,7 @@
> #include <cxl.h>
> #include <linux/edac.h>
>
> -#define CXL_DEV_NUM_RAS_FEATURES 1
> +#define CXL_DEV_NUM_RAS_FEATURES 2
> #define CXL_DEV_HOUR_IN_SECS 3600
>
> #define CXL_SCRUB_NAME_LEN 128
> @@ -303,6 +303,405 @@ static const struct edac_scrub_ops cxl_ps_scrub_ops = {
> .cycle_duration_write = cxl_patrol_scrub_write_scrub_cycle,
> };
>
...
> + case CXL_ECS_PARAM_THRESHOLD:
> + wr_attrs[fru_id].ecs_config &= ~CXL_ECS_THRESHOLD_COUNT_MASK;
> + switch (params->threshold) {
> + case 256:
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
> + ECS_THRESHOLD_256);
> + break;
> + case 1024:
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
> + ECS_THRESHOLD_1024);
> + break;
> + case 4096:
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
> + ECS_THRESHOLD_4096);
> + break;
> + default:
> + dev_err(dev,
> + "Invalid CXL ECS scrub threshold count(%d) to set\n",
> + params->threshold);
> + dev_err(dev,
> + "Supported scrub threshold count: 256,1024,4096\n");
> + return -EINVAL;
> + }
> + break;
> + case CXL_ECS_PARAM_MODE:
> + if (params->mode != ECS_MODE_COUNTS_ROWS &&
> + params->mode != ECS_MODE_COUNTS_CODEWORDS) {
> + dev_err(dev,
> + "Invalid CXL ECS scrub mode(%d) to set\n",
> + params->mode);
> + dev_err(dev,
> + "Mode 0: ECS counts rows with errors"
> + " 1: ECS counts codewords with errors\n");
The messaging here can be improved. When printed out in dmesg, it looks
like
root@localhost:~# echo 2 > /sys/bus/edac/devices/cxl_mem0/ecs_fru0/mode
----
[ 6099.073006] cxl_mem mem0: Invalid CXL ECS scrub mode(2) to set
[ 6099.074407] cxl_mem mem0: Mode 0: ECS counts rows with errors 1: ECS counts codewords with errors
----
Maybe use similar message format as threshold above, like
+ dev_err(dev,
+ "Supported ECS mode: 0: ECS counts rows with errors; 1: ECS counts codewords with errors\n");
Fan
> + return -EINVAL;
> + }
> + wr_attrs[fru_id].ecs_config &= ~CXL_ECS_MODE_MASK;
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_MODE_MASK,
> + params->mode);
> + break;
> + case CXL_ECS_PARAM_RESET_COUNTER:
> + wr_attrs[fru_id].ecs_config &= ~CXL_ECS_RESET_COUNTER_MASK;
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_RESET_COUNTER_MASK,
> + params->reset_counter);
> + break;
> + default:
> + dev_err(dev, "Invalid CXL ECS parameter to set\n");
> + return -EINVAL;
> + }
> +
> + ret = cxl_set_feature(cxlds, cxl_ecs_uuid, cxl_ecs_ctx->set_version,
> + wr_attrs, wr_data_size,
> + CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET);
> + if (ret) {
> + dev_err(dev, "CXL ECS set feature failed ret=%d\n", ret);
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_get_log_entry_type(struct device *dev, void *drv_data,
> + int fru_id, u32 *val)
> +{
> + struct cxl_ecs_params params;
> + int ret;
> +
> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
> + if (ret)
> + return ret;
> +
> + *val = params.log_entry_type;
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_set_log_entry_type(struct device *dev, void *drv_data,
> + int fru_id, u32 val)
> +{
> + struct cxl_ecs_params params = {
> + .log_entry_type = val,
> + };
> +
> + return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
> + ¶ms, CXL_ECS_PARAM_LOG_ENTRY_TYPE);
> +}
> +
> +static int cxl_ecs_get_log_entry_type_per_dram(struct device *dev, void *drv_data,
> + int fru_id, u32 *val)
> +{
> + struct cxl_ecs_params params;
> + int ret;
> +
> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
> + if (ret)
> + return ret;
> +
> + if (params.log_entry_type == ECS_LOG_ENTRY_TYPE_DRAM)
> + *val = 1;
> + else
> + *val = 0;
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_get_log_entry_type_per_memory_media(struct device *dev,
> + void *drv_data,
> + int fru_id, u32 *val)
> +{
> + struct cxl_ecs_params params;
> + int ret;
> +
> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
> + if (ret)
> + return ret;
> +
> + if (params.log_entry_type == ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU)
> + *val = 1;
> + else
> + *val = 0;
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_get_mode(struct device *dev, void *drv_data,
> + int fru_id, u32 *val)
> +{
> + struct cxl_ecs_params params;
> + int ret;
> +
> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
> + if (ret)
> + return ret;
> +
> + *val = params.mode;
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_set_mode(struct device *dev, void *drv_data,
> + int fru_id, u32 val)
> +{
> + struct cxl_ecs_params params = {
> + .mode = val,
> + };
> +
> + return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
> + ¶ms, CXL_ECS_PARAM_MODE);
> +}
> +
> +static int cxl_ecs_get_mode_counts_rows(struct device *dev, void *drv_data,
> + int fru_id, u32 *val)
> +{
> + struct cxl_ecs_params params;
> + int ret;
> +
> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
> + if (ret)
> + return ret;
> +
> + if (params.mode == ECS_MODE_COUNTS_ROWS)
> + *val = 1;
> + else
> + *val = 0;
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_get_mode_counts_codewords(struct device *dev, void *drv_data,
> + int fru_id, u32 *val)
> +{
> + struct cxl_ecs_params params;
> + int ret;
> +
> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
> + if (ret)
> + return ret;
> +
> + if (params.mode == ECS_MODE_COUNTS_CODEWORDS)
> + *val = 1;
> + else
> + *val = 0;
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_reset(struct device *dev, void *drv_data, int fru_id, u32 val)
> +{
> + struct cxl_ecs_params params = {
> + .reset_counter = val,
> + };
> +
> + return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
> + ¶ms, CXL_ECS_PARAM_RESET_COUNTER);
> +}
> +
> +static int cxl_ecs_get_threshold(struct device *dev, void *drv_data,
> + int fru_id, u32 *val)
> +{
> + struct cxl_ecs_params params;
> + int ret;
> +
> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
> + if (ret)
> + return ret;
> +
> + *val = params.threshold;
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_set_threshold(struct device *dev, void *drv_data,
> + int fru_id, u32 val)
> +{
> + struct cxl_ecs_params params = {
> + .threshold = val,
> + };
> +
> + return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
> + ¶ms, CXL_ECS_PARAM_THRESHOLD);
> +}
> +
> +static const struct edac_ecs_ops cxl_ecs_ops = {
> + .get_log_entry_type = cxl_ecs_get_log_entry_type,
> + .set_log_entry_type = cxl_ecs_set_log_entry_type,
> + .get_log_entry_type_per_dram = cxl_ecs_get_log_entry_type_per_dram,
> + .get_log_entry_type_per_memory_media =
> + cxl_ecs_get_log_entry_type_per_memory_media,
> + .get_mode = cxl_ecs_get_mode,
> + .set_mode = cxl_ecs_set_mode,
> + .get_mode_counts_codewords = cxl_ecs_get_mode_counts_codewords,
> + .get_mode_counts_rows = cxl_ecs_get_mode_counts_rows,
> + .reset = cxl_ecs_reset,
> + .get_threshold = cxl_ecs_get_threshold,
> + .set_threshold = cxl_ecs_set_threshold,
> +};
> +
> int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
> {
> struct edac_dev_feature ras_features[CXL_DEV_NUM_RAS_FEATURES];
> @@ -310,7 +709,9 @@ int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
> struct cxl_patrol_scrub_context *cxl_ps_ctx;
> struct cxl_feat_entry feat_entry;
> char cxl_dev_name[CXL_SCRUB_NAME_LEN];
> + struct cxl_ecs_context *cxl_ecs_ctx;
> int rc, i, num_ras_features = 0;
> + int num_media_frus;
>
> if (cxlr) {
> struct cxl_region_params *p = &cxlr->params;
> @@ -366,6 +767,42 @@ int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
> ras_features[num_ras_features].ctx = cxl_ps_ctx;
> num_ras_features++;
>
> + if (!cxlr) {
> + rc = cxl_get_supported_feature_entry(cxlds, &cxl_ecs_uuid,
> + &feat_entry);
> + if (rc < 0)
> + goto feat_register;
> +
> + if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
> + goto feat_register;
> + num_media_frus = feat_entry.get_feat_size /
> + sizeof(struct cxl_ecs_rd_attrs);
> + if (!num_media_frus)
> + goto feat_register;
> +
> + cxl_ecs_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ecs_ctx),
> + GFP_KERNEL);
> + if (!cxl_ecs_ctx)
> + goto feat_register;
> + *cxl_ecs_ctx = (struct cxl_ecs_context) {
> + .get_feat_size = feat_entry.get_feat_size,
> + .set_feat_size = feat_entry.set_feat_size,
> + .get_version = feat_entry.get_feat_ver,
> + .set_version = feat_entry.set_feat_ver,
> + .set_effects = feat_entry.set_effects,
> + .num_media_frus = num_media_frus,
> + .cxlmd = cxlmd,
> + };
> +
> + ras_features[num_ras_features].ft_type = RAS_FEAT_ECS;
> + ras_features[num_ras_features].ecs_ops = &cxl_ecs_ops;
> + ras_features[num_ras_features].ctx = cxl_ecs_ctx;
> + ras_features[num_ras_features].ecs_info.num_media_frus =
> + num_media_frus;
> + num_ras_features++;
> + }
> +
> +feat_register:
> return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
> num_ras_features, ras_features);
> }
> --
> 2.34.1
>
--
Fan Ni
next prev parent reply other threads:[~2024-09-30 18:12 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
2024-09-11 9:04 ` [PATCH v12 01/17] EDAC: Add support for EDAC device features control shiju.jose
2024-09-13 16:40 ` Borislav Petkov
2024-09-16 9:21 ` Shiju Jose
2024-09-16 10:50 ` Jonathan Cameron
2024-09-16 16:16 ` Shiju Jose
2024-09-11 9:04 ` [PATCH v12 02/17] EDAC: Add EDAC scrub control driver shiju.jose
2024-09-13 17:25 ` Borislav Petkov
2024-09-16 9:22 ` Shiju Jose
2024-09-26 23:04 ` Fan Ni
2024-09-27 11:17 ` Shiju Jose
2024-09-11 9:04 ` [PATCH v12 03/17] EDAC: Add EDAC ECS " shiju.jose
2024-09-27 16:28 ` Fan Ni
2024-09-11 9:04 ` [PATCH v12 04/17] cxl: Move mailbox related bits to the same context shiju.jose
2024-09-11 17:20 ` Dave Jiang
2024-09-12 9:42 ` Shiju Jose
2024-09-11 9:04 ` [PATCH v12 05/17] cxl: Fix comment regarding cxl_query_cmd() return data shiju.jose
2024-09-11 9:04 ` [PATCH v12 06/17] cxl: Refactor user ioctl command path from mds to mailbox shiju.jose
2024-09-11 9:04 ` [PATCH v12 07/17] cxl: Add Get Supported Features command for kernel usage shiju.jose
2024-09-23 23:33 ` Dave Jiang
2024-09-25 11:18 ` Shiju Jose
2024-09-11 9:04 ` [PATCH v12 08/17] cxl/mbox: Add GET_FEATURE mailbox command shiju.jose
2024-09-30 16:17 ` Fan Ni
2024-09-11 9:04 ` [PATCH v12 09/17] cxl/mbox: Add SET_FEATURE " shiju.jose
2024-09-30 16:58 ` Fan Ni
2024-09-11 9:04 ` [PATCH v12 10/17] cxl/memfeature: Add CXL memory device patrol scrub control feature shiju.jose
2024-09-30 17:38 ` Fan Ni
2024-10-01 8:38 ` Shiju Jose
2024-10-01 19:47 ` Fan Ni
2024-09-11 9:04 ` [PATCH v12 11/17] cxl/memfeature: Add CXL memory device ECS " shiju.jose
2024-09-30 18:12 ` Fan Ni [this message]
2024-10-01 8:39 ` Shiju Jose
2024-09-11 9:04 ` [PATCH v12 12/17] platform: Add __free() based cleanup function for platform_device_put shiju.jose
2024-09-11 9:04 ` [PATCH v12 13/17] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2024-10-01 15:47 ` Fan Ni
2024-09-11 9:04 ` [PATCH v12 14/17] ras: mem: Add memory " shiju.jose
2024-09-11 9:04 ` [PATCH v12 15/17] EDAC: Add EDAC PPR control driver shiju.jose
2024-09-11 9:04 ` [PATCH v12 16/17] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
2024-09-11 9:04 ` [PATCH v12 17/17] cxl/memfeature: Add CXL memory device PPR control feature shiju.jose
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZvrqFHQ8TPv_c3vC@fan \
--to=nifan.cxl@gmail.com \
--cc=Jon.Grimm@amd.com \
--cc=Vilas.Sridharan@amd.com \
--cc=Yazen.Ghannam@amd.com \
--cc=alison.schofield@intel.com \
--cc=bp@alien8.de \
--cc=dan.j.williams@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=david@redhat.com \
--cc=dferguson@amperecomputing.com \
--cc=duenwen@google.com \
--cc=erdemaktas@google.com \
--cc=gthelen@google.com \
--cc=ira.weiny@intel.com \
--cc=james.morse@arm.com \
--cc=jgroves@micron.com \
--cc=jiaqiyan@google.com \
--cc=jonathan.cameron@huawei.com \
--cc=jthoughton@google.com \
--cc=kangkang.shen@futurewei.com \
--cc=lenb@kernel.org \
--cc=leo.duran@amd.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxarm@huawei.com \
--cc=mchehab@kernel.org \
--cc=mike.malvestuto@intel.com \
--cc=naoya.horiguchi@nec.com \
--cc=pgonda@google.com \
--cc=prime.zeng@hisilicon.com \
--cc=rafael@kernel.org \
--cc=rientjes@google.com \
--cc=roberto.sassu@huawei.com \
--cc=shiju.jose@huawei.com \
--cc=somasundaram.a@hpe.com \
--cc=tanxiaofei@huawei.com \
--cc=tony.luck@intel.com \
--cc=vishal.l.verma@intel.com \
--cc=vsalve@micron.com \
--cc=wanghuiqiang@huawei.com \
--cc=wbs@os.amperecomputing.com \
--cc=wschwartz@amperecomputing.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox