linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: shiju.jose@huawei.com
Cc: linux-edac@vger.kernel.org, linux-cxl@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, tony.luck@intel.com,
	rafael@kernel.org, lenb@kernel.org, mchehab@kernel.org,
	dan.j.williams@intel.com, dave@stgolabs.net,
	jonathan.cameron@huawei.com, gregkh@linuxfoundation.org,
	sudeep.holla@arm.com, jassisinghbrar@gmail.com,
	dave.jiang@intel.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com, david@redhat.com,
	Vilas.Sridharan@amd.com, leo.duran@amd.com,
	Yazen.Ghannam@amd.com, rientjes@google.com, jiaqiyan@google.com,
	Jon.Grimm@amd.com, dave.hansen@linux.intel.com,
	naoya.horiguchi@nec.com, james.morse@arm.com,
	jthoughton@google.com, somasundaram.a@hpe.com,
	erdemaktas@google.com, pgonda@google.com, duenwen@google.com,
	gthelen@google.com, wschwartz@amperecomputing.com,
	dferguson@amperecomputing.com, wbs@os.amperecomputing.com,
	nifan.cxl@gmail.com, tanxiaofei@huawei.com,
	prime.zeng@hisilicon.com, roberto.sassu@huawei.com,
	kangkang.shen@futurewei.com, wanghuiqiang@huawei.com,
	linuxarm@huawei.com
Subject: Re: [PATCH v14 03/14] EDAC: Add ECS control feature
Date: Mon, 28 Oct 2024 12:16:37 +0100	[thread overview]
Message-ID: <20241028111637.GSZx9yleFPOjTklIVr@fat_crate.local> (raw)
In-Reply-To: <20241025171356.1377-4-shiju.jose@huawei.com>

On Fri, Oct 25, 2024 at 06:13:44PM +0100, shiju.jose@huawei.com wrote:
> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/log_entry_type
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) The log entry type of how the DDR5 ECS log is reported.
> +		00b - per DRAM.
> +		01b - per memory media FRU.

If the conversion function here is kstrtoul(), why are those values not "0"
and "1" but in binary format?

> +
> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/log_entry_type_per_dram
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RO) True if current log entry type is per DRAM.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/log_entry_type_per_memory_media
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RO) True if current log entry type is per memory media FRU.

What's the point of those two if log_entry_type already gives you the same
info?

And the filename length is a bit too much...

> +
> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/mode
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) The mode of how the DDR5 ECS counts the errors.
> +		0 - ECS counts rows with errors.
> +		1 - ECS counts codewords with errors.

Now we have "0" and "1"s. Oh well.

What are "rows", what are "codewords"? Explain them here pls for the user.

> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/mode_counts_rows
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RO) True if current mode is ECS counts rows with errors.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/mode_counts_codewords
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RO) True if current mode is ECS counts codewords with errors.

Same question as above - redundant files.

> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/reset
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(WO) ECS reset ECC counter.
> +		1 - reset ECC counter to the default value.

1 or any value?

Looks like any to me...

You should restrict it to "1" in case you want to extend this interface with
"2" in the future, for example, doing something a bit different.

> +
> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/threshold
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) ECS threshold count per gigabits of memory cells.

That definitely needs more explanation.

> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
> index 188501e676c7..b24c2c112d9c 100644
> --- a/drivers/edac/Makefile
> +++ b/drivers/edac/Makefile
> @@ -10,7 +10,7 @@ obj-$(CONFIG_EDAC)			:= edac_core.o
>  
>  edac_core-y	:= edac_mc.o edac_device.o edac_mc_sysfs.o
>  edac_core-y	+= edac_module.o edac_device_sysfs.o wq.o
> -edac_core-y	+= scrub.o
> +edac_core-y	+= scrub.o ecs.o
>  
>  edac_core-$(CONFIG_EDAC_DEBUG)		+= debugfs.o
>  
> diff --git a/drivers/edac/ecs.c b/drivers/edac/ecs.c
> new file mode 100755
> index 000000000000..a2b64d7bf6b6
> --- /dev/null
> +++ b/drivers/edac/ecs.c
> @@ -0,0 +1,240 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Generic ECS driver in order to support control the on die
> + * error check scrub (e.g. DDR5 ECS).

This sentence needs grammar check.

> The common sysfs ECS
> + * interface abstracts the control of an arbitrary ECS
> + * functionality to a common set of functions.
> + *
> + * Copyright (c) 2024 HiSilicon Limited.
> + */
> +

#undef pr_fmt

> +#define pr_fmt(fmt)     "EDAC ECS: " fmt

Grep the tree for examples how to do that properly.

Also, this pr_fmt looks unused.

> +static umode_t ecs_attr_visible(struct kobject *kobj, struct attribute *a, int attr_id)
> +{
> +	struct device *ras_feat_dev = kobj_to_dev(kobj);
> +	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> +	const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
> +
> +	switch (attr_id) {
> +	case ECS_LOG_ENTRY_TYPE:
> +		if (ops->get_log_entry_type)  {
> +			if (ops->set_log_entry_type)
> +				return a->mode;
> +			else
> +				return 0444;

What is the goal for the access mode of all those sysfs entries? I sure hope
it is going to be root-only no-matter what. I don't want normal users to cause
scrub activity. Please make sure your whole set does that.

> +		}
> +		break;
> +	case ECS_LOG_ENTRY_TYPE_PER_DRAM:
> +		if (ops->get_log_entry_type_per_dram)
> +			return a->mode;
> +		break;
> +	case ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA:
> +		if (ops->get_log_entry_type_per_memory_media)
> +			return a->mode;
> +		break;
> +	case ECS_MODE:
> +		if (ops->get_mode) {
> +			if (ops->set_mode)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	case ECS_MODE_COUNTS_ROWS:
> +		if (ops->get_mode_counts_rows)
> +			return a->mode;
> +		break;
> +	case ECS_MODE_COUNTS_CODEWORDS:
> +		if (ops->get_mode_counts_codewords)
> +			return a->mode;
> +		break;
> +	case ECS_RESET:
> +		if (ops->reset)
> +			return a->mode;
> +		break;
> +	case ECS_THRESHOLD:
> +		if (ops->get_threshold) {
> +			if (ops->set_threshold)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	return 0;
> +}
> +
> +#define EDAC_ECS_ATTR_RO(_name, _fru_id)       \
> +	((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RO(_name), \
> +				     .fru_id = _fru_id })
> +
> +#define EDAC_ECS_ATTR_WO(_name, _fru_id)       \
> +	((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_WO(_name), \
> +				     .fru_id = _fru_id })
> +
> +#define EDAC_ECS_ATTR_RW(_name, _fru_id)       \
> +	((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RW(_name), \
> +				     .fru_id = _fru_id })
> +
> +static int ecs_create_desc(struct device *ecs_dev,
> +			   const struct attribute_group **attr_groups, u16 num_media_frus)
> +{
> +	struct edac_ecs_context *ecs_ctx;
> +	u32 fru;
> +
> +	ecs_ctx = devm_kzalloc(ecs_dev, sizeof(*ecs_ctx), GFP_KERNEL);
> +	if (!ecs_ctx)
> +		return -ENOMEM;
> +
> +	ecs_ctx->num_media_frus = num_media_frus;
> +	ecs_ctx->fru_ctxs = devm_kcalloc(ecs_dev, num_media_frus,
> +					 sizeof(*ecs_ctx->fru_ctxs),
> +					 GFP_KERNEL);
> +	if (!ecs_ctx->fru_ctxs)
> +		return -ENOMEM;
> +
> +	for (fru = 0; fru < num_media_frus; fru++) {
> +		struct edac_ecs_fru_context *fru_ctx = &ecs_ctx->fru_ctxs[fru];
> +		struct attribute_group *group = &fru_ctx->group;
> +		int i;
> +
> +		fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE] = EDAC_ECS_ATTR_RW(log_entry_type, fru);
> +		fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE_PER_DRAM] =
> +					EDAC_ECS_ATTR_RO(log_entry_type_per_dram, fru);
> +		fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA] =
> +					EDAC_ECS_ATTR_RO(log_entry_type_per_memory_media, fru);
> +		fru_ctx->ecs_dev_attr[ECS_MODE] = EDAC_ECS_ATTR_RW(mode, fru);
> +		fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_ROWS] =
> +					EDAC_ECS_ATTR_RO(mode_counts_rows, fru);
> +		fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_CODEWORDS] =
> +					EDAC_ECS_ATTR_RO(mode_counts_codewords, fru);
> +		fru_ctx->ecs_dev_attr[ECS_RESET] = EDAC_ECS_ATTR_WO(reset, fru);
> +		fru_ctx->ecs_dev_attr[ECS_THRESHOLD] = EDAC_ECS_ATTR_RW(threshold, fru);

Clearly too long variable and define names. Shorten pls.

Also, a new line here:

<---


> +		for (i = 0; i < ECS_MAX_ATTRS; i++)
> +			fru_ctx->ecs_attrs[i] = &fru_ctx->ecs_dev_attr[i].dev_attr.attr;
> +
> +		sprintf(fru_ctx->name, "%s%d", EDAC_ECS_FRU_NAME, fru);
> +		group->name = fru_ctx->name;
> +		group->attrs = fru_ctx->ecs_attrs;
> +		group->is_visible  = ecs_attr_visible;
> +
> +		attr_groups[fru] = group;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * edac_ecs_get_desc - get EDAC ECS descriptors
> + * @ecs_dev: client device, supports ECS feature
> + * @attr_groups: pointer to attribute group container
> + * @num_media_frus: number of media FRUs in the device
> + *
> + * Return:
> + *  * %0	- Success.
> + *  * %-EINVAL	- Invalid parameters passed.
> + *  * %-ENOMEM	- Dynamic memory allocation failed.
> + */
> +int edac_ecs_get_desc(struct device *ecs_dev,
> +		      const struct attribute_group **attr_groups, u16 num_media_frus)
> +{
> +	if (!ecs_dev || !attr_groups || !num_media_frus)
> +		return -EINVAL;
> +
> +	return ecs_create_desc(ecs_dev, attr_groups, num_media_frus);
> +}
> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
> index 91552271b34a..5fc3ec7f25eb 100644
> --- a/drivers/edac/edac_device.c
> +++ b/drivers/edac/edac_device.c
> @@ -626,6 +626,9 @@ int edac_dev_register(struct device *parent, char *name,
>  			attr_gcnt++;
>  			scrub_cnt++;
>  			break;
> +		case RAS_FEAT_ECS:
> +			attr_gcnt += ras_features[feat].ecs_info.num_media_frus;
> +			break;
>  		default:
>  			return -EINVAL;
>  		}
> @@ -667,6 +670,18 @@ int edac_dev_register(struct device *parent, char *name,
>  			scrub_inst++;
>  			attr_gcnt++;
>  			break;
> +		case RAS_FEAT_ECS:
> +			if (!ras_features->ecs_ops)
> +				goto data_mem_free;

<---- newline here.

> +			dev_data = &ctx->ecs;
> +			dev_data->ecs_ops = ras_features->ecs_ops;
> +			dev_data->private = ras_features->ctx;
> +			ret = edac_ecs_get_desc(parent, &ras_attr_groups[attr_gcnt],
> +						ras_features->ecs_info.num_media_frus);
> +			if (ret)
> +				goto data_mem_free;

Ditto.

> +			attr_gcnt += ras_features->ecs_info.num_media_frus;
> +			break;
>  		default:
>  			ret = -EINVAL;
>  			goto data_mem_free;

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


  reply	other threads:[~2024-10-28 11:17 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-25 17:13 [PATCH v14 00/14] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
2024-10-25 17:13 ` [PATCH v14 01/14] EDAC: Add support for EDAC device features control shiju.jose
2024-10-25 17:13 ` [PATCH v14 02/14] EDAC: Add scrub control feature shiju.jose
2024-10-25 17:13 ` [PATCH v14 03/14] EDAC: Add ECS " shiju.jose
2024-10-28 11:16   ` Borislav Petkov [this message]
2024-10-28 16:03     ` Shiju Jose
2024-10-29 20:07       ` Borislav Petkov
2024-10-25 17:13 ` [PATCH v14 04/14] cxl: Add Get Supported Features command for kernel usage shiju.jose
2024-10-25 17:13 ` [PATCH v14 05/14] cxl/mbox: Add GET_FEATURE mailbox command shiju.jose
2024-10-29 15:47   ` Dave Jiang
2024-10-25 17:13 ` [PATCH v14 06/14] cxl/mbox: Add SET_FEATURE " shiju.jose
2024-10-29 15:51   ` Dave Jiang
2024-10-25 17:13 ` [PATCH v14 07/14] cxl/memfeature: Add CXL memory device patrol scrub control feature shiju.jose
2024-10-29 16:31   ` Dave Jiang
2024-10-29 17:00     ` Shiju Jose
2024-10-29 18:32       ` Dave Jiang
2024-10-30 16:16         ` Jonathan Cameron
2024-10-30 16:46           ` Dave Jiang
2024-10-29 20:15   ` Borislav Petkov
2024-10-30 12:52     ` Shiju Jose
2024-10-29 20:16   ` Borislav Petkov
2024-10-30 11:26     ` Shiju Jose
2024-10-25 17:13 ` [PATCH v14 08/14] cxl/memfeature: Add CXL memory device ECS " shiju.jose
2024-10-25 17:13 ` [PATCH v14 09/14] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2024-10-25 17:13 ` [PATCH v14 10/14] ras: mem: Add memory " shiju.jose
2024-10-25 17:13 ` [PATCH v14 11/14] EDAC: Add memory repair control feature shiju.jose
2024-10-25 17:13 ` [PATCH v14 12/14] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
2024-10-25 17:13 ` [PATCH v14 13/14] cxl/memfeature: Add CXL memory device sPPR control feature shiju.jose
2024-10-25 17:13 ` [PATCH v14 14/14] cxl/memfeature: Add CXL memory device memory sparing " shiju.jose
2024-10-26 10:35 ` [PATCH v14 00/14] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241028111637.GSZx9yleFPOjTklIVr@fat_crate.local \
    --to=bp@alien8.de \
    --cc=Jon.Grimm@amd.com \
    --cc=Vilas.Sridharan@amd.com \
    --cc=Yazen.Ghannam@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=dferguson@amperecomputing.com \
    --cc=duenwen@google.com \
    --cc=erdemaktas@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=gthelen@google.com \
    --cc=ira.weiny@intel.com \
    --cc=james.morse@arm.com \
    --cc=jassisinghbrar@gmail.com \
    --cc=jiaqiyan@google.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=jthoughton@google.com \
    --cc=kangkang.shen@futurewei.com \
    --cc=lenb@kernel.org \
    --cc=leo.duran@amd.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxarm@huawei.com \
    --cc=mchehab@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=nifan.cxl@gmail.com \
    --cc=pgonda@google.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=rafael@kernel.org \
    --cc=rientjes@google.com \
    --cc=roberto.sassu@huawei.com \
    --cc=shiju.jose@huawei.com \
    --cc=somasundaram.a@hpe.com \
    --cc=sudeep.holla@arm.com \
    --cc=tanxiaofei@huawei.com \
    --cc=tony.luck@intel.com \
    --cc=vishal.l.verma@intel.com \
    --cc=wanghuiqiang@huawei.com \
    --cc=wbs@os.amperecomputing.com \
    --cc=wschwartz@amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox