linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Shiju Jose <shiju.jose@huawei.com>
To: Borislav Petkov <bp@alien8.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"lenb@kernel.org" <lenb@kernel.org>,
	"mchehab@kernel.org" <mchehab@kernel.org>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	"dave@stgolabs.net" <dave@stgolabs.net>,
	"Jonathan Cameron" <jonathan.cameron@huawei.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"sudeep.holla@arm.com" <sudeep.holla@arm.com>,
	"jassisinghbrar@gmail.com" <jassisinghbrar@gmail.com>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	"alison.schofield@intel.com" <alison.schofield@intel.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>,
	"david@redhat.com" <david@redhat.com>,
	"Vilas.Sridharan@amd.com" <Vilas.Sridharan@amd.com>,
	"leo.duran@amd.com" <leo.duran@amd.com>,
	"Yazen.Ghannam@amd.com" <Yazen.Ghannam@amd.com>,
	"rientjes@google.com" <rientjes@google.com>,
	"jiaqiyan@google.com" <jiaqiyan@google.com>,
	"Jon.Grimm@amd.com" <Jon.Grimm@amd.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"naoya.horiguchi@nec.com" <naoya.horiguchi@nec.com>,
	"james.morse@arm.com" <james.morse@arm.com>,
	"jthoughton@google.com" <jthoughton@google.com>,
	"somasundaram.a@hpe.com" <somasundaram.a@hpe.com>,
	"erdemaktas@google.com" <erdemaktas@google.com>,
	"pgonda@google.com" <pgonda@google.com>,
	"duenwen@google.com" <duenwen@google.com>,
	"gthelen@google.com" <gthelen@google.com>,
	"wschwartz@amperecomputing.com" <wschwartz@amperecomputing.com>,
	"dferguson@amperecomputing.com" <dferguson@amperecomputing.com>,
	"wbs@os.amperecomputing.com" <wbs@os.amperecomputing.com>,
	"nifan.cxl@gmail.com" <nifan.cxl@gmail.com>,
	tanxiaofei <tanxiaofei@huawei.com>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	"Roberto Sassu" <roberto.sassu@huawei.com>,
	"kangkang.shen@futurewei.com" <kangkang.shen@futurewei.com>,
	wanghuiqiang <wanghuiqiang@huawei.com>,
	Linuxarm <linuxarm@huawei.com>
Subject: RE: [PATCH v14 03/14] EDAC: Add ECS control feature
Date: Mon, 28 Oct 2024 16:03:31 +0000	[thread overview]
Message-ID: <2263e5a551e24a6ca63dbe33538a5ce8@huawei.com> (raw)
In-Reply-To: <20241028111637.GSZx9yleFPOjTklIVr@fat_crate.local>


Thanks for the comments.
>-----Original Message-----
>From: Borislav Petkov <bp@alien8.de>
>Sent: 28 October 2024 11:17
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-cxl@vger.kernel.org; linux-
>acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org;
>tony.luck@intel.com; rafael@kernel.org; lenb@kernel.org;
>mchehab@kernel.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; gregkh@linuxfoundation.org;
>sudeep.holla@arm.com; jassisinghbrar@gmail.com; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>david@redhat.com; Vilas.Sridharan@amd.com; leo.duran@amd.com;
>Yazen.Ghannam@amd.com; rientjes@google.com; jiaqiyan@google.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
>Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
>wanghuiqiang <wanghuiqiang@huawei.com>; Linuxarm
><linuxarm@huawei.com>
>Subject: Re: [PATCH v14 03/14] EDAC: Add ECS control feature
>
>On Fri, Oct 25, 2024 at 06:13:44PM +0100, shiju.jose@huawei.com wrote:
>> +What:		/sys/bus/edac/devices/<dev-
>name>/ecs_fruX/log_entry_type
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RW) The log entry type of how the DDR5 ECS log is reported.
>> +		00b - per DRAM.
>> +		01b - per memory media FRU.
>
>If the conversion function here is kstrtoul(), why are those values not "0"
>and "1" but in binary format?
Followed the description for the " Log Entry Type " in the CXL spec rev 3.1 Table 8-210.
DDR5 ECS Control Feature Readable Attributes and Table 8-211. DDR5 ECS Control Feature
Writable Attributes. This was written as " Common DDR5 ECS Log Capabilities " though actually returns
the status of the log_entry_type. 

Common DDR5 ECS Log Capabilities
• Bits[1:0]: Log Entry Type: The log entry type of how the ECS log is
reported. The entry type is defined commonly for all memory media FRUs
within the device.
— 00b = Per DRAM
— 01b = Per Memory Media FRU
— All other encodings are reserved
• Bits[7:2]: Reserved.

I will update to 0 and 1. 
>
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/ecs_fruX/log_entry_type_per_dram
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RO) True if current log entry type is per DRAM.
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/ecs_fruX/log_entry_type_per_memory_media
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RO) True if current log entry type is per memory media FRU.
>
>What's the point of those two if log_entry_type already gives you the same info?
>
This was written as " Common DDR5 ECS Log Capabilities " in the CXL spec rev 3.1
Table 8-210. DDR5 ECS Control Feature Readable Attributes, though actually returns
the status of the log_entry_type. I will remove these extra log type attributes. 

>And the filename length is a bit too much...
>
>> +
>> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/mode
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RW) The mode of how the DDR5 ECS counts the errors.
>> +		0 - ECS counts rows with errors.
>> +		1 - ECS counts codewords with errors.
>
>Now we have "0" and "1"s. Oh well.
>
>What are "rows", what are "codewords"? Explain them here pls for the user.
Will do.

>
>> +What:		/sys/bus/edac/devices/<dev-
>name>/ecs_fruX/mode_counts_rows
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RO) True if current mode is ECS counts rows with errors.
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/ecs_fruX/mode_counts_codewords
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RO) True if current mode is ECS counts codewords with errors.
>
>Same question as above - redundant files.
I will remove redundant files.

>
>> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/reset
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(WO) ECS reset ECC counter.
>> +		1 - reset ECC counter to the default value.
>
>1 or any value?
>
>Looks like any to me...
>
>You should restrict it to "1" in case you want to extend this interface with "2" in
>the future, for example, doing something a bit different.
I will update. Returns error for any other value set than '1'.
>
>> +
>> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/threshold
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RW) ECS threshold count per gigabits of memory cells.
>
>That definitely needs more explanation.
Sure.
>
>> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile index
>> 188501e676c7..b24c2c112d9c 100644
>> --- a/drivers/edac/Makefile
>> +++ b/drivers/edac/Makefile
>> @@ -10,7 +10,7 @@ obj-$(CONFIG_EDAC)			:= edac_core.o
>>
>>  edac_core-y	:= edac_mc.o edac_device.o edac_mc_sysfs.o
>>  edac_core-y	+= edac_module.o edac_device_sysfs.o wq.o
>> -edac_core-y	+= scrub.o
>> +edac_core-y	+= scrub.o ecs.o
>>
>>  edac_core-$(CONFIG_EDAC_DEBUG)		+= debugfs.o
>>
>> diff --git a/drivers/edac/ecs.c b/drivers/edac/ecs.c new file mode
>> 100755 index 000000000000..a2b64d7bf6b6
>> --- /dev/null
>> +++ b/drivers/edac/ecs.c
>> @@ -0,0 +1,240 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Generic ECS driver in order to support control the on die
>> + * error check scrub (e.g. DDR5 ECS).
>
>This sentence needs grammar check.
Will do.
>
>> The common sysfs ECS
>> + * interface abstracts the control of an arbitrary ECS
>> + * functionality to a common set of functions.
>> + *
>> + * Copyright (c) 2024 HiSilicon Limited.
>> + */
>> +
>
>#undef pr_fmt
>
>> +#define pr_fmt(fmt)     "EDAC ECS: " fmt
>
>Grep the tree for examples how to do that properly.
>
>Also, this pr_fmt looks unused.
I will remove.
>
>> +static umode_t ecs_attr_visible(struct kobject *kobj, struct
>> +attribute *a, int attr_id) {
>> +	struct device *ras_feat_dev = kobj_to_dev(kobj);
>> +	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> +	const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
>> +
>> +	switch (attr_id) {
>> +	case ECS_LOG_ENTRY_TYPE:
>> +		if (ops->get_log_entry_type)  {
>> +			if (ops->set_log_entry_type)
>> +				return a->mode;
>> +			else
>> +				return 0444;
>
>What is the goal for the access mode of all those sysfs entries? I sure hope it is
>going to be root-only no-matter what. I don't want normal users to cause scrub
>activity. Please make sure your whole set does that.
This is the attribute_group's is_visible() callback  for controlling the visibility
of the attributes based on whether attribute is added or not by the parent driver.
Yes.  Writable only by the root. 
>
>> +		}
>> +		break;
>> +	case ECS_LOG_ENTRY_TYPE_PER_DRAM:
>> +		if (ops->get_log_entry_type_per_dram)
>> +			return a->mode;
>> +		break;
[...]
>	EDAC_ECS_ATTR_RO(log_entry_type_per_dram, fru);
>> +		fru_ctx-
>>ecs_dev_attr[ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA] =
>> +
>	EDAC_ECS_ATTR_RO(log_entry_type_per_memory_media, fru);
>> +		fru_ctx->ecs_dev_attr[ECS_MODE] =
>EDAC_ECS_ATTR_RW(mode, fru);
>> +		fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_ROWS] =
>> +
>	EDAC_ECS_ATTR_RO(mode_counts_rows, fru);
>> +		fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_CODEWORDS] =
>> +
>	EDAC_ECS_ATTR_RO(mode_counts_codewords, fru);
>> +		fru_ctx->ecs_dev_attr[ECS_RESET] =
>EDAC_ECS_ATTR_WO(reset, fru);
>> +		fru_ctx->ecs_dev_attr[ECS_THRESHOLD] =
>EDAC_ECS_ATTR_RW(threshold,
>> +fru);
>
>Clearly too long variable and define names. Shorten pls.
Will do
>
>Also, a new line here:
Ok.
>
><---
>
>
[...]
>> +		case RAS_FEAT_ECS:
>> +			if (!ras_features->ecs_ops)
>> +				goto data_mem_free;
>
><---- newline here.
Wil do.
>
>> +			dev_data = &ctx->ecs;
>> +			dev_data->ecs_ops = ras_features->ecs_ops;
>> +			dev_data->private = ras_features->ctx;
>> +			ret = edac_ecs_get_desc(parent,
>&ras_attr_groups[attr_gcnt],
>> +						ras_features-
>>ecs_info.num_media_frus);
>> +			if (ret)
>> +				goto data_mem_free;
>
>Ditto.
Wil do.
>
>> +			attr_gcnt += ras_features->ecs_info.num_media_frus;
>> +			break;
>>  		default:
>>  			ret = -EINVAL;
>>  			goto data_mem_free;
>
>--
>Regards/Gruss,
>    Boris.

Thanks,
Shiju

  reply	other threads:[~2024-10-28 16:03 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-25 17:13 [PATCH v14 00/14] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
2024-10-25 17:13 ` [PATCH v14 01/14] EDAC: Add support for EDAC device features control shiju.jose
2024-10-25 17:13 ` [PATCH v14 02/14] EDAC: Add scrub control feature shiju.jose
2024-10-25 17:13 ` [PATCH v14 03/14] EDAC: Add ECS " shiju.jose
2024-10-28 11:16   ` Borislav Petkov
2024-10-28 16:03     ` Shiju Jose [this message]
2024-10-29 20:07       ` Borislav Petkov
2024-10-25 17:13 ` [PATCH v14 04/14] cxl: Add Get Supported Features command for kernel usage shiju.jose
2024-10-25 17:13 ` [PATCH v14 05/14] cxl/mbox: Add GET_FEATURE mailbox command shiju.jose
2024-10-29 15:47   ` Dave Jiang
2024-10-25 17:13 ` [PATCH v14 06/14] cxl/mbox: Add SET_FEATURE " shiju.jose
2024-10-29 15:51   ` Dave Jiang
2024-10-25 17:13 ` [PATCH v14 07/14] cxl/memfeature: Add CXL memory device patrol scrub control feature shiju.jose
2024-10-29 16:31   ` Dave Jiang
2024-10-29 17:00     ` Shiju Jose
2024-10-29 18:32       ` Dave Jiang
2024-10-30 16:16         ` Jonathan Cameron
2024-10-30 16:46           ` Dave Jiang
2024-10-29 20:15   ` Borislav Petkov
2024-10-30 12:52     ` Shiju Jose
2024-10-29 20:16   ` Borislav Petkov
2024-10-30 11:26     ` Shiju Jose
2024-10-25 17:13 ` [PATCH v14 08/14] cxl/memfeature: Add CXL memory device ECS " shiju.jose
2024-10-25 17:13 ` [PATCH v14 09/14] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2024-10-25 17:13 ` [PATCH v14 10/14] ras: mem: Add memory " shiju.jose
2024-10-25 17:13 ` [PATCH v14 11/14] EDAC: Add memory repair control feature shiju.jose
2024-10-25 17:13 ` [PATCH v14 12/14] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
2024-10-25 17:13 ` [PATCH v14 13/14] cxl/memfeature: Add CXL memory device sPPR control feature shiju.jose
2024-10-25 17:13 ` [PATCH v14 14/14] cxl/memfeature: Add CXL memory device memory sparing " shiju.jose
2024-10-26 10:35 ` [PATCH v14 00/14] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2263e5a551e24a6ca63dbe33538a5ce8@huawei.com \
    --to=shiju.jose@huawei.com \
    --cc=Jon.Grimm@amd.com \
    --cc=Vilas.Sridharan@amd.com \
    --cc=Yazen.Ghannam@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=bp@alien8.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=dferguson@amperecomputing.com \
    --cc=duenwen@google.com \
    --cc=erdemaktas@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=gthelen@google.com \
    --cc=ira.weiny@intel.com \
    --cc=james.morse@arm.com \
    --cc=jassisinghbrar@gmail.com \
    --cc=jiaqiyan@google.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=jthoughton@google.com \
    --cc=kangkang.shen@futurewei.com \
    --cc=lenb@kernel.org \
    --cc=leo.duran@amd.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxarm@huawei.com \
    --cc=mchehab@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=nifan.cxl@gmail.com \
    --cc=pgonda@google.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=rafael@kernel.org \
    --cc=rientjes@google.com \
    --cc=roberto.sassu@huawei.com \
    --cc=somasundaram.a@hpe.com \
    --cc=sudeep.holla@arm.com \
    --cc=tanxiaofei@huawei.com \
    --cc=tony.luck@intel.com \
    --cc=vishal.l.verma@intel.com \
    --cc=wanghuiqiang@huawei.com \
    --cc=wbs@os.amperecomputing.com \
    --cc=wschwartz@amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox