* [PATCH v2 0/8] cxl: support CXL memory RAS features
@ 2025-03-20 18:04 shiju.jose
2025-03-20 18:04 ` [PATCH v2 1/8] cxl: Add helper function to retrieve a feature entry shiju.jose
` (8 more replies)
0 siblings, 9 replies; 24+ messages in thread
From: shiju.jose @ 2025-03-20 18:04 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, yazen.ghannam, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Support for CXL memory RAS features: patrol scrub, ECS, soft-PPR and
memory sparing.
This CXL series was originally part of the EDAC series [1].
The code is based on the cxl.git next branch [2] merged with the ras.git
edac-cxl branch [3].
1. https://lore.kernel.org/linux-cxl/20250212143654.1893-1-shiju.jose@huawei.com/
2. https://web.git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=next
3. https://web.git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git/log/?h=edac-cxl
Userspace code for the CXL memory repair features is available at [4],
along with a sample boot script for CXL memory repair at [5].
[4]: https://lore.kernel.org/lkml/20250207143028.1865-1-shiju.jose@huawei.com/
[5]: https://lore.kernel.org/lkml/20250207143028.1865-5-shiju.jose@huawei.com/
Changes
=======
v1 -> v2:
1. Feedback from Dan Williams on v1,
https://lore.kernel.org/linux-mm/20250307091137.00006a0a@huawei.com/T/
- Fixed lock issues in region scrubbing; added local cxl_acquire()
and cxl_unlock().
- Replaced the CXL cat/echo examples in the EDAC .rst docs with a
short description and a reference to the ABI docs. Also corrected
existing descriptions as suggested by Dan.
- Added a policy description for the scrub control feature.
However, this may require input from CXL experts.
- Replaced CONFIG_CXL_RAS_FEATURES with CONFIG_CXL_EDAC_MEM_FEATURES.
- A few changes to the depends clauses of CONFIG_CXL_EDAC_MEM_FEATURES.
- Renamed drivers/cxl/core/memfeatures.c to drivers/cxl/core/edac.c.
- snprintf() -> kasprintf() in a few places.
2. Feedback from Alison on v1,
- In cxl_get_feature_entry() (patch 1), return NULL on failure and
reintroduced the checks in cxl_get_feature_entry().
- Changed the for-loop logic in the region-based scrubbing code.
- Replaced cxl_are_decoders_committed() with cxl_is_memdev_memory_online()
and added it as a local function in drivers/cxl/core/edac.c.
- Changed a few multi-line comments to single-line comments.
- Removed unnecessary comments from the code.
- Reduced the line length of a few macros in the ECS and memory repair
code.
- In new files, changed "GPL-2.0-or-later" -> "GPL-2.0-only".
- Ran clang-format on the new files and applied the result.
3. Feedback from Jonathan on v1,
- Changed a few multi-line comments to single-line comments.
Shiju Jose (8):
cxl: Add helper function to retrieve a feature entry
EDAC: Update documentation for the CXL memory patrol scrub control
feature
cxl/edac: Add CXL memory device patrol scrub control feature
cxl/edac: Add CXL memory device ECS control feature
cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command
cxl: Support for finding memory operation attributes from the current
boot
cxl/memfeature: Add CXL memory device soft PPR control feature
cxl/memfeature: Add CXL memory device memory sparing control feature
Documentation/edac/memory_repair.rst | 31 +
Documentation/edac/scrub.rst | 47 +
drivers/cxl/Kconfig | 27 +
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/core.h | 2 +
drivers/cxl/core/edac.c | 1730 ++++++++++++++++++++++++++
drivers/cxl/core/features.c | 23 +
drivers/cxl/core/mbox.c | 45 +-
drivers/cxl/core/memdev.c | 9 +
drivers/cxl/core/ras.c | 145 +++
drivers/cxl/core/region.c | 5 +
drivers/cxl/cxlmem.h | 73 ++
drivers/cxl/mem.c | 4 +
drivers/cxl/pci.c | 3 +
drivers/edac/mem_repair.c | 9 +
include/linux/edac.h | 7 +
16 files changed, 2159 insertions(+), 2 deletions(-)
create mode 100644 drivers/cxl/core/edac.c
--
2.43.0
* [PATCH v2 1/8] cxl: Add helper function to retrieve a feature entry
2025-03-20 18:04 [PATCH v2 0/8] cxl: support CXL memory RAS features shiju.jose
@ 2025-03-20 18:04 ` shiju.jose
2025-03-26 21:32 ` Dan Williams
2025-03-20 18:04 ` [PATCH v2 2/8] EDAC: Update documentation for the CXL memory patrol scrub control feature shiju.jose
` (7 subsequent siblings)
8 siblings, 1 reply; 24+ messages in thread
From: shiju.jose @ 2025-03-20 18:04 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, yazen.ghannam, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add a helper function to retrieve a feature's entry from the supported
features list, if the device supports that feature.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
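A minimal usage sketch (not part of the diff below) of how a feature driver
is expected to consume this helper; it assumes the CXL_FEAT_PATROL_SCRUB_UUID
and CXL_FEATURE_F_CHANGEABLE definitions used by later patches in this series:

    /*
     * Sketch: probe for a changeable feature before wiring up a control
     * interface for it, mirroring the checks the later EDAC patches make.
     */
    static int cxl_probe_scrub_feature(struct cxl_dev_state *cxlds)
    {
            struct cxl_feat_entry *feat_entry;

            feat_entry = cxl_get_feature_entry(cxlds,
                                               &CXL_FEAT_PATROL_SCRUB_UUID);
            if (!feat_entry)
                    return -EOPNOTSUPP;     /* feature not supported */

            if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE))
                    return -EOPNOTSUPP;     /* supported but read-only */

            return 0;
    }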
drivers/cxl/core/core.h | 2 ++
drivers/cxl/core/features.c | 23 +++++++++++++++++++++++
2 files changed, 25 insertions(+)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 1803aedb25ca..16bc717376fc 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -123,6 +123,8 @@ int cxl_ras_init(void);
void cxl_ras_exit(void);
#ifdef CONFIG_CXL_FEATURES
+struct cxl_feat_entry *cxl_get_feature_entry(struct cxl_dev_state *cxlds,
+ const uuid_t *feat_uuid);
size_t cxl_get_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
enum cxl_get_feat_selection selection,
void *feat_out, size_t feat_out_size, u16 offset,
diff --git a/drivers/cxl/core/features.c b/drivers/cxl/core/features.c
index 048ba4fc3538..202c8c21930c 100644
--- a/drivers/cxl/core/features.c
+++ b/drivers/cxl/core/features.c
@@ -203,6 +203,29 @@ int devm_cxl_setup_features(struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_setup_features, "CXL");
+struct cxl_feat_entry *cxl_get_feature_entry(struct cxl_dev_state *cxlds,
+ const uuid_t *feat_uuid)
+{
+ struct cxl_features_state *cxlfs = to_cxlfs(cxlds);
+ struct cxl_feat_entry *feat_entry;
+ int count;
+
+ if (!cxlfs || !cxlfs->entries || !cxlfs->entries->num_features)
+ return NULL;
+
+ /*
+ * Retrieve the feature entry from the supported features list,
+ * if the feature is supported.
+ */
+ feat_entry = cxlfs->entries->ent;
+ for (count = 0; count < cxlfs->entries->num_features; count++, feat_entry++) {
+ if (uuid_equal(&feat_entry->uuid, feat_uuid))
+ return feat_entry;
+ }
+
+ return NULL;
+}
+
size_t cxl_get_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
enum cxl_get_feat_selection selection,
void *feat_out, size_t feat_out_size, u16 offset,
--
2.43.0
* [PATCH v2 2/8] EDAC: Update documentation for the CXL memory patrol scrub control feature
2025-03-20 18:04 [PATCH v2 0/8] cxl: support CXL memory RAS features shiju.jose
2025-03-20 18:04 ` [PATCH v2 1/8] cxl: Add helper function to retrieve a feature entry shiju.jose
@ 2025-03-20 18:04 ` shiju.jose
2025-03-21 10:03 ` Jonathan Cameron
2025-03-20 18:04 ` [PATCH v2 3/8] cxl/edac: Add CXL memory device " shiju.jose
` (6 subsequent siblings)
8 siblings, 1 reply; 24+ messages in thread
From: shiju.jose @ 2025-03-20 18:04 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, yazen.ghannam, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Update the Documentation/edac/scrub.rst to include descriptions and
policies for CXL memory device-based and CXL region-based patrol scrub
control.
Note: this may require input from CXL memory experts regarding
region-based scrubbing policies.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
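An illustrative userspace sketch (not part of the diff below) of driving the
scrub cycle through the EDAC sysfs ABI this document references. The attribute
name follows Documentation/ABI/testing/sysfs-edac-scrub; the cxl_mem0/scrub0
path is an assumed example instance:

    #include <stdio.h>

    int main(void)
    {
            const char *attr = "/sys/bus/edac/devices/cxl_mem0/"
                               "scrub0/current_cycle_duration";
            FILE *f = fopen(attr, "w");

            if (!f) {
                    perror("fopen");
                    return 1;
            }
            /* request a 24 hour patrol scrub cycle, in seconds */
            fprintf(f, "%u\n", 24 * 3600);
            return fclose(f) ? 1 : 0;
    }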
Documentation/edac/scrub.rst | 47 ++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
index daab929cdba1..d1c02bd90090 100644
--- a/Documentation/edac/scrub.rst
+++ b/Documentation/edac/scrub.rst
@@ -264,3 +264,51 @@ Sysfs files are documented in
`Documentation/ABI/testing/sysfs-edac-scrub`
`Documentation/ABI/testing/sysfs-edac-ecs`
+
+Examples
+--------
+
+The usage takes the form shown in these examples:
+
+1. CXL memory device patrol scrubber
+
+1.1 Device based scrubbing
+
+CXL memory is exposed to the memory management subsystem, and ultimately to
+userspace, via CXL devices.
+
+For cases where hardware interleave means device controls do not map directly
+to regions of physical address space, the region-based approach described in
+section 1.2, which is specific to CXL regions, should be followed. In those
+cases, settings made via this interface may interact with direct control via
+a device-instance-specific interface, and care must be taken.
+
+Sysfs files for scrubbing are documented in
+`Documentation/ABI/testing/sysfs-edac-scrub`
+
+1.2. Region based scrubbing
+
+CXL memory is exposed to the memory management subsystem, and ultimately to
+userspace, via CXL regions. CXL regions represent mapped memory capacity in
+system physical address space. A region can incorporate one or more parts of
+multiple CXL memory devices, with traffic interleaved across them. The user
+may want to control the scrub rate via this more abstract region instead of
+having to figure out the constituent devices and program them separately. The
+scrub rate for each device covers the whole device, so if multiple regions use
+parts of that device, a scrub request on one region may result in a higher
+scrub rate than was requested for another region sharing the device.
+
+1. When the user sets the scrub rate for a memory region, the scrub rate of
+   all the CXL memory devices interleaved under that region is updated with
+   the same value.
+
+2. When the user sets the scrub rate for a memory device, only that device's
+   scrub rate is updated, even though the device may be part of a memory
+   region; the scrub rates of the region's other memory devices are unchanged.
+
+3. The scrub rate of a CXL memory device may be set via the EDAC device and
+   region scrub interfaces simultaneously. Care must be taken to prevent a
+   race condition, or only region-based setting may be allowed.
+
+Sysfs files for scrubbing are documented in
+`Documentation/ABI/testing/sysfs-edac-scrub`
--
2.43.0
* [PATCH v2 3/8] cxl/edac: Add CXL memory device patrol scrub control feature
2025-03-20 18:04 [PATCH v2 0/8] cxl: support CXL memory RAS features shiju.jose
2025-03-20 18:04 ` [PATCH v2 1/8] cxl: Add helper function to retrieve a feature entry shiju.jose
2025-03-20 18:04 ` [PATCH v2 2/8] EDAC: Update documentation for the CXL memory patrol scrub control feature shiju.jose
@ 2025-03-20 18:04 ` shiju.jose
2025-03-27 1:47 ` Dan Williams
2025-03-20 18:04 ` [PATCH v2 4/8] cxl/edac: Add CXL memory device ECS " shiju.jose
` (5 subsequent siblings)
8 siblings, 1 reply; 24+ messages in thread
From: shiju.jose @ 2025-03-20 18:04 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, yazen.ghannam, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub
control feature. The device patrol scrub proactively locates and corrects
errors on a regular cycle.
Allow specifying the number of hours within which the patrol scrub must be
completed, subject to the minimum and maximum limits reported by the device.
Also allow disabling scrub, trading off error rates against performance.
Add support for patrol scrub control on CXL memory devices.
Register with the EDAC device driver, which retrieves the scrub attribute
descriptors from EDAC scrub and exposes the sysfs scrub control attributes
to userspace. For example, scrub control for the CXL memory device
"cxl_mem0" is exposed in /sys/bus/edac/devices/cxl_mem0/scrubX/.
Additionally, add support for region-based CXL memory patrol scrub control.
CXL memory regions may be interleaved across one or more CXL memory
devices. For example, region-based scrub control for "cxl_region1" is
exposed in /sys/bus/edac/devices/cxl_region1/scrubX/.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
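A worked decode (not part of the diff below) of the 16-bit scrub_cycle_hrs
field, using the masks this patch defines; the 0x0C18 value is a made-up
example:

    /*
     * Sketch: a device returning scrub_cycle_hrs = cpu_to_le16(0x0C18)
     * reports a current patrol scrub cycle of 0x18 (24 hours) in bits
     * [7:0] and a minimum supported cycle of 0x0C (12 hours) in bits
     * [15:8].
     */
    u16 cycle = le16_to_cpu(rd_attrs->scrub_cycle_hrs);
    u8 cur_hrs = FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK, cycle);
    u8 min_hrs = FIELD_GET(CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK, cycle);
    bool enabled = FIELD_GET(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
                             rd_attrs->scrub_flags);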
drivers/cxl/Kconfig | 25 ++
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/edac.c | 474 ++++++++++++++++++++++++++++++++++++++
drivers/cxl/core/region.c | 5 +
drivers/cxl/cxlmem.h | 10 +
drivers/cxl/mem.c | 4 +
6 files changed, 519 insertions(+)
create mode 100644 drivers/cxl/core/edac.c
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 205547e5543a..b5ede1308425 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -113,6 +113,31 @@ config CXL_FEATURES
If unsure say 'n'
+config CXL_EDAC_MEM_FEATURES
+ bool "CXL: EDAC Memory Features"
+ depends on EXPERT
+ depends on CXL_MEM
+ depends on CXL_FEATURES
+ depends on EDAC >= CXL_BUS
+ depends on EDAC_SCRUB
+ help
+ The CXL EDAC memory feature control is optional and allows the host
+ to control the EDAC memory feature configuration of CXL memory
+ expander devices.
+
+ When enabled, 'cxl_mem' and 'cxl_region' EDAC devices are published
+ with memory scrub control attributes as described by
+ Documentation/ABI/testing/sysfs-edac-scrub.
+
+ When enabled, 'cxl_mem' EDAC devices are published with memory ECS
+ and repair control attributes as described by
+ Documentation/ABI/testing/sysfs-edac-ecs and
+ Documentation/ABI/testing/sysfs-edac-memory-repair respectively.
+
+ Say 'y/m' if you have an expert need to change the default settings
+ of a memory RAS feature established by the platform/device (e.g.
+ scrub rates for the patrol scrub feature). Otherwise say 'n'.
+
config CXL_PORT
default CXL_BUS
tristate
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 139b349b3a52..9b86fb22e5de 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -19,4 +19,5 @@ cxl_core-y += ras.o
cxl_core-$(CONFIG_TRACING) += trace.o
cxl_core-$(CONFIG_CXL_REGION) += region.o
cxl_core-$(CONFIG_CXL_FEATURES) += features.o
+cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
cxl_core-$(CONFIG_CXL_MCE) += mce.o
diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
new file mode 100644
index 000000000000..5ec3535785e1
--- /dev/null
+++ b/drivers/cxl/core/edac.c
@@ -0,0 +1,474 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * CXL EDAC memory feature driver.
+ *
+ * Copyright (c) 2024-2025 HiSilicon Limited.
+ *
+ * - Supports functions to configure EDAC features of the
+ * CXL memory devices.
+ * - Registers with the EDAC device subsystem driver to expose
+ * the features sysfs attributes to the user for configuring
+ * CXL memory RAS feature.
+ */
+
+#include <linux/cleanup.h>
+#include <linux/edac.h>
+#include <linux/limits.h>
+#include <cxl/features.h>
+#include <cxl.h>
+#include <cxlmem.h>
+#include "core.h"
+
+#define CXL_NR_EDAC_DEV_FEATURES 1
+
+static struct rw_semaphore *cxl_acquire(struct rw_semaphore *rwsem)
+{
+ if (down_read_interruptible(rwsem))
+ return NULL;
+
+ return rwsem;
+}
+
+DEFINE_FREE(cxl_unlock, struct rw_semaphore *, if (_T) up_read(_T))
+
+/*
+ * CXL memory patrol scrub control
+ */
+struct cxl_patrol_scrub_context {
+ u8 instance;
+ u16 get_feat_size;
+ u16 set_feat_size;
+ u8 get_version;
+ u8 set_version;
+ u16 effects;
+ struct cxl_memdev *cxlmd;
+ struct cxl_region *cxlr;
+};
+
+/**
+ * struct cxl_memdev_ps_params - CXL memory patrol scrub parameter data structure.
+ * @enable: [IN & OUT] enable(1)/disable(0) patrol scrub.
+ * @scrub_cycle_changeable: [OUT] scrub cycle attribute of patrol scrub is changeable.
+ * @scrub_cycle_hrs: [IN] Requested patrol scrub cycle in hours.
+ * [OUT] Current patrol scrub cycle in hours.
+ * @min_scrub_cycle_hrs:[OUT] minimum patrol scrub cycle in hours supported.
+ */
+struct cxl_memdev_ps_params {
+ bool enable;
+ bool scrub_cycle_changeable;
+ u8 scrub_cycle_hrs;
+ u8 min_scrub_cycle_hrs;
+};
+
+enum cxl_scrub_param {
+ CXL_PS_PARAM_ENABLE,
+ CXL_PS_PARAM_SCRUB_CYCLE,
+};
+
+#define CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK BIT(0)
+#define CXL_MEMDEV_PS_SCRUB_CYCLE_REALTIME_REPORT_CAP_MASK BIT(1)
+#define CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK GENMASK(7, 0)
+#define CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK GENMASK(15, 8)
+#define CXL_MEMDEV_PS_FLAG_ENABLED_MASK BIT(0)
+
+/*
+ * See CXL spec rev 3.2 @8.2.10.9.11.1 Table 8-222 Device Patrol Scrub Control
+ * Feature Readable Attributes.
+ */
+struct cxl_memdev_ps_rd_attrs {
+ u8 scrub_cycle_cap;
+ __le16 scrub_cycle_hrs;
+ u8 scrub_flags;
+} __packed;
+
+/*
+ * See CXL spec rev 3.2 @8.2.10.9.11.1 Table 8-223 Device Patrol Scrub Control
+ * Feature Writable Attributes.
+ */
+struct cxl_memdev_ps_wr_attrs {
+ u8 scrub_cycle_hrs;
+ u8 scrub_flags;
+} __packed;
+
+static int cxl_mem_ps_get_attrs(struct cxl_mailbox *cxl_mbox,
+ struct cxl_memdev_ps_params *params)
+{
+ size_t rd_data_size = sizeof(struct cxl_memdev_ps_rd_attrs);
+ u16 scrub_cycle_hrs;
+ size_t data_size;
+ struct cxl_memdev_ps_rd_attrs *rd_attrs __free(kfree) =
+ kzalloc(rd_data_size, GFP_KERNEL);
+ if (!rd_attrs)
+ return -ENOMEM;
+
+ data_size = cxl_get_feature(cxl_mbox, &CXL_FEAT_PATROL_SCRUB_UUID,
+ CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrs,
+ rd_data_size, 0, NULL);
+ if (!data_size)
+ return -EIO;
+
+ params->scrub_cycle_changeable =
+ FIELD_GET(CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK,
+ rd_attrs->scrub_cycle_cap);
+ params->enable = FIELD_GET(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
+ rd_attrs->scrub_flags);
+ scrub_cycle_hrs = le16_to_cpu(rd_attrs->scrub_cycle_hrs);
+ params->scrub_cycle_hrs =
+ FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK, scrub_cycle_hrs);
+ params->min_scrub_cycle_hrs =
+ FIELD_GET(CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK, scrub_cycle_hrs);
+
+ return 0;
+}
+
+static int cxl_ps_get_attrs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
+ struct cxl_memdev_ps_params *params)
+{
+ struct cxl_mailbox *cxl_mbox;
+ struct cxl_memdev *cxlmd;
+ u8 min_scrub_cycle = U8_MAX;
+ int i, ret;
+
+ if (cxl_ps_ctx->cxlr) {
+ struct cxl_region *cxlr = cxl_ps_ctx->cxlr;
+ struct cxl_region_params *p = &cxlr->params;
+
+ struct rw_semaphore *region_lock __free(cxl_unlock) =
+ cxl_acquire(&cxl_region_rwsem);
+ if (!region_lock)
+ return -EINTR;
+
+ for (i = 0; i < p->nr_targets; i++) {
+ struct cxl_endpoint_decoder *cxled = p->targets[i];
+
+ cxlmd = cxled_to_memdev(cxled);
+ cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+ ret = cxl_mem_ps_get_attrs(cxl_mbox, params);
+ if (ret)
+ return ret;
+
+ min_scrub_cycle = min(params->min_scrub_cycle_hrs,
+ min_scrub_cycle);
+ }
+
+ params->min_scrub_cycle_hrs = min_scrub_cycle;
+ return 0;
+ }
+ cxl_mbox = &cxl_ps_ctx->cxlmd->cxlds->cxl_mbox;
+
+ return cxl_mem_ps_get_attrs(cxl_mbox, params);
+}
+
+static int cxl_mem_ps_set_attrs(struct device *dev,
+ struct cxl_patrol_scrub_context *cxl_ps_ctx,
+ struct cxl_mailbox *cxl_mbox,
+ struct cxl_memdev_ps_params *params,
+ enum cxl_scrub_param param_type)
+{
+ struct cxl_memdev_ps_wr_attrs wr_attrs;
+ struct cxl_memdev_ps_params rd_params;
+ int ret;
+
+ ret = cxl_mem_ps_get_attrs(cxl_mbox, &rd_params);
+ if (ret) {
+ dev_dbg(dev,
+ "Get cxlmemdev patrol scrub params failed ret=%d\n",
+ ret);
+ return ret;
+ }
+
+ switch (param_type) {
+ case CXL_PS_PARAM_ENABLE:
+ wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
+ params->enable);
+ wr_attrs.scrub_cycle_hrs =
+ FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
+ rd_params.scrub_cycle_hrs);
+ break;
+ case CXL_PS_PARAM_SCRUB_CYCLE:
+ if (params->scrub_cycle_hrs < rd_params.min_scrub_cycle_hrs) {
+ dev_dbg(dev,
+ "Invalid CXL patrol scrub cycle(%d) to set\n",
+ params->scrub_cycle_hrs);
+ dev_dbg(dev,
+ "Minimum supported CXL patrol scrub cycle in hour %d\n",
+ rd_params.min_scrub_cycle_hrs);
+ return -EINVAL;
+ }
+ wr_attrs.scrub_cycle_hrs =
+ FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
+ params->scrub_cycle_hrs);
+ wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
+ rd_params.enable);
+ break;
+ }
+
+ ret = cxl_set_feature(cxl_mbox, &CXL_FEAT_PATROL_SCRUB_UUID,
+ cxl_ps_ctx->set_version, &wr_attrs,
+ sizeof(wr_attrs),
+ CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET, 0,
+ NULL);
+ if (ret) {
+ dev_dbg(dev, "CXL patrol scrub set feature failed ret=%d\n",
+ ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int cxl_ps_set_attrs(struct device *dev,
+ struct cxl_patrol_scrub_context *cxl_ps_ctx,
+ struct cxl_memdev_ps_params *params,
+ enum cxl_scrub_param param_type)
+{
+ struct cxl_mailbox *cxl_mbox;
+ struct cxl_memdev *cxlmd;
+ int ret, i;
+
+ if (cxl_ps_ctx->cxlr) {
+ struct cxl_region *cxlr = cxl_ps_ctx->cxlr;
+ struct cxl_region_params *p = &cxlr->params;
+
+ struct rw_semaphore *region_lock __free(cxl_unlock) =
+ cxl_acquire(&cxl_region_rwsem);
+ if (!region_lock)
+ return -EINTR;
+
+ for (i = 0; i < p->nr_targets; i++) {
+ struct cxl_endpoint_decoder *cxled = p->targets[i];
+
+ cxlmd = cxled_to_memdev(cxled);
+ cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+ ret = cxl_mem_ps_set_attrs(dev, cxl_ps_ctx, cxl_mbox,
+ params, param_type);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+ }
+ cxl_mbox = &cxl_ps_ctx->cxlmd->cxlds->cxl_mbox;
+
+ return cxl_mem_ps_set_attrs(dev, cxl_ps_ctx, cxl_mbox, params,
+ param_type);
+}
+
+static int cxl_patrol_scrub_get_enabled_bg(struct device *dev, void *drv_data,
+ bool *enabled)
+{
+ struct cxl_patrol_scrub_context *ctx = drv_data;
+ struct cxl_memdev_ps_params params;
+ int ret;
+
+ ret = cxl_ps_get_attrs(ctx, ¶ms);
+ if (ret)
+ return ret;
+
+ *enabled = params.enable;
+
+ return 0;
+}
+
+static int cxl_patrol_scrub_set_enabled_bg(struct device *dev, void *drv_data,
+ bool enable)
+{
+ struct cxl_patrol_scrub_context *ctx = drv_data;
+ struct cxl_memdev_ps_params params = {
+ .enable = enable,
+ };
+
+ return cxl_ps_set_attrs(dev, ctx, ¶ms, CXL_PS_PARAM_ENABLE);
+}
+
+static int cxl_patrol_scrub_read_min_scrub_cycle(struct device *dev,
+ void *drv_data, u32 *min)
+{
+ struct cxl_patrol_scrub_context *ctx = drv_data;
+ struct cxl_memdev_ps_params params;
+ int ret;
+
+ ret = cxl_ps_get_attrs(ctx, ¶ms);
+ if (ret)
+ return ret;
+ *min = params.min_scrub_cycle_hrs * 3600;
+
+ return 0;
+}
+
+static int cxl_patrol_scrub_read_max_scrub_cycle(struct device *dev,
+ void *drv_data, u32 *max)
+{
+ *max = U8_MAX * 3600; /* Max set by register size */
+
+ return 0;
+}
+
+static int cxl_patrol_scrub_read_scrub_cycle(struct device *dev, void *drv_data,
+ u32 *scrub_cycle_secs)
+{
+ struct cxl_patrol_scrub_context *ctx = drv_data;
+ struct cxl_memdev_ps_params params;
+ int ret;
+
+ ret = cxl_ps_get_attrs(ctx, ¶ms);
+ if (ret)
+ return ret;
+
+ *scrub_cycle_secs = params.scrub_cycle_hrs * 3600;
+
+ return 0;
+}
+
+static int cxl_patrol_scrub_write_scrub_cycle(struct device *dev,
+ void *drv_data,
+ u32 scrub_cycle_secs)
+{
+ struct cxl_patrol_scrub_context *ctx = drv_data;
+ struct cxl_memdev_ps_params params = {
+ .scrub_cycle_hrs = scrub_cycle_secs / 3600,
+ };
+
+ return cxl_ps_set_attrs(dev, ctx, ¶ms, CXL_PS_PARAM_SCRUB_CYCLE);
+}
+
+static const struct edac_scrub_ops cxl_ps_scrub_ops = {
+ .get_enabled_bg = cxl_patrol_scrub_get_enabled_bg,
+ .set_enabled_bg = cxl_patrol_scrub_set_enabled_bg,
+ .get_min_cycle = cxl_patrol_scrub_read_min_scrub_cycle,
+ .get_max_cycle = cxl_patrol_scrub_read_max_scrub_cycle,
+ .get_cycle_duration = cxl_patrol_scrub_read_scrub_cycle,
+ .set_cycle_duration = cxl_patrol_scrub_write_scrub_cycle,
+};
+
+static int cxl_memdev_scrub_init(struct cxl_memdev *cxlmd,
+ struct edac_dev_feature *ras_feature,
+ u8 scrub_inst)
+{
+ struct cxl_patrol_scrub_context *cxl_ps_ctx;
+ struct cxl_feat_entry *feat_entry;
+
+ feat_entry = cxl_get_feature_entry(cxlmd->cxlds,
+ &CXL_FEAT_PATROL_SCRUB_UUID);
+ if (!feat_entry)
+ return -EOPNOTSUPP;
+
+ if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE))
+ return -EOPNOTSUPP;
+
+ cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
+ if (!cxl_ps_ctx)
+ return -ENOMEM;
+
+ *cxl_ps_ctx = (struct cxl_patrol_scrub_context){
+ .get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
+ .set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
+ .get_version = feat_entry->get_feat_ver,
+ .set_version = feat_entry->set_feat_ver,
+ .effects = le16_to_cpu(feat_entry->effects),
+ .instance = scrub_inst,
+ .cxlmd = cxlmd,
+ };
+
+ ras_feature->ft_type = RAS_FEAT_SCRUB;
+ ras_feature->instance = cxl_ps_ctx->instance;
+ ras_feature->scrub_ops = &cxl_ps_scrub_ops;
+ ras_feature->ctx = cxl_ps_ctx;
+
+ return 0;
+}
+
+static int cxl_region_scrub_init(struct cxl_region *cxlr,
+ struct edac_dev_feature *ras_feature,
+ u8 scrub_inst)
+{
+ struct cxl_patrol_scrub_context *cxl_ps_ctx;
+ struct cxl_region_params *p = &cxlr->params;
+ struct cxl_feat_entry *feat_entry = NULL;
+ struct cxl_memdev *cxlmd;
+ int i;
+
+ /*
+ * The cxl_region_rwsem must be held if the code below is used in a context
+ * other than when the region is in the probe state, as shown here.
+ */
+ for (i = 0; i < p->nr_targets; i++) {
+ struct cxl_endpoint_decoder *cxled = p->targets[i];
+
+ cxlmd = cxled_to_memdev(cxled);
+ feat_entry = cxl_get_feature_entry(cxlmd->cxlds,
+ &CXL_FEAT_PATROL_SCRUB_UUID);
+ if (!feat_entry)
+ return -EOPNOTSUPP;
+
+ if (!(le32_to_cpu(feat_entry->flags) &
+ CXL_FEATURE_F_CHANGEABLE))
+ return -EOPNOTSUPP;
+ }
+
+ cxl_ps_ctx = devm_kzalloc(&cxlr->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
+ if (!cxl_ps_ctx)
+ return -ENOMEM;
+
+ *cxl_ps_ctx = (struct cxl_patrol_scrub_context){
+ .get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
+ .set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
+ .get_version = feat_entry->get_feat_ver,
+ .set_version = feat_entry->set_feat_ver,
+ .effects = le16_to_cpu(feat_entry->effects),
+ .instance = scrub_inst,
+ .cxlr = cxlr,
+ };
+
+ ras_feature->ft_type = RAS_FEAT_SCRUB;
+ ras_feature->instance = cxl_ps_ctx->instance;
+ ras_feature->scrub_ops = &cxl_ps_scrub_ops;
+ ras_feature->ctx = cxl_ps_ctx;
+
+ return 0;
+}
+
+int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
+{
+ struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
+ int num_ras_features = 0;
+ u8 scrub_inst = 0;
+ int rc;
+
+ rc = cxl_memdev_scrub_init(cxlmd, &ras_features[num_ras_features],
+ scrub_inst);
+ if (rc < 0 && rc != -EOPNOTSUPP)
+ return rc;
+
+ if (rc != -EOPNOTSUPP)
+ num_ras_features++;
+
+ char *cxl_dev_name __free(kfree) =
+ kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlmd->dev));
+ if (!cxl_dev_name)
+ return -ENOMEM;
+
+ return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
+ num_ras_features, ras_features);
+}
+EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_edac_register, "CXL");
+
+int devm_cxl_region_edac_register(struct cxl_region *cxlr)
+{
+ struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
+ int num_ras_features = 0;
+ u8 scrub_inst = 0;
+ int rc;
+
+ rc = cxl_region_scrub_init(cxlr, &ras_features[num_ras_features],
+ scrub_inst);
+ if (rc < 0)
+ return rc;
+
+ num_ras_features++;
+
+ char *cxl_dev_name __free(kfree) =
+ kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlr->dev));
+ if (!cxl_dev_name)
+ return -ENOMEM;
+
+ return edac_dev_register(&cxlr->dev, cxl_dev_name, NULL,
+ num_ras_features, ras_features);
+}
+EXPORT_SYMBOL_NS_GPL(devm_cxl_region_edac_register, "CXL");
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index b3260d433ec7..2aa6eb675fdf 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3542,6 +3542,11 @@ static int cxl_region_probe(struct device *dev)
case CXL_PARTMODE_PMEM:
return devm_cxl_add_pmem_region(cxlr);
case CXL_PARTMODE_RAM:
+ rc = devm_cxl_region_edac_register(cxlr);
+ if (rc)
+ dev_dbg(&cxlr->dev, "CXL EDAC registration for region_id=%d failed\n",
+ cxlr->id);
+
/*
* The region cannot be managed by CXL if any portion of
* it is already online as 'System RAM'
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 3ec6b906371b..11fa98cc4d9c 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -853,6 +853,16 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
+#if IS_ENABLED(CONFIG_CXL_EDAC_MEM_FEATURES)
+int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd);
+int devm_cxl_region_edac_register(struct cxl_region *cxlr);
+#else
+static inline int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
+{ return 0; }
+static inline int devm_cxl_region_edac_register(struct cxl_region *cxlr)
+{ return 0; }
+#endif
+
#ifdef CONFIG_CXL_SUSPEND
void cxl_mem_active_inc(void);
void cxl_mem_active_dec(void);
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 9675243bd05b..6e6777b7bafb 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -180,6 +180,10 @@ static int cxl_mem_probe(struct device *dev)
return rc;
}
+ rc = devm_cxl_memdev_edac_register(cxlmd);
+ if (rc)
+ dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc);
+
/*
* The kernel may be operating out of CXL memory on this device,
* there is no spec defined way to determine whether this device
--
2.43.0
* [PATCH v2 4/8] cxl/edac: Add CXL memory device ECS control feature
2025-03-20 18:04 [PATCH v2 0/8] cxl: support CXL memory RAS features shiju.jose
` (2 preceding siblings ...)
2025-03-20 18:04 ` [PATCH v2 3/8] cxl/edac: Add CXL memory device " shiju.jose
@ 2025-03-20 18:04 ` shiju.jose
2025-03-27 17:12 ` Dan Williams
2025-03-20 18:04 ` [PATCH v2 5/8] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
` (4 subsequent siblings)
8 siblings, 1 reply; 24+ messages in thread
From: shiju.jose @ 2025-03-20 18:04 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, yazen.ghannam, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
CXL spec 3.2 section 8.2.10.9.11.2 describes the DDR5 ECS (Error Check
Scrub) control feature.
Error Check Scrub (ECS) is a feature defined in the JEDEC DDR5 SDRAM
specification (JESD79-5). It allows the DRAM to internally read data, correct
single-bit errors, and write the corrected data bits back to the DRAM array,
while providing transparency into error counts.
The ECS control allows the requester to change the log entry type, the ECS
threshold count (provided the request falls within the limits specified in
DDR5 mode registers), switch between codeword mode and row count mode, and
reset the ECS counter.
Register with the EDAC device driver, which retrieves the ECS attribute
descriptors from the EDAC ECS and exposes the ECS control attributes to
userspace via sysfs. For example, the ECS control for memory media FRU0 in
the CXL mem0 device is located at /sys/bus/edac/devices/cxl_mem0/ecs_fru0/.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
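A worked decode (not part of the diff below) of one FRU's 16-bit ecs_config
word, using the masks this patch defines; the 0x000C value is a made-up
example:

    /*
     * Sketch: ecs_config = 0x000C carries threshold index 4 in bits [2:0],
     * which ecs_supp_threshold[] maps to 1024 errors per GB, and count
     * mode 1 (codewords) in bit 3.
     */
    u16 ecs_config = le16_to_cpu(fru_rd_attrs[fru_id].ecs_config);
    u8 threshold_index = FIELD_GET(CXL_ECS_THRESHOLD_COUNT_MASK, ecs_config);
    u16 threshold = ecs_supp_threshold[threshold_index];    /* 1024 */
    u8 count_mode = FIELD_GET(CXL_ECS_COUNT_MODE_MASK, ecs_config);    /* 1 */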
drivers/cxl/Kconfig | 1 +
drivers/cxl/core/edac.c | 353 +++++++++++++++++++++++++++++++++++++++-
2 files changed, 353 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index b5ede1308425..1c67bf844993 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -120,6 +120,7 @@ config CXL_EDAC_MEM_FEATURES
depends on CXL_FEATURES
depends on EDAC >= CXL_BUS
depends on EDAC_SCRUB
+ depends on EDAC_ECS
help
The CXL EDAC memory feature control is optional and allows host
to control the EDAC memory features configurations of CXL memory
diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
index 5ec3535785e1..1110685ed41a 100644
--- a/drivers/cxl/core/edac.c
+++ b/drivers/cxl/core/edac.c
@@ -19,7 +19,7 @@
#include <cxlmem.h>
#include "core.h"
-#define CXL_NR_EDAC_DEV_FEATURES 1
+#define CXL_NR_EDAC_DEV_FEATURES 2
static struct rw_semaphore *cxl_acquire(struct rw_semaphore *rwsem)
{
@@ -428,6 +428,350 @@ static int cxl_region_scrub_init(struct cxl_region *cxlr,
return 0;
}
+/*
+ * CXL DDR5 ECS control definitions.
+ */
+struct cxl_ecs_context {
+ u16 num_media_frus;
+ u16 get_feat_size;
+ u16 set_feat_size;
+ u8 get_version;
+ u8 set_version;
+ u16 effects;
+ struct cxl_memdev *cxlmd;
+};
+
+enum {
+ CXL_ECS_PARAM_LOG_ENTRY_TYPE,
+ CXL_ECS_PARAM_THRESHOLD,
+ CXL_ECS_PARAM_MODE,
+ CXL_ECS_PARAM_RESET_COUNTER,
+};
+
+#define CXL_ECS_LOG_ENTRY_TYPE_MASK GENMASK(1, 0)
+#define CXL_ECS_REALTIME_REPORT_CAP_MASK BIT(0)
+#define CXL_ECS_THRESHOLD_COUNT_MASK GENMASK(2, 0)
+#define CXL_ECS_COUNT_MODE_MASK BIT(3)
+#define CXL_ECS_RESET_COUNTER_MASK BIT(4)
+#define CXL_ECS_RESET_COUNTER 1
+
+enum {
+ ECS_THRESHOLD_256 = 256,
+ ECS_THRESHOLD_1024 = 1024,
+ ECS_THRESHOLD_4096 = 4096,
+};
+
+enum {
+ ECS_THRESHOLD_IDX_256 = 3,
+ ECS_THRESHOLD_IDX_1024 = 4,
+ ECS_THRESHOLD_IDX_4096 = 5,
+};
+
+static const u16 ecs_supp_threshold[] = {
+ [ECS_THRESHOLD_IDX_256] = 256,
+ [ECS_THRESHOLD_IDX_1024] = 1024,
+ [ECS_THRESHOLD_IDX_4096] = 4096,
+};
+
+enum {
+ ECS_LOG_ENTRY_TYPE_DRAM = 0x0,
+ ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU = 0x1,
+};
+
+enum cxl_ecs_count_mode {
+ ECS_MODE_COUNTS_ROWS = 0,
+ ECS_MODE_COUNTS_CODEWORDS = 1,
+};
+
+/**
+ * struct cxl_ecs_params - CXL memory DDR5 ECS parameter data structure.
+ * @threshold: ECS threshold count per GB of memory cells.
+ * @log_entry_type: ECS log entry type, per DRAM or per memory media FRU.
+ * @reset_counter: [IN] reset ECC counter to default value.
+ * @count_mode: codeword/row count mode
+ * 0 : ECS counts rows with errors
+ * 1 : ECS counts codewords with errors
+ */
+struct cxl_ecs_params {
+ u16 threshold;
+ u8 log_entry_type;
+ u8 reset_counter;
+ enum cxl_ecs_count_mode count_mode;
+};
+
+/*
+ * See CXL spec rev 3.2 @8.2.10.9.11.2 Table 8-225 DDR5 ECS Control Feature
+ * Readable Attributes.
+ */
+struct cxl_ecs_fru_rd_attrs {
+ u8 ecs_cap;
+ __le16 ecs_config;
+ u8 ecs_flags;
+} __packed;
+
+struct cxl_ecs_rd_attrs {
+ u8 ecs_log_cap;
+ struct cxl_ecs_fru_rd_attrs fru_attrs[];
+} __packed;
+
+/*
+ * See CXL spec rev 3.2 @8.2.10.9.11.2 Table 8-226 DDR5 ECS Control Feature
+ * Writable Attributes.
+ */
+struct cxl_ecs_fru_wr_attrs {
+ __le16 ecs_config;
+} __packed;
+
+struct cxl_ecs_wr_attrs {
+ u8 ecs_log_cap;
+ struct cxl_ecs_fru_wr_attrs fru_attrs[];
+} __packed;
+
+/*
+ * CXL DDR5 ECS control functions.
+ */
+static int cxl_mem_ecs_get_attrs(struct device *dev,
+ struct cxl_ecs_context *cxl_ecs_ctx,
+ int fru_id, struct cxl_ecs_params *params)
+{
+ struct cxl_memdev *cxlmd = cxl_ecs_ctx->cxlmd;
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+ struct cxl_ecs_fru_rd_attrs *fru_rd_attrs;
+ size_t rd_data_size;
+ u8 threshold_index;
+ size_t data_size;
+ u16 ecs_config;
+
+ rd_data_size = cxl_ecs_ctx->get_feat_size;
+
+ struct cxl_ecs_rd_attrs *rd_attrs __free(kvfree) =
+ kvzalloc(rd_data_size, GFP_KERNEL);
+ if (!rd_attrs)
+ return -ENOMEM;
+
+ params->log_entry_type = 0;
+ params->threshold = 0;
+ params->count_mode = 0;
+ data_size = cxl_get_feature(cxl_mbox, &CXL_FEAT_ECS_UUID,
+ CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrs,
+ rd_data_size, 0, NULL);
+ if (!data_size)
+ return -EIO;
+
+ fru_rd_attrs = rd_attrs->fru_attrs;
+ params->log_entry_type =
+ FIELD_GET(CXL_ECS_LOG_ENTRY_TYPE_MASK, rd_attrs->ecs_log_cap);
+ ecs_config = le16_to_cpu(fru_rd_attrs[fru_id].ecs_config);
+ threshold_index = FIELD_GET(CXL_ECS_THRESHOLD_COUNT_MASK, ecs_config);
+ params->threshold = ecs_supp_threshold[threshold_index];
+ params->count_mode = FIELD_GET(CXL_ECS_COUNT_MODE_MASK, ecs_config);
+ return 0;
+}
+
+static int cxl_mem_ecs_set_attrs(struct device *dev,
+ struct cxl_ecs_context *cxl_ecs_ctx,
+ int fru_id, struct cxl_ecs_params *params,
+ u8 param_type)
+{
+ struct cxl_memdev *cxlmd = cxl_ecs_ctx->cxlmd;
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+ struct cxl_ecs_fru_rd_attrs *fru_rd_attrs;
+ struct cxl_ecs_fru_wr_attrs *fru_wr_attrs;
+ size_t rd_data_size, wr_data_size;
+ u16 num_media_frus, count;
+ size_t data_size;
+ u16 ecs_config;
+
+ num_media_frus = cxl_ecs_ctx->num_media_frus;
+ rd_data_size = cxl_ecs_ctx->get_feat_size;
+ wr_data_size = cxl_ecs_ctx->set_feat_size;
+ struct cxl_ecs_rd_attrs *rd_attrs __free(kvfree) =
+ kvzalloc(rd_data_size, GFP_KERNEL);
+ if (!rd_attrs)
+ return -ENOMEM;
+
+ data_size = cxl_get_feature(cxl_mbox, &CXL_FEAT_ECS_UUID,
+ CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrs,
+ rd_data_size, 0, NULL);
+ if (!data_size)
+ return -EIO;
+
+ struct cxl_ecs_wr_attrs *wr_attrs __free(kvfree) =
+ kvzalloc(wr_data_size, GFP_KERNEL);
+ if (!wr_attrs)
+ return -ENOMEM;
+
+ /*
+ * Fill writable attributes from the current attributes read
+ * for all the media FRUs.
+ */
+ fru_rd_attrs = rd_attrs->fru_attrs;
+ fru_wr_attrs = wr_attrs->fru_attrs;
+ wr_attrs->ecs_log_cap = rd_attrs->ecs_log_cap;
+ for (count = 0; count < num_media_frus; count++)
+ fru_wr_attrs[count].ecs_config = fru_rd_attrs[count].ecs_config;
+
+ /* Fill attribute to be set for the media FRU */
+ ecs_config = le16_to_cpu(fru_rd_attrs[fru_id].ecs_config);
+ switch (param_type) {
+ case CXL_ECS_PARAM_LOG_ENTRY_TYPE:
+ if (params->log_entry_type != ECS_LOG_ENTRY_TYPE_DRAM &&
+ params->log_entry_type != ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU)
+ return -EINVAL;
+
+ wr_attrs->ecs_log_cap = FIELD_PREP(CXL_ECS_LOG_ENTRY_TYPE_MASK,
+ params->log_entry_type);
+ break;
+ case CXL_ECS_PARAM_THRESHOLD:
+ ecs_config &= ~CXL_ECS_THRESHOLD_COUNT_MASK;
+ switch (params->threshold) {
+ case ECS_THRESHOLD_256:
+ ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
+ ECS_THRESHOLD_IDX_256);
+ break;
+ case ECS_THRESHOLD_1024:
+ ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
+ ECS_THRESHOLD_IDX_1024);
+ break;
+ case ECS_THRESHOLD_4096:
+ ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
+ ECS_THRESHOLD_IDX_4096);
+ break;
+ default:
+ dev_dbg(dev,
+ "Invalid CXL ECS scrub threshold count(%d) to set\n",
+ params->threshold);
+ dev_dbg(dev,
+ "Supported scrub threshold counts: %u, %u, %u\n",
+ ECS_THRESHOLD_256, ECS_THRESHOLD_1024,
+ ECS_THRESHOLD_4096);
+ return -EINVAL;
+ }
+ break;
+ case CXL_ECS_PARAM_MODE:
+ if (params->count_mode != ECS_MODE_COUNTS_ROWS &&
+ params->count_mode != ECS_MODE_COUNTS_CODEWORDS) {
+ dev_dbg(dev, "Invalid CXL ECS scrub mode(%d) to set\n",
+ params->count_mode);
+ dev_dbg(dev,
+ "Supported ECS Modes: 0: ECS counts rows with errors,"
+ " 1: ECS counts codewords with errors\n");
+ return -EINVAL;
+ }
+ ecs_config &= ~CXL_ECS_COUNT_MODE_MASK;
+ ecs_config |=
+ FIELD_PREP(CXL_ECS_COUNT_MODE_MASK, params->count_mode);
+ break;
+ case CXL_ECS_PARAM_RESET_COUNTER:
+ if (params->reset_counter != CXL_ECS_RESET_COUNTER)
+ return -EINVAL;
+
+ ecs_config &= ~CXL_ECS_RESET_COUNTER_MASK;
+ ecs_config |= FIELD_PREP(CXL_ECS_RESET_COUNTER_MASK,
+ params->reset_counter);
+ break;
+ default:
+ return -EINVAL;
+ }
+ fru_wr_attrs[fru_id].ecs_config = cpu_to_le16(ecs_config);
+
+ return cxl_set_feature(cxl_mbox, &CXL_FEAT_ECS_UUID,
+ cxl_ecs_ctx->set_version, wr_attrs, wr_data_size,
+ CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET, 0,
+ NULL);
+}
+
+#define CXL_ECS_GET_ATTR(attrib) \
+ static int cxl_ecs_get_##attrib(struct device *dev, void *drv_data, \
+ int fru_id, u32 *val) \
+ { \
+ struct cxl_ecs_context *ctx = drv_data; \
+ struct cxl_ecs_params params; \
+ int ret; \
+ \
+ ret = cxl_mem_ecs_get_attrs(dev, ctx, fru_id, ¶ms); \
+ if (ret) \
+ return ret; \
+ \
+ *val = params.attrib; \
+ \
+ return 0; \
+ }
+
+CXL_ECS_GET_ATTR(log_entry_type)
+CXL_ECS_GET_ATTR(count_mode)
+CXL_ECS_GET_ATTR(threshold)
+
+#define CXL_ECS_SET_ATTR(attrib, param_type) \
+ static int cxl_ecs_set_##attrib(struct device *dev, void *drv_data, \
+ int fru_id, u32 val) \
+ { \
+ struct cxl_ecs_context *ctx = drv_data; \
+ struct cxl_ecs_params params = { \
+ .attrib = val, \
+ }; \
+ \
+ return cxl_mem_ecs_set_attrs(dev, ctx, fru_id, ¶ms, \
+ (param_type)); \
+ }
+CXL_ECS_SET_ATTR(log_entry_type, CXL_ECS_PARAM_LOG_ENTRY_TYPE)
+CXL_ECS_SET_ATTR(count_mode, CXL_ECS_PARAM_MODE)
+CXL_ECS_SET_ATTR(reset_counter, CXL_ECS_PARAM_RESET_COUNTER)
+CXL_ECS_SET_ATTR(threshold, CXL_ECS_PARAM_THRESHOLD)
+
+static const struct edac_ecs_ops cxl_ecs_ops = {
+ .get_log_entry_type = cxl_ecs_get_log_entry_type,
+ .set_log_entry_type = cxl_ecs_set_log_entry_type,
+ .get_mode = cxl_ecs_get_count_mode,
+ .set_mode = cxl_ecs_set_count_mode,
+ .reset = cxl_ecs_set_reset_counter,
+ .get_threshold = cxl_ecs_get_threshold,
+ .set_threshold = cxl_ecs_set_threshold,
+};
+
+static int cxl_memdev_ecs_init(struct cxl_memdev *cxlmd,
+ struct edac_dev_feature *ras_feature)
+{
+ struct cxl_ecs_context *cxl_ecs_ctx;
+ struct cxl_feat_entry *feat_entry;
+ int num_media_frus;
+
+ feat_entry = cxl_get_feature_entry(cxlmd->cxlds, &CXL_FEAT_ECS_UUID);
+ if (!feat_entry)
+ return -EOPNOTSUPP;
+
+ if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE))
+ return -EOPNOTSUPP;
+
+ num_media_frus = (le16_to_cpu(feat_entry->get_feat_size) -
+ sizeof(struct cxl_ecs_rd_attrs)) /
+ sizeof(struct cxl_ecs_fru_rd_attrs);
+ if (!num_media_frus)
+ return -EOPNOTSUPP;
+
+ cxl_ecs_ctx =
+ devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ecs_ctx), GFP_KERNEL);
+ if (!cxl_ecs_ctx)
+ return -ENOMEM;
+
+ *cxl_ecs_ctx = (struct cxl_ecs_context){
+ .get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
+ .set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
+ .get_version = feat_entry->get_feat_ver,
+ .set_version = feat_entry->set_feat_ver,
+ .effects = le16_to_cpu(feat_entry->effects),
+ .num_media_frus = num_media_frus,
+ .cxlmd = cxlmd,
+ };
+
+ ras_feature->ft_type = RAS_FEAT_ECS;
+ ras_feature->ecs_ops = &cxl_ecs_ops;
+ ras_feature->ctx = cxl_ecs_ctx;
+ ras_feature->ecs_info.num_media_frus = num_media_frus;
+
+ return 0;
+}
+
int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
{
struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
@@ -443,6 +787,13 @@ int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
if (rc != -EOPNOTSUPP)
num_ras_features++;
+ rc = cxl_memdev_ecs_init(cxlmd, &ras_features[num_ras_features]);
+ if (rc < 0 && rc != -EOPNOTSUPP)
+ return rc;
+
+ if (rc != -EOPNOTSUPP)
+ num_ras_features++;
+
char *cxl_dev_name __free(kfree) =
kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlmd->dev));
--
2.43.0
* [PATCH v2 5/8] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command
2025-03-20 18:04 [PATCH v2 0/8] cxl: support CXL memory RAS features shiju.jose
` (3 preceding siblings ...)
2025-03-20 18:04 ` [PATCH v2 4/8] cxl/edac: Add CXL memory device ECS " shiju.jose
@ 2025-03-20 18:04 ` shiju.jose
2025-03-27 17:23 ` Dan Williams
2025-03-20 18:04 ` [PATCH v2 6/8] cxl: Support for finding memory operation attributes from the current boot shiju.jose
` (3 subsequent siblings)
8 siblings, 1 reply; 24+ messages in thread
From: shiju.jose @ 2025-03-20 18:04 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, yazen.ghannam, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add support for the PERFORM_MAINTENANCE mailbox command.
CXL spec 3.2 section 8.2.10.7.1 describes the Perform Maintenance command.
This command requests the device to execute the maintenance operation
specified by the maintenance operation class and the maintenance operation
subclass.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
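A caller sketch (not part of the diff below) of how the repair patches later
in this series are expected to use the new helper. The 0x01/0x00
class/subclass values and the input struct are illustrative placeholders, not
definitions from this patch:

    /* Sketch: issue a maintenance operation with an op-specific payload */
    struct example_maintenance_input {
            u8 flags;
            __le64 dpa;
    } __packed pi = {};
    int rc;

    rc = cxl_do_maintenance(cxl_mbox, 0x01 /* op class */,
                            0x00 /* op subclass */, &pi, sizeof(pi));
    if (rc)
            return rc;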
drivers/cxl/core/mbox.c | 34 ++++++++++++++++++++++++++++++++++
drivers/cxl/cxlmem.h | 17 +++++++++++++++++
2 files changed, 51 insertions(+)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index d72764056ce6..19d46a284650 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -824,6 +824,40 @@ static const uuid_t log_uuid[] = {
[VENDOR_DEBUG_UUID] = DEFINE_CXL_VENDOR_DEBUG_UUID,
};
+int cxl_do_maintenance(struct cxl_mailbox *cxl_mbox,
+ u8 class, u8 subclass,
+ void *data_in, size_t data_in_size)
+{
+ struct cxl_memdev_maintenance_pi {
+ struct cxl_mbox_do_maintenance_hdr hdr;
+ u8 data[];
+ } __packed;
+ struct cxl_mbox_cmd mbox_cmd;
+ size_t hdr_size;
+
+ struct cxl_memdev_maintenance_pi *pi __free(kfree) =
+ kmalloc(cxl_mbox->payload_size, GFP_KERNEL);
+ if (!pi)
+ return -ENOMEM;
+
+ pi->hdr.op_class = class;
+ pi->hdr.op_subclass = subclass;
+ hdr_size = sizeof(pi->hdr);
+ /*
+ * Check minimum mbox payload size is available for
+ * the maintenance data transfer.
+ */
+ if (hdr_size + data_in_size > cxl_mbox->payload_size)
+ return -ENOMEM;
+
+ memcpy(pi->data, data_in, data_in_size);
+ mbox_cmd = (struct cxl_mbox_cmd) {
+ .opcode = CXL_MBOX_OP_DO_MAINTENANCE,
+ .size_in = hdr_size + data_in_size,
+ .payload_in = pi,
+ };
+
+ return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_do_maintenance, "CXL");
+
/**
* cxl_enumerate_cmds() - Enumerate commands for a device.
* @mds: The driver data for the operation
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 11fa98cc4d9c..7ab257e0c85e 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -527,6 +527,7 @@ enum cxl_opcode {
CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
CXL_MBOX_OP_GET_FEATURE = 0x0501,
CXL_MBOX_OP_SET_FEATURE = 0x0502,
+ CXL_MBOX_OP_DO_MAINTENANCE = 0x0600,
CXL_MBOX_OP_IDENTIFY = 0x4000,
CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
@@ -827,6 +828,19 @@ enum {
CXL_PMEM_SEC_PASS_USER,
};
+/*
+ * Perform Maintenance CXL 3.2 Spec 8.2.10.7.1
+ */
+
+/*
+ * Perform Maintenance input payload
+ * CXL rev 3.2 section 8.2.10.7.1 Table 8-117
+ */
+struct cxl_mbox_do_maintenance_hdr {
+ u8 op_class;
+ u8 op_subclass;
+} __packed;
+
int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *cmd);
int cxl_dev_state_identify(struct cxl_memdev_state *mds);
@@ -898,4 +912,7 @@ struct cxl_hdm {
struct seq_file;
struct dentry *cxl_debugfs_create_dir(const char *dir);
void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds);
+int cxl_do_maintenance(struct cxl_mailbox *cxl_mbox,
+ u8 class, u8 subclass,
+ void *data_in, size_t data_in_size);
#endif /* __CXL_MEM_H__ */
--
2.43.0
* [PATCH v2 6/8] cxl: Support for finding memory operation attributes from the current boot
2025-03-20 18:04 [PATCH v2 0/8] cxl: support CXL memory RAS features shiju.jose
` (4 preceding siblings ...)
2025-03-20 18:04 ` [PATCH v2 5/8] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
@ 2025-03-20 18:04 ` shiju.jose
2025-03-27 17:43 ` Dan Williams
2025-03-20 18:04 ` [PATCH v2 7/8] cxl/memfeature: Add CXL memory device soft PPR control feature shiju.jose
` (2 subsequent siblings)
8 siblings, 1 reply; 24+ messages in thread
From: shiju.jose @ 2025-03-20 18:04 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, yazen.ghannam, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Certain operations on memory, such as memory repair, are permitted
only when the address and other attributes for the operation are
from the current boot. This is determined by checking whether the
memory attributes for the operation match those in the CXL gen_media
or CXL DRAM memory event records reported during the current boot.
The CXL event records must be backed up because they are cleared
in the hardware after being processed by the kernel.
Support is added for storing CXL gen_media or CXL DRAM memory event
records in xarrays. Additionally, helper functions are implemented
to find a matching record in the xarray storage based on the memory
attributes and repair type.
Add a validity check when matching attributes for sparing, using the
validity flags in the DRAM event record, to ensure that all attributes
required for a requested repair operation are valid and set.
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
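A lookup sketch (not part of the diff below) of how a repair path is expected
to gate on a record from the current boot; the helper name and the PPR-only
check are illustrative:

    /* Sketch: only allow repair of a DPA seen in an event this boot */
    static bool cxl_repair_address_seen(struct cxl_memdev *cxlmd, u64 dpa)
    {
            struct cxl_mem_repair_attrbs attrbs = {
                    .dpa = dpa,
                    .repair_type = CXL_PPR,
            };

            return cxl_find_rec_gen_media(cxlmd, &attrbs) ||
                   cxl_find_rec_dram(cxlmd, &attrbs);
    }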
drivers/cxl/core/mbox.c | 11 ++-
drivers/cxl/core/memdev.c | 9 +++
drivers/cxl/core/ras.c | 145 ++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxlmem.h | 46 ++++++++++++
drivers/cxl/pci.c | 3 +
5 files changed, 212 insertions(+), 2 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 19d46a284650..c9328f1b6464 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -956,12 +956,19 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
hpa_alias = hpa - cache_size;
}
- if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
+ if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
+ if (cxl_store_rec_gen_media((struct cxl_memdev *)cxlmd, evt))
+ dev_dbg(&cxlmd->dev, "CXL store rec_gen_media failed\n");
+
trace_cxl_general_media(cxlmd, type, cxlr, hpa,
hpa_alias, &evt->gen_media);
- else if (event_type == CXL_CPER_EVENT_DRAM)
+ } else if (event_type == CXL_CPER_EVENT_DRAM) {
+ if (cxl_store_rec_dram((struct cxl_memdev *)cxlmd, evt))
+ dev_dbg(&cxlmd->dev, "CXL store rec_dram failed\n");
+
trace_cxl_dram(cxlmd, type, cxlr, hpa, hpa_alias,
&evt->dram);
+ }
}
}
EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, "CXL");
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index a16a5886d40a..bd9ba50bc01e 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -25,8 +25,17 @@ static DEFINE_IDA(cxl_memdev_ida);
static void cxl_memdev_release(struct device *dev)
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+ struct cxl_event_gen_media *rec_gen_media;
+ struct cxl_event_dram *rec_dram;
+ unsigned long index;
ida_free(&cxl_memdev_ida, cxlmd->id);
+ xa_for_each(&cxlmd->rec_dram, index, rec_dram)
+ kfree(rec_dram);
+ xa_destroy(&cxlmd->rec_dram);
+ xa_for_each(&cxlmd->rec_gen_media, index, rec_gen_media)
+ kfree(rec_gen_media);
+ xa_destroy(&cxlmd->rec_gen_media);
kfree(cxlmd);
}
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 485a831695c7..c703d4e7e05b 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -7,6 +7,151 @@
#include <cxlmem.h>
#include "trace.h"
+struct cxl_event_gen_media *
+cxl_find_rec_gen_media(struct cxl_memdev *cxlmd,
+ struct cxl_mem_repair_attrbs *attrbs)
+{
+ struct cxl_event_gen_media *rec;
+
+ rec = xa_load(&cxlmd->rec_gen_media, attrbs->dpa);
+ if (!rec)
+ return NULL;
+
+ if (attrbs->repair_type == CXL_PPR)
+ return rec;
+
+ return NULL;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_find_rec_gen_media, "CXL");
+
+struct cxl_event_dram *cxl_find_rec_dram(struct cxl_memdev *cxlmd,
+ struct cxl_mem_repair_attrbs *attrbs)
+{
+ struct cxl_event_dram *rec;
+ u16 validity_flags;
+
+ rec = xa_load(&cxlmd->rec_dram, attrbs->dpa);
+ if (!rec)
+ return NULL;
+
+ validity_flags = get_unaligned_le16(rec->media_hdr.validity_flags);
+ if (!(validity_flags & CXL_DER_VALID_CHANNEL) ||
+ !(validity_flags & CXL_DER_VALID_RANK))
+ return NULL;
+
+ switch (attrbs->repair_type) {
+ case CXL_PPR:
+ if (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+ get_unaligned_le24(rec->nibble_mask) == attrbs->nibble_mask)
+ return rec;
+ break;
+ case CXL_CACHELINE_SPARING:
+ if (!(validity_flags & CXL_DER_VALID_BANK_GROUP) ||
+ !(validity_flags & CXL_DER_VALID_BANK) ||
+ !(validity_flags & CXL_DER_VALID_ROW) ||
+ !(validity_flags & CXL_DER_VALID_COLUMN))
+ return NULL;
+
+ if (rec->media_hdr.channel == attrbs->channel &&
+ rec->media_hdr.rank == attrbs->rank &&
+ rec->bank_group == attrbs->bank_group &&
+ rec->bank == attrbs->bank &&
+ get_unaligned_le24(rec->row) == attrbs->row &&
+ get_unaligned_le16(rec->column) == attrbs->column &&
+ (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+ get_unaligned_le24(rec->nibble_mask) ==
+ attrbs->nibble_mask) &&
+ (!(validity_flags & CXL_DER_VALID_SUB_CHANNEL) ||
+ rec->sub_channel == attrbs->sub_channel))
+ return rec;
+ break;
+ case CXL_ROW_SPARING:
+ if (!(validity_flags & CXL_DER_VALID_BANK_GROUP) ||
+ !(validity_flags & CXL_DER_VALID_BANK) ||
+ !(validity_flags & CXL_DER_VALID_ROW))
+ return NULL;
+
+ if (rec->media_hdr.channel == attrbs->channel &&
+ rec->media_hdr.rank == attrbs->rank &&
+ rec->bank_group == attrbs->bank_group &&
+ rec->bank == attrbs->bank &&
+ get_unaligned_le24(rec->row) == attrbs->row &&
+ (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+ get_unaligned_le24(rec->nibble_mask) ==
+ attrbs->nibble_mask))
+ return rec;
+ break;
+ case CXL_BANK_SPARING:
+ if (!(validity_flags & CXL_DER_VALID_BANK_GROUP) ||
+ !(validity_flags & CXL_DER_VALID_BANK))
+ return NULL;
+
+ if (rec->media_hdr.channel == attrbs->channel &&
+ rec->media_hdr.rank == attrbs->rank &&
+ rec->bank_group == attrbs->bank_group &&
+ rec->bank == attrbs->bank &&
+ (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+ get_unaligned_le24(rec->nibble_mask) ==
+ attrbs->nibble_mask))
+ return rec;
+ break;
+ case CXL_RANK_SPARING:
+ if (rec->media_hdr.channel == attrbs->channel &&
+ rec->media_hdr.rank == attrbs->rank &&
+ (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+ get_unaligned_le24(rec->nibble_mask) ==
+ attrbs->nibble_mask))
+ return rec;
+ break;
+ default:
+ return NULL;
+ }
+
+ return NULL;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_find_rec_dram, "CXL");
+
+int cxl_store_rec_gen_media(struct cxl_memdev *cxlmd, union cxl_event *evt)
+{
+ void *old_rec;
+ struct cxl_event_gen_media *rec =
+ kmemdup(&evt->gen_media, sizeof(*rec), GFP_KERNEL);
+ if (!rec)
+ return -ENOMEM;
+
+ old_rec = xa_store(&cxlmd->rec_gen_media,
+ le64_to_cpu(rec->media_hdr.phys_addr), rec,
+ GFP_KERNEL);
+ if (xa_is_err(old_rec)) {
+ kfree(rec);
+ return xa_err(old_rec);
+ }
+
+ kfree(old_rec);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_store_rec_gen_media, "CXL");
+
+int cxl_store_rec_dram(struct cxl_memdev *cxlmd, union cxl_event *evt)
+{
+ void *old_rec;
+ struct cxl_event_dram *rec =
+ kmemdup(&evt->dram, sizeof(*rec), GFP_KERNEL);
+
+ if (!rec)
+ return -ENOMEM;
+
+ old_rec = xa_store(&cxlmd->rec_dram,
+ le64_to_cpu(rec->media_hdr.phys_addr), rec,
+ GFP_KERNEL);
+ if (xa_is_err(old_rec)) {
+ kfree(rec);
+ return xa_err(old_rec);
+ }
+
+ kfree(old_rec);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_store_rec_dram, "CXL");
+
static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,
struct cxl_ras_capability_regs ras_cap)
{
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 7ab257e0c85e..24ece579a145 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -34,6 +34,41 @@
(FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) != \
CXLMDEV_RESET_NEEDED_NOT)
+enum cxl_mem_repair_type {
+ CXL_PPR,
+ CXL_CACHELINE_SPARING,
+ CXL_ROW_SPARING,
+ CXL_BANK_SPARING,
+ CXL_RANK_SPARING,
+ CXL_REPAIR_MAX,
+};
+
+/**
+ * struct cxl_mem_repair_attrbs - CXL memory repair attributes
+ * @dpa: DPA of memory to repair
+ * @nibble_mask: nibble mask, identifies one or more nibbles on the memory bus
+ * @row: row of memory to repair
+ * @column: column of memory to repair
+ * @channel: channel of memory to repair
+ * @sub_channel: sub channel of memory to repair
+ * @rank: rank of memory to repair
+ * @bank_group: bank group of memory to repair
+ * @bank: bank of memory to repair
+ * @repair_type: repair type, e.g. PPR or memory sparing
+ */
+struct cxl_mem_repair_attrbs {
+ u64 dpa;
+ u32 nibble_mask;
+ u32 row;
+ u16 column;
+ u8 channel;
+ u8 sub_channel;
+ u8 rank;
+ u8 bank_group;
+ u8 bank;
+ enum cxl_mem_repair_type repair_type;
+};
+
/**
* struct cxl_memdev - CXL bus object representing a Type-3 Memory Device
* @dev: driver core device object
@@ -45,6 +80,8 @@
* @endpoint: connection to the CXL port topology for this memory device
* @id: id number of this memdev instance.
* @depth: endpoint port depth
+ * @rec_gen_media: xarray to store CXL general media records
+ * @rec_dram: xarray to store CXL DRAM records
*/
struct cxl_memdev {
struct device dev;
@@ -56,6 +93,8 @@ struct cxl_memdev {
struct cxl_port *endpoint;
int id;
int depth;
+ struct xarray rec_gen_media;
+ struct xarray rec_dram;
};
static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
@@ -877,6 +916,13 @@ static inline int devm_cxl_region_edac_register(struct cxl_region *cxlr)
{ return 0; }
#endif
+struct cxl_event_gen_media *
+cxl_find_rec_gen_media(struct cxl_memdev *cxlmd,
+ struct cxl_mem_repair_attrbs *attrbs);
+struct cxl_event_dram *cxl_find_rec_dram(struct cxl_memdev *cxlmd,
+ struct cxl_mem_repair_attrbs *attrbs);
+int cxl_store_rec_gen_media(struct cxl_memdev *cxlmd, union cxl_event *evt);
+int cxl_store_rec_dram(struct cxl_memdev *cxlmd, union cxl_event *evt);
+
#ifdef CONFIG_CXL_SUSPEND
void cxl_mem_active_inc(void);
void cxl_mem_active_dec(void);
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 4288f4814cc5..51f09e685dd9 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1053,6 +1053,9 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
pci_save_state(pdev);
+ xa_init(&cxlmd->rec_gen_media);
+ xa_init(&cxlmd->rec_dram);
+
return rc;
}
--
2.43.0
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v2 7/8] cxl/memfeature: Add CXL memory device soft PPR control feature
2025-03-20 18:04 [PATCH v2 0/8] cxl: support CXL memory RAS features shiju.jose
` (5 preceding siblings ...)
2025-03-20 18:04 ` [PATCH v2 6/8] cxl: Support for finding memory operation attributes from the current boot shiju.jose
@ 2025-03-20 18:04 ` shiju.jose
2025-03-27 17:01 ` Borislav Petkov
2025-03-20 18:04 ` [PATCH v2 8/8] cxl/memfeature: Add CXL memory device memory sparing " shiju.jose
2025-03-21 7:39 ` [PATCH v2 0/8] cxl: support CXL memory RAS features Christophe Leroy
8 siblings, 1 reply; 24+ messages in thread
From: shiju.jose @ 2025-03-20 18:04 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, yazen.ghannam, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Post Package Repair (PPR) maintenance operations may be supported by CXL
devices that implement CXL.mem protocol. A PPR maintenance operation
requests the CXL device to perform a repair operation on its media.
For example, a CXL device with DRAM components that support PPR features
may implement PPR maintenance operations. DRAM components may support two
types of PPR: hard PPR (hPPR), for a permanent row repair, and soft PPR
(sPPR), for a temporary row repair. Soft PPR is much faster than hPPR, but
the repair is lost with a power cycle.
During the execution of a PPR Maintenance operation, a CXL memory device:
- May or may not retain data
- May or may not be able to process CXL.mem requests correctly, including
the ones that target the DPA involved in the repair.
These CXL Memory Device capabilities are specified by Restriction Flags
in the sPPR Feature and hPPR Feature.
A soft PPR maintenance operation may be executed at runtime if data is
retained and CXL.mem requests are processed correctly. For CXL devices with
DRAM components, an hPPR maintenance operation may be executed only at boot
because data is typically not retained across an hPPR maintenance operation.
When a CXL device identifies an error on a memory component, it may inform
the host of the need for a PPR maintenance operation by using an event
record in which the Maintenance Needed flag is set. The event record
specifies the DPA that should be repaired. A CXL device may not keep track
of the requests that have already been sent, and the information on which
DPA should be repaired may be lost upon power cycle.
The userspace tool requests a maintenance operation if the number of
corrected errors reported on a CXL.mem media exceeds an error threshold.
CXL spec 3.2 section 8.2.10.7.1.2 describes the device's sPPR (soft PPR)
maintenance operation and section 8.2.10.7.1.3 describes the device's
hPPR (hard PPR) maintenance operation feature.
CXL spec 3.2 section 8.2.10.7.2.1 describes the sPPR feature discovery and
configuration.
CXL spec 3.2 section 8.2.10.7.2.2 describes the hPPR feature discovery and
configuration.
Add support for controlling the CXL memory device soft PPR (sPPR) feature.
Register with the EDAC driver, which gets the memory repair attr descriptors
from the EDAC memory repair driver and exposes sysfs repair control
attributes for PPR to userspace. For example, CXL PPR control for the
CXL mem0 device is exposed in /sys/bus/edac/devices/cxl_mem0/mem_repairX/
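As an illustration only (not part of this patch), a userspace policy could
drive a soft PPR repair through that interface roughly as follows; the
'dpa' and 'repair' attribute names mirror the ops wired up below (see
Documentation/ABI/testing/sysfs-edac-memory-repair for the authoritative
ABI), while the instance number (mem_repair0) and the helpers are
hypothetical:

  /* Sketch: request a soft PPR of one DPA via the EDAC sysfs interface. */
  #include <stdio.h>

  static int write_attr(const char *path, const char *val)
  {
      FILE *f = fopen(path, "w");

      if (!f)
          return -1;
      fprintf(f, "%s\n", val);
      fclose(f);
      return 0;
  }

  static int request_sppr(const char *dpa_hex)
  {
      const char *base = "/sys/bus/edac/devices/cxl_mem0/mem_repair0";
      char path[256];

      /* Program the DPA to repair. */
      snprintf(path, sizeof(path), "%s/dpa", base);
      if (write_attr(path, dpa_hex))
          return -1;

      /* Write EDAC_DO_MEM_REPAIR (1) to trigger the repair. */
      snprintf(path, sizeof(path), "%s/repair", base);
      return write_attr(path, "1");
  }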
Add checks to ensure the memory to be repaired is offline or, if online,
originates from a CXL DRAM or CXL gen_media error record reported in the
current boot, before requesting a PPR operation on the device.
Tested with QEMU patch for CXL PPR feature.
https://lore.kernel.org/all/20240730045722.71482-1-dave@stgolabs.net/
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
Documentation/edac/memory_repair.rst | 18 ++
drivers/cxl/Kconfig | 1 +
drivers/cxl/core/edac.c | 361 ++++++++++++++++++++++++++-
drivers/edac/mem_repair.c | 5 +
include/linux/edac.h | 3 +
5 files changed, 387 insertions(+), 1 deletion(-)
diff --git a/Documentation/edac/memory_repair.rst b/Documentation/edac/memory_repair.rst
index 52162a422864..cb0d91ccebfd 100644
--- a/Documentation/edac/memory_repair.rst
+++ b/Documentation/edac/memory_repair.rst
@@ -119,3 +119,21 @@ sysfs
Sysfs files are documented in
`Documentation/ABI/testing/sysfs-edac-memory-repair`.
+
+Examples
+--------
+
+The memory repair usage takes the form shown in this example:
+
+1. CXL memory device Soft Post Package Repair (Soft PPR)
+
+Post Package Repair (PPR) maintenance operations may be supported by CXL
+devices that implement CXL.mem protocol. A PPR maintenance operation
+requests the CXL device to perform a repair operation on its media.
+For example, a CXL device with DRAM components that support PPR features
+may implement PPR Maintenance operations. Soft PPR (sPPR) is a temporary
+row repair. Soft PPR may be faster, but the repair is lost with a power
+cycle.
+
+Sysfs files for memory repair are documented in
+`Documentation/ABI/testing/sysfs-edac-memory-repair`
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 1c67bf844993..540d06011d80 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -121,6 +121,7 @@ config CXL_EDAC_MEM_FEATURES
depends on EDAC >= CXL_BUS
depends on EDAC_SCRUB
depends on EDAC_ECS
+ depends on EDAC_MEM_REPAIR
help
The CXL EDAC memory feature control is optional and allows host
to control the EDAC memory features configurations of CXL memory
diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
index 1110685ed41a..f0fb7bac6544 100644
--- a/drivers/cxl/core/edac.c
+++ b/drivers/cxl/core/edac.c
@@ -14,12 +14,13 @@
#include <linux/cleanup.h>
#include <linux/edac.h>
#include <linux/limits.h>
+#include <linux/unaligned.h>
#include <cxl/features.h>
#include <cxl.h>
#include <cxlmem.h>
#include "core.h"
-#define CXL_NR_EDAC_DEV_FEATURES 2
+#define CXL_NR_EDAC_DEV_FEATURES 3
static struct rw_semaphore *cxl_acquire(struct rw_semaphore *rwsem)
{
@@ -31,6 +32,16 @@ static struct rw_semaphore *cxl_acquire(struct rw_semaphore *rwsem)
DEFINE_FREE(cxl_unlock, struct rw_semaphore *, if (_T) up_read(_T))
+static bool cxl_is_memdev_memory_online(const struct cxl_memdev *cxlmd)
+{
+ struct cxl_port *port = cxlmd->endpoint;
+
+ if (port && is_cxl_endpoint(port) && cxl_num_decoders_committed(port))
+ return true;
+
+ return false;
+}
+
/*
* CXL memory patrol scrub control
*/
@@ -772,10 +783,348 @@ static int cxl_memdev_ecs_init(struct cxl_memdev *cxlmd,
return 0;
}
+/*
+ * CXL memory soft PPR & hard PPR control definitions
+ */
+struct cxl_ppr_context {
+ uuid_t repair_uuid;
+ u8 instance;
+ u16 get_feat_size;
+ u16 set_feat_size;
+ u8 get_version;
+ u8 set_version;
+ u16 effects;
+ struct cxl_memdev *cxlmd;
+ enum edac_mem_repair_type repair_type;
+ bool persist_mode;
+ u64 dpa;
+ u32 nibble_mask;
+};
+
+/**
+ * struct cxl_memdev_ppr_params - CXL memory PPR parameter data structure.
+ * @dpa: device physical address.
+ * @op_class: PPR operation class.
+ * @op_subclass: PPR operation subclass.
+ * @media_accessible: whether memory media is accessible during the PPR operation.
+ * @data_retained: whether data is retained during the PPR operation.
+ */
+struct cxl_memdev_ppr_params {
+ u64 dpa;
+ u8 op_class;
+ u8 op_subclass;
+ bool media_accessible;
+ bool data_retained;
+};
+
+/*
+ * See CXL rev 3.2 @8.2.10.7.2.1 Table 8-128 sPPR Feature Readable Attributes
+ *
+ * See CXL rev 3.2 @8.2.10.7.2.2 Table 8-131 hPPR Feature Readable Attributes
+ */
+#define CXL_MEMDEV_PPR_QUERY_RESOURCE_FLAG BIT(0)
+
+#define CXL_MEMDEV_PPR_DEVICE_INITIATED_MASK BIT(0)
+
+#define CXL_MEMDEV_PPR_FLAG_DPA_SUPPORT_MASK BIT(0)
+#define CXL_MEMDEV_PPR_FLAG_NIBBLE_SUPPORT_MASK BIT(1)
+#define CXL_MEMDEV_PPR_FLAG_MEM_SPARING_EV_REC_SUPPORT_MASK BIT(2)
+#define CXL_MEMDEV_PPR_FLAG_DEV_INITED_PPR_AT_BOOT_CAP_MASK BIT(3)
+
+#define CXL_MEMDEV_PPR_RESTRICTION_FLAG_MEDIA_ACCESSIBLE_MASK BIT(0)
+#define CXL_MEMDEV_PPR_RESTRICTION_FLAG_DATA_RETAINED_MASK BIT(2)
+
+#define CXL_MEMDEV_PPR_SPARING_EV_REC_EN_MASK BIT(0)
+#define CXL_MEMDEV_PPR_DEV_INITED_PPR_AT_BOOT_EN_MASK BIT(1)
+
+struct cxl_memdev_repair_rd_attrs_hdr {
+ u8 max_op_latency;
+ __le16 op_cap;
+ __le16 op_mode;
+ u8 op_class;
+ u8 op_subclass;
+ u8 rsvd[9];
+} __packed;
+
+struct cxl_memdev_ppr_rd_attrs {
+ struct cxl_memdev_repair_rd_attrs_hdr hdr;
+ u8 ppr_flags;
+ __le16 restriction_flags;
+ u8 ppr_op_mode;
+} __packed;
+
+/*
+ * See CXL rev 3.2 @8.2.10.7.1.2 Table 8-118 sPPR Maintenance Input Payload
+ *
+ * See CXL rev 3.2 @8.2.10.7.1.3 Table 8-119 hPPR Maintenance Input Payload
+ */
+struct cxl_memdev_ppr_maintenance_attrs {
+ u8 flags;
+ __le64 dpa;
+ u8 nibble_mask[3];
+} __packed;
+
+static int cxl_mem_ppr_get_attrs(struct cxl_ppr_context *cxl_ppr_ctx,
+ struct cxl_memdev_ppr_params *params)
+{
+ size_t rd_data_size = sizeof(struct cxl_memdev_ppr_rd_attrs);
+ struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd;
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+ u16 restriction_flags;
+ size_t data_size;
+ u16 return_code;
+
+ struct cxl_memdev_ppr_rd_attrs *rd_attrs __free(kfree) =
+ kmalloc(rd_data_size, GFP_KERNEL);
+ if (!rd_attrs)
+ return -ENOMEM;
+
+ data_size = cxl_get_feature(cxl_mbox, &cxl_ppr_ctx->repair_uuid,
+ CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrs,
+ rd_data_size, 0, &return_code);
+ if (!data_size)
+ return -EIO;
+
+ params->op_class = rd_attrs->hdr.op_class;
+ params->op_subclass = rd_attrs->hdr.op_subclass;
+ restriction_flags = le16_to_cpu(rd_attrs->restriction_flags);
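+	/* A set restriction bit means the restriction applies, so invert. */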
+ params->media_accessible =
+ FIELD_GET(CXL_MEMDEV_PPR_RESTRICTION_FLAG_MEDIA_ACCESSIBLE_MASK,
+ restriction_flags) ^ 1;
+ params->data_retained =
+ FIELD_GET(CXL_MEMDEV_PPR_RESTRICTION_FLAG_DATA_RETAINED_MASK,
+ restriction_flags) ^ 1;
+
+ return 0;
+}
+
+static int cxl_mem_do_ppr_op(struct device *dev,
+ struct cxl_ppr_context *cxl_ppr_ctx,
+ struct cxl_memdev_ppr_params *rd_params)
+{
+ struct cxl_memdev_ppr_maintenance_attrs maintenance_attrs;
+ struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd;
+ struct cxl_mem_repair_attrbs attrbs = { 0 };
+
+ if (!rd_params->media_accessible || !rd_params->data_retained) {
+ /* Memory to repair must be offline */
+ if (cxl_is_memdev_memory_online(cxlmd))
+ return -EBUSY;
+ } else {
+ if (cxl_is_memdev_memory_online(cxlmd)) {
+ /* Check memory to repair is from the current boot */
+ attrbs.repair_type = CXL_PPR;
+ attrbs.dpa = cxl_ppr_ctx->dpa;
+ attrbs.nibble_mask = cxl_ppr_ctx->nibble_mask;
+ if (!cxl_find_rec_dram(cxlmd, &attrbs) &&
+ !cxl_find_rec_gen_media(cxlmd, &attrbs))
+ return -EINVAL;
+ }
+ }
+
+	memset(&maintenance_attrs, 0, sizeof(maintenance_attrs));
+ maintenance_attrs.dpa = cpu_to_le64(cxl_ppr_ctx->dpa);
+ put_unaligned_le24(cxl_ppr_ctx->nibble_mask,
+ maintenance_attrs.nibble_mask);
+
+ return cxl_do_maintenance(&cxlmd->cxlds->cxl_mbox, rd_params->op_class,
+ rd_params->op_subclass, &maintenance_attrs,
+ sizeof(maintenance_attrs));
+}
+
+static int cxl_mem_ppr_set_attrs(struct device *dev,
+ struct cxl_ppr_context *cxl_ppr_ctx)
+{
+ struct cxl_memdev_ppr_params rd_params;
+ int ret;
+
+ ret = cxl_mem_ppr_get_attrs(cxl_ppr_ctx, &rd_params);
+ if (ret)
+ return ret;
+
+ struct rw_semaphore *region_lock __free(cxl_unlock) =
+ cxl_acquire(&cxl_region_rwsem);
+ if (!region_lock)
+ return -EINTR;
+
+ struct rw_semaphore *dpa_lock __free(cxl_unlock) =
+ cxl_acquire(&cxl_dpa_rwsem);
+ if (!dpa_lock)
+ return -EINTR;
+
+ ret = cxl_mem_do_ppr_op(dev, cxl_ppr_ctx, &rd_params);
+
+ return ret;
+}
+
+static int cxl_ppr_get_repair_type(struct device *dev, void *drv_data,
+ const char **repair_type)
+{
+ *repair_type = edac_repair_type[EDAC_PPR];
+
+ return 0;
+}
+
+static int cxl_ppr_get_persist_mode(struct device *dev, void *drv_data,
+ bool *persist_mode)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+
+ *persist_mode = cxl_ppr_ctx->persist_mode;
+
+ return 0;
+}
+
+static int cxl_get_ppr_safe_when_in_use(struct device *dev, void *drv_data,
+ bool *safe)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+ struct cxl_memdev_ppr_params params;
+ int ret;
+
+ ret = cxl_mem_ppr_get_attrs(cxl_ppr_ctx, ¶ms);
+ if (ret)
+ return ret;
+
+	*safe = params.media_accessible && params.data_retained;
+
+ return 0;
+}
+
+static int cxl_ppr_get_min_dpa(struct device *dev, void *drv_data, u64 *min_dpa)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+ struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ *min_dpa = cxlds->dpa_res.start;
+
+ return 0;
+}
+
+static int cxl_ppr_get_max_dpa(struct device *dev, void *drv_data, u64 *max_dpa)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+ struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ *max_dpa = cxlds->dpa_res.end;
+
+ return 0;
+}
+
+static int cxl_ppr_get_dpa(struct device *dev, void *drv_data, u64 *dpa)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+
+ *dpa = cxl_ppr_ctx->dpa;
+
+ return 0;
+}
+
+static int cxl_ppr_set_dpa(struct device *dev, void *drv_data, u64 dpa)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+ struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end)
+ return -EINVAL;
+
+ cxl_ppr_ctx->dpa = dpa;
+
+ return 0;
+}
+
+static int cxl_ppr_get_nibble_mask(struct device *dev, void *drv_data,
+ u32 *nibble_mask)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+
+ *nibble_mask = cxl_ppr_ctx->nibble_mask;
+
+ return 0;
+}
+
+static int cxl_ppr_set_nibble_mask(struct device *dev, void *drv_data,
+ u32 nibble_mask)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+
+ cxl_ppr_ctx->nibble_mask = nibble_mask;
+
+ return 0;
+}
+
+static int cxl_do_ppr(struct device *dev, void *drv_data, u32 val)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+
+ if (!cxl_ppr_ctx->dpa || val != EDAC_DO_MEM_REPAIR)
+ return -EINVAL;
+
+ return cxl_mem_ppr_set_attrs(dev, cxl_ppr_ctx);
+}
+
+static const struct edac_mem_repair_ops cxl_sppr_ops = {
+ .get_repair_type = cxl_ppr_get_repair_type,
+ .get_persist_mode = cxl_ppr_get_persist_mode,
+ .get_repair_safe_when_in_use = cxl_get_ppr_safe_when_in_use,
+ .get_min_dpa = cxl_ppr_get_min_dpa,
+ .get_max_dpa = cxl_ppr_get_max_dpa,
+ .get_dpa = cxl_ppr_get_dpa,
+ .set_dpa = cxl_ppr_set_dpa,
+ .get_nibble_mask = cxl_ppr_get_nibble_mask,
+ .set_nibble_mask = cxl_ppr_set_nibble_mask,
+ .do_repair = cxl_do_ppr,
+};
+
+static int cxl_memdev_soft_ppr_init(struct cxl_memdev *cxlmd,
+ struct edac_dev_feature *ras_feature,
+ u8 repair_inst)
+{
+ struct cxl_ppr_context *cxl_sppr_ctx;
+ struct cxl_feat_entry *feat_entry;
+
+ feat_entry = cxl_get_feature_entry(cxlmd->cxlds, &CXL_FEAT_SPPR_UUID);
+ if (!feat_entry)
+ return -EOPNOTSUPP;
+
+ if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE))
+ return -EOPNOTSUPP;
+
+ cxl_sppr_ctx =
+ devm_kzalloc(&cxlmd->dev, sizeof(*cxl_sppr_ctx), GFP_KERNEL);
+ if (!cxl_sppr_ctx)
+ return -ENOMEM;
+
+ *cxl_sppr_ctx = (struct cxl_ppr_context){
+ .get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
+ .set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
+ .get_version = feat_entry->get_feat_ver,
+ .set_version = feat_entry->set_feat_ver,
+ .effects = le16_to_cpu(feat_entry->effects),
+ .cxlmd = cxlmd,
+ .repair_type = EDAC_PPR,
+ .persist_mode = 0,
+ .instance = repair_inst,
+ };
+ uuid_copy(&cxl_sppr_ctx->repair_uuid, &CXL_FEAT_SPPR_UUID);
+
+ ras_feature->ft_type = RAS_FEAT_MEM_REPAIR;
+ ras_feature->instance = cxl_sppr_ctx->instance;
+ ras_feature->mem_repair_ops = &cxl_sppr_ops;
+ ras_feature->ctx = cxl_sppr_ctx;
+
+ return 0;
+}
+
int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
{
struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
int num_ras_features = 0;
+ u8 repair_inst = 0;
u8 scrub_inst = 0;
int rc;
@@ -794,6 +1143,16 @@ int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
if (rc != -EOPNOTSUPP)
num_ras_features++;
+ rc = cxl_memdev_soft_ppr_init(cxlmd, &ras_features[num_ras_features],
+ repair_inst);
+ if (rc < 0 && rc != -EOPNOTSUPP)
+ return rc;
+
+ if (rc != -EOPNOTSUPP) {
+ repair_inst++;
+ num_ras_features++;
+ }
+
char *cxl_dev_name __free(kfree) =
kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlmd->dev));
diff --git a/drivers/edac/mem_repair.c b/drivers/edac/mem_repair.c
index 3b1a845457b0..bf7e01a8b4dd 100755
--- a/drivers/edac/mem_repair.c
+++ b/drivers/edac/mem_repair.c
@@ -45,6 +45,11 @@ struct edac_mem_repair_context {
struct attribute_group group;
};
+const char * const edac_repair_type[] = {
+ [EDAC_PPR] = "ppr",
+};
+EXPORT_SYMBOL_GPL(edac_repair_type);
+
#define TO_MR_DEV_ATTR(_dev_attr) \
container_of(_dev_attr, struct edac_mem_repair_dev_attr, dev_attr)
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 451f9c152c99..5669d8d2509a 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -745,9 +745,12 @@ static inline int edac_ecs_get_desc(struct device *ecs_dev,
#endif /* CONFIG_EDAC_ECS */
enum edac_mem_repair_type {
+ EDAC_PPR,
EDAC_REPAIR_MAX
};
+extern const char * const edac_repair_type[];
+
enum edac_mem_repair_cmd {
EDAC_DO_MEM_REPAIR = 1,
};
--
2.43.0
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v2 8/8] cxl/memfeature: Add CXL memory device memory sparing control feature
2025-03-20 18:04 [PATCH v2 0/8] cxl: support CXL memory RAS features shiju.jose
` (6 preceding siblings ...)
2025-03-20 18:04 ` [PATCH v2 7/8] cxl/memfeature: Add CXL memory device soft PPR control feature shiju.jose
@ 2025-03-20 18:04 ` shiju.jose
2025-03-21 7:39 ` [PATCH v2 0/8] cxl: support CXL memory RAS features Christophe Leroy
8 siblings, 0 replies; 24+ messages in thread
From: shiju.jose @ 2025-03-20 18:04 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, yazen.ghannam, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Memory sparing is defined as a repair function that replaces a portion of
memory with a portion of functional memory at that same DPA. The subclasses
for this operation vary in terms of the scope of the sparing being
performed. The cacheline sparing subclass refers to a sparing action that
can replace a full cacheline. Row sparing is provided as an alternative to
PPR and its scope is that of a single DDR row.
As per CXL r3.2 Table 8-125, footnote 1, memory sparing is preferred over
PPR when possible.
Bank sparing allows an entire bank to be replaced. Rank sparing is defined
as an operation in which an entire DDR rank is replaced.
Memory sparing maintenance operations may be supported by CXL devices
that implement CXL.mem protocol. A sparing maintenance operation requests
the CXL device to perform a repair operation on its media.
For example, a CXL device with DRAM components that support memory sparing
features may implement sparing maintenance operations.
The host may issue a query command by setting query resources flag in the
input payload (CXL spec 3.2 Table 8-120) to determine availability of
sparing resources for a given address. In response to a query request,
the device shall report the resource availability by producing the memory
sparing event record (CXL spec 3.2 Table 8-60) in which the Channel, Rank,
Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields are a copy
of the values specified in the request.
During the execution of a sparing maintenance operation, a CXL memory
device:
- may not retain data
- may not be able to process CXL.mem requests correctly.
These CXL memory device capabilities are specified by restriction flags
in the memory sparing feature readable attributes.
When a CXL device identifies an error on a memory component, the device
may inform the host of the need for a memory sparing maintenance operation
by using a DRAM event record in which the 'maintenance needed' flag may be
set. The event record contains some of the DPA, Channel, Rank, Nibble Mask,
Bank Group, Bank, Row, Column and Sub-Channel fields that identify the
memory that should be repaired. The userspace tool requests a maintenance
operation if the 'maintenance needed' flag is set in the CXL DRAM error
record.
CXL spec 3.2 section 8.2.10.7.1.4 describes the device's memory sparing
maintenance operation feature.
CXL spec 3.2 section 8.2.10.7.2.3 describes the memory sparing feature
discovery and configuration.
Add support for controlling the CXL memory device memory sparing feature.
Register with the EDAC driver, which gets the memory repair attr descriptors
from the EDAC memory repair driver and exposes sysfs repair control
attributes for memory sparing to userspace. For example, CXL memory
sparing control for the CXL mem0 device is exposed in
/sys/bus/edac/devices/cxl_mem0/mem_repairX/
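As an illustration only (not part of this patch), a tool such as rasdaemon
could drive a row sparing repair through that interface roughly as follows;
the attribute names mirror the ops wired up below (see
Documentation/ABI/testing/sysfs-edac-memory-repair for the authoritative
ABI), while the instance number (mem_repair1), the struct and the helpers
are hypothetical, with values assumed to come from the CXL DRAM trace
event:

  /* Sketch: program row sparing attributes, then trigger the repair. */
  #include <stdio.h>

  /* Subset of the fields reported in the CXL DRAM trace event. */
  struct dram_event {
      unsigned long long dpa;
      unsigned int channel, rank, bank_group, bank, row;
  };

  static void set_attr(const char *name, unsigned long long val)
  {
      char path[256];
      FILE *f;

      snprintf(path, sizeof(path),
               "/sys/bus/edac/devices/cxl_mem0/mem_repair1/%s", name);
      f = fopen(path, "w");
      if (f) {
          fprintf(f, "%llu\n", val);
          fclose(f);
      }
  }

  static void request_row_sparing(const struct dram_event *ev)
  {
      set_attr("dpa", ev->dpa);
      set_attr("channel", ev->channel);
      set_attr("rank", ev->rank);
      set_attr("bank_group", ev->bank_group);
      set_attr("bank", ev->bank);
      set_attr("row", ev->row);
      set_attr("repair", 1);    /* EDAC_DO_MEM_REPAIR */
  }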
Use case
========
1. The CXL device identifies a failure in a memory component and reports it
to userspace in a CXL DRAM trace event, with the DPA and other attributes
of the memory to repair such as channel, rank, nibble mask, bank group,
bank, row, column and sub-channel.
2. Rasdaemon processes the trace event and may issue a query request in
sysfs to check the resources available for memory sparing if either of the
following conditions is met:
- the 'maintenance needed' flag is set in the event record.
- the 'threshold event' flag is set for the CVME threshold feature.
- failing those, when the number of corrected errors reported on a CXL.mem
media to userspace exceeds an error threshold set in the userspace policy.
3. Rasdaemon processes the memory sparing trace event and issues a repair
request for memory sparing.
The kernel CXL driver shall report the memory sparing event record to
userspace with the resource availability so that rasdaemon can process the
event record and issue a repair request in sysfs for the memory sparing
operation in the CXL device.
Note: Based on feedback from the community, the 'query' sysfs attribute is
removed and reporting the memory sparing error record to userspace is not
supported. Instead, userspace issues the sparing operation and the kernel
forwards it to the CXL memory device when the 'maintenance needed' flag is
set in the DRAM event record.
Add checks to ensure the memory to be repaired is offline or, if online,
originates from a CXL DRAM error record reported in the current boot,
before requesting a memory sparing operation on the device.
Tested for memory sparing control feature with
"hw/cxl: Add memory sparing control feature"
Repository: "https://gitlab.com/shiju.jose/qemu.git"
Branch: cxl-ras-features-2024-10-24
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
Documentation/edac/memory_repair.rst | 13 +
drivers/cxl/core/edac.c | 550 ++++++++++++++++++++++++++-
drivers/edac/mem_repair.c | 4 +
include/linux/edac.h | 4 +
4 files changed, 569 insertions(+), 2 deletions(-)
diff --git a/Documentation/edac/memory_repair.rst b/Documentation/edac/memory_repair.rst
index cb0d91ccebfd..3725a7c76697 100644
--- a/Documentation/edac/memory_repair.rst
+++ b/Documentation/edac/memory_repair.rst
@@ -135,5 +135,18 @@ may implement PPR Maintenance operations. Soft PPR (sPPR), is a temporary
row repair. Soft PPR may be faster, but the repair is lost with a power
cycle.
+2. CXL memory sparing
+
+Memory sparing is defined as a repair function that replaces a portion of
+memory with a portion of functional memory at that same DPA. The
+subclasses for this operation (cacheline/row/bank/rank sparing) vary in
+terms of the scope of the sparing being performed.
+
+Memory sparing maintenance operations may be supported by CXL devices that
+implement CXL.mem protocol. A sparing maintenance operation requests the
+CXL device to perform a repair operation on its media. For example, a CXL
+device with DRAM components that support memory sparing features may
+implement sparing maintenance operations.
+
Sysfs files for memory repair are documented in
`Documentation/ABI/testing/sysfs-edac-memory-repair`
diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
index f0fb7bac6544..0721a9db79f1 100644
--- a/drivers/cxl/core/edac.c
+++ b/drivers/cxl/core/edac.c
@@ -19,8 +19,9 @@
#include <cxl.h>
#include <cxlmem.h>
#include "core.h"
+#include "trace.h"
-#define CXL_NR_EDAC_DEV_FEATURES 3
+#define CXL_NR_EDAC_DEV_FEATURES 7
static struct rw_semaphore *cxl_acquire(struct rw_semaphore *rwsem)
{
@@ -1120,13 +1121,545 @@ static int cxl_memdev_soft_ppr_init(struct cxl_memdev *cxlmd,
return 0;
}
+/*
+ * CXL memory sparing control
+ */
+enum cxl_mem_sparing_granularity {
+ CXL_MEM_SPARING_CACHELINE,
+ CXL_MEM_SPARING_ROW,
+ CXL_MEM_SPARING_BANK,
+ CXL_MEM_SPARING_RANK,
+ CXL_MEM_SPARING_MAX
+};
+
+struct cxl_mem_sparing_context {
+ struct cxl_memdev *cxlmd;
+ uuid_t repair_uuid;
+ u16 get_feat_size;
+ u16 set_feat_size;
+ u16 effects;
+ u8 instance;
+ u8 get_version;
+ u8 set_version;
+ u8 channel;
+ u8 rank;
+ u8 bank_group;
+ u32 nibble_mask;
+ u64 dpa;
+ u32 row;
+ u16 column;
+ u8 bank;
+ u8 sub_channel;
+ enum edac_mem_repair_type repair_type;
+ bool persist_mode;
+ enum cxl_mem_sparing_granularity granularity;
+};
+
+struct cxl_memdev_sparing_params {
+ u8 op_class;
+ u8 op_subclass;
+ bool cap_safe_when_in_use;
+ bool cap_hard_sparing;
+ bool cap_soft_sparing;
+};
+
+#define CXL_MEMDEV_SPARING_RD_CAP_SAFE_IN_USE_MASK BIT(0)
+#define CXL_MEMDEV_SPARING_RD_CAP_HARD_SPARING_MASK BIT(1)
+#define CXL_MEMDEV_SPARING_RD_CAP_SOFT_SPARING_MASK BIT(2)
+
+#define CXL_MEMDEV_SPARING_WR_DEVICE_INITIATED_MASK BIT(0)
+
+#define CXL_MEMDEV_SPARING_QUERY_RESOURCE_FLAG BIT(0)
+#define CXL_MEMDEV_SET_HARD_SPARING_FLAG BIT(1)
+#define CXL_MEMDEV_SPARING_SUB_CHANNEL_VALID_FLAG BIT(2)
+#define CXL_MEMDEV_SPARING_NIB_MASK_VALID_FLAG BIT(3)
+
+/*
+ * See CXL spec rev 3.2 @8.2.10.7.2.3 Table 8-134 Memory Sparing Feature
+ * Readable Attributes.
+ */
+struct cxl_memdev_sparing_rd_attrs {
+ struct cxl_memdev_repair_rd_attrs_hdr hdr;
+ u8 rsvd;
+ __le16 restriction_flags;
+} __packed;
+
+/*
+ * See CXL spec rev 3.2 @8.2.10.7.1.4 Table 8-120 Memory Sparing Input Payload.
+ */
+struct cxl_memdev_sparing_in_payload {
+ u8 flags;
+ u8 channel;
+ u8 rank;
+ u8 nibble_mask[3];
+ u8 bank_group;
+ u8 bank;
+ u8 row[3];
+ __le16 column;
+ u8 sub_channel;
+} __packed;
+
+static int
+cxl_mem_sparing_get_attrs(struct cxl_mem_sparing_context *cxl_sparing_ctx,
+ struct cxl_memdev_sparing_params *params)
+{
+ size_t rd_data_size = sizeof(struct cxl_memdev_sparing_rd_attrs);
+ struct cxl_memdev *cxlmd = cxl_sparing_ctx->cxlmd;
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+ u16 restriction_flags;
+ size_t data_size;
+ u16 return_code;
+ struct cxl_memdev_sparing_rd_attrs *rd_attrs __free(kfree) =
+ kzalloc(rd_data_size, GFP_KERNEL);
+ if (!rd_attrs)
+ return -ENOMEM;
+
+ data_size = cxl_get_feature(cxl_mbox, &cxl_sparing_ctx->repair_uuid,
+ CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrs,
+ rd_data_size, 0, &return_code);
+ if (!data_size)
+ return -EIO;
+
+ params->op_class = rd_attrs->hdr.op_class;
+ params->op_subclass = rd_attrs->hdr.op_subclass;
+ restriction_flags = le16_to_cpu(rd_attrs->restriction_flags);
+ params->cap_safe_when_in_use =
+ FIELD_GET(CXL_MEMDEV_SPARING_RD_CAP_SAFE_IN_USE_MASK,
+ restriction_flags) ^ 1;
+ params->cap_hard_sparing =
+ FIELD_GET(CXL_MEMDEV_SPARING_RD_CAP_HARD_SPARING_MASK,
+ restriction_flags);
+ params->cap_soft_sparing =
+ FIELD_GET(CXL_MEMDEV_SPARING_RD_CAP_SOFT_SPARING_MASK,
+ restriction_flags);
+
+ return 0;
+}
+
+static struct cxl_event_dram *
+cxl_mem_get_rec_dram(struct cxl_memdev *cxlmd,
+ struct cxl_mem_sparing_context *ctx)
+{
+ struct cxl_mem_repair_attrbs attrbs = { 0 };
+
+ attrbs.dpa = ctx->dpa;
+ attrbs.channel = ctx->channel;
+ attrbs.rank = ctx->rank;
+ attrbs.nibble_mask = ctx->nibble_mask;
+ switch (ctx->repair_type) {
+ case EDAC_CACHELINE_SPARING:
+ attrbs.repair_type = CXL_CACHELINE_SPARING;
+ attrbs.bank_group = ctx->bank_group;
+ attrbs.bank = ctx->bank;
+ attrbs.row = ctx->row;
+ attrbs.column = ctx->column;
+ attrbs.sub_channel = ctx->sub_channel;
+ break;
+ case EDAC_ROW_SPARING:
+ attrbs.repair_type = CXL_ROW_SPARING;
+ attrbs.bank_group = ctx->bank_group;
+ attrbs.bank = ctx->bank;
+ attrbs.row = ctx->row;
+ break;
+ case EDAC_BANK_SPARING:
+ attrbs.repair_type = CXL_BANK_SPARING;
+ attrbs.bank_group = ctx->bank_group;
+ attrbs.bank = ctx->bank;
+ break;
+ case EDAC_RANK_SPARING:
+ attrbs.repair_type = CXL_RANK_SPARING;
+ break;
+ default:
+ return NULL;
+ }
+
+ return cxl_find_rec_dram(cxlmd, &attrbs);
+}
+
+static int
+cxl_mem_do_sparing_op(struct device *dev,
+ struct cxl_mem_sparing_context *cxl_sparing_ctx,
+ struct cxl_memdev_sparing_params *rd_params)
+{
+ struct cxl_memdev *cxlmd = cxl_sparing_ctx->cxlmd;
+ struct cxl_memdev_sparing_in_payload sparing_pi;
+ struct cxl_event_dram *rec = NULL;
+ u16 validity_flags = 0;
+
+ if (!rd_params->cap_safe_when_in_use) {
+ /* Memory to repair must be offline */
+ if (cxl_is_memdev_memory_online(cxlmd))
+ return -EBUSY;
+ } else {
+ if (cxl_is_memdev_memory_online(cxlmd)) {
+ rec = cxl_mem_get_rec_dram(cxlmd, cxl_sparing_ctx);
+ if (!rec)
+ return -EINVAL;
+
+ validity_flags =
+ get_unaligned_le16(rec->media_hdr.validity_flags);
+ if (!validity_flags)
+ return -EINVAL;
+ }
+ }
+
+ memset(&sparing_pi, 0, sizeof(sparing_pi));
+ sparing_pi.flags =
+ FIELD_PREP(CXL_MEMDEV_SPARING_QUERY_RESOURCE_FLAG, 0);
+ if (cxl_sparing_ctx->persist_mode)
+ sparing_pi.flags |=
+ FIELD_PREP(CXL_MEMDEV_SET_HARD_SPARING_FLAG, 1);
+
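+ /*
+  * Finer-grained sparing types intentionally fall through so that the
+  * coarser-grained fields (bank, rank, channel) are populated too.
+  */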
+ switch (cxl_sparing_ctx->repair_type) {
+ case EDAC_CACHELINE_SPARING:
+ sparing_pi.column = cpu_to_le16(cxl_sparing_ctx->column);
+ if (!rec || (validity_flags & CXL_DER_VALID_SUB_CHANNEL)) {
+ sparing_pi.flags |=
+ FIELD_PREP(CXL_MEMDEV_SPARING_SUB_CHANNEL_VALID_FLAG, 1);
+ sparing_pi.sub_channel = cxl_sparing_ctx->sub_channel;
+ }
+ fallthrough;
+ case EDAC_ROW_SPARING:
+ put_unaligned_le24(cxl_sparing_ctx->row, sparing_pi.row);
+ fallthrough;
+ case EDAC_BANK_SPARING:
+ sparing_pi.bank_group = cxl_sparing_ctx->bank_group;
+ sparing_pi.bank = cxl_sparing_ctx->bank;
+ fallthrough;
+ case EDAC_RANK_SPARING:
+ sparing_pi.rank = cxl_sparing_ctx->rank;
+ fallthrough;
+ default:
+ sparing_pi.channel = cxl_sparing_ctx->channel;
+ if ((rec && (validity_flags & CXL_DER_VALID_NIBBLE)) ||
+ (!rec && (!cxl_sparing_ctx->nibble_mask ||
+ (cxl_sparing_ctx->nibble_mask & 0xFFFFFF)))) {
+ sparing_pi.flags |=
+ FIELD_PREP(CXL_MEMDEV_SPARING_NIB_MASK_VALID_FLAG, 1);
+ put_unaligned_le24(cxl_sparing_ctx->nibble_mask,
+ sparing_pi.nibble_mask);
+ }
+ break;
+ }
+
+ return cxl_do_maintenance(&cxlmd->cxlds->cxl_mbox, rd_params->op_class,
+ rd_params->op_subclass, &sparing_pi,
+ sizeof(sparing_pi));
+}
+
+static int cxl_mem_sparing_set_attrs(struct device *dev,
+ struct cxl_mem_sparing_context *ctx)
+{
+ struct cxl_memdev_sparing_params rd_params;
+ int ret;
+
+ ret = cxl_mem_sparing_get_attrs(ctx, &rd_params);
+ if (ret)
+ return ret;
+
+ struct rw_semaphore *region_lock __free(cxl_unlock) =
+ cxl_acquire(&cxl_region_rwsem);
+ if (!region_lock)
+ return -EINTR;
+
+ struct rw_semaphore *dpa_lock __free(cxl_unlock) =
+ cxl_acquire(&cxl_dpa_rwsem);
+ if (!dpa_lock)
+ return -EINTR;
+
+ ret = cxl_mem_do_sparing_op(dev, ctx, &rd_params);
+
+ return ret;
+}
+
+static int cxl_mem_sparing_get_repair_type(struct device *dev, void *drv_data,
+ const char **repair_type)
+{
+ struct cxl_mem_sparing_context *ctx = drv_data;
+
+ switch (ctx->repair_type) {
+ case EDAC_CACHELINE_SPARING:
+ case EDAC_ROW_SPARING:
+ case EDAC_BANK_SPARING:
+ case EDAC_RANK_SPARING:
+ *repair_type = edac_repair_type[ctx->repair_type];
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+#define CXL_SPARING_GET_ATTR(attrib, data_type) \
+ static int cxl_mem_sparing_get_##attrib( \
+ struct device *dev, void *drv_data, data_type *val) \
+ { \
+ struct cxl_mem_sparing_context *ctx = drv_data; \
+ \
+ *val = ctx->attrib; \
+ \
+ return 0; \
+ }
+CXL_SPARING_GET_ATTR(persist_mode, bool)
+CXL_SPARING_GET_ATTR(dpa, u64)
+CXL_SPARING_GET_ATTR(nibble_mask, u32)
+CXL_SPARING_GET_ATTR(bank_group, u32)
+CXL_SPARING_GET_ATTR(bank, u32)
+CXL_SPARING_GET_ATTR(rank, u32)
+CXL_SPARING_GET_ATTR(row, u32)
+CXL_SPARING_GET_ATTR(column, u32)
+CXL_SPARING_GET_ATTR(channel, u32)
+CXL_SPARING_GET_ATTR(sub_channel, u32)
+
+#define CXL_SPARING_SET_ATTR(attrib, data_type) \
+ static int cxl_mem_sparing_set_##attrib(struct device *dev, \
+ void *drv_data, data_type val) \
+ { \
+ struct cxl_mem_sparing_context *ctx = drv_data; \
+ \
+ ctx->attrib = val; \
+ \
+ return 0; \
+ }
+CXL_SPARING_SET_ATTR(nibble_mask, u32)
+CXL_SPARING_SET_ATTR(bank_group, u32)
+CXL_SPARING_SET_ATTR(bank, u32)
+CXL_SPARING_SET_ATTR(rank, u32)
+CXL_SPARING_SET_ATTR(row, u32)
+CXL_SPARING_SET_ATTR(column, u32)
+CXL_SPARING_SET_ATTR(channel, u32)
+CXL_SPARING_SET_ATTR(sub_channel, u32)
+
+static int cxl_mem_sparing_set_persist_mode(struct device *dev, void *drv_data,
+ bool persist_mode)
+{
+ struct cxl_mem_sparing_context *ctx = drv_data;
+ struct cxl_memdev_sparing_params params;
+ int ret;
+
+ ret = cxl_mem_sparing_get_attrs(ctx, ¶ms);
+ if (ret)
+ return ret;
+
+ if ((persist_mode && params.cap_hard_sparing) ||
+ (!persist_mode && params.cap_soft_sparing))
+ ctx->persist_mode = persist_mode;
+ else
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+static int cxl_get_mem_sparing_safe_when_in_use(struct device *dev,
+ void *drv_data, bool *safe)
+{
+ struct cxl_mem_sparing_context *ctx = drv_data;
+ struct cxl_memdev_sparing_params params;
+ int ret;
+
+ ret = cxl_mem_sparing_get_attrs(ctx, ¶ms);
+ if (ret)
+ return ret;
+
+ *safe = params.cap_safe_when_in_use;
+
+ return 0;
+}
+
+static int cxl_mem_sparing_get_min_dpa(struct device *dev, void *drv_data,
+ u64 *min_dpa)
+{
+ struct cxl_mem_sparing_context *ctx = drv_data;
+ struct cxl_memdev *cxlmd = ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ *min_dpa = cxlds->dpa_res.start;
+
+ return 0;
+}
+
+static int cxl_mem_sparing_get_max_dpa(struct device *dev, void *drv_data,
+ u64 *max_dpa)
+{
+ struct cxl_mem_sparing_context *ctx = drv_data;
+ struct cxl_memdev *cxlmd = ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ *max_dpa = cxlds->dpa_res.end;
+
+ return 0;
+}
+
+static int cxl_mem_sparing_set_dpa(struct device *dev, void *drv_data, u64 dpa)
+{
+ struct cxl_mem_sparing_context *ctx = drv_data;
+ struct cxl_memdev *cxlmd = ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end)
+ return -EINVAL;
+
+ ctx->dpa = dpa;
+
+ return 0;
+}
+
+static int cxl_do_mem_sparing(struct device *dev, void *drv_data, u32 val)
+{
+ struct cxl_mem_sparing_context *ctx = drv_data;
+
+ if (val != EDAC_DO_MEM_REPAIR)
+ return -EINVAL;
+
+ return cxl_mem_sparing_set_attrs(dev, ctx);
+}
+
+#define RANK_OPS \
+ .get_repair_type = cxl_mem_sparing_get_repair_type, \
+ .get_persist_mode = cxl_mem_sparing_get_persist_mode, \
+ .set_persist_mode = cxl_mem_sparing_set_persist_mode, \
+ .get_repair_safe_when_in_use = cxl_get_mem_sparing_safe_when_in_use, \
+ .get_min_dpa = cxl_mem_sparing_get_min_dpa, \
+ .get_max_dpa = cxl_mem_sparing_get_max_dpa, \
+ .get_dpa = cxl_mem_sparing_get_dpa, \
+ .set_dpa = cxl_mem_sparing_set_dpa, \
+ .get_nibble_mask = cxl_mem_sparing_get_nibble_mask, \
+ .set_nibble_mask = cxl_mem_sparing_set_nibble_mask, \
+ .get_rank = cxl_mem_sparing_get_rank, \
+ .set_rank = cxl_mem_sparing_set_rank, \
+ .get_channel = cxl_mem_sparing_get_channel, \
+ .set_channel = cxl_mem_sparing_set_channel, \
+ .do_repair = cxl_do_mem_sparing
+
+#define BANK_OPS \
+ RANK_OPS, .get_bank_group = cxl_mem_sparing_get_bank_group, \
+ .set_bank_group = cxl_mem_sparing_set_bank_group, \
+ .get_bank = cxl_mem_sparing_get_bank, \
+ .set_bank = cxl_mem_sparing_set_bank
+
+#define ROW_OPS \
+ BANK_OPS, .get_row = cxl_mem_sparing_get_row, \
+ .set_row = cxl_mem_sparing_set_row
+
+#define CACHELINE_OPS \
+ ROW_OPS, .get_column = cxl_mem_sparing_get_column, \
+ .set_column = cxl_mem_sparing_set_column, \
+ .get_sub_channel = cxl_mem_sparing_get_sub_channel, \
+ .set_sub_channel = cxl_mem_sparing_set_sub_channel
+
+static const struct edac_mem_repair_ops cxl_rank_sparing_ops = {
+ RANK_OPS,
+};
+
+static const struct edac_mem_repair_ops cxl_bank_sparing_ops = {
+ BANK_OPS,
+};
+
+static const struct edac_mem_repair_ops cxl_row_sparing_ops = {
+ ROW_OPS,
+};
+
+static const struct edac_mem_repair_ops cxl_cacheline_sparing_ops = {
+ CACHELINE_OPS,
+};
+
+struct cxl_mem_sparing_desc {
+ const uuid_t repair_uuid;
+ enum edac_mem_repair_type repair_type;
+ enum cxl_mem_sparing_granularity granularity;
+ const struct edac_mem_repair_ops *repair_ops;
+};
+
+static const struct cxl_mem_sparing_desc mem_sparing_desc[] = {
+ {
+ .repair_uuid = CXL_FEAT_CACHELINE_SPARING_UUID,
+ .repair_type = EDAC_CACHELINE_SPARING,
+ .granularity = CXL_MEM_SPARING_CACHELINE,
+ .repair_ops = &cxl_cacheline_sparing_ops,
+ },
+ {
+ .repair_uuid = CXL_FEAT_ROW_SPARING_UUID,
+ .repair_type = EDAC_ROW_SPARING,
+ .granularity = CXL_MEM_SPARING_ROW,
+ .repair_ops = &cxl_row_sparing_ops,
+ },
+ {
+ .repair_uuid = CXL_FEAT_BANK_SPARING_UUID,
+ .repair_type = EDAC_BANK_SPARING,
+ .granularity = CXL_MEM_SPARING_BANK,
+ .repair_ops = &cxl_bank_sparing_ops,
+ },
+ {
+ .repair_uuid = CXL_FEAT_RANK_SPARING_UUID,
+ .repair_type = EDAC_RANK_SPARING,
+ .granularity = CXL_MEM_SPARING_RANK,
+ .repair_ops = &cxl_rank_sparing_ops,
+ },
+};
+
+static int cxl_memdev_sparing_init(struct cxl_memdev *cxlmd,
+ struct edac_dev_feature *ras_feature,
+ const struct cxl_mem_sparing_desc *desc,
+ u8 repair_inst)
+{
+ struct cxl_mem_sparing_context *cxl_sparing_ctx;
+ struct cxl_memdev_sparing_params rd_params;
+ struct cxl_feat_entry *feat_entry;
+ int ret;
+
+ feat_entry = cxl_get_feature_entry(cxlmd->cxlds, &desc->repair_uuid);
+ if (!feat_entry)
+ return -EOPNOTSUPP;
+
+ if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE))
+ return -EOPNOTSUPP;
+
+ cxl_sparing_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_sparing_ctx),
+ GFP_KERNEL);
+ if (!cxl_sparing_ctx)
+ return -ENOMEM;
+
+ *cxl_sparing_ctx = (struct cxl_mem_sparing_context){
+ .get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
+ .set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
+ .get_version = feat_entry->get_feat_ver,
+ .set_version = feat_entry->set_feat_ver,
+ .effects = le16_to_cpu(feat_entry->effects),
+ .cxlmd = cxlmd,
+ .repair_type = desc->repair_type,
+ .granularity = desc->granularity,
+ .instance = repair_inst,
+ };
+ uuid_copy(&cxl_sparing_ctx->repair_uuid, &desc->repair_uuid);
+
+ ret = cxl_mem_sparing_get_attrs(cxl_sparing_ctx, &rd_params);
+ if (ret)
+ return ret;
+
+ if (rd_params.cap_soft_sparing)
+ cxl_sparing_ctx->persist_mode = false;
+ else if (rd_params.cap_hard_sparing)
+ cxl_sparing_ctx->persist_mode = true;
+ else
+ return -EOPNOTSUPP;
+
+ ras_feature->ft_type = RAS_FEAT_MEM_REPAIR;
+ ras_feature->instance = cxl_sparing_ctx->instance;
+ ras_feature->mem_repair_ops = desc->repair_ops;
+ ras_feature->ctx = cxl_sparing_ctx;
+
+ return 0;
+}
+
int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
{
struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
int num_ras_features = 0;
u8 repair_inst = 0;
u8 scrub_inst = 0;
- int rc;
+ int i, rc;
rc = cxl_memdev_scrub_init(cxlmd, &ras_features[num_ras_features],
scrub_inst);
@@ -1153,6 +1686,19 @@ int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
num_ras_features++;
}
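+ /* Try each sparing granularity; skip any the device does not support. */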
+ for (i = 0; i < CXL_MEM_SPARING_MAX; i++) {
+ rc = cxl_memdev_sparing_init(cxlmd,
+ &ras_features[num_ras_features],
+ &mem_sparing_desc[i], repair_inst);
+ if (rc == -EOPNOTSUPP)
+ continue;
+ if (rc < 0)
+ return rc;
+
+ repair_inst++;
+ num_ras_features++;
+ }
+
char *cxl_dev_name __free(kfree) =
kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlmd->dev));
diff --git a/drivers/edac/mem_repair.c b/drivers/edac/mem_repair.c
index bf7e01a8b4dd..c4a2e99c9355 100755
--- a/drivers/edac/mem_repair.c
+++ b/drivers/edac/mem_repair.c
@@ -47,6 +47,10 @@ struct edac_mem_repair_context {
const char * const edac_repair_type[] = {
[EDAC_PPR] = "ppr",
+ [EDAC_CACHELINE_SPARING] = "cacheline-sparing",
+ [EDAC_ROW_SPARING] = "row-sparing",
+ [EDAC_BANK_SPARING] = "bank-sparing",
+ [EDAC_RANK_SPARING] = "rank-sparing",
};
EXPORT_SYMBOL_GPL(edac_repair_type);
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 5669d8d2509a..57c9856a6bd9 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -746,6 +746,10 @@ static inline int edac_ecs_get_desc(struct device *ecs_dev,
enum edac_mem_repair_type {
EDAC_PPR,
+ EDAC_CACHELINE_SPARING,
+ EDAC_ROW_SPARING,
+ EDAC_BANK_SPARING,
+ EDAC_RANK_SPARING,
EDAC_REPAIR_MAX
};
--
2.43.0
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 0/8] cxl: support CXL memory RAS features
2025-03-20 18:04 [PATCH v2 0/8] cxl: support CXL memory RAS features shiju.jose
` (7 preceding siblings ...)
2025-03-20 18:04 ` [PATCH v2 8/8] cxl/memfeature: Add CXL memory device memory sparing " shiju.jose
@ 2025-03-21 7:39 ` Christophe Leroy
8 siblings, 0 replies; 24+ messages in thread
From: Christophe Leroy @ 2025-03-21 7:39 UTC (permalink / raw)
To: shiju.jose, linux-cxl, dan.j.williams, dave, jonathan.cameron,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm
Le 20/03/2025 à 19:04, shiju.jose@huawei.com a écrit :
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Support for CXL memory RAS features: patrol scrub, ECS, soft-PPR and
> memory sparing.
>
> This CXL series was part of the EDAC series [1].
>
> The code is based on cxl.git next branch [2] merged with ras.git edac-cxl
> branch [3].
>
> 1. https://lore.kernel.org/linux-cxl/20250212143654.1893-1-shiju.jose@huawei.com/
> 2. https://web.git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=next
> 3. https://web.git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git/log/?h=edac-cxl
>
> Userspace code for CXL memory repair features [4] and
> sample boot-script for CXL memory repair [5].
>
> [4]: https://lore.kernel.org/lkml/20250207143028.1865-1-shiju.jose@huawei.com/
> [5]: https://lore.kernel.org/lkml/20250207143028.1865-5-shiju.jose@huawei.com/
The title of the series is quite confusing; CXL seems to be something
else. There is a series here [1] that removes a CXL driver, but after
looking, it seems to be something completely different.
[1] https://lore.kernel.org/all/20250219070007.177725-1-ajd@linux.ibm.com/
Christophe
>
> Changes
> =======
> v1 -> v2:
> 1. Feedbacks from Dan Williams on v1,
> https://lore.kernel.org/linux-mm/20250307091137.00006a0a@huawei.com/T/
> - Fixed lock issues in region scrubbing, added local cxl_acquire()
> and cxl_unlock.
> - Replaced CXL examples using cat and echo from EDAC .rst docs
> with short description and ref to ABI docs. Also corrections
> in existing descriptions as suggested by Dan.
> - Add policy description for the scrub control feature.
> However this may require inputs from CXL experts.
> - Replaced CONFIG_CXL_RAS_FEATURES with CONFIG_CXL_EDAC_MEM_FEATURES.
> - Few changes to depends part of CONFIG_CXL_EDAC_MEM_FEATURES.
> - Rename drivers/cxl/core/memfeatures.c as drivers/cxl/core/edac.c
> - snprintf() -> kasprintf() in few places.
>
> 2. Feedbacks from Alison on v1,
> - In cxl_get_feature_entry()(patch 1), return NULL on failures and
> reintroduced checks in cxl_get_feature_entry().
> - Changed logic in for loop in region based scrubbing code.
> - Replace cxl_are_decoders_committed() to cxl_is_memdev_memory_online()
> and add as a local function to drivers/cxl/core/edac.c
> - Changed few multiline comments to single line comments.
> - Removed unnecessary comments from the code.
> - Reduced line length of few macros in ECS and memory repair code.
> - In new files, changed "GPL-2.0-or-later" -> "GPL-2.0-only".
> - Ran clang-format for new files and updated.
> 3. Changes for feedbacks from Jonathan on v1.
> - Changed few multiline comments to single line comments.
>
> Shiju Jose (8):
> cxl: Add helper function to retrieve a feature entry
> EDAC: Update documentation for the CXL memory patrol scrub control
> feature
> cxl/edac: Add CXL memory device patrol scrub control feature
> cxl/edac: Add CXL memory device ECS control feature
> cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command
> cxl: Support for finding memory operation attributes from the current
> boot
> cxl/memfeature: Add CXL memory device soft PPR control feature
> cxl/memfeature: Add CXL memory device memory sparing control feature
>
> Documentation/edac/memory_repair.rst | 31 +
> Documentation/edac/scrub.rst | 47 +
> drivers/cxl/Kconfig | 27 +
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/core.h | 2 +
> drivers/cxl/core/edac.c | 1730 ++++++++++++++++++++++++++
> drivers/cxl/core/features.c | 23 +
> drivers/cxl/core/mbox.c | 45 +-
> drivers/cxl/core/memdev.c | 9 +
> drivers/cxl/core/ras.c | 145 +++
> drivers/cxl/core/region.c | 5 +
> drivers/cxl/cxlmem.h | 73 ++
> drivers/cxl/mem.c | 4 +
> drivers/cxl/pci.c | 3 +
> drivers/edac/mem_repair.c | 9 +
> include/linux/edac.h | 7 +
> 16 files changed, 2159 insertions(+), 2 deletions(-)
> create mode 100644 drivers/cxl/core/edac.c
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 2/8] EDAC: Update documentation for the CXL memory patrol scrub control feature
2025-03-20 18:04 ` [PATCH v2 2/8] EDAC: Update documentation for the CXL memory patrol scrub control feature shiju.jose
@ 2025-03-21 10:03 ` Jonathan Cameron
2025-03-24 10:37 ` Shiju Jose
2025-03-26 21:46 ` Dan Williams
0 siblings, 2 replies; 24+ messages in thread
From: Jonathan Cameron @ 2025-03-21 10:03 UTC (permalink / raw)
To: shiju.jose
Cc: linux-cxl, dan.j.williams, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, david, Vilas.Sridharan, linux-edac,
linux-acpi, linux-mm, linux-kernel, bp, tony.luck, rafael, lenb,
mchehab, leo.duran, Yazen.Ghannam, rientjes, jiaqiyan, Jon.Grimm,
dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng, roberto.sassu,
kangkang.shen, wanghuiqiang, linuxarm
On Thu, 20 Mar 2025 18:04:39 +0000
<shiju.jose@huawei.com> wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Update the Documentation/edac/scrub.rst to include descriptions and
> policies for CXL memory device-based and CXL region-based patrol scrub
> control.
>
> Note: This may require inputs from CXL memory experts regarding
> region-based scrubbing policies.
So I suggested the region interfaces in the first place. It's all about
use cases and 'why' we might increase the scrub rate.
Ultimately the hardware is controlled in a device-wide way, so we could
have made it a complex userspace problem to deal with it on a per-device
basis. The region interfaces are there as a simplification, not because
they are strictly necessary.
Anyhow, the use cases:
1) Scrubbing because a device is showing unexpectedly high errors. That
control needs to be at device granularity. If one device in an interleave
set (backing a region) is dodgy, why make them all do more work?
2) Scrubbing may apply to memory that isn't online at all yet. Nice to know
if we have a problem before we start using it! Likely this is setting
system wide defaults on boot.
3) Scrubbing at a higher rate because software has decided that we want
 more reliability for particular data. I've been calling this
 Differentiated Reliability. That data sits in a region which
 may cover parts of multiple devices. The region interfaces are about
 supporting this use case.
So now the question is what do we do if both interfaces are poked
because someone cares simultaneously about 1 and 3?
I'd suggest just laying out a set of rules on how to set the scrub rates
for any mixture of requirements, rather than making the driver work out
the optimum combination.
>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> Documentation/edac/scrub.rst | 47 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 47 insertions(+)
>
> diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
> index daab929cdba1..d1c02bd90090 100644
> --- a/Documentation/edac/scrub.rst
> +++ b/Documentation/edac/scrub.rst
> @@ -264,3 +264,51 @@ Sysfs files are documented in
> `Documentation/ABI/testing/sysfs-edac-scrub`
>
> `Documentation/ABI/testing/sysfs-edac-ecs`
> +
> +Examples
> +--------
> +
> +The usage takes the form shown in these examples:
> +
> +1. CXL memory device patrol scrubber
> +
> +1.1 Device based scrubbing
> +
> +CXL memory is exposed to memory management subsystem and ultimately userspace
> +via CXL devices.
> +
> +For cases where hardware interleave controls do not directly map to regions of
> +Physical Address space, perhaps due to interleave the approach described in
> +1.2 Region based scrubbing section, which is specific to CXL regions should be
> +followed.
These sentences end up a bit unwieldy. Perhaps simply a forward reference:
when combining control via the device interfaces and region interfaces, see
1.2 Region based scrubbing.
> In those cases settings on the presented interface may interact with
> +direct control via a device instance specific interface and care must be taken.
> +
> +Sysfs files for scrubbing are documented in
> +`Documentation/ABI/testing/sysfs-edac-scrub`
> +
> +1.2. Region based scrubbing
> +
> +CXL memory is exposed to memory management subsystem and ultimately userspace
> +via CXL regions. CXL Regions represent mapped memory capacity in system
> +physical address space. These can incorporate one or more parts of multiple CXL
> +memory devices with traffic interleaved across them. The user may want to control
> +the scrub rate via this more abstract region instead of having to figure out the
> +constituent devices and program them separately. The scrub rate for each device
> +covers the whole device. Thus if multiple regions use parts of that device then
> +requests for scrubbing of other regions may result in a higher scrub rate than
> +requested for this specific region.
> +
> +1. When user sets scrub rate for a memory region, the scrub rate for all the CXL
> + memory devices interleaved under that region is updated with the same scrub
> + rate.
Note that this may affect multiple regions.
> +
> +2. When user sets scrub rate for a memory device, only the scrub rate for that
> + memory devices is updated though device may be part of a memory region and
> + does not change scrub rate of other memory devices of that memory region.
> +
> +3. Scrub rate of a CXL memory device may be set via EDAC device or region scrub
> + interface simultaneously. Care must be taken to prevent a race condition, or
> + only region-based setting may be allowed.
So is this saying if you want to mix and match, set region first, then
device next? Can we just lay out the rules to set up a weird mixture?
We could add more smarts to the driver, but do we care, given that mixing
1 and 3 above is probably unusual?
1. Take each region in turn, from lowest desired scrub rate to highest, and
set their scrub rates. Later regions may override the scrub rate on
individual devices (and hence potentially whole regions).
2. Take each device for which enhanced scrubbing is required (a higher rate)
and set those scrub rates. This will override the scrub rates of individual
devices, leaving any that are not specifically set to scrub at the maximum
rate required for any of the regions they are involved in backing.
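To make that concrete with hypothetical rates: say region RA is backed by
devices D0 and D1 and wants a 12h scrub cycle, region RB is backed by D1
and D2 and wants 6h, and D2 alone is suspect and wants 3h. Rule 1 applied
from slowest to fastest sets D0 = D1 = 12h, then D1 = D2 = 6h; rule 2 then
sets D2 = 3h. End result: D0 = 12h, D1 = 6h, D2 = 3h, and every region is
scrubbed at least as fast as requested.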
> +
> +Sysfs files for scrubbing are documented in
> +`Documentation/ABI/testing/sysfs-edac-scrub`
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [PATCH v2 2/8] EDAC: Update documentation for the CXL memory patrol scrub control feature
2025-03-21 10:03 ` Jonathan Cameron
@ 2025-03-24 10:37 ` Shiju Jose
2025-03-26 21:46 ` Dan Williams
1 sibling, 0 replies; 24+ messages in thread
From: Shiju Jose @ 2025-03-24 10:37 UTC (permalink / raw)
To: Jonathan Cameron
Cc: linux-cxl, dan.j.williams, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, david, Vilas.Sridharan, linux-edac,
linux-acpi, linux-mm, linux-kernel, bp, tony.luck, rafael, lenb,
mchehab, leo.duran, Yazen.Ghannam, rientjes, jiaqiyan, Jon.Grimm,
dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
>-----Original Message-----
>From: Jonathan Cameron <jonathan.cameron@huawei.com>
>Sent: 21 March 2025 10:03
>To: Shiju Jose <shiju.jose@huawei.com>
>Subject: Re: [PATCH v2 2/8] EDAC: Update documentation for the CXL memory
>patrol scrub control feature
>
>On Thu, 20 Mar 2025 18:04:39 +0000
><shiju.jose@huawei.com> wrote:
>
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> Update the Documentation/edac/scrub.rst to include descriptions and
>> policies for CXL memory device-based and CXL region-based patrol scrub
>> control.
>>
>> Note: This may require inputs from CXL memory experts regarding
>> region-based scrubbing policies.
>
>So I suggested the region interfaces in the first place. It's all about use
>cases and 'why' we might increase the scrub rate.
>Ultimately the hardware is controlled in a device-wide way, so we could have
>made it a complex userspace problem to deal with it per device.
>The region interfaces are there as a simplification, not because they are
>strictly necessary.
>
>Anyhow, the use cases:
>
>1) Scrubbing because a device is showing unexpectedly high errors. That
> control needs to be at device granularity. If one device in an interleave
> set (backing a region) is dodgy, why make them all do more work?
>
>2) Scrubbing may apply to memory that isn't online at all yet. Nice to know
> if we have a problem before we start using it! Likely this is setting
> system wide defaults on boot.
>
>3) Scrubbing at higher rate because software has decided that we want
> more reliability for particular data. I've been calling this
> Differentiated Reliability. That data sits in a region which
> may cover part of multiple devices. The region interfaces are about
> supporting this use case.
>
>So now the question is what do we do if both interfaces are poked because
>someone cares simultaneously about 1 and 3?
>
>I'd suggest just laying out a set of rules on how to set the scrub rates for
>any mixture of requirements, rather than making the driver work out the optimum
>combination.
>
>>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>> Documentation/edac/scrub.rst | 47 ++++++++++++++++++++++++++++++++++++
>> 1 file changed, 47 insertions(+)
>>
>> diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
>> index daab929cdba1..d1c02bd90090 100644
>> --- a/Documentation/edac/scrub.rst
>> +++ b/Documentation/edac/scrub.rst
>> @@ -264,3 +264,51 @@ Sysfs files are documented in
>> `Documentation/ABI/testing/sysfs-edac-scrub`
>>
>> `Documentation/ABI/testing/sysfs-edac-ecs`
>> +
>> +Examples
>> +--------
>> +
>> +The usage takes the form shown in these examples:
>> +
>> +1. CXL memory device patrol scrubber
>> +
>> +1.1 Device based scrubbing
>> +
>> +CXL memory is exposed to the memory management subsystem and ultimately
>> +userspace via CXL devices.
>> +
>> +For cases where hardware interleave controls do not directly map to
>> +regions of Physical Address space, perhaps due to interleave, the
>> +approach described in the 1.2 Region based scrubbing section, which is
>> +specific to CXL regions, should be followed.
>
>These sentences end up a bit unwieldy. Perhaps simply a forwards reference.
>
>When combining control via the device interfaces and region interfaces, see
>1.2 Region based scrubbing.
>
>
>
>> +In those cases, settings on the presented interface may interact with
>> +direct control via a device instance specific interface and care must be taken.
>> +
>> +Sysfs files for scrubbing are documented in
>> +`Documentation/ABI/testing/sysfs-edac-scrub`
>> +
>> +1.2. Region based scrubbing
>> +
>> +CXL memory is exposed to memory management subsystem and ultimately
>> +userspace via CXL regions. CXL Regions represent mapped memory
>> +capacity in system physical address space. These can incorporate one
>> +or more parts of multiple CXL memory devices with traffic interleaved
>> +across them. The user may want to control the scrub rate via this
>> +more abstract region instead of having to figure out the constituent
>> +devices and program them separately. The scrub rate for each device
>> +covers the whole device. Thus if multiple regions use parts of that
>> +device then requests for scrubbing of other regions may result in a higher
>scrub rate than requested for this specific region.
>> +
>> +1. When a user sets the scrub rate for a memory region, the scrub rate for
>> +   all the CXL memory devices interleaved under that region is updated with
>> +   the same scrub rate.
>
>Note that this may affect multiple regions.
>
>> +
>> +2. When a user sets the scrub rate for a memory device, only the scrub rate
>> +   for that memory device is updated, even though the device may be part of
>> +   a memory region; the scrub rates of the other memory devices in that
>> +   region are not changed.
>> +
>> +3. The scrub rate of a CXL memory device may be set via the EDAC device and
>> +   region scrub interfaces simultaneously. Care must be taken to prevent a
>> +   race condition, or only region-based setting may be allowed.
>
>So is this saying that if you want to mix and match, set the region first, then
>the device? Can we just lay out the rules for setting up a weird mixture? We
>could add more smarts to the driver, but do we care, as mixing 1 and 3 above is
>probably unusual?
>
>1. Take each region in turn, from lowest desired scrub rate to highest, and set
>   their scrub rates. Later regions may override the scrub rate on individual
>   devices (and hence potentially whole regions).
>
>2. Take each device for which enhanced scrubbing is required (a higher rate)
>   and set those scrub rates. This will override the scrub rates of individual
>   devices, leaving any that are not specifically set scrubbing at the maximum
>   rate required for any of the regions they back.
Thanks. I will incorporate this info and these rules in the next version.
>
>
>> +
>> +Sysfs files for scrubbing are documented in
>> +`Documentation/ABI/testing/sysfs-edac-scrub`
Shiju
* Re: [PATCH v2 1/8] cxl: Add helper function to retrieve a feature entry
2025-03-20 18:04 ` [PATCH v2 1/8] cxl: Add helper function to retrieve a feature entry shiju.jose
@ 2025-03-26 21:32 ` Dan Williams
2025-03-27 16:59 ` Shiju Jose
0 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2025-03-26 21:32 UTC (permalink / raw)
To: shiju.jose, linux-cxl, dan.j.williams, dave, jonathan.cameron,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm, shiju.jose
shiju.jose@ wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Add helper function to retrieve a feature entry from the supported
> features list, if supported.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Fan Ni <fan.ni@samsung.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> drivers/cxl/core/core.h | 2 ++
> drivers/cxl/core/features.c | 23 +++++++++++++++++++++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 1803aedb25ca..16bc717376fc 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -123,6 +123,8 @@ int cxl_ras_init(void);
> void cxl_ras_exit(void);
>
> #ifdef CONFIG_CXL_FEATURES
> +struct cxl_feat_entry *cxl_get_feature_entry(struct cxl_dev_state *cxlds,
> + const uuid_t *feat_uuid);
It is unfortunate that this naming choice is too similar to
cxl_get_feature(). However, as I go to suggest a new name I find that
this is a duplicate of get_support_feature_info() in Dave's fwctl
series. Just drop this patch in favor of that.
* Re: [PATCH v2 2/8] EDAC: Update documentation for the CXL memory patrol scrub control feature
2025-03-21 10:03 ` Jonathan Cameron
2025-03-24 10:37 ` Shiju Jose
@ 2025-03-26 21:46 ` Dan Williams
1 sibling, 0 replies; 24+ messages in thread
From: Dan Williams @ 2025-03-26 21:46 UTC (permalink / raw)
To: Jonathan Cameron, shiju.jose
Cc: linux-cxl, dan.j.williams, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, david, Vilas.Sridharan, linux-edac,
linux-acpi, linux-mm, linux-kernel, bp, tony.luck, rafael, lenb,
mchehab, leo.duran, Yazen.Ghannam, rientjes, jiaqiyan, Jon.Grimm,
dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng, roberto.sassu,
kangkang.shen, wanghuiqiang, linuxarm
Jonathan Cameron wrote:
> On Thu, 20 Mar 2025 18:04:39 +0000
> <shiju.jose@huawei.com> wrote:
>
> > From: Shiju Jose <shiju.jose@huawei.com>
> >
> > Update the Documentation/edac/scrub.rst to include descriptions and
> > policies for CXL memory device-based and CXL region-based patrol scrub
> > control.
> >
> > Note: This may require inputs from CXL memory experts regarding
> > region-based scrubbing policies.
>
> So I suggested the region interfaces in the first place. It's all about
> use cases and 'why' we might increase the scrub rate.
> Ultimately the hardware is controlled in a device-wide way, so we could
> have made it a complex userspace problem to deal with it per device.
> The region interfaces are there as a simplification, not because they
> are strictly necessary.
>
> Anyhow, the use cases:
>
> 1) Scrubbing because a device is showing unexpectedly high errors. That
> control needs to be at device granularity. If one device in an interleave
> set (backing a region) is dodgy, why make them all do more work?
>
> 2) Scrubbing may apply to memory that isn't online at all yet. Nice to know
> if we have a problem before we start using it! Likely this is setting
> system wide defaults on boot.
>
> 3) Scrubbing at higher rate because software has decided that we want
> more reliability for particular data. I've been calling this
> Differentiated Reliability. That data sits in a region which
> may cover part of multiple devices. The region interfaces are about
> supporting this use case.
>
> So now the question is what do we do if both interfaces are poked
> because someone cares simultaneously about 1 and 3?
>
> I'd suggest just laying out a set of rules on how to set the scrub rates
> for any mixture of requirements, rather than making the driver work out
> the optimum combination.
>
> >
> > Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> > ---
> > Documentation/edac/scrub.rst | 47 ++++++++++++++++++++++++++++++++++++
> > 1 file changed, 47 insertions(+)
> >
> > diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
> > index daab929cdba1..d1c02bd90090 100644
> > --- a/Documentation/edac/scrub.rst
> > +++ b/Documentation/edac/scrub.rst
> > @@ -264,3 +264,51 @@ Sysfs files are documented in
> > `Documentation/ABI/testing/sysfs-edac-scrub`
> >
> > `Documentation/ABI/testing/sysfs-edac-ecs`
> > +
> > +Examples
> > +--------
> > +
> > +The usage takes the form shown in these examples:
> > +
> > +1. CXL memory device patrol scrubber
> > +
> > +1.1 Device based scrubbing
> > +
> > +CXL memory is exposed to the memory management subsystem and ultimately
> > +userspace via CXL devices.
> > +
> > +For cases where hardware interleave controls do not directly map to regions of
> > +Physical Address space, perhaps due to interleave, the approach described in
> > +the 1.2 Region based scrubbing section, which is specific to CXL regions,
> > +should be followed.
>
> These sentences end up a bit unwieldy. Perhaps simply a forwards reference.
>
> When combining control via the device interfaces and region interfaces, see
> 1.2 Region based scrubbing.
>
>
>
> > +In those cases, settings on the presented interface may interact with
> > +direct control via a device instance specific interface and care must be taken.
> > +
> > +Sysfs files for scrubbing are documented in
> > +`Documentation/ABI/testing/sysfs-edac-scrub`
> > +
> > +1.2. Region based scrubbing
> > +
> > +CXL memory is exposed to the memory management subsystem and ultimately
> > +userspace via CXL regions. CXL regions represent mapped memory capacity in
> > +system physical address space. These can incorporate one or more parts of
> > +multiple CXL memory devices with traffic interleaved across them. The user
> > +may want to control the scrub rate via this more abstract region instead of
> > +having to figure out the constituent devices and program them separately.
> > +The scrub rate for each device covers the whole device. Thus, if multiple
> > +regions use parts of that device, requests for scrubbing of other regions
> > +may result in a higher scrub rate than requested for this specific region.
> > +
> > +1. When a user sets the scrub rate for a memory region, the scrub rate for
> > +   all the CXL memory devices interleaved under that region is updated with
> > +   the same scrub rate.
>
> Note that this may affect multiple regions.
>
> > +
> > +2. When a user sets the scrub rate for a memory device, only the scrub rate
> > +   for that memory device is updated, even though the device may be part of
> > +   a memory region; the scrub rates of the other memory devices in that
> > +   region are not changed.
> > +
> > +3. The scrub rate of a CXL memory device may be set via the EDAC device and
> > +   region scrub interfaces simultaneously. Care must be taken to prevent a
> > +   race condition, or only region-based setting may be allowed.
>
> So is this saying that if you want to mix and match, set the region first,
> then the device? Can we just lay out the rules for setting up a weird
> mixture? We could add more smarts to the driver, but do we care, as mixing
> 1 and 3 above is probably unusual?
>
> 1. Take each region in turn, from lowest desired scrub rate to highest, and
>    set their scrub rates. Later regions may override the scrub rate on
>    individual devices (and hence potentially whole regions).
>
> 2. Take each device for which enhanced scrubbing is required (a higher rate)
>    and set those scrub rates. This will override the scrub rates of
>    individual devices, leaving any that are not specifically set scrubbing at
>    the maximum rate required for any of the regions they back.
I would just say that the last used interface wins and that the kernel
should probably emit an info message if someone mixes the interfaces.
Something like:
"device scrub rate set by region%d rate overwritten with device local
scrub rate"
A similar message also applies when a regionX setting and a regionY setting
collide at the same device:
"device scrub rate set by region%d rate overwritten by region%d scrub
rate"
With that we at least have the chance to tell people who are confused
about their effective scrub rate to check for these conflicts in the
log, and otherwise push the conflict resolution problem to userspace
policy.
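A sketch of what that logging could look like (illustrative only; the
scrub_region_id tracking field is an assumption, not part of the posted
patch):

static void cxl_scrub_log_override(struct cxl_memdev *cxlmd,
				   struct cxl_region *new_cxlr)
{
	/* scrub_region_id would record which region, if any, last set the rate */
	int old_id = cxlmd->scrub_region_id;

	if (old_id < 0)
		return;

	if (!new_cxlr)
		dev_info(&cxlmd->dev,
			 "device scrub rate set by region%d rate overwritten with device local scrub rate\n",
			 old_id);
	else
		dev_info(&cxlmd->dev,
			 "device scrub rate set by region%d rate overwritten by region%d scrub rate\n",
			 old_id, new_cxlr->id);
}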
* Re: [PATCH v2 3/8] cxl/edac: Add CXL memory device patrol scrub control feature
2025-03-20 18:04 ` [PATCH v2 3/8] cxl/edac: Add CXL memory device " shiju.jose
@ 2025-03-27 1:47 ` Dan Williams
2025-03-28 10:18 ` Shiju Jose
0 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2025-03-27 1:47 UTC (permalink / raw)
To: shiju.jose, linux-cxl, dan.j.williams, dave, jonathan.cameron,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm, shiju.jose
shiju.jose@ wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub
> control feature. The device patrol scrub proactively locates and corrects
> errors on a regular cycle.
>
> Allow specifying the number of hours within which the patrol scrub must be
> completed, subject to minimum and maximum limits reported by the device.
> Also allow disabling scrub, trading off error rates against performance.
>
> Add support for patrol scrub control on CXL memory devices.
> Register with the EDAC device driver, which retrieves the scrub attribute
> descriptors from EDAC scrub and exposes the sysfs scrub control attributes
> to userspace. For example, scrub control for the CXL memory device
> "cxl_mem0" is exposed in /sys/bus/edac/devices/cxl_mem0/scrubX/.
>
> Additionally, add support for region-based CXL memory patrol scrub control.
> CXL memory regions may be interleaved across one or more CXL memory
> devices. For example, region-based scrub control for "cxl_region1" is
> exposed in /sys/bus/edac/devices/cxl_region1/scrubX/.
>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> drivers/cxl/Kconfig | 25 ++
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/edac.c | 474 ++++++++++++++++++++++++++++++++++++++
> drivers/cxl/core/region.c | 5 +
> drivers/cxl/cxlmem.h | 10 +
> drivers/cxl/mem.c | 4 +
> 6 files changed, 519 insertions(+)
> create mode 100644 drivers/cxl/core/edac.c
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 205547e5543a..b5ede1308425 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -113,6 +113,31 @@ config CXL_FEATURES
>
> If unsure say 'n'
>
> +config CXL_EDAC_MEM_FEATURES
> + bool "CXL: EDAC Memory Features"
> + depends on EXPERT
> + depends on CXL_MEM
> + depends on CXL_FEATURES
> + depends on EDAC >= CXL_BUS
> + depends on EDAC_SCRUB
> + help
> + The CXL EDAC memory feature control is optional and allows the host
> + to control the EDAC memory feature configuration of CXL memory
> + expander devices.
> +
> + When enabled 'cxl_mem' and 'cxl_region' EDAC devices are published
> + with memory scrub control attributes as described by
> + Documentation/ABI/testing/sysfs-edac-scrub.
> +
> + When enabled 'cxl_mem' EDAC devices are published with memory ECS
> + and repair control attributes as described by
> + Documentation/ABI/testing/sysfs-edac-ecs and
> + Documentation/ABI/testing/sysfs-edac-memory-repair respectively.
> +
> + Say 'y/m' if you have an expert need to change the default settings
> + of a memory RAS feature established by the platform/device (e.g.
> + scrub rates for the patrol scrub feature). Otherwise say 'n'.
> +
> config CXL_PORT
> default CXL_BUS
> tristate
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 139b349b3a52..9b86fb22e5de 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -19,4 +19,5 @@ cxl_core-y += ras.o
> cxl_core-$(CONFIG_TRACING) += trace.o
> cxl_core-$(CONFIG_CXL_REGION) += region.o
> cxl_core-$(CONFIG_CXL_FEATURES) += features.o
> +cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
> cxl_core-$(CONFIG_CXL_MCE) += mce.o
> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
> new file mode 100644
> index 000000000000..5ec3535785e1
> --- /dev/null
> +++ b/drivers/cxl/core/edac.c
> @@ -0,0 +1,474 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * CXL EDAC memory feature driver.
> + *
> + * Copyright (c) 2024-2025 HiSilicon Limited.
> + *
> + * - Supports functions to configure EDAC features of the
> + * CXL memory devices.
> + * - Registers with the EDAC device subsystem driver to expose
> + * the features sysfs attributes to the user for configuring
> + * CXL memory RAS feature.
> + */
> +
> +#include <linux/cleanup.h>
> +#include <linux/edac.h>
> +#include <linux/limits.h>
> +#include <cxl/features.h>
> +#include <cxl.h>
> +#include <cxlmem.h>
> +#include "core.h"
> +
> +#define CXL_NR_EDAC_DEV_FEATURES 1
> +
> +static struct rw_semaphore *cxl_acquire(struct rw_semaphore *rwsem)
> +{
> + if (down_read_interruptible(rwsem))
> + return NULL;
> +
> + return rwsem;
> +}
> +
> +DEFINE_FREE(cxl_unlock, struct rw_semaphore *, if (_T) up_read(_T))
I know I suggested cxl_acquire() and cxl_unlock(), but this really is a
generic facility.
Let's call it rwsem_read_intr_acquire() and rwsem_read_release(), and
I'll follow up later with Peter to see if he wants this to graduate from
CXL.
Also, go ahead and define it in cxl.h for now as I think other places in
the subsystem could benefit from this approach.
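For reference, a minimal sketch of the renamed helpers (a sketch only;
placement in cxl.h and the example call site are assumptions):

/* Generic interruptible read-lock guard, per the rename suggested above */
static inline struct rw_semaphore *
rwsem_read_intr_acquire(struct rw_semaphore *rwsem)
{
	if (down_read_interruptible(rwsem))
		return NULL;

	return rwsem;
}

DEFINE_FREE(rwsem_read_release, struct rw_semaphore *, if (_T) up_read(_T))

/* Example call site; the rwsem name is illustrative */
struct rw_semaphore *lock __free(rwsem_read_release) =
	rwsem_read_intr_acquire(&cxl_region_rwsem);
if (!lock)
	return -EINTR;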
> +
> +/*
> + * CXL memory patrol scrub control
> + */
> +struct cxl_patrol_scrub_context {
I like "patrol_scrub" spelled out here compared to "ps" used everywhere
else.
> + u8 instance;
> + u16 get_feat_size;
> + u16 set_feat_size;
> + u8 get_version;
> + u8 set_version;
> + u16 effects;
> + struct cxl_memdev *cxlmd;
> + struct cxl_region *cxlr;
> +};
> +
> +/**
> + * struct cxl_memdev_ps_params - CXL memory patrol scrub parameter data structure.
> + * @enable: [IN & OUT] enable(1)/disable(0) patrol scrub.
> + * @scrub_cycle_changeable: [OUT] scrub cycle attribute of patrol scrub is changeable.
> + * @scrub_cycle_hrs: [IN] Requested patrol scrub cycle in hours.
> + * [OUT] Current patrol scrub cycle in hours.
> + * @min_scrub_cycle_hrs:[OUT] minimum patrol scrub cycle in hours supported.
> + */
> +struct cxl_memdev_ps_params {
> + bool enable;
> + bool scrub_cycle_changeable;
This is set but unused. Even if it were to be used I would expect it to
be set in the cxl_patrol_scrub_context.
> + u8 scrub_cycle_hrs;
> + u8 min_scrub_cycle_hrs;
> +};
I do not understand the point of this extra object and would prefer to
keep intermediate data structures to a minimum.
It looks like all this does is provide short-lived, parsed caching of
the raw hardware patrol scrub attributes. Just pass those raw objects
around and provide helpers to do the conversion.
The fewer data structures, the less confusion for the next person who has
to read this code a few years down the road.
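For instance, helpers along these lines could parse the raw readable
attributes directly (a sketch; the helper names are illustrative, the
field and mask names are from the patch):

static u8 cxl_scrub_cycle_hours(const struct cxl_memdev_ps_rd_attrs *rd)
{
	return FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
			 le16_to_cpu(rd->scrub_cycle_hrs));
}

static u8 cxl_scrub_min_cycle_hours(const struct cxl_memdev_ps_rd_attrs *rd)
{
	return FIELD_GET(CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK,
			 le16_to_cpu(rd->scrub_cycle_hrs));
}

static bool cxl_scrub_cycle_changeable(const struct cxl_memdev_ps_rd_attrs *rd)
{
	return FIELD_GET(CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK,
			 rd->scrub_cycle_cap);
}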
> +
> +enum cxl_scrub_param {
> + CXL_PS_PARAM_ENABLE,
> + CXL_PS_PARAM_SCRUB_CYCLE,
> +};
This seems unfortunate; why not make a non-zero scrub rate an implied
enable and zero an implied disable? A non-zero sentinel value like U32_MAX
can indicate "keep scrub rate unchanged".
> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK BIT(0)
This CXL_MEMDEV_PS prefix is awkward due to overload with 'struct
cxl_memdev'. Minimize it to something like:
CXL_SCRUB_CONTROL_CHANGEABLE
CXL_SCRUB_CONTROL_REALTIME
CXL_SCRUB_CONTROL_CYCLE_MASK
CXL_SCRUB_CONTROL_MIN_CYCLE_MASK
> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_REALTIME_REPORT_CAP_MASK BIT(1)
> +#define CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK GENMASK(7, 0)
> +#define CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK GENMASK(15, 8)
> +#define CXL_MEMDEV_PS_FLAG_ENABLED_MASK BIT(0)
CXL_SCRUB_CONTROL_ENABLE
...no need to call it a mask when it is just a single bit, and when it
is both the status and the control, just call it "enable".
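Collected together, the minimized names map onto the same bit layout as
the patch:

#define CXL_SCRUB_CONTROL_CHANGEABLE	BIT(0)
#define CXL_SCRUB_CONTROL_REALTIME	BIT(1)
#define CXL_SCRUB_CONTROL_CYCLE_MASK	GENMASK(7, 0)
#define CXL_SCRUB_CONTROL_MIN_CYCLE_MASK	GENMASK(15, 8)
#define CXL_SCRUB_CONTROL_ENABLE	BIT(0)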
> +
> +/*
> + * See CXL spec rev 3.2 @8.2.10.9.11.1 Table 8-222 Device Patrol Scrub Control
> + * Feature Readable Attributes.
> + */
> +struct cxl_memdev_ps_rd_attrs {
> + u8 scrub_cycle_cap;
> + __le16 scrub_cycle_hrs;
"hours" is just 2 more characters than "hrs", I think we can afford the
extra bytes.
[..]
> +int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
> +{
> + struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
> + int num_ras_features = 0;
> + u8 scrub_inst = 0;
> + int rc;
> +
> + rc = cxl_memdev_scrub_init(cxlmd, &ras_features[num_ras_features],
> + scrub_inst);
> + if (rc < 0 && rc != -EOPNOTSUPP)
> + return rc;
> +
> + if (rc != -EOPNOTSUPP)
> + num_ras_features++;
> +
> + char *cxl_dev_name __free(kfree) =
> + kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlmd->dev));
if (!cxl_dev_name)
return -ENOMEM;
> +
> + return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
> + num_ras_features, ras_features);
> +}
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_edac_register, "CXL");
> +
> +int devm_cxl_region_edac_register(struct cxl_region *cxlr)
> +{
> + struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
> + int num_ras_features = 0;
> + u8 scrub_inst = 0;
> + int rc;
> +
> + rc = cxl_region_scrub_init(cxlr, &ras_features[num_ras_features],
> + scrub_inst);
> + if (rc < 0)
> + return rc;
> +
> + num_ras_features++;
> +
> + char *cxl_dev_name __free(kfree) =
> + kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlr->dev));
> +
> + return edac_dev_register(&cxlr->dev, cxl_dev_name, NULL,
> + num_ras_features, ras_features);
> +}
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_region_edac_register, "CXL");
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index b3260d433ec7..2aa6eb675fdf 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3542,6 +3542,11 @@ static int cxl_region_probe(struct device *dev)
> case CXL_PARTMODE_PMEM:
> return devm_cxl_add_pmem_region(cxlr);
> case CXL_PARTMODE_RAM:
> + rc = devm_cxl_region_edac_register(cxlr);
Why do only volatile regions get EDAC support? PMEM patrol scrub seems
equally valid.
* RE: [PATCH v2 1/8] cxl: Add helper function to retrieve a feature entry
2025-03-26 21:32 ` Dan Williams
@ 2025-03-27 16:59 ` Shiju Jose
2025-03-31 23:35 ` Dan Williams
0 siblings, 1 reply; 24+ messages in thread
From: Shiju Jose @ 2025-03-27 16:59 UTC (permalink / raw)
To: Dan Williams, linux-cxl, dave, Jonathan Cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
>shiju.jose@ wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> Add helper function to retrieve a feature entry from the supported
>> features list, if supported.
>>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Fan Ni <fan.ni@samsung.com>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>> drivers/cxl/core/core.h | 2 ++
>> drivers/cxl/core/features.c | 23 +++++++++++++++++++++++
>> 2 files changed, 25 insertions(+)
>>
>> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index
>> 1803aedb25ca..16bc717376fc 100644
>> --- a/drivers/cxl/core/core.h
>> +++ b/drivers/cxl/core/core.h
>> @@ -123,6 +123,8 @@ int cxl_ras_init(void); void cxl_ras_exit(void);
>>
>> #ifdef CONFIG_CXL_FEATURES
>> +struct cxl_feat_entry *cxl_get_feature_entry(struct cxl_dev_state *cxlds,
>> + const uuid_t *feat_uuid);
>
>It is unfortunate that this naming choice is too similar to cxl_get_feature().
>However, as I go to suggest a new name I find that this is a duplicate of
>get_support_feature_info() in Dave's fwctl series. Just drop this patch in favor of
>that.
Hi Dan,
I am fine with using get_support_feature_info() for the EDAC features.
However, this function is defined as static in the fwctl series and
takes struct fwctl_rpc_cxl * as input for the RPC instead of uuid_t *
as in cxl_get_feature_entry().
static struct cxl_feat_entry *
get_support_feature_info(struct cxl_features_state *cxlfs,
			 const struct fwctl_rpc_cxl *rpc_in)
Can you suggest how to use get_support_feature_info() from within the CXL
driver to retrieve a supported feature entry (e.g. an EDAC feature)?
>
Thanks,
Shiju
* Re: [PATCH v2 7/8] cxl/memfeature: Add CXL memory device soft PPR control feature
2025-03-20 18:04 ` [PATCH v2 7/8] cxl/memfeature: Add CXL memory device soft PPR control feature shiju.jose
@ 2025-03-27 17:01 ` Borislav Petkov
2025-03-27 17:08 ` Borislav Petkov
0 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2025-03-27 17:01 UTC (permalink / raw)
To: shiju.jose
Cc: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-edac, linux-acpi, linux-mm, linux-kernel,
tony.luck, rafael, lenb, mchehab, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, gthelen, wschwartz, dferguson, wbs, nifan.cxl,
tanxiaofei, prime.zeng, roberto.sassu, kangkang.shen,
wanghuiqiang, linuxarm
On Thu, Mar 20, 2025 at 06:04:44PM +0000, shiju.jose@huawei.com wrote:
> diff --git a/drivers/edac/mem_repair.c b/drivers/edac/mem_repair.c
> index 3b1a845457b0..bf7e01a8b4dd 100755
> --- a/drivers/edac/mem_repair.c
> +++ b/drivers/edac/mem_repair.c
> @@ -45,6 +45,11 @@ struct edac_mem_repair_context {
> struct attribute_group group;
> };
>
> +const char * const edac_repair_type[] = {
> + [EDAC_PPR] = "ppr",
> +};
> +EXPORT_SYMBOL_GPL(edac_repair_type);
Why is this thing exported instead of adding a getter function and having all
its users pass in proper defines as arguments?
And "EDAC_PPR" is not a proper define - it doesn't tell me what it is.
It should more likely be a
EDAC_REPAIR_PPR,
EDAC_REPAIR_ROW_SPARING,
EDAC_REPAIR_BANK_SPARING,
and so on.
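Concretely, a getter along these lines (a sketch; only "ppr" appears in
the posted patch, the other string values and the function name are
illustrative):

enum edac_mem_repair_type {
	EDAC_REPAIR_PPR,
	EDAC_REPAIR_ROW_SPARING,
	EDAC_REPAIR_BANK_SPARING,
	EDAC_REPAIR_RANK_SPARING,
	EDAC_REPAIR_MAX,
};

const char *edac_repair_type_name(enum edac_mem_repair_type type)
{
	static const char * const names[EDAC_REPAIR_MAX] = {
		[EDAC_REPAIR_PPR]		= "ppr",
		[EDAC_REPAIR_ROW_SPARING]	= "row-sparing",
		[EDAC_REPAIR_BANK_SPARING]	= "bank-sparing",
		[EDAC_REPAIR_RANK_SPARING]	= "rank-sparing",
	};

	if ((unsigned int)type >= EDAC_REPAIR_MAX)
		return NULL;

	return names[type];
}
EXPORT_SYMBOL_GPL(edac_repair_type_name);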
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v2 7/8] cxl/memfeature: Add CXL memory device soft PPR control feature
2025-03-27 17:01 ` Borislav Petkov
@ 2025-03-27 17:08 ` Borislav Petkov
2025-03-28 13:05 ` Shiju Jose
0 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2025-03-27 17:08 UTC (permalink / raw)
To: shiju.jose
Cc: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-edac, linux-acpi, linux-mm, linux-kernel,
tony.luck, rafael, lenb, mchehab, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, gthelen, wschwartz, dferguson, wbs, nifan.cxl,
tanxiaofei, prime.zeng, roberto.sassu, kangkang.shen,
wanghuiqiang, linuxarm
On Thu, Mar 27, 2025 at 06:01:56PM +0100, Borislav Petkov wrote:
> On Thu, Mar 20, 2025 at 06:04:44PM +0000, shiju.jose@huawei.com wrote:
> > diff --git a/drivers/edac/mem_repair.c b/drivers/edac/mem_repair.c
> > index 3b1a845457b0..bf7e01a8b4dd 100755
> > --- a/drivers/edac/mem_repair.c
> > +++ b/drivers/edac/mem_repair.c
> > @@ -45,6 +45,11 @@ struct edac_mem_repair_context {
> > struct attribute_group group;
> > };
> >
> > +const char * const edac_repair_type[] = {
> > + [EDAC_PPR] = "ppr",
> > +};
> > +EXPORT_SYMBOL_GPL(edac_repair_type);
>
> Why is this thing exported instead of adding a getter function and having all
> its users pass in proper defines as arguments?
>
> And "EDAC_PPR" is not a proper define - it doesn't tell me what it is.
>
> It should be more likely a
>
> EDAC_REPAIR_PPR,
> EDAC_REPAIR_ROW_SPARING,
> EDAC_REPAIR_BANK_SPARING,
>
> and so on.
Looking at this more:
+static int cxl_ppr_get_repair_type(struct device *dev, void *drv_data,
+ const char **repair_type)
+{
+ *repair_type = edac_repair_type[EDAC_PPR];
+
+ return 0;
+}
Can this be any more silly?
An ops member which copies a string pointer into some argument. What for?
If those strings are for userspace, why don't you simply return *numbers* and
let userspace convert them into strings?
Oh boy.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v2 4/8] cxl/edac: Add CXL memory device ECS control feature
2025-03-20 18:04 ` [PATCH v2 4/8] cxl/edac: Add CXL memory device ECS " shiju.jose
@ 2025-03-27 17:12 ` Dan Williams
0 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2025-03-27 17:12 UTC (permalink / raw)
To: shiju.jose, linux-cxl, dan.j.williams, dave, jonathan.cameron,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm, shiju.jose
shiju.jose@ wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> CXL spec 3.2 section 8.2.10.9.11.2 describes the DDR5 ECS (Error Check
> Scrub) control feature.
> The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM
> Specification (JESD79-5) and allows the DRAM to internally read, correct
> single-bit errors, and write back corrected data bits to the DRAM array
> while providing transparency to error counts.
>
> The ECS control allows the requester to change the log entry type, the ECS
> threshold count (provided the request falls within the limits specified in
> DDR5 mode registers), switch between codeword mode and row count mode, and
> reset the ECS counter.
>
> Register with EDAC device driver, which retrieves the ECS attribute
> descriptors from the EDAC ECS and exposes the ECS control attributes to
> userspace via sysfs. For example, the ECS control for the memory media FRU0
> in CXL mem0 device is located at /sys/bus/edac/devices/cxl_mem0/ecs_fru0/
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Fan Ni <fan.ni@samsung.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
With PMEM we were careful to route repair requests through the driver
via a write mechanism so the filesystem, or other components that might
have logged metadata about the error, could adjust their records.
I assume that is not a concern in this case because corrected errors
would not trigger poison-on-read effects, i.e. the repair here is not
bringing poison back into service, it is merely refreshing the cell?
Is there documentation about when an operator would want to do this
explicitly and why patrol scrub is not suitable to rely upon for
automatically doing this work?
> ---
> drivers/cxl/Kconfig | 1 +
> drivers/cxl/core/edac.c | 353 +++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 353 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index b5ede1308425..1c67bf844993 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -120,6 +120,7 @@ config CXL_EDAC_MEM_FEATURES
> depends on CXL_FEATURES
> depends on EDAC >= CXL_BUS
> depends on EDAC_SCRUB
> + depends on EDAC_ECS
It is not clear to me that someone who wants patrol scrub control also
wants ECS. Can you make the features individually selectable?
I.e. something like
config CXL_EDAC_ECS
	bool "Enable CXL Error Check Scrub (Repair)"
	depends on CXL_EDAC_MEM_FEATURES
	...description...
config CXL_EDAC_SCRUB
	bool "Enable CXL Patrol Scrub Control (Patrol Read)"
	depends on CXL_EDAC_MEM_FEATURES
	...description...
> help
> The CXL EDAC memory feature control is optional and allows host
> to control the EDAC memory features configurations of CXL memory
> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
> index 5ec3535785e1..1110685ed41a 100644
> --- a/drivers/cxl/core/edac.c
> +++ b/drivers/cxl/core/edac.c
> @@ -19,7 +19,7 @@
> #include <cxlmem.h>
> #include "core.h"
>
> -#define CXL_NR_EDAC_DEV_FEATURES 1
> +#define CXL_NR_EDAC_DEV_FEATURES 2
>
> static struct rw_semaphore *cxl_acquire(struct rw_semaphore *rwsem)
> {
> @@ -428,6 +428,350 @@ static int cxl_region_scrub_init(struct cxl_region *cxlr,
> return 0;
> }
>
> +/*
> + * CXL DDR5 ECS control definitions.
> + */
> +struct cxl_ecs_context {
> + u16 num_media_frus;
> + u16 get_feat_size;
> + u16 set_feat_size;
> + u8 get_version;
> + u8 set_version;
> + u16 effects;
> + struct cxl_memdev *cxlmd;
> +};
> +
> +enum {
> + CXL_ECS_PARAM_LOG_ENTRY_TYPE,
> + CXL_ECS_PARAM_THRESHOLD,
> + CXL_ECS_PARAM_MODE,
> + CXL_ECS_PARAM_RESET_COUNTER,
> +};
> +
> +#define CXL_ECS_LOG_ENTRY_TYPE_MASK GENMASK(1, 0)
> +#define CXL_ECS_REALTIME_REPORT_CAP_MASK BIT(0)
> +#define CXL_ECS_THRESHOLD_COUNT_MASK GENMASK(2, 0)
> +#define CXL_ECS_COUNT_MODE_MASK BIT(3)
> +#define CXL_ECS_RESET_COUNTER_MASK BIT(4)
> +#define CXL_ECS_RESET_COUNTER 1
> +
> +enum {
> + ECS_THRESHOLD_256 = 256,
> + ECS_THRESHOLD_1024 = 1024,
> + ECS_THRESHOLD_4096 = 4096,
> +};
> +
> +enum {
> + ECS_THRESHOLD_IDX_256 = 3,
> + ECS_THRESHOLD_IDX_1024 = 4,
> + ECS_THRESHOLD_IDX_4096 = 5,
> +};
> +
> +static const u16 ecs_supp_threshold[] = {
> + [ECS_THRESHOLD_IDX_256] = 256,
> + [ECS_THRESHOLD_IDX_1024] = 1024,
> + [ECS_THRESHOLD_IDX_4096] = 4096,
> +};
> +
> +enum {
> + ECS_LOG_ENTRY_TYPE_DRAM = 0x0,
> + ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU = 0x1,
> +};
> +
> +enum cxl_ecs_count_mode {
> + ECS_MODE_COUNTS_ROWS = 0,
> + ECS_MODE_COUNTS_CODEWORDS = 1,
> +};
> +
> +/**
> + * struct cxl_ecs_params - CXL memory DDR5 ECS parameter data structure.
> + * @threshold: ECS threshold count per GB of memory cells.
> + * @log_entry_type: ECS log entry type, per DRAM or per memory media FRU.
> + * @reset_counter: [IN] reset ECC counter to default value.
> + * @count_mode: codeword/row count mode
> + * 0 : ECS counts rows with errors
> + * 1 : ECS counts codeword with errors
> + */
> +struct cxl_ecs_params {
> + u16 threshold;
> + u8 log_entry_type;
> + u8 reset_counter;
> + enum cxl_ecs_count_mode count_mode;
Similar comment as last patch about having a superfluous intermediate
object rather than hw objects + helpers.
> +};
> +
> +/*
> + * See CXL spec rev 3.2 @8.2.10.9.11.2 Table 8-225 DDR5 ECS Control Feature
> + * Readable Attributes.
> + */
> +struct cxl_ecs_fru_rd_attrs {
> + u8 ecs_cap;
> + __le16 ecs_config;
> + u8 ecs_flags;
> +} __packed;
> +
> +struct cxl_ecs_rd_attrs {
> + u8 ecs_log_cap;
> + struct cxl_ecs_fru_rd_attrs fru_attrs[];
> +} __packed;
> +
> +/*
> + * See CXL spec rev 3.2 @8.2.10.9.11.2 Table 8-226 DDR5 ECS Control Feature
> + * Writable Attributes.
> + */
> +struct cxl_ecs_fru_wr_attrs {
> + __le16 ecs_config;
> +} __packed;
> +
> +struct cxl_ecs_wr_attrs {
> + u8 ecs_log_cap;
> + struct cxl_ecs_fru_wr_attrs fru_attrs[];
> +} __packed;
> +
> +/*
> + * CXL DDR5 ECS control functions.
> + */
> +static int cxl_mem_ecs_get_attrs(struct device *dev,
> + struct cxl_ecs_context *cxl_ecs_ctx,
> + int fru_id, struct cxl_ecs_params *params)
> +{
> + struct cxl_memdev *cxlmd = cxl_ecs_ctx->cxlmd;
> + struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
> + struct cxl_ecs_fru_rd_attrs *fru_rd_attrs;
> + size_t rd_data_size;
> + u8 threshold_index;
> + size_t data_size;
> + u16 ecs_config;
> +
> + rd_data_size = cxl_ecs_ctx->get_feat_size;
> +
> + struct cxl_ecs_rd_attrs *rd_attrs __free(kvfree) =
> + kvzalloc(rd_data_size, GFP_KERNEL);
> + if (!rd_attrs)
> + return -ENOMEM;
> +
> + params->log_entry_type = 0;
> + params->threshold = 0;
> + params->count_mode = 0;
> + data_size = cxl_get_feature(cxl_mbox, &CXL_FEAT_ECS_UUID,
> + CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrs,
> + rd_data_size, 0, NULL);
> + if (!data_size)
> + return -EIO;
> +
> + fru_rd_attrs = rd_attrs->fru_attrs;
> + params->log_entry_type =
> + FIELD_GET(CXL_ECS_LOG_ENTRY_TYPE_MASK, rd_attrs->ecs_log_cap);
> + ecs_config = le16_to_cpu(fru_rd_attrs[fru_id].ecs_config);
> + threshold_index = FIELD_GET(CXL_ECS_THRESHOLD_COUNT_MASK, ecs_config);
> + params->threshold = ecs_supp_threshold[threshold_index];
> + params->count_mode = FIELD_GET(CXL_ECS_COUNT_MODE_MASK, ecs_config);
> + return 0;
> +}
> +
> +static int cxl_mem_ecs_set_attrs(struct device *dev,
> + struct cxl_ecs_context *cxl_ecs_ctx,
> + int fru_id, struct cxl_ecs_params *params,
> + u8 param_type)
> +{
> + struct cxl_memdev *cxlmd = cxl_ecs_ctx->cxlmd;
> + struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
> + struct cxl_ecs_fru_rd_attrs *fru_rd_attrs;
> + struct cxl_ecs_fru_wr_attrs *fru_wr_attrs;
> + size_t rd_data_size, wr_data_size;
> + u16 num_media_frus, count;
> + size_t data_size;
> + u16 ecs_config;
> +
> + num_media_frus = cxl_ecs_ctx->num_media_frus;
> + rd_data_size = cxl_ecs_ctx->get_feat_size;
> + wr_data_size = cxl_ecs_ctx->set_feat_size;
> + struct cxl_ecs_rd_attrs *rd_attrs __free(kvfree) =
> + kvzalloc(rd_data_size, GFP_KERNEL);
> + if (!rd_attrs)
> + return -ENOMEM;
> +
> + data_size = cxl_get_feature(cxl_mbox, &CXL_FEAT_ECS_UUID,
> + CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrs,
> + rd_data_size, 0, NULL);
> + if (!data_size)
> + return -EIO;
> +
> + struct cxl_ecs_wr_attrs *wr_attrs __free(kvfree) =
> + kvzalloc(wr_data_size, GFP_KERNEL);
> + if (!wr_attrs)
> + return -ENOMEM;
> +
> + /*
> + * Fill writable attributes from the current attributes read
> + * for all the media FRUs.
> + */
> + fru_rd_attrs = rd_attrs->fru_attrs;
> + fru_wr_attrs = wr_attrs->fru_attrs;
> + wr_attrs->ecs_log_cap = rd_attrs->ecs_log_cap;
> + for (count = 0; count < num_media_frus; count++)
> + fru_wr_attrs[count].ecs_config = fru_rd_attrs[count].ecs_config;
> +
> + /* Fill attribute to be set for the media FRU */
> + ecs_config = le16_to_cpu(fru_rd_attrs[fru_id].ecs_config);
> + switch (param_type) {
> + case CXL_ECS_PARAM_LOG_ENTRY_TYPE:
> + if (params->log_entry_type != ECS_LOG_ENTRY_TYPE_DRAM &&
> + params->log_entry_type != ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU)
> + return -EINVAL;
> +
> + wr_attrs->ecs_log_cap = FIELD_PREP(CXL_ECS_LOG_ENTRY_TYPE_MASK,
> + params->log_entry_type);
> + break;
> + case CXL_ECS_PARAM_THRESHOLD:
> + ecs_config &= ~CXL_ECS_THRESHOLD_COUNT_MASK;
> + switch (params->threshold) {
> + case ECS_THRESHOLD_256:
> + ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
> + ECS_THRESHOLD_IDX_256);
> + break;
> + case ECS_THRESHOLD_1024:
> + ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
> + ECS_THRESHOLD_IDX_1024);
> + break;
> + case ECS_THRESHOLD_4096:
> + ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
> + ECS_THRESHOLD_IDX_4096);
> + break;
> + default:
> + dev_dbg(dev,
> + "Invalid CXL ECS scrub threshold count(%d) to set\n",
> + params->threshold);
> + dev_dbg(dev,
> + "Supported scrub threshold counts: %u, %u, %u\n",
> + ECS_THRESHOLD_256, ECS_THRESHOLD_1024,
> + ECS_THRESHOLD_4096);
> + return -EINVAL;
> + }
> + break;
> + case CXL_ECS_PARAM_MODE:
> + if (params->count_mode != ECS_MODE_COUNTS_ROWS &&
> + params->count_mode != ECS_MODE_COUNTS_CODEWORDS) {
> + dev_dbg(dev, "Invalid CXL ECS scrub mode(%d) to set\n",
> + params->count_mode);
> + dev_dbg(dev,
> + "Supported ECS Modes: 0: ECS counts rows with errors,"
> + " 1: ECS counts codewords with errors\n");
> + return -EINVAL;
> + }
> + ecs_config &= ~CXL_ECS_COUNT_MODE_MASK;
> + ecs_config |=
> + FIELD_PREP(CXL_ECS_COUNT_MODE_MASK, params->count_mode);
> + break;
> + case CXL_ECS_PARAM_RESET_COUNTER:
> + if (params->reset_counter != CXL_ECS_RESET_COUNTER)
> + return -EINVAL;
> +
> + ecs_config &= ~CXL_ECS_RESET_COUNTER_MASK;
> + ecs_config |= FIELD_PREP(CXL_ECS_RESET_COUNTER_MASK,
> + params->reset_counter);
> + break;
> + default:
> + return -EINVAL;
> + }
> + fru_wr_attrs[fru_id].ecs_config = cpu_to_le16(ecs_config);
> +
> + return cxl_set_feature(cxl_mbox, &CXL_FEAT_ECS_UUID,
> + cxl_ecs_ctx->set_version, wr_attrs, wr_data_size,
> + CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET, 0,
> + NULL);
> +}
> +
> +#define CXL_ECS_GET_ATTR(attrib) \
> + static int cxl_ecs_get_##attrib(struct device *dev, void *drv_data, \
> + int fru_id, u32 *val) \
> + { \
> + struct cxl_ecs_context *ctx = drv_data; \
> + struct cxl_ecs_params params; \
> + int ret; \
> + \
> + ret = cxl_mem_ecs_get_attrs(dev, ctx, fru_id, ¶ms); \
> + if (ret) \
> + return ret; \
> + \
> + *val = params.attrib; \
> + \
> + return 0; \
> + }
> +
> +CXL_ECS_GET_ATTR(log_entry_type)
> +CXL_ECS_GET_ATTR(count_mode)
> +CXL_ECS_GET_ATTR(threshold)
> +
> +#define CXL_ECS_SET_ATTR(attrib, param_type) \
> + static int cxl_ecs_set_##attrib(struct device *dev, void *drv_data, \
> + int fru_id, u32 val) \
> + { \
> + struct cxl_ecs_context *ctx = drv_data; \
> + struct cxl_ecs_params params = { \
> + .attrib = val, \
> + }; \
> + \
I failed to comment on this on the previous patch, but for all of these
paths that affect I/O to media let's add a:
if (!capable(CAP_SYS_RAWIO))
return -EPERM;
...just to constrain the ability of drop-privilege-root to affect
hardware memory media state.
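Folded into the setter macro, that would look roughly like this (a
sketch; the macro tail follows the pattern of the get macro above, which
is cut off in the quote):

#define CXL_ECS_SET_ATTR(attrib, param_type)                                 \
	static int cxl_ecs_set_##attrib(struct device *dev, void *drv_data, \
					int fru_id, u32 val)                \
	{                                                                   \
		struct cxl_ecs_context *ctx = drv_data;                     \
		struct cxl_ecs_params params = {                            \
			.attrib = val,                                      \
		};                                                          \
									    \
		if (!capable(CAP_SYS_RAWIO))                                \
			return -EPERM;                                      \
									    \
		return cxl_mem_ecs_set_attrs(dev, ctx, fru_id, &params,     \
					     param_type);                   \
	}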
* Re: [PATCH v2 5/8] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command
2025-03-20 18:04 ` [PATCH v2 5/8] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
@ 2025-03-27 17:23 ` Dan Williams
0 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2025-03-27 17:23 UTC (permalink / raw)
To: shiju.jose, linux-cxl, dan.j.williams, dave, jonathan.cameron,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm, shiju.jose
shiju.jose@ wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Add support for PERFORM_MAINTENANCE mailbox command.
>
> CXL spec 3.2 section 8.2.10.7.1 describes the Perform Maintenance command.
> This command requests the device to execute the maintenance operation
> specified by the maintenance operation class and the maintenance operation
> subclass.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> drivers/cxl/core/mbox.c | 34 ++++++++++++++++++++++++++++++++++
> drivers/cxl/cxlmem.h | 17 +++++++++++++++++
> 2 files changed, 51 insertions(+)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index d72764056ce6..19d46a284650 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -824,6 +824,40 @@ static const uuid_t log_uuid[] = {
> [VENDOR_DEBUG_UUID] = DEFINE_CXL_VENDOR_DEBUG_UUID,
> };
>
> +int cxl_do_maintenance(struct cxl_mailbox *cxl_mbox,
> + u8 class, u8 subclass,
> + void *data_in, size_t data_in_size)
> +{
> + struct cxl_memdev_maintenance_pi {
> + struct cxl_mbox_do_maintenance_hdr hdr;
Please call this "perform_maintenance" because "do_" is usually a
Linux-ism for a core helper.
Also fold this patch into the caller that needs it.
> + u8 data[];
> + } __packed;
> + struct cxl_mbox_cmd mbox_cmd;
> + size_t hdr_size;
> +
> + struct cxl_memdev_maintenance_pi *pi __free(kfree) =
> + kmalloc(cxl_mbox->payload_size, GFP_KERNEL);
s/kmalloc/kvzalloc/
if (!pi)
return -ENOMEM;
> + pi->hdr.op_class = class;
> + pi->hdr.op_subclass = subclass;
> + hdr_size = sizeof(pi->hdr);
> + /*
> + * Check minimum mbox payload size is available for
> + * the maintenance data transfer.
> + */
> + if (hdr_size + data_in_size > cxl_mbox->payload_size)
> + return -ENOMEM;
-EINVAL
> +
> + memcpy(pi->data, data_in, data_in_size);
> + mbox_cmd = (struct cxl_mbox_cmd) {
> + .opcode = CXL_MBOX_OP_DO_MAINTENANCE,
> + .size_in = hdr_size + data_in_size,
> + .payload_in = pi,
> + };
> +
> + return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_do_maintenance, "CXL");
Why? There is nothing in this function that needs the rest of mbox.c
beyond cxl_internal_send_cmd which is already exported. Just define this
in the only object that needs it.
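Pulling the feedback together, the helper might end up looking like this
(a sketch only: renamed, size check before allocation, kvzalloc with a
NULL check, -EINVAL for oversized payloads, static rather than exported):

static int cxl_perform_maintenance(struct cxl_mailbox *cxl_mbox, u8 class,
				   u8 subclass, void *data_in,
				   size_t data_in_size)
{
	struct cxl_memdev_maintenance_pi {
		struct cxl_mbox_do_maintenance_hdr hdr;
		u8 data[];
	} __packed;
	size_t hdr_size = sizeof(struct cxl_mbox_do_maintenance_hdr);
	struct cxl_mbox_cmd mbox_cmd;

	/* Check the mbox payload can hold the maintenance data up front */
	if (hdr_size + data_in_size > cxl_mbox->payload_size)
		return -EINVAL;

	struct cxl_memdev_maintenance_pi *pi __free(kvfree) =
		kvzalloc(cxl_mbox->payload_size, GFP_KERNEL);
	if (!pi)
		return -ENOMEM;

	pi->hdr.op_class = class;
	pi->hdr.op_subclass = subclass;
	memcpy(pi->data, data_in, data_in_size);
	mbox_cmd = (struct cxl_mbox_cmd) {
		.opcode = CXL_MBOX_OP_DO_MAINTENANCE,
		.size_in = hdr_size + data_in_size,
		.payload_in = pi,
	};

	return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
}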
* Re: [PATCH v2 6/8] cxl: Support for finding memory operation attributes from the current boot
2025-03-20 18:04 ` [PATCH v2 6/8] cxl: Support for finding memory operation attributes from the current boot shiju.jose
@ 2025-03-27 17:43 ` Dan Williams
0 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2025-03-27 17:43 UTC (permalink / raw)
To: shiju.jose, linux-cxl, dan.j.williams, dave, jonathan.cameron,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm, shiju.jose
shiju.jose@ wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Certain operations on memory, such as memory repair, are permitted
> only when the address and other attributes for the operation are
> from the current boot. This is determined by checking whether the
> memory attributes for the operation match those in the CXL gen_media
> or CXL DRAM memory event records reported during the current boot.
>
> The CXL event records must be backed up because they are cleared
> in the hardware after being processed by the kernel.
>
> Support is added for storing CXL gen_media or CXL DRAM memory event
> records in xarrays. Additionally, helper functions are implemented
> to find a matching record in the xarray storage based on the memory
> attributes and repair type.
>
> Add validity check, when matching attributes for sparing, using
> the validity flag in the DRAM event record, to ensure that all
> required attributes for a requested repair operation are valid and
> set.
>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> drivers/cxl/core/mbox.c | 11 ++-
> drivers/cxl/core/memdev.c | 9 +++
> drivers/cxl/core/ras.c | 145 ++++++++++++++++++++++++++++++++++++++
I thought we agreed to call the file edac.c since "ras" concepts are
spread throughout the driver?
> drivers/cxl/cxlmem.h | 46 ++++++++++++
> drivers/cxl/pci.c | 3 +
> 5 files changed, 212 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 19d46a284650..c9328f1b6464 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -956,12 +956,19 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> hpa_alias = hpa - cache_size;
> }
>
> - if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
> + if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
> + if (cxl_store_rec_gen_media((struct cxl_memdev *)cxlmd, evt))
> + dev_dbg(&cxlmd->dev, "CXL store rec_gen_media failed\n");
> +
All of this should be turned off when there is no EDAC consumer.
I don't see anything that triggers releasing the cache when a repair
makes continuing to save the information irrelevant.
I don't see any safety with respect to error storms and burning
unlimited memory in this cache.
The cache is storing 100% of the event record which seems wasteful if
100% of the data is not needed for validating repair operations.
The memory overhead expense of this feature needs to be estimated and
documented in the Kconfig so that distros can make a reasonable decision
about turning this on.
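Illustrative only: one simple way to bound the cache against error storms
would be a per-memdev record budget checked before the allocation. The
limit value and the rec_dram_count field are assumptions, not part of the
posted patch (note this sketch also frees the new record on xa_store()
failure, which the posted version leaks):

#define CXL_MAX_CACHED_EVENT_RECORDS	200

int cxl_store_rec_dram(struct cxl_memdev *cxlmd, union cxl_event *evt)
{
	struct cxl_event_dram *rec;
	void *old_rec;

	if (cxlmd->rec_dram_count >= CXL_MAX_CACHED_EVENT_RECORDS)
		return -ENOSPC;

	rec = kmemdup(&evt->dram, sizeof(*rec), GFP_KERNEL);
	if (!rec)
		return -ENOMEM;

	old_rec = xa_store(&cxlmd->rec_dram,
			   le64_to_cpu(rec->media_hdr.phys_addr), rec,
			   GFP_KERNEL);
	if (xa_is_err(old_rec)) {
		kfree(rec);
		return xa_err(old_rec);
	}

	if (old_rec)
		kfree(old_rec);	/* replaced an existing entry for this DPA */
	else
		cxlmd->rec_dram_count++;

	return 0;
}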
> trace_cxl_general_media(cxlmd, type, cxlr, hpa,
> hpa_alias, &evt->gen_media);
> - else if (event_type == CXL_CPER_EVENT_DRAM)
> + } else if (event_type == CXL_CPER_EVENT_DRAM) {
> + if (cxl_store_rec_dram((struct cxl_memdev *)cxlmd, evt))
> + dev_dbg(&cxlmd->dev, "CXL store rec_dram failed\n");
> +
> trace_cxl_dram(cxlmd, type, cxlr, hpa, hpa_alias,
> &evt->dram);
> + }
> }
> }
> EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, "CXL");
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index a16a5886d40a..bd9ba50bc01e 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -25,8 +25,17 @@ static DEFINE_IDA(cxl_memdev_ida);
> static void cxl_memdev_release(struct device *dev)
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> + struct cxl_event_gen_media *rec_gen_media;
> + struct cxl_event_dram *rec_dram;
> + unsigned long index;
>
> ida_free(&cxl_memdev_ida, cxlmd->id);
> + xa_for_each(&cxlmd->rec_dram, index, rec_dram)
> + kfree(rec_dram);
> + xa_destroy(&cxlmd->rec_dram);
> + xa_for_each(&cxlmd->rec_gen_media, index, rec_gen_media)
> + kfree(rec_gen_media);
> + xa_destroy(&cxlmd->rec_gen_media);
> kfree(cxlmd);
> }
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 485a831695c7..c703d4e7e05b 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -7,6 +7,151 @@
> #include <cxlmem.h>
> #include "trace.h"
>
> +struct cxl_event_gen_media *
> +cxl_find_rec_gen_media(struct cxl_memdev *cxlmd,
> + struct cxl_mem_repair_attrbs *attrbs)
> +{
> + struct cxl_event_gen_media *rec;
> +
> + rec = xa_load(&cxlmd->rec_gen_media, attrbs->dpa);
> + if (!rec)
> + return NULL;
> +
> + if (attrbs->repair_type == CXL_PPR)
> + return rec;
> +
> + return NULL;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_find_rec_gen_media, "CXL");
> +
> +struct cxl_event_dram *cxl_find_rec_dram(struct cxl_memdev *cxlmd,
> + struct cxl_mem_repair_attrbs *attrbs)
> +{
> + struct cxl_event_dram *rec;
> + u16 validity_flags;
> +
> + rec = xa_load(&cxlmd->rec_dram, attrbs->dpa);
> + if (!rec)
> + return NULL;
> +
> + validity_flags = get_unaligned_le16(rec->media_hdr.validity_flags);
> + if (!(validity_flags & CXL_DER_VALID_CHANNEL) ||
> + !(validity_flags & CXL_DER_VALID_RANK))
> + return NULL;
> +
> + switch (attrbs->repair_type) {
> + case CXL_PPR:
> + if (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
> + get_unaligned_le24(rec->nibble_mask) == attrbs->nibble_mask)
> + return rec;
> + break;
> + case CXL_CACHELINE_SPARING:
> + if (!(validity_flags & CXL_DER_VALID_BANK_GROUP) ||
> + !(validity_flags & CXL_DER_VALID_BANK) ||
> + !(validity_flags & CXL_DER_VALID_ROW) ||
> + !(validity_flags & CXL_DER_VALID_COLUMN))
> + return NULL;
> +
> + if (rec->media_hdr.channel == attrbs->channel &&
> + rec->media_hdr.rank == attrbs->rank &&
> + rec->bank_group == attrbs->bank_group &&
> + rec->bank == attrbs->bank &&
> + get_unaligned_le24(rec->row) == attrbs->row &&
> + get_unaligned_le16(rec->column) == attrbs->column &&
> + (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
> + get_unaligned_le24(rec->nibble_mask) ==
> + attrbs->nibble_mask) &&
> + (!(validity_flags & CXL_DER_VALID_SUB_CHANNEL) ||
> + rec->sub_channel == attrbs->sub_channel))
> + return rec;
> + break;
> + case CXL_ROW_SPARING:
> + if (!(validity_flags & CXL_DER_VALID_BANK_GROUP) ||
> + !(validity_flags & CXL_DER_VALID_BANK) ||
> + !(validity_flags & CXL_DER_VALID_ROW))
> + return NULL;
> +
> + if (rec->media_hdr.channel == attrbs->channel &&
> + rec->media_hdr.rank == attrbs->rank &&
> + rec->bank_group == attrbs->bank_group &&
> + rec->bank == attrbs->bank &&
> + get_unaligned_le24(rec->row) == attrbs->row &&
> + (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
> + get_unaligned_le24(rec->nibble_mask) ==
> + attrbs->nibble_mask))
> + return rec;
> + break;
> + case CXL_BANK_SPARING:
> + if (!(validity_flags & CXL_DER_VALID_BANK_GROUP) ||
> + !(validity_flags & CXL_DER_VALID_BANK))
> + return NULL;
> +
> + if (rec->media_hdr.channel == attrbs->channel &&
> + rec->media_hdr.rank == attrbs->rank &&
> + rec->bank_group == attrbs->bank_group &&
> + rec->bank == attrbs->bank &&
> + (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
> + get_unaligned_le24(rec->nibble_mask) ==
> + attrbs->nibble_mask))
> + return rec;
> + break;
> + case CXL_RANK_SPARING:
> + if (rec->media_hdr.channel == attrbs->channel &&
> + rec->media_hdr.rank == attrbs->rank &&
> + (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
> + get_unaligned_le24(rec->nibble_mask) ==
> + attrbs->nibble_mask))
> + return rec;
> + break;
> + default:
> + return NULL;
> + }
> +
> + return NULL;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_find_rec_dram, "CXL");
> +
> +int cxl_store_rec_gen_media(struct cxl_memdev *cxlmd, union cxl_event *evt)
> +{
> + void *old_rec;
> + struct cxl_event_gen_media *rec =
> + kmemdup(&evt->gen_media, sizeof(*rec), GFP_KERNEL);
> + if (!rec)
> + return -ENOMEM;
> +
> + old_rec = xa_store(&cxlmd->rec_gen_media,
> + le64_to_cpu(rec->media_hdr.phys_addr), rec,
> + GFP_KERNEL);
> + if (xa_is_err(old_rec))
> + return xa_err(old_rec);
> +
> + kfree(old_rec);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_store_rec_gen_media, "CXL");
> +
> +int cxl_store_rec_dram(struct cxl_memdev *cxlmd, union cxl_event *evt)
> +{
> + void *old_rec;
> + struct cxl_event_dram *rec =
> + kmemdup(&evt->dram, sizeof(*rec), GFP_KERNEL);
> +
> + if (!rec)
> + return -ENOMEM;
> +
> + old_rec = xa_store(&cxlmd->rec_dram,
> + le64_to_cpu(rec->media_hdr.phys_addr), rec,
> + GFP_KERNEL);
> + if (xa_is_err(old_rec))
> + return xa_err(old_rec);
> +
> + kfree(old_rec);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_store_rec_dram, "CXL");
> +
> static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,
> struct cxl_ras_capability_regs ras_cap)
> {
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 7ab257e0c85e..24ece579a145 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -34,6 +34,41 @@
> (FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) != \
> CXLMDEV_RESET_NEEDED_NOT)
>
> +enum cxl_mem_repair_type {
> + CXL_PPR,
> + CXL_CACHELINE_SPARING,
> + CXL_ROW_SPARING,
> + CXL_BANK_SPARING,
> + CXL_RANK_SPARING,
> + CXL_REPAIR_MAX,
> +};
> +
> +/**
> + * struct cxl_mem_repair_attrbs - CXL memory repair attributes
Between attr, param, and attrbs, the names of intermediate objects seem to
have no rhyme or reason in these patches. I don't have a specific
suggestion here beyond: please take another pass over the whole set and
be consistent.
In general attr is an overloaded term especially in code files that have
sysfs attributes, so please steer away from "attr" for that reason.
> + * @dpa: DPA of memory to repair
> + * @nibble_mask: nibble mask, identifies one or more nibbles on the memory bus
> + * @row: row of memory to repair
> + * @column: column of memory to repair
> + * @channel: channel of memory to repair
> + * @sub_channel: sub channel of memory to repair
> + * @rank: rank of memory to repair
> + * @bank_group: bank group of memory to repair
> + * @bank: bank of memory to repair
> + * @repair_type: repair type, e.g. PPR, memory sparing, etc.
> + */
> +struct cxl_mem_repair_attrbs {
> + u64 dpa;
> + u32 nibble_mask;
> + u32 row;
> + u16 column;
> + u8 channel;
> + u8 sub_channel;
> + u8 rank;
> + u8 bank_group;
> + u8 bank;
> + enum cxl_mem_repair_type repair_type;
> +};
> +
> /**
> * struct cxl_memdev - CXL bus object representing a Type-3 Memory Device
> * @dev: driver core device object
> @@ -45,6 +80,8 @@
> * @endpoint: connection to the CXL port topology for this memory device
> * @id: id number of this memdev instance.
> * @depth: endpoint port depth
> + * @rec_gen_media: xarray to store CXL general media records
> + * @rec_dram: xarray to store CXL DRAM records
> */
> struct cxl_memdev {
> struct device dev;
> @@ -56,6 +93,8 @@ struct cxl_memdev {
> struct cxl_port *endpoint;
> int id;
> int depth;
> + struct xarray rec_gen_media;
> + struct xarray rec_dram;
Can this move to an EDAC context object to not burden 'struct
cxl_memdev' by default?
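Something along these lines, perhaps (a sketch; the name is illustrative),
so that 'struct cxl_memdev' carries at most a pointer populated only when
the EDAC feature code is built in:

struct cxl_memdev_edac_ctx {
	struct xarray rec_gen_media;	/* indexed by DPA */
	struct xarray rec_dram;		/* indexed by DPA */
};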
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [PATCH v2 3/8] cxl/edac: Add CXL memory device patrol scrub control feature
2025-03-27 1:47 ` Dan Williams
@ 2025-03-28 10:18 ` Shiju Jose
0 siblings, 0 replies; 24+ messages in thread
From: Shiju Jose @ 2025-03-28 10:18 UTC (permalink / raw)
To: Dan Williams, linux-cxl, dave, Jonathan Cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
>-----Original Message-----
>From: Dan Williams <dan.j.williams@intel.com>
>Sent: 27 March 2025 01:47
>Subject: Re: [PATCH v2 3/8] cxl/edac: Add CXL memory device patrol scrub
>control feature
>
>shiju.jose@ wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub
[...]
>> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
>> new file mode 100644
>> index 000000000000..5ec3535785e1
>> --- /dev/null
>> +++ b/drivers/cxl/core/edac.c
>> @@ -0,0 +1,474 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * CXL EDAC memory feature driver.
>> + *
>> + * Copyright (c) 2024-2025 HiSilicon Limited.
>> + *
>> + * - Supports functions to configure EDAC features of the
>> + * CXL memory devices.
>> + * - Registers with the EDAC device subsystem driver to expose
>> + * the features sysfs attributes to the user for configuring
>> + * CXL memory RAS feature.
>> + */
>> +
>> +#include <linux/cleanup.h>
>> +#include <linux/edac.h>
>> +#include <linux/limits.h>
>> +#include <cxl/features.h>
>> +#include <cxl.h>
>> +#include <cxlmem.h>
>> +#include "core.h"
>> +
>> +#define CXL_NR_EDAC_DEV_FEATURES 1
>> +
>> +static struct rw_semaphore *cxl_acquire(struct rw_semaphore *rwsem)
>> +{
>> + if (down_read_interruptible(rwsem))
>> + return NULL;
>> +
>> + return rwsem;
>> +}
>> +
>> +DEFINE_FREE(cxl_unlock, struct rw_semaphore *, if (_T) up_read(_T))
>
>I know I suggested cxl_acquire() and cxl_unlock(), but this really is a generic
>facility.
>
>Let's call it rwsem_read_intr_acquire() and rwsem_read_release(), and I'll
>follow up later with Peter to see if he wants this to graduate from CXL.
>
>Also, go ahead and define it in cxl.h for now as I think other places in the
>subsystem could benefit from this approach.
Hi Dan,
Thanks for the comments.
Sure. Should these definitions in cxl.h go in a separate patch?
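For reference, the generic form would be something like this (a sketch;
final naming and placement pending your follow-up with Peter):

static inline struct rw_semaphore *
rwsem_read_intr_acquire(struct rw_semaphore *rwsem)
{
	if (down_read_interruptible(rwsem))
		return NULL;

	return rwsem;
}

DEFINE_FREE(rwsem_read_release, struct rw_semaphore *, if (_T) up_read(_T))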
>
>> +
>> +/*
>> + * CXL memory patrol scrub control
>> + */
>> +struct cxl_patrol_scrub_context {
>
>I like "patrol_scrub" spelled out here compared to "ps" used everywhere else.
Will change.
>
>> + u8 instance;
>> + u16 get_feat_size;
>> + u16 set_feat_size;
>> + u8 get_version;
>> + u8 set_version;
>> + u16 effects;
>> + struct cxl_memdev *cxlmd;
>> + struct cxl_region *cxlr;
>> +};
>> +
>> +/**
>> + * struct cxl_memdev_ps_params - CXL memory patrol scrub parameter data
>structure.
>> + * @enable: [IN & OUT] enable(1)/disable(0) patrol scrub.
>> + * @scrub_cycle_changeable: [OUT] scrub cycle attribute of patrol scrub is
>changeable.
>> + * @scrub_cycle_hrs: [IN] Requested patrol scrub cycle in hours.
>> + * [OUT] Current patrol scrub cycle in hours.
>> + * @min_scrub_cycle_hrs:[OUT] minimum patrol scrub cycle in hours
>supported.
>> + */
>> +struct cxl_memdev_ps_params {
>> + bool enable;
>> + bool scrub_cycle_changeable;
>
>This is set but unused. Even if it were to be used I would expect it to be set in the
>cxl_patrol_scrub_context.
I will add this to cxl_patrol_scrub_context and add an extra check against it when
the user requests a change to the scrub rate.
>
>> + u8 scrub_cycle_hrs;
>> + u8 min_scrub_cycle_hrs;
>> +};
>
>I do not understand the point of this extra object and would prefer to keep
>intermediate data structures to a minimum.
>
>It looks like all this does is provide short-lived caching of the parsed raw
>hardware patrol scrub attributes. Just pass those raw objects around and
>provide helpers to do the conversion.
Sure. Will do. It was added to avoid a larger number of function parameters.
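A conversion helper on the raw readable attributes would then be along these
lines (a sketch reusing the masks from this patch; assumes linux/bitfield.h):

static u8 cxl_scrub_cycle_hours(struct cxl_memdev_ps_rd_attrs *rd_attrs)
{
	u16 cycle = le16_to_cpu(rd_attrs->scrub_cycle_hrs);

	/* current cycle lives in bits 7:0, the minimum in bits 15:8 */
	return FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK, cycle);
}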
>
>The less data structures the less confusion for the next person that has to read
>this code a few years down the road.
>
>> +
>> +enum cxl_scrub_param {
>> + CXL_PS_PARAM_ENABLE,
>> + CXL_PS_PARAM_SCRUB_CYCLE,
>> +};
>
>This seems unfortunate; why not make a non-zero scrub rate an implied enable
>and zero to disable? A non-zero sentinel value like U32_MAX can indicate "keep
>scrub rate unchanged".
These enums can be removed along with cxl_memdev_ps_params.
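The set path would then reduce to something like this (a sketch;
cxl_ps_set_enabled() and cxl_ps_set_rate() are hypothetical helpers):

/* Sentinel: leave the scrub rate unchanged */
#define CXL_SCRUB_CYCLE_NO_CHANGE	U32_MAX

static int cxl_scrub_set_cycle(struct cxl_memdev *cxlmd, u32 cycle_hours)
{
	if (cycle_hours == CXL_SCRUB_CYCLE_NO_CHANGE)
		return 0;

	/* zero disables scrubbing, any other rate implies enable */
	if (!cycle_hours)
		return cxl_ps_set_enabled(cxlmd, false);

	return cxl_ps_set_rate(cxlmd, cycle_hours);
}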
>
>> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK BIT(0)
>
>This CXL_MEMDEV_PS prefix is awkward due to overload with 'struct
>cxl_memdev'. Minimize it to something like:
>
>CXL_SCRUB_CONTROL_CHANGEABLE
>CXL_SCRUB_CONTROL_REALTIME
>CXL_SCRUB_CONTROL_CYCLE_MASK
>CXL_SCRUB_CONTROL_MIN_CYCLE_MASK
Will change.
>
>> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_REALTIME_REPORT_CAP_MASK BIT(1)
>> +#define CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK GENMASK(7, 0)
>> +#define CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK GENMASK(15, 8)
>> +#define CXL_MEMDEV_PS_FLAG_ENABLED_MASK BIT(0)
>
>CXL_SCRUB_CONTROL_ENABLE
>
>...no need to call it a mask when it is just a single bit, and when it is both the
>status and the control just call it "enable".
Sure. Will change.
>
>> +
>> +/*
>> + * See CXL spec rev 3.2 @8.2.10.9.11.1 Table 8-222 Device Patrol Scrub
>> + * Control Feature Readable Attributes.
>> + */
>> +struct cxl_memdev_ps_rd_attrs {
>> + u8 scrub_cycle_cap;
>> + __le16 scrub_cycle_hrs;
>
>"hours" is just 2 more characters than "hrs", I think we can afford the extra
>bytes.
Will change.
>
>[..]
>> +int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
>> +{
>> + struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
>> + int num_ras_features = 0;
>> + u8 scrub_inst = 0;
>> + int rc;
>> +
>> + rc = cxl_memdev_scrub_init(cxlmd, &ras_features[num_ras_features],
>> + scrub_inst);
>> + if (rc < 0 && rc != -EOPNOTSUPP)
>> + return rc;
>> +
>> + if (rc != -EOPNOTSUPP)
>> + num_ras_features++;
>> +
>> + char *cxl_dev_name __free(kfree) =
>> + kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlmd->dev));
>
>if (!cxl_dev_name)
> return -ENOMEM;
Will add.
>
>> +
>> + return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
>> + num_ras_features, ras_features);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_edac_register, "CXL");
>> +
>> +int devm_cxl_region_edac_register(struct cxl_region *cxlr)
>> +{
>> + struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
>> + int num_ras_features = 0;
>> + u8 scrub_inst = 0;
>> + int rc;
>> +
>> + rc = cxl_region_scrub_init(cxlr, &ras_features[num_ras_features],
>> + scrub_inst);
>> + if (rc < 0)
>> + return rc;
>> +
>> + num_ras_features++;
>> +
>> + char *cxl_dev_name __free(kfree) =
>> + kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlr->dev));
>> +
>> + return edac_dev_register(&cxlr->dev, cxl_dev_name, NULL,
>> + num_ras_features, ras_features);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(devm_cxl_region_edac_register, "CXL");
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index b3260d433ec7..2aa6eb675fdf 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -3542,6 +3542,11 @@ static int cxl_region_probe(struct device *dev)
>> case CXL_PARTMODE_PMEM:
>> return devm_cxl_add_pmem_region(cxlr);
>> case CXL_PARTMODE_RAM:
>> + rc = devm_cxl_region_edac_register(cxlr);
>
>Why do only volatile regions get EDAC support? PMEM patrol scrub seems
>equally valid.
Will add devm_cxl_region_edac_register() for PMEM as well.
Thanks,
Shiju
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [PATCH v2 7/8] cxl/memfeature: Add CXL memory device soft PPR control feature
2025-03-27 17:08 ` Borislav Petkov
@ 2025-03-28 13:05 ` Shiju Jose
0 siblings, 0 replies; 24+ messages in thread
From: Shiju Jose @ 2025-03-28 13:05 UTC (permalink / raw)
To: Borislav Petkov
Cc: linux-cxl, dan.j.williams, dave, Jonathan Cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-edac, linux-acpi, linux-mm, linux-kernel,
tony.luck, rafael, lenb, mchehab, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, gthelen, wschwartz, dferguson, wbs, nifan.cxl,
tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
>-----Original Message-----
>From: Borislav Petkov <bp@alien8.de>
>Sent: 27 March 2025 17:09
>Subject: Re: [PATCH v2 7/8] cxl/memfeature: Add CXL memory device soft PPR
>control feature
>
>On Thu, Mar 27, 2025 at 06:01:56PM +0100, Borislav Petkov wrote:
>> On Thu, Mar 20, 2025 at 06:04:44PM +0000, shiju.jose@huawei.com wrote:
>> > diff --git a/drivers/edac/mem_repair.c b/drivers/edac/mem_repair.c
>> > index 3b1a845457b0..bf7e01a8b4dd 100755
>> > --- a/drivers/edac/mem_repair.c
>> > +++ b/drivers/edac/mem_repair.c
>> > @@ -45,6 +45,11 @@ struct edac_mem_repair_context {
>> > struct attribute_group group;
>> > };
>> >
>> > +const char * const edac_repair_type[] = {
>> > + [EDAC_PPR] = "ppr",
>> > +};
>> > +EXPORT_SYMBOL_GPL(edac_repair_type);
>>
>> Why is this thing exported instead of adding a getter function and
>> having all its users pass in proper defines as arguments?
>>
>> And "EDAC_PPR" is not a proper define - it doesn't tell me what it is.
>>
>> It should be more likely a
>>
>> EDAC_REPAIR_PPR,
>> EDAC_REPAIR_ROW_SPARING,
>> EDAC_REPAIR_BANK_SPARING,
>>
>> and so on.
Hi Borislav,
Will change.
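i.e. something along these lines for the next revision (a sketch; the sparing
entries extrapolate from your list and from the CXL repair types earlier in
the series):

enum edac_mem_repair_type {
	EDAC_REPAIR_PPR,
	EDAC_REPAIR_CACHELINE_SPARING,
	EDAC_REPAIR_ROW_SPARING,
	EDAC_REPAIR_BANK_SPARING,
	EDAC_REPAIR_RANK_SPARING,
	EDAC_REPAIR_MAX,
};

/* getter instead of exporting the string table directly */
const char *edac_repair_type_name(enum edac_mem_repair_type type)
{
	if (type >= EDAC_REPAIR_MAX)
		return NULL;

	return edac_repair_type[type];
}
EXPORT_SYMBOL_GPL(edac_repair_type_name);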
>
>Looking at this more:
>
>+static int cxl_ppr_get_repair_type(struct device *dev, void *drv_data,
>+ const char **repair_type)
>+{
>+ *repair_type = edac_repair_type[EDAC_PPR];
>+
>+ return 0;
>+}
>
>Can this be any more silly?
>
>An ops member which copies a string pointer into some argument. What for?
>
>If those strings are for userspace, why don't you simply return *numbers* and
>let userspace convert them into strings?
Yes. The EDAC memory repair interface was defined to return 'numbers' to userspace
until v18 of the EDAC series. It was changed to return 'strings' as Mauro wanted.
Please see the discussion here:
https://lore.kernel.org/all/20250114152617.14eb41b5@foz.lan/
>
Thanks,
Shiju
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [PATCH v2 1/8] cxl: Add helper function to retrieve a feature entry
2025-03-27 16:59 ` Shiju Jose
@ 2025-03-31 23:35 ` Dan Williams
0 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2025-03-31 23:35 UTC (permalink / raw)
To: Shiju Jose, Dan Williams, linux-cxl, dave, Jonathan Cameron,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan
Cc: linux-edac, linux-acpi, linux-mm, linux-kernel, bp, tony.luck,
rafael, lenb, mchehab, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
Shiju Jose wrote:
> >-----Original Message-----
> >From: Dan Williams <dan.j.williams@intel.com>
> >Sent: 26 March 2025 21:33
> >Subject: Re: [PATCH v2 1/8] cxl: Add helper function to retrieve a feature entry
> >
> >shiju.jose@ wrote:
> >> From: Shiju Jose <shiju.jose@huawei.com>
> >>
> >> Add helper function to retrieve a feature entry from the supported
> >> features list, if supported.
> >>
> >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> Reviewed-by: Fan Ni <fan.ni@samsung.com>
> >> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> >> ---
> >> drivers/cxl/core/core.h | 2 ++
> >> drivers/cxl/core/features.c | 23 +++++++++++++++++++++++
> >> 2 files changed, 25 insertions(+)
> >>
> >> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> >> index 1803aedb25ca..16bc717376fc 100644
> >> --- a/drivers/cxl/core/core.h
> >> +++ b/drivers/cxl/core/core.h
> >> @@ -123,6 +123,8 @@ int cxl_ras_init(void);
> >> void cxl_ras_exit(void);
> >>
> >> #ifdef CONFIG_CXL_FEATURES
> >> +struct cxl_feat_entry *cxl_get_feature_entry(struct cxl_dev_state *cxlds,
> >> + const uuid_t *feat_uuid);
> >
> >It is unfortunate that this naming choice is too similar to cxl_get_feature().
> >However, as I go to suggest a new name I find that this is a duplicate of
> >get_support_feature_info() in Dave's fwctl series. Just drop this patch in favor of
> >that.
>
> Hi Dan,
>
> I am fine with using get_support_feature_info() for the EDAC features.
> However, this function is defined as static in the fwctl series and
So, remove static and make it a routine shared between core/features.c
and core/edac.c.
> takes struct fwctl_rpc_cxl * as input for RPC instead of uuid_t *
> as in cxl_get_feature_entry().
>
> static struct cxl_feat_entry *
> get_support_feature_info(struct cxl_features_state *cxlfs,
> const struct fwctl_rpc_cxl *rpc_in)
>
> Can you suggest how to use get_support_feature_info() from within the CXL driver
> to retrieve a supported feature entry (e.g. an EDAC feature)?
Simply refactor it into something generic that the cxl-edac and
cxl-fwctl path can share:
struct cxl_feat_entry *
cxl_feature_info(struct cxl_features_state *cxlfs, const uuid_t *uuid)
...with the header in drivers/cxl/core/core.h where other cross-core
object helpers are defined.
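Roughly like this (a sketch; the cxl_features_state and cxl_feat_entries
internals are assumed from the fwctl series):

struct cxl_feat_entry *
cxl_feature_info(struct cxl_features_state *cxlfs, const uuid_t *uuid)
{
	struct cxl_feat_entry *entry;
	int i;

	if (!cxlfs->entries)
		return ERR_PTR(-EOPNOTSUPP);

	for (i = 0, entry = cxlfs->entries->ent;
	     i < cxlfs->entries->num_features; i++, entry++)
		if (uuid_equal(uuid, &entry->uuid))
			return entry;

	return ERR_PTR(-EOPNOTSUPP);
}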
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2025-03-31 23:35 UTC | newest]
Thread overview: 24+ messages
2025-03-20 18:04 [PATCH v2 0/8] cxl: support CXL memory RAS features shiju.jose
2025-03-20 18:04 ` [PATCH v2 1/8] cxl: Add helper function to retrieve a feature entry shiju.jose
2025-03-26 21:32 ` Dan Williams
2025-03-27 16:59 ` Shiju Jose
2025-03-31 23:35 ` Dan Williams
2025-03-20 18:04 ` [PATCH v2 2/8] EDAC: Update documentation for the CXL memory patrol scrub control feature shiju.jose
2025-03-21 10:03 ` Jonathan Cameron
2025-03-24 10:37 ` Shiju Jose
2025-03-26 21:46 ` Dan Williams
2025-03-20 18:04 ` [PATCH v2 3/8] cxl/edac: Add CXL memory device " shiju.jose
2025-03-27 1:47 ` Dan Williams
2025-03-28 10:18 ` Shiju Jose
2025-03-20 18:04 ` [PATCH v2 4/8] cxl/edac: Add CXL memory device ECS " shiju.jose
2025-03-27 17:12 ` Dan Williams
2025-03-20 18:04 ` [PATCH v2 5/8] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
2025-03-27 17:23 ` Dan Williams
2025-03-20 18:04 ` [PATCH v2 6/8] cxl: Support for finding memory operation attributes from the current boot shiju.jose
2025-03-27 17:43 ` Dan Williams
2025-03-20 18:04 ` [PATCH v2 7/8] cxl/memfeature: Add CXL memory device soft PPR control feature shiju.jose
2025-03-27 17:01 ` Borislav Petkov
2025-03-27 17:08 ` Borislav Petkov
2025-03-28 13:05 ` Shiju Jose
2025-03-20 18:04 ` [PATCH v2 8/8] cxl/memfeature: Add CXL memory device memory sparing " shiju.jose
2025-03-21 7:39 ` [PATCH v2 0/8] cxl: support CXL memory RAS features Christophe Leroy