* [PATCH v2 0/3] ACPI: Add support for ACPI RAS2 feature table
@ 2025-03-05 18:02 shiju.jose
2025-03-05 18:02 ` [PATCH v2 1/3] ACPI: ACPI 6.5: RAS2: Shorten RAS2 table structure and variable names shiju.jose
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: shiju.jose @ 2025-03-05 18:02 UTC (permalink / raw)
To: linux-edac, linux-acpi, bp, tony.luck, rafael, lenb, mchehab,
leo.duran, Yazen.Ghannam
Cc: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-mm, linux-kernel, rientjes, jiaqiyan,
Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, yazen.ghannam, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add support for ACPI RAS2 feature table (RAS2) defined in the
ACPI 6.5 specification, section 5.2.21 and RAS2 HW based memory
scrubbing feature.
ACPI RAS2 patches were part of the EDAC series [1].
The code is based on ras.git: edac-cxl branch [2].
1. https://lore.kernel.org/linux-cxl/20250212143654.1893-1-shiju.jose@huawei.com/
2. https://web.git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git/log/?h=edac-cxl
Changes
=======
v1 -> v2:
1. Changes for feedbacks from Borislav.
- Shorten ACPI RAS2 structures and variables names in "include/acpi/actbl2.h".
- Shorten some of the other variables in the RAS2 drivers.
- Fixed few CamelCases.
2. Changes for feedbacks from Yazen.
- Added newline after number of '}' and return statements.
- Changed return type for "ras2_add_aux_device() to 'int'.
- Deleted a duplication of acpi_get_table("RAS2",...) in the ras2_acpi_parse_table().
- Add "FW_WARN" to few error logs in the ras2_acpi_parse_table().
- Rename ras2_acpi_init() to acpi_ras2_init() and modified to call acpi_ras2_init()
function from the acpi_init().
- Moved scrub related variables from the struct ras2_mem_ctx from patch
"ACPI:RAS2: Add ACPI RAS2 driver" to "ras: mem: Add memory ACPI RAS2 driver".
3. Rename file include/acpi/acpi_ras2.h to include/acpi/ras2.h
Shiju Jose (3):
ACPI: ACPI 6.5: RAS2: Shorten RAS2 table structure and variable names
ACPI:RAS2: Add ACPI RAS2 driver
ras: mem: Add memory ACPI RAS2 driver
Documentation/edac/scrub.rst | 73 ++++++
drivers/acpi/Kconfig | 11 +
drivers/acpi/Makefile | 1 +
drivers/acpi/bus.c | 3 +
drivers/acpi/ras2.c | 416 +++++++++++++++++++++++++++++++++++
drivers/ras/Kconfig | 11 +
drivers/ras/Makefile | 1 +
drivers/ras/acpi_ras2.c | 391 ++++++++++++++++++++++++++++++++
include/acpi/actbl2.h | 38 ++--
include/acpi/ras2.h | 54 +++++
10 files changed, 980 insertions(+), 19 deletions(-)
create mode 100755 drivers/acpi/ras2.c
create mode 100644 drivers/ras/acpi_ras2.c
create mode 100644 include/acpi/ras2.h
--
2.43.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 1/3] ACPI: ACPI 6.5: RAS2: Shorten RAS2 table structure and variable names
2025-03-05 18:02 [PATCH v2 0/3] ACPI: Add support for ACPI RAS2 feature table shiju.jose
@ 2025-03-05 18:02 ` shiju.jose
2025-03-05 18:51 ` Luck, Tony
2025-03-06 6:05 ` Jonathan Cameron
2025-03-05 18:02 ` [PATCH v2 2/3] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2025-03-05 18:02 ` [PATCH v2 3/3] ras: mem: Add memory " shiju.jose
2 siblings, 2 replies; 15+ messages in thread
From: shiju.jose @ 2025-03-05 18:02 UTC (permalink / raw)
To: linux-edac, linux-acpi, bp, tony.luck, rafael, lenb, mchehab,
leo.duran, Yazen.Ghannam
Cc: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-mm, linux-kernel, rientjes, jiaqiyan,
Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, yazen.ghannam, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Shorten RAS2 table structure and variable names.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
include/acpi/actbl2.h | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
index 2e917a8f8bca..5cfc65ba6e9e 100644
--- a/include/acpi/actbl2.h
+++ b/include/acpi/actbl2.h
@@ -2802,20 +2802,20 @@ struct acpi_ras2_pcc_desc {
/* RAS2 Platform Communication Channel Shared Memory Region */
-struct acpi_ras2_shared_memory {
+struct acpi_ras2_shmem {
u32 signature;
- u16 command;
+ u16 cmd;
u16 status;
u16 version;
u8 features[16];
- u8 set_capabilities[16];
- u16 num_parameter_blocks;
- u32 set_capabilities_status;
+ u8 set_caps[16];
+ u16 num_param_blks;
+ u32 set_caps_status;
};
/* RAS2 Parameter Block Structure for PATROL_SCRUB */
-struct acpi_ras2_parameter_block {
+struct acpi_ras2_param_blk {
u16 type;
u16 version;
u16 length;
@@ -2823,11 +2823,11 @@ struct acpi_ras2_parameter_block {
/* RAS2 Parameter Block Structure for PATROL_SCRUB */
-struct acpi_ras2_patrol_scrub_parameter {
- struct acpi_ras2_parameter_block header;
- u16 patrol_scrub_command;
- u64 requested_address_range[2];
- u64 actual_address_range[2];
+struct acpi_ras2_patrol_scrub_param {
+ struct acpi_ras2_param_blk header;
+ u16 cmd;
+ u64 req_addr_range[2];
+ u64 actl_addr_range[2];
u32 flags;
u32 scrub_params_out;
u32 scrub_params_in;
@@ -2839,12 +2839,12 @@ struct acpi_ras2_patrol_scrub_parameter {
/* RAS2 Parameter Block Structure for LA2PA_TRANSLATION */
-struct acpi_ras2_la2pa_translation_parameter {
- struct acpi_ras2_parameter_block header;
- u16 addr_translation_command;
+struct acpi_ras2_la2pa_transln_param {
+ struct acpi_ras2_param_blk header;
+ u16 cmd;
u64 sub_inst_id;
- u64 logical_address;
- u64 physical_address;
+ u64 logical_addr;
+ u64 phy_addr;
u32 status;
};
@@ -2863,7 +2863,7 @@ enum acpi_ras2_features {
/* RAS2 Patrol Scrub Commands */
-enum acpi_ras2_patrol_scrub_commands {
+enum acpi_ras2_patrol_scrub_cmds {
ACPI_RAS2_GET_PATROL_PARAMETERS = 1,
ACPI_RAS2_START_PATROL_SCRUBBER = 2,
ACPI_RAS2_STOP_PATROL_SCRUBBER = 3
@@ -2871,13 +2871,13 @@ enum acpi_ras2_patrol_scrub_commands {
/* RAS2 LA2PA Translation Commands */
-enum acpi_ras2_la2_pa_translation_commands {
+enum acpi_ras2_la2_pa_transln_cmds {
ACPI_RAS2_GET_LA2PA_TRANSLATION = 1,
};
/* RAS2 LA2PA Translation Status values */
-enum acpi_ras2_la2_pa_translation_status {
+enum acpi_ras2_la2_pa_transln_status {
ACPI_RAS2_LA2PA_TRANSLATION_SUCCESS = 0,
ACPI_RAS2_LA2PA_TRANSLATION_FAIL = 1,
};
--
2.43.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 2/3] ACPI:RAS2: Add ACPI RAS2 driver
2025-03-05 18:02 [PATCH v2 0/3] ACPI: Add support for ACPI RAS2 feature table shiju.jose
2025-03-05 18:02 ` [PATCH v2 1/3] ACPI: ACPI 6.5: RAS2: Shorten RAS2 table structure and variable names shiju.jose
@ 2025-03-05 18:02 ` shiju.jose
2025-03-06 9:19 ` Jonathan Cameron
2025-03-05 18:02 ` [PATCH v2 3/3] ras: mem: Add memory " shiju.jose
2 siblings, 1 reply; 15+ messages in thread
From: shiju.jose @ 2025-03-05 18:02 UTC (permalink / raw)
To: linux-edac, linux-acpi, bp, tony.luck, rafael, lenb, mchehab,
leo.duran, Yazen.Ghannam
Cc: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-mm, linux-kernel, rientjes, jiaqiyan,
Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, yazen.ghannam, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add support for ACPI RAS2 feature table (RAS2) defined in the
ACPI 6.5 Specification, section 5.2.21.
Driver defines RAS2 Init, which extracts the RAS2 table and driver
adds auxiliary device for each memory feature which binds to the
RAS2 memory driver.
Driver uses PCC mailbox to communicate with the ACPI HW and the
driver adds OSPM interfaces to send RAS2 commands.
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Co-developed-by: A Somasundaram <somasundaram.a@hpe.com>
Signed-off-by: A Somasundaram <somasundaram.a@hpe.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/acpi/Kconfig | 11 ++
drivers/acpi/Makefile | 1 +
drivers/acpi/bus.c | 3 +
drivers/acpi/ras2.c | 416 ++++++++++++++++++++++++++++++++++++++++++
include/acpi/ras2.h | 48 +++++
5 files changed, 479 insertions(+)
create mode 100755 drivers/acpi/ras2.c
create mode 100644 include/acpi/ras2.h
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index d81b55f5068c..bae9a47c829d 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -293,6 +293,17 @@ config ACPI_CPPC_LIB
If your platform does not support CPPC in firmware,
leave this option disabled.
+config ACPI_RAS2
+ bool "ACPI RAS2 driver"
+ select AUXILIARY_BUS
+ select MAILBOX
+ select PCC
+ help
+ The driver adds support for ACPI RAS2 feature table(extracts RAS2
+ table from OS system table) and OSPM interfaces to send RAS2
+ commands via PCC mailbox subspace. Driver adds platform device for
+ the RAS2 memory features which binds to the RAS2 memory driver.
+
config ACPI_PROCESSOR
tristate "Processor"
depends on X86 || ARM64 || LOONGARCH || RISCV
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 40208a0f5dfb..797b38cdc2f3 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -100,6 +100,7 @@ obj-$(CONFIG_ACPI_EC_DEBUGFS) += ec_sys.o
obj-$(CONFIG_ACPI_BGRT) += bgrt.o
obj-$(CONFIG_ACPI_CPPC_LIB) += cppc_acpi.o
obj-$(CONFIG_ACPI_SPCR_TABLE) += spcr.o
+obj-$(CONFIG_ACPI_RAS2) += ras2.o
obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
obj-$(CONFIG_ACPI_PPTT) += pptt.o
obj-$(CONFIG_ACPI_PFRUT) += pfr_update.o pfr_telemetry.o
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 058910af82bc..9a0a1582b8c3 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -29,6 +29,7 @@
#include <linux/acpi_viot.h>
#include <linux/pci.h>
#include <acpi/apei.h>
+#include <acpi/ras2.h>
#include <linux/suspend.h>
#include <linux/prmt.h>
@@ -1472,6 +1473,8 @@ static int __init acpi_init(void)
acpi_debugger_init();
acpi_setup_sb_notify_handler();
acpi_viot_init();
+ acpi_ras2_init();
+
return 0;
}
diff --git a/drivers/acpi/ras2.c b/drivers/acpi/ras2.c
new file mode 100755
index 000000000000..8831a2bd5fab
--- /dev/null
+++ b/drivers/acpi/ras2.c
@@ -0,0 +1,416 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Implementation of ACPI RAS2 driver.
+ *
+ * Copyright (c) 2024-2025 HiSilicon Limited.
+ *
+ * Support for RAS2 - ACPI 6.5 Specification, section 5.2.21
+ *
+ * Driver contains ACPI RAS2 init, which extracts the ACPI RAS2 table and
+ * get the PCC channel subspace for communicating with the ACPI compliant
+ * HW platform which supports ACPI RAS2. Driver adds platform devices
+ * for each RAS2 memory feature which binds to the memory ACPI RAS2 driver.
+ */
+
+#define pr_fmt(fmt) "ACPI RAS2: " fmt
+
+#include <linux/delay.h>
+#include <linux/export.h>
+#include <linux/ktime.h>
+#include <acpi/pcc.h>
+#include <acpi/ras2.h>
+
+static struct acpi_table_ras2 *__read_mostly ras2_tab;
+
+/* Data structure for PCC communication */
+struct ras2_pcc_subspace {
+ int pcc_id;
+ struct mbox_client mbox_client;
+ struct pcc_mbox_chan *pcc_chan;
+ struct acpi_ras2_shmem __iomem *comm_addr;
+ bool pcc_chnl_acq;
+ ktime_t deadline;
+ unsigned int pcc_mpar;
+ unsigned int pcc_mrtt;
+ struct list_head elem;
+ u16 ref_count;
+};
+
+/*
+ * Arbitrary Retries for PCC commands because the
+ * remote processor could be much slower to reply.
+ */
+#define RAS2_NUM_RETRIES 600
+
+#define RAS2_FEAT_TYPE_MEMORY 0x00
+
+/* Global variables for the RAS2 PCC subspaces */
+static DEFINE_MUTEX(ras2_pcc_lock);
+static LIST_HEAD(ras2_pcc_subspaces);
+
+static int ras2_report_cap_error(u32 cap_status)
+{
+ switch (cap_status) {
+ case ACPI_RAS2_NOT_VALID:
+ case ACPI_RAS2_NOT_SUPPORTED:
+ return -EPERM;
+ case ACPI_RAS2_BUSY:
+ return -EBUSY;
+ case ACPI_RAS2_FAILED:
+ case ACPI_RAS2_ABORTED:
+ case ACPI_RAS2_INVALID_DATA:
+ return -EINVAL;
+ default: /* 0 or other, Success */
+ return 0;
+ }
+}
+
+static int ras2_check_pcc_chan(struct ras2_pcc_subspace *pcc_subspace)
+{
+ struct acpi_ras2_shmem __iomem *gen_comm_base = pcc_subspace->comm_addr;
+ ktime_t next_deadline = ktime_add(ktime_get(), pcc_subspace->deadline);
+ u32 cap_status;
+ u16 status;
+ u32 rc;
+
+ while (!ktime_after(ktime_get(), next_deadline)) {
+ /*
+ * As per ACPI spec, the PCC space will be initialized by
+ * platform and should have set the command completion bit when
+ * PCC can be used by OSPM
+ */
+ status = readw_relaxed(&gen_comm_base->status);
+ if (status & RAS2_PCC_CMD_ERROR) {
+ cap_status = readw_relaxed(&gen_comm_base->set_caps_status);
+ rc = ras2_report_cap_error(cap_status);
+
+ status &= ~RAS2_PCC_CMD_ERROR;
+ writew_relaxed(status, &gen_comm_base->status);
+ return rc;
+ }
+
+ if (status & RAS2_PCC_CMD_COMPLETE)
+ return 0;
+ /*
+ * Reducing the bus traffic in case this loop takes longer than
+ * a few retries.
+ */
+ msleep(10);
+ }
+
+ return -EIO;
+}
+
+/**
+ * ras2_send_pcc_cmd() - Send RAS2 command via PCC channel
+ * @ras2_ctx: pointer to the RAS2 context structure
+ * @cmd: command to send
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+int ras2_send_pcc_cmd(struct ras2_mem_ctx *ras2_ctx, u16 cmd)
+{
+ struct ras2_pcc_subspace *pcc_subspace = ras2_ctx->pcc_subspace;
+ struct acpi_ras2_shmem *gen_comm_base = pcc_subspace->comm_addr;
+ static ktime_t last_cmd_cmpl_time, last_mpar_reset;
+ struct mbox_chan *pcc_channel;
+ unsigned int time_delta;
+ static int mpar_count;
+ int rc;
+
+ guard(mutex)(&ras2_pcc_lock);
+ rc = ras2_check_pcc_chan(pcc_subspace);
+ if (rc < 0)
+ return rc;
+
+ pcc_channel = pcc_subspace->pcc_chan->mchan;
+
+ /*
+ * Handle the Minimum Request Turnaround Time(MRTT)
+ * "The minimum amount of time that OSPM must wait after the completion
+ * of a command before issuing the next command, in microseconds"
+ */
+ if (pcc_subspace->pcc_mrtt) {
+ time_delta = ktime_us_delta(ktime_get(), last_cmd_cmpl_time);
+ if (pcc_subspace->pcc_mrtt > time_delta)
+ udelay(pcc_subspace->pcc_mrtt - time_delta);
+ }
+
+ /*
+ * Handle the non-zero Maximum Periodic Access Rate(MPAR)
+ * "The maximum number of periodic requests that the subspace channel can
+ * support, reported in commands per minute. 0 indicates no limitation."
+ *
+ * This parameter should be ideally zero or large enough so that it can
+ * handle maximum number of requests that all the cores in the system can
+ * collectively generate. If it is not, we will follow the spec and just
+ * not send the request to the platform after hitting the MPAR limit in
+ * any 60s window
+ */
+ if (pcc_subspace->pcc_mpar) {
+ if (mpar_count == 0) {
+ time_delta = ktime_ms_delta(ktime_get(), last_mpar_reset);
+ if (time_delta < 60 * MSEC_PER_SEC) {
+ dev_dbg(ras2_ctx->dev,
+ "PCC cmd not sent due to MPAR limit");
+ return -EIO;
+ }
+ last_mpar_reset = ktime_get();
+ mpar_count = pcc_subspace->pcc_mpar;
+ }
+ mpar_count--;
+ }
+
+ /* Write to the shared comm region. */
+ writew_relaxed(cmd, &gen_comm_base->cmd);
+
+ /* Flip CMD COMPLETE bit */
+ writew_relaxed(0, &gen_comm_base->status);
+
+ /* Ring doorbell */
+ rc = mbox_send_message(pcc_channel, &cmd);
+ if (rc < 0) {
+ dev_err(ras2_ctx->dev,
+ "Err sending PCC mbox message. cmd:%d, rc:%d\n",
+ cmd, rc);
+ return rc;
+ }
+
+ /*
+ * If Minimum Request Turnaround Time is non-zero, we need
+ * to record the completion time of both READ and WRITE
+ * command for proper handling of MRTT, so we need to check
+ * for pcc_mrtt in addition to CMD_READ
+ */
+ if (cmd == RAS2_PCC_CMD_EXEC || pcc_subspace->pcc_mrtt) {
+ rc = ras2_check_pcc_chan(pcc_subspace);
+ if (pcc_subspace->pcc_mrtt)
+ last_cmd_cmpl_time = ktime_get();
+ }
+
+ if (pcc_channel->mbox->txdone_irq)
+ mbox_chan_txdone(pcc_channel, rc);
+ else
+ mbox_client_txdone(pcc_channel, rc);
+
+ return rc >= 0 ? 0 : rc;
+}
+EXPORT_SYMBOL_GPL(ras2_send_pcc_cmd);
+
+static int ras2_register_pcc_channel(struct ras2_mem_ctx *ras2_ctx, int pcc_id)
+{
+ struct ras2_pcc_subspace *pcc_subspace;
+ struct pcc_mbox_chan *pcc_chan;
+ struct mbox_client *mbox_cl;
+
+ if (pcc_id < 0)
+ return -EINVAL;
+
+ mutex_lock(&ras2_pcc_lock);
+ list_for_each_entry(pcc_subspace, &ras2_pcc_subspaces, elem) {
+ if (pcc_subspace->pcc_id != pcc_id)
+ continue;
+ ras2_ctx->pcc_subspace = pcc_subspace;
+ pcc_subspace->ref_count++;
+ mutex_unlock(&ras2_pcc_lock);
+ return 0;
+ }
+ mutex_unlock(&ras2_pcc_lock);
+
+ pcc_subspace = kcalloc(1, sizeof(*pcc_subspace), GFP_KERNEL);
+ if (!pcc_subspace)
+ return -ENOMEM;
+
+ mbox_cl = &pcc_subspace->mbox_client;
+ mbox_cl->knows_txdone = true;
+
+ pcc_chan = pcc_mbox_request_channel(mbox_cl, pcc_id);
+ if (IS_ERR(pcc_chan)) {
+ kfree(pcc_subspace);
+ return PTR_ERR(pcc_chan);
+ }
+
+ *pcc_subspace = (struct ras2_pcc_subspace) {
+ .pcc_id = pcc_id,
+ .pcc_chan = pcc_chan,
+ .comm_addr = acpi_os_ioremap(pcc_chan->shmem_base_addr,
+ pcc_chan->shmem_size),
+ .deadline = ns_to_ktime(RAS2_NUM_RETRIES *
+ pcc_chan->latency *
+ NSEC_PER_USEC),
+ .pcc_mrtt = pcc_chan->min_turnaround_time,
+ .pcc_mpar = pcc_chan->max_access_rate,
+ .mbox_client = {
+ .knows_txdone = true,
+ },
+ .pcc_chnl_acq = true,
+ };
+
+ mutex_lock(&ras2_pcc_lock);
+ list_add(&pcc_subspace->elem, &ras2_pcc_subspaces);
+ pcc_subspace->ref_count++;
+ mutex_unlock(&ras2_pcc_lock);
+ ras2_ctx->pcc_subspace = pcc_subspace;
+ ras2_ctx->comm_addr = pcc_subspace->comm_addr;
+ ras2_ctx->dev = pcc_chan->mchan->mbox->dev;
+
+ return 0;
+}
+
+static DEFINE_IDA(ras2_ida);
+static void ras2_remove_pcc(struct ras2_mem_ctx *ras2_ctx)
+{
+ struct ras2_pcc_subspace *pcc_subspace = ras2_ctx->pcc_subspace;
+
+ guard(mutex)(&ras2_pcc_lock);
+ if (pcc_subspace->ref_count > 0)
+ pcc_subspace->ref_count--;
+
+ if (!pcc_subspace->ref_count) {
+ list_del(&pcc_subspace->elem);
+ pcc_mbox_free_channel(pcc_subspace->pcc_chan);
+ kfree(pcc_subspace);
+ }
+}
+
+static void ras2_release(struct device *device)
+{
+ struct auxiliary_device *auxdev = container_of(device, struct auxiliary_device, dev);
+ struct ras2_mem_ctx *ras2_ctx = container_of(auxdev, struct ras2_mem_ctx, adev);
+
+ ida_free(&ras2_ida, auxdev->id);
+ ras2_remove_pcc(ras2_ctx);
+ kfree(ras2_ctx);
+}
+
+static int ras2_add_aux_device(char *name, int channel)
+{
+ struct ras2_mem_ctx *ras2_ctx;
+ int id, rc;
+
+ ras2_ctx = kzalloc(sizeof(*ras2_ctx), GFP_KERNEL);
+ if (!ras2_ctx)
+ return -ENOMEM;
+
+ mutex_init(&ras2_ctx->lock);
+
+ rc = ras2_register_pcc_channel(ras2_ctx, channel);
+ if (rc < 0) {
+ pr_debug("failed to register pcc channel rc=%d\n", rc);
+ goto ctx_free;
+ }
+
+ id = ida_alloc(&ras2_ida, GFP_KERNEL);
+ if (id < 0) {
+ rc = id;
+ goto pcc_free;
+ }
+
+ ras2_ctx->id = id;
+ ras2_ctx->adev.id = id;
+ ras2_ctx->adev.name = RAS2_MEM_DEV_ID_NAME;
+ ras2_ctx->adev.dev.release = ras2_release;
+ ras2_ctx->adev.dev.parent = ras2_ctx->dev;
+
+ rc = auxiliary_device_init(&ras2_ctx->adev);
+ if (rc)
+ goto ida_free;
+
+ rc = auxiliary_device_add(&ras2_ctx->adev);
+ if (rc) {
+ auxiliary_device_uninit(&ras2_ctx->adev);
+ return rc;
+ }
+
+ return 0;
+
+ida_free:
+ ida_free(&ras2_ida, id);
+pcc_free:
+ ras2_remove_pcc(ras2_ctx);
+ctx_free:
+ kfree(ras2_ctx);
+
+ return rc;
+}
+
+static int acpi_ras2_parse(void)
+{
+ struct acpi_ras2_pcc_desc *pcc_desc_list;
+ int pcc_id;
+ u8 count = 0;
+ int rc, i;
+
+ if (ras2_tab->header.length < sizeof(struct acpi_table_ras2)) {
+ pr_warn(FW_WARN "ACPI RAS2 table present but broken (too short #1)\n");
+ return -EINVAL;
+ }
+
+ if (!ras2_tab->num_pcc_descs) {
+ pr_warn(FW_WARN "No PCC descs in ACPI RAS2 table\n");
+ return -EINVAL;
+ }
+
+ pcc_desc_list = (struct acpi_ras2_pcc_desc *)(ras2_tab + 1);
+ /* Double scan for the case of only one actual controller */
+ pcc_id = -1;
+ count = 0;
+ for (i = 0; i < ras2_tab->num_pcc_descs; i++, pcc_desc_list++) {
+ if (pcc_desc_list->feature_type != RAS2_FEAT_TYPE_MEMORY)
+ continue;
+ if (pcc_id == -1) {
+ pcc_id = pcc_desc_list->channel_id;
+ count++;
+ }
+ if (pcc_desc_list->channel_id != pcc_id)
+ count++;
+ }
+
+ /*
+ * Workaround for the client platform with multiple scrub devices
+ * but uses single PCC subspace for communication.
+ */
+ if (count == 1) {
+ /* Add auxiliary device and bind ACPI RAS2 memory driver */
+ rc = ras2_add_aux_device(RAS2_MEM_DEV_ID_NAME, pcc_id);
+ if (rc)
+ return rc;
+
+ return 0;
+ }
+
+ pcc_desc_list = (struct acpi_ras2_pcc_desc *)(ras2_tab + 1);
+ count = 0;
+ for (i = 0; i < ras2_tab->num_pcc_descs; i++, pcc_desc_list++) {
+ if (pcc_desc_list->feature_type != RAS2_FEAT_TYPE_MEMORY)
+ continue;
+ pcc_id = pcc_desc_list->channel_id;
+ /* Add auxiliary device and bind ACPI RAS2 memory driver */
+ rc = ras2_add_aux_device(RAS2_MEM_DEV_ID_NAME, pcc_id);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
+
+void __init acpi_ras2_init(void)
+{
+ acpi_status status;
+ int rc;
+
+ status = acpi_get_table(ACPI_SIG_RAS2, 0,
+ (struct acpi_table_header **)&ras2_tab);
+ if (ACPI_FAILURE(status) || !ras2_tab) {
+ const char *msg = acpi_format_exception(status);
+
+ pr_err("Failed to get table, %s\n", msg);
+ return;
+ }
+
+ rc = acpi_ras2_parse();
+ if (rc) {
+ acpi_put_table((struct acpi_table_header *)ras2_tab);
+ pr_err("Failed to parse RAS2 table\n");
+ }
+}
diff --git a/include/acpi/ras2.h b/include/acpi/ras2.h
new file mode 100644
index 000000000000..5b27c1f30096
--- /dev/null
+++ b/include/acpi/ras2.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * ACPI RAS2 driver header file
+ *
+ * Copyright (c) 2024-2025 HiSilicon Limited
+ */
+
+#ifndef _ACPI_RAS2_H
+#define _ACPI_RAS2_H
+
+#include <linux/acpi.h>
+#include <linux/auxiliary_bus.h>
+#include <linux/mailbox_client.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+#define RAS2_PCC_CMD_COMPLETE BIT(0)
+#define RAS2_PCC_CMD_ERROR BIT(2)
+
+/* RAS2 specific PCC commands */
+#define RAS2_PCC_CMD_EXEC 0x01
+
+#define RAS2_AUX_DEV_NAME "ras2"
+#define RAS2_MEM_DEV_ID_NAME "acpi_ras2_mem"
+
+/* Data structure RAS2 table */
+struct ras2_mem_ctx {
+ struct auxiliary_device adev;
+ /* Lock to provide mutually exclusive access to PCC channel */
+ struct mutex lock;
+ struct device *dev;
+ struct acpi_ras2_shmem __iomem *comm_addr;
+ void *pcc_subspace;
+ int id;
+};
+
+#ifdef CONFIG_ACPI_RAS2
+void __init acpi_ras2_init(void);
+int ras2_send_pcc_cmd(struct ras2_mem_ctx *ras2_ctx, u16 cmd);
+#else
+static inline void acpi_ras2_init(void) { }
+
+static inline int ras2_send_pcc_cmd(struct ras2_mem_ctx *ras2_ctx, u16 cmd)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+#endif /* _ACPI_RAS2_H */
--
2.43.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver
2025-03-05 18:02 [PATCH v2 0/3] ACPI: Add support for ACPI RAS2 feature table shiju.jose
2025-03-05 18:02 ` [PATCH v2 1/3] ACPI: ACPI 6.5: RAS2: Shorten RAS2 table structure and variable names shiju.jose
2025-03-05 18:02 ` [PATCH v2 2/3] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
@ 2025-03-05 18:02 ` shiju.jose
2025-03-06 9:32 ` Jonathan Cameron
2025-03-07 21:51 ` Daniel Ferguson
2 siblings, 2 replies; 15+ messages in thread
From: shiju.jose @ 2025-03-05 18:02 UTC (permalink / raw)
To: linux-edac, linux-acpi, bp, tony.luck, rafael, lenb, mchehab,
leo.duran, Yazen.Ghannam
Cc: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-mm, linux-kernel, rientjes, jiaqiyan,
Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, yazen.ghannam, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Memory ACPI RAS2 auxiliary driver binds to the auxiliary device
add by the ACPI RAS2 table parser.
Driver uses a PCC subspace for communicating with the ACPI compliant
platform.
Device with ACPI RAS2 scrub feature registers with EDAC device driver,
which retrieves the scrub descriptor from EDAC scrub and exposes
the scrub control attributes for RAS2 scrub instance to userspace in
/sys/bus/edac/devices/acpi_ras_mem0/scrubX/.
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
Documentation/edac/scrub.rst | 73 +++++++
drivers/ras/Kconfig | 11 +
drivers/ras/Makefile | 1 +
drivers/ras/acpi_ras2.c | 391 +++++++++++++++++++++++++++++++++++
include/acpi/ras2.h | 6 +
5 files changed, 482 insertions(+)
create mode 100644 drivers/ras/acpi_ras2.c
diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
index daab929cdba1..fc8dcbd13f91 100644
--- a/Documentation/edac/scrub.rst
+++ b/Documentation/edac/scrub.rst
@@ -264,3 +264,76 @@ Sysfs files are documented in
`Documentation/ABI/testing/sysfs-edac-scrub`
`Documentation/ABI/testing/sysfs-edac-ecs`
+
+Examples
+--------
+
+The usage takes the form shown in these examples:
+
+1. ACPI RAS2
+
+1.1 On demand scrubbing for a specific memory region.
+
+1.1.1. Query what is device default/current scrub cycle setting.
+
+ Applicable to both on-demand and background scrubbing.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_duration
+
+36000
+
+1.1.2 Query the range of device supported scrub cycle for a memory region.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/min_cycle_duration
+
+3600
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/max_cycle_duration
+
+86400
+
+1.1.3. Program scrubbing for the memory region in RAS2 device to repeat every 43200 seconds (half a day).
+
+# echo 43200 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_duration
+
+1.1.4. Program address and size of the memory region to scrub
+
+Readback 'addr', non-zero - demand scrub is in progress, zero - scrub is finished.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/addr
+
+0
+
+Write 'size' of the memory region to scrub.
+
+# echo 0x300000 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/size
+
+Write 'addr' starts demand scrubbing, please make sure other attributes are set prior to that.
+
+# echo 0x200000 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/addr
+
+Readback 'addr', non-zero - demand scrub is in progress, zero - scrub is finished.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/addr
+
+0x200000
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/addr
+
+0
+
+1.2 Background scrubbing the entire memory
+
+1.2.3 Query the status of background scrubbing.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/enable_background
+
+0
+
+1.2.4. Program background scrubbing for RAS2 device to repeat in every 21600 seconds (quarter of a day).
+
+# echo 21600 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_duration
+
+1.2.5. Start 'background scrubbing'.
+
+# echo 1 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/enable_background
diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
index fc4f4bb94a4c..a88002f1f462 100644
--- a/drivers/ras/Kconfig
+++ b/drivers/ras/Kconfig
@@ -46,4 +46,15 @@ config RAS_FMPM
Memory will be retired during boot time and run time depending on
platform-specific policies.
+config MEM_ACPI_RAS2
+ tristate "Memory ACPI RAS2 driver"
+ depends on ACPI_RAS2
+ depends on EDAC
+ depends on EDAC_SCRUB
+ help
+ The driver binds to the platform device added by the ACPI RAS2
+ table parser. Use a PCC channel subspace for communicating with
+ the ACPI compliant platform to provide control of memory scrub
+ parameters to the user via the EDAC scrub.
+
endif
diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
index 11f95d59d397..a0e6e903d6b0 100644
--- a/drivers/ras/Makefile
+++ b/drivers/ras/Makefile
@@ -2,6 +2,7 @@
obj-$(CONFIG_RAS) += ras.o
obj-$(CONFIG_DEBUG_FS) += debugfs.o
obj-$(CONFIG_RAS_CEC) += cec.o
+obj-$(CONFIG_MEM_ACPI_RAS2) += acpi_ras2.o
obj-$(CONFIG_RAS_FMPM) += amd/fmpm.o
obj-y += amd/atl/
diff --git a/drivers/ras/acpi_ras2.c b/drivers/ras/acpi_ras2.c
new file mode 100644
index 000000000000..2f9317aa7b81
--- /dev/null
+++ b/drivers/ras/acpi_ras2.c
@@ -0,0 +1,391 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * ACPI RAS2 memory driver
+ *
+ * Copyright (c) 2024-2025 HiSilicon Limited.
+ *
+ */
+
+#define pr_fmt(fmt) "ACPI RAS2 MEMORY: " fmt
+
+#include <linux/bitfield.h>
+#include <linux/edac.h>
+#include <linux/platform_device.h>
+#include <acpi/ras2.h>
+
+#define RAS2_DEV_NUM_RAS_FEATURES 1
+
+#define RAS2_SUPPORT_HW_PARTOL_SCRUB BIT(0)
+#define RAS2_TYPE_PATROL_SCRUB 0x0000
+
+#define RAS2_GET_PATROL_PARAMETERS 0x01
+#define RAS2_START_PATROL_SCRUBBER 0x02
+#define RAS2_STOP_PATROL_SCRUBBER 0x03
+
+/*
+ * RAS2 patrol scrub
+ */
+#define RAS2_PS_SC_HRS_IN_MASK GENMASK(15, 8)
+#define RAS2_PS_EN_BACKGROUND BIT(0)
+#define RAS2_PS_SC_HRS_OUT_MASK GENMASK(7, 0)
+#define RAS2_PS_MIN_SC_HRS_OUT_MASK GENMASK(15, 8)
+#define RAS2_PS_MAX_SC_HRS_OUT_MASK GENMASK(23, 16)
+#define RAS2_PS_FLAG_SCRUB_RUNNING BIT(0)
+
+#define RAS2_SCRUB_NAME_LEN 128
+#define RAS2_HOUR_IN_SECS 3600
+
+struct acpi_ras2_ps_shared_mem {
+ struct acpi_ras2_shmem common;
+ struct acpi_ras2_patrol_scrub_param params;
+};
+
+static int ras2_is_patrol_scrub_support(struct ras2_mem_ctx *ras2_ctx)
+{
+ struct acpi_ras2_shmem __iomem *common = (void *)ras2_ctx->comm_addr;
+
+ guard(mutex)(&ras2_ctx->lock);
+ common->set_caps[0] = 0;
+
+ return common->features[0] & RAS2_SUPPORT_HW_PARTOL_SCRUB;
+}
+
+static int ras2_update_patrol_scrub_params_cache(struct ras2_mem_ctx *ras2_ctx)
+{
+ struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
+ (void *)ras2_ctx->comm_addr;
+ int ret;
+
+ ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+ ps_sm->params.cmd = RAS2_GET_PATROL_PARAMETERS;
+
+ ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+ if (ret) {
+ dev_err(ras2_ctx->dev, "failed to read parameters\n");
+ return ret;
+ }
+
+ ras2_ctx->min_scrub_cycle = FIELD_GET(RAS2_PS_MIN_SC_HRS_OUT_MASK,
+ ps_sm->params.scrub_params_out);
+ ras2_ctx->max_scrub_cycle = FIELD_GET(RAS2_PS_MAX_SC_HRS_OUT_MASK,
+ ps_sm->params.scrub_params_out);
+ if (!ras2_ctx->bg_scrub) {
+ ras2_ctx->base = ps_sm->params.actl_addr_range[0];
+ ras2_ctx->size = ps_sm->params.actl_addr_range[1];
+ }
+
+ ras2_ctx->scrub_cycle_hrs = FIELD_GET(RAS2_PS_SC_HRS_OUT_MASK,
+ ps_sm->params.scrub_params_out);
+
+ return 0;
+}
+
+/* Context - lock must be held */
+static int ras2_get_patrol_scrub_running(struct ras2_mem_ctx *ras2_ctx,
+ bool *running)
+{
+ struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
+ (void *)ras2_ctx->comm_addr;
+ int ret;
+
+ ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+ ps_sm->params.cmd = RAS2_GET_PATROL_PARAMETERS;
+
+ ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+ if (ret) {
+ dev_err(ras2_ctx->dev, "failed to read parameters\n");
+ return ret;
+ }
+
+ *running = ps_sm->params.flags & RAS2_PS_FLAG_SCRUB_RUNNING;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_read_min_scrub_cycle(struct device *dev, void *drv_data,
+ u32 *min)
+{
+ struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+ *min = ras2_ctx->min_scrub_cycle * RAS2_HOUR_IN_SECS;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_read_max_scrub_cycle(struct device *dev, void *drv_data,
+ u32 *max)
+{
+ struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+ *max = ras2_ctx->max_scrub_cycle * RAS2_HOUR_IN_SECS;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_cycle_read(struct device *dev, void *drv_data,
+ u32 *scrub_cycle_secs)
+{
+ struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+ *scrub_cycle_secs = ras2_ctx->scrub_cycle_hrs * RAS2_HOUR_IN_SECS;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_cycle_write(struct device *dev, void *drv_data,
+ u32 scrub_cycle_secs)
+{
+ u8 scrub_cycle_hrs = scrub_cycle_secs / RAS2_HOUR_IN_SECS;
+ struct ras2_mem_ctx *ras2_ctx = drv_data;
+ bool running;
+ int ret;
+
+ guard(mutex)(&ras2_ctx->lock);
+ ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+ if (ret)
+ return ret;
+
+ if (running)
+ return -EBUSY;
+
+ if (scrub_cycle_hrs < ras2_ctx->min_scrub_cycle ||
+ scrub_cycle_hrs > ras2_ctx->max_scrub_cycle)
+ return -EINVAL;
+
+ ras2_ctx->scrub_cycle_hrs = scrub_cycle_hrs;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_read_addr(struct device *dev, void *drv_data, u64 *base)
+{
+ struct ras2_mem_ctx *ras2_ctx = drv_data;
+ int ret;
+
+ /*
+ * When BG scrubbing is enabled the actual address range is not valid.
+ * Return -EBUSY now unless find out a method to retrieve actual full PA range.
+ */
+ if (ras2_ctx->bg_scrub)
+ return -EBUSY;
+
+ /*
+ * When demand scrubbing is finished firmware must reset actual
+ * address range to 0. Otherwise userspace assumes demand scrubbing
+ * is in progress.
+ */
+ ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+ if (ret)
+ return ret;
+
+ *base = ras2_ctx->base;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_read_size(struct device *dev, void *drv_data, u64 *size)
+{
+ struct ras2_mem_ctx *ras2_ctx = drv_data;
+ int ret;
+
+ if (ras2_ctx->bg_scrub)
+ return -EBUSY;
+
+ ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+ if (ret)
+ return ret;
+
+ *size = ras2_ctx->size;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_write_addr(struct device *dev, void *drv_data, u64 base)
+{
+ struct ras2_mem_ctx *ras2_ctx = drv_data;
+ struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
+ (void *)ras2_ctx->comm_addr;
+ bool running;
+ int ret;
+
+ guard(mutex)(&ras2_ctx->lock);
+ ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+ if (ras2_ctx->bg_scrub)
+ return -EBUSY;
+
+ if (!base || !ras2_ctx->size) {
+ dev_warn(ras2_ctx->dev,
+ "%s: Invalid address range, base=0x%llx "
+ "size=0x%llx\n", __func__,
+ base, ras2_ctx->size);
+ return -ERANGE;
+ }
+
+ ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+ if (ret)
+ return ret;
+
+ if (running)
+ return -EBUSY;
+
+ ps_sm->params.scrub_params_in &= ~RAS2_PS_SC_HRS_IN_MASK;
+ ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PS_SC_HRS_IN_MASK,
+ ras2_ctx->scrub_cycle_hrs);
+ ps_sm->params.req_addr_range[0] = base;
+ ps_sm->params.req_addr_range[1] = ras2_ctx->size;
+ ps_sm->params.scrub_params_in &= ~RAS2_PS_EN_BACKGROUND;
+ ps_sm->params.cmd = RAS2_START_PATROL_SCRUBBER;
+
+ ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+ if (ret) {
+ dev_err(ras2_ctx->dev, "Failed to start demand scrubbing\n");
+ return ret;
+ }
+
+ return ras2_update_patrol_scrub_params_cache(ras2_ctx);
+}
+
+static int ras2_hw_scrub_write_size(struct device *dev, void *drv_data, u64 size)
+{
+ struct ras2_mem_ctx *ras2_ctx = drv_data;
+ bool running;
+ int ret;
+
+ guard(mutex)(&ras2_ctx->lock);
+ ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+ if (ret)
+ return ret;
+
+ if (running)
+ return -EBUSY;
+
+ if (!size) {
+ dev_warn(dev, "%s: Invalid address range size=0x%llx\n",
+ __func__, size);
+ return -EINVAL;
+ }
+
+ ras2_ctx->size = size;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_set_enabled_bg(struct device *dev, void *drv_data, bool enable)
+{
+ struct ras2_mem_ctx *ras2_ctx = drv_data;
+ struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
+ (void *)ras2_ctx->comm_addr;
+ bool running;
+ int ret;
+
+ guard(mutex)(&ras2_ctx->lock);
+ ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+ ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+ if (ret)
+ return ret;
+
+ if (enable) {
+ if (ras2_ctx->bg_scrub || running)
+ return -EBUSY;
+ ps_sm->params.req_addr_range[0] = 0;
+ ps_sm->params.req_addr_range[1] = 0;
+ ps_sm->params.scrub_params_in &= ~RAS2_PS_SC_HRS_IN_MASK;
+ ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PS_SC_HRS_IN_MASK,
+ ras2_ctx->scrub_cycle_hrs);
+ ps_sm->params.cmd = RAS2_START_PATROL_SCRUBBER;
+ } else {
+ if (!ras2_ctx->bg_scrub)
+ return -EPERM;
+ ps_sm->params.cmd = RAS2_STOP_PATROL_SCRUBBER;
+ }
+
+ ps_sm->params.scrub_params_in &= ~RAS2_PS_EN_BACKGROUND;
+ ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PS_EN_BACKGROUND,
+ enable);
+ ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+ if (ret) {
+ dev_err(ras2_ctx->dev, "Failed to %s background scrubbing\n",
+ str_enable_disable(enable));
+ return ret;
+ }
+
+ if (enable) {
+ ras2_ctx->bg_scrub = true;
+ /* Update the cache to account for rounding of supplied parameters and similar */
+ ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+ } else {
+ ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+ ras2_ctx->bg_scrub = false;
+ }
+
+ return ret;
+}
+
+static int ras2_hw_scrub_get_enabled_bg(struct device *dev, void *drv_data, bool *enabled)
+{
+ struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+ *enabled = ras2_ctx->bg_scrub;
+
+ return 0;
+}
+
+static const struct edac_scrub_ops ras2_scrub_ops = {
+ .read_addr = ras2_hw_scrub_read_addr,
+ .read_size = ras2_hw_scrub_read_size,
+ .write_addr = ras2_hw_scrub_write_addr,
+ .write_size = ras2_hw_scrub_write_size,
+ .get_enabled_bg = ras2_hw_scrub_get_enabled_bg,
+ .set_enabled_bg = ras2_hw_scrub_set_enabled_bg,
+ .get_min_cycle = ras2_hw_scrub_read_min_scrub_cycle,
+ .get_max_cycle = ras2_hw_scrub_read_max_scrub_cycle,
+ .get_cycle_duration = ras2_hw_scrub_cycle_read,
+ .set_cycle_duration = ras2_hw_scrub_cycle_write,
+};
+
+static int ras2_probe(struct auxiliary_device *auxdev,
+ const struct auxiliary_device_id *id)
+{
+ struct ras2_mem_ctx *ras2_ctx = container_of(auxdev, struct ras2_mem_ctx, adev);
+ struct edac_dev_feature ras_features[RAS2_DEV_NUM_RAS_FEATURES];
+ char scrub_name[RAS2_SCRUB_NAME_LEN];
+ int num_ras_features = 0;
+ int ret;
+
+ if (!ras2_is_patrol_scrub_support(ras2_ctx))
+ return -EOPNOTSUPP;
+
+ ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+ if (ret)
+ return ret;
+
+ snprintf(scrub_name, sizeof(scrub_name), "acpi_ras_mem%d",
+ ras2_ctx->id);
+
+ ras_features[num_ras_features].ft_type = RAS_FEAT_SCRUB;
+ ras_features[num_ras_features].instance = ras2_ctx->instance;
+ ras_features[num_ras_features].scrub_ops = &ras2_scrub_ops;
+ ras_features[num_ras_features].ctx = ras2_ctx;
+ num_ras_features++;
+
+ return edac_dev_register(&auxdev->dev, scrub_name, NULL,
+ num_ras_features, ras_features);
+}
+
+static const struct auxiliary_device_id ras2_mem_dev_id_table[] = {
+ { .name = RAS2_AUX_DEV_NAME "." RAS2_MEM_DEV_ID_NAME, },
+ { }
+};
+
+MODULE_DEVICE_TABLE(auxiliary, ras2_mem_dev_id_table);
+
+static struct auxiliary_driver ras2_mem_driver = {
+ .name = RAS2_MEM_DEV_ID_NAME,
+ .probe = ras2_probe,
+ .id_table = ras2_mem_dev_id_table,
+};
+module_auxiliary_driver(ras2_mem_driver);
+
+MODULE_IMPORT_NS("ACPI_RAS2");
+MODULE_DESCRIPTION("ACPI RAS2 memory driver");
+MODULE_LICENSE("GPL");
diff --git a/include/acpi/ras2.h b/include/acpi/ras2.h
index 5b27c1f30096..c9a6b63745dc 100644
--- a/include/acpi/ras2.h
+++ b/include/acpi/ras2.h
@@ -31,7 +31,13 @@ struct ras2_mem_ctx {
struct device *dev;
struct acpi_ras2_shmem __iomem *comm_addr;
void *pcc_subspace;
+ u64 base, size;
int id;
+ u8 instance;
+ u8 scrub_cycle_hrs;
+ u8 min_scrub_cycle;
+ u8 max_scrub_cycle;
+ bool bg_scrub;
};
#ifdef CONFIG_ACPI_RAS2
--
2.43.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH v2 1/3] ACPI: ACPI 6.5: RAS2: Shorten RAS2 table structure and variable names
2025-03-05 18:02 ` [PATCH v2 1/3] ACPI: ACPI 6.5: RAS2: Shorten RAS2 table structure and variable names shiju.jose
@ 2025-03-05 18:51 ` Luck, Tony
2025-03-06 2:03 ` Jonathan Cameron
2025-03-06 6:05 ` Jonathan Cameron
1 sibling, 1 reply; 15+ messages in thread
From: Luck, Tony @ 2025-03-05 18:51 UTC (permalink / raw)
To: shiju.jose, linux-edac, linux-acpi, bp, rafael, lenb, mchehab,
leo.duran, Yazen.Ghannam, rafael
Cc: linux-cxl, Williams, Dan J, dave, jonathan.cameron, Jiang, Dave,
Schofield, Alison, Verma, Vishal L, Weiny, Ira, david,
Vilas.Sridharan, linux-mm, linux-kernel, rientjes, jiaqiyan,
Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
Somasundaram A, Aktas, Erdem, pgonda, duenwen, gthelen,
wschwartz, dferguson, wbs, nifan.cxl, yazen.ghannam, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm
[+Rafael]
> include/acpi/actbl2.h | 38 +++++++++++++++++++-------------------
> 1 file changed, 19 insertions(+), 19 deletions(-)
This file is (somewhat) automatically generated from the ACPICA github repository.
https://github.com/acpica/acpica
I'm not sure how much divergence from the original is allowed.
-Tony
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/3] ACPI: ACPI 6.5: RAS2: Shorten RAS2 table structure and variable names
2025-03-05 18:51 ` Luck, Tony
@ 2025-03-06 2:03 ` Jonathan Cameron
0 siblings, 0 replies; 15+ messages in thread
From: Jonathan Cameron @ 2025-03-06 2:03 UTC (permalink / raw)
To: Luck, Tony
Cc: shiju.jose, linux-edac, linux-acpi, bp, rafael, lenb, mchehab,
leo.duran, Yazen.Ghannam, linux-cxl, Williams, Dan J, dave,
Jiang, Dave, Schofield, Alison, Verma, Vishal L, Weiny, Ira,
david, Vilas.Sridharan, linux-mm, linux-kernel, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, Somasundaram A, Aktas, Erdem, pgonda, duenwen,
gthelen, wschwartz, dferguson, wbs, nifan.cxl, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm
On Wed, 5 Mar 2025 18:51:57 +0000
"Luck, Tony" <tony.luck@intel.com> wrote:
> [+Rafael]
Now he gets two copies :)
>
> > include/acpi/actbl2.h | 38 +++++++++++++++++++-------------------
> > 1 file changed, 19 insertions(+), 19 deletions(-)
>
> This file is (somewhat) automatically generated from the ACPICA github repository.
>
> https://github.com/acpica/acpica
>
> I'm not sure how much divergence from the original is allowed.
Yup, this was basically fishing for Rafael to tell us what we can
get away with or not.
Thanks
Jonathan
>
> -Tony
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/3] ACPI: ACPI 6.5: RAS2: Shorten RAS2 table structure and variable names
2025-03-05 18:02 ` [PATCH v2 1/3] ACPI: ACPI 6.5: RAS2: Shorten RAS2 table structure and variable names shiju.jose
2025-03-05 18:51 ` Luck, Tony
@ 2025-03-06 6:05 ` Jonathan Cameron
1 sibling, 0 replies; 15+ messages in thread
From: Jonathan Cameron @ 2025-03-06 6:05 UTC (permalink / raw)
To: shiju.jose
Cc: linux-edac, linux-acpi, bp, tony.luck, rafael, lenb, mchehab,
leo.duran, Yazen.Ghannam, linux-cxl, dan.j.williams, dave,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-mm, linux-kernel, rientjes, jiaqiyan,
Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng, roberto.sassu,
kangkang.shen, wanghuiqiang, linuxarm
On Wed, 5 Mar 2025 18:02:22 +0000
<shiju.jose@huawei.com> wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Shorten RAS2 table structure and variable names.
>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Hi Shiju,
Generally looks reasonable to me, but I'm not sure what your
decision process was for which to shorten and which to leave alone.
Perhaps it is worth mentioning that in the patch description?
> ---
> include/acpi/actbl2.h | 38 +++++++++++++++++++-------------------
> 1 file changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
> index 2e917a8f8bca..5cfc65ba6e9e 100644
> --- a/include/acpi/actbl2.h
> +++ b/include/acpi/actbl2.h
> @@ -2802,20 +2802,20 @@ struct acpi_ras2_pcc_desc {
>
> /* RAS2 Platform Communication Channel Shared Memory Region */
>
> -struct acpi_ras2_shared_memory {
> +struct acpi_ras2_shmem {
> u32 signature;
> - u16 command;
> + u16 cmd;
> u16 status;
> u16 version;
> u8 features[16];
> - u8 set_capabilities[16];
> - u16 num_parameter_blocks;
> - u32 set_capabilities_status;
> + u8 set_caps[16];
> + u16 num_param_blks;
> + u32 set_caps_status;
I assume focus was on fields that were leading to long line lengths?
If it was just generally shortening things to common form
sig, sts, ver, feats etc would also seem reasonable to me
(all subject to Tony's question on whether we can touch this at all.)
> };
>
> /* RAS2 Parameter Block Structure for PATROL_SCRUB */
>
> -struct acpi_ras2_parameter_block {
> +struct acpi_ras2_param_blk {
> u16 type;
> u16 version;
> u16 length;
> @@ -2823,11 +2823,11 @@ struct acpi_ras2_parameter_block {
>
> /* RAS2 Parameter Block Structure for PATROL_SCRUB */
>
> -struct acpi_ras2_patrol_scrub_parameter {
> - struct acpi_ras2_parameter_block header;
> - u16 patrol_scrub_command;
> - u64 requested_address_range[2];
> - u64 actual_address_range[2];
> +struct acpi_ras2_patrol_scrub_param {
> + struct acpi_ras2_param_blk header;
> + u16 cmd;
> + u64 req_addr_range[2];
> + u64 actl_addr_range[2];
> u32 flags;
> u32 scrub_params_out;
> u32 scrub_params_in;
> @@ -2839,12 +2839,12 @@ struct acpi_ras2_patrol_scrub_parameter {
>
> /* RAS2 Parameter Block Structure for LA2PA_TRANSLATION */
>
> -struct acpi_ras2_la2pa_translation_parameter {
> - struct acpi_ras2_parameter_block header;
> - u16 addr_translation_command;
> +struct acpi_ras2_la2pa_transln_param {
> + struct acpi_ras2_param_blk header;
> + u16 cmd;
> u64 sub_inst_id;
> - u64 logical_address;
> - u64 physical_address;
> + u64 logical_addr;
> + u64 phy_addr;
> u32 status;
> };
>
> @@ -2863,7 +2863,7 @@ enum acpi_ras2_features {
>
> /* RAS2 Patrol Scrub Commands */
>
> -enum acpi_ras2_patrol_scrub_commands {
> +enum acpi_ras2_patrol_scrub_cmds {
> ACPI_RAS2_GET_PATROL_PARAMETERS = 1,
> ACPI_RAS2_START_PATROL_SCRUBBER = 2,
> ACPI_RAS2_STOP_PATROL_SCRUBBER = 3
> @@ -2871,13 +2871,13 @@ enum acpi_ras2_patrol_scrub_commands {
>
> /* RAS2 LA2PA Translation Commands */
>
> -enum acpi_ras2_la2_pa_translation_commands {
> +enum acpi_ras2_la2_pa_transln_cmds {
> ACPI_RAS2_GET_LA2PA_TRANSLATION = 1,
> };
>
> /* RAS2 LA2PA Translation Status values */
>
> -enum acpi_ras2_la2_pa_translation_status {
> +enum acpi_ras2_la2_pa_transln_status {
Do we touch this in the main code? If not I'd be tempted
to leave decision to shorten this or not to whowever
writes code that uses it.
> ACPI_RAS2_LA2PA_TRANSLATION_SUCCESS = 0,
> ACPI_RAS2_LA2PA_TRANSLATION_FAIL = 1,
> };
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/3] ACPI:RAS2: Add ACPI RAS2 driver
2025-03-05 18:02 ` [PATCH v2 2/3] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
@ 2025-03-06 9:19 ` Jonathan Cameron
2025-03-06 11:21 ` Shiju Jose
0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Cameron @ 2025-03-06 9:19 UTC (permalink / raw)
To: shiju.jose
Cc: linux-edac, linux-acpi, bp, tony.luck, rafael, lenb, mchehab,
leo.duran, Yazen.Ghannam, linux-cxl, dan.j.williams, dave,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-mm, linux-kernel, rientjes, jiaqiyan,
Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng, roberto.sassu,
kangkang.shen, wanghuiqiang, linuxarm
On Wed, 5 Mar 2025 18:02:23 +0000
<shiju.jose@huawei.com> wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Add support for ACPI RAS2 feature table (RAS2) defined in the
> ACPI 6.5 Specification, section 5.2.21.
> Driver defines RAS2 Init, which extracts the RAS2 table and driver
> adds auxiliary device for each memory feature which binds to the
> RAS2 memory driver.
>
> Driver uses PCC mailbox to communicate with the ACPI HW and the
> driver adds OSPM interfaces to send RAS2 commands.
>
> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Co-developed-by: A Somasundaram <somasundaram.a@hpe.com>
> Signed-off-by: A Somasundaram <somasundaram.a@hpe.com>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Hi Shiju,
I took another look through as it's been a while and I've
pretty much forgotten this code :(
Anyhow, a few minor comments inline.
Thanks,
Jonathan
> diff --git a/drivers/acpi/ras2.c b/drivers/acpi/ras2.c
> new file mode 100755
> index 000000000000..8831a2bd5fab
> --- /dev/null
> +++ b/drivers/acpi/ras2.c
> +static int ras2_register_pcc_channel(struct ras2_mem_ctx *ras2_ctx, int pcc_id)
> +{
> + struct ras2_pcc_subspace *pcc_subspace;
> + struct pcc_mbox_chan *pcc_chan;
> + struct mbox_client *mbox_cl;
> +
> + if (pcc_id < 0)
> + return -EINVAL;
> +
> + mutex_lock(&ras2_pcc_lock);
> + list_for_each_entry(pcc_subspace, &ras2_pcc_subspaces, elem) {
> + if (pcc_subspace->pcc_id != pcc_id)
> + continue;
> + ras2_ctx->pcc_subspace = pcc_subspace;
> + pcc_subspace->ref_count++;
> + mutex_unlock(&ras2_pcc_lock);
> + return 0;
> + }
> + mutex_unlock(&ras2_pcc_lock);
> +
> + pcc_subspace = kcalloc(1, sizeof(*pcc_subspace), GFP_KERNEL);
if allocating a count of 1, why not kzalloc?
> + if (!pcc_subspace)
> + return -ENOMEM;
> +static int acpi_ras2_parse(void)
> +{
> + struct acpi_ras2_pcc_desc *pcc_desc_list;
> + int pcc_id;
> + u8 count = 0;
> + int rc, i;
> +
> + if (ras2_tab->header.length < sizeof(struct acpi_table_ras2)) {
extra space before <
Maybe sizeof(*ras2_tab) is cleaner.
> + pr_warn(FW_WARN "ACPI RAS2 table present but broken (too short #1)\n");
> + return -EINVAL;
> + }
> +
> + if (!ras2_tab->num_pcc_descs) {
> + pr_warn(FW_WARN "No PCC descs in ACPI RAS2 table\n");
> + return -EINVAL;
> + }
> +
> + pcc_desc_list = (struct acpi_ras2_pcc_desc *)(ras2_tab + 1);
> + /* Double scan for the case of only one actual controller */
> + pcc_id = -1;
> + count = 0;
Already set above, so no need to do it again. I'd do it just here. Can
put it in the loop init though.
> + for (i = 0; i < ras2_tab->num_pcc_descs; i++, pcc_desc_list++) {
> + if (pcc_desc_list->feature_type != RAS2_FEAT_TYPE_MEMORY)
> + continue;
> + if (pcc_id == -1) {
> + pcc_id = pcc_desc_list->channel_id;
> + count++;
> + }
> + if (pcc_desc_list->channel_id != pcc_id)
> + count++;
> + }
> +
> + /*
> + * Workaround for the client platform with multiple scrub devices
> + * but uses single PCC subspace for communication.
> + */
> + if (count == 1) {
> + /* Add auxiliary device and bind ACPI RAS2 memory driver */
> + rc = ras2_add_aux_device(RAS2_MEM_DEV_ID_NAME, pcc_id);
> + if (rc)
> + return rc;
> +
> + return 0;
> + }
> +
> + pcc_desc_list = (struct acpi_ras2_pcc_desc *)(ras2_tab + 1);
> + count = 0;
Maybe set in loop init.
> + for (i = 0; i < ras2_tab->num_pcc_descs; i++, pcc_desc_list++) {
> + if (pcc_desc_list->feature_type != RAS2_FEAT_TYPE_MEMORY)
> + continue;
> + pcc_id = pcc_desc_list->channel_id;
> + /* Add auxiliary device and bind ACPI RAS2 memory driver */
obvious enough to drop the comment I think.
> + rc = ras2_add_aux_device(RAS2_MEM_DEV_ID_NAME, pcc_id);
pcc_desc_list->channel_id);
and no local variable.
> + if (rc)
> + return rc;
> + }
> +
> + return 0;
> +}
> +
> +void __init acpi_ras2_init(void)
> +{
> + acpi_status status;
> + int rc;
> +
> + status = acpi_get_table(ACPI_SIG_RAS2, 0,
> + (struct acpi_table_header **)&ras2_tab);
> + if (ACPI_FAILURE(status) || !ras2_tab) {
> + const char *msg = acpi_format_exception(status);
> +
> + pr_err("Failed to get table, %s\n", msg);
If only going to use it here maybe
pr_err("Failed to get table, %s\n",
acpi_format_exception(status));
and save on the local variable.
> + return;
> + }
> +
> + rc = acpi_ras2_parse();
> + if (rc) {
> + acpi_put_table((struct acpi_table_header *)ras2_tab);
> + pr_err("Failed to parse RAS2 table\n");
> + }
> +}
> diff --git a/include/acpi/ras2.h b/include/acpi/ras2.h
> new file mode 100644
> index 000000000000..5b27c1f30096
> --- /dev/null
> +++ b/include/acpi/ras2.h
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * ACPI RAS2 driver header file
> + *
> + * Copyright (c) 2024-2025 HiSilicon Limited
> + */
> +
> +#ifndef _ACPI_RAS2_H
> +#define _ACPI_RAS2_H
> +
> +#include <linux/acpi.h>
> +#include <linux/auxiliary_bus.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/mutex.h>
> +#include <linux/types.h>
> +
> +#define RAS2_PCC_CMD_COMPLETE BIT(0)
> +#define RAS2_PCC_CMD_ERROR BIT(2)
> +
I think these bits are from table 14.11 and
generic to all PCC status registers? Should these
have more generic names rather than ras2 specific ones?
> +/* RAS2 specific PCC commands */
> +#define RAS2_PCC_CMD_EXEC 0x01
Are we mixing commands and field definitions both
with prefix RAS2_PCC_CMD_ ? That is somewhat
confusing.
> +
> +#define RAS2_AUX_DEV_NAME "ras2"
> +#define RAS2_MEM_DEV_ID_NAME "acpi_ras2_mem"
> +
I would add a forwards def
struct device;
whilst it is really unlikely that headers would ever be reorganized
such that auxiliary_bus.h would not include device.h given the embedded
device we shouldn't rely on that here.
> +/* Data structure RAS2 table */
> +struct ras2_mem_ctx {
> + struct auxiliary_device adev;
> + /* Lock to provide mutually exclusive access to PCC channel */
> + struct mutex lock;
> + struct device *dev;
> + struct acpi_ras2_shmem __iomem *comm_addr;
> + void *pcc_subspace;
> + int id;
> +};
> +
> +#ifdef CONFIG_ACPI_RAS2
> +void __init acpi_ras2_init(void);
> +int ras2_send_pcc_cmd(struct ras2_mem_ctx *ras2_ctx, u16 cmd);
> +#else
> +static inline void acpi_ras2_init(void) { }
> +
> +static inline int ras2_send_pcc_cmd(struct ras2_mem_ctx *ras2_ctx, u16 cmd)
Is this stub ever needed? To me it seems unlikely
we would have a user that is built without a dependency
on CONFIG_ACPI_RAS2. This is different from acpi_ras2_init()
which makes much more sense to me.
> +{
> + return -EOPNOTSUPP;
> +}
> +#endif
> +#endif /* _ACPI_RAS2_H */
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver
2025-03-05 18:02 ` [PATCH v2 3/3] ras: mem: Add memory " shiju.jose
@ 2025-03-06 9:32 ` Jonathan Cameron
2025-03-07 21:51 ` Daniel Ferguson
1 sibling, 0 replies; 15+ messages in thread
From: Jonathan Cameron @ 2025-03-06 9:32 UTC (permalink / raw)
To: shiju.jose
Cc: linux-edac, linux-acpi, bp, tony.luck, rafael, lenb, mchehab,
leo.duran, Yazen.Ghannam, linux-cxl, dan.j.williams, dave,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-mm, linux-kernel, rientjes, jiaqiyan,
Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng, roberto.sassu,
kangkang.shen, wanghuiqiang, linuxarm
On Wed, 5 Mar 2025 18:02:24 +0000
<shiju.jose@huawei.com> wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Memory ACPI RAS2 auxiliary driver binds to the auxiliary device
> add by the ACPI RAS2 table parser.
>
> Driver uses a PCC subspace for communicating with the ACPI compliant
> platform.
>
> Device with ACPI RAS2 scrub feature registers with EDAC device driver,
> which retrieves the scrub descriptor from EDAC scrub and exposes
> the scrub control attributes for RAS2 scrub instance to userspace in
> /sys/bus/edac/devices/acpi_ras_mem0/scrubX/.
>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
> index daab929cdba1..fc8dcbd13f91 100644
> --- a/Documentation/edac/scrub.rst
> +++ b/Documentation/edac/scrub.rst
> @@ -264,3 +264,76 @@ Sysfs files are documented in
...
> +1.2.4. Program background scrubbing for RAS2 device to repeat in every 21600 seconds (quarter of a day).
wrap to 80 chars. I think that is fine for titles in sphinx.
> +
> +# echo 21600 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_duration
> +
> +1.2.5. Start 'background scrubbing'.
> +
> +# echo 1 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/enable_background
> diff --git a/drivers/ras/acpi_ras2.c b/drivers/ras/acpi_ras2.c
> new file mode 100644
> index 000000000000..2f9317aa7b81
> --- /dev/null
> +++ b/drivers/ras/acpi_ras2.c
> @@ -0,0 +1,391 @@
> +struct acpi_ras2_ps_shared_mem {
> + struct acpi_ras2_shmem common;
> + struct acpi_ras2_patrol_scrub_param params;
> +};
...
> +static int ras2_update_patrol_scrub_params_cache(struct ras2_mem_ctx *ras2_ctx)
> +{
> + struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
> + (void *)ras2_ctx->comm_addr;
Would a container_of() be better here given the type cast is doing
that with the assumption of it being first element of ps_shared_mem.
Same in other places, so maybe a macro.
> + int ret;
> +
> + ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
> + ps_sm->params.cmd = RAS2_GET_PATROL_PARAMETERS;
...
> +
> +static int ras2_hw_scrub_set_enabled_bg(struct device *dev, void *drv_data, bool enable)
> +{
> + struct ras2_mem_ctx *ras2_ctx = drv_data;
> + struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
> + (void *)ras2_ctx->comm_addr;
As above, maybe container_of appropriate as we have
a definition of what we are casting it to that has the thing
we are casting from as first element.
> + bool running;
> + int ret;
> +
...
> +
> +static int ras2_probe(struct auxiliary_device *auxdev,
> + const struct auxiliary_device_id *id)
> +{
> + struct ras2_mem_ctx *ras2_ctx = container_of(auxdev, struct ras2_mem_ctx, adev);
> + struct edac_dev_feature ras_features[RAS2_DEV_NUM_RAS_FEATURES];
Given we only have 1 RAS2 feature I'd be tempted to leave
making this flexible for some future series that adds a second one.
So maybe just have a single feature rather than array of 1.
> + char scrub_name[RAS2_SCRUB_NAME_LEN];
> + int num_ras_features = 0;
With change below this isn't needed.
> + int ret;
> +
> + if (!ras2_is_patrol_scrub_support(ras2_ctx))
> + return -EOPNOTSUPP;
> +
> + ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
> + if (ret)
> + return ret;
> +
> + snprintf(scrub_name, sizeof(scrub_name), "acpi_ras_mem%d",
> + ras2_ctx->id);
> +
> + ras_features[num_ras_features].ft_type = RAS_FEAT_SCRUB;
> + ras_features[num_ras_features].instance = ras2_ctx->instance;
> + ras_features[num_ras_features].scrub_ops = &ras2_scrub_ops;
> + ras_features[num_ras_features].ctx = ras2_ctx;
> + num_ras_features++;
As above, can also just assume this is 1 becasue it always is.
> +
> + return edac_dev_register(&auxdev->dev, scrub_name, NULL,
> + num_ras_features, ras_features);
here pass in &ras_feature after making it not be an array.
> +}
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH v2 2/3] ACPI:RAS2: Add ACPI RAS2 driver
2025-03-06 9:19 ` Jonathan Cameron
@ 2025-03-06 11:21 ` Shiju Jose
2025-03-10 12:44 ` Jonathan Cameron
0 siblings, 1 reply; 15+ messages in thread
From: Shiju Jose @ 2025-03-06 11:21 UTC (permalink / raw)
To: Jonathan Cameron
Cc: linux-edac, linux-acpi, bp, tony.luck, rafael, lenb, mchehab,
leo.duran, Yazen.Ghannam, linux-cxl, dan.j.williams, dave,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-mm, linux-kernel, rientjes, jiaqiyan,
Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
>-----Original Message-----
>From: Jonathan Cameron <jonathan.cameron@huawei.com>
>Sent: 06 March 2025 09:19
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-acpi@vger.kernel.org; bp@alien8.de;
>tony.luck@intel.com; rafael@kernel.org; lenb@kernel.org;
>mchehab@kernel.org; leo.duran@amd.com; Yazen.Ghannam@amd.com; linux-
>cxl@vger.kernel.org; dan.j.williams@intel.com; dave@stgolabs.net;
>dave.jiang@intel.com; alison.schofield@intel.com; vishal.l.verma@intel.com;
>ira.weiny@intel.com; david@redhat.com; Vilas.Sridharan@amd.com; linux-
>mm@kvack.org; linux-kernel@vger.kernel.org; rientjes@google.com;
>jiaqiyan@google.com; Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
>Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
>wanghuiqiang <wanghuiqiang@huawei.com>; Linuxarm
><linuxarm@huawei.com>
>Subject: Re: [PATCH v2 2/3] ACPI:RAS2: Add ACPI RAS2 driver
>
>On Wed, 5 Mar 2025 18:02:23 +0000
><shiju.jose@huawei.com> wrote:
>
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> Add support for ACPI RAS2 feature table (RAS2) defined in the ACPI 6.5
>> Specification, section 5.2.21.
>> Driver defines RAS2 Init, which extracts the RAS2 table and driver
>> adds auxiliary device for each memory feature which binds to the
>> RAS2 memory driver.
>>
>> Driver uses PCC mailbox to communicate with the ACPI HW and the driver
>> adds OSPM interfaces to send RAS2 commands.
>>
>> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Co-developed-by: A Somasundaram <somasundaram.a@hpe.com>
>> Signed-off-by: A Somasundaram <somasundaram.a@hpe.com>
>> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>
>Hi Shiju,
>
>I took another look through as it's been a while and I've pretty much forgotten
>this code :(
>
>Anyhow, a few minor comments inline.
Hi Jonathan,
Thanks for the feedbacks.
Please find reply inline.
Thanks,
Shiju
>
>Thanks,
>
>Jonathan
>
>> diff --git a/drivers/acpi/ras2.c b/drivers/acpi/ras2.c new file mode
>> 100755 index 000000000000..8831a2bd5fab
>> --- /dev/null
>> +++ b/drivers/acpi/ras2.c
>
>
>
>
>> +static int ras2_register_pcc_channel(struct ras2_mem_ctx *ras2_ctx, int
>pcc_id)
>> +{
>> + struct ras2_pcc_subspace *pcc_subspace;
>> + struct pcc_mbox_chan *pcc_chan;
>> + struct mbox_client *mbox_cl;
>> +
>> + if (pcc_id < 0)
>> + return -EINVAL;
>> +
>> + mutex_lock(&ras2_pcc_lock);
>> + list_for_each_entry(pcc_subspace, &ras2_pcc_subspaces, elem) {
>> + if (pcc_subspace->pcc_id != pcc_id)
>> + continue;
>> + ras2_ctx->pcc_subspace = pcc_subspace;
>> + pcc_subspace->ref_count++;
>> + mutex_unlock(&ras2_pcc_lock);
>> + return 0;
>> + }
>> + mutex_unlock(&ras2_pcc_lock);
>> +
>> + pcc_subspace = kcalloc(1, sizeof(*pcc_subspace), GFP_KERNEL);
>
>if allocating a count of 1, why not kzalloc?
Will use kzalloc.
>
>> + if (!pcc_subspace)
>> + return -ENOMEM;
>
>
>
>
>> +static int acpi_ras2_parse(void)
>> +{
>> + struct acpi_ras2_pcc_desc *pcc_desc_list;
>> + int pcc_id;
>> + u8 count = 0;
>> + int rc, i;
>> +
>> + if (ras2_tab->header.length < sizeof(struct acpi_table_ras2)) {
>
>extra space before <
Will fix.
>
>Maybe sizeof(*ras2_tab) is cleaner.
Sure.
>
>> + pr_warn(FW_WARN "ACPI RAS2 table present but broken (too
>short #1)\n");
>> + return -EINVAL;
>> + }
>> +
>> + if (!ras2_tab->num_pcc_descs) {
>> + pr_warn(FW_WARN "No PCC descs in ACPI RAS2 table\n");
>> + return -EINVAL;
>> + }
>> +
>> + pcc_desc_list = (struct acpi_ras2_pcc_desc *)(ras2_tab + 1);
>> + /* Double scan for the case of only one actual controller */
>> + pcc_id = -1;
>> + count = 0;
>
>Already set above, so no need to do it again. I'd do it just here. Can
>put it in the loop init though.
Will change.
>
>> + for (i = 0; i < ras2_tab->num_pcc_descs; i++, pcc_desc_list++) {
>> + if (pcc_desc_list->feature_type != RAS2_FEAT_TYPE_MEMORY)
>> + continue;
>> + if (pcc_id == -1) {
>> + pcc_id = pcc_desc_list->channel_id;
>> + count++;
>> + }
>> + if (pcc_desc_list->channel_id != pcc_id)
>> + count++;
>> + }
>> +
>> + /*
>> + * Workaround for the client platform with multiple scrub devices
>> + * but uses single PCC subspace for communication.
>> + */
>> + if (count == 1) {
>> + /* Add auxiliary device and bind ACPI RAS2 memory driver */
>> + rc = ras2_add_aux_device(RAS2_MEM_DEV_ID_NAME, pcc_id);
>> + if (rc)
>> + return rc;
>> +
>> + return 0;
>> + }
>> +
>> + pcc_desc_list = (struct acpi_ras2_pcc_desc *)(ras2_tab + 1);
>> + count = 0;
>
>Maybe set in loop init.
Sure.
>
>> + for (i = 0; i < ras2_tab->num_pcc_descs; i++, pcc_desc_list++) {
>> + if (pcc_desc_list->feature_type != RAS2_FEAT_TYPE_MEMORY)
>> + continue;
>> + pcc_id = pcc_desc_list->channel_id;
>> + /* Add auxiliary device and bind ACPI RAS2 memory driver */
>obvious enough to drop the comment I think.
Will do.
>
>> + rc = ras2_add_aux_device(RAS2_MEM_DEV_ID_NAME, pcc_id);
> pcc_desc_list->channel_id);
>and no local variable.
Ok.
>
>> + if (rc)
>> + return rc;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +void __init acpi_ras2_init(void)
>> +{
>> + acpi_status status;
>> + int rc;
>> +
>> + status = acpi_get_table(ACPI_SIG_RAS2, 0,
>> + (struct acpi_table_header **)&ras2_tab);
>> + if (ACPI_FAILURE(status) || !ras2_tab) {
>> + const char *msg = acpi_format_exception(status);
>> +
>> + pr_err("Failed to get table, %s\n", msg);
>
>If only going to use it here maybe
> pr_err("Failed to get table, %s\n",
> acpi_format_exception(status));
>and save on the local variable.
Will change.
>
>> + return;
>> + }
>> +
>> + rc = acpi_ras2_parse();
>> + if (rc) {
>> + acpi_put_table((struct acpi_table_header *)ras2_tab);
>> + pr_err("Failed to parse RAS2 table\n");
>> + }
>> +}
>> diff --git a/include/acpi/ras2.h b/include/acpi/ras2.h
>> new file mode 100644
>> index 000000000000..5b27c1f30096
>> --- /dev/null
>> +++ b/include/acpi/ras2.h
>> @@ -0,0 +1,48 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * ACPI RAS2 driver header file
>> + *
>> + * Copyright (c) 2024-2025 HiSilicon Limited
>> + */
>> +
>> +#ifndef _ACPI_RAS2_H
>> +#define _ACPI_RAS2_H
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/auxiliary_bus.h>
>> +#include <linux/mailbox_client.h>
>> +#include <linux/mutex.h>
>> +#include <linux/types.h>
>> +
>> +#define RAS2_PCC_CMD_COMPLETE BIT(0)
>> +#define RAS2_PCC_CMD_ERROR BIT(2)
>> +
>I think these bits are from table 14.11 and
>generic to all PCC status registers? Should these
>have more generic names rather than ras2 specific ones?
Yes.
Instead will use PCC_STATUS_CMD_ COMPLETE and PCC_STATUS_ ERROR
from include/acpi/pcc.h.
>
>> +/* RAS2 specific PCC commands */
>> +#define RAS2_PCC_CMD_EXEC 0x01
>Are we mixing commands and field definitions both
>with prefix RAS2_PCC_CMD_ ? That is somewhat
>confusing.
Will add Table 5.82: .. here in the comment and
Is rename to PCC_CMD_ EXEC_RAS2 better?
>
>> +
>> +#define RAS2_AUX_DEV_NAME "ras2"
>> +#define RAS2_MEM_DEV_ID_NAME "acpi_ras2_mem"
>> +
>I would add a forwards def
>struct device;
Sure.
>
>whilst it is really unlikely that headers would ever be reorganized
>such that auxiliary_bus.h would not include device.h given the embedded
>device we shouldn't rely on that here.
>
>> +/* Data structure RAS2 table */
>> +struct ras2_mem_ctx {
>> + struct auxiliary_device adev;
>> + /* Lock to provide mutually exclusive access to PCC channel */
>> + struct mutex lock;
>> + struct device *dev;
>> + struct acpi_ras2_shmem __iomem *comm_addr;
>> + void *pcc_subspace;
>> + int id;
>> +};
>> +
>> +#ifdef CONFIG_ACPI_RAS2
>> +void __init acpi_ras2_init(void);
>> +int ras2_send_pcc_cmd(struct ras2_mem_ctx *ras2_ctx, u16 cmd);
>> +#else
>> +static inline void acpi_ras2_init(void) { }
>> +
>> +static inline int ras2_send_pcc_cmd(struct ras2_mem_ctx *ras2_ctx, u16
>cmd)
>
>Is this stub ever needed? To me it seems unlikely
>we would have a user that is built without a dependency
>on CONFIG_ACPI_RAS2. This is different from acpi_ras2_init()
>which makes much more sense to me.
Ok will remove.
>
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +#endif
>> +#endif /* _ACPI_RAS2_H */
Thanks,
Shiju
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver
2025-03-05 18:02 ` [PATCH v2 3/3] ras: mem: Add memory " shiju.jose
2025-03-06 9:32 ` Jonathan Cameron
@ 2025-03-07 21:51 ` Daniel Ferguson
2025-03-10 11:12 ` Shiju Jose
1 sibling, 1 reply; 15+ messages in thread
From: Daniel Ferguson @ 2025-03-07 21:51 UTC (permalink / raw)
To: shiju.jose, linux-edac, linux-acpi, bp, tony.luck, rafael, lenb,
mchehab, leo.duran, Yazen.Ghannam
Cc: linux-cxl, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-mm, linux-kernel, rientjes, jiaqiyan,
Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng, roberto.sassu,
kangkang.shen, wanghuiqiang, linuxarm
> +static int ras2_hw_scrub_read_size(struct device *dev, void *drv_data, u64 *size)
> +{
> + struct ras2_mem_ctx *ras2_ctx = drv_data;
> + int ret;
> +
> + if (ras2_ctx->bg_scrub)
> + return -EBUSY;
> +
> + ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
> + if (ret)
> + return ret;
> +
> + *size = ras2_ctx->size;
> +
> + return 0;
> +}
Calling ras2_update_patrol_scrub_params_cache here is problematic.
Imagine:
echo 0x1000 > size
cat size
echo 0x2000000000 > addr
What happens here? What happens is the scrub range is not what you expect it to
be. Once you cat size, you reset the size from what you initially set it to.
I don't think that is what anyone will expect. It certainly caused us to stumble
while testing.
Regards,
~Daniel
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver
2025-03-07 21:51 ` Daniel Ferguson
@ 2025-03-10 11:12 ` Shiju Jose
2025-03-10 14:36 ` Shiju Jose
0 siblings, 1 reply; 15+ messages in thread
From: Shiju Jose @ 2025-03-10 11:12 UTC (permalink / raw)
To: Daniel Ferguson, linux-edac, linux-acpi, bp, tony.luck, rafael,
lenb, mchehab, leo.duran, Yazen.Ghannam
Cc: linux-cxl, dan.j.williams, dave, Jonathan Cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-mm, linux-kernel, rientjes, jiaqiyan,
Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
>-----Original Message-----
>From: Daniel Ferguson <danielf@os.amperecomputing.com>
>Sent: 07 March 2025 21:52
>To: Shiju Jose <shiju.jose@huawei.com>; linux-edac@vger.kernel.org; linux-
>acpi@vger.kernel.org; bp@alien8.de; tony.luck@intel.com; rafael@kernel.org;
>lenb@kernel.org; mchehab@kernel.org; leo.duran@amd.com;
>Yazen.Ghannam@amd.com
>Cc: linux-cxl@vger.kernel.org; dan.j.williams@intel.com; dave@stgolabs.net;
>Jonathan Cameron <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>david@redhat.com; Vilas.Sridharan@amd.com; linux-mm@kvack.org; linux-
>kernel@vger.kernel.org; rientjes@google.com; jiaqiyan@google.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
>Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
>wanghuiqiang <wanghuiqiang@huawei.com>; Linuxarm
><linuxarm@huawei.com>
>Subject: Re: [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver
>
>
>> +static int ras2_hw_scrub_read_size(struct device *dev, void
>> +*drv_data, u64 *size) {
>> + struct ras2_mem_ctx *ras2_ctx = drv_data;
>> + int ret;
>> +
>> + if (ras2_ctx->bg_scrub)
>> + return -EBUSY;
>> +
>> + ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
>> + if (ret)
>> + return ret;
>> +
>> + *size = ras2_ctx->size;
>> +
>> + return 0;
>> +}
>
>Calling ras2_update_patrol_scrub_params_cache here is problematic.
>
>Imagine:
> echo 0x1000 > size
> cat size
> echo 0x2000000000 > addr
>
>What happens here? What happens is the scrub range is not what you expect it
>to be. Once you cat size, you reset the size from what you initially set it to.
>I don't think that is what anyone will expect. It certainly caused us to stumble
>while testing.
This is an expected behavior and this extra call was added here when changed using attribute 'addr' to start
the on-demand scrub operation instead of previous separate attribute ' enable_on_demand' to start
the on-demand scrub operation, according to Borislav's suggestion in v13.
Please see the following comment in the ras2_hw_scrub_read_addr() fnction,
"Userspace will get the status of the demand scrubbing through the address range
read from the firmware. When the demand scrubbing is finished firmware must reset actual
address range to 0. Otherwise userspace assumes demand scrubbing is in progress."
Here sysfs attributes 'addr' and 'size' is reading the field: Actual Address Range of
Table 5.87: Parameter Block Structure for PATROL_SCRUB, written by the firmware.
In my opinion, reading back the address range size set in the sysfs before actually writing
the address range to the firmware and starting the on-demand scrub operation doesn't
hold much significance?
>
>Regards,
>~Daniel
Thanks,
Shiju
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/3] ACPI:RAS2: Add ACPI RAS2 driver
2025-03-06 11:21 ` Shiju Jose
@ 2025-03-10 12:44 ` Jonathan Cameron
0 siblings, 0 replies; 15+ messages in thread
From: Jonathan Cameron @ 2025-03-10 12:44 UTC (permalink / raw)
To: Shiju Jose, dferguson
Cc: linux-edac, linux-acpi, bp, tony.luck, rafael, lenb, mchehab,
leo.duran, Yazen.Ghannam, linux-cxl, dan.j.williams, dave,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-mm, linux-kernel, rientjes, jiaqiyan,
Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
wbs, nifan.cxl, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
> >> +#define RAS2_PCC_CMD_COMPLETE BIT(0)
> >> +#define RAS2_PCC_CMD_ERROR BIT(2)
> >> +
> >I think these bits are from table 14.11 and
> >generic to all PCC status registers? Should these
> >have more generic names rather than ras2 specific ones?
> Yes.
> Instead will use PCC_STATUS_CMD_ COMPLETE and PCC_STATUS_ ERROR
> from include/acpi/pcc.h.
>
> >
> >> +/* RAS2 specific PCC commands */
> >> +#define RAS2_PCC_CMD_EXEC 0x01
> >Are we mixing commands and field definitions both
> >with prefix RAS2_PCC_CMD_ ? That is somewhat
> >confusing.
> Will add Table 5.82: .. here in the comment and
> Is rename to PCC_CMD_ EXEC_RAS2 better?
That seems OK to me.
For things you agree with feel free to just crop out that bit
of the email so it is easier to spot the remaining questions.
Thanks,
Jonathan
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver
2025-03-10 11:12 ` Shiju Jose
@ 2025-03-10 14:36 ` Shiju Jose
2025-03-10 17:14 ` Daniel Ferguson
0 siblings, 1 reply; 15+ messages in thread
From: Shiju Jose @ 2025-03-10 14:36 UTC (permalink / raw)
To: Daniel Ferguson, linux-edac, linux-acpi, bp, tony.luck, rafael,
lenb, mchehab, leo.duran, Yazen.Ghannam
Cc: linux-cxl, dan.j.williams, dave, Jonathan Cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-mm, linux-kernel, rientjes, jiaqiyan,
Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
>-----Original Message-----
>From: Shiju Jose
>Sent: 10 March 2025 11:12
>To: 'Daniel Ferguson' <danielf@os.amperecomputing.com>; linux-
>edac@vger.kernel.org; linux-acpi@vger.kernel.org; bp@alien8.de;
>tony.luck@intel.com; rafael@kernel.org; lenb@kernel.org;
>mchehab@kernel.org; leo.duran@amd.com; Yazen.Ghannam@amd.com
>Cc: linux-cxl@vger.kernel.org; dan.j.williams@intel.com; dave@stgolabs.net;
>Jonathan Cameron <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>david@redhat.com; Vilas.Sridharan@amd.com; linux-mm@kvack.org; linux-
>kernel@vger.kernel.org; rientjes@google.com; jiaqiyan@google.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
>Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
>wanghuiqiang <wanghuiqiang@huawei.com>; Linuxarm
><linuxarm@huawei.com>
>Subject: RE: [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver
>
>>-----Original Message-----
>>From: Daniel Ferguson <danielf@os.amperecomputing.com>
>>Sent: 07 March 2025 21:52
>>To: Shiju Jose <shiju.jose@huawei.com>; linux-edac@vger.kernel.org;
>>linux- acpi@vger.kernel.org; bp@alien8.de; tony.luck@intel.com;
>>rafael@kernel.org; lenb@kernel.org; mchehab@kernel.org;
>>leo.duran@amd.com; Yazen.Ghannam@amd.com
>>Cc: linux-cxl@vger.kernel.org; dan.j.williams@intel.com;
>>dave@stgolabs.net; Jonathan Cameron <jonathan.cameron@huawei.com>;
>>dave.jiang@intel.com; alison.schofield@intel.com;
>>vishal.l.verma@intel.com; ira.weiny@intel.com; david@redhat.com;
>>Vilas.Sridharan@amd.com; linux-mm@kvack.org; linux-
>>kernel@vger.kernel.org; rientjes@google.com; jiaqiyan@google.com;
>>Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>>duenwen@google.com; gthelen@google.com;
>wschwartz@amperecomputing.com;
>>dferguson@amperecomputing.com; wbs@os.amperecomputing.com;
>>nifan.cxl@gmail.com; tanxiaofei <tanxiaofei@huawei.com>; Zengtao (B)
>><prime.zeng@hisilicon.com>; Roberto Sassu <roberto.sassu@huawei.com>;
>>kangkang.shen@futurewei.com; wanghuiqiang <wanghuiqiang@huawei.com>;
>>Linuxarm <linuxarm@huawei.com>
>>Subject: Re: [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver
>>
>>
>>> +static int ras2_hw_scrub_read_size(struct device *dev, void
>>> +*drv_data, u64 *size) {
>>> + struct ras2_mem_ctx *ras2_ctx = drv_data;
>>> + int ret;
>>> +
>>> + if (ras2_ctx->bg_scrub)
>>> + return -EBUSY;
>>> +
>>> + ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + *size = ras2_ctx->size;
>>> +
>>> + return 0;
>>> +}
>>
>>Calling ras2_update_patrol_scrub_params_cache here is problematic.
>>
>>Imagine:
>> echo 0x1000 > size
>> cat size
>> echo 0x2000000000 > addr
>>
>>What happens here? What happens is the scrub range is not what you
>>expect it to be. Once you cat size, you reset the size from what you initially set
>it to.
>>I don't think that is what anyone will expect. It certainly caused us
>>to stumble while testing.
>
>This is an expected behavior and this extra call was added here when changed
>using attribute 'addr' to start the on-demand scrub operation instead of
>previous separate attribute ' enable_on_demand' to start the on-demand scrub
>operation, according to Borislav's suggestion in v13.
>
> Please see the following comment in the ras2_hw_scrub_read_addr() fnction,
>"Userspace will get the status of the demand scrubbing through the address
>range read from the firmware. When the demand scrubbing is finished
>firmware must reset actual address range to 0. Otherwise userspace assumes
>demand scrubbing is in progress."
>
>Here sysfs attributes 'addr' and 'size' is reading the field: Actual Address Range
>of Table 5.87: Parameter Block Structure for PATROL_SCRUB, written by the
>firmware.
>
>In my opinion, reading back the address range size set in the sysfs before
>actually writing the address range to the firmware and starting the on-demand
>scrub operation doesn't hold much significance?
After further discussion, I will add a fix for this case to return the 'size' which the user set in the sysfs
until the scrubbing is started.
Thanks,
Shiju
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver
2025-03-10 14:36 ` Shiju Jose
@ 2025-03-10 17:14 ` Daniel Ferguson
0 siblings, 0 replies; 15+ messages in thread
From: Daniel Ferguson @ 2025-03-10 17:14 UTC (permalink / raw)
To: Shiju Jose, linux-edac, linux-acpi, bp, tony.luck, rafael, lenb,
mchehab, leo.duran, Yazen.Ghannam
Cc: linux-cxl, dan.j.williams, dave, Jonathan Cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, david,
Vilas.Sridharan, linux-mm, linux-kernel, rientjes, jiaqiyan,
Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
dferguson, wbs, nifan.cxl, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
>>>> +static int ras2_hw_scrub_read_size(struct device *dev, void
>>>> +*drv_data, u64 *size) {
>>>> + struct ras2_mem_ctx *ras2_ctx = drv_data;
>>>> + int ret;
>>>> +
>>>> + if (ras2_ctx->bg_scrub)
>>>> + return -EBUSY;
>>>> +
>>>> + ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + *size = ras2_ctx->size;
>>>> +
>>>> + return 0;
>>>> +}
>>>
>>> Calling ras2_update_patrol_scrub_params_cache here is problematic.
>>>
>>> Imagine:
>>> echo 0x1000 > size
>>> cat size
>>> echo 0x2000000000 > addr
>>>
>>> What happens here? What happens is the scrub range is not what you
>>> expect it to be. Once you cat size, you reset the size from what you initially set
>> it to.
>>> I don't think that is what anyone will expect. It certainly caused us
>>> to stumble while testing.
>>
>> This is an expected behavior and this extra call was added here when changed
>> using attribute 'addr' to start the on-demand scrub operation instead of
>> previous separate attribute ' enable_on_demand' to start the on-demand scrub
>> operation, according to Borislav's suggestion in v13.
>>
>> Please see the following comment in the ras2_hw_scrub_read_addr() fnction,
>> "Userspace will get the status of the demand scrubbing through the address
>> range read from the firmware. When the demand scrubbing is finished
>> firmware must reset actual address range to 0. Otherwise userspace assumes
>> demand scrubbing is in progress."
Why not just use Bit[0] in the Flags register of the Parameter Block Structure
for PATROL_SCRUB? It seems having firmware reset the actual address range is
extra complexity for something we already have a facility for.
>>
>> Here sysfs attributes 'addr' and 'size' is reading the field: Actual Address Range
>> of Table 5.87: Parameter Block Structure for PATROL_SCRUB, written by the
>> firmware.
>>
>> In my opinion, reading back the address range size set in the sysfs before
>> actually writing the address range to the firmware and starting the on-demand
>> scrub operation doesn't hold much significance?
>
> After further discussion, I will add a fix for this case to return the 'size' which the user set in the sysfs
> until the scrubbing is started.
I think fixing this will make the interface less confusing, but I also agree
that it doesn't hold much significance technically.
Regards,
Daniel
>
> Thanks,
> Shiju
>>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-03-10 17:14 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-05 18:02 [PATCH v2 0/3] ACPI: Add support for ACPI RAS2 feature table shiju.jose
2025-03-05 18:02 ` [PATCH v2 1/3] ACPI: ACPI 6.5: RAS2: Shorten RAS2 table structure and variable names shiju.jose
2025-03-05 18:51 ` Luck, Tony
2025-03-06 2:03 ` Jonathan Cameron
2025-03-06 6:05 ` Jonathan Cameron
2025-03-05 18:02 ` [PATCH v2 2/3] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2025-03-06 9:19 ` Jonathan Cameron
2025-03-06 11:21 ` Shiju Jose
2025-03-10 12:44 ` Jonathan Cameron
2025-03-05 18:02 ` [PATCH v2 3/3] ras: mem: Add memory " shiju.jose
2025-03-06 9:32 ` Jonathan Cameron
2025-03-07 21:51 ` Daniel Ferguson
2025-03-10 11:12 ` Shiju Jose
2025-03-10 14:36 ` Shiju Jose
2025-03-10 17:14 ` Daniel Ferguson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox