linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Shiju Jose <shiju.jose@huawei.com>
To: Daniel Ferguson <danielf@os.amperecomputing.com>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>
Cc: "bp@alien8.de" <bp@alien8.de>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"lenb@kernel.org" <lenb@kernel.org>,
	"leo.duran@amd.com" <leo.duran@amd.com>,
	"Yazen.Ghannam@amd.com" <Yazen.Ghannam@amd.com>,
	"mchehab@kernel.org" <mchehab@kernel.org>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Linuxarm <linuxarm@huawei.com>,
	"rientjes@google.com" <rientjes@google.com>,
	"jiaqiyan@google.com" <jiaqiyan@google.com>,
	"Jon.Grimm@amd.com" <Jon.Grimm@amd.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"naoya.horiguchi@nec.com" <naoya.horiguchi@nec.com>,
	"james.morse@arm.com" <james.morse@arm.com>,
	"jthoughton@google.com" <jthoughton@google.com>,
	"somasundaram.a@hpe.com" <somasundaram.a@hpe.com>,
	"erdemaktas@google.com" <erdemaktas@google.com>,
	"pgonda@google.com" <pgonda@google.com>,
	"duenwen@google.com" <duenwen@google.com>,
	"gthelen@google.com" <gthelen@google.com>,
	"wschwartz@amperecomputing.com" <wschwartz@amperecomputing.com>,
	"dferguson@amperecomputing.com" <dferguson@amperecomputing.com>,
	"wbs@os.amperecomputing.com" <wbs@os.amperecomputing.com>,
	"nifan.cxl@gmail.com" <nifan.cxl@gmail.com>,
	tanxiaofei <tanxiaofei@huawei.com>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	Roberto Sassu <roberto.sassu@huawei.com>,
	"kangkang.shen@futurewei.com" <kangkang.shen@futurewei.com>,
	wanghuiqiang <wanghuiqiang@huawei.com>
Subject: RE: [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver
Date: Wed, 14 May 2025 11:31:59 +0000	[thread overview]
Message-ID: <19ccc1b78e104132962792b55ab92df5@huawei.com> (raw)
In-Reply-To: <8cdf7885-31b3-4308-8a7c-f4e427486429@os.amperecomputing.com>

>-----Original Message-----
>From: Daniel Ferguson <danielf@os.amperecomputing.com>
>Sent: 14 May 2025 03:55
>To: Shiju Jose <shiju.jose@huawei.com>; linux-edac@vger.kernel.org; linux-
>acpi@vger.kernel.org; linux-doc@vger.kernel.org
>Cc: bp@alien8.de; rafael@kernel.org; tony.luck@intel.com; lenb@kernel.org;
>leo.duran@amd.com; Yazen.Ghannam@amd.com; mchehab@kernel.org;
>Jonathan Cameron <jonathan.cameron@huawei.com>; linux-mm@kvack.org;
>Linuxarm <linuxarm@huawei.com>; rientjes@google.com;
>jiaqiyan@google.com; Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
>Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
>wanghuiqiang <wanghuiqiang@huawei.com>
>Subject: Re: [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver
>
>> +static int ras2_report_cap_error(u32 cap_status) {
>> +	switch (cap_status) {
>> +	case ACPI_RAS2_NOT_VALID:
>> +	case ACPI_RAS2_NOT_SUPPORTED:
>> +		return -EPERM;
>> +	case ACPI_RAS2_BUSY:
>> +		return -EBUSY;
>> +	case ACPI_RAS2_FAILED:
>> +	case ACPI_RAS2_ABORTED:
>> +	case ACPI_RAS2_INVALID_DATA:
>> +		return -EINVAL;
>> +	default: /* 0 or other, Success */
>> +		return 0;
>> +	}
>> +}
>> +
>> +static int ras2_check_pcc_chan(struct ras2_pcc_subspace
>> +*pcc_subspace) {
>> +	struct acpi_ras2_shmem __iomem *gen_comm_base = pcc_subspace-
>>comm_addr;
>> +	u32 cap_status;
>> +	u16 status;
>> +	u32 rc;
>> +
>> +	/*
>> +	 * As per ACPI spec, the PCC space will be initialized by
>> +	 * platform and should have set the command completion bit when
>> +	 * PCC can be used by OSPM.
>> +	 *
>> +	 * Poll PCC status register every 3us(delay_us) for maximum of
>> +	 * deadline_us(timeout_us) until PCC command complete bit is
>set(cond).
>> +	 */
>> +	rc = readw_relaxed_poll_timeout(&gen_comm_base->status, status,
>> +					status &
>PCC_STATUS_CMD_COMPLETE, 3,
>> +					pcc_subspace->deadline_us);
>> +	if (rc) {
>> +		pr_warn("PCC check channel failed for : %d rc=%d\n",
>> +			pcc_subspace->pcc_id, rc);
>> +		return rc;
>> +	}
>> +
>> +	if (status & PCC_STATUS_ERROR) {
>> +		cap_status = readw_relaxed(&gen_comm_base-
>>set_caps_status);
>> +		rc = ras2_report_cap_error(cap_status);
>> +
>> +		status &= ~PCC_STATUS_ERROR;
>> +		writew_relaxed(status, &gen_comm_base->status);
>> +		return rc;
>> +	}
>> +
>> +	if (status & PCC_STATUS_CMD_COMPLETE)
>> +		return 0;
>> +
>> +	return -EIO;
>> +}
>
>We still have an outstanding problem. This may sound familiar.
>
>If a user specifies an invalid address, our firmware will set an error code in the
>set_caps_status field of the acpi_ras2_shmem structure. In our case, the error
>code is ACPI_RAS2_INVALID_DATA, and the user will observe an EINVAL. This is
>expected.
>
>However, if the user then subsequently attempts to write a VALID address,
>ras2_get_patrol_scrub_running will indirectly call ras2_check_pcc_chan using
>the previously INVALID address to determine if the scrubber is still running.
>Unfortunately, the INVALID address causes ras2_get_patrol_scrub_running to
>fail, therefore preventing the user from specifying a VALID address after
>specifying an INVALID address.
>
>The only way to move forward from this inescapable condition is to reboot the
>system.
>
>Here is a demo of the problem as I roughly see it on our system (I've labeled the
>line numbers for sake of discussion):
>1  [root@myhost scrub0]# echo 0x100000000 > size
>2  [root@myhost scrub0]# echo 0x1f00000000 > addr
>3  [root@myhost scrub0]# echo 0xcf00000000 > addr
>4  write error: Invalid argument
>5  [  214.446338] PCCT PCCT: Failed to start demand scrubbing
>6  [root@myhost scrub0]# echo 0x1f00000000 > addr
>7  write error: Invalid argument
>8  [  242.263909] PCCT PCCT: failed to read parameters
>9  [root@myhost scrub0]# echo 0x100000000 > size
>10 write error: Invalid argument
>11 [  246.190196] PCCT PCCT: failed to read parameters
>
>The upper most memory address on this system is 0xbf00000000. Line 1 and 2
>use valid values, and line 2 produces the expected results. On line 3, I've
>specified an INVALID address (outside of valid range). The error on line 5 is
>expected after executing the START_PATROL_SCRUBBER command with an
>INVALID address.
>
>Line 6 show how I attempt to specify a VALID address. Unfortunately,
>ras2_get_patrol_scrub_running encounters and error after executing
>GET_PATROL_PARAMETERS because it used the OLD INVALID values in ps_sm-
>>params.req_addr_range. Line 7 and 8 are the result. Since the flow of
>execution if aborted at this point, you can never rectify the situation and insert a
>valid value into ps_sm->params.req_addr_range, unless you reboot the system.
>
>One half baked solution to this problem, is to modify
>ras2_get_patrol_scrub_running so that if there is a non-zero address or size
>specified, AND the last error code we received was INVALID DATA, then assume
>the scrubber is NOT running.
Hi Daniel,

Thanks for reporting the issue.
Can you check whether following change fix the issue in your test setup?
=============================================
diff --git a/drivers/ras/acpi_ras2.c b/drivers/ras/acpi_ras2.c
index 4d9cfd3bdf45..ff4aa1b75860 100644
--- a/drivers/ras/acpi_ras2.c
+++ b/drivers/ras/acpi_ras2.c
@@ -255,6 +255,13 @@ static int ras2_hw_scrub_write_addr(struct device *dev, void *drv_data, u64 base
        ret = ras2_send_pcc_cmd(ras2_ctx, PCC_CMD_EXEC_RAS2);
        if (ret) {
                dev_err(ras2_ctx->dev, "Failed to start demand scrubbing\n");
+               if (ret == -EPERM || ret == -EINVAL) {
+                       ps_sm->params.req_addr_range[0] = 0;
+                       ps_sm->params.req_addr_range[1] = 0;
+                       ras2_ctx->base = 0;
+                       ras2_ctx->size = 0;
+                       ras2_ctx->od_scrub_sts = OD_SCRUB_STS_IDLE;
+               }
                return ret;
        }
        ras2_ctx->od_scrub_sts = OD_SCRUB_STS_ACTIVE;
==============================================
Thanks,
Shiju 

>
>Regards,
>~Daniel

  reply	other threads:[~2025-05-14 11:32 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-07 21:43 [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table shiju.jose
2025-05-07 21:43 ` [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2025-05-14  2:55   ` Daniel Ferguson
2025-05-14 11:31     ` Shiju Jose [this message]
2025-05-16  1:49       ` Daniel Ferguson
2025-05-16 13:36         ` Shiju Jose
2025-05-07 21:43 ` [PATCH v5 2/2] ras: mem: Add memory " shiju.jose
2025-05-12  8:16 ` [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table Jonathan Cameron
2025-05-12  8:38   ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=19ccc1b78e104132962792b55ab92df5@huawei.com \
    --to=shiju.jose@huawei.com \
    --cc=Jon.Grimm@amd.com \
    --cc=Yazen.Ghannam@amd.com \
    --cc=bp@alien8.de \
    --cc=danielf@os.amperecomputing.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dferguson@amperecomputing.com \
    --cc=duenwen@google.com \
    --cc=erdemaktas@google.com \
    --cc=gthelen@google.com \
    --cc=james.morse@arm.com \
    --cc=jiaqiyan@google.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=jthoughton@google.com \
    --cc=kangkang.shen@futurewei.com \
    --cc=lenb@kernel.org \
    --cc=leo.duran@amd.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxarm@huawei.com \
    --cc=mchehab@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=nifan.cxl@gmail.com \
    --cc=pgonda@google.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=rafael@kernel.org \
    --cc=rientjes@google.com \
    --cc=roberto.sassu@huawei.com \
    --cc=somasundaram.a@hpe.com \
    --cc=tanxiaofei@huawei.com \
    --cc=tony.luck@intel.com \
    --cc=wanghuiqiang@huawei.com \
    --cc=wbs@os.amperecomputing.com \
    --cc=wschwartz@amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox