From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 14 Oct 2024 17:38:55 +0100
From: Jonathan Cameron
Subject: Re: [PATCH v13 17/18] cxl/memfeature: Add CXL memory device PPR control feature
Message-ID: <20241014173855.0000583c@Huawei.com>
In-Reply-To: <20241009124120.1124-18-shiju.jose@huawei.com>
References: <20241009124120.1124-1-shiju.jose@huawei.com> <20241009124120.1124-18-shiju.jose@huawei.com>
Organization: Huawei Technologies Research and Development (UK) Ltd.

On Wed, 9 Oct 2024 13:41:18 +0100 wrote:

> From: Shiju Jose
>
> Post Package Repair (PPR) maintenance operations may be supported by CXL
> devices that implement CXL.mem protocol. A PPR maintenance operation
> requests the CXL device to perform a repair operation on its media.
> For example, a CXL device with DRAM components that support PPR features
> may implement PPR Maintenance operations. DRAM components may support two
> types of PPR: Hard PPR (hPPR), for a permanent row repair, and Soft PPR
> (sPPR), for a temporary row repair. sPPR is much faster than hPPR, but the
> repair is lost with a power cycle.
>
> During the execution of a PPR Maintenance operation, a CXL memory device:
> - May or may not retain data
> - May or may not be able to process CXL.mem requests correctly, including
>   the ones that target the DPA involved in the repair.
> These CXL Memory Device capabilities are specified by Restriction Flags
> in the sPPR Feature and hPPR Feature.
>
> sPPR maintenance operation may be executed at runtime, if data is retained
> and CXL.mem requests are correctly processed. For CXL devices with DRAM
> components, hPPR maintenance operation may be executed only at boot because
> data would not be retained.
> When a CXL device identifies a failure on a memory component, the device
> may inform the host about the need for a PPR maintenance operation by using
> an Event Record, where the Maintenance Needed flag is set. The Event Record
> specifies the DPA that should be repaired. A CXL device may not keep track
> of the requests that have already been sent and the information on which
> DPA should be repaired may be lost upon power cycle.
> The userspace tool requests a maintenance operation if the number of
> corrected errors reported on a CXL.mem media exceeds the error threshold.
>
> CXL spec 3.1 section 8.2.9.7.1.2 describes the device's sPPR (soft PPR)
> maintenance operation and section 8.2.9.7.1.3 describes the device's
> hPPR (hard PPR) maintenance operation feature.
>
> CXL spec 3.1 section 8.2.9.7.2.1 describes the sPPR feature discovery and
> configuration.
>
> CXL spec 3.1 section 8.2.9.7.2.2 describes the hPPR feature discovery and
> configuration.
>
> Add support for controlling CXL memory device PPR feature.
> Register with EDAC driver, which gets the memory repair attr descriptors
> from the EDAC memory repair driver and exposes sysfs repair control
> attributes for PPR to the userspace. For example CXL PPR control for the
> CXL mem0 device is exposed in /sys/bus/edac/devices/cxl_mem0/mem_repairX/
>
> Tested with QEMU patch for CXL PPR feature.
> https://lore.kernel.org/all/20240730045722.71482-1-dave@stgolabs.net/
>
> Signed-off-by: Shiju Jose

Trivial comments inline.

This description should call out that initial support is sPPR only,
though hPPR is very easy to add.

Jonathan

> ---
>  drivers/cxl/core/memfeature.c | 335 +++++++++++++++++++++++++++++++++-
>  1 file changed, 329 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/cxl/core/memfeature.c b/drivers/cxl/core/memfeature.c
> index 567406566c77..a0c9a6bd73c0 100644
> --- a/drivers/cxl/core/memfeature.c
> +++ b/drivers/cxl/core/memfeature.c
> @@ -18,8 +18,9 @@
>  #include
>  #include
>  #include
> +#include "core.h"
>
> -#define CXL_DEV_NUM_RAS_FEATURES 2
> +#define CXL_DEV_NUM_RAS_FEATURES 3
>  #define CXL_DEV_HOUR_IN_SECS 3600
>
>  #define CXL_SCRUB_NAME_LEN 128
> @@ -723,6 +724,294 @@ static const struct edac_ecs_ops cxl_ecs_ops = {
>  	.set_threshold = cxl_ecs_set_threshold,
>  };
>
> +/* CXL memory soft PPR & hard PPR control definitions */

Add some specification references for the various structures etc.

> +static const uuid_t cxl_sppr_uuid =
> +	UUID_INIT(0x892ba475, 0xfad8, 0x474e, 0x9d, 0x3e, 0x69, 0x2c, 0x91, \
> +		  0x75, 0x68, 0xbb);
> +
> +static const uuid_t cxl_hppr_uuid =
> +	UUID_INIT(0x80ea4521, 0x786f, 0x4127, 0xaf, 0xb1, 0xec, 0x74, 0x59, \
> +		  0xfb, 0x0e, 0x24);
> +
> +#define CXL_MEMDEV_PPR_DEVICE_INITIATED_MASK BIT(0)
> +#define CXL_MEMDEV_PPR_FLAG_DPA_SUPPORT_MASK BIT(0)
> +#define CXL_MEMDEV_PPR_FLAG_NIBBLE_SUPPORT_MASK BIT(1)
> +#define CXL_MEMDEV_PPR_FLAG_MEM_SPARING_EV_REC_SUPPORT_MASK BIT(2)
> +
> +#define CXL_MEMDEV_PPR_RESTRICTION_FLAG_MEDIA_ACCESSIBLE_MASK BIT(0)
> +#define CXL_MEMDEV_PPR_RESTRICTION_FLAG_DATA_RETAINED_MASK BIT(2)
> +
> +#define CXL_MEMDEV_PPR_SPARING_EV_REC_EN_MASK BIT(0)
> +
> +struct cxl_memdev_ppr_rd_attrs {
> +	u8 max_op_latency;
> +	__le16 op_cap;
> +	__le16 op_mode;
> +	u8 op_class;
> +	u8 op_subclass;
> +	u8 rsvd[9];

Everything down to here is the common header. Maybe break that out as a
separate structure, as we will get more maintenance features. That also
makes the spec reference simpler, as some of the flags (the device
initiated one) are in the generic part.

> +	u8 ppr_flags;
> +	__le16 restriction_flags;
> +	u8 ppr_op_mode;
> +} __packed;
> +
> +
> +static int cxl_do_query_ppr(struct device *dev, void *drv_data)
> +{
> +	struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
> +
> +	if (!cxl_ppr_ctx->dpa)
> +		return -EINVAL;
> +
> +	return cxl_mem_ppr_set_attrs(dev, drv_data, CXL_PPR_PARAM_DO_QUERY);
> +}
> +
> +static int cxl_do_ppr(struct device *dev, void *drv_data)
> +{
> +	struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
> +	int ret;
> +
> +	if (!cxl_ppr_ctx->dpa)
> +		return -EINVAL;

Add a blank line here (as in cxl_do_query_ppr() above).

> +	ret = cxl_mem_ppr_set_attrs(dev, drv_data, CXL_PPR_PARAM_DO_PPR);
> +
> +	return ret;

Just return cxl_mem_ppr_set_attrs() directly; the local ret is then
not needed.

> +}
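
To illustrate the common header comment above, here is a rough userspace
sketch of how the split might look. The structure name cxl_maint_op_hdr,
the helper names, and the u8/le16 typedefs are all made up for this
example and are not from the patch. It also assumes the device initiated
flag is bit 0 of op_cap, which should be checked against the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for the kernel's u8 / __le16 (assumption). */
typedef uint8_t u8;
typedef uint16_t le16;	/* stored little-endian on the wire */

#define BIT(n) (1u << (n))
#define CXL_MEMDEV_PPR_DEVICE_INITIATED_MASK BIT(0)

/* Hypothetical name: fields shared by all maintenance operation features. */
struct cxl_maint_op_hdr {
	u8 max_op_latency;
	le16 op_cap;
	le16 op_mode;
	u8 op_class;
	u8 op_subclass;
	u8 rsvd[9];
} __attribute__((packed));

/* PPR readable attributes: common header followed by PPR specific fields. */
struct cxl_memdev_ppr_rd_attrs {
	struct cxl_maint_op_hdr hdr;
	u8 ppr_flags;
	le16 restriction_flags;
	u8 ppr_op_mode;
} __attribute__((packed));

/* The split must not change the layout of the original flat structure. */
_Static_assert(sizeof(struct cxl_maint_op_hdr) == 16,
	       "common header is 16 bytes");
_Static_assert(offsetof(struct cxl_memdev_ppr_rd_attrs, ppr_flags) == 16,
	       "PPR fields start right after the common header");

/* Read a little-endian 16-bit value from a possibly unaligned address. */
static inline uint16_t get_le16(const void *p)
{
	u8 b[2];

	memcpy(b, p, sizeof(b));
	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Assumes the device initiated capability is bit 0 of op_cap. */
static inline int cxl_maint_dev_initiated_capable(const struct cxl_maint_op_hdr *hdr)
{
	const u8 *raw = (const u8 *)hdr;

	return !!(get_le16(raw + offsetof(struct cxl_maint_op_hdr, op_cap)) &
		  CXL_MEMDEV_PPR_DEVICE_INITIATED_MASK);
}
```

With something along these lines, a future maintenance feature could embed
the same header, and helpers for the generic flags would only need to be
written once against the header type.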