From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C8EB1D18129 for ; Mon, 14 Oct 2024 15:41:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 45DB46B0083; Mon, 14 Oct 2024 11:41:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3E76D6B0085; Mon, 14 Oct 2024 11:41:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 288096B0088; Mon, 14 Oct 2024 11:41:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 06E2D6B0083 for ; Mon, 14 Oct 2024 11:41:08 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id B01E3140213 for ; Mon, 14 Oct 2024 15:40:59 +0000 (UTC) X-FDA: 82672621080.09.943086C Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56]) by imf24.hostedemail.com (Postfix) with ESMTP id 14DE318001D for ; Mon, 14 Oct 2024 15:41:02 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=none; dmarc=pass (policy=quarantine) header.from=huawei.com; spf=pass (imf24.hostedemail.com: domain of jonathan.cameron@huawei.com designates 185.176.79.56 as permitted sender) smtp.mailfrom=jonathan.cameron@huawei.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1728920419; a=rsa-sha256; cv=none; b=V1O/SxLxVXqOx0WlpjeXDYfe238n6ZRAWzYP/dtFcVH6dNypiMjFdRFmf2wB2spQaeltG/ +4TiXl104LycNLz88zy9SK0H6Bc7aYdE7UOTXtekSTCZ7+DxktBwe4Vch6FTEP95TgX5QD t8GGi9WH9WfSRQiDzIVXFd90kNWJD+M= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=none; dmarc=pass (policy=quarantine) header.from=huawei.com; spf=pass (imf24.hostedemail.com: domain of jonathan.cameron@huawei.com designates 185.176.79.56 as permitted sender) smtp.mailfrom=jonathan.cameron@huawei.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1728920419; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=47458lR6cgPTfiMWr8NyK7wX+DYY6n8mgnNwM9Znrmo=; b=ZUpFZSTYgTOVNmpHUhCnw9thCqHtFmE8Ir3MtgRJgU3E8yKyAarFDhjsJhApkCPRJdD0kh 7zcNVtuE6TzmKMe7C8HTROCYgRG1OXXREgjY4bxywd5/SXck/pOB+/OL39uccUI6nZWlN9 esaUVl8WhSUYU7pm/i0eJqp5N2a//C4= Received: from mail.maildlp.com (unknown [172.18.186.31]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4XS1cX1gncz6GC8t; Mon, 14 Oct 2024 23:39:24 +0800 (CST) Received: from frapeml500008.china.huawei.com (unknown [7.182.85.71]) by mail.maildlp.com (Postfix) with ESMTPS id 1A5CF1401F4; Mon, 14 Oct 2024 23:40:59 +0800 (CST) Received: from localhost (10.203.177.66) by frapeml500008.china.huawei.com (7.182.85.71) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.39; Mon, 14 Oct 2024 17:40:57 +0200 Date: Mon, 14 Oct 2024 16:40:55 +0100 From: Jonathan Cameron To: CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: Re: [PATCH v13 11/18] cxl/memfeature: Add CXL memory device ECS control feature Message-ID: <20241014164055.00002019@Huawei.com> In-Reply-To: <20241009124120.1124-12-shiju.jose@huawei.com> References: <20241009124120.1124-1-shiju.jose@huawei.com> <20241009124120.1124-12-shiju.jose@huawei.com> Organization: Huawei Technologies Research and Development (UK) Ltd. X-Mailer: Claws Mail 4.1.0 (GTK 3.24.33; x86_64-w64-mingw32) MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.203.177.66] X-ClientProxiedBy: lhrpeml500004.china.huawei.com (7.191.163.9) To frapeml500008.china.huawei.com (7.182.85.71) X-Rspam-User: X-Rspamd-Queue-Id: 14DE318001D X-Rspamd-Server: rspam01 X-Stat-Signature: 9uxbs6qrge3wczkobbaepxfaw9k9n6y1 X-HE-Tag: 1728920462-246657 X-HE-Meta: U2FsdGVkX18/oWKIISeW8GmeKiMBXczBvC8+USVeLb9s0uuIyip/qXm5QF6T1h9NR83Vc39QeLqfcKhY9NuE5BVIzg7ynW+wGmb0VwUMrIcWk5C+5TVKau3KEyhOXBMP4oTXjKjuFcb+WU20QEGbqhggbxCTXXd/jSE6gO8gOlG/AUajEIALlya7pcPoI89qzSjiR/S66F4v7lg8uzAb8gcjqfGaNZVuMANAH1FpDkKn5YwegNKjpydlPu0L57DNBA5LNyel5pQbwfW8yapk24PmhBx8irvrusX8kA2TLWjW35zEvKO+Z44u+/IDqADzR0Cu5v6Ze18NBjeDzrEprc9qoMQH9odfKma/ZFu7/4b/0+cPgOu43ke9uCpk2CDZkTYxytUs/kQrKK3suyRF6tVnoq2AM8jtgmfInyn4M9sOWFICkL0d93CgXHu0QhBISci00pzmcylBqCqEhrWPbW28AuIUUX39vMXnJBSHz4OfTQuliGtgTZabYtKpQmnjkaNQvdV2tsu6APWoZm38HW4cDV364Y1/FA/62jBqktn27LLeTTWj13NeOOkvtEz8uwX8uLO2GwZffxEwCAK1UNh/7S7OvaISrMPNDRZtzTlMxBKQgO49fVCp7B5QHkyva+SeMu2hEdDyE97TnB8mpPQHdSxOrC+PjdVE8ZDD1zWr3Q8a6uH/1Be8Fljt4L5Cj4B3iHa+AS5DACP+vfHVAo+kyeNeu7SgGH8V3XsK8djDfYVQPLSwaHDWm+IdMMyFCtgfpufFaHcBPDqrNpvy6h4IlJaNl9QH9oDFMeK3vqzBr0b6zV6SX2zVB3BY4c0ZCwx6FePgaGDyg9rU/TTz69IsVOdex/WVY9clEWHfuklvg4gRxOfqgbO6xEHe5t299HX46D3pb1zfw6qOUhc9tCOoVMqW1gwH1MirmN6hUeGnZviFh26I5qkOdnTA78Lg/80lYuzRka4AZbss6T2 kNVJ4qjV jKTAUy77bX43ctcGQI/SzbRNHnmWO1oaqeyRp/+rvOApvvTZW1dCZxBGUGdIj5IX60seDaiUB0ne8HB52OD3JnzRGDjWNIrFthvRTMRQJEgnajgcqdbGF12DnVqXMKaw7wQ9oQskAKP9YsH4uXRxOpsjTNkPXyryceuoKiVcEMhS2I8fuXjum6y5A2FY8L7vyRns5ehFpr2KJSJ1EDxRRMLuGUJEXTuMrkead9eB/TkKvNw19fizQBxQ4Zuzrx4cYhQYVdoXzNRhtVfgAldJsNGkIWTJ0b/3jqlTmIPqE+jHR7zI= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, 9 Oct 2024 13:41:12 +0100 wrote: > From: Shiju Jose > > CXL spec 3.1 section 8.2.9.9.11.2 describes the DDR5 ECS (Error Check > Scrub) control feature. > The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM > Specification (JESD79-5) and allows the DRAM to internally read, correct > single-bit errors, and write back corrected data bits to the DRAM array > while providing transparency to error counts. I never understood the 'transparency to error counts'. Maybe from software point of view 'while reporting error counts to the host'. Unless anyone else can figure out what that text from the CXL spec means? (I'm guessing it is cut and paste from the JEDEC spec) > > The ECS control allows the requester to change the log entry type, the ECS > threshold count provided that the request is within the definition > specified in DDR5 mode registers, change mode between codeword mode and > row count mode, and reset the ECS counter. > > Register with EDAC device driver, which gets the ECS attr descriptors > from the EDAC ECS and expose sysfs ECS control attributes to userspace. > For example ECS control for the memory media FRU 0 in CXL mem0 device is > in /sys/bus/edac/devices/cxl_mem0/ecs_fruX/ fru0? > > Signed-off-by: Shiju Jose A few little things in line. In general looks good to me. > --- > drivers/cxl/core/memfeature.c | 467 +++++++++++++++++++++++++++++++++- > 1 file changed, 461 insertions(+), 6 deletions(-) > > diff --git a/drivers/cxl/core/memfeature.c b/drivers/cxl/core/memfeature.c > index 84d6e887a4fa..567406566c77 100644 > --- a/drivers/cxl/core/memfeature.c > +++ b/drivers/cxl/core/memfeature.c > @@ -19,7 +19,7 @@ > #include > #include > > -#define CXL_DEV_NUM_RAS_FEATURES 1 > +#define CXL_DEV_NUM_RAS_FEATURES 2 > #define CXL_DEV_HOUR_IN_SECS 3600 > > #define CXL_SCRUB_NAME_LEN 128 > @@ -309,6 +309,420 @@ static const struct edac_scrub_ops cxl_ps_scrub_ops = { > .set_cycle_duration = cxl_patrol_scrub_write_scrub_cycle, > }; > > +/* CXL DDR5 ECS control definitions */ > +static const uuid_t cxl_ecs_uuid = > + UUID_INIT(0xe5b13f22, 0x2328, 0x4a14, 0xb8, 0xba, 0xb9, 0x69, 0x1e, \ Why the \? > + 0x89, 0x33, 0x86); > + > + > +#define CXL_ECS_LOG_ENTRY_TYPE_MASK GENMASK(1, 0) > +#define CXL_ECS_REALTIME_REPORT_CAP_MASK BIT(0) > +#define CXL_ECS_THRESHOLD_COUNT_MASK GENMASK(2, 0) > +#define CXL_ECS_MODE_MASK BIT(3) That name is a little generic. Maybe CXL_ECS_COUNT_MODE_MASK ? > +#define CXL_ECS_RESET_COUNTER_MASK BIT(4) > + > +static const u16 ecs_supp_threshold[] = { 0, 0, 0, 256, 1024, 4096 }; > + > +enum { > + ECS_LOG_ENTRY_TYPE_DRAM = 0x0, > + ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU = 0x1, > +}; > + > +enum { > + ECS_THRESHOLD_256 = 3, > + ECS_THRESHOLD_1024 = 4, > + ECS_THRESHOLD_4096 = 5, > +}; Perhaps move this above the ecs_supp_threshold array and use static const ecs_supp_threshold[] = { [ECS_THRESHOLD_256] = 256, [ECS_THRESHOLD_1024] = 1024, [ECS_THRESHOLD_4096] = 4096, }; which will fill the zeros in for you. You don't care about them anyway as they are undefined values. > + > +enum cxl_ecs_mode { > + ECS_MODE_COUNTS_ROWS = 0, > + ECS_MODE_COUNTS_CODEWORDS = 1, > +}; > + > +static int cxl_mem_ecs_set_attrs(struct device *dev, void *drv_data, int fru_id, > + struct cxl_ecs_params *params, u8 param_type) > +{ > + struct cxl_ecs_context *cxl_ecs_ctx = drv_data; > + struct cxl_memdev *cxlmd = cxl_ecs_ctx->cxlmd; > + struct cxl_dev_state *cxlds = cxlmd->cxlds; > + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds); > + struct cxl_ecs_fru_rd_attrs *fru_rd_attrs; > + struct cxl_ecs_fru_wr_attrs *fru_wr_attrs; > + size_t rd_data_size, wr_data_size; > + u16 num_media_frus, count; > + size_t data_size; > + int ret; > + > + num_media_frus = cxl_ecs_ctx->num_media_frus; > + rd_data_size = cxl_ecs_ctx->get_feat_size; > + wr_data_size = cxl_ecs_ctx->set_feat_size; > + struct cxl_ecs_rd_attrs *rd_attrs __free(kfree) = > + kmalloc(rd_data_size, GFP_KERNEL); > + if (!rd_attrs) > + return -ENOMEM; > + > + data_size = cxl_get_feature(mds, cxl_ecs_uuid, > + CXL_GET_FEAT_SEL_CURRENT_VALUE, > + rd_attrs, rd_data_size); > + if (!data_size) > + return -EIO; blank line here as the next line isn't part of this allocate / check errors block. > + struct cxl_ecs_wr_attrs *wr_attrs __free(kfree) = > + kmalloc(wr_data_size, GFP_KERNEL); > + if (!wr_attrs) > + return -ENOMEM; > + > + /* Fill writable attributes from the current attributes read CXL uses /* * Fill writable style for multiline comments. > + * for all the media FRUs. > + */ > +static int cxl_ecs_get_mode_counts_rows(struct device *dev, void *drv_data, > + int fru_id, u32 *val) > +{ > + struct cxl_ecs_params params; > + int ret; > + > + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms); > + if (ret) > + return ret; > + > + if (params.mode == ECS_MODE_COUNTS_ROWS) > + *val = 1; > + else > + *val = 0; > + > + return 0; > +} > + > +static int cxl_ecs_get_mode_counts_codewords(struct device *dev, void *drv_data, > + int fru_id, u32 *val) > +{ > + struct cxl_ecs_params params; > + int ret; This form is pretty common. Maybe worth some macros like you have in the edac side of things? > + > + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms); > + if (ret) > + return ret; > + > + if (params.mode == ECS_MODE_COUNTS_CODEWORDS) > + *val = 1; > + else > + *val = 0; > + > + return 0; > +}