From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4CAA9D13588 for ; Mon, 28 Oct 2024 11:17:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A06D06B0085; Mon, 28 Oct 2024 07:17:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 98FCB6B0088; Mon, 28 Oct 2024 07:17:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7E28A6B0089; Mon, 28 Oct 2024 07:17:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 5C3F86B0085 for ; Mon, 28 Oct 2024 07:17:41 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id ABB76C1111 for ; Mon, 28 Oct 2024 11:17:16 +0000 (UTC) X-FDA: 82722760470.28.F3F4AD3 Received: from mail.alien8.de (mail.alien8.de [65.109.113.108]) by imf11.hostedemail.com (Postfix) with ESMTP id 50F8B40021 for ; Mon, 28 Oct 2024 11:17:10 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=alien8.de header.s=alien8 header.b=CK3gJwuD; spf=pass (imf11.hostedemail.com: domain of bp@alien8.de designates 65.109.113.108 as permitted sender) smtp.mailfrom=bp@alien8.de; dmarc=pass (policy=none) header.from=alien8.de ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1730114102; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=aQXjtMdPxcXQFLTjo1thwq4DHI1V42PctDkGBJnF/+0=; b=pHL/mTq852ARI2tYgMTlcVoNkHYM1yEMg8QsrJosLnbjocf4gbmg05EDqJq39Nd+el3jMW QK4BoqXz5hswrob9AwpJ/9S3oi7ITjTySlG2kMfY7Eq/zDzz4pJN/L08us1gB59jzJPIB7 6nGFqyf6Wdb0M5bcTbFHQzn01ToMYcI= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1730114102; a=rsa-sha256; cv=none; b=voykGFvRGsgdvg/dYy0jrRHExem/mbkfc+VpdiAOlne6/MxuyNZcu0iPEFua/41HQGaa0i US6/6Nkx2Nk560SFDHavdTTsSbPWXHJH/xZoCJcfrr5lOZDYP/2LZidzeoYlih96b9F2FZ XBLBIJT9jjqXLXzQWPQx4JN4UGTac6o= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=alien8.de header.s=alien8 header.b=CK3gJwuD; spf=pass (imf11.hostedemail.com: domain of bp@alien8.de designates 65.109.113.108 as permitted sender) smtp.mailfrom=bp@alien8.de; dmarc=pass (policy=none) header.from=alien8.de Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.alien8.de (SuperMail on ZX Spectrum 128k) with ESMTP id A16B940E0198; Mon, 28 Oct 2024 11:17:34 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at mail.alien8.de Received: from mail.alien8.de ([127.0.0.1]) by localhost (mail.alien8.de [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id YWBf7PxV-zBo; Mon, 28 Oct 2024 11:17:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=alien8.de; s=alien8; t=1730114249; bh=aQXjtMdPxcXQFLTjo1thwq4DHI1V42PctDkGBJnF/+0=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=CK3gJwuDuQBnlCE3o4SDY3/5iCsHbLCzO+vHjzamHleWqt+WYcO69bJsAASLHJC2q ti8RqEMlzOcv7Ux5ntYx8kAML4pLXzsvb1YxnYJuPspbRxqXKEl1mTguckHzn3Sl+G AUHvfxed+jn0CulXbHZq8twP4W+uDZ5sleZ5pW0W1X7mnBjwyRtIIukyWwhovog2yS CSNAQldYCihVxHRyGcn2WyWSLfnZEUdry5qSIp4KwlU141nOaW1S1/5p2g5/rYpXuf GyS6fOrlsWBGJAtd2wyQaVU4d+ZpRSZhlR3VpvXPRYkDHtPqaSbydAaHh5lwxHgBXu r9Xaf8d4tUDu7AyY54XoyTP/42bt7WCv1lgLY2pRdnCW2LNHfJeCVVMbeuqDOH3Nil vwgVyFBnSnB30uEfmEnTPEfCKjeM5p1Hc25hHnxPxf3MumAXCK2HYpvFiT/dtI3CGf Ij8WVjFgHfEicgmitayL1N2E4FueQgJWMU6bAq6j2QKwK8oSI4kpYM1c655qNxMpgS M8MTk1nUUCYSigSBEV2G629H1aOll/cax7kFcWBvmw0wY6R+cdqyB5/RvI3OjlTFjF uxPYhCQUrHlx2jzyO49OBSQGkvavl+8FNIhfdobnAPlb8WaRUPASwogUjvYM84LJf1 TbrkOTRm3H3ec7oEOYG7tTfw= Received: from zn.tnic (p5de8e8eb.dip0.t-ipconnect.de [93.232.232.235]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature ECDSA (P-256) server-digest SHA256) (No client certificate requested) by mail.alien8.de (SuperMail on ZX Spectrum 128k) with ESMTPSA id 26F0C40E01A5; Mon, 28 Oct 2024 11:16:43 +0000 (UTC) Date: Mon, 28 Oct 2024 12:16:37 +0100 From: Borislav Petkov To: shiju.jose@huawei.com Cc: linux-edac@vger.kernel.org, linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, tony.luck@intel.com, rafael@kernel.org, lenb@kernel.org, mchehab@kernel.org, dan.j.williams@intel.com, dave@stgolabs.net, jonathan.cameron@huawei.com, gregkh@linuxfoundation.org, sudeep.holla@arm.com, jassisinghbrar@gmail.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, david@redhat.com, Vilas.Sridharan@amd.com, leo.duran@amd.com, Yazen.Ghannam@amd.com, rientjes@google.com, jiaqiyan@google.com, Jon.Grimm@amd.com, dave.hansen@linux.intel.com, naoya.horiguchi@nec.com, james.morse@arm.com, jthoughton@google.com, somasundaram.a@hpe.com, erdemaktas@google.com, pgonda@google.com, duenwen@google.com, gthelen@google.com, wschwartz@amperecomputing.com, dferguson@amperecomputing.com, wbs@os.amperecomputing.com, nifan.cxl@gmail.com, tanxiaofei@huawei.com, prime.zeng@hisilicon.com, roberto.sassu@huawei.com, kangkang.shen@futurewei.com, wanghuiqiang@huawei.com, linuxarm@huawei.com Subject: Re: [PATCH v14 03/14] EDAC: Add ECS control feature Message-ID: <20241028111637.GSZx9yleFPOjTklIVr@fat_crate.local> References: <20241025171356.1377-1-shiju.jose@huawei.com> <20241025171356.1377-4-shiju.jose@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20241025171356.1377-4-shiju.jose@huawei.com> X-Rspamd-Queue-Id: 50F8B40021 X-Stat-Signature: 6qbd3mjksgoda5kdx5rgf93chn8e4ey3 X-Rspamd-Server: rspam09 X-Rspam-User: X-HE-Tag: 1730114230-963623 X-HE-Meta: U2FsdGVkX1/Jiv/XMzMkzvG25EIBVWZCG+fJRlSTWZ2pw1dND5go/EfCigH1kjFcbYgVo7eU57FNCSwqxAWU/3sbXFRgrNWJyxmWis4FUlPckWAfz9OEfxKcX85lWhrrGbtNgOQExiln2T+F4v8abJbxocjdInM9Fc4Bj30DOZTlsr/1Fjd76d+06z2ZxbY7IOPLGvVTeSannIB6r1o5YghpFOzs/T13/jAkBj1sbUeeeyKfDre6a5pfXF+7VioiVPeZVlC25/c67/aZmQHGt28G7aBa34bSi3m+e7i/6Ff9V6YMmDB7nNH2/oYBsC72SWDMg8PjP1ktbr20yGIGLtjHhb8/d9ihkyTsWhhzXJ9csPSDoXm8udD8YS701AW0s7x0KKS+y4aEwZm/k0Vob/Z7JrkVjz12irMHBWpeJZOPcNR1A8zOMhztIO8u0jG2T7PYqQu3bSzisdtvK1nHwVJNvckI+IatCCs8by6jVHOZbEsimX2OYvN9CB1fmBPzXP46E9yK81EvxlmNNIhRnqLDnjcDossOHjDowrTnBa6Pdkt6UDlb0FZc7w5g89cPPq9bbd+THaXmeRD33vBeNobdh6sQmMEVkNFvmn6DIpKlC9SF8Cko6FzmgwC7fVRTGbvnH5JBDDOZVxk5ey7Rjfdvox6z2ShMLFPOvsIN8jwtf7dGisDUqruiWXgSObSQYYJSYn+6GI9AP3efX6sMHhgo5XHPaAlnJu4NWag65YlGLZsF7HL6d6BiZk3ka9ojfxHRNAWd7Fif9kmQJ3FY2OIJoTjdb646YHfNhtMiWz7Wzw3yfTU9anNQEm36azbtU1bAsOlBrtstTI+5mw9ztIK9HlwNW1h7GaXFCVhWGxipeFawhxQ7bX43uRdF4+vW2g9ZfFfguU+tsnaZKQFLxxwfWyaz/UywnWlG5h8fd61PwrKGSfNINo/5CoY5ij3aKaeKFhba+lBX9a4qi+7 bzu7Zc3M Hg9zJ8Md3PVqXUcIDjMy3hcNquPJN5pazD79gvmDFMivyzv/VvbfhdFmVZq9/8AE6pjr12Qw6IDrnPatNFdJXzuciSLzLruLUJfVFr1LZOXZwXFVZWBJQZntoGWQiUVhE6YYRlWThA8qY4FdU3jLyx2UKxY38j3xoh9J7UvCf1mz9rSuKyet0RHumwRSt2lHCXzmnwev7JHEXh9b3vBndxv81IxXsDSIlCkh3eE8u4HaxrHtbbrB/+UO2JP2BPAsqaWn5ItCI3YKEv7eoeZYiBCeUgSwHsRywcSI2eRszHSWjiHZEnVXmv3aOrrDlTbAdEhyH X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Oct 25, 2024 at 06:13:44PM +0100, shiju.jose@huawei.com wrote: > +What: /sys/bus/edac/devices//ecs_fruX/log_entry_type > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@vger.kernel.org > +Description: > + (RW) The log entry type of how the DDR5 ECS log is reported. > + 00b - per DRAM. > + 01b - per memory media FRU. If the conversion function here is kstrtoul(), why are those values not "0" and "1" but in binary format? > + > +What: /sys/bus/edac/devices//ecs_fruX/log_entry_type_per_dram > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@vger.kernel.org > +Description: > + (RO) True if current log entry type is per DRAM. > + > +What: /sys/bus/edac/devices//ecs_fruX/log_entry_type_per_memory_media > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@vger.kernel.org > +Description: > + (RO) True if current log entry type is per memory media FRU. What's the point of those two if log_entry_type already gives you the same info? And the filename length is a bit too much... > + > +What: /sys/bus/edac/devices//ecs_fruX/mode > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@vger.kernel.org > +Description: > + (RW) The mode of how the DDR5 ECS counts the errors. > + 0 - ECS counts rows with errors. > + 1 - ECS counts codewords with errors. Now we have "0" and "1"s. Oh well. What are "rows", what are "codewords"? Explain them here pls for the user. > +What: /sys/bus/edac/devices//ecs_fruX/mode_counts_rows > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@vger.kernel.org > +Description: > + (RO) True if current mode is ECS counts rows with errors. > + > +What: /sys/bus/edac/devices//ecs_fruX/mode_counts_codewords > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@vger.kernel.org > +Description: > + (RO) True if current mode is ECS counts codewords with errors. Same question as above - redundant files. > +What: /sys/bus/edac/devices//ecs_fruX/reset > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@vger.kernel.org > +Description: > + (WO) ECS reset ECC counter. > + 1 - reset ECC counter to the default value. 1 or any value? Looks like any to me... You should restrict it to "1" in case you want to extend this interface with "2" in the future, for example, doing something a bit different. > + > +What: /sys/bus/edac/devices//ecs_fruX/threshold > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@vger.kernel.org > +Description: > + (RW) ECS threshold count per gigabits of memory cells. That definitely needs more explanation. > diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile > index 188501e676c7..b24c2c112d9c 100644 > --- a/drivers/edac/Makefile > +++ b/drivers/edac/Makefile > @@ -10,7 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o > > edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o > edac_core-y += edac_module.o edac_device_sysfs.o wq.o > -edac_core-y += scrub.o > +edac_core-y += scrub.o ecs.o > > edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o > > diff --git a/drivers/edac/ecs.c b/drivers/edac/ecs.c > new file mode 100755 > index 000000000000..a2b64d7bf6b6 > --- /dev/null > +++ b/drivers/edac/ecs.c > @@ -0,0 +1,240 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Generic ECS driver in order to support control the on die > + * error check scrub (e.g. DDR5 ECS). This sentence needs grammar check. > The common sysfs ECS > + * interface abstracts the control of an arbitrary ECS > + * functionality to a common set of functions. > + * > + * Copyright (c) 2024 HiSilicon Limited. > + */ > + #undef pr_fmt > +#define pr_fmt(fmt) "EDAC ECS: " fmt Grep the tree for examples how to do that properly. Also, this pr_fmt looks unused. > +static umode_t ecs_attr_visible(struct kobject *kobj, struct attribute *a, int attr_id) > +{ > + struct device *ras_feat_dev = kobj_to_dev(kobj); > + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev); > + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops; > + > + switch (attr_id) { > + case ECS_LOG_ENTRY_TYPE: > + if (ops->get_log_entry_type) { > + if (ops->set_log_entry_type) > + return a->mode; > + else > + return 0444; What is the goal for the access mode of all those sysfs entries? I sure hope it is going to be root-only no-matter what. I don't want normal users to cause scrub activity. Please make sure your whole set does that. > + } > + break; > + case ECS_LOG_ENTRY_TYPE_PER_DRAM: > + if (ops->get_log_entry_type_per_dram) > + return a->mode; > + break; > + case ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA: > + if (ops->get_log_entry_type_per_memory_media) > + return a->mode; > + break; > + case ECS_MODE: > + if (ops->get_mode) { > + if (ops->set_mode) > + return a->mode; > + else > + return 0444; > + } > + break; > + case ECS_MODE_COUNTS_ROWS: > + if (ops->get_mode_counts_rows) > + return a->mode; > + break; > + case ECS_MODE_COUNTS_CODEWORDS: > + if (ops->get_mode_counts_codewords) > + return a->mode; > + break; > + case ECS_RESET: > + if (ops->reset) > + return a->mode; > + break; > + case ECS_THRESHOLD: > + if (ops->get_threshold) { > + if (ops->set_threshold) > + return a->mode; > + else > + return 0444; > + } > + break; > + default: > + break; > + } > + > + return 0; > +} > + > +#define EDAC_ECS_ATTR_RO(_name, _fru_id) \ > + ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RO(_name), \ > + .fru_id = _fru_id }) > + > +#define EDAC_ECS_ATTR_WO(_name, _fru_id) \ > + ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_WO(_name), \ > + .fru_id = _fru_id }) > + > +#define EDAC_ECS_ATTR_RW(_name, _fru_id) \ > + ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RW(_name), \ > + .fru_id = _fru_id }) > + > +static int ecs_create_desc(struct device *ecs_dev, > + const struct attribute_group **attr_groups, u16 num_media_frus) > +{ > + struct edac_ecs_context *ecs_ctx; > + u32 fru; > + > + ecs_ctx = devm_kzalloc(ecs_dev, sizeof(*ecs_ctx), GFP_KERNEL); > + if (!ecs_ctx) > + return -ENOMEM; > + > + ecs_ctx->num_media_frus = num_media_frus; > + ecs_ctx->fru_ctxs = devm_kcalloc(ecs_dev, num_media_frus, > + sizeof(*ecs_ctx->fru_ctxs), > + GFP_KERNEL); > + if (!ecs_ctx->fru_ctxs) > + return -ENOMEM; > + > + for (fru = 0; fru < num_media_frus; fru++) { > + struct edac_ecs_fru_context *fru_ctx = &ecs_ctx->fru_ctxs[fru]; > + struct attribute_group *group = &fru_ctx->group; > + int i; > + > + fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE] = EDAC_ECS_ATTR_RW(log_entry_type, fru); > + fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE_PER_DRAM] = > + EDAC_ECS_ATTR_RO(log_entry_type_per_dram, fru); > + fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA] = > + EDAC_ECS_ATTR_RO(log_entry_type_per_memory_media, fru); > + fru_ctx->ecs_dev_attr[ECS_MODE] = EDAC_ECS_ATTR_RW(mode, fru); > + fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_ROWS] = > + EDAC_ECS_ATTR_RO(mode_counts_rows, fru); > + fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_CODEWORDS] = > + EDAC_ECS_ATTR_RO(mode_counts_codewords, fru); > + fru_ctx->ecs_dev_attr[ECS_RESET] = EDAC_ECS_ATTR_WO(reset, fru); > + fru_ctx->ecs_dev_attr[ECS_THRESHOLD] = EDAC_ECS_ATTR_RW(threshold, fru); Clearly too long variable and define names. Shorten pls. Also, a new line here: <--- > + for (i = 0; i < ECS_MAX_ATTRS; i++) > + fru_ctx->ecs_attrs[i] = &fru_ctx->ecs_dev_attr[i].dev_attr.attr; > + > + sprintf(fru_ctx->name, "%s%d", EDAC_ECS_FRU_NAME, fru); > + group->name = fru_ctx->name; > + group->attrs = fru_ctx->ecs_attrs; > + group->is_visible = ecs_attr_visible; > + > + attr_groups[fru] = group; > + } > + > + return 0; > +} > + > +/** > + * edac_ecs_get_desc - get EDAC ECS descriptors > + * @ecs_dev: client device, supports ECS feature > + * @attr_groups: pointer to attribute group container > + * @num_media_frus: number of media FRUs in the device > + * > + * Return: > + * * %0 - Success. > + * * %-EINVAL - Invalid parameters passed. > + * * %-ENOMEM - Dynamic memory allocation failed. > + */ > +int edac_ecs_get_desc(struct device *ecs_dev, > + const struct attribute_group **attr_groups, u16 num_media_frus) > +{ > + if (!ecs_dev || !attr_groups || !num_media_frus) > + return -EINVAL; > + > + return ecs_create_desc(ecs_dev, attr_groups, num_media_frus); > +} > diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c > index 91552271b34a..5fc3ec7f25eb 100644 > --- a/drivers/edac/edac_device.c > +++ b/drivers/edac/edac_device.c > @@ -626,6 +626,9 @@ int edac_dev_register(struct device *parent, char *name, > attr_gcnt++; > scrub_cnt++; > break; > + case RAS_FEAT_ECS: > + attr_gcnt += ras_features[feat].ecs_info.num_media_frus; > + break; > default: > return -EINVAL; > } > @@ -667,6 +670,18 @@ int edac_dev_register(struct device *parent, char *name, > scrub_inst++; > attr_gcnt++; > break; > + case RAS_FEAT_ECS: > + if (!ras_features->ecs_ops) > + goto data_mem_free; <---- newline here. > + dev_data = &ctx->ecs; > + dev_data->ecs_ops = ras_features->ecs_ops; > + dev_data->private = ras_features->ctx; > + ret = edac_ecs_get_desc(parent, &ras_attr_groups[attr_gcnt], > + ras_features->ecs_info.num_media_frus); > + if (ret) > + goto data_mem_free; Ditto. > + attr_gcnt += ras_features->ecs_info.num_media_frus; > + break; > default: > ret = -EINVAL; > goto data_mem_free; -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette