Date: Mon, 11 Nov 2024 12:28:19 +0100
From: Borislav Petkov
To: Shiju Jose
Cc: linux-edac@vger.kernel.org, linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, tony.luck@intel.com, rafael@kernel.org, lenb@kernel.org, mchehab@kernel.org, dan.j.williams@intel.com, dave@stgolabs.net, Jonathan Cameron, gregkh@linuxfoundation.org, sudeep.holla@arm.com, jassisinghbrar@gmail.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, david@redhat.com, Vilas.Sridharan@amd.com, leo.duran@amd.com, Yazen.Ghannam@amd.com, rientjes@google.com, jiaqiyan@google.com, Jon.Grimm@amd.com, dave.hansen@linux.intel.com, naoya.horiguchi@nec.com, james.morse@arm.com, jthoughton@google.com, somasundaram.a@hpe.com, erdemaktas@google.com, pgonda@google.com, duenwen@google.com, gthelen@google.com, wschwartz@amperecomputing.com, dferguson@amperecomputing.com, wbs@os.amperecomputing.com, nifan.cxl@gmail.com, tanxiaofei, Zengtao (B), Roberto Sassu, kangkang.shen@futurewei.com, wanghuiqiang, Linuxarm
Subject: Re: [PATCH v15 11/15] EDAC: Add memory repair control feature
Message-ID: <20241111112819.GCZzHqUz1Sz-vcW09c@fat_crate.local>
References: <20241101091735.1465-1-shiju.jose@huawei.com> <20241101091735.1465-12-shiju.jose@huawei.com> <20241104061554.GOZyhmmo9melwI0c6q@fat_crate.local> <1ac30acc16ab42c98313c20c79988349@huawei.com>
In-Reply-To: <1ac30acc16ab42c98313c20c79988349@huawei.com>

On Mon, Nov 04, 2024 at 01:05:31PM +0000, Shiju Jose wrote:
> A more detailed explanation of PPR and memory sparing, and their use cases,
> was added in Documentation/edac/memory_repair.rst, which is part of the last
> common patch ("EDAC: Add documentation for RAS feature control") added to
> document the various RAS features supported in this series. I was not sure
> whether the file should be part of this patch or not.

If the commit message doesn't contain a justification for a patch's existence,
why do you even bother sending it?

IOW, no redirections pls - just state here what the use case is in short. You
can always go nuts into details in the docs.

> persist_mode is used to read back the value of persist_mode currently set,
> e.g. 1 - soft memory sparing for a sparing instance, even though the CXL
> memory device supports both soft and hard sparing, which is configurable.
> persist_mode_avail is used to return the temporary and permanent repair
> capability of the device.

Wait, sysfs does a one value per file thing. What does persist_mode_avail give?

Surely you can't dump a list of all available modes...

From that doc:

root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/mem_repair0/persist_mode_avail
0

Does that mean only sPPR is available?

If only one mode is available, why am I even querying things? There's no other
option.

Catch my drift?

> I will also add more details here about DPA, which are given in the last
> part of this document. Some memory devices (e.g. a CXL memory device) may
> expect the Device Physical Address (DPA) for a repair operation instead of
> the Host Physical Address (HPA), because the memory may not have an active
> mapping in the host physical address map. The 'dpa_support' attribute is
> used to return this info to the user.

All this stuff needs to be documented properly and especially how one is
supposed to use this interface. Not have people go read CXL specs just to be
able to even try to use this.

I'd like to see clear steps in the docs what to do and what they mean.
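To make the one-value-per-file point concrete, here is a minimal userspace sketch of how a tool might read these attributes. It is not part of the patch set: the helper names (parse_single_value, read_attr, address_kind) are hypothetical, the directory comes from the memory_repair.rst example quoted above, and the meaning given to dpa_support (1 = the device expects a DPA) is an assumption rather than something this thread defines.

```python
from pathlib import Path
from typing import Optional

# Directory taken from the memory_repair.rst example quoted above; on a real
# system the device and instance names may differ.
REPAIR_DIR = Path("/sys/bus/edac/devices/cxl_mem0/mem_repair0")


def parse_single_value(text: str) -> int:
    """Parse an attribute that follows sysfs's one-value-per-file rule.

    Assumes the kernel prints a single decimal integer; anything else
    (an empty file or a list of modes) is rejected.
    """
    tokens = text.split()
    if len(tokens) != 1:
        raise ValueError(f"expected exactly one value, got {tokens!r}")
    return int(tokens[0])


def read_attr(name: str, base: Path = REPAIR_DIR) -> Optional[int]:
    """Return an attribute's value, or None if the file does not exist.

    A missing file is treated as "not supported", since is_visible() may
    hide controls the device does not support.
    """
    try:
        return parse_single_value((base / name).read_text())
    except FileNotFoundError:
        return None


def address_kind(dpa_support: Optional[int]) -> str:
    """Hypothetical helper: which address type a repair request should use.

    ASSUMPTION: dpa_support == 1 means the device expects a DPA; any other
    value, or a missing attribute, means an HPA is used.
    """
    return "DPA" if dpa_support == 1 else "HPA"


if __name__ == "__main__":
    for attr in ("persist_mode", "persist_mode_avail", "dpa_support"):
        print(f"{attr}: {read_attr(attr)}")
    print("address type:", address_kind(read_attr("dpa_support")))
```

With this reader, a persist_mode_avail that listed every available mode in one file would be rejected rather than silently misparsed, which is exactly the mismatch with sysfs conventions raised above.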
> The nibble mask is actually for CXL memory PPR and memory sparing operations;
> it is reported by the device in the DRAM Event Record and to userspace in the
> CXL DRAM trace event.
> Please see the details in the spec.

This is *exactly* what I mean! If I have to see the spec in order to use an
interface, then that's a major fail.

> I was not sure whether to add these CXL-specific details to this EDAC
> document.

So that document should contain enough info on how to use the interface. You
can always put links to the spec giving people further reading but some initial
how-do-I-use-this-damn-thing example should be there so that people can find
their way around this.

> The visibility of these control attributes to the user in sysfs is decided
> by the is_visible() callback in EDAC, which in turn depends on whether the
> memory device supports control of a given repair attribute.

That still doesn't answer my question: what are valid values I can put in all
those? Try as many as I can until one sticks?

This is not a good interface. And since sysfs does one-value-per-file, dumping
ranges here is kinda wrong.

> This attribute is used to request a check of resource availability for a
> repair operation (e.g. a memory PPR or sparing operation) for a given
> address and set of memory attributes.
> The device may return the result of this request in different ways.
> For example, for the CXL device's query resource command:
> 1. for a PPR operation, resource availability is returned as the return
>    code of the command;
> 2. for a memory sparing operation, the device reports resource availability
>    by producing a Memory Sparing Event Record and a memory sparing trace
>    event to userspace.
>
> Maybe 'dry-run' is a better name than query?

Maybe this should not exist at all: my simple thinking would say that
determining whether resources are available should be part of the actual
repair operation. If none are there, it should return "no resources available".
If there are, it should simply use them and do the repair. Exposing this as an
explicit step sounds silly.

> > Yeh, this needs to be part of the interface and not hidden in some obscure doc.
> Is adding this info to Documentation/edac/memory_repair.rst sufficient?

Yap, for example. You can always concentrate the whole documentation there and
point to it from everywhere else.

> The details of the repair controls were added in
> Documentation/edac/memory_repair.rst, which is part of the common
> patch ("EDAC: Add documentation for RAS feature control").

Ok, point to it pls in this doc so that people can find it.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette