Date: Mon, 17 Feb 2025 14:23:22 +0100
From: Borislav Petkov
To: Jonathan Cameron
Cc: Shiju Jose, "linux-edac@vger.kernel.org", "linux-cxl@vger.kernel.org",
 "linux-acpi@vger.kernel.org", "linux-mm@kvack.org",
 "linux-kernel@vger.kernel.org", "tony.luck@intel.com", "rafael@kernel.org",
 "lenb@kernel.org", "mchehab@kernel.org", "dan.j.williams@intel.com",
 "dave@stgolabs.net", "dave.jiang@intel.com", "alison.schofield@intel.com",
 "vishal.l.verma@intel.com", "ira.weiny@intel.com", "david@redhat.com",
 "Vilas.Sridharan@amd.com", "leo.duran@amd.com", "Yazen.Ghannam@amd.com",
 "rientjes@google.com", "jiaqiyan@google.com", "Jon.Grimm@amd.com",
 "dave.hansen@linux.intel.com", "naoya.horiguchi@nec.com",
 "james.morse@arm.com", "jthoughton@google.com", "somasundaram.a@hpe.com",
 "erdemaktas@google.com", "pgonda@google.com", "duenwen@google.com",
 "gthelen@google.com", "wschwartz@amperecomputing.com",
 "dferguson@amperecomputing.com", "wbs@os.amperecomputing.com",
 "nifan.cxl@gmail.com", tanxiaofei, "Zengtao (B)", Roberto Sassu,
 "kangkang.shen@futurewei.com", wanghuiqiang, Linuxarm, Vandana Salve
Subject: Re: [PATCH v18 04/19] EDAC: Add memory repair control feature
Message-ID: <20250217132322.GCZ7M4Somf2VYvbwHb@fat_crate.local>
References: <20250109151854.GCZ3_o3rf6S24qUbtB@fat_crate.local>
 <20250109160159.00002add@huawei.com>
 <20250109161902.GDZ3_29rH-sQMV4n0N@fat_crate.local>
 <20250109183448.000059ec@huawei.com>
 <20250111171243.GCZ4Kmi5xMtY2ktCHm@fat_crate.local>
 <20250113110740.00003a7c@huawei.com>
 <20250121161653.GAZ4_IdYDQ9_-QoEvn@fat_crate.local>
 <20250121181632.0000637c@huawei.com>
 <20250122190917.GDZ5FCXetp9--djyQ6@fat_crate.local>
 <20250206133949.00006dd6@huawei.com>
In-Reply-To: <20250206133949.00006dd6@huawei.com>

On Thu, Feb 06, 2025 at 01:39:49PM +0000, Jonathan Cameron wrote:
> Shiju is just finalizing a v19 + the userspace code. So may make
> sense to read this reply only after that is out!

Saw them.

So, from a cursory view, all that sysfs marshalling that happens in
patches 1 and 2 here:

https://lore.kernel.org/r/20250207143028.1865-1-shiju.jose@huawei.com

is not really needed, AFAICT. You can basically check
CXL_EVENT_RECORD_FLAG_MAINT_NEEDED *in the kernel* and go and start the
recovery action. rasdaemon is basically logging the error record and
parroting it back into sysfs, which is completely unnecessary - the
kernel can simply do that.

Patches 3 and 4 are probably more of a justification for the userspace
interaction as the kernel driver is "not ready" to do recovery for .

But there I'm also questioning the presence of the sysfs interface - the
error record could simply be injected raw and the kernel can pick it
apart.

Or maybe there's a point for rasdaemon to ponder over all those
different attributes and maybe involve some non-trivial massaging of
error info in order to arrive at some conclusion and inject that as
a recovery action.

I guess I'm missing something and maybe there really is a valid use case
to expose all those attributes through sysfs and use them. But I don't
see a clear reason now...

> For this comment I was referring to letting the kernel do the
> stats gathering etc. We would need to put back records from a previous boot.
> That requires almost the same interface as just telling it to repair.
> Note the address to physical memory mapping is not stable across boots
> so we can't just provide a physical address, we need full description.

Right.

> Ah. No not that. I was just meaning the case where it is hard PPR. (hence
> persistent for all time) Once you've done it you can't go back so after
> N uses, any more errors mean you need a new device ASAP.
> That is a decision
> with a very different threshold to soft PPR where it's a case of you
> do it until you run out of spares, then you fall back to offlining
> pages. Next boot you get your spares back again and may use them
> differently this time.

Ok.

> True enough. I'm not against doing things in kernel in some cases. Even
> then I want the controls to allow user space to do more complex things.
> Even in the cases where the device suggests repair, we may not want to for
> reasons that device can't know about.

Sure, as long as supporting such a use case is important enough to
warrant supporting a user interface indefinitely.

All I'm saying is, it better be worth the effort.

> The interface provides all the data, and all the controls to match.
>
> Sure, something new might come along that needs additional controls (subchannel
> for DDR5 showed up recently for instance and are in v19) but that extension
> should be easy and fit within the ABI. Those new 'features' will need
> kernel changes and matching rasdaemon changes anyway as there is new data
> in the error records so this sort of extension should be fine.

As long as you don't break existing usage, you're good. The moment you
have to change how rasdaemon uses the interface with a new rasdaemon,
then you need to support both.

> Agreed. We need an interface we can support indefinitely - there is nothing
> different between doing it in sysfs or debugfs. That should be
> extensible in a clean fashion to support new data and matching control.
>
> We don't have to guarantee that interface supports something 'new' though
> as our crystal balls aren't perfect, but we do want to make extending to
> cover the new straightforward.

Right.

> If a vendor wants to do their own thing then good luck to them but don't expect
> the standard software stack to work. So far I have seen no sign of anyone
> doing a non-compliant memory expansion device and there are quite a
> few spec compliant ones.

Nowadays hw vendors use a lot of Linux to verify hw so catching an
unsupported device early is good. But there's always a case...

>
> We will get weird memory devices with accelerators perhaps but then that
> memory won't be treated as normal memory anyway and likely has a custom
> RAS solution. If they do use the spec defined commands, then this
> support should work fine. Just needs a call from their driver to hook
> it up.
>
> It might not be the best analogy, but I think of the CXL type 3 device
> spec as being similar to NVME. There are lots of options, but most people
> will run one standard driver. There may be custom features but the
> device better be compatible with the NVME driver if they advertise
> the class code (there are compliance suites etc)

Ack.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette