Date: Sat, 11 May 2024 12:17:19 +0200
From: Borislav Petkov
To: Dan Williams
Cc: Jonathan Cameron, Shiju Jose, "linux-cxl@vger.kernel.org",
	"linux-acpi@vger.kernel.org", "linux-mm@kvack.org",
	"dave@stgolabs.net", "dave.jiang@intel.com",
	"alison.schofield@intel.com", "vishal.l.verma@intel.com",
	"ira.weiny@intel.com", "linux-edac@vger.kernel.org",
	"linux-kernel@vger.kernel.org", "david@redhat.com",
	"Vilas.Sridharan@amd.com", "leo.duran@amd.com",
	"Yazen.Ghannam@amd.com", "rientjes@google.com",
	"jiaqiyan@google.com", "tony.luck@intel.com", "Jon.Grimm@amd.com",
	"dave.hansen@linux.intel.com", "rafael@kernel.org",
	"lenb@kernel.org", "naoya.horiguchi@nec.com",
	"james.morse@arm.com", "jthoughton@google.com",
	"somasundaram.a@hpe.com", "erdemaktas@google.com",
	"pgonda@google.com", "duenwen@google.com",
	"mike.malvestuto@intel.com", "gthelen@google.com",
	"wschwartz@amperecomputing.com", "dferguson@amperecomputing.com",
	"wbs@os.amperecomputing.com", "nifan.cxl@gmail.com", tanxiaofei,
	"Zengtao (B)", "kangkang.shen@futurewei.com", wanghuiqiang,
	Linuxarm, Greg Kroah-Hartman, Jean Delvare, Guenter Roeck,
	Dmitry Torokhov
Subject: Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem
Message-ID: <20240511101705.GAZj9FoVbThp7JUK16@fat_crate.local>

On Fri, May 10, 2024 at 10:13:41AM -0700, Dan Williams wrote:
> In fact this question matches my reaction to the last posting [1], and
> led to a much improved cover letter and the "Comparison of scrubbing
> features". To your point there are scrub capabilities already in the
> kernel and we would need to make a decision about what to do about them.

The answer to that question depends on whether this new userspace usage
is going to want to control those too.

So

"Use case of scrub control feature"

from the cover letter gives two short sentences about what one would do
but I'm still meh. A whole subsystem needing a bunch of effort would
need a lot more justification.

So can anyone please elaborate more on the use cases and why this new
thing is needed?

> I called out NVDIMM ARS as one example and am open to exploring
> converting that to the common scrub ABI, but not block this proposal
> in the meantime.
>
> For me the proposal can be boiled down to, "here we (kernel community)
> are again with 2 new scrub interfaces to add to the kernel. Let's step
> back, define a common ABI for ACPI RAS 2 and CXL to stop the
> proliferation of scrub ABIs, and then make a decision about when/whether
> to integrate legacy scrub facilities into this new interface".

Fully agreed as long as there are valid users for it and we don't end up
designing and supporting an interface that people are not sure anyone
uses. ras_userspace_consumers() from the other thread is a case in
point.

> [1]: http://lore.kernel.org/r/65d6936952764_1138c7294e@dwillia2-xfh.jf.intel.com.notmuch
^^^^^

Ha, you're speaking what I'm thinking here. :-)

> The scrub_core, like edac_core, has no method to detect scrubbing
> facility, it is simply a passive library waiting for the first
> scrub_device_register() call.

Well, those scrub things still have methods which are better than
nothing. EDAC is ancient. But ok, let's just say they're the same for
the sake of simplicity.

> Yeah, that's backwards. CXL enumeration belongs in the CXL driver and
> the CXL driver is fully responsible for deciding when to incur the costs
> of loading scrub_core.

Ok, fair enough.

> Assume that it does and memory_scrub_control_init() finds no scrub
> facilities in any CXL devices and fails memory_scrub_control_init(). Any
> module that links to scrub_device_register() will also fail to load
> because module symbol resolution depends on all modules completing init.

My angle was: scan the system for *all* possible scrub functionalities
and if none are present, then fail. And since there are only two...

> Sure, but that's a driver-probe-time facility, not a module_init-time
> facility.

Oh well.

> I assume you do not consider edac_core a mess?

The whole EDAC is a mess but that's a whole other story. :-)

> Now, the question of how many legacy scrub interfaces should be
> considered in this design out of the gate is a worthwhile discussion. I
> am encouraged that this ABI is at least trying to handle more than 1
> backend, which makes me feel better that adding a 3rd and 4th might not
> be prohibitive.

See above.

I'm perfectly fine with: "hey, we have a new scrub API interfacing to
RAS scrub capability and it is *the* thing to use and all other hw scrub
functionality should be shoehorned into it."

So this thing's design should at least try to anticipate supporting
other scrub hw.

Because there's EDAC too. Why isn't this scrub thing part of EDAC? Why
isn't this scrub API part of edac_core? I mean, this is all RAS so why
design a whole new thing when the required glue is already there?

We can just as well have a

	/sys/devices/system/edac/scrub/

node hierarchy and have everything there.

Why does it have to be yet another thing? And if it needs to be
separate, who's going to maintain it?

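
To make the module_init-time versus driver-probe-time point quoted
earlier in this mail concrete, here is a toy userspace model in Python.
It is only a sketch and not kernel code: the Module class, load(),
try_load_stack(), the "cxl_scrub_driver" module and the "cxl_mem0"
device are all hypothetical names made up for illustration. Only
scrub_device_register() and scrub_core are names taken from the thread.
The model simply encodes the claim from the quote that a dependent
module cannot load unless the module it links against has completed
init. It folds the driver's probe step into its init to stay short.

```python
ENODEV = 19  # errno value for "no such device" on Linux


class ModuleLoadError(Exception):
    """Raised when a modelled module cannot be loaded."""


class Module:
    """A modelled module: a name, an init callable, and its dependencies."""

    def __init__(self, name, init, depends_on=()):
        self.name = name
        self.init = init  # returns 0 on success, negative errno on failure
        self.depends_on = tuple(depends_on)
        self.loaded = False


def load(module):
    """Load a module; every dependency must have completed init first."""
    for dep in module.depends_on:
        if not dep.loaded:
            raise ModuleLoadError(f"{module.name}: needs symbols from {dep.name}")
    rc = module.init()
    if rc != 0:
        raise ModuleLoadError(f"{module.name}: init failed with {rc}")
    module.loaded = True


def passive_core_init():
    """Passive library: init always succeeds and waits for registrations."""
    return 0


def probing_core_init(found_hardware=False):
    """Init-time probing: fail the whole module if nothing was found."""
    return 0 if found_hardware else -ENODEV


def try_load_stack(core_init):
    """Load a core plus one dependent driver.

    Returns (names of modules that loaded, devices that got registered).
    """
    registered = []

    def scrub_device_register(dev_name):
        # Stand-in for the registration call named in the thread.
        registered.append(dev_name)
        return 0

    core = Module("scrub_core", core_init)
    driver = Module(
        "cxl_scrub_driver",  # hypothetical driver name
        lambda: scrub_device_register("cxl_mem0"),  # hypothetical device
        depends_on=[core],
    )
    loaded = []
    for mod in (core, driver):
        try:
            load(mod)
            loaded.append(mod.name)
        except ModuleLoadError:
            pass
    return loaded, registered
```

In this model, try_load_stack(passive_core_init) loads both modules and
registers the one device, while try_load_stack(probing_core_init) loads
nothing, because the core's failed init takes the dependent driver down
with it.
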
> Which matches what I reacted to on the last posting:
>
> "Maybe it is self evident to others, but for me there is little in these
> changelogs besides 'mechanism exists, enable it'"
>
> ...and to me that feedback was taken to heart with much improved
> changelogs in this new posting.

Ok.

> This init time feature probing discussion feels like it was born from a
> miscommunication / misunderstanding.

Yes, it seems so, thanks for clarifying things.

I am still unclear on the use cases and how this is supposed to be used
and also, as mentioned above, we have a *lot* of RAS functionality
spread around the kernel. Perhaps we should start unifying it instead of
adding more...

So the big picture and where we're headed need to be clarified first.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette