Date: Thu, 9 Jan 2025 16:01:59 +0000
From: Jonathan Cameron
To: Borislav Petkov
CC: Shiju Jose, linux-edac@vger.kernel.org, linux-cxl@vger.kernel.org,
 linux-acpi@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 tony.luck@intel.com, rafael@kernel.org, lenb@kernel.org, mchehab@kernel.org,
 dan.j.williams@intel.com, dave@stgolabs.net, dave.jiang@intel.com,
 alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com,
 david@redhat.com, Vilas.Sridharan@amd.com, leo.duran@amd.com,
 Yazen.Ghannam@amd.com, rientjes@google.com, jiaqiyan@google.com,
 Jon.Grimm@amd.com, dave.hansen@linux.intel.com, naoya.horiguchi@nec.com,
 james.morse@arm.com, jthoughton@google.com, somasundaram.a@hpe.com,
 erdemaktas@google.com, pgonda@google.com, duenwen@google.com,
 gthelen@google.com, wschwartz@amperecomputing.com,
 dferguson@amperecomputing.com, wbs@os.amperecomputing.com,
 nifan.cxl@gmail.com, tanxiaofei, Zengtao (B), Roberto Sassu,
 kangkang.shen@futurewei.com, wanghuiqiang, Linuxarm
Subject: Re: [PATCH v18 04/19] EDAC: Add memory repair control feature
Message-ID: <20250109160159.00002add@huawei.com>
In-Reply-To: <20250109151854.GCZ3_o3rf6S24qUbtB@fat_crate.local>
References: <20250106121017.1620-1-shiju.jose@huawei.com>
 <20250106121017.1620-5-shiju.jose@huawei.com>
 <20250109091915.GAZ3-Uk3rkuh38cQyy@fat_crate.local>
 <3b2d4275d1d24dbeacee0f192ac4d69b@huawei.com>
 <20250109123222.GBZ3_B1g3Esgu1-MPi@fat_crate.local>
 <20250109142433.00004ea7@huawei.com>
 <20250109151854.GCZ3_o3rf6S24qUbtB@fat_crate.local>

On Thu, 9 Jan 2025 16:18:54 +0100
Borislav Petkov wrote:

> On Thu, Jan 09, 2025 at 02:24:33PM +0000, Jonathan Cameron wrote:
> > To my thinking that would fail the test of being an intuitive interface.
> > To issue a repair command requires that multiple attributes be configured
> > before triggering the actual repair.
> >
> > Think of it as setting the coordinates of the repair in a high dimensional
> > space.
>
> Why?

Ok. To me the fact it's not a single write was relevant. Seems not in your
mental model of how this works.
For me a single write that you cannot query back is fine; setting lots of
parameters and being unable to query any of them is less so. I guess you
disagree. In the interests of progress I'm not going to argue further. No one
is going to use this interface by hand anyway, so the loss of usability I'm
seeing doesn't matter a lot.

>
> You can write every attribute in its separate file and have a "commit" or
> "start" file which does that.

That's what we have.

>
> Or you can designate a file which starts the process. This is how I'm
> injecting errors on x86:
>
> see readme_msg here: arch/x86/kernel/cpu/mce/inject.c
>
> More specifically:
>
> "flags:\t Injection type to be performed. Writing to this file will trigger a\n"
> "\t real machine check, an APIC interrupt or invoke the error decoder routines\n"
> "\t for AMD processors.\n"
>
> So you set everything else, and as the last step you set the injection type
> *and* you also trigger it with this one write.

Agreed. I'm not sure of the relevance though. This is how it works and there
is no proposal to change that.

What I was trying to argue for was an interface that lets you set all the
coordinates and read back what they were before hitting go.

>
> > Sure. In this case the addition of min/max was perhaps a wrong response to
> > your request for a way to discover those ranges rather than just rejecting
> > a write of something out of range as earlier versions did.
> >
> > We can revisit in future if range discovery becomes necessary. Personally
> > I don't think it is, given we are only taking these actions in response to
> > error records that give us precisely what to write and hence are always in
> > range.
>
> My goal here was to make this user-friendly. Because you need some way of
> knowing what valid ranges are and in order to trigger the repair, if it needs
> to happen for a range.

In at least the CXL case I'm fairly sure most of them are not discoverable.
Until you see errors you have no idea what the memory topology is.

>
> Or, you can teach the repair logic to ignore invalid ranges and "clamp" things
> to whatever makes sense.

For that you'd need to have a path to read back what happened.

>
> Again, I'm looking at it from the usability perspective. I haven't actually
> needed this scrub+repair functionality yet to know whether the UI makes sense.
> So yeah, collecting some feedback from real-life use cases would probably give
> you a lot better understanding of how that UI should be designed... perhaps
> you won't ever need the ranges, who knows.
>
> So yes, preemptively designing stuff like that "in the dark" is kinda hard.
> :-)

The discoverability is unnecessary for any known use case.

Ok. Then can we either drop the range discoverability entirely, or go with
your suggestion: do not support read back of what has been requested, and
instead have reads return a range if known, or "" / -EOPNOTSUPP if it is
simply not known? I can live with that, though to me we are heading in the
direction of a less intuitive interface to save a small number of additional
files.

Jonathan
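
For illustration only, the two read semantics discussed above can be modelled
in plain userspace C. This is a rough sketch, not the EDAC or CXL code from the
patch set: struct repair_attr, struct repair_ctx and every function name below
are hypothetical, and only two coordinates (bank and row) are assumed. It shows
per-attribute writes, a read back of the requested value, the alternative where
a read returns the range or -EOPNOTSUPP, and a separate trigger that refuses to
run until every coordinate has been written.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one repair coordinate (not a kernel structure). */
struct repair_attr {
	uint64_t value;       /* last value written by userspace */
	bool     is_set;      /* true once a value has been written */
	bool     range_known; /* true if the device can report min/max */
	uint64_t min, max;    /* only meaningful when range_known is true */
};

/* Hypothetical repair instance with two coordinates. */
struct repair_ctx {
	struct repair_attr bank;
	struct repair_attr row;
	unsigned int repairs_issued;
};

/* Write path: store a coordinate, rejecting it only if a known range is violated. */
static int repair_attr_store(struct repair_attr *a, uint64_t v)
{
	assert(a != NULL);
	if (a->range_known && (v < a->min || v > a->max))
		return -EINVAL;
	a->value = v;
	a->is_set = true;
	return 0;
}

/* Read semantics A: report the value that was requested. */
static int repair_attr_show(const struct repair_attr *a, uint64_t *out)
{
	if (!a->is_set)
		return -EINVAL;
	*out = a->value;
	return 0;
}

/* Read semantics B: report the valid range, or -EOPNOTSUPP if it is not known. */
static int repair_attr_show_range(const struct repair_attr *a,
				  uint64_t *min, uint64_t *max)
{
	if (!a->range_known)
		return -EOPNOTSUPP;
	*min = a->min;
	*max = a->max;
	return 0;
}

/* Trigger: the single write that starts the repair once all coordinates are set. */
static int repair_trigger(struct repair_ctx *c)
{
	if (!c->bank.is_set || !c->row.is_set)
		return -EINVAL;
	c->repairs_issued++;
	return 0;
}
```

With semantics A a caller can check each coordinate before calling
repair_trigger(); with semantics B the same read instead answers "what may I
write here?" and fails with -EOPNOTSUPP when, as in the CXL case, the range is
not discoverable.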