Date: Fri, 10 May 2024 12:23:25 +0100
From: Jonathan Cameron
To: Dan Williams
Subject: Re: [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub control feature
Message-ID: <20240510122325.00005e83@Huawei.com>
In-Reply-To: <663d69c61db8c_3d7b4294e0@dwillia2-mobl3.amr.corp.intel.com.notmuch>
References: <20240419164720.1765-1-shiju.jose@huawei.com> <20240419164720.1765-6-shiju.jose@huawei.com> <663d69c61db8c_3d7b4294e0@dwillia2-mobl3.amr.corp.intel.com.notmuch>
Organization: Huawei Technologies Research and Development (UK) Ltd.

On Thu, 9 May 2024
17:26:46 -0700 Dan Williams wrote:

> shiju.jose@ wrote:
> > From: Shiju Jose
> >
> > CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub control
> > feature. The device patrol scrub proactively locates and makes corrections
> > to errors in regular cycle.
> >
> > Allow specifying the number of hours within which the patrol scrub must be
> > completed, subject to minimum and maximum limits reported by the device.
> > Also allow disabling scrub allowing trade-off error rates against
> > performance.
> >
> > Register with scrub subsystem to provide scrub control attributes to the
> > user.
> >
> > Co-developed-by: Jonathan Cameron
> > Signed-off-by: Jonathan Cameron
> > Signed-off-by: Shiju Jose
> [..]
> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > index 0c79d9ce877c..399e43463626 100644
> > --- a/drivers/cxl/mem.c
> > +++ b/drivers/cxl/mem.c
> > @@ -117,6 +117,12 @@ static int cxl_mem_probe(struct device *dev)
> >  	if (!cxlds->media_ready)
> >  		return -EBUSY;
> >
> > +	rc = cxl_mem_patrol_scrub_init(cxlmd);
> > +	if (rc) {
> > +		dev_dbg(&cxlmd->dev, "CXL patrol scrub init failed\n");
> > +		return rc;
> > +	}
>
> 2 concerns:
>
> * Why should cxl_mem_probe() fail just because this optional
>   scrub interface did not register?
>

Flip the dev_dbg to dev_warn() and indeed carry on.

> * Why is this not located in cxl_region_probe()? If the ras2 scrub is an
>   HPA-based scrub I think CXL should do the work to interface with the scrub
>   interface at the same level. This also provides another in-kernel user
>   for all the DPA-to-HPA translation infrastructure that the CXL driver
>   contains. Pretty much the only reason the CXL driver needs to exist at
>   all is address translation, so at a minimum it seems a waste to inflict
>   more need to understand DPAs on userspace.

As you might expect this will get messy - I'm not saying it's a bad thing to do,
but complexities that come to mind include:

* Scrub is device wide (unlike RAS2, which in theory supports HPA range
  control). So if you map a given DPA range into multiple regions then the
  controls will interfere. Maybe scrub at max rate requested for any region
  is fine.
* Interleave - so we'd be controlling multiple hardware scrubbers.
* Comes and goes with regions. Do we stop scrubbing if no region? Not sure.

My guess is the breakdown is:

1) Component registered for each CXL mem device to handle the control +
   combining of all region-specific requests.
2) Region-specific component that exposes the controls on an HPA basis, and
   requests a minimum service level from all its CXL mem device drivers.
3) Device-specific scrub instance (perhaps), reflecting that some scrub may
   make sense when not yet in a region (identify bad mem etc).

So I think we will end up with a lot more layering in here, but the end result
will indeed be better. This has been going on a while, so I'm not sure the
DPA to HPA stuff was all in place, and at the time I think it was still an
open question whether that should be a userspace problem or not.

Anyhow time to adapt :)

Jonathan
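
For illustration, the "scrub at max rate requested for any region" idea for the per-device component could look roughly like the sketch below. This is only a sketch under assumptions: `struct scrub_limits`, `combine_scrub_cycle()` and the convention that a request of 0 means "no request" are hypothetical names and choices, not anything in the posted series. It takes the cycle time in hours (as in the patch description), picks the shortest cycle any region asked for, and clamps it to the limits the device reports.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device limits, in hours per full scrub cycle. */
struct scrub_limits {
	uint8_t min_cycle_hours;	/* fastest cycle the device supports */
	uint8_t max_cycle_hours;	/* slowest cycle the device supports */
};

/*
 * Combine per-region requests into one device-wide cycle time.
 *
 * A shorter cycle means a higher scrub rate, so the "max rate requested
 * for any region" is the smallest non-zero requested cycle. A request of
 * 0 means the region did not ask for scrubbing (an assumption of this
 * sketch). The result is clamped to the device limits.
 *
 * Returns 0 if no region requested scrubbing; whether to stop scrubbing
 * in that case is deliberately left to the caller.
 */
static uint8_t combine_scrub_cycle(const struct scrub_limits *lim,
				   const uint8_t *req_hours,
				   size_t nr_regions)
{
	uint8_t best = 0;
	size_t i;

	for (i = 0; i < nr_regions; i++) {
		if (req_hours[i] == 0)
			continue;
		if (best == 0 || req_hours[i] < best)
			best = req_hours[i];
	}

	if (best == 0)
		return 0;
	if (best < lim->min_cycle_hours)
		best = lim->min_cycle_hours;
	if (best > lim->max_cycle_hours)
		best = lim->max_cycle_hours;

	return best;
}
```

With device limits of 2 to 48 hours and region requests of 24, none and 6 hours, this picks a 6 hour cycle; a lone request for 1 hour is raised to the 2 hour device minimum. Returning 0 when nothing is requested keeps the "do we stop scrubbing if no region?" question as a policy decision outside this helper.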