Date: Wed, 09 Jul 2025 22:31:11 -0700
From: "H. Peter Anvin"
To: dan.j.williams@intel.com, Peter Zijlstra
CC: Jonathan Cameron, Catalin Marinas, james.morse@arm.com,
 linux-cxl@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-acpi@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-mm@kvack.org, gregkh@linuxfoundation.org, Will Deacon,
 Dan Williams, Davidlohr Bueso, Yicong Yang, linuxarm@huawei.com,
 Yushan Wang, Lorenzo Pieralisi, Mark Rutland, Dave Hansen,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
 Andy Lutomirski
Subject: Re: [PATCH v2 0/8] Cache coherency management subsystem
In-Reply-To: <686f4e20c57cd_1d3d100b7@dwillia2-xfh.jf.intel.com.notmuch>
References: <20250624154805.66985-1-Jonathan.Cameron@huawei.com>
 <20250625085204.GC1613200@noisy.programming.kicks-ass.net>
 <20250625093152.GZ1613376@noisy.programming.kicks-ass.net>
 <686f4e20c57cd_1d3d100b7@dwillia2-xfh.jf.intel.com.notmuch>
Message-ID: <3EAB3F81-D98C-4E4A-8E44-DB067547B318@zytor.com>

On July 9, 2025 10:22:40 PM PDT, dan.j.williams@intel.com wrote:
>Peter Zijlstra wrote:
>> On Wed, Jun 25, 2025 at 02:12:39AM -0700, H. Peter Anvin wrote:
>> > On June 25, 2025 1:52:04 AM PDT, Peter Zijlstra wrote:
>> > >On Tue, Jun 24, 2025 at 04:47:56PM +0100, Jonathan Cameron wrote:
>> > >
>> > >> On x86 there is the much loved WBINVD instruction that causes a write back
>> > >> and invalidate of all caches in the system. It is expensive but it is
>> > >
>> > >Expensive is not the only problem. It actively interferes with things
>> > >like Cache-Allocation-Technology (RDT-CAT for the intel folks). Doing
>> > >WBINVD utterly destroys the cache subsystem for everybody on the
>> > >machine.
>> > >
>> > >> necessary in a few corner cases.
>> > >
>> > >Don't we have things like CLFLUSH/CLFLUSHOPT/CLWB exactly so that we can
>> > >avoid doing dumb things like WBINVD ?!?
>> > >
>> > >> These are cases where the contents of
>> > >> Physical Memory may change without any writes from the host. Whilst there
>> > >> are a few reasons this might happen, the one I care about here is when
>> > >> we are adding or removing mappings on CXL. So typically going from
>> > >> there being actual memory at a host Physical Address to nothing there
>> > >> (reads as zero, writes dropped) or visa-versa.
>> > >
>> > >> The
>> > >> thing that makes it very hard to handle with CPU flushes is that the
>> > >> instructions are normally VA based and not guaranteed to reach beyond
>> > >> the Point of Coherence or similar. You might be able to (ab)use
>> > >> various flush operations intended to ensure persistence memory but
>> > >> in general they don't work either.
>> > >
>> > >Urgh so this. Dan, Dave, are we getting new instructions to deal with
>> > >this? I'm really not keen on having WBINVD in active use.
>> > >
>> >
>> > WBINVD is the nuclear weapon to use when you have lost all notion of
>> > where the problematic data can be, and amounts to a full reset of the
>> > cache system.
>> >
>> > WBINVD can block interrupts for many *milliseconds*, system wide, and
>> > so is really only useful for once-per-boot type events, like MTRR
>> > initialization.
>>
>> Right this... But that CXL thing sounds like that's semi 'regular' to
>> the point that providing some infrastructure around it makes sense. This
>> should not be.
>
>"Regular?", no. Something is wrong if you are doing this regularly. In
>current CXL systems the expectation is to suffer a WBINVD event once per
>server provisioning event.
>
>Now, there is a nascent capability called "Dynamic Capacity Devices"
>(DCD) where the CXL configuration is able to change at runtime with
>multiple hosts sharing a pool of memory. Each time the physical memory
>capacity changes, cache management is needed.
>
>For DCD, I think the negative effects of WBINVD are a *useful* stick to
>move device vendors to stop relying on software to solve this problem.
>They can implement an existing CXL protocol where the device tells CPUs
>and other CXL.cache agents to invalidate the physical address ranges
>that the device owns.
>
>In other words, if WBINVD makes DCD inviable that is a useful outcome
>because it motivates unburdening Linux long term with this problem.
>
>In the near term though, current CXL platforms that do not support
>device-initiated-invalidate still need coarse cache management for that
>original infrequent provisioning events. Folks that want to go further
>and attempt frequent DCD events with WBINVD get to keep all the pieces.

Since this is presumably rare, it might be better to loop and clflush, even though it will take longer, rather than stopping the world.
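
For illustration only, the "loop and clflush" idea might look roughly like the user-space sketch below. It is not kernel code and not anything proposed in this thread: the helper name flush_range_by_line() and the fixed 64-byte line size are assumptions made for the example (real code would read the CLFLUSH line size from CPUID). The kernel has its own helper for this on x86, clflush_cache_range(). As noted earlier in the thread, CLFLUSH is VA based, so the range has to be mapped, and a per-line flush is not guaranteed to reach everything a CXL remap needs. On builds without SSE2 the flush itself compiles out and only the line walk remains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_clflush(), _mm_mfence() */
#endif

/* Assumption: 64-byte cache lines; real code should query CPUID. */
#define CACHE_LINE_SIZE 64u

/*
 * Hypothetical helper: flush every cache line that overlaps
 * [addr, addr + len), one CLFLUSH per line, instead of a global WBINVD.
 * Returns the number of cache lines visited.
 */
static size_t flush_range_by_line(const void *addr, size_t len)
{
    assert(addr != NULL || len == 0);
    if (len == 0)
        return 0;

    /* Round the start down to a line boundary. */
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines = 0;

#if defined(__SSE2__)
    _mm_mfence();       /* fence before the flush loop */
#endif
    for (; p < end; p += CACHE_LINE_SIZE) {
#if defined(__SSE2__)
        _mm_clflush((const void *)p);   /* write back + invalidate one line */
#endif
        lines++;
    }
#if defined(__SSE2__)
    _mm_mfence();       /* fence after the flush loop */
#endif
    return lines;
}
```

The cost is linear in the size of the range (one instruction per 64 bytes), which is the "it will take longer" part, but the loop can be interrupted between lines and only touches the lines in the range rather than every cache in the system.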