From: Jonathan Cameron
To: Yicong Yang
Cc: Yushan Wang, Lorenzo Pieralisi, Mark Rutland, Catalin Marinas,
    Will Deacon, Dan Williams
Subject: [RFC PATCH 0/6] Cache coherency management subsystem
Date: Thu, 20 Mar 2025 17:41:12 +0000
Message-ID: <20250320174118.39173-1-Jonathan.Cameron@huawei.com>

Note that I've only a vague idea of who will care about this so please
do +CC others as needed.

On x86 there is the much loved WBINVD instruction that causes a write
back and invalidate of all caches in the system. It is expensive but it
is necessary in a few corner cases. These are cases where the contents
of Physical Memory may change without any writes from the host. Whilst
there are a few reasons this might happen, the one I care about here is
when we are adding or removing mappings on CXL.
That typically means going from there being actual memory at a host
Physical Address to nothing there (reads as zero, writes dropped) or
vice versa. That involves the reprogramming of address decoders (HDM
Decoders); in the near future it may also include the device offering
dynamic capacity extents. The thing that makes it very hard to handle
with CPU flushes is that the instructions are normally VA based and not
guaranteed to reach beyond the Point of Coherence or similar. You might
be able to (ab)use various flush operations intended to ensure
persistence of memory contents, but in general they don't work either.

So on other architectures such as ARM64 we have no instruction similar
to WBINVD, but we may have device interfaces in the system that provide
a way to ensure a PA range undergoes the write back and invalidate
action. This RFC is to find a way to support those cache maintenance
device interfaces. The ones I know about are much more flexible than
WBINVD, allowing invalidation of particular PA ranges, or a much richer
set of flush types (not supported yet as not needed for upstream use
cases).

To illustrate how a solution might work, I've taken both a HiSilicon
design (slight quirk as registers overlap with existing PMU driver) and,
more controversially, a firmware interface proposal from ARM (wrapped up
in made-up ACPI) that was dropped from the released spec but for which
the alpha spec is still available.

Why drivers/cache?
- Mainly because it exists and smells like a reasonable place.
- Conor, you are currently the maintainer for this; do you mind us
  putting this stuff in there?

Why not just register a singleton function pointer?
- Systems may include multiple cache control devices, responsible for
  different parts of the PA address range (interleaving etc. make this
  complex). They may not all share a common hardware interface.
- A device class is more convenient than managing multiple homogeneous
  device instances within a driver.
- The disadvantage is that we need this small class.

Generalizing to more arch?
- I've started with ARM64, but if useful elsewhere the small amount of
  arch code could be moved to a generic location.

QEMU emulation code at
http://gitlab.com/jic23/qemu cxl-2025-03-20

Why an RFC?
- I'm really just looking for feedback on whether the class approach is
  the way to go at this stage. I'm not strongly attached to it but it
  feels like the right balance of complexity and flexibility to me.
- I made up the ACPI spec - it's not documented, not official and
  honestly needs work. I would however like to get feedback on whether
  it is something we want to try and get through the ACPI Working Group
  as a much improved code-first proposal. The potential justification
  is to avoid the need for lots of trivial drivers where maybe a bit of
  DSDT interpreted code does the job better.

Jonathan Cameron (3):
  cache: coherency device class
  acpi: PoC of Cache control via ACPI0019 and _DSM
  Hack: Pretend we have PSCI 1.2

Yicong Yang (3):
  memregion: Support fine grained invalidate by
    cpu_cache_invalidate_memregion()
  arm64: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
  cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent

 arch/arm64/Kconfig                  |   1 +
 arch/arm64/include/asm/cacheflush.h |  14 ++
 arch/arm64/mm/flush.c               |  42 ++++++
 arch/x86/mm/pat/set_memory.c        |   2 +-
 drivers/acpi/Makefile               |   1 +
 drivers/cache/Kconfig               |  26 ++++
 drivers/cache/Makefile              |   4 +
 drivers/cache/acpi_cache_control.c  | 157 ++++++++++++++++++++++
 drivers/cache/coherency_core.c      | 130 +++++++++++++++++++
 drivers/cache/hisi_soc_hha.c        | 193 ++++++++++++++++++++++++++++
 drivers/cxl/core/region.c           |   6 +-
 drivers/firmware/psci/psci.c        |   2 +
 drivers/nvdimm/region.c             |   3 +-
 drivers/nvdimm/region_devs.c        |   3 +-
 include/linux/cache_coherency.h     |  60 +++++++++
 include/linux/memregion.h           |   8 +-
 16 files changed, 646 insertions(+), 6 deletions(-)
 create mode 100644 drivers/cache/acpi_cache_control.c
 create mode 100644 drivers/cache/coherency_core.c
 create mode 100644 drivers/cache/hisi_soc_hha.c
 create mode 100644 include/linux/cache_coherency.h

-- 
2.43.0
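
Illustration only, not code from the patches: a minimal userspace C sketch of why the cover letter argues for a set of registered devices rather than a singleton function pointer. Every name here (struct cache_dev, cache_dev_register(), cache_wbinv_range(), demo_wbinv()) is hypothetical, and it assumes each device owns one contiguous, non-overlapping PA window, which is a simplification given that interleaving makes real systems more complex.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct cache_dev;

/* Hypothetical per-device operations; a real driver would program hardware. */
struct cache_dev_ops {
	int (*wbinv)(struct cache_dev *dev, uint64_t pa, uint64_t size);
};

/* Hypothetical device instance. Assumption: one contiguous PA window each. */
struct cache_dev {
	const char *name;
	uint64_t base;
	uint64_t size;
	const struct cache_dev_ops *ops;
	uint64_t last_pa;	/* demo bookkeeping: last range requested */
	uint64_t last_size;
	struct cache_dev *next;
};

static struct cache_dev *cache_dev_list;

/* Add a device to the list; devices may use different ops (interfaces). */
static void cache_dev_register(struct cache_dev *dev)
{
	assert(dev && dev->ops && dev->ops->wbinv);
	dev->next = cache_dev_list;
	cache_dev_list = dev;
}

/*
 * Write back and invalidate [pa, pa + size): hand each registered device
 * the part of the range that falls inside its window. Returns -ENXIO if
 * some part of the range is not covered by any device (windows are assumed
 * not to overlap, so covered bytes can simply be summed).
 */
static int cache_wbinv_range(uint64_t pa, uint64_t size)
{
	uint64_t end = pa + size;
	uint64_t covered = 0;
	struct cache_dev *dev;

	for (dev = cache_dev_list; dev; dev = dev->next) {
		uint64_t dev_end = dev->base + dev->size;
		uint64_t lo = pa > dev->base ? pa : dev->base;
		uint64_t hi = end < dev_end ? end : dev_end;
		int ret;

		if (lo >= hi)
			continue;	/* no overlap with this device */
		ret = dev->ops->wbinv(dev, lo, hi - lo);
		if (ret)
			return ret;
		covered += hi - lo;
	}
	return covered == size ? 0 : -ENXIO;
}

/* Demo backend: just records the range it was asked to flush. */
static int demo_wbinv(struct cache_dev *dev, uint64_t pa, uint64_t size)
{
	dev->last_pa = pa;
	dev->last_size = size;
	return 0;
}

static const struct cache_dev_ops demo_ops = { .wbinv = demo_wbinv };
```

With two devices registered over adjacent windows, a request that straddles the boundary reaches each device with only its own portion, which a single global function pointer could not express without the backend knowing about every other backend.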