* [PATCH v4 0/6] Cache coherency management subsystem
@ 2025-10-22 11:33 Jonathan Cameron
2025-10-22 11:33 ` [PATCH v4 1/6] memregion: Drop unused IORES_DESC_* parameter from cpu_cache_invalidate_memregion() Jonathan Cameron
` (6 more replies)
0 siblings, 7 replies; 18+ messages in thread
From: Jonathan Cameron @ 2025-10-22 11:33 UTC (permalink / raw)
To: Conor Dooley, Catalin Marinas, linux-cxl, linux-arm-kernel,
linux-arch, linux-mm, Dan Williams, H . Peter Anvin,
Peter Zijlstra, Andrew Morton
Cc: james.morse, Will Deacon, Davidlohr Bueso, linuxarm, Yushan Wang,
Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, x86, Andy Lutomirski, Dave Jiang
Support system-level interfaces for cache maintenance as found on some
ARM64 systems. This is needed for correct functionality during various
forms of memory hotplug (e.g. CXL). Typical hardware has an MMIO
interface discovered via the ACPI DSDT.
Includes parameter changes to cpu_cache_invalidate_memregion() but no
functional changes for architectures that already support this call.
How to merge? When this is ready to proceed (so subject to review
feedback on this version), I'm not sure what the best route into the
kernel is. Conor could take the lot via his tree for drivers/cache but
the generic changes perhaps suggest it might be better if Andrew
handles this? Any merge conflicts in drivers/cache will be trivial
build file stuff. Or maybe even take it through one of the affected
trees such as CXL.
v4: (Small changes called out in each patch)
- Drop the ACPI driver. It has done its job as a second implementation
to help with generality testing. I have heard zero interest in actually
doing the specification work needed to make that official. Easy to bring
back if needed in future. I have it locally still as a second test
case.
- Add a cpu_cache_invalidate_all() helper for the (0, -1) case that is
used to indicate everything should be flushed when no fine-grained
range info is available.
- Simplify the necessary symbols to be selected by architectures by
making CONFIG_GENERIC_CPU_CACHE_MAINTENANCE select
ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
- Avoid naming that mentions devices as there is no struct device.
- Use a kref so as to have something on which a _put() operation makes
sense, avoiding the rather confusing freeing of an internal structure
pointer that was seen in v3.
- Gather tags given.
- Various minor things like typos, header tweaks etc.
Thanks to all who reviewed v3.
On current ARM64 systems (and likely other architectures) the
implementation of the cache flushing needed for actions such as CXL
memory hotplug, e.g. cpu_cache_invalidate_memregion(), is performed by
system components outside of the CPU, controlled via either firmware
or MMIO interfaces.
These control units run the necessary coherency protocol operations to
cause the write backs and cache flushes to occur asynchronously. They allow
filtering by PA range to reduce disruption to the system. Systems
supporting this interface must be designed to ensure that, when complete,
all cache lines in the range are in invalid state or clean state
(prefetches may have raced with the invalidation). This must include
memory-side caches and other non-architectural caches beyond the Point
of Coherence (ARM terminology) such that writes will reach memory even
after OS-programmable address decoders are modified (for CXL this is
any HDM decoders that aren't locked). Software will guarantee that no
writes to these memory ranges race with this operation. Whilst this is
subtly different from "write backs must reach the physical memory",
that difference probably doesn't matter to those reading this series.
The often distributed nature of the relevant coherency management units
(e.g. due to interleaving) requires the appropriate commands to be issued
to multiple (potentially heterogeneous) units. To enable this a
registration framework is provided to which drivers may register a set
of callbacks. Upon a request for a cache maintenance operation the
framework iterates over all registered callback sets, calling first a
command to write back and invalidate, and then optionally a command to wait
for completion. Filtering on the relevance of a given request is left
to the individual drivers.
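To give a flavour of the driver-facing side, a minimal sketch (my_agent
and my_agent_wbinv are placeholder names invented here; the real
consumer is the HiSilicon driver in patch 6):

struct my_agent {
        struct cache_coherency_ops_inst cci; /* must be the first member */
        void __iomem *base;
};

static int my_agent_wbinv(struct cache_coherency_ops_inst *cci,
                          struct cc_inval_params *invp)
{
        struct my_agent *ma = container_of(cci, struct my_agent, cci);

        /* Kick off write back + invalidate of [addr, addr + size) */
        return 0;
}

static const struct cache_coherency_ops my_agent_ops = {
        .wbinv = my_agent_wbinv,
        /* .done is optional: supply it if completion must be polled */
};

/* In probe: */
struct my_agent *ma = cache_coherency_ops_instance_alloc(&my_agent_ops,
                                                         struct my_agent, cci);
if (!ma)
        return -ENOMEM;
ret = cache_coherency_ops_instance_register(&ma->cci);

/* In remove: unregister, then drop the reference. */
cache_coherency_ops_instance_unregister(&ma->cci);
cache_coherency_ops_instance_put(&ma->cci);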
In this version only one driver is included. This is the HiSilicon Hydra
Home Agent driver which controls hardware found on some of our relevant
server SoCs. Also available (I can post it if anyone is interested)
is an ACPI driver based on a firmware interface that appeared in a
public alpha version of the PSCI specification.
QEMU emulation code is available at
http://gitlab.com/jic23/qemu (branch cxl-2025-03-20).
Notes:
- I don't particularly like defining 'generic' infrastructure with so few
implementations. If anyone can point me at docs for another one or two,
or confirm that they think this is fine, that would be great!
The converse is that I don't want to wait longer for those to surface,
given the necessity to support the one platform that I do know about!
Jonathan Cameron (3):
memregion: Drop unused IORES_DESC_* parameter from
cpu_cache_invalidate_memregion()
arm64: Select GENERIC_CPU_CACHE_MAINTENANCE
MAINTAINERS: Add Jonathan Cameron to drivers/cache and add
lib/cache_maint.c + header
Yicong Yang (2):
memregion: Support fine grained invalidate by
cpu_cache_invalidate_memregion()
lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
Yushan Wang (1):
cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent
MAINTAINERS | 3 +
arch/arm64/Kconfig | 1 +
arch/x86/mm/pat/set_memory.c | 2 +-
drivers/cache/Kconfig | 15 +++
drivers/cache/Makefile | 2 +
drivers/cache/hisi_soc_hha.c | 191 ++++++++++++++++++++++++++++++++
drivers/cxl/core/region.c | 5 +-
drivers/nvdimm/region.c | 2 +-
drivers/nvdimm/region_devs.c | 2 +-
include/linux/cache_coherency.h | 61 ++++++++++
include/linux/memregion.h | 16 ++-
lib/Kconfig | 4 +
lib/Makefile | 2 +
lib/cache_maint.c | 138 +++++++++++++++++++++++
14 files changed, 436 insertions(+), 8 deletions(-)
create mode 100644 drivers/cache/hisi_soc_hha.c
create mode 100644 include/linux/cache_coherency.h
create mode 100644 lib/cache_maint.c
--
2.48.1
* [PATCH v4 1/6] memregion: Drop unused IORES_DESC_* parameter from cpu_cache_invalidate_memregion()
2025-10-22 11:33 [PATCH v4 0/6] Cache coherency management subsystem Jonathan Cameron
@ 2025-10-22 11:33 ` Jonathan Cameron
2025-10-22 11:33 ` [PATCH v4 2/6] memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion() Jonathan Cameron
` (5 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Jonathan Cameron @ 2025-10-22 11:33 UTC (permalink / raw)
To: Conor Dooley, Catalin Marinas, linux-cxl, linux-arm-kernel,
linux-arch, linux-mm, Dan Williams, H . Peter Anvin,
Peter Zijlstra, Andrew Morton
Cc: james.morse, Will Deacon, Davidlohr Bueso, linuxarm, Yushan Wang,
Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, x86, Andy Lutomirski, Dave Jiang
The res_desc parameter was originally introduced for documentation purposes
and with the idea that, with HDM-DB, CXL invalidation could be triggered
from the device. That has not come to pass, and the continued existence of
the option is confusing when the following patch adds a range which might
not be a strict subset of the res_desc. So avoid that confusion by dropping
the parameter.
Link: https://lore.kernel.org/linux-mm/686eedb25ed02_24471002e@dwillia2-xfh.jf.intel.com.notmuch/
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
v4: Dan's tag (thanks!)
V3: New patch.
As Dan calls out in the linked mail, an alternative might be to look up
the ranges and enforce the descriptor, but his expressed preference
was for dropping the parameter.
---
arch/x86/mm/pat/set_memory.c | 2 +-
drivers/cxl/core/region.c | 2 +-
drivers/nvdimm/region.c | 2 +-
drivers/nvdimm/region_devs.c | 2 +-
include/linux/memregion.h | 7 +++----
5 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index d2d54b8c4dbb..0cfee2544ad4 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -368,7 +368,7 @@ bool cpu_cache_has_invalidate_memregion(void)
}
EXPORT_SYMBOL_NS_GPL(cpu_cache_has_invalidate_memregion, "DEVMEM");
-int cpu_cache_invalidate_memregion(int res_desc)
+int cpu_cache_invalidate_memregion(void)
{
if (WARN_ON_ONCE(!cpu_cache_has_invalidate_memregion()))
return -ENXIO;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e14c1d305b22..36489cb086f3 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -236,7 +236,7 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
return -ENXIO;
}
- cpu_cache_invalidate_memregion(IORES_DESC_CXL);
+ cpu_cache_invalidate_memregion();
return 0;
}
diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
index cd9b52040d7b..47e263ecedf7 100644
--- a/drivers/nvdimm/region.c
+++ b/drivers/nvdimm/region.c
@@ -110,7 +110,7 @@ static void nd_region_remove(struct device *dev)
* here is ok.
*/
if (cpu_cache_has_invalidate_memregion())
- cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
+ cpu_cache_invalidate_memregion();
}
static int child_notify(struct device *dev, void *data)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index a5ceaf5db595..c375b11aea6d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -90,7 +90,7 @@ static int nd_region_invalidate_memregion(struct nd_region *nd_region)
}
}
- cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
+ cpu_cache_invalidate_memregion();
out:
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
diff --git a/include/linux/memregion.h b/include/linux/memregion.h
index c01321467789..945646bde825 100644
--- a/include/linux/memregion.h
+++ b/include/linux/memregion.h
@@ -26,8 +26,7 @@ static inline void memregion_free(int id)
/**
* cpu_cache_invalidate_memregion - drop any CPU cached data for
- * memregions described by @res_desc
- * @res_desc: one of the IORES_DESC_* types
+ * memregion
*
* Perform cache maintenance after a memory event / operation that
* changes the contents of physical memory in a cache-incoherent manner.
@@ -46,7 +45,7 @@ static inline void memregion_free(int id)
* the cache maintenance.
*/
#ifdef CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
-int cpu_cache_invalidate_memregion(int res_desc);
+int cpu_cache_invalidate_memregion(void);
bool cpu_cache_has_invalidate_memregion(void);
#else
static inline bool cpu_cache_has_invalidate_memregion(void)
@@ -54,7 +53,7 @@ static inline bool cpu_cache_has_invalidate_memregion(void)
return false;
}
-static inline int cpu_cache_invalidate_memregion(int res_desc)
+static inline int cpu_cache_invalidate_memregion(void)
{
WARN_ON_ONCE("CPU cache invalidation required");
return -ENXIO;
--
2.48.1
* [PATCH v4 2/6] memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion()
2025-10-22 11:33 [PATCH v4 0/6] Cache coherency management subsystem Jonathan Cameron
2025-10-22 11:33 ` [PATCH v4 1/6] memregion: Drop unused IORES_DESC_* parameter from cpu_cache_invalidate_memregion() Jonathan Cameron
@ 2025-10-22 11:33 ` Jonathan Cameron
2025-10-22 11:33 ` [PATCH v4 3/6] lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
` (4 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Jonathan Cameron @ 2025-10-22 11:33 UTC (permalink / raw)
To: Conor Dooley, Catalin Marinas, linux-cxl, linux-arm-kernel,
linux-arch, linux-mm, Dan Williams, H . Peter Anvin,
Peter Zijlstra, Andrew Morton
Cc: james.morse, Will Deacon, Davidlohr Bueso, linuxarm, Yushan Wang,
Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, x86, Andy Lutomirski, Dave Jiang
From: Yicong Yang <yangyicong@hisilicon.com>
Extend cpu_cache_invalidate_memregion() to support invalidating a
specific range of memory by introducing start and length parameters.
Control of the type of invalidation is left for when use cases turn up.
For now everything is Clean and Invalidate.
Where the range is unknown, use the provided cpu_cache_invalidate_all()
helper to act as documentation of intent that is clearer than passing
(0, -1) to cpu_cache_invalidate_memregion().
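As an illustration (hypothetical call sites; res here stands for
whatever struct resource describes the region):

        /* Range known: only flush the region that changed. */
        cpu_cache_invalidate_memregion(res->start, resource_size(res));

        /* No range information available: flush everything. */
        cpu_cache_invalidate_all();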
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
v4: Add cpu_cache_invalidate_all() helper for the (0, -1) case that
applies when we don't have the range to invalidate and so just want to
invalidate all caches. (Thanks to Dan Williams for this suggestion.)
v3: Rebase on top of the previous patch that removed the IORES_DESC_*
parameter.
---
arch/x86/mm/pat/set_memory.c | 2 +-
drivers/cxl/core/region.c | 5 ++++-
drivers/nvdimm/region.c | 2 +-
drivers/nvdimm/region_devs.c | 2 +-
include/linux/memregion.h | 13 +++++++++++--
5 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 0cfee2544ad4..05e7704f0128 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -368,7 +368,7 @@ bool cpu_cache_has_invalidate_memregion(void)
}
EXPORT_SYMBOL_NS_GPL(cpu_cache_has_invalidate_memregion, "DEVMEM");
-int cpu_cache_invalidate_memregion(void)
+int cpu_cache_invalidate_memregion(phys_addr_t start, size_t len)
{
if (WARN_ON_ONCE(!cpu_cache_has_invalidate_memregion()))
return -ENXIO;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 36489cb086f3..7d0f6f07352f 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -236,7 +236,10 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
return -ENXIO;
}
- cpu_cache_invalidate_memregion();
+ if (!cxlr->params.res)
+ return -ENXIO;
+ cpu_cache_invalidate_memregion(cxlr->params.res->start,
+ resource_size(cxlr->params.res));
return 0;
}
diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
index 47e263ecedf7..53567f3ed427 100644
--- a/drivers/nvdimm/region.c
+++ b/drivers/nvdimm/region.c
@@ -110,7 +110,7 @@ static void nd_region_remove(struct device *dev)
* here is ok.
*/
if (cpu_cache_has_invalidate_memregion())
- cpu_cache_invalidate_memregion();
+ cpu_cache_invalidate_all();
}
static int child_notify(struct device *dev, void *data)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index c375b11aea6d..1220530a23b6 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -90,7 +90,7 @@ static int nd_region_invalidate_memregion(struct nd_region *nd_region)
}
}
- cpu_cache_invalidate_memregion();
+ cpu_cache_invalidate_all();
out:
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
diff --git a/include/linux/memregion.h b/include/linux/memregion.h
index 945646bde825..a55f62cc5266 100644
--- a/include/linux/memregion.h
+++ b/include/linux/memregion.h
@@ -27,6 +27,9 @@ static inline void memregion_free(int id)
/**
* cpu_cache_invalidate_memregion - drop any CPU cached data for
* memregion
+ * @start: start physical address of the target memory region.
+ * @len: length of the target memory region. -1 for all the regions of
+ * the target type.
*
* Perform cache maintenance after a memory event / operation that
* changes the contents of physical memory in a cache-incoherent manner.
@@ -45,7 +48,7 @@ static inline void memregion_free(int id)
* the cache maintenance.
*/
#ifdef CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
-int cpu_cache_invalidate_memregion(void);
+int cpu_cache_invalidate_memregion(phys_addr_t start, size_t len);
bool cpu_cache_has_invalidate_memregion(void);
#else
static inline bool cpu_cache_has_invalidate_memregion(void)
@@ -53,10 +56,16 @@ static inline bool cpu_cache_has_invalidate_memregion(void)
return false;
}
-static inline int cpu_cache_invalidate_memregion(void)
+static inline int cpu_cache_invalidate_memregion(phys_addr_t start, size_t len)
{
WARN_ON_ONCE("CPU cache invalidation required");
return -ENXIO;
}
#endif
+
+static inline int cpu_cache_invalidate_all(void)
+{
+ return cpu_cache_invalidate_memregion(0, -1);
+}
+
#endif /* _MEMREGION_H_ */
--
2.48.1
* [PATCH v4 3/6] lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
2025-10-22 11:33 [PATCH v4 0/6] Cache coherency management subsystem Jonathan Cameron
2025-10-22 11:33 ` [PATCH v4 1/6] memregion: Drop unused IORES_DESC_* parameter from cpu_cache_invalidate_memregion() Jonathan Cameron
2025-10-22 11:33 ` [PATCH v4 2/6] memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion() Jonathan Cameron
@ 2025-10-22 11:33 ` Jonathan Cameron
2025-10-22 21:11 ` Conor Dooley
2025-10-22 11:33 ` [PATCH v4 4/6] arm64: Select GENERIC_CPU_CACHE_MAINTENANCE Jonathan Cameron
` (3 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Jonathan Cameron @ 2025-10-22 11:33 UTC (permalink / raw)
To: Conor Dooley, Catalin Marinas, linux-cxl, linux-arm-kernel,
linux-arch, linux-mm, Dan Williams, H . Peter Anvin,
Peter Zijlstra, Andrew Morton
Cc: james.morse, Will Deacon, Davidlohr Bueso, linuxarm, Yushan Wang,
Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, x86, Andy Lutomirski, Dave Jiang
From: Yicong Yang <yangyicong@hisilicon.com>
ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION provides the mechanism for
invalidating certain memory regions in a cache-incoherent manner. Currently
this is used by NVDIMM and CXL memory drivers in cases where it is
necessary to flush all data from caches by physical address range.
On some architectures these operations are supported by system components
that may become available only later in boot as they are either present
on a discoverable bus or found via a firmware description of an MMIO
interface (e.g. ACPI DSDT). Provide a framework to handle this case.
Architectures can opt in to this support via
CONFIG_GENERIC_CPU_CACHE_MAINTENANCE.
Add a registration framework. Each driver provides an ops structure and
the first op is Write Back and Invalidate by PA Range. The driver may
over-invalidate.
An optional completion check operation is also provided. If present,
it should be called to ensure that the action has finished.
When multiple agents are present in the system each should register with
this framework, and the core code will issue the invalidate to all of them
before checking for completion on each. This is done to avoid the need for
filtering in the core code, which can become complex when interleave,
potentially across different cache coherency hardware, is in play; it is
easier to tell everyone and let those who don't care do nothing.
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
v4:
- Improve formatting of Returns documentation. (Randy Dunlap)
- select ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION as this provides the
implementation if this is selected by an architecture (Catalin Marinas)
- Avoid use of device in naming as there is no struct device involved
any more. Instead use cache_coherency_ops for the callback structure and
cache_coherency_ops_inst for a single registered instance.
Rename the allocation and registration functions to reflect this
(based on feedback from Dan Williams)
- Use a kref to avoid the oddity of calling free on an embedded structure.
(Dan Williams)
---
include/linux/cache_coherency.h | 61 ++++++++++++++
lib/Kconfig | 4 +
lib/Makefile | 2 +
lib/cache_maint.c | 138 ++++++++++++++++++++++++++++++++
4 files changed, 205 insertions(+)
diff --git a/include/linux/cache_coherency.h b/include/linux/cache_coherency.h
new file mode 100644
index 000000000000..cc81c5733e31
--- /dev/null
+++ b/include/linux/cache_coherency.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Cache coherency maintenance operation device drivers
+ *
+ * Copyright Huawei 2025
+ */
+#ifndef _LINUX_CACHE_COHERENCY_H_
+#define _LINUX_CACHE_COHERENCY_H_
+
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/types.h>
+
+struct cc_inval_params {
+ phys_addr_t addr;
+ size_t size;
+};
+
+struct cache_coherency_ops_inst;
+
+struct cache_coherency_ops {
+ int (*wbinv)(struct cache_coherency_ops_inst *cci,
+ struct cc_inval_params *invp);
+ int (*done)(struct cache_coherency_ops_inst *cci);
+};
+
+struct cache_coherency_ops_inst {
+ struct kref kref;
+ struct list_head node;
+ const struct cache_coherency_ops *ops;
+};
+
+int cache_coherency_ops_instance_register(struct cache_coherency_ops_inst *cci);
+void cache_coherency_ops_instance_unregister(struct cache_coherency_ops_inst *cci);
+
+struct cache_coherency_ops_inst *
+_cache_coherency_ops_instance_alloc(const struct cache_coherency_ops *ops,
+ size_t size);
+/**
+ * cache_coherency_ops_instance_alloc - Allocate cache coherency ops instance
+ * @ops: Cache maintenance operations
+ * @drv_struct: structure that contains the struct cache_coherency_ops_inst
+ * @member: Name of the struct cache_coherency_ops_inst member in @drv_struct.
+ *
+ * This allocates a driver specific structure and initializes the
+ * cache_coherency_ops_inst embedded in the drv_struct. Upon success the
+ * pointer must be freed via cache_coherency_ops_instance_put().
+ *
+ * Returns a &drv_struct * on success, %NULL on error.
+ */
+#define cache_coherency_ops_instance_alloc(ops, drv_struct, member) \
+ ({ \
+ static_assert(__same_type(struct cache_coherency_ops_inst, \
+ ((drv_struct *)NULL)->member)); \
+ static_assert(offsetof(drv_struct, member) == 0); \
+ (drv_struct *)_cache_coherency_ops_instance_alloc(ops, \
+ sizeof(drv_struct)); \
+ })
+void cache_coherency_ops_instance_put(struct cache_coherency_ops_inst *cci);
+
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index e629449dd2a3..e11136d188ae 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -542,6 +542,10 @@ config MEMREGION
config ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
bool
+config GENERIC_CPU_CACHE_MAINTENANCE
+ bool
+ select ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
+
config ARCH_HAS_MEMREMAP_COMPAT_ALIGN
bool
diff --git a/lib/Makefile b/lib/Makefile
index 1ab2c4be3b66..aaf677cf4527 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -127,6 +127,8 @@ obj-$(CONFIG_HAS_IOMEM) += iomap_copy.o devres.o
obj-$(CONFIG_CHECK_SIGNATURE) += check_signature.o
obj-$(CONFIG_DEBUG_LOCKING_API_SELFTESTS) += locking-selftest.o
+obj-$(CONFIG_GENERIC_CPU_CACHE_MAINTENANCE) += cache_maint.o
+
lib-y += logic_pio.o
lib-$(CONFIG_INDIRECT_IOMEM) += logic_iomem.o
diff --git a/lib/cache_maint.c b/lib/cache_maint.c
new file mode 100644
index 000000000000..9256a9ffc34c
--- /dev/null
+++ b/lib/cache_maint.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Generic support for Memory System Cache Maintenance operations.
+ *
+ * Coherency maintenance drivers register with this simple framework that will
+ * iterate over each registered instance to first kick off invalidation and
+ * then to wait until it is complete.
+ *
+ * If no implementations are registered yet cpu_cache_has_invalidate_memregion()
+ * will return false. If this runs concurrently with unregistration then a
+ * race exists but this is no worse than the case where the operations instance
+ * responsible for a given memory region has not yet registered.
+ */
+#include <linux/cache_coherency.h>
+#include <linux/cleanup.h>
+#include <linux/container_of.h>
+#include <linux/export.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/memregion.h>
+#include <linux/module.h>
+#include <linux/rwsem.h>
+#include <linux/slab.h>
+
+static LIST_HEAD(cache_ops_instance_list);
+static DECLARE_RWSEM(cache_ops_instance_list_lock);
+
+static void __cache_coherency_ops_instance_free(struct kref *kref)
+{
+ struct cache_coherency_ops_inst *cci =
+ container_of(kref, struct cache_coherency_ops_inst, kref);
+ kfree(cci);
+}
+
+void cache_coherency_ops_instance_put(struct cache_coherency_ops_inst *cci)
+{
+ kref_put(&cci->kref, __cache_coherency_ops_instance_free);
+}
+EXPORT_SYMBOL_GPL(cache_coherency_ops_instance_put);
+
+static int cache_inval_one(struct cache_coherency_ops_inst *cci, void *data)
+{
+ if (!cci->ops)
+ return -EINVAL;
+
+ return cci->ops->wbinv(cci, data);
+}
+
+static int cache_inval_done_one(struct cache_coherency_ops_inst *cci)
+{
+ if (!cci->ops)
+ return -EINVAL;
+
+ if (!cci->ops->done)
+ return 0;
+
+ return cci->ops->done(cci);
+}
+
+static int cache_invalidate_memregion(phys_addr_t addr, size_t size)
+{
+ int ret;
+ struct cache_coherency_ops_inst *cci;
+ struct cc_inval_params params = {
+ .addr = addr,
+ .size = size,
+ };
+
+ guard(rwsem_read)(&cache_ops_instance_list_lock);
+ list_for_each_entry(cci, &cache_ops_instance_list, node) {
+ ret = cache_inval_one(cci, ¶ms);
+ if (ret)
+ return ret;
+ }
+ list_for_each_entry(cci, &cache_ops_instance_list, node) {
+ ret = cache_inval_done_one(cci);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+struct cache_coherency_ops_inst *
+_cache_coherency_ops_instance_alloc(const struct cache_coherency_ops *ops,
+ size_t size)
+{
+ struct cache_coherency_ops_inst *cci;
+
+ if (!ops || !ops->wbinv)
+ return NULL;
+
+ cci = kzalloc(size, GFP_KERNEL);
+ if (!cci)
+ return NULL;
+
+ cci->ops = ops;
+ INIT_LIST_HEAD(&cci->node);
+ kref_init(&cci->kref);
+
+ return cci;
+}
+EXPORT_SYMBOL_NS_GPL(_cache_coherency_ops_instance_alloc, "CACHE_COHERENCY");
+
+int cache_coherency_ops_instance_register(struct cache_coherency_ops_inst *cci)
+{
+ guard(rwsem_write)(&cache_ops_instance_list_lock);
+ list_add(&cci->node, &cache_ops_instance_list);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cache_coherency_ops_instance_register, "CACHE_COHERENCY");
+
+void cache_coherency_ops_instance_unregister(struct cache_coherency_ops_inst *cci)
+{
+ guard(rwsem_write)(&cache_ops_instance_list_lock);
+ list_del(&cci->node);
+}
+EXPORT_SYMBOL_NS_GPL(cache_coherency_ops_instance_unregister, "CACHE_COHERENCY");
+
+int cpu_cache_invalidate_memregion(phys_addr_t start, size_t len)
+{
+ return cache_invalidate_memregion(start, len);
+}
+EXPORT_SYMBOL_NS_GPL(cpu_cache_invalidate_memregion, "DEVMEM");
+
+/*
+ * Used for optimization / debug purposes only as removal can race
+ *
+ * Machines that do not support invalidation, e.g. VMs, will not have any
+ * operations instance to register and so this will always return false.
+ */
+bool cpu_cache_has_invalidate_memregion(void)
+{
+ guard(rwsem_read)(&cache_ops_instance_list_lock);
+ return !list_empty(&cache_ops_instance_list);
+}
+EXPORT_SYMBOL_NS_GPL(cpu_cache_has_invalidate_memregion, "DEVMEM");
--
2.48.1
* [PATCH v4 4/6] arm64: Select GENERIC_CPU_CACHE_MAINTENANCE
2025-10-22 11:33 [PATCH v4 0/6] Cache coherency management subsystem Jonathan Cameron
` (2 preceding siblings ...)
2025-10-22 11:33 ` [PATCH v4 3/6] lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
@ 2025-10-22 11:33 ` Jonathan Cameron
2025-10-22 11:33 ` [PATCH v4 5/6] MAINTAINERS: Add Jonathan Cameron to drivers/cache and add lib/cache_maint.c + header Jonathan Cameron
` (2 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Jonathan Cameron @ 2025-10-22 11:33 UTC (permalink / raw)
To: Conor Dooley, Catalin Marinas, linux-cxl, linux-arm-kernel,
linux-arch, linux-mm, Dan Williams, H . Peter Anvin,
Peter Zijlstra, Andrew Morton
Cc: james.morse, Will Deacon, Davidlohr Bueso, linuxarm, Yushan Wang,
Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, x86, Andy Lutomirski, Dave Jiang
The generic CPU cache maintenance framework provides a way to register
drivers for devices implementing the underlying support for
cpu_cache_has_invalidate_memregion(). Enable it for arm64 by selecting
GENERIC_CPU_CACHE_MAINTENANCE, which provides the implementation for,
and in turn selects, ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
v4: Drop select ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION as that
is now selected by GENERIC_CPU_CACHE_MAINTENANCE (Catalin Marinas)
Picked up tag from Catalin. (thanks!)
---
arch/arm64/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6663ffd23f25..893e0af0bc51 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -149,6 +149,7 @@ config ARM64
select GENERIC_ARCH_TOPOLOGY
select GENERIC_CLOCKEVENTS_BROADCAST
select GENERIC_CPU_AUTOPROBE
+ select GENERIC_CPU_CACHE_MAINTENANCE
select GENERIC_CPU_DEVICES
select GENERIC_CPU_VULNERABILITIES
select GENERIC_EARLY_IOREMAP
--
2.48.1
* [PATCH v4 5/6] MAINTAINERS: Add Jonathan Cameron to drivers/cache and add lib/cache_maint.c + header
2025-10-22 11:33 [PATCH v4 0/6] Cache coherency management subsystem Jonathan Cameron
` (3 preceding siblings ...)
2025-10-22 11:33 ` [PATCH v4 4/6] arm64: Select GENERIC_CPU_CACHE_MAINTENANCE Jonathan Cameron
@ 2025-10-22 11:33 ` Jonathan Cameron
2025-10-22 11:33 ` [PATCH v4 6/6] cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent Jonathan Cameron
2025-10-22 19:22 ` [PATCH v4 0/6] Cache coherency management subsystem Andrew Morton
6 siblings, 0 replies; 18+ messages in thread
From: Jonathan Cameron @ 2025-10-22 11:33 UTC (permalink / raw)
To: Conor Dooley, Catalin Marinas, linux-cxl, linux-arm-kernel,
linux-arch, linux-mm, Dan Williams, H . Peter Anvin,
Peter Zijlstra, Andrew Morton
Cc: james.morse, Will Deacon, Davidlohr Bueso, linuxarm, Yushan Wang,
Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, x86, Andy Lutomirski, Dave Jiang
Seems unfair to inflict the cache-coherency drivers on Conor without also
stepping up as a second maintainer for drivers/cache.
Add the library support for cache-coherency maintenance drivers to the
existing entry.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
---
v4: Tag from Conor (thanks!). Updated commit message to reflect the
added files.
v3: Add lib/cache_maint.c and include/linux/cache_coherency.h
Conor, do you mind those two being in this entry? Seems silly to spin
another MAINTAINERS entry for a few 10s of lines of simple code.
---
MAINTAINERS | 3 +++
1 file changed, 3 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 46126ce2f968..b517f2703615 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -24417,10 +24417,13 @@ F: drivers/staging/
STANDALONE CACHE CONTROLLER DRIVERS
M: Conor Dooley <conor@kernel.org>
+M: Jonathan Cameron <jonathan.cameron@huawei.com>
S: Maintained
T: git https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git/
F: Documentation/devicetree/bindings/cache/
F: drivers/cache
+F: include/linux/cache_coherency.h
+F: lib/cache_maint.c
STARFIRE/DURALAN NETWORK DRIVER
M: Ion Badulescu <ionut@badula.org>
--
2.48.1
* [PATCH v4 6/6] cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent
2025-10-22 11:33 [PATCH v4 0/6] Cache coherency management subsystem Jonathan Cameron
` (4 preceding siblings ...)
2025-10-22 11:33 ` [PATCH v4 5/6] MAINTAINERS: Add Jonathan Cameron to drivers/cache and add lib/cache_maint.c + header Jonathan Cameron
@ 2025-10-22 11:33 ` Jonathan Cameron
2025-10-22 21:39 ` Conor Dooley
2025-10-22 19:22 ` [PATCH v4 0/6] Cache coherency management subsystem Andrew Morton
6 siblings, 1 reply; 18+ messages in thread
From: Jonathan Cameron @ 2025-10-22 11:33 UTC (permalink / raw)
To: Conor Dooley, Catalin Marinas, linux-cxl, linux-arm-kernel,
linux-arch, linux-mm, Dan Williams, H . Peter Anvin,
Peter Zijlstra, Andrew Morton
Cc: james.morse, Will Deacon, Davidlohr Bueso, linuxarm, Yushan Wang,
Lorenzo Pieralisi, Mark Rutland, Dave Hansen, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, x86, Andy Lutomirski, Dave Jiang
From: Yushan Wang <wangyushan12@huawei.com>
The Hydra Home Agent is a device used to maintain cache coherency; add
support for explicit cache maintenance operations for it.
The memory resource of the HHA conflicts with that of the HHA PMU. A
workaround is implemented here by replacing devm_ioremap_resource() with
devm_ioremap() to work around the resource conflict check.
Co-developed-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
v4: Update for naming changes around device / instance.
Switch to kref put based freeing via helper.
---
drivers/cache/Kconfig | 15 +++
drivers/cache/Makefile | 2 +
drivers/cache/hisi_soc_hha.c | 191 +++++++++++++++++++++++++++++++++++
3 files changed, 208 insertions(+)
diff --git a/drivers/cache/Kconfig b/drivers/cache/Kconfig
index db51386c663a..4551b28e14dd 100644
--- a/drivers/cache/Kconfig
+++ b/drivers/cache/Kconfig
@@ -1,6 +1,21 @@
# SPDX-License-Identifier: GPL-2.0
menu "Cache Drivers"
+if GENERIC_CPU_CACHE_MAINTENANCE
+
+config HISI_SOC_HHA
+ tristate "HiSilicon Hydra Home Agent (HHA) device driver"
+ depends on (ARM64 && ACPI) || COMPILE_TEST
+ help
+ The Hydra Home Agent (HHA) is responsible for cache coherency
+ on the SoC. This drivers enables the cache maintenance functions of
+ the HHA.
+
+ This driver can be built as a module. If so, the module will be
+ called hisi_soc_hha.
+
+endif
+
config AX45MP_L2_CACHE
bool "Andes Technology AX45MP L2 Cache controller"
depends on RISCV
diff --git a/drivers/cache/Makefile b/drivers/cache/Makefile
index 55c5e851034d..b3362b15d6c1 100644
--- a/drivers/cache/Makefile
+++ b/drivers/cache/Makefile
@@ -3,3 +3,5 @@
obj-$(CONFIG_AX45MP_L2_CACHE) += ax45mp_cache.o
obj-$(CONFIG_SIFIVE_CCACHE) += sifive_ccache.o
obj-$(CONFIG_STARFIVE_STARLINK_CACHE) += starfive_starlink_cache.o
+
+obj-$(CONFIG_HISI_SOC_HHA) += hisi_soc_hha.o
diff --git a/drivers/cache/hisi_soc_hha.c b/drivers/cache/hisi_soc_hha.c
new file mode 100644
index 000000000000..bf403f711c6b
--- /dev/null
+++ b/drivers/cache/hisi_soc_hha.c
@@ -0,0 +1,191 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for HiSilicon Hydra Home Agent (HHA).
+ *
+ * Copyright (c) 2025 HiSilicon Technologies Co., Ltd.
+ * Author: Yicong Yang <yangyicong@hisilicon.com>
+ * Yushan Wang <wangyushan12@huawei.com>
+ */
+
+#include <linux/bitfield.h>
+#include <linux/cache_coherency.h>
+#include <linux/dev_printk.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/kernel.h>
+#include <linux/memregion.h>
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+
+#define HISI_HHA_CTRL 0x5004
+#define HISI_HHA_CTRL_EN BIT(0)
+#define HISI_HHA_CTRL_RANGE BIT(1)
+#define HISI_HHA_CTRL_TYPE GENMASK(3, 2)
+#define HISI_HHA_START_L 0x5008
+#define HISI_HHA_START_H 0x500c
+#define HISI_HHA_LEN_L 0x5010
+#define HISI_HHA_LEN_H 0x5014
+
+/* The maintenance operation is performed at 128 byte granularity */
+#define HISI_HHA_MAINT_ALIGN 128
+
+#define HISI_HHA_POLL_GAP_US 10
+#define HISI_HHA_POLL_TIMEOUT_US 50000
+
+struct hisi_soc_hha {
+ /* Must be first element */
+ struct cache_coherency_ops_inst cci;
+ /* Locks HHA instance to forbid overlapping access. */
+ struct mutex lock;
+ void __iomem *base;
+};
+
+static bool hisi_hha_cache_maintain_wait_finished(struct hisi_soc_hha *soc_hha)
+{
+ u32 val;
+
+ return !readl_poll_timeout_atomic(soc_hha->base + HISI_HHA_CTRL, val,
+ !(val & HISI_HHA_CTRL_EN),
+ HISI_HHA_POLL_GAP_US,
+ HISI_HHA_POLL_TIMEOUT_US);
+}
+
+static int hisi_soc_hha_wbinv(struct cache_coherency_ops_inst *cci,
+ struct cc_inval_params *invp)
+{
+ struct hisi_soc_hha *soc_hha =
+ container_of(cci, struct hisi_soc_hha, cci);
+ phys_addr_t top, addr = invp->addr;
+ size_t size = invp->size;
+ u32 reg;
+
+ if (!size)
+ return -EINVAL;
+
+ addr = ALIGN_DOWN(addr, HISI_HHA_MAINT_ALIGN);
+ top = ALIGN(addr + size, HISI_HHA_MAINT_ALIGN);
+ size = top - addr;
+
+ guard(mutex)(&soc_hha->lock);
+
+ if (!hisi_hha_cache_maintain_wait_finished(soc_hha))
+ return -EBUSY;
+
+ /*
+ * Hardware will search for addresses ranging [addr, addr + size - 1],
+ * last byte included, and perform maintain in 128 byte granule
+ * on those cachelines which contain the addresses.
+ */
+ size -= 1;
+
+ writel(lower_32_bits(addr), soc_hha->base + HISI_HHA_START_L);
+ writel(upper_32_bits(addr), soc_hha->base + HISI_HHA_START_H);
+ writel(lower_32_bits(size), soc_hha->base + HISI_HHA_LEN_L);
+ writel(upper_32_bits(size), soc_hha->base + HISI_HHA_LEN_H);
+
+ reg = FIELD_PREP(HISI_HHA_CTRL_TYPE, 1); /* Clean Invalid */
+ reg |= HISI_HHA_CTRL_RANGE | HISI_HHA_CTRL_EN;
+ writel(reg, soc_hha->base + HISI_HHA_CTRL);
+
+ return 0;
+}
+
+static int hisi_soc_hha_done(struct cache_coherency_ops_inst *cci)
+{
+ struct hisi_soc_hha *soc_hha =
+ container_of(cci, struct hisi_soc_hha, cci);
+
+ guard(mutex)(&soc_hha->lock);
+ if (!hisi_hha_cache_maintain_wait_finished(soc_hha))
+ return -ETIMEDOUT;
+
+ return 0;
+}
+
+static const struct cache_coherency_ops hha_ops = {
+ .wbinv = hisi_soc_hha_wbinv,
+ .done = hisi_soc_hha_done,
+};
+
+static int hisi_soc_hha_probe(struct platform_device *pdev)
+{
+ struct hisi_soc_hha *soc_hha;
+ struct resource *mem;
+ int ret;
+
+ soc_hha = cache_coherency_ops_instance_alloc(&hha_ops,
+ struct hisi_soc_hha, cci);
+ if (!soc_hha)
+ return -ENOMEM;
+
+ platform_set_drvdata(pdev, soc_hha);
+
+ mutex_init(&soc_hha->lock);
+
+ mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!mem) {
+ ret = -ENOMEM;
+ goto err_free_cci;
+ }
+
+ /*
+ * The HHA cache driver shares the same register region with the HHA
+ * uncore PMU driver from the hardware's perspective, so neither should
+ * reserve the resource exclusively. Here exclusive access verification is
+ * avoided by calling devm_ioremap instead of devm_ioremap_resource to
+ * allow both drivers to exist at the same time.
+ */
+ soc_hha->base = ioremap(mem->start, resource_size(mem));
+ if (!soc_hha->base) {
+ ret = dev_err_probe(&pdev->dev, -ENOMEM,
+ "failed to remap io memory");
+ goto err_free_cci;
+ }
+
+ ret = cache_coherency_ops_instance_register(&soc_hha->cci);
+ if (ret)
+ goto err_iounmap;
+
+ return 0;
+
+err_iounmap:
+ iounmap(soc_hha->base);
+err_free_cci:
+ cache_coherency_ops_instance_put(&soc_hha->cci);
+ return ret;
+}
+
+static void hisi_soc_hha_remove(struct platform_device *pdev)
+{
+ struct hisi_soc_hha *soc_hha = platform_get_drvdata(pdev);
+
+ cache_coherency_ops_instance_unregister(&soc_hha->cci);
+ iounmap(soc_hha->base);
+ cache_coherency_ops_instance_put(&soc_hha->cci);
+}
+
+static const struct acpi_device_id hisi_soc_hha_ids[] = {
+ { "HISI0511", },
+ { }
+};
+MODULE_DEVICE_TABLE(acpi, hisi_soc_hha_ids);
+
+static struct platform_driver hisi_soc_hha_driver = {
+ .driver = {
+ .name = "hisi_soc_hha",
+ .acpi_match_table = hisi_soc_hha_ids,
+ },
+ .probe = hisi_soc_hha_probe,
+ .remove = hisi_soc_hha_remove,
+};
+
+module_platform_driver(hisi_soc_hha_driver);
+
+MODULE_IMPORT_NS("CACHE_COHERENCY");
+MODULE_DESCRIPTION("HiSilicon Hydra Home Agent driver supporting cache maintenance");
+MODULE_AUTHOR("Yicong Yang <yangyicong@hisilicon.com>");
+MODULE_AUTHOR("Yushan Wang <wangyushan12@huawei.com>");
+MODULE_LICENSE("GPL");
--
2.48.1
* Re: [PATCH v4 0/6] Cache coherency management subsystem
2025-10-22 11:33 [PATCH v4 0/6] Cache coherency management subsystem Jonathan Cameron
` (5 preceding siblings ...)
2025-10-22 11:33 ` [PATCH v4 6/6] cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent Jonathan Cameron
@ 2025-10-22 19:22 ` Andrew Morton
2025-10-22 20:47 ` Conor Dooley
2025-10-23 12:31 ` Jonathan Cameron
6 siblings, 2 replies; 18+ messages in thread
From: Andrew Morton @ 2025-10-22 19:22 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Conor Dooley, Catalin Marinas, linux-cxl, linux-arm-kernel,
linux-arch, linux-mm, Dan Williams, H . Peter Anvin,
Peter Zijlstra, james.morse, Will Deacon, Davidlohr Bueso,
linuxarm, Yushan Wang, Lorenzo Pieralisi, Mark Rutland,
Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
Andy Lutomirski, Dave Jiang
On Wed, 22 Oct 2025 12:33:43 +0100 Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> Support system-level interfaces for cache maintenance as found on some
> ARM64 systems. This is needed for correct functionality during various
> forms of memory hotplug (e.g. CXL). Typical hardware has an MMIO
> interface discovered via the ACPI DSDT.
>
> Includes parameter changes to cpu_cache_invalidate_memregion() but no
> functional changes for architectures that already support this call.
I see additions to lib/ so presumably there is an expectation that
other architectures might use this.
Please expand on this. Any particular architectures in mind? Any
words of wisdom which maintainers of those architectures might benefit
from?
> How to merge? When this is ready to proceed (so subject to review
> feedback on this version), I'm not sure what the best route into the
> kernel is. Conor could take the lot via his tree for drivers/cache but
> the generic changes perhaps suggest it might be better if Andrew
> handles this? Any merge conflicts in drivers/cache will be trivial
> build file stuff. Or maybe even take it through one of the affected
> trees such as CXL.
Let's not split the series up. Either CXL or Conor's tree is fine by
me.
* Re: [PATCH v4 0/6] Cache coherency management subsystem
2025-10-22 19:22 ` [PATCH v4 0/6] Cache coherency management subsystem Andrew Morton
@ 2025-10-22 20:47 ` Conor Dooley
2025-10-23 16:40 ` Jonathan Cameron
2025-10-23 12:31 ` Jonathan Cameron
1 sibling, 1 reply; 18+ messages in thread
From: Conor Dooley @ 2025-10-22 20:47 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Cameron, Catalin Marinas, linux-cxl, linux-arm-kernel,
linux-arch, linux-mm, Dan Williams, H . Peter Anvin,
Peter Zijlstra, james.morse, Will Deacon, Davidlohr Bueso,
linuxarm, Yushan Wang, Lorenzo Pieralisi, Mark Rutland,
Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
Andy Lutomirski, Dave Jiang
On Wed, Oct 22, 2025 at 12:22:41PM -0700, Andrew Morton wrote:
> On Wed, 22 Oct 2025 12:33:43 +0100 Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>
> > Support system-level interfaces for cache maintenance as found on some
> > ARM64 systems. This is needed for correct functionality during various
> > forms of memory hotplug (e.g. CXL). Typical hardware has an MMIO
> > interface discovered via the ACPI DSDT.
> >
> > Includes parameter changes to cpu_cache_invalidate_memregion() but no
> > functional changes for architectures that already support this call.
>
> I see additions to lib/ so presumably there is an expectation that
> other architectures might use this.
>
> Please expand on this. Any particular architectures in mind? Any
> words of wisdom which maintainers of those architectures might benefit
> from?
It seems fairly probable that we're gonna end up with riscv systems
where drivers are being used for both this and the existing non-standard
cache ops stuff.
> > How to merge? When this is ready to proceed (so subject to review
> > feedback on this version), I'm not sure what the best route into the
> > kernel is. Conor could take the lot via his tree for drivers/cache but
> > the generic changes perhaps suggest it might be better if Andrew
> > handles this? Any merge conflicts in drivers/cache will be trivial
> > build file stuff. Or maybe even take it through one of the affected
> > trees such as CXL.
>
> Let's not split the series up. Either CXL or Conor's tree is fine by
> me.
CXL is fine by me, greater volume there probably by orders of magnitude.
* Re: [PATCH v4 3/6] lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
2025-10-22 11:33 ` [PATCH v4 3/6] lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
@ 2025-10-22 21:11 ` Conor Dooley
2025-10-23 11:13 ` Jonathan Cameron
0 siblings, 1 reply; 18+ messages in thread
From: Conor Dooley @ 2025-10-22 21:11 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Catalin Marinas, linux-cxl, linux-arm-kernel, linux-arch,
linux-mm, Dan Williams, H . Peter Anvin, Peter Zijlstra,
Andrew Morton, james.morse, Will Deacon, Davidlohr Bueso,
linuxarm, Yushan Wang, Lorenzo Pieralisi, Mark Rutland,
Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
Andy Lutomirski, Dave Jiang
On Wed, Oct 22, 2025 at 12:33:46PM +0100, Jonathan Cameron wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION provides the mechanism for
> invalidating certain memory regions in a cache-incoherent manner. Currently
> this is used by NVDIMM and CXL memory drivers in cases where it is
> necessary to flush all data from caches by physical address range.
>
> On some architectures these operations are supported by system components
> that may become available only later in boot as they are either present
> on a discoverable bus or found via a firmware description of an MMIO
> interface (e.g. ACPI DSDT). Provide a framework to handle this case.
>
> Architectures can opt in to this support via
> CONFIG_GENERIC_CPU_CACHE_MAINTENANCE.
>
> Add a registration framework. Each driver provides an ops structure and
> the first op is Write Back and Invalidate by PA Range. The driver may
> over-invalidate.
>
> An optional completion check operation is also provided. If present,
> it should be called to ensure that the action has finished.
>
> When multiple agents are present in the system each should register with
> this framework, and the core code will issue the invalidate to all of them
> before checking for completion on each. This is done to avoid the need for
> filtering in the core code, which can become complex when interleave,
> potentially across different cache coherency hardware, is in play; it is
> easier to tell everyone and let those who don't care do nothing.
>
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
I'm fine with this stuff. I do wonder though, have you actually
encountered systems with the multiple "agents" or is that something
theoretical?
* Re: [PATCH v4 6/6] cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent
2025-10-22 11:33 ` [PATCH v4 6/6] cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent Jonathan Cameron
@ 2025-10-22 21:39 ` Conor Dooley
2025-10-23 11:49 ` Jonathan Cameron
0 siblings, 1 reply; 18+ messages in thread
From: Conor Dooley @ 2025-10-22 21:39 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Catalin Marinas, linux-cxl, linux-arm-kernel, linux-arch,
linux-mm, Dan Williams, H . Peter Anvin, Peter Zijlstra,
Andrew Morton, james.morse, Will Deacon, Davidlohr Bueso,
linuxarm, Yushan Wang, Lorenzo Pieralisi, Mark Rutland,
Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
Andy Lutomirski, Dave Jiang
On Wed, Oct 22, 2025 at 12:33:49PM +0100, Jonathan Cameron wrote:
> +static int hisi_soc_hha_wbinv(struct cache_coherency_ops_inst *cci,
> + struct cc_inval_params *invp)
> +{
> + struct hisi_soc_hha *soc_hha =
> + container_of(cci, struct hisi_soc_hha, cci);
> + phys_addr_t top, addr = invp->addr;
> + size_t size = invp->size;
> + u32 reg;
> +
> + if (!size)
> + return -EINVAL;
> +
> + addr = ALIGN_DOWN(addr, HISI_HHA_MAINT_ALIGN);
> + top = ALIGN(addr + size, HISI_HHA_MAINT_ALIGN);
> + size = top - addr;
> +
> + guard(mutex)(&soc_hha->lock);
> +
> + if (!hisi_hha_cache_maintain_wait_finished(soc_hha))
> + return -EBUSY;
> +
> + /*
> + * Hardware will search for addresses ranging [addr, addr + size - 1],
> + * last byte included, and perform maintain in 128 byte granule
> + * on those cachelines which contain the addresses.
> + */
Hmm, does this mean that the IP has some built-in handling for there
being more than one "agent" in a system? IOW, if the address is not in
its range, then the search will just fail into a NOP?
If that's not the case, is this particular "agent" by design not suitable
for a system like that? Or will a dual hydra home agent system come with
a new ACPI ID that we can use to deal with that kind of situation?
(Although I don't know enough about ACPI to know where you'd even get
the information about what instance handles what range from...)
> + size -= 1;
> +
> + writel(lower_32_bits(addr), soc_hha->base + HISI_HHA_START_L);
> + writel(upper_32_bits(addr), soc_hha->base + HISI_HHA_START_H);
> + writel(lower_32_bits(size), soc_hha->base + HISI_HHA_LEN_L);
> + writel(upper_32_bits(size), soc_hha->base + HISI_HHA_LEN_H);
> +
> + reg = FIELD_PREP(HISI_HHA_CTRL_TYPE, 1); /* Clean Invalid */
> + reg |= HISI_HHA_CTRL_RANGE | HISI_HHA_CTRL_EN;
> + writel(reg, soc_hha->base + HISI_HHA_CTRL);
> +
> + return 0;
> +}
> +
> +static int hisi_soc_hha_done(struct cache_coherency_ops_inst *cci)
> +{
> + struct hisi_soc_hha *soc_hha =
> + container_of(cci, struct hisi_soc_hha, cci);
> +
> + guard(mutex)(&soc_hha->lock);
> + if (!hisi_hha_cache_maintain_wait_finished(soc_hha))
> + return -ETIMEDOUT;
> +
> + return 0;
> +}
> +
> +static const struct cache_coherency_ops hha_ops = {
> + .wbinv = hisi_soc_hha_wbinv,
> + .done = hisi_soc_hha_done,
> +};
> +
> +static int hisi_soc_hha_probe(struct platform_device *pdev)
> +{
> + struct hisi_soc_hha *soc_hha;
> + struct resource *mem;
> + int ret;
> +
> + soc_hha = cache_coherency_ops_instance_alloc(&hha_ops,
> + struct hisi_soc_hha, cci);
> + if (!soc_hha)
> + return -ENOMEM;
> +
> + platform_set_drvdata(pdev, soc_hha);
> +
> + mutex_init(&soc_hha->lock);
> +
> + mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + if (!mem) {
> + ret = -ENOMEM;
> + goto err_free_cci;
> + }
> +
> + /*
> + * The HHA cache driver shares the same register region with the HHA
> + * uncore PMU driver from the hardware's perspective, so neither should
> + * reserve the resource exclusively. Here exclusive access verification is
> + * avoided by calling devm_ioremap instead of devm_ioremap_resource to
The comment here doesn't exactly match the code, dunno if you went away
from devm for some reason and just forgot to make the change, or the
other way around? Not a big deal obviously, but maybe you forgot to do
something you intended to do. It's mentioned in the commit message too.
Other than the question I have about the multi-"agent" stuff, this
looks fine to me. I assume it's been thought about and is fine for w/e
reason, but I'd like to know what that is.
Cheers,
Conor.
> + * allow both drivers to exist at the same time.
> + */
> + soc_hha->base = ioremap(mem->start, resource_size(mem));
> + if (!soc_hha->base) {
> + ret = dev_err_probe(&pdev->dev, -ENOMEM,
> + "failed to remap io memory");
> + goto err_free_cci;
> + }
> +
> + ret = cache_coherency_ops_instance_register(&soc_hha->cci);
> + if (ret)
> + goto err_iounmap;
> +
> + return 0;
> +
> +err_iounmap:
> + iounmap(soc_hha->base);
> +err_free_cci:
> + cache_coherency_ops_instance_put(&soc_hha->cci);
> + return ret;
> +}
> +
> +static void hisi_soc_hha_remove(struct platform_device *pdev)
> +{
> + struct hisi_soc_hha *soc_hha = platform_get_drvdata(pdev);
> +
> + cache_coherency_ops_instance_unregister(&soc_hha->cci);
> + iounmap(soc_hha->base);
> + cache_coherency_ops_instance_put(&soc_hha->cci);
> +}
* Re: [PATCH v4 3/6] lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
2025-10-22 21:11 ` Conor Dooley
@ 2025-10-23 11:13 ` Jonathan Cameron
0 siblings, 0 replies; 18+ messages in thread
From: Jonathan Cameron @ 2025-10-23 11:13 UTC (permalink / raw)
To: Conor Dooley
Cc: Catalin Marinas, linux-cxl, linux-arm-kernel, linux-arch,
linux-mm, Dan Williams, H . Peter Anvin, Peter Zijlstra,
Andrew Morton, james.morse, Will Deacon, Davidlohr Bueso,
linuxarm, Yushan Wang, Lorenzo Pieralisi, Mark Rutland,
Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
Andy Lutomirski, Dave Jiang
On Wed, 22 Oct 2025 22:11:12 +0100
Conor Dooley <conor@kernel.org> wrote:
> On Wed, Oct 22, 2025 at 12:33:46PM +0100, Jonathan Cameron wrote:
> > From: Yicong Yang <yangyicong@hisilicon.com>
> >
> > ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION provides the mechanism for
> > invalidating certain memory regions in a cache-incoherent manner. Currently
> > this is used by NVDIMM and CXL memory drivers in cases where it is
> > necessary to flush all data from caches by physical address range.
> >
> > In some architectures these operations are supported by system components
> > that may become available only later in boot as they are either present
> > on a discoverable bus, or via a firmware description of an MMIO interface
> > (e.g. ACPI DSDT). Provide a framework to handle this case.
> >
> > Architectures can opt in for this support via
> > CONFIG_GENERIC_CPU_CACHE_MAINTENANCE
> >
> > Add a registration framework. Each driver provides an ops structure and
> > the first op is Write Back and Invalidate by PA Range. The driver may
> > over invalidate.
> >
> > An optional completion check operation is also provided. If present
> > that should be called to ensure that the action has finished.
> >
> > When multiple agents are present in the system each should register with
> > this framework and the core code will issue the invalidate to all of them
> > before checking for completion on each. This is done to avoid need for
> > filtering in the core code which can become complex when interleave,
> > potentially across different cache coherency hardware is going on, so it
> > is easier to tell everyone and let those who don't care do nothing.
> >
> > Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> Acked-by: Conor Dooley <conor.dooley@microchip.com>
>
> I'm fine with this stuff. I do wonder though, have you actually
> encountered systems with the multiple "agents" or is that something
> theoretical?
Yes to multiple agents. There are multiple instances in the HiSi platform.
The multiple heterogeneous agents case is more theoretical today. Similar
components for other purposes are heterogeneous so I'd be surprised if it
doesn't surface at some point. Our initial internal driver for the
hisi_hha wrapped up the multiple instances in a fake front end, but it
meant we ended up with multiple levels of registration and it was just
simpler to relax the assumption that they were all handled by one driver.
Jonathan
* Re: [PATCH v4 6/6] cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent
2025-10-22 21:39 ` Conor Dooley
@ 2025-10-23 11:49 ` Jonathan Cameron
2025-10-23 17:58 ` Conor Dooley
0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Cameron @ 2025-10-23 11:49 UTC (permalink / raw)
To: Conor Dooley
Cc: Catalin Marinas, linux-cxl, linux-arm-kernel, linux-arch,
linux-mm, Dan Williams, H . Peter Anvin, Peter Zijlstra,
Andrew Morton, james.morse, Will Deacon, Davidlohr Bueso,
linuxarm, Yushan Wang, Lorenzo Pieralisi, Mark Rutland,
Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
Andy Lutomirski, Dave Jiang
On Wed, 22 Oct 2025 22:39:28 +0100
Conor Dooley <conor@kernel.org> wrote:
Hi Conor,
> On Wed, Oct 22, 2025 at 12:33:49PM +0100, Jonathan Cameron wrote:
>
> > +static int hisi_soc_hha_wbinv(struct cache_coherency_ops_inst *cci,
> > + struct cc_inval_params *invp)
> > +{
> > + struct hisi_soc_hha *soc_hha =
> > + container_of(cci, struct hisi_soc_hha, cci);
> > + phys_addr_t top, addr = invp->addr;
> > + size_t size = invp->size;
> > + u32 reg;
> > +
> > + if (!size)
> > + return -EINVAL;
> > +
> > + top = ALIGN(addr + size, HISI_HHA_MAINT_ALIGN);
> > + addr = ALIGN_DOWN(addr, HISI_HHA_MAINT_ALIGN);
> > + size = top - addr;
> > +
> > + guard(mutex)(&soc_hha->lock);
> > +
> > + if (!hisi_hha_cache_maintain_wait_finished(soc_hha))
> > + return -EBUSY;
> > +
> > + /*
> > + * Hardware will search for addresses in the range [addr, addr + size - 1],
> > + * last byte included, and perform maintenance in 128-byte granules
> > + * on those cachelines which contain the addresses.
> > + */
>
> Hmm, does this mean that the IP has some built-in handling for there
> being more than one "agent" in a system? IOW, if the address is not in
> its range, then the search will just fail into a NOP?
Exactly that. NOP if nothing to do. The hardware is only tracking
a subset of what it might contain (depending on which cachelines are
actually in caches) so it's very much a 'clear this if you happen to
have it' command. Even if it is in the subset of PA being covered by
an instance, many cases will be a 'miss' and hence a NOP.
> If that's not the case, is this particular "agent" by design not suitable
> for a system like that? Or will a dual hydra home agent system come with
> a new ACPI ID that we can use to deal with that kind of situation?
Existing systems have multiple instances of this hardware block.
I'm simplifying reality here to make this explanation less messy
(ignoring other levels of interleaving beyond the Point of
Coherency etc).
In servers the DRAM accesses are pretty much always interleaved
(usually at cacheline granularity). That interleaving may go to very
different physical locations on a die or across multiple dies.
Similarly the agent responsible for tracking the coherency state
(easy to think of this as a complete directory, but it's never that
simple) is distributed so that it is on the path to the DRAM. Hence
if we have N-way interleave there may be N separate agents responsible
for different parts of the range 0..(64*N-1) (taking the smallest
possible flush that would have to go to all those agents).
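Purely as a toy model (no real fabric is this simple; real ones hash
addresses in far more complex ways), the PA-to-agent mapping under such an
interleave might look like:

/*
 * Toy model only. With N-way interleave at 64-byte cacheline
 * granularity, consecutive cachelines land on different agents, so
 * even a minimal 64 * N byte flush touches every agent - hence
 * "tell everyone" rather than filtering in the core.
 */
static unsigned int toy_agent_for_pa(phys_addr_t pa, unsigned int n_agents)
{
        return (pa / 64) % n_agents;
}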
> (Although I don't know enough about ACPI to know where you'd even get
> the information about what instance handles what range from...)
We don't today. It would be easy to encode that information
as a resource and it may make sense for larger systems depending
on exactly how the coherency fabric in a system works. I'd definitely
expect to see some drivers doing this. Those drivers could then prefilter.
Interleaving gets really complex so any description is likely to only
provide a conservative superset of what is actually handled by a given
agent.
>
> > + size -= 1;
> > +
> > + writel(lower_32_bits(addr), soc_hha->base + HISI_HHA_START_L);
> > + writel(upper_32_bits(addr), soc_hha->base + HISI_HHA_START_H);
> > + writel(lower_32_bits(size), soc_hha->base + HISI_HHA_LEN_L);
> > + writel(upper_32_bits(size), soc_hha->base + HISI_HHA_LEN_H);
> > +
> > + reg = FIELD_PREP(HISI_HHA_CTRL_TYPE, 1); /* Clean Invalid */
> > + reg |= HISI_HHA_CTRL_RANGE | HISI_HHA_CTRL_EN;
> > + writel(reg, soc_hha->base + HISI_HHA_CTRL);
> > +
> > + return 0;
> > +}
> > +
> > +static int hisi_soc_hha_done(struct cache_coherency_ops_inst *cci)
> > +{
> > + struct hisi_soc_hha *soc_hha =
> > + container_of(cci, struct hisi_soc_hha, cci);
> > +
> > + guard(mutex)(&soc_hha->lock);
> > + if (!hisi_hha_cache_maintain_wait_finished(soc_hha))
> > + return -ETIMEDOUT;
> > +
> > + return 0;
> > +}
> > +
> > +static const struct cache_coherency_ops hha_ops = {
> > + .wbinv = hisi_soc_hha_wbinv,
> > + .done = hisi_soc_hha_done,
> > +};
> > +
> > +static int hisi_soc_hha_probe(struct platform_device *pdev)
> > +{
> > + struct hisi_soc_hha *soc_hha;
> > + struct resource *mem;
> > + int ret;
> > +
> > + soc_hha = cache_coherency_ops_instance_alloc(&hha_ops,
> > + struct hisi_soc_hha, cci);
> > + if (!soc_hha)
> > + return -ENOMEM;
> > +
> > + platform_set_drvdata(pdev, soc_hha);
> > +
> > + mutex_init(&soc_hha->lock);
> > +
> > + mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > + if (!mem) {
> > + ret = -ENOMEM;
> > + goto err_free_cci;
> > + }
> > +
> > + /*
> > + * HHA cache driver share the same register region with HHA uncore PMU
> > + * driver in hardware's perspective, none of them should reserve the
> > + * resource to itself only. Here exclusive access verification is
> > + * avoided by calling devm_ioremap instead of devm_ioremap_resource to
>
> The comment here doesn't exactly match the code; dunno if you went away
> from devm for some reason and just forgot to make the change, or the other
> way around? Not a big deal obviously, but maybe you forgot to do
> something you intended doing. It's mentioned in the commit message too.
Ah. Indeed a stale comment, I'll drop that.
Going away from devm was mostly a hangover from similar discussions in fwctl,
where I copied the pattern of embedded device (there) / kref (here) and the
reluctance to hide away the final put().
>
> Other than the question I have about the multi-"agent" stuff, this
> looks fine to me. I assume it's been thought about and is fine for w/e
> reason, but I'd like to know what that is.
I'll see if I can craft a short intro bit of documentation for the
top of this driver file to state clearly that there are lots of instances
of this in a system and that a request to clear something that isn't 'theirs'
results in a NOP. Better to have that available so anyone writing
a similar driver thinks about whether that applies to what they have or
if they need to do in-driver filtering.
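Something along these lines (sketch wording only, not what will actually
be merged):

/*
 * A system contains many HHA instances, each tracking coherency state
 * for only a subset of the physical address space. Maintenance
 * requests are broadcast to all instances; a request for a range an
 * instance does not track is a NOP in hardware, so this driver does
 * no filtering of its own.
 */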
>
> Cheers,
> Conor.
Thanks!
Jonathan
>
> > + * allow both drivers to exist at the same time.
> > + */
> > + soc_hha->base = ioremap(mem->start, resource_size(mem));
> > + if (!soc_hha->base) {
> > + ret = dev_err_probe(&pdev->dev, -ENOMEM,
> > + "failed to remap io memory");
> > + goto err_free_cci;
> > + }
> > +
> > + ret = cache_coherency_ops_instance_register(&soc_hha->cci);
> > + if (ret)
> > + goto err_iounmap;
> > +
> > + return 0;
> > +
> > +err_iounmap:
> > + iounmap(soc_hha->base);
> > +err_free_cci:
> > + cache_coherency_ops_instance_put(&soc_hha->cci);
> > + return ret;
> > +}
> > +
> > +static void hisi_soc_hha_remove(struct platform_device *pdev)
> > +{
> > + struct hisi_soc_hha *soc_hha = platform_get_drvdata(pdev);
> > +
> > + cache_coherency_ops_instance_unregister(&soc_hha->cci);
> > + iounmap(soc_hha->base);
> > + cache_coherency_ops_instance_put(&soc_hha->cci);
> > +}
>
* Re: [PATCH v4 0/6] Cache coherency management subsystem
2025-10-22 19:22 ` [PATCH v4 0/6] Cache coherency management subsystem Andrew Morton
2025-10-22 20:47 ` Conor Dooley
@ 2025-10-23 12:31 ` Jonathan Cameron
1 sibling, 0 replies; 18+ messages in thread
From: Jonathan Cameron @ 2025-10-23 12:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Conor Dooley, Catalin Marinas, linux-cxl, linux-arm-kernel,
linux-arch, linux-mm, Dan Williams, H . Peter Anvin,
Peter Zijlstra, james.morse, Will Deacon, Davidlohr Bueso,
linuxarm, Yushan Wang, Lorenzo Pieralisi, Mark Rutland,
Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
Andy Lutomirski, Dave Jiang
On Wed, 22 Oct 2025 12:22:41 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 22 Oct 2025 12:33:43 +0100 Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>
> > Support system level interfaces for cache maintenance as found on some
> > ARM64 systems. This is needed for correct functionality during various
> > forms of memory hotplug (e.g. CXL). Typical hardware has MMIO interface
> > found via ACPI DSDT.
> >
> > Includes parameter changes to cpu_cache_invalidate_memregion() but no
> > functional changes for architectures that already support this call.
Hi Andrew,
>
> I see additions to lib/ so presumably there is an expectation that
> other architectures might use this.
Absolutely. It's not ARM specific in any way. Given that, in at least some
implementations, it is part of the coherency fabric, and there are
examples in the past of people mixing and matching fabrics with CPU
architectures, it's more than possible a given driver might be applicable
across different CPU architectures.
>
> Please expand on this. Any particular architectures in mind? Any
> words of wisdom which maintainers of those architectures might benefit
> from?
My initial guess for a second architecture using it would be RISC-V,
but I don't know if anyone there yet cares about any of the use cases.
The short answer is that it depends on whether the architecture
requires 'one solution' or leaves it as a system problem where
a driver needs to be loaded to suit the particular implementation.
A longer answer follows:
There are two aspects to consider as to when people might find this
useful:
A) The use case. For it to apply to an architecture you need a
requirement to support the case where the content of memory presented
at a PA changes without the host explicitly writing it. That
can happen for various reasons.
- Late exposure of memory - security keys for pmem, for instance.
Until those are programmed, any prefetchers will fill caches
with garbage that needs clearing out.
- Reprogramming of address decoders beyond the point where the
Host Physical Addresses stop defining what goes on. This is the CXL
case, where there is a translation from Host Physical Address
to Device Physical Address which can change at runtime.
- (not yet enabled) Inter-host sharing without hardware coherency.
It is necessary to flush local caches because someone changed the
data under the hood. Because this happened beyond the scope of the
local host, normal cache flushing instructions might not do the job.
Hopefully we will have lighter weight solutions for this.
So the upshot today is that it is likely to apply only to server
architectures.
B) Is there an architected solution for that architecture (i.e. is it
in the CPU architecture spec)? If there is 'one solution', then
registering the arch callbacks directly is sufficient. This is
true for x86, as there is a CPU instruction that performs the
relevant operations.
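For illustration, the x86 case boils down to roughly the following
(a simplified sketch; the post-series range signature is my assumption
based on the cover letter, and the range is simply ignored there):

int cpu_cache_invalidate_memregion(phys_addr_t start, size_t len)
{
        /* Range ignored: WBINVD flushes + invalidates everything, on all CPUs. */
        wbinvd_on_all_cpus();
        return 0;
}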
Arm decided (for now) not to go down the path of architecting this
in one of their architecture specs that licensees would then have
to comply with (I'll let James / others add more on that if they want).
There were already multiple hardware IPs out there providing
this feature as part of the coherency fabrics. Earlier versions of
this series mentioned an attempt to provide a firmware interface to
hide away the complexity, but that also turned out to be unnecessary
as everyone with a use case had memory-mapped devices the kernel can
directly control.
So there will be multiple different implementations on ARM servers.
I doubt we'll even keep it completely consistent across different
HiSilicon CPU generations. As per the discussion with Conor, there
are multiple agents each of which registers separately and has
no knowledge of the other instances. For now the ones I know of
are homogeneous for a given server, but it made no difference to
allow for heterogeneous cases (I emulated those to check).
So for other architectures, it is a case of which path they want to
follow. If they don't have existing instructions defined that work
for this, and have more than one implementer, then the approach seen
here should be useful. I think RISC-V doesn't have such an instruction,
so I'd expect this to be useful to them. Not sure on other server
architectures, as most of them today are much less diverse than ARM /
RISC-V, so a "one true solution" in an architecture spec is perhaps
more likely.
In the various review rounds, we've had some discussion of the requirements
implied by the current simple interface (no ordering, single operation in
flight). So I'd not be surprised if we have to make things a little
cleverer in the long run. The HiSilicon HHA hardware interface is very simple,
so I've supported only what that (and the PSCI spec with sane options - see
v3) requires for now.
>
> > How to merge? When this is ready to proceed (so subject to review
> > feedback on this version), I'm not sure what the best route into the
> > kernel is. Conor could take the lot via his tree for drivers/cache but
> > the generic changes perhaps suggest it might be better if Andrew
> > handles this? Any merge conflicts in drivers/cache will be trivial
> > build file stuff. Or maybe even take it throug one of the affected
> > trees such as CXL.
>
> Let's not split the series up. Either CXL or Conor's tree is fine by
> me.
Thanks,
Jonathan
>
>
* Re: [PATCH v4 0/6] Cache coherency management subsystem
2025-10-22 20:47 ` Conor Dooley
@ 2025-10-23 16:40 ` Jonathan Cameron
2025-10-27 9:44 ` Arnd Bergmann
0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Cameron @ 2025-10-23 16:40 UTC (permalink / raw)
To: Conor Dooley
Cc: Andrew Morton, Catalin Marinas, linux-cxl, linux-arm-kernel,
linux-arch, linux-mm, Dan Williams, H . Peter Anvin,
Peter Zijlstra, james.morse, Will Deacon, Davidlohr Bueso,
linuxarm, Yushan Wang, Lorenzo Pieralisi, Mark Rutland,
Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
Andy Lutomirski, Dave Jiang, Arnd Bergmann, Krzysztof Kozlowski,
Alexandre Belloni, Linus Walleij, Drew Fustini
On Wed, 22 Oct 2025 21:47:21 +0100
Conor Dooley <conor@kernel.org> wrote:
> On Wed, Oct 22, 2025 at 12:22:41PM -0700, Andrew Morton wrote:
> > On Wed, 22 Oct 2025 12:33:43 +0100 Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >
> > > Support system level interfaces for cache maintenance as found on some
> > > ARM64 systems. This is needed for correct functionality during various
> > > forms of memory hotplug (e.g. CXL). Typical hardware has MMIO interface
> > > found via ACPI DSDT.
> > >
> > > Includes parameter changes to cpu_cache_invalidate_memregion() but no
> > > functional changes for architectures that already support this call.
> >
> > I see additions to lib/ so presumably there is an expectation that
> > other architectures might use this.
> >
> > Please expand on this. Any particular architectures in mind? Any
> > words of wisdom which maintainers of those architectures might benefit
> > from?
>
> It seems fairly probable that we're gonna end up with riscv systems
> where drivers are being used for both this and the existing non-standard
> cache ops stuff.
>
> > > How to merge? When this is ready to proceed (so subject to review
> > > feedback on this version), I'm not sure what the best route into the
> > > kernel is. Conor could take the lot via his tree for drivers/cache but
> > > the generic changes perhaps suggest it might be better if Andrew
> > > handles this? Any merge conflicts in drivers/cache will be trivial
> > > build file stuff. Or maybe even take it throug one of the affected
> > > trees such as CXL.
> >
> > Let's not split the series up. Either CXL or Conor's tree is fine by
> > me.
>
> CXL is fine by me, greater volume there probably by orders of magnitude.
>
On the CXL Discord, some reasonable doubts were expressed about justifying
this to Linus via CXL. Which is fair, given the tiny overlap from a 'where
the code is' point of view. It also seems I went too far in trying to
avoid people interpreting this as affecting x86 systems (see earlier
versions for how my badly scoped cover letter distracted from what this
was doing) and focused in on what was specifically being enabled rather
than the generic bit. Hence it mentions arm64 only right now, and right
at the top of the cover letter.
Given it's not Arm architecture (hence just one Kconfig line in Arm
specific code), I guess the alternative is back to drivers/cache and Conor,
which I see goes via SoC (so +CC SoC tree maintainers).
Given there will be a v5, I'll rewrite the cover letter to make it less
specific whilst still calling out that for now the only driver happens to
be in an Arm SoC. Will leave some time for additional review first though!
Thanks,
Jonathan
* Re: [PATCH v4 6/6] cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent
2025-10-23 11:49 ` Jonathan Cameron
@ 2025-10-23 17:58 ` Conor Dooley
0 siblings, 0 replies; 18+ messages in thread
From: Conor Dooley @ 2025-10-23 17:58 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Catalin Marinas, linux-cxl, linux-arm-kernel, linux-arch,
linux-mm, Dan Williams, H . Peter Anvin, Peter Zijlstra,
Andrew Morton, james.morse, Will Deacon, Davidlohr Bueso,
linuxarm, Yushan Wang, Lorenzo Pieralisi, Mark Rutland,
Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
Andy Lutomirski, Dave Jiang
On Thu, Oct 23, 2025 at 12:49:14PM +0100, Jonathan Cameron wrote:
> On Wed, 22 Oct 2025 22:39:28 +0100
> Conor Dooley <conor@kernel.org> wrote:
>
> Hi Conor,
>
> > On Wed, Oct 22, 2025 at 12:33:49PM +0100, Jonathan Cameron wrote:
> >
> > > +static int hisi_soc_hha_wbinv(struct cache_coherency_ops_inst *cci,
> > > + struct cc_inval_params *invp)
> > > +{
> > > + struct hisi_soc_hha *soc_hha =
> > > + container_of(cci, struct hisi_soc_hha, cci);
> > > + phys_addr_t top, addr = invp->addr;
> > > + size_t size = invp->size;
> > > + u32 reg;
> > > +
> > > + if (!size)
> > > + return -EINVAL;
> > > +
> > > + top = ALIGN(addr + size, HISI_HHA_MAINT_ALIGN);
> > > + addr = ALIGN_DOWN(addr, HISI_HHA_MAINT_ALIGN);
> > > + size = top - addr;
> > > +
> > > + guard(mutex)(&soc_hha->lock);
> > > +
> > > + if (!hisi_hha_cache_maintain_wait_finished(soc_hha))
> > > + return -EBUSY;
> > > +
> > > + /*
> > > + * Hardware will search for addresses in the range [addr, addr + size - 1],
> > > + * last byte included, and perform maintenance in 128-byte granules
> > > + * on those cachelines which contain the addresses.
> > > + */
> >
> > Hmm, does this mean that the IP has some built-in handling for there
> > being more than one "agent" in a system? IOW, if the address is not in
> > its range, then the search will just fail into a NOP?
>
> Exactly that. NOP if nothing to do. The hardware is only tracking
> a subset of what it might contain (depending on which cachelines are
> actually in caches) so it's very much a 'clear this if you happen to
> have it' command. Even if it is in the subset of PA being covered by
> an instance, many cases will be a 'miss' and hence a NOP.
Okay, cool. I kinda figured this was the most likely outcome, when yous put
"search" into the comment.
> > If that's not the case, is this particular "agent" by design not suitable
> > for a system like that? Or will a dual hydra home agent system come with
> > a new ACPI ID that we can use to deal with that kind of situation?
>
> Existing systems have multiple instances of this hardware block.
>
> I'm simplifying reality here to make this explanation less messy
> (ignoring other levels of interleaving beyond the Point of
> Coherency etc).
>
> In servers the DRAM accesses are pretty much always interleaved
> (usually at cacheline granularity). That interleaving may go to very
> different physical locations on a die or across multiple dies.
>
> Similarly the agent responsible for tracking the coherency state
> (easy to think of this as a complete directory, but it's never that
> simple) is distributed so that it is on the path to the DRAM. Hence
> if we have N-way interleave there may be N separate agents responsible
> for different parts of the range 0..(64*N-1) (taking the smallest
> possible flush that would have to go to all those agents).
Well, thanks for the explanation. I was only looking to know if there
were multiple, since it wasn't clear, but the reason why you have them
is welcome.
> > (Although I don't know enough about ACPI to know where you'd even get
> > the information about what instance handles what range from...)
>
> We don't today. It would be easy to encode that information
> as a resource and it may make sense for larger systems depending
> on exactly how the coherency fabric in a system works. I'd definitely
> expect to see some drivers doing this. Those drivers could then prefilter.
Okay cool. I can clearly see how it'd be done in DT land, if required,
but didn't know if it was possible on ACPI systems.
>
> Interleaving gets really complex so any description is likely to only
> provide a conservative superset of what is actually handled by a given
> agent.
>
> >
> > > + size -= 1;
> > > +
> > > + writel(lower_32_bits(addr), soc_hha->base + HISI_HHA_START_L);
> > > + writel(upper_32_bits(addr), soc_hha->base + HISI_HHA_START_H);
> > > + writel(lower_32_bits(size), soc_hha->base + HISI_HHA_LEN_L);
> > > + writel(upper_32_bits(size), soc_hha->base + HISI_HHA_LEN_H);
> > > +
> > > + reg = FIELD_PREP(HISI_HHA_CTRL_TYPE, 1); /* Clean Invalid */
> > > + reg |= HISI_HHA_CTRL_RANGE | HISI_HHA_CTRL_EN;
> > > + writel(reg, soc_hha->base + HISI_HHA_CTRL);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int hisi_soc_hha_done(struct cache_coherency_ops_inst *cci)
> > > +{
> > > + struct hisi_soc_hha *soc_hha =
> > > + container_of(cci, struct hisi_soc_hha, cci);
> > > +
> > > + guard(mutex)(&soc_hha->lock);
> > > + if (!hisi_hha_cache_maintain_wait_finished(soc_hha))
> > > + return -ETIMEDOUT;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static const struct cache_coherency_ops hha_ops = {
> > > + .wbinv = hisi_soc_hha_wbinv,
> > > + .done = hisi_soc_hha_done,
> > > +};
> > > +
> > > +static int hisi_soc_hha_probe(struct platform_device *pdev)
> > > +{
> > > + struct hisi_soc_hha *soc_hha;
> > > + struct resource *mem;
> > > + int ret;
> > > +
> > > + soc_hha = cache_coherency_ops_instance_alloc(&hha_ops,
> > > + struct hisi_soc_hha, cci);
> > > + if (!soc_hha)
> > > + return -ENOMEM;
> > > +
> > > + platform_set_drvdata(pdev, soc_hha);
> > > +
> > > + mutex_init(&soc_hha->lock);
> > > +
> > > + mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > > + if (!mem) {
> > > + ret = -ENOMEM;
> > > + goto err_free_cci;
> > > + }
> > > +
> > > + /*
> > > + * HHA cache driver share the same register region with HHA uncore PMU
> > > + * driver in hardware's perspective, none of them should reserve the
> > > + * resource to itself only. Here exclusive access verification is
> > > + * avoided by calling devm_ioremap instead of devm_ioremap_resource to
> >
> > The comment here doesn't exactly match the code; dunno if you went away
> > from devm for some reason and just forgot to make the change, or the other
> > way around? Not a big deal obviously, but maybe you forgot to do
> > something you intended doing. It's mentioned in the commit message too.
>
> Ah. Indeed a stale comment, I'll drop that.
>
> Going away from devm was mostly a hangover from similar discussions in fwctl,
> where I copied the pattern of embedded device (there) / kref (here) and the
> reluctance to hide away the final put().
>
> >
> > Other than the question I have about the multi-"agent" stuff, this
> > looks fine to me. I assume it's been thought about and is fine for w/e
> > reason, but I'd like to know what that is.
>
> I'll see if I can craft a short intro bit of documentation for the
> top of this driver file to state clearly that there are lots of instances
> of this in a system and that a request to clear something that isn't 'theirs'
> results in a NOP. Better to have that available so anyone writing
> a similar driver thinks about whether that applies to what they have or
> if they need to do in-driver filtering.
Yeah, adding a comment would be ideal, thanks.
* Re: [PATCH v4 0/6] Cache coherency management subsystem
2025-10-23 16:40 ` Jonathan Cameron
@ 2025-10-27 9:44 ` Arnd Bergmann
2025-10-28 11:43 ` Jonathan Cameron
0 siblings, 1 reply; 18+ messages in thread
From: Arnd Bergmann @ 2025-10-27 9:44 UTC (permalink / raw)
To: Jonathan Cameron, Conor Dooley
Cc: Andrew Morton, Catalin Marinas, linux-cxl, linux-arm-kernel,
Linux-Arch, linux-mm, Dan Williams, H. Peter Anvin,
Peter Zijlstra, James Morse, Will Deacon, Davidlohr Bueso,
linuxarm, Yushan Wang, Lorenzo Pieralisi, Mark Rutland,
Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
Andy Lutomirski, Dave Jiang, Krzysztof Kozlowski,
Alexandre Belloni, Linus Walleij, Drew Fustini
On Thu, Oct 23, 2025, at 18:40, Jonathan Cameron wrote:
> On Wed, 22 Oct 2025 21:47:21 +0100 Conor Dooley <conor@kernel.org> wrote:
> On the CXL Discord, some reasonable doubts were expressed about justifying
> this to Linus via CXL. Which is fair, given the tiny overlap from a 'where
> the code is' point of view. It also seems I went too far in trying to
> avoid people interpreting this as affecting x86 systems (see earlier
> versions for how my badly scoped cover letter distracted from what this
> was doing) and focused in on what was specifically being enabled rather
> than the generic bit. Hence it mentions arm64 only right now, and right
> at the top of the cover letter.
>
> Given it's not Arm architecture (hence just one Kconfig line in Arm
> specific code), I guess the alternative is back to drivers/cache and Conor,
> which I see goes via SoC (so +CC SoC tree maintainers).
I tried to understand the driver from the cover letter and the
implementation, but I think I still have some fundamental questions
about which parts of the system require this for coherency with
one another.
drivers/cache/* is about keeping coherency between DMA masters
that lack support for snooping the CPU caches on low-end SoCs.
Does the new code fit into the same category?
Or is this about flushing cacheable mappings on CXL devices
that are mapped as MMIO into the CPU physical address space,
which sounds like it would be out of scope for drivers/cache?
If it's the first of those two scenarios, we may want to
generalize the existing riscv_nonstd_cache_ops structure into
something that can be used across architectures.
Arnd
* Re: [PATCH v4 0/6] Cache coherency management subsystem
2025-10-27 9:44 ` Arnd Bergmann
@ 2025-10-28 11:43 ` Jonathan Cameron
0 siblings, 0 replies; 18+ messages in thread
From: Jonathan Cameron @ 2025-10-28 11:43 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Conor Dooley, Andrew Morton, Catalin Marinas, linux-cxl,
linux-arm-kernel, Linux-Arch, linux-mm, Dan Williams,
H. Peter Anvin, Peter Zijlstra, James Morse, Will Deacon,
Davidlohr Bueso, linuxarm, Yushan Wang, Lorenzo Pieralisi,
Mark Rutland, Dave Hansen, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, x86, Andy Lutomirski, Dave Jiang,
Krzysztof Kozlowski, Alexandre Belloni, Linus Walleij,
Drew Fustini
On Mon, 27 Oct 2025 10:44:03 +0100
"Arnd Bergmann" <arnd@arndb.de> wrote:
> On Thu, Oct 23, 2025, at 18:40, Jonathan Cameron wrote:
> > On Wed, 22 Oct 2025 21:47:21 +0100 Conor Dooley <conor@kernel.org> wrote:
>
> > On the CXL Discord, some reasonable doubts were expressed about justifying
> > this to Linus via CXL. Which is fair, given the tiny overlap from a 'where
> > the code is' point of view. It also seems I went too far in trying to
> > avoid people interpreting this as affecting x86 systems (see earlier
> > versions for how my badly scoped cover letter distracted from what this
> > was doing) and focused in on what was specifically being enabled rather
> > than the generic bit. Hence it mentions arm64 only right now, and right
> > at the top of the cover letter.
> >
> > Given it's not Arm architecture (hence just one Kconfig line in Arm
> > specific code), I guess the alternative is back to drivers/cache and Conor,
> > which I see goes via SoC (so +CC SoC tree maintainers).
>
> I tried to understand the driver from the cover letter and the
> implementation, but I think I still have some fundamental questions
> about which parts of the system require this for coherency with
> one another.
>
> drivers/cache/* is about keeping coherency between DMA masters
> that lack support for snooping the CPU caches on low-end SoCs.
> Does the new code fit into the same category?
Hi Arnd,
Sort of, if you squint a bit. The biggest difference is that every
architecture has to do something explicit, and often that's different from
what it would do for local (same host) non-coherent DMA. In some cases it's
a CPU instruction (x86 - which this patch set doesn't touch), in others an
MMIO interface (all known arm64 implementations today).
The closest this comes to your question is if we do end up using this
for the multi-host non-coherent shared memory case.
Before I expand on this, note it is very doubtful this use case
of the patch set will be realized, as the performance will
be terrible unless the change of ownership is very, very rare.
In that case you could conceive of it being a bit like two symmetric
DMA masters combined with CPUs. From the viewpoint of each host, the
other one is the DMA master that doesn't support snooping.
View from Host A... Host B looks like non coherent DMA

 ________                            ________
|        |                          |        |
| Host A |                          | Host B |
|  CPU   |---------- MEM -----------| (CPU)  |
| (DMA)  |                          |  DMA   |
|________|                          |________|

View from Host B... Host A looks like non coherent DMA

 ________                            ________
|        |                          |        |
| Host A |                          | Host B |
| (CPU)  |---------- MEM -----------|  CPU   |
|  DMA   |                          | (DMA)  |
|________|                          |________|
In my opinion, new architecture is needed to make fine-grained sharing
without hardware coherency viable. Note that CXL supports fully hardware
coherent multi-host shared memory, which resolves that problem but doesn't
cover the hotplug aspect, as the device won't flush out lines it never
knew the host cached (before it was hotplugged!)
The use case that matters is much closer to flushing because memory hotplug
occurred (something hosts presumably do, but hide in firmware, when it's
physical DDR hotplug). Arguably you could conceive of persistent memory
hotplug as being non-coherent DMA done by some host at an earlier time
that is then exposed to the local host by the hotplug event. Kind of
a stretch though.
> Or is this about flushing cacheable mappings on CXL devices
> that are mapped as MMIO into the CPU physical address space,
> which sounds like it would be out of scope for drivers/cache?
Not MMIO. The memory in question is mapped as normal RAM - just the same
as a DDR DIMM or similar.
As above, the easiest thing is to think of it as memory hotplug where
the memory may contain data (so you could think of it as similar to
hotplugging possibly-persistent memory).
Before the memory is there you can be served zeros (or poison), and when
the memory is plugged in you need to make sure those zeros are not in
cache. More complex sequences of removing memory then putting other
memory back at the same PA are covered as well.
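The ordering that matters can be sketched like this (an illustrative
pseudo-flow only: the toy_* helpers are hypothetical stand-ins, and the
cpu_cache_invalidate_memregion() range signature is assumed from the
cover letter):

static int toy_hotplug_commit(phys_addr_t base, size_t size)
{
        int ret;

        /* 1. Route 'base' to the newly arrived memory (hypothetical helper). */
        ret = toy_program_decoders(base, size);
        if (ret)
                return ret;

        /* 2. Evict stale lines (zeros/poison fetched before arrival). */
        ret = cpu_cache_invalidate_memregion(base, size);
        if (ret)
                return ret;

        /* 3. Only now hand the range over for use (hypothetical helper). */
        return toy_add_memory(base, size);
}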
>
> If it's the first of those two scenarios, we may want to
> generalize the existing riscv_nonstd_cache_ops structure into
> something that can be used across architectures.
There are some strong similarities, hence the very similar function
prototype for wbinv(). We could generalize that infrastructure and a) make
it handle multiple (heterogeneous) flushing agents, b) support polling for
completion, and c) cope with late arrival of those agents, which is a
problem for anything that can't be made to wait by userspace (not a problem
for the use cases I need this for; userspace is always in the loop for
policy decisions anyway).
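If we did combine them, a merged ops structure might look something like
this (a speculative sketch on my part, not something proposed in the
series; the wbinv()/done() pair is from this series, while wback()/inv()
mirror riscv_nonstd_cache_ops):

struct generic_cache_maint_ops {
        /* System-level, beyond-PoC maintenance (this series). */
        int (*wbinv)(struct cache_coherency_ops_inst *cci,
                     struct cc_inval_params *invp);
        int (*done)(struct cache_coherency_ops_inst *cci);
        /* Non-coherent DMA style range ops, a la riscv_nonstd_cache_ops. */
        void (*wback)(phys_addr_t paddr, size_t size);
        void (*inv)(phys_addr_t paddr, size_t size);
};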
The hard part would be that we'd have to add infrastructure to distinguish
when the operation should be called, and the level of flush will be
dependent on that. An example is the use of the riscv ops in
arch_invalidate_pmem(), which is used for clearing poison, for example.
On x86 the implementation of that is clflush_cache_range(), whereas today
the implementation we are replacing here is the much heavier WBINVD
(whether we could use clflush is an open question that was discussed in
earlier versions of this patch set - not in scope here though).
On arm64, for this case, dcache_inval_poc() is enough today as long as
we are dealing with a single host (as those code paths are). If this
flush did go far enough, I believe a secondary issue is we would have to
jump through hoops to create a temporary VA to PA mapping to memory
that doesn't actually exist at some points in time where we flush.
On arm64 at least, the Point of Coherence is currently a single-host thing,
so it is not guaranteed to write far enough for the 'hotplug' of memory case
(and does not do so on some existing hardware, as for fully coherent
single hosts this is a noop).
Also the DMA use cases of the existing riscv ops are not applicable here
at all, as when DMA is going on we'd better be sure the memory remains
available and doesn't need any flushes.
Longer term I can see we might want to combine the two approaches
(this patch set and the existing riscv-specific infrastructure), but
I have little idea how to do that with the use cases we have
visibility of today. This stuff is all in-kernel so I'm not that worried
about getting everything perfect first time.
The drivers/cache placement was mostly about finding a place where
we'd naturally see exactly the sort of overlap you've drawn attention to.
Thanks,
Jonathan
>
> Arnd
Thread overview: 18+ messages
2025-10-22 11:33 [PATCH v4 0/6] Cache coherency management subsystem Jonathan Cameron
2025-10-22 11:33 ` [PATCH v4 1/6] memregion: Drop unused IORES_DESC_* parameter from cpu_cache_invalidate_memregion() Jonathan Cameron
2025-10-22 11:33 ` [PATCH v4 2/6] memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion() Jonathan Cameron
2025-10-22 11:33 ` [PATCH v4 3/6] lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION Jonathan Cameron
2025-10-22 21:11 ` Conor Dooley
2025-10-23 11:13 ` Jonathan Cameron
2025-10-22 11:33 ` [PATCH v4 4/6] arm64: Select GENERIC_CPU_CACHE_MAINTENANCE Jonathan Cameron
2025-10-22 11:33 ` [PATCH v4 5/6] MAINTAINERS: Add Jonathan Cameron to drivers/cache and add lib/cache_maint.c + header Jonathan Cameron
2025-10-22 11:33 ` [PATCH v4 6/6] cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent Jonathan Cameron
2025-10-22 21:39 ` Conor Dooley
2025-10-23 11:49 ` Jonathan Cameron
2025-10-23 17:58 ` Conor Dooley
2025-10-22 19:22 ` [PATCH v4 0/6] Cache coherency management subsystem Andrew Morton
2025-10-22 20:47 ` Conor Dooley
2025-10-23 16:40 ` Jonathan Cameron
2025-10-27 9:44 ` Arnd Bergmann
2025-10-28 11:43 ` Jonathan Cameron
2025-10-23 12:31 ` Jonathan Cameron