linux-mm.kvack.org archive mirror
* [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer
@ 2026-01-06 16:21 Mostafa Saleh
  2026-01-06 16:21 ` [PATCH v5 1/4] iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC Mostafa Saleh
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Mostafa Saleh @ 2026-01-06 16:21 UTC (permalink / raw)
  To: linux-mm, iommu, linux-kernel, linux-doc
  Cc: corbet, joro, will, robin.murphy, akpm, vbabka, surenb, mhocko,
	jackmanb, hannes, ziy, david, lorenzo.stoakes, Liam.Howlett,
	rppt, xiaqinxin, baolu.lu, rdunlap, Mostafa Saleh

Overview
--------
This patch series introduces a new debugging feature,
IOMMU_DEBUG_PAGEALLOC, designed to catch DMA use-after-free bugs
and IOMMU mapping leaks from buggy drivers.

The kernel has powerful sanitizers like KASAN and DEBUG_PAGEALLOC
for catching CPU-side memory corruption. However, there is limited
runtime sanitization for DMA mappings managed by the IOMMU. A buggy
driver can free a page while it is still mapped for DMA, leading to
memory corruption or use-after-free vulnerabilities when that page is
reallocated and used for a different purpose.

Inspired by DEBUG_PAGEALLOC, this sanitizer tracks IOMMU mappings on a
per-page basis. It is not feasible to actually unmap the pages, as that
would require locking and walking all domains on every kernel free;
instead, we rely on page_ext to add an IOMMU-specific mapping reference
count to each page.
On each page allocated or freed by the kernel, we simply check the
count and WARN if it is not zero, dumping page owner information if
enabled.
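
Roughly, the check looks like this (a simplified sketch; the real
implementation in patches 1 and 4 goes through the page_ext accessors,
and check_page_unmapped() is a made-up name for illustration):

/* Per-page metadata registered with page_ext (patch 1). */
struct iommu_debug_metadata {
	atomic_t ref;	/* number of live IOMMU mappings of this page */
};

/* Sketch: called on every kernel page alloc/free (patch 4). */
static void check_page_unmapped(const struct page *page)
{
	struct page_ext *page_ext = page_ext_get(page);
	struct iommu_debug_metadata *d =
		page_ext_data(page_ext, &page_iommu_debug_ops);

	if (WARN_ON(atomic_read(&d->ref) != 0)) {
		pr_warn("iommu: Detected page leak!\n");
		dump_page_owner(page);
	}
	page_ext_put(page_ext);
}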

Concurrency
-----------
By design this check is racy: a caller can map pages just after the
check, which can lead to false negatives.
In my opinion this is acceptable for sanitizers (for example, KCSAN has
the same property).
The alternative would be to take locks in iommu_map/unmap across all
domains, which is not desirable even for a debug feature.
The only guarantees are that the refcount itself doesn't get corrupted
(it is updated with atomics) and that there are no false positives.
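
For illustration, a hypothetical interleaving that produces a false
negative (while the refcount update itself stays consistent):

/*
 * CPU0 (page allocator)            CPU1 (buggy driver)
 * ---------------------            -------------------
 * alloc/free check: ref == 0
 *                                  iommu_map()    ref: 0 -> 1
 * page freed with no warning
 *
 * The stale mapping is missed this time, but the refcount stays
 * non-zero, so the next alloc/free of the same page still warns.
 */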

CPU vs IOMMU Page Size
----------------------
IOMMU page sizes can differ from the CPU page size, and they can be
non-homogeneous: not all IOMMUs support the same page sizes.

To solve this, the refcount is always incremented and decremented in
units of the smallest page size supported by the IOMMU domain. This
ensures the accounting remains consistent regardless of the size of
the map or unmap operation; otherwise, double counting could occur.
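
Concretely, the tracking granule is derived from the domain's page size
bitmap, as in patch 3 (the example bitmap below is hypothetical):

/* Smallest IOMMU page size supported by the domain (patch 3). */
static size_t iommu_debug_page_size(struct iommu_domain *domain)
{
	return 1UL << __ffs(domain->pgsize_bitmap);
}

/*
 * Example: pgsize_bitmap = SZ_4K | SZ_2M -> __ffs() == 12, so the
 * granule is 4K and a single 2M map or unmap updates 512 refcounts.
 */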

Testing & Performance
---------------------
This was tested on Morello (arm64 + SMMUv3).
I also did some testing on a Lenovo IdeaCentre X Gen 10 (Snapdragon),
and on QEMU with different SMMUv3/CPU page sizes (arm64).

I also ran dma_map_benchmark on Morello:

echo dma_map_benchmark > /sys/bus/pci/devices/0000\:06\:00.0/driver_override
echo 0000:06:00.0 >  /sys/bus/pci/devices/0000\:06\:00.0/driver/unbind
echo 0000:06:00.0 > /sys/bus/pci/drivers/dma_map_benchmark/bind
./dma_map_benchmark -t $threads -g $nr_pages

CONFIG refers to "CONFIG_IOMMU_DEBUG_PAGEALLOC"
cmdline refers to "iommu.debug_pagealloc"
Numbers are (map latency)/(unmap latency), lower is better.

			CONFIG=n    CONFIG=y    CONFIG=y
			            cmdline=0   cmdline=1
4K - 1 thread		0.1/0.6     0.1/0.6     0.1/0.7
4K - 4 threads		0.1/1.1     0.1/1.0     0.2/1.1
1M - 1 thread		0.8/21.2    0.7/21.2    5.4/42.3
1M - 4 threads		1.1/45.9    1.1/46.0    5.9/45.1

Changes in v5:
v4: https://lore.kernel.org/all/20251211125928.3258905-1-smostafa@google.com/
- Fix typo in comment
- Collect Baolu R-bs

Main changes in v4:
v3: https://lore.kernel.org/all/20251124200811.2942432-1-smostafa@google.com/
- Update the kernel parameter format in docs based on Randy feedback
- Update commit subjects
- Add IOMMU only functions in iommu-priv.h based on Baolu feedback

Main changes in v3: (Most of them addressing Will comments)
v2: https://lore.kernel.org/linux-iommu/20251106163953.1971067-1-smostafa@google.com/
- Reword the Kconfig help
- Use unmap_begin/end instead of unmap/remap
- Use relaxed accessors when refcounting
- Fix a bug with checking the returned address from iova_to_phys
- Add more hardening checks (overflow)
- Add more debug info on assertions (dump_page_owner())
- Handle cases where unmap returns a larger size, as the core code
  seems to tolerate that.
- Drop Tested-by tags from Qinxin as the code logic changed

Main changes in v2:
v1: https://lore.kernel.org/linux-iommu/20251003173229.1533640-1-smostafa@google.com/
- Address Jörg comments about #ifdefs and static keys
- Reword the Kconfig help
- Drop RFC
- Collect t-b from Qinxin
- Minor cleanups

Mostafa Saleh (4):
  iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
  iommu: Add calls for IOMMU_DEBUG_PAGEALLOC
  iommu: debug-pagealloc: Track IOMMU pages
  iommu: debug-pagealloc: Check mapped/unmapped kernel memory

 .../admin-guide/kernel-parameters.txt         |   9 +
 drivers/iommu/Kconfig                         |  19 ++
 drivers/iommu/Makefile                        |   1 +
 drivers/iommu/iommu-debug-pagealloc.c         | 174 ++++++++++++++++++
 drivers/iommu/iommu-priv.h                    |  58 ++++++
 drivers/iommu/iommu.c                         |  11 +-
 include/linux/iommu-debug-pagealloc.h         |  32 ++++
 include/linux/mm.h                            |   5 +
 mm/page_ext.c                                 |   4 +
 9 files changed, 311 insertions(+), 2 deletions(-)
 create mode 100644 drivers/iommu/iommu-debug-pagealloc.c
 create mode 100644 include/linux/iommu-debug-pagealloc.h

-- 
2.52.0.351.gbe84eed79e-goog




* [PATCH v5 1/4] iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
  2026-01-06 16:21 [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Mostafa Saleh
@ 2026-01-06 16:21 ` Mostafa Saleh
  2026-01-06 18:50   ` David Hildenbrand (Red Hat)
                     ` (2 more replies)
  2026-01-06 16:21 ` [PATCH v5 2/4] iommu: Add calls " Mostafa Saleh
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 20+ messages in thread
From: Mostafa Saleh @ 2026-01-06 16:21 UTC (permalink / raw)
  To: linux-mm, iommu, linux-kernel, linux-doc
  Cc: corbet, joro, will, robin.murphy, akpm, vbabka, surenb, mhocko,
	jackmanb, hannes, ziy, david, lorenzo.stoakes, Liam.Howlett,
	rppt, xiaqinxin, baolu.lu, rdunlap, Mostafa Saleh

Add a new config IOMMU_DEBUG_PAGEALLOC, which registers new data to
page_ext.

This config will be used by the IOMMU API to track pages mapped in
the IOMMU to catch drivers trying to free kernel memory that they
still map in their domains, causing all types of memory corruption.

This behaviour is disabled by default and can be enabled using
kernel cmdline iommu.debug_pagealloc.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 .../admin-guide/kernel-parameters.txt         |  9 ++++++
 drivers/iommu/Kconfig                         | 19 +++++++++++
 drivers/iommu/Makefile                        |  1 +
 drivers/iommu/iommu-debug-pagealloc.c         | 32 +++++++++++++++++++
 include/linux/iommu-debug-pagealloc.h         | 17 ++++++++++
 mm/page_ext.c                                 |  4 +++
 6 files changed, 82 insertions(+)
 create mode 100644 drivers/iommu/iommu-debug-pagealloc.c
 create mode 100644 include/linux/iommu-debug-pagealloc.h

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a8d0afde7f85..d484d9d8d0a4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2675,6 +2675,15 @@ Kernel parameters
 			1 - Bypass the IOMMU for DMA.
 			unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
 
+	iommu.debug_pagealloc=
+			[KNL,EARLY] When CONFIG_IOMMU_DEBUG_PAGEALLOC is set, this
+			parameter enables the feature at boot time. By default, it
+			is disabled and the system behaves the same way as a kernel
+			built without CONFIG_IOMMU_DEBUG_PAGEALLOC.
+			Format: { "0" | "1" }
+			0 - Sanitizer disabled.
+			1 - Sanitizer enabled, expect runtime overhead.
+
 	io7=		[HW] IO7 for Marvel-based Alpha systems
 			See comment before marvel_specify_io7 in
 			arch/alpha/kernel/core_marvel.c.
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 99095645134f..f86262b11416 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -384,6 +384,25 @@ config SPRD_IOMMU
 
 	  Say Y here if you want to use the multimedia devices listed above.
 
+config IOMMU_DEBUG_PAGEALLOC
+	bool "Debug IOMMU mappings against page allocations"
+	depends on DEBUG_PAGEALLOC && IOMMU_API && PAGE_EXTENSION
+	help
+	  This enables a consistency check between the kernel page allocator and
+	  the IOMMU subsystem. It verifies that pages being allocated or freed
+	  are not currently mapped in any IOMMU domain.
+
+	  This helps detect DMA use-after-free bugs where a driver frees a page
+	  but forgets to unmap it from the IOMMU, potentially allowing a device
+	  to overwrite memory that the kernel has repurposed.
+
+	  These checks are best-effort and may not detect all problems.
+
+	  Due to performance overhead, this feature is disabled by default.
+	  You must enable "iommu.debug_pagealloc" from the kernel command
+	  line to activate the runtime checks.
+
+	  If unsure, say N.
 endif # IOMMU_SUPPORT
 
 source "drivers/iommu/generic_pt/Kconfig"
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 8e8843316c4b..0275821f4ef9 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -36,3 +36,4 @@ obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
 obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o
 obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
 obj-$(CONFIG_APPLE_DART) += apple-dart.o
+obj-$(CONFIG_IOMMU_DEBUG_PAGEALLOC) += iommu-debug-pagealloc.o
diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
new file mode 100644
index 000000000000..4022e9af7f27
--- /dev/null
+++ b/drivers/iommu/iommu-debug-pagealloc.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 - Google Inc
+ * Author: Mostafa Saleh <smostafa@google.com>
+ * IOMMU API debug page alloc sanitizer
+ */
+#include <linux/atomic.h>
+#include <linux/iommu-debug-pagealloc.h>
+#include <linux/kernel.h>
+#include <linux/page_ext.h>
+
+static bool needed;
+
+struct iommu_debug_metadata {
+	atomic_t ref;
+};
+
+static __init bool need_iommu_debug(void)
+{
+	return needed;
+}
+
+struct page_ext_operations page_iommu_debug_ops = {
+	.size = sizeof(struct iommu_debug_metadata),
+	.need = need_iommu_debug,
+};
+
+static int __init iommu_debug_pagealloc(char *str)
+{
+	return kstrtobool(str, &needed);
+}
+early_param("iommu.debug_pagealloc", iommu_debug_pagealloc);
diff --git a/include/linux/iommu-debug-pagealloc.h b/include/linux/iommu-debug-pagealloc.h
new file mode 100644
index 000000000000..83e64d70bf6c
--- /dev/null
+++ b/include/linux/iommu-debug-pagealloc.h
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 - Google Inc
+ * Author: Mostafa Saleh <smostafa@google.com>
+ * IOMMU API debug page alloc sanitizer
+ */
+
+#ifndef __LINUX_IOMMU_DEBUG_PAGEALLOC_H
+#define __LINUX_IOMMU_DEBUG_PAGEALLOC_H
+
+#ifdef CONFIG_IOMMU_DEBUG_PAGEALLOC
+
+extern struct page_ext_operations page_iommu_debug_ops;
+
+#endif /* CONFIG_IOMMU_DEBUG_PAGEALLOC */
+
+#endif /* __LINUX_IOMMU_DEBUG_PAGEALLOC_H */
diff --git a/mm/page_ext.c b/mm/page_ext.c
index d7396a8970e5..297e4cd8ce90 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -11,6 +11,7 @@
 #include <linux/page_table_check.h>
 #include <linux/rcupdate.h>
 #include <linux/pgalloc_tag.h>
+#include <linux/iommu-debug-pagealloc.h>
 
 /*
  * struct page extension
@@ -89,6 +90,9 @@ static struct page_ext_operations *page_ext_ops[] __initdata = {
 #ifdef CONFIG_PAGE_TABLE_CHECK
 	&page_table_check_ops,
 #endif
+#ifdef CONFIG_IOMMU_DEBUG_PAGEALLOC
+	&page_iommu_debug_ops,
+#endif
 };
 
 unsigned long page_ext_size;
-- 
2.52.0.351.gbe84eed79e-goog




* [PATCH v5 2/4] iommu: Add calls for IOMMU_DEBUG_PAGEALLOC
  2026-01-06 16:21 [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Mostafa Saleh
  2026-01-06 16:21 ` [PATCH v5 1/4] iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC Mostafa Saleh
@ 2026-01-06 16:21 ` Mostafa Saleh
  2026-01-06 21:17   ` Samiullah Khawaja
                     ` (2 more replies)
  2026-01-06 16:21 ` [PATCH v5 3/4] iommu: debug-pagealloc: Track IOMMU pages Mostafa Saleh
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 20+ messages in thread
From: Mostafa Saleh @ 2026-01-06 16:21 UTC (permalink / raw)
  To: linux-mm, iommu, linux-kernel, linux-doc
  Cc: corbet, joro, will, robin.murphy, akpm, vbabka, surenb, mhocko,
	jackmanb, hannes, ziy, david, lorenzo.stoakes, Liam.Howlett,
	rppt, xiaqinxin, baolu.lu, rdunlap, Mostafa Saleh

Add calls for the new iommu debug config IOMMU_DEBUG_PAGEALLOC:
- iommu_debug_init: Enable the debug mode if configured by the user.
- iommu_debug_map: Track iommu pages mapped, using physical address.
- iommu_debug_unmap_begin: Track start of iommu unmap operation, with
  IOVA and size.
- iommu_debug_unmap_end: Track the end of unmap operation, passing the
  actual unmapped size versus the tracked one at unmap_begin.

We have to split unmap into unmap_begin/end because once the pages are
unmapped we lose the physical address information.
This is racy, but the API is racy by construction: it uses refcounts
and doesn't attempt to lock/synchronize with the IOMMU API, as that
would be costly. This means false negatives are possible.

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 drivers/iommu/iommu-debug-pagealloc.c | 28 +++++++++++++
 drivers/iommu/iommu-priv.h            | 58 +++++++++++++++++++++++++++
 drivers/iommu/iommu.c                 | 11 ++++-
 include/linux/iommu-debug-pagealloc.h |  1 +
 4 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
index 4022e9af7f27..1d343421da98 100644
--- a/drivers/iommu/iommu-debug-pagealloc.c
+++ b/drivers/iommu/iommu-debug-pagealloc.c
@@ -5,11 +5,15 @@
  * IOMMU API debug page alloc sanitizer
  */
 #include <linux/atomic.h>
+#include <linux/iommu.h>
 #include <linux/iommu-debug-pagealloc.h>
 #include <linux/kernel.h>
 #include <linux/page_ext.h>
 
+#include "iommu-priv.h"
+
 static bool needed;
+DEFINE_STATIC_KEY_FALSE(iommu_debug_initialized);
 
 struct iommu_debug_metadata {
 	atomic_t ref;
@@ -25,6 +29,30 @@ struct page_ext_operations page_iommu_debug_ops = {
 	.need = need_iommu_debug,
 };
 
+void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
+{
+}
+
+void __iommu_debug_unmap_begin(struct iommu_domain *domain,
+			       unsigned long iova, size_t size)
+{
+}
+
+void __iommu_debug_unmap_end(struct iommu_domain *domain,
+			     unsigned long iova, size_t size,
+			     size_t unmapped)
+{
+}
+
+void iommu_debug_init(void)
+{
+	if (!needed)
+		return;
+
+	pr_info("iommu: Debugging page allocations, expect overhead or disable iommu.debug_pagealloc");
+	static_branch_enable(&iommu_debug_initialized);
+}
+
 static int __init iommu_debug_pagealloc(char *str)
 {
 	return kstrtobool(str, &needed);
diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index c95394cd03a7..aaffad5854fc 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -5,6 +5,7 @@
 #define __LINUX_IOMMU_PRIV_H
 
 #include <linux/iommu.h>
+#include <linux/iommu-debug-pagealloc.h>
 #include <linux/msi.h>
 
 static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
@@ -65,4 +66,61 @@ static inline int iommufd_sw_msi(struct iommu_domain *domain,
 int iommu_replace_device_pasid(struct iommu_domain *domain,
 			       struct device *dev, ioasid_t pasid,
 			       struct iommu_attach_handle *handle);
+
+#ifdef CONFIG_IOMMU_DEBUG_PAGEALLOC
+
+void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys,
+		       size_t size);
+void __iommu_debug_unmap_begin(struct iommu_domain *domain,
+			       unsigned long iova, size_t size);
+void __iommu_debug_unmap_end(struct iommu_domain *domain,
+			     unsigned long iova, size_t size, size_t unmapped);
+
+static inline void iommu_debug_map(struct iommu_domain *domain,
+				   phys_addr_t phys, size_t size)
+{
+	if (static_branch_unlikely(&iommu_debug_initialized))
+		__iommu_debug_map(domain, phys, size);
+}
+
+static inline void iommu_debug_unmap_begin(struct iommu_domain *domain,
+					   unsigned long iova, size_t size)
+{
+	if (static_branch_unlikely(&iommu_debug_initialized))
+		__iommu_debug_unmap_begin(domain, iova, size);
+}
+
+static inline void iommu_debug_unmap_end(struct iommu_domain *domain,
+					 unsigned long iova, size_t size,
+					 size_t unmapped)
+{
+	if (static_branch_unlikely(&iommu_debug_initialized))
+		__iommu_debug_unmap_end(domain, iova, size, unmapped);
+}
+
+void iommu_debug_init(void);
+
+#else
+static inline void iommu_debug_map(struct iommu_domain *domain,
+				   phys_addr_t phys, size_t size)
+{
+}
+
+static inline void iommu_debug_unmap_begin(struct iommu_domain *domain,
+					   unsigned long iova, size_t size)
+{
+}
+
+static inline void iommu_debug_unmap_end(struct iommu_domain *domain,
+					 unsigned long iova, size_t size,
+					 size_t unmapped)
+{
+}
+
+static inline void iommu_debug_init(void)
+{
+}
+
+#endif /* CONFIG_IOMMU_DEBUG_PAGEALLOC */
+
 #endif /* __LINUX_IOMMU_PRIV_H */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2ca990dfbb88..01b062575519 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -232,6 +232,8 @@ static int __init iommu_subsys_init(void)
 	if (!nb)
 		return -ENOMEM;
 
+	iommu_debug_init();
+
 	for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++) {
 		nb[i].notifier_call = iommu_bus_notifier;
 		bus_register_notifier(iommu_buses[i], &nb[i]);
@@ -2562,10 +2564,12 @@ int iommu_map_nosync(struct iommu_domain *domain, unsigned long iova,
 	}
 
 	/* unroll mapping in case something went wrong */
-	if (ret)
+	if (ret) {
 		iommu_unmap(domain, orig_iova, orig_size - size);
-	else
+	} else {
 		trace_map(orig_iova, orig_paddr, orig_size);
+		iommu_debug_map(domain, orig_paddr, orig_size);
+	}
 
 	return ret;
 }
@@ -2627,6 +2631,8 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 
 	pr_debug("unmap this: iova 0x%lx size 0x%zx\n", iova, size);
 
+	iommu_debug_unmap_begin(domain, iova, size);
+
 	/*
 	 * Keep iterating until we either unmap 'size' bytes (or more)
 	 * or we hit an area that isn't mapped.
@@ -2647,6 +2653,7 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 	}
 
 	trace_unmap(orig_iova, size, unmapped);
+	iommu_debug_unmap_end(domain, orig_iova, size, unmapped);
 	return unmapped;
 }
 
diff --git a/include/linux/iommu-debug-pagealloc.h b/include/linux/iommu-debug-pagealloc.h
index 83e64d70bf6c..a439d6815ca1 100644
--- a/include/linux/iommu-debug-pagealloc.h
+++ b/include/linux/iommu-debug-pagealloc.h
@@ -9,6 +9,7 @@
 #define __LINUX_IOMMU_DEBUG_PAGEALLOC_H
 
 #ifdef CONFIG_IOMMU_DEBUG_PAGEALLOC
+DECLARE_STATIC_KEY_FALSE(iommu_debug_initialized);
 
 extern struct page_ext_operations page_iommu_debug_ops;
 
-- 
2.52.0.351.gbe84eed79e-goog




* [PATCH v5 3/4] iommu: debug-pagealloc: Track IOMMU pages
  2026-01-06 16:21 [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Mostafa Saleh
  2026-01-06 16:21 ` [PATCH v5 1/4] iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC Mostafa Saleh
  2026-01-06 16:21 ` [PATCH v5 2/4] iommu: Add calls " Mostafa Saleh
@ 2026-01-06 16:21 ` Mostafa Saleh
  2026-01-06 21:18   ` Samiullah Khawaja
  2026-01-07 15:21   ` Pranjal Shrivastava
  2026-01-06 16:22 ` [PATCH v5 4/4] iommu: debug-pagealloc: Check mapped/unmapped kernel memory Mostafa Saleh
  2026-01-07 15:24 ` [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Pranjal Shrivastava
  4 siblings, 2 replies; 20+ messages in thread
From: Mostafa Saleh @ 2026-01-06 16:21 UTC (permalink / raw)
  To: linux-mm, iommu, linux-kernel, linux-doc
  Cc: corbet, joro, will, robin.murphy, akpm, vbabka, surenb, mhocko,
	jackmanb, hannes, ziy, david, lorenzo.stoakes, Liam.Howlett,
	rppt, xiaqinxin, baolu.lu, rdunlap, Mostafa Saleh

Using the new calls, use an atomic refcount to track how many times
a page is mapped in any of the IOMMUs.

For unmap we need to use iova_to_phys() to get the physical address
of the pages.

We use the smallest supported page size as the granularity of tracking
per domain.
This is important as it is possible to map pages and then unmap them
with larger sizes (as in the map_sg() case).

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 drivers/iommu/iommu-debug-pagealloc.c | 91 +++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
index 1d343421da98..86ccb310a4a8 100644
--- a/drivers/iommu/iommu-debug-pagealloc.c
+++ b/drivers/iommu/iommu-debug-pagealloc.c
@@ -29,19 +29,110 @@ struct page_ext_operations page_iommu_debug_ops = {
 	.need = need_iommu_debug,
 };
 
+static struct page_ext *get_iommu_page_ext(phys_addr_t phys)
+{
+	struct page *page = phys_to_page(phys);
+	struct page_ext *page_ext = page_ext_get(page);
+
+	return page_ext;
+}
+
+static struct iommu_debug_metadata *get_iommu_data(struct page_ext *page_ext)
+{
+	return page_ext_data(page_ext, &page_iommu_debug_ops);
+}
+
+static void iommu_debug_inc_page(phys_addr_t phys)
+{
+	struct page_ext *page_ext = get_iommu_page_ext(phys);
+	struct iommu_debug_metadata *d = get_iommu_data(page_ext);
+
+	WARN_ON(atomic_inc_return_relaxed(&d->ref) <= 0);
+	page_ext_put(page_ext);
+}
+
+static void iommu_debug_dec_page(phys_addr_t phys)
+{
+	struct page_ext *page_ext = get_iommu_page_ext(phys);
+	struct iommu_debug_metadata *d = get_iommu_data(page_ext);
+
+	WARN_ON(atomic_dec_return_relaxed(&d->ref) < 0);
+	page_ext_put(page_ext);
+}
+
+/*
+ * IOMMU page size doesn't have to match the CPU page size. So, we use
+ * the smallest IOMMU page size to refcount the pages in the vmemmap.
+ * That is important as both map and unmap has to use the same page size
+ * to update the refcount to avoid double counting the same page.
+ * And as we can't know from iommu_unmap() what was the original page size
+ * used for map, we just use the minimum supported one for both.
+ */
+static size_t iommu_debug_page_size(struct iommu_domain *domain)
+{
+	return 1UL << __ffs(domain->pgsize_bitmap);
+}
+
 void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
 {
+	size_t off, end;
+	size_t page_size = iommu_debug_page_size(domain);
+
+	if (WARN_ON(!phys || check_add_overflow(phys, size, &end)))
+		return;
+
+	for (off = 0 ; off < size ; off += page_size) {
+		if (!pfn_valid(__phys_to_pfn(phys + off)))
+			continue;
+		iommu_debug_inc_page(phys + off);
+	}
+}
+
+static void __iommu_debug_update_iova(struct iommu_domain *domain,
+				      unsigned long iova, size_t size, bool inc)
+{
+	size_t off, end;
+	size_t page_size = iommu_debug_page_size(domain);
+
+	if (WARN_ON(check_add_overflow(iova, size, &end)))
+		return;
+
+	for (off = 0 ; off < size ; off += page_size) {
+		phys_addr_t phys = iommu_iova_to_phys(domain, iova + off);
+
+		if (!phys || !pfn_valid(__phys_to_pfn(phys)))
+			continue;
+
+		if (inc)
+			iommu_debug_inc_page(phys);
+		else
+			iommu_debug_dec_page(phys);
+	}
 }
 
 void __iommu_debug_unmap_begin(struct iommu_domain *domain,
 			       unsigned long iova, size_t size)
 {
+	__iommu_debug_update_iova(domain, iova, size, false);
 }
 
 void __iommu_debug_unmap_end(struct iommu_domain *domain,
 			     unsigned long iova, size_t size,
 			     size_t unmapped)
 {
+	if (unmapped == size)
+		return;
+
+	/*
+	 * If unmap failed, re-increment the refcount, but if it unmapped
+	 * larger size, decrement the extra part.
+	 */
+	if (unmapped < size)
+		__iommu_debug_update_iova(domain, iova + unmapped,
+					  size - unmapped, true);
+	else
+		__iommu_debug_update_iova(domain, iova + size,
+					  unmapped - size, false);
 }
 
 void iommu_debug_init(void)
-- 
2.52.0.351.gbe84eed79e-goog




* [PATCH v5 4/4] iommu: debug-pagealloc: Check mapped/unmapped kernel memory
  2026-01-06 16:21 [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Mostafa Saleh
                   ` (2 preceding siblings ...)
  2026-01-06 16:21 ` [PATCH v5 3/4] iommu: debug-pagealloc: Track IOMMU pages Mostafa Saleh
@ 2026-01-06 16:22 ` Mostafa Saleh
  2026-01-06 21:19   ` Samiullah Khawaja
  2026-01-07 15:24 ` [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Pranjal Shrivastava
  4 siblings, 1 reply; 20+ messages in thread
From: Mostafa Saleh @ 2026-01-06 16:22 UTC (permalink / raw)
  To: linux-mm, iommu, linux-kernel, linux-doc
  Cc: corbet, joro, will, robin.murphy, akpm, vbabka, surenb, mhocko,
	jackmanb, hannes, ziy, david, lorenzo.stoakes, Liam.Howlett,
	rppt, xiaqinxin, baolu.lu, rdunlap, Mostafa Saleh

Now that page_ext holds a count of IOMMU mappings, we can use it to
assert that any page being allocated or freed is indeed not mapped in
the IOMMU.

The sanitizer doesn’t protect against mapping/unmapping during this
period. However, that’s less harmful as the page is not used by the
kernel.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 drivers/iommu/iommu-debug-pagealloc.c | 23 +++++++++++++++++++++++
 include/linux/iommu-debug-pagealloc.h | 14 ++++++++++++++
 include/linux/mm.h                    |  5 +++++
 3 files changed, 42 insertions(+)

diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
index 86ccb310a4a8..5353417e64f9 100644
--- a/drivers/iommu/iommu-debug-pagealloc.c
+++ b/drivers/iommu/iommu-debug-pagealloc.c
@@ -9,6 +9,7 @@
 #include <linux/iommu-debug-pagealloc.h>
 #include <linux/kernel.h>
 #include <linux/page_ext.h>
+#include <linux/page_owner.h>
 
 #include "iommu-priv.h"
 
@@ -73,6 +74,28 @@ static size_t iommu_debug_page_size(struct iommu_domain *domain)
 	return 1UL << __ffs(domain->pgsize_bitmap);
 }
 
+static bool iommu_debug_page_count(const struct page *page)
+{
+	unsigned int ref;
+	struct page_ext *page_ext = page_ext_get(page);
+	struct iommu_debug_metadata *d = get_iommu_data(page_ext);
+
+	ref = atomic_read(&d->ref);
+	page_ext_put(page_ext);
+	return ref != 0;
+}
+
+void __iommu_debug_check_unmapped(const struct page *page, int numpages)
+{
+	while (numpages--) {
+		if (WARN_ON(iommu_debug_page_count(page))) {
+			pr_warn("iommu: Detected page leak!\n");
+			dump_page_owner(page);
+		}
+		page++;
+	}
+}
+
 void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
 {
 	size_t off, end;
diff --git a/include/linux/iommu-debug-pagealloc.h b/include/linux/iommu-debug-pagealloc.h
index a439d6815ca1..46c3c1f70150 100644
--- a/include/linux/iommu-debug-pagealloc.h
+++ b/include/linux/iommu-debug-pagealloc.h
@@ -13,6 +13,20 @@ DECLARE_STATIC_KEY_FALSE(iommu_debug_initialized);
 
 extern struct page_ext_operations page_iommu_debug_ops;
 
+void __iommu_debug_check_unmapped(const struct page *page, int numpages);
+
+static inline void iommu_debug_check_unmapped(const struct page *page, int numpages)
+{
+	if (static_branch_unlikely(&iommu_debug_initialized))
+		__iommu_debug_check_unmapped(page, numpages);
+}
+
+#else
+static inline void iommu_debug_check_unmapped(const struct page *page,
+					      int numpages)
+{
+}
+
 #endif /* CONFIG_IOMMU_DEBUG_PAGEALLOC */
 
 #endif /* __LINUX_IOMMU_DEBUG_PAGEALLOC_H */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6f959d8ca4b4..32205d2a24b2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -36,6 +36,7 @@
 #include <linux/rcuwait.h>
 #include <linux/bitmap.h>
 #include <linux/bitops.h>
+#include <linux/iommu-debug-pagealloc.h>
 
 struct mempolicy;
 struct anon_vma;
@@ -4133,12 +4134,16 @@ extern void __kernel_map_pages(struct page *page, int numpages, int enable);
 #ifdef CONFIG_DEBUG_PAGEALLOC
 static inline void debug_pagealloc_map_pages(struct page *page, int numpages)
 {
+	iommu_debug_check_unmapped(page, numpages);
+
 	if (debug_pagealloc_enabled_static())
 		__kernel_map_pages(page, numpages, 1);
 }
 
 static inline void debug_pagealloc_unmap_pages(struct page *page, int numpages)
 {
+	iommu_debug_check_unmapped(page, numpages);
+
 	if (debug_pagealloc_enabled_static())
 		__kernel_map_pages(page, numpages, 0);
 }
-- 
2.52.0.351.gbe84eed79e-goog




* Re: [PATCH v5 1/4] iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
  2026-01-06 16:21 ` [PATCH v5 1/4] iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC Mostafa Saleh
@ 2026-01-06 18:50   ` David Hildenbrand (Red Hat)
  2026-01-07 15:26   ` Pranjal Shrivastava
  2026-01-07 16:53   ` David Hildenbrand (Red Hat)
  2 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-06 18:50 UTC (permalink / raw)
  To: Mostafa Saleh, linux-mm, iommu, linux-kernel, linux-doc
  Cc: corbet, joro, will, robin.murphy, akpm, vbabka, surenb, mhocko,
	jackmanb, hannes, ziy, lorenzo.stoakes, Liam.Howlett, rppt,
	xiaqinxin, baolu.lu, rdunlap

On 1/6/26 17:21, Mostafa Saleh wrote:
> Add a new config IOMMU_DEBUG_PAGEALLOC, which registers new data to
> page_ext.
> 
> This config will be used by the IOMMU API to track pages mapped in
> the IOMMU to catch drivers trying to free kernel memory that they
> still map in their domains, causing all types of memory corruption.
> 
> This behaviour is disabled by default and can be enabled using
> kernel cmdline iommu.debug_pagealloc.
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---

[...]

> +#endif /* __LINUX_IOMMU_DEBUG_PAGEALLOC_H */
> diff --git a/mm/page_ext.c b/mm/page_ext.c
> index d7396a8970e5..297e4cd8ce90 100644
> --- a/mm/page_ext.c
> +++ b/mm/page_ext.c
> @@ -11,6 +11,7 @@
>   #include <linux/page_table_check.h>
>   #include <linux/rcupdate.h>
>   #include <linux/pgalloc_tag.h>
> +#include <linux/iommu-debug-pagealloc.h>
>   
>   /*
>    * struct page extension
> @@ -89,6 +90,9 @@ static struct page_ext_operations *page_ext_ops[] __initdata = {
>   #ifdef CONFIG_PAGE_TABLE_CHECK
>   	&page_table_check_ops,
>   #endif
> +#ifdef CONFIG_IOMMU_DEBUG_PAGEALLOC
> +	&page_iommu_debug_ops,
> +#endif
>   };
>   
>   unsigned long page_ext_size;

The MM bits LGTM

Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>

-- 
Cheers

David



* Re: [PATCH v5 2/4] iommu: Add calls for IOMMU_DEBUG_PAGEALLOC
  2026-01-06 16:21 ` [PATCH v5 2/4] iommu: Add calls " Mostafa Saleh
@ 2026-01-06 21:17   ` Samiullah Khawaja
  2026-01-07  5:48   ` Baolu Lu
  2026-01-07 15:28   ` Pranjal Shrivastava
  2 siblings, 0 replies; 20+ messages in thread
From: Samiullah Khawaja @ 2026-01-06 21:17 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro, will,
	robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
	ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, xiaqinxin,
	baolu.lu, rdunlap

On Tue, Jan 6, 2026 at 8:22 AM Mostafa Saleh <smostafa@google.com> wrote:
>
> Add calls for the new iommu debug config IOMMU_DEBUG_PAGEALLOC:
> - iommu_debug_init: Enable the debug mode if configured by the user.
> - iommu_debug_map: Track iommu pages mapped, using physical address.
> - iommu_debug_unmap_begin: Track start of iommu unmap operation, with
>   IOVA and size.
> - iommu_debug_unmap_end: Track the end of unmap operation, passing the
>   actual unmapped size versus the tracked one at unmap_begin.
>
> We have to split unmap into unmap_begin/end because once the pages are
> unmapped we lose the physical address information.
> This is racy, but the API is racy by construction: it uses refcounts
> and doesn't attempt to lock/synchronize with the IOMMU API, as that
> would be costly. This means false negatives are possible.
>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---
>  drivers/iommu/iommu-debug-pagealloc.c | 28 +++++++++++++
>  drivers/iommu/iommu-priv.h            | 58 +++++++++++++++++++++++++++
>  drivers/iommu/iommu.c                 | 11 ++++-
>  include/linux/iommu-debug-pagealloc.h |  1 +
>  4 files changed, 96 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> index 4022e9af7f27..1d343421da98 100644
> --- a/drivers/iommu/iommu-debug-pagealloc.c
> +++ b/drivers/iommu/iommu-debug-pagealloc.c
> @@ -5,11 +5,15 @@
>   * IOMMU API debug page alloc sanitizer
>   */
>  #include <linux/atomic.h>
> +#include <linux/iommu.h>
>  #include <linux/iommu-debug-pagealloc.h>
>  #include <linux/kernel.h>
>  #include <linux/page_ext.h>
>
> +#include "iommu-priv.h"
> +
>  static bool needed;
> +DEFINE_STATIC_KEY_FALSE(iommu_debug_initialized);
>
>  struct iommu_debug_metadata {
>         atomic_t ref;
> @@ -25,6 +29,30 @@ struct page_ext_operations page_iommu_debug_ops = {
>         .need = need_iommu_debug,
>  };
>
> +void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
> +{
> +}
> +
> +void __iommu_debug_unmap_begin(struct iommu_domain *domain,
> +                              unsigned long iova, size_t size)
> +{
> +}
> +
> +void __iommu_debug_unmap_end(struct iommu_domain *domain,
> +                            unsigned long iova, size_t size,
> +                            size_t unmapped)
> +{
> +}
> +
> +void iommu_debug_init(void)
> +{
> +       if (!needed)
> +               return;
> +
> +       pr_info("iommu: Debugging page allocations, expect overhead or disable iommu.debug_pagealloc");
> +       static_branch_enable(&iommu_debug_initialized);
> +}
> +
>  static int __init iommu_debug_pagealloc(char *str)
>  {
>         return kstrtobool(str, &needed);
> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
> index c95394cd03a7..aaffad5854fc 100644
> --- a/drivers/iommu/iommu-priv.h
> +++ b/drivers/iommu/iommu-priv.h
> @@ -5,6 +5,7 @@
>  #define __LINUX_IOMMU_PRIV_H
>
>  #include <linux/iommu.h>
> +#include <linux/iommu-debug-pagealloc.h>
>  #include <linux/msi.h>
>
>  static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
> @@ -65,4 +66,61 @@ static inline int iommufd_sw_msi(struct iommu_domain *domain,
>  int iommu_replace_device_pasid(struct iommu_domain *domain,
>                                struct device *dev, ioasid_t pasid,
>                                struct iommu_attach_handle *handle);
> +
> +#ifdef CONFIG_IOMMU_DEBUG_PAGEALLOC
> +
> +void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys,
> +                      size_t size);
> +void __iommu_debug_unmap_begin(struct iommu_domain *domain,
> +                              unsigned long iova, size_t size);
> +void __iommu_debug_unmap_end(struct iommu_domain *domain,
> +                            unsigned long iova, size_t size, size_t unmapped);
> +
> +static inline void iommu_debug_map(struct iommu_domain *domain,
> +                                  phys_addr_t phys, size_t size)
> +{
> +       if (static_branch_unlikely(&iommu_debug_initialized))
> +               __iommu_debug_map(domain, phys, size);
> +}
> +
> +static inline void iommu_debug_unmap_begin(struct iommu_domain *domain,
> +                                          unsigned long iova, size_t size)
> +{
> +       if (static_branch_unlikely(&iommu_debug_initialized))
> +               __iommu_debug_unmap_begin(domain, iova, size);
> +}
> +
> +static inline void iommu_debug_unmap_end(struct iommu_domain *domain,
> +                                        unsigned long iova, size_t size,
> +                                        size_t unmapped)
> +{
> +       if (static_branch_unlikely(&iommu_debug_initialized))
> +               __iommu_debug_unmap_end(domain, iova, size, unmapped);
> +}
> +
> +void iommu_debug_init(void);
> +
> +#else
> +static inline void iommu_debug_map(struct iommu_domain *domain,
> +                                  phys_addr_t phys, size_t size)
> +{
> +}
> +
> +static inline void iommu_debug_unmap_begin(struct iommu_domain *domain,
> +                                          unsigned long iova, size_t size)
> +{
> +}
> +
> +static inline void iommu_debug_unmap_end(struct iommu_domain *domain,
> +                                        unsigned long iova, size_t size,
> +                                        size_t unmapped)
> +{
> +}
> +
> +static inline void iommu_debug_init(void)
> +{
> +}
> +
> +#endif /* CONFIG_IOMMU_DEBUG_PAGEALLOC */
> +
>  #endif /* __LINUX_IOMMU_PRIV_H */
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 2ca990dfbb88..01b062575519 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -232,6 +232,8 @@ static int __init iommu_subsys_init(void)
>         if (!nb)
>                 return -ENOMEM;
>
> +       iommu_debug_init();
> +
>         for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++) {
>                 nb[i].notifier_call = iommu_bus_notifier;
>                 bus_register_notifier(iommu_buses[i], &nb[i]);
> @@ -2562,10 +2564,12 @@ int iommu_map_nosync(struct iommu_domain *domain, unsigned long iova,
>         }
>
>         /* unroll mapping in case something went wrong */
> -       if (ret)
> +       if (ret) {
>                 iommu_unmap(domain, orig_iova, orig_size - size);
> -       else
> +       } else {
>                 trace_map(orig_iova, orig_paddr, orig_size);
> +               iommu_debug_map(domain, orig_paddr, orig_size);
> +       }
>
>         return ret;
>  }
> @@ -2627,6 +2631,8 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
>
>         pr_debug("unmap this: iova 0x%lx size 0x%zx\n", iova, size);
>
> +       iommu_debug_unmap_begin(domain, iova, size);
> +
>         /*
>          * Keep iterating until we either unmap 'size' bytes (or more)
>          * or we hit an area that isn't mapped.
> @@ -2647,6 +2653,7 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
>         }
>
>         trace_unmap(orig_iova, size, unmapped);
> +       iommu_debug_unmap_end(domain, orig_iova, size, unmapped);
>         return unmapped;
>  }
>
> diff --git a/include/linux/iommu-debug-pagealloc.h b/include/linux/iommu-debug-pagealloc.h
> index 83e64d70bf6c..a439d6815ca1 100644
> --- a/include/linux/iommu-debug-pagealloc.h
> +++ b/include/linux/iommu-debug-pagealloc.h
> @@ -9,6 +9,7 @@
>  #define __LINUX_IOMMU_DEBUG_PAGEALLOC_H
>
>  #ifdef CONFIG_IOMMU_DEBUG_PAGEALLOC
> +DECLARE_STATIC_KEY_FALSE(iommu_debug_initialized);
>
>  extern struct page_ext_operations page_iommu_debug_ops;
>
> --
> 2.52.0.351.gbe84eed79e-goog
>
>

Reviewed-by: Samiullah Khawaja <skhawaja@google.com>



* Re: [PATCH v5 3/4] iommu: debug-pagealloc: Track IOMMU pages
  2026-01-06 16:21 ` [PATCH v5 3/4] iommu: debug-pagealloc: Track IOMMU pages Mostafa Saleh
@ 2026-01-06 21:18   ` Samiullah Khawaja
  2026-01-07 15:21   ` Pranjal Shrivastava
  1 sibling, 0 replies; 20+ messages in thread
From: Samiullah Khawaja @ 2026-01-06 21:18 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro, will,
	robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
	ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, xiaqinxin,
	baolu.lu, rdunlap

On Tue, Jan 6, 2026 at 8:22 AM Mostafa Saleh <smostafa@google.com> wrote:
>
> Using the new calls, use an atomic refcount to track how many times
> a page is mapped in any of the IOMMUs.
>
> For unmap we need to use iova_to_phys() to get the physical address
> of the pages.
>
> We use the smallest supported page size as the granularity of tracking
> per domain.
> This is important as it is possible to map pages and then unmap them
> with larger sizes (as in the map_sg() case).
>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---
>  drivers/iommu/iommu-debug-pagealloc.c | 91 +++++++++++++++++++++++++++
>  1 file changed, 91 insertions(+)
>
> diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> index 1d343421da98..86ccb310a4a8 100644
> --- a/drivers/iommu/iommu-debug-pagealloc.c
> +++ b/drivers/iommu/iommu-debug-pagealloc.c
> @@ -29,19 +29,110 @@ struct page_ext_operations page_iommu_debug_ops = {
>         .need = need_iommu_debug,
>  };
>
> +static struct page_ext *get_iommu_page_ext(phys_addr_t phys)
> +{
> +       struct page *page = phys_to_page(phys);
> +       struct page_ext *page_ext = page_ext_get(page);
> +
> +       return page_ext;
> +}
> +
> +static struct iommu_debug_metadata *get_iommu_data(struct page_ext *page_ext)
> +{
> +       return page_ext_data(page_ext, &page_iommu_debug_ops);
> +}
> +
> +static void iommu_debug_inc_page(phys_addr_t phys)
> +{
> +       struct page_ext *page_ext = get_iommu_page_ext(phys);
> +       struct iommu_debug_metadata *d = get_iommu_data(page_ext);
> +
> +       WARN_ON(atomic_inc_return_relaxed(&d->ref) <= 0);
> +       page_ext_put(page_ext);
> +}
> +
> +static void iommu_debug_dec_page(phys_addr_t phys)
> +{
> +       struct page_ext *page_ext = get_iommu_page_ext(phys);
> +       struct iommu_debug_metadata *d = get_iommu_data(page_ext);
> +
> +       WARN_ON(atomic_dec_return_relaxed(&d->ref) < 0);
> +       page_ext_put(page_ext);
> +}
> +
> +/*
> + * IOMMU page size doesn't have to match the CPU page size. So, we use
> + * the smallest IOMMU page size to refcount the pages in the vmemmap.
> + * That is important as both map and unmap has to use the same page size
> + * to update the refcount to avoid double counting the same page.
> + * And as we can't know from iommu_unmap() what was the original page size
> + * used for map, we just use the minimum supported one for both.
> + */
> +static size_t iommu_debug_page_size(struct iommu_domain *domain)
> +{
> +       return 1UL << __ffs(domain->pgsize_bitmap);
> +}
> +
>  void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
>  {
> +       size_t off, end;
> +       size_t page_size = iommu_debug_page_size(domain);
> +
> +       if (WARN_ON(!phys || check_add_overflow(phys, size, &end)))
> +               return;
> +
> +       for (off = 0 ; off < size ; off += page_size) {
> +               if (!pfn_valid(__phys_to_pfn(phys + off)))
> +                       continue;
> +               iommu_debug_inc_page(phys + off);
> +       }
> +}
> +
> +static void __iommu_debug_update_iova(struct iommu_domain *domain,
> +                                     unsigned long iova, size_t size, bool inc)
> +{
> +       size_t off, end;
> +       size_t page_size = iommu_debug_page_size(domain);
> +
> +       if (WARN_ON(check_add_overflow(iova, size, &end)))
> +               return;
> +
> +       for (off = 0 ; off < size ; off += page_size) {
> +               phys_addr_t phys = iommu_iova_to_phys(domain, iova + off);
> +
> +               if (!phys || !pfn_valid(__phys_to_pfn(phys)))
> +                       continue;
> +
> +               if (inc)
> +                       iommu_debug_inc_page(phys);
> +               else
> +                       iommu_debug_dec_page(phys);
> +       }
>  }
>
>  void __iommu_debug_unmap_begin(struct iommu_domain *domain,
>                                unsigned long iova, size_t size)
>  {
> +       __iommu_debug_update_iova(domain, iova, size, false);
>  }
>
>  void __iommu_debug_unmap_end(struct iommu_domain *domain,
>                              unsigned long iova, size_t size,
>                              size_t unmapped)
>  {
> +       if (unmapped == size)
> +               return;
> +
> +       /*
> +        * If unmap failed, re-increment the refcount, but if it unmapped
> +        * larger size, decrement the extra part.
> +        */
> +       if (unmapped < size)
> +               __iommu_debug_update_iova(domain, iova + unmapped,
> +                                         size - unmapped, true);
> +       else
> +               __iommu_debug_update_iova(domain, iova + size,
> +                                         unmapped - size, false);
>  }
>
>  void iommu_debug_init(void)
> --
> 2.52.0.351.gbe84eed79e-goog
>
>

Reviewed-by: Samiullah Khawaja <skhawaja@google.com>



* Re: [PATCH v5 4/4] iommu: debug-pagealloc: Check mapped/unmapped kernel memory
  2026-01-06 16:22 ` [PATCH v5 4/4] iommu: debug-pagealloc: Check mapped/unmapped kernel memory Mostafa Saleh
@ 2026-01-06 21:19   ` Samiullah Khawaja
  0 siblings, 0 replies; 20+ messages in thread
From: Samiullah Khawaja @ 2026-01-06 21:19 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro, will,
	robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
	ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, xiaqinxin,
	baolu.lu, rdunlap

On Tue, Jan 6, 2026 at 8:22 AM Mostafa Saleh <smostafa@google.com> wrote:
>
> Now that page_ext holds a count of IOMMU mappings, we can use it to
> assert that any page being allocated or freed is indeed not mapped in
> the IOMMU.
>
> The sanitizer doesn’t protect against mapping/unmapping during this
> period. However, that’s less harmful as the page is not used by the
> kernel.
>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---
>  drivers/iommu/iommu-debug-pagealloc.c | 23 +++++++++++++++++++++++
>  include/linux/iommu-debug-pagealloc.h | 14 ++++++++++++++
>  include/linux/mm.h                    |  5 +++++
>  3 files changed, 42 insertions(+)
>
> diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> index 86ccb310a4a8..5353417e64f9 100644
> --- a/drivers/iommu/iommu-debug-pagealloc.c
> +++ b/drivers/iommu/iommu-debug-pagealloc.c
> @@ -9,6 +9,7 @@
>  #include <linux/iommu-debug-pagealloc.h>
>  #include <linux/kernel.h>
>  #include <linux/page_ext.h>
> +#include <linux/page_owner.h>
>
>  #include "iommu-priv.h"
>
> @@ -73,6 +74,28 @@ static size_t iommu_debug_page_size(struct iommu_domain *domain)
>         return 1UL << __ffs(domain->pgsize_bitmap);
>  }
>
> +static bool iommu_debug_page_count(const struct page *page)
> +{
> +       unsigned int ref;
> +       struct page_ext *page_ext = page_ext_get(page);
> +       struct iommu_debug_metadata *d = get_iommu_data(page_ext);
> +
> +       ref = atomic_read(&d->ref);
> +       page_ext_put(page_ext);
> +       return ref != 0;
> +}
> +
> +void __iommu_debug_check_unmapped(const struct page *page, int numpages)
> +{
> +       while (numpages--) {
> +               if (WARN_ON(iommu_debug_page_count(page))) {
> +                       pr_warn("iommu: Detected page leak!\n");
> +                       dump_page_owner(page);
> +               }
> +               page++;
> +       }
> +}
> +
>  void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
>  {
>         size_t off, end;
> diff --git a/include/linux/iommu-debug-pagealloc.h b/include/linux/iommu-debug-pagealloc.h
> index a439d6815ca1..46c3c1f70150 100644
> --- a/include/linux/iommu-debug-pagealloc.h
> +++ b/include/linux/iommu-debug-pagealloc.h
> @@ -13,6 +13,20 @@ DECLARE_STATIC_KEY_FALSE(iommu_debug_initialized);
>
>  extern struct page_ext_operations page_iommu_debug_ops;
>
> +void __iommu_debug_check_unmapped(const struct page *page, int numpages);
> +
> +static inline void iommu_debug_check_unmapped(const struct page *page, int numpages)
> +{
> +       if (static_branch_unlikely(&iommu_debug_initialized))
> +               __iommu_debug_check_unmapped(page, numpages);
> +}
> +
> +#else
> +static inline void iommu_debug_check_unmapped(const struct page *page,
> +                                             int numpages)
> +{
> +}
> +
>  #endif /* CONFIG_IOMMU_DEBUG_PAGEALLOC */
>
>  #endif /* __LINUX_IOMMU_DEBUG_PAGEALLOC_H */
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6f959d8ca4b4..32205d2a24b2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -36,6 +36,7 @@
>  #include <linux/rcuwait.h>
>  #include <linux/bitmap.h>
>  #include <linux/bitops.h>
> +#include <linux/iommu-debug-pagealloc.h>
>
>  struct mempolicy;
>  struct anon_vma;
> @@ -4133,12 +4134,16 @@ extern void __kernel_map_pages(struct page *page, int numpages, int enable);
>  #ifdef CONFIG_DEBUG_PAGEALLOC
>  static inline void debug_pagealloc_map_pages(struct page *page, int numpages)
>  {
> +       iommu_debug_check_unmapped(page, numpages);
> +
>         if (debug_pagealloc_enabled_static())
>                 __kernel_map_pages(page, numpages, 1);
>  }
>
>  static inline void debug_pagealloc_unmap_pages(struct page *page, int numpages)
>  {
> +       iommu_debug_check_unmapped(page, numpages);
> +
>         if (debug_pagealloc_enabled_static())
>                 __kernel_map_pages(page, numpages, 0);
>  }
> --
> 2.52.0.351.gbe84eed79e-goog
>
>

Reviewed-by: Samiullah Khawaja <skhawaja@google.com>



* Re: [PATCH v5 2/4] iommu: Add calls for IOMMU_DEBUG_PAGEALLOC
  2026-01-06 16:21 ` [PATCH v5 2/4] iommu: Add calls " Mostafa Saleh
  2026-01-06 21:17   ` Samiullah Khawaja
@ 2026-01-07  5:48   ` Baolu Lu
  2026-01-07 15:28   ` Pranjal Shrivastava
  2 siblings, 0 replies; 20+ messages in thread
From: Baolu Lu @ 2026-01-07  5:48 UTC (permalink / raw)
  To: Mostafa Saleh, linux-mm, iommu, linux-kernel, linux-doc
  Cc: corbet, joro, will, robin.murphy, akpm, vbabka, surenb, mhocko,
	jackmanb, hannes, ziy, david, lorenzo.stoakes, Liam.Howlett,
	rppt, xiaqinxin, rdunlap

On 1/7/26 00:21, Mostafa Saleh wrote:
> Add calls for the new iommu debug config IOMMU_DEBUG_PAGEALLOC:
> - iommu_debug_init: Enable the debug mode if configured by the user.
> - iommu_debug_map: Track iommu pages mapped, using physical address.
> - iommu_debug_unmap_begin: Track start of iommu unmap operation, with
>    IOVA and size.
> - iommu_debug_unmap_end: Track the end of unmap operation, passing the
>    actual unmapped size versus the tracked one at unmap_begin.
> 
> We have to split unmap into unmap_begin/end because once the pages are
> unmapped we lose the physical address information.
> This is racy, but the API is racy by construction: it uses refcounts
> and doesn't attempt to lock/synchronize with the IOMMU API, as that
> would be costly. This means false negatives are possible.
> 
> Signed-off-by: Mostafa Saleh<smostafa@google.com>
> ---
>   drivers/iommu/iommu-debug-pagealloc.c | 28 +++++++++++++
>   drivers/iommu/iommu-priv.h            | 58 +++++++++++++++++++++++++++
>   drivers/iommu/iommu.c                 | 11 ++++-
>   include/linux/iommu-debug-pagealloc.h |  1 +
>   4 files changed, 96 insertions(+), 2 deletions(-)

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>



* Re: [PATCH v5 3/4] iommu: debug-pagealloc: Track IOMMU pages
  2026-01-06 16:21 ` [PATCH v5 3/4] iommu: debug-pagealloc: Track IOMMU pages Mostafa Saleh
  2026-01-06 21:18   ` Samiullah Khawaja
@ 2026-01-07 15:21   ` Pranjal Shrivastava
  2026-01-08 11:06     ` Mostafa Saleh
  1 sibling, 1 reply; 20+ messages in thread
From: Pranjal Shrivastava @ 2026-01-07 15:21 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro, will,
	robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
	ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, xiaqinxin,
	baolu.lu, rdunlap

On Tue, Jan 06, 2026 at 04:21:59PM +0000, Mostafa Saleh wrote:
> Using the new calls, use an atomic refcount to track how many times
> a page is mapped in any of the IOMMUs.
> 
> For unmap we need to use iova_to_phys() to get the physical address
> of the pages.
> 
> We use the smallest supported page size as the granularity of tracking
> per domain.
> This is important as it is possible to map pages and then unmap them
> with larger sizes (as in the map_sg() case).
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---
>  drivers/iommu/iommu-debug-pagealloc.c | 91 +++++++++++++++++++++++++++
>  1 file changed, 91 insertions(+)
> 
> diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> index 1d343421da98..86ccb310a4a8 100644
> --- a/drivers/iommu/iommu-debug-pagealloc.c
> +++ b/drivers/iommu/iommu-debug-pagealloc.c
> @@ -29,19 +29,110 @@ struct page_ext_operations page_iommu_debug_ops = {
>  	.need = need_iommu_debug,
>  };
>  
> +static struct page_ext *get_iommu_page_ext(phys_addr_t phys)
> +{
> +	struct page *page = phys_to_page(phys);
> +	struct page_ext *page_ext = page_ext_get(page);
> +
> +	return page_ext;
> +}
> +
> +static struct iommu_debug_metadata *get_iommu_data(struct page_ext *page_ext)
> +{
> +	return page_ext_data(page_ext, &page_iommu_debug_ops);
> +}
> +
> +static void iommu_debug_inc_page(phys_addr_t phys)
> +{
> +	struct page_ext *page_ext = get_iommu_page_ext(phys);
> +	struct iommu_debug_metadata *d = get_iommu_data(page_ext);
> +
> +	WARN_ON(atomic_inc_return_relaxed(&d->ref) <= 0);
> +	page_ext_put(page_ext);
> +}
> +
> +static void iommu_debug_dec_page(phys_addr_t phys)
> +{
> +	struct page_ext *page_ext = get_iommu_page_ext(phys);
> +	struct iommu_debug_metadata *d = get_iommu_data(page_ext);
> +
> +	WARN_ON(atomic_dec_return_relaxed(&d->ref) < 0);
> +	page_ext_put(page_ext);
> +}
> +
> +/*
> + * IOMMU page size doesn't have to match the CPU page size. So, we use
> + * the smallest IOMMU page size to refcount the pages in the vmemmap.
> + * That is important as both map and unmap has to use the same page size
> + * to update the refcount to avoid double counting the same page.
> + * And as we can't know from iommu_unmap() what was the original page size
> + * used for map, we just use the minimum supported one for both.
> + */
> +static size_t iommu_debug_page_size(struct iommu_domain *domain)
> +{
> +	return 1UL << __ffs(domain->pgsize_bitmap);
> +}
> +
>  void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
>  {
> +	size_t off, end;
> +	size_t page_size = iommu_debug_page_size(domain);
> +
> +	if (WARN_ON(!phys || check_add_overflow(phys, size, &end)))
> +		return;
> +
> +	for (off = 0 ; off < size ; off += page_size) {
> +		if (!pfn_valid(__phys_to_pfn(phys + off)))
> +			continue;
> +		iommu_debug_inc_page(phys + off);
> +	}
> +}
> +
> +static void __iommu_debug_update_iova(struct iommu_domain *domain,
> +				      unsigned long iova, size_t size, bool inc)
> +{
> +	size_t off, end;
> +	size_t page_size = iommu_debug_page_size(domain);
> +
> +	if (WARN_ON(check_add_overflow(iova, size, &end)))
> +		return;
> +
> +	for (off = 0 ; off < size ; off += page_size) {
> +		phys_addr_t phys = iommu_iova_to_phys(domain, iova + off);
> +
> +		if (!phys || !pfn_valid(__phys_to_pfn(phys)))
> +			continue;
> +
> +		if (inc)
> +			iommu_debug_inc_page(phys);
> +		else
> +			iommu_debug_dec_page(phys);
> +	}

This might loop for too long when we're unmapping a big buffer (say 1GB)
which is backed by multiple 4K mappings (i.e. not mapped using large
mappings); per the above example:

1,073,741,824 / 4096 = 262,144 iterations, each with an iova_to_phys
walk in a tight loop, which could hold the CPU for a little too long and
potentially result in soft lockups (painful to see in a debug kernel).
Since iommu_unmap() can be called in atomic contexts (i.e. interrupts,
or under spinlocks with preemption disabled), we cannot simply add
cond_resched() here either.

Maybe we can cross that bridge once we get there, but if we can't solve
the latency now, it'd be nice to explicitly document this risk
(potential soft lockups on large unmaps) in the Kconfig or cmdline help text?

>  }
>  
>  void __iommu_debug_unmap_begin(struct iommu_domain *domain,
>  			       unsigned long iova, size_t size)
>  {
> +	__iommu_debug_update_iova(domain, iova, size, false);
>  }
>  
>  void __iommu_debug_unmap_end(struct iommu_domain *domain,
>  			     unsigned long iova, size_t size,
>  			     size_t unmapped)
>  {
> +	if (unmapped == size)
> +		return;
> +
> +	/*
> +	 * If unmap failed, re-increment the refcount, but if it unmapped
> +	 * larger size, decrement the extra part.
> +	 */
> +	if (unmapped < size)
> +		__iommu_debug_update_iova(domain, iova + unmapped,
> +					  size - unmapped, true);
> +	else
> +		__iommu_debug_update_iova(domain, iova + size,
> +					  unmapped - size, false);
>  }

I'm a little concerned about this part: when we unmap more than
requested, __iommu_debug_update_iova relies on
iommu_iova_to_phys(domain, iova + off) to find the physical page to
decrement. However, __iommu_debug_unmap_end is called *after* the
IOMMU driver has removed the mapping (in __iommu_unmap), so
iommu_iova_to_phys returns 0 (failure), causing the loop in
update_iova to hit `if (!phys ...)` and silently continue.

As a result, the refcounts for the physical pages in the range
[iova + size, iova + unmapped) are never decremented. Won't this
result in false positives (warnings about page leaks) when those
pages are eventually freed?

For example:

- A driver maps a 2MB region (512 x 4KB). All 512 pgs have refcount = 1.

- A driver / IOMMU-client calls iommu_unmap(iova, 4KB)

- unmap_begin(4KB) calls iova_to_phys, succeeds, and decrements the
  refcount for the 1st page to 0.

- __iommu_unmap calls the IOMMU driver. The driver (unable to split the
  block) zaps the entire 2MB range and returns unmapped = 2MB.

- unmap_end(size=4KB, unmapped=2MB) sees that more was unmapped than
  requested & attempts to decrement refcounts for the remaining 511 pgs

- __iommu_debug_update_iova is called for the remaining range, which
  ends up calling iommu_iova_to_phys. Since the mapping was destroyed,
  iova_to_phys returns 0.

- The loop skips the decrement causing the remaining 511 pages to leak
  with refcount = 1.
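
To make the failure mode concrete, here is the walkthrough above as
caller code (a hypothetical sketch: the domain/iova/phys setup is
elided and it assumes a driver that cannot split a 2MB block mapping):

#include <linux/iommu.h>
#include <linux/sizes.h>

static void demo_unmap_leak(struct iommu_domain *domain,
			    unsigned long iova, phys_addr_t phys)
{
	size_t unmapped;

	/* one 2MB block mapping: 512 tracked 4K pages, refcount 1 each */
	iommu_map(domain, iova, phys, SZ_2M, IOMMU_READ | IOMMU_WRITE,
		  GFP_KERNEL);

	/* unmap_begin() drops page 0 to refcount 0; the driver then
	 * zaps the whole block and reports unmapped == SZ_2M */
	unmapped = iommu_unmap(domain, iova, SZ_4K);

	/* unmap_end() can no longer resolve [iova + 4K, iova + 2M)
	 * via iova_to_phys(), so 511 pages stay at refcount 1 and
	 * will WARN as still-mapped when they are eventually freed */
	WARN_ON(unmapped != SZ_2M);
}
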

Thanks,
Praan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer
  2026-01-06 16:21 [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Mostafa Saleh
                   ` (3 preceding siblings ...)
  2026-01-06 16:22 ` [PATCH v5 4/4] iommu: debug-pagealloc: Check mapped/unmapped kernel memory Mostafa Saleh
@ 2026-01-07 15:24 ` Pranjal Shrivastava
  2026-01-08 11:37   ` Mostafa Saleh
  4 siblings, 1 reply; 20+ messages in thread
From: Pranjal Shrivastava @ 2026-01-07 15:24 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro, will,
	robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
	ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, xiaqinxin,
	baolu.lu, rdunlap

On Tue, Jan 06, 2026 at 04:21:56PM +0000, Mostafa Saleh wrote:
> Overview
> --------
> This patch series introduces a new debugging feature,
> IOMMU_DEBUG_PAGEALLOC, designed to catch DMA use-after-free bugs
> and IOMMU mapping leaks from buggy drivers.
> 
> The kernel has powerful sanitizers like KASAN and DEBUG_PAGEALLOC
> for catching CPU-side memory corruption. However, there is limited
> runtime sanitization for DMA mappings managed by the IOMMU. A buggy
> driver can free a page while it is still mapped for DMA, leading to
> memory corruption or use-after-free vulnerabilities when that page is
> reallocated and used for a different purpose.
> 

Thanks for this series! This is really helpful!

> Inspired by DEBUG_PAGEALLOC, this sanitizer tracks IOMMU mappings on a
> per-page basis, as it’s not possible to unmap the pages, because it
> requires to lock and walk all domains on every kernel free, instead we
> rely on page_ext to add an IOMMU-specific mapping reference count for
> each page.
> And on each page allocated/freed from the kernel we simply check the
> count and WARN if it is not zero, and dumping page owner information
> if enabled.
> 
> Concurrency
> -----------
> By design this check is racy where one caller can map pages just after
> the check, which can lead to false negatives.
> In my opinion this is acceptable for sanitizers (for ex KCSAN have
> that property).
> Otherwise we have to implement locks in iommu_map/unmap for all domains
> which is not favourable even for a debug feature.
> The sanitizer only guarantees that the refcount itself doesn’t get
> corrupted using atomics. And there are no false positives.
> 
> CPU vs IOMMU Page Size
> ----------------------
> IOMMUs can use different page sizes and which can be non-homogeneous;
> not even all of them have the same page size.
> 
> To solve this, the refcount is always incremented and decremented in
> units of the smallest page size supported by the IOMMU domain. This
> ensures the accounting remains consistent regardless of the size of
> the map or unmap operation, otherwise double counting can happen.
> 
> Testing & Performance
> ---------------------
> This was tested on Morello with Arm64 + SMMUv3
> Did some testing Lenovo IdeaCentre X Gen 10 Snapdragon
> Did some testing on Qemu including different SMMUv3/CPU page size (arm64).
> 
> I also ran dma_map_benchmark on Morello:
> 
> echo dma_map_benchmark > /sys/bus/pci/devices/0000\:06\:00.0/driver_override
> echo 0000:06:00.0 >  /sys/bus/pci/devices/0000\:06\:00.0/driver/unbind
> echo 0000:06:00.0 > /sys/bus/pci/drivers/dma_map_benchmark/bind
> ./dma_map_benchmark -t $threads -g $nr_pages
> 
> CONFIG refers to "CONFIG_IOMMU_DEBUG_PAGEALLOC"
> cmdline refers to "iommu.debug_pagealloc"
> Numbers are (map latency)/(unmap latency), lower is better.
> 
> 			CONFIG=n    CONFIG=y    CONFIG=y
> 			            cmdline=0   cmdline=1
> 4K - 1 thread		0.1/0.6     0.1/0.6     0.1/0.7
> 4K - 4 threads		0.1/1.1     0.1/1.0     0.2/1.1
> 1M - 1 thread		0.8/21.2    0.7/21.2    5.4/42.3
> 1M - 4 threads		1.1/45.9    1.1/46.0    5.9/45.1
> 

Just curious: have we also measured the latency for larger
mappings? e.g. a 1G mapping backed by `n` 4K mappings?

> Changes in v5:
> v4: https://lore.kernel.org/all/20251211125928.3258905-1-smostafa@google.com/
> - Fix typo in comment
> - Collect Baolu R-bs
> 
> Main changes in v4:
> v3: https://lore.kernel.org/all/20251124200811.2942432-1-smostafa@google.com/
> - Update the kernel parameter format in docs based on Randy feedback
> - Update commit subjects
> - Add IOMMU only functions in iommu-priv.h based on Baolu feedback
> 
> Main changes in v3: (Most of them addressing Will comments)
> v2: https://lore.kernel.org/linux-iommu/20251106163953.1971067-1-smostafa@google.com/
> - Reword the Kconfig help
> - Use unmap_begin/end instead of unmap/remap
> - Use relaxed accessors when refcounting
> - Fix a bug with checking the returned address from iova_to_phys
> - Add more hardening checks (overflow)
> - Add more debug info on assertions (dump_page_owner())
> - Handle cases where unmap returns larger size as the core code seems
>   to tolerate that.
> - Drop Tested-by tags from Qinxin as the code logic changed
> 
> Main changes in v2:
> v1: https://lore.kernel.org/linux-iommu/20251003173229.1533640-1-smostafa@google.com/
> - Address Jörg comments about #ifdefs and static keys
> - Reword the Kconfig help
> - Drop RFC
> - Collect t-b from Qinxin
> - Minor cleanups
> 
> Mostafa Saleh (4):
>   iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
>   iommu: Add calls for IOMMU_DEBUG_PAGEALLOC
>   iommu: debug-pagealloc: Track IOMMU pages
>   iommu: debug-pagealloc: Check mapped/unmapped kernel memory
> 
>  .../admin-guide/kernel-parameters.txt         |   9 +
>  drivers/iommu/Kconfig                         |  19 ++
>  drivers/iommu/Makefile                        |   1 +
>  drivers/iommu/iommu-debug-pagealloc.c         | 174 ++++++++++++++++++
>  drivers/iommu/iommu-priv.h                    |  58 ++++++
>  drivers/iommu/iommu.c                         |  11 +-
>  include/linux/iommu-debug-pagealloc.h         |  32 ++++
>  include/linux/mm.h                            |   5 +
>  mm/page_ext.c                                 |   4 +
>  9 files changed, 311 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/iommu/iommu-debug-pagealloc.c
>  create mode 100644 include/linux/iommu-debug-pagealloc.h
> 
> -- 
> 2.52.0.351.gbe84eed79e-goog
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 1/4] iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
  2026-01-06 16:21 ` [PATCH v5 1/4] iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC Mostafa Saleh
  2026-01-06 18:50   ` David Hildenbrand (Red Hat)
@ 2026-01-07 15:26   ` Pranjal Shrivastava
  2026-01-07 16:53   ` David Hildenbrand (Red Hat)
  2 siblings, 0 replies; 20+ messages in thread
From: Pranjal Shrivastava @ 2026-01-07 15:26 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro, will,
	robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
	ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, xiaqinxin,
	baolu.lu, rdunlap

On Tue, Jan 06, 2026 at 04:21:57PM +0000, Mostafa Saleh wrote:
> Add a new config IOMMU_DEBUG_PAGEALLOC, which registers new data to
> page_ext.
> 
> This config will be used by the IOMMU API to track pages mapped in
> the IOMMU to catch drivers trying to free kernel memory that they
> still map in their domains, causing all types of memory corruption.
> 
> This behaviour is disabled by default and can be enabled using
> kernel cmdline iommu.debug_pagealloc.
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  9 ++++++
>  drivers/iommu/Kconfig                         | 19 +++++++++++
>  drivers/iommu/Makefile                        |  1 +
>  drivers/iommu/iommu-debug-pagealloc.c         | 32 +++++++++++++++++++
>  include/linux/iommu-debug-pagealloc.h         | 17 ++++++++++
>  mm/page_ext.c                                 |  4 +++
>  6 files changed, 82 insertions(+)
>  create mode 100644 drivers/iommu/iommu-debug-pagealloc.c
>  create mode 100644 include/linux/iommu-debug-pagealloc.h
> 

Reviewed-by: Pranjal Shrivastava <praan@google.com>
Thanks!


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 2/4] iommu: Add calls for IOMMU_DEBUG_PAGEALLOC
  2026-01-06 16:21 ` [PATCH v5 2/4] iommu: Add calls " Mostafa Saleh
  2026-01-06 21:17   ` Samiullah Khawaja
  2026-01-07  5:48   ` Baolu Lu
@ 2026-01-07 15:28   ` Pranjal Shrivastava
  2 siblings, 0 replies; 20+ messages in thread
From: Pranjal Shrivastava @ 2026-01-07 15:28 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro, will,
	robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
	ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, xiaqinxin,
	baolu.lu, rdunlap

On Tue, Jan 06, 2026 at 04:21:58PM +0000, Mostafa Saleh wrote:
> Add calls for the new iommu debug config IOMMU_DEBUG_PAGEALLOC:
> - iommu_debug_init: Enable the debug mode if configured by the user.
> - iommu_debug_map: Track iommu pages mapped, using physical address.
> - iommu_debug_unmap_begin: Track start of iommu unmap operation, with
>   IOVA and size.
> - iommu_debug_unmap_end: Track the end of unmap operation, passing the
>   actual unmapped size versus the tracked one at unmap_begin.
> 
> We have to do the unmap_begin/end as once pages are unmapped we lose
> the information of the physical address.
> This is racy, but the API is racy by construction as it uses refcounts
> and doesn't attempt to lock/synchronize with the IOMMU API as that will
> be costly, meaning that possibility of false negative exists.
> 
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---
>  drivers/iommu/iommu-debug-pagealloc.c | 28 +++++++++++++
>  drivers/iommu/iommu-priv.h            | 58 +++++++++++++++++++++++++++
>  drivers/iommu/iommu.c                 | 11 ++++-
>  include/linux/iommu-debug-pagealloc.h |  1 +
>  4 files changed, 96 insertions(+), 2 deletions(-)
> 

Reviewed-by: Pranjal Shrivastava <praan@google.com>
Thanks


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 1/4] iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
  2026-01-06 16:21 ` [PATCH v5 1/4] iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC Mostafa Saleh
  2026-01-06 18:50   ` David Hildenbrand (Red Hat)
  2026-01-07 15:26   ` Pranjal Shrivastava
@ 2026-01-07 16:53   ` David Hildenbrand (Red Hat)
  2026-01-08 10:42     ` Mostafa Saleh
  2 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-07 16:53 UTC (permalink / raw)
  To: Mostafa Saleh, linux-mm, iommu, linux-kernel, linux-doc
  Cc: corbet, joro, will, robin.murphy, akpm, vbabka, surenb, mhocko,
	jackmanb, hannes, ziy, lorenzo.stoakes, Liam.Howlett, rppt,
	xiaqinxin, baolu.lu, rdunlap

On 1/6/26 17:21, Mostafa Saleh wrote:
> Add a new config IOMMU_DEBUG_PAGEALLOC, which registers new data to
> page_ext.
> 
> This config will be used by the IOMMU API to track pages mapped in
> the IOMMU to catch drivers trying to free kernel memory that they
> still map in their domains, causing all types of memory corruption.
> 
> This behaviour is disabled by default and can be enabled using
> kernel cmdline iommu.debug_pagealloc.
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---

I think I acked it, but maybe too late for you to spot it

for the MM bits

Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>

-- 
Cheers

David


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 1/4] iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
  2026-01-07 16:53   ` David Hildenbrand (Red Hat)
@ 2026-01-08 10:42     ` Mostafa Saleh
  2026-01-08 11:53       ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 20+ messages in thread
From: Mostafa Saleh @ 2026-01-08 10:42 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro, will,
	robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
	ziy, lorenzo.stoakes, Liam.Howlett, rppt, xiaqinxin, baolu.lu,
	rdunlap

On Wed, Jan 07, 2026 at 05:53:50PM +0100, David Hildenbrand (Red Hat) wrote:
> On 1/6/26 17:21, Mostafa Saleh wrote:
> > Add a new config IOMMU_DEBUG_PAGEALLOC, which registers new data to
> > page_ext.
> > 
> > This config will be used by the IOMMU API to track pages mapped in
> > the IOMMU to catch drivers trying to free kernel memory that they
> > still map in their domains, causing all types of memory corruption.
> > 
> > This behaviour is disabled by default and can be enabled using
> > kernel cmdline iommu.debug_pagealloc.
> > 
> > Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > ---
> 
> I think I acked it, but maybe too late for you to spot it
> 
> for the MM bits
> 
> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>

Thanks! If my mail client is not acting up, it seems that was on
the same version (v5) as well.

Thanks,
Mostafa

> 
> -- 
> Cheers
> 
> David


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 3/4] iommu: debug-pagealloc: Track IOMMU pages
  2026-01-07 15:21   ` Pranjal Shrivastava
@ 2026-01-08 11:06     ` Mostafa Saleh
  2026-01-08 11:33       ` Mostafa Saleh
  0 siblings, 1 reply; 20+ messages in thread
From: Mostafa Saleh @ 2026-01-08 11:06 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro, will,
	robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
	ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, xiaqinxin,
	baolu.lu, rdunlap

On Wed, Jan 07, 2026 at 03:21:41PM +0000, Pranjal Shrivastava wrote:
> On Tue, Jan 06, 2026 at 04:21:59PM +0000, Mostafa Saleh wrote:
> > Using the new calls, use an atomic refcount to track how many times
> > a page is mapped in any of the IOMMUs.
> > 
> > For unmap we need to use iova_to_phys() to get the physical address
> > of the pages.
> > 
> > We use the smallest supported page size as the granularity of tracking
> > per domain.
> > This is important as it is possible to map pages and unmap them with
> > larger sizes (as in map_sg()) cases.
> > 
> > Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > ---
> >  drivers/iommu/iommu-debug-pagealloc.c | 91 +++++++++++++++++++++++++++
> >  1 file changed, 91 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> > index 1d343421da98..86ccb310a4a8 100644
> > --- a/drivers/iommu/iommu-debug-pagealloc.c
> > +++ b/drivers/iommu/iommu-debug-pagealloc.c
> > @@ -29,19 +29,110 @@ struct page_ext_operations page_iommu_debug_ops = {
> >  	.need = need_iommu_debug,
> >  };
> >  
> > +static struct page_ext *get_iommu_page_ext(phys_addr_t phys)
> > +{
> > +	struct page *page = phys_to_page(phys);
> > +	struct page_ext *page_ext = page_ext_get(page);
> > +
> > +	return page_ext;
> > +}
> > +
> > +static struct iommu_debug_metadata *get_iommu_data(struct page_ext *page_ext)
> > +{
> > +	return page_ext_data(page_ext, &page_iommu_debug_ops);
> > +}
> > +
> > +static void iommu_debug_inc_page(phys_addr_t phys)
> > +{
> > +	struct page_ext *page_ext = get_iommu_page_ext(phys);
> > +	struct iommu_debug_metadata *d = get_iommu_data(page_ext);
> > +
> > +	WARN_ON(atomic_inc_return_relaxed(&d->ref) <= 0);
> > +	page_ext_put(page_ext);
> > +}
> > +
> > +static void iommu_debug_dec_page(phys_addr_t phys)
> > +{
> > +	struct page_ext *page_ext = get_iommu_page_ext(phys);
> > +	struct iommu_debug_metadata *d = get_iommu_data(page_ext);
> > +
> > +	WARN_ON(atomic_dec_return_relaxed(&d->ref) < 0);
> > +	page_ext_put(page_ext);
> > +}
> > +
> > +/*
> > + * IOMMU page size doesn't have to match the CPU page size. So, we use
> > + * the smallest IOMMU page size to refcount the pages in the vmemmap.
> > + * That is important as both map and unmap has to use the same page size
> > + * to update the refcount to avoid double counting the same page.
> > + * And as we can't know from iommu_unmap() what was the original page size
> > + * used for map, we just use the minimum supported one for both.
> > + */
> > +static size_t iommu_debug_page_size(struct iommu_domain *domain)
> > +{
> > +	return 1UL << __ffs(domain->pgsize_bitmap);
> > +}
> > +
> >  void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
> >  {
> > +	size_t off, end;
> > +	size_t page_size = iommu_debug_page_size(domain);
> > +
> > +	if (WARN_ON(!phys || check_add_overflow(phys, size, &end)))
> > +		return;
> > +
> > +	for (off = 0 ; off < size ; off += page_size) {
> > +		if (!pfn_valid(__phys_to_pfn(phys + off)))
> > +			continue;
> > +		iommu_debug_inc_page(phys + off);
> > +	}
> > +}
> > +
> > +static void __iommu_debug_update_iova(struct iommu_domain *domain,
> > +				      unsigned long iova, size_t size, bool inc)
> > +{
> > +	size_t off, end;
> > +	size_t page_size = iommu_debug_page_size(domain);
> > +
> > +	if (WARN_ON(check_add_overflow(iova, size, &end)))
> > +		return;
> > +
> > +	for (off = 0 ; off < size ; off += page_size) {
> > +		phys_addr_t phys = iommu_iova_to_phys(domain, iova + off);
> > +
> > +		if (!phys || !pfn_valid(__phys_to_pfn(phys)))
> > +			continue;
> > +
> > +		if (inc)
> > +			iommu_debug_inc_page(phys);
> > +		else
> > +			iommu_debug_dec_page(phys);
> > +	}
> 
> This might loop for too long when we're unmapping a big buffer (say
> 1GB) that is backed by many 4K mappings (i.e. not mapped using large
> mappings) and hold the CPU, as the following example shows:
> 
> 1,073,741,824 / 4096 = 262,144 iterations in a tight loop, each with
> an iova_to_phys walk, could hold the CPU for a little too long and
> potentially result in soft lockups (painful to see in a debug kernel).
> Since iommu_unmap can be called in atomic contexts (i.e. interrupts,
> or under spinlocks with preemption disabled), we cannot simply add
> cond_resched() here either.
> 
> Maybe we can cross that bridge once we get there, but if we can't solve
> the latency now, it'd be nice to explicitly document this risk
> (potential soft lockups on large unmaps) in the Kconfig or cmdline help text?
> 

Yes, I am not sure how bad that would be. Looking at the code, the
closest pattern I see in that path is SWIOTLB, which, when enabled,
also does a lot of per-page operations on unmap.
There is already a disclaimer in dmesg and the Kconfig about the
performance overhead, and you would need to enable both a config and
a cmdline option to get this, so I'd expect someone enabling it to
know what they are doing. But I can add more info to the Kconfig
help if that makes sense.

> >  }
> >  
> >  void __iommu_debug_unmap_begin(struct iommu_domain *domain,
> >  			       unsigned long iova, size_t size)
> >  {
> > +	__iommu_debug_update_iova(domain, iova, size, false);
> >  }
> >  
> >  void __iommu_debug_unmap_end(struct iommu_domain *domain,
> >  			     unsigned long iova, size_t size,
> >  			     size_t unmapped)
> >  {
> > +	if (unmapped == size)
> > +		return;
> > +
> > +	/*
> > +	 * If unmap failed, re-increment the refcount, but if it unmapped
> > +	 * larger size, decrement the extra part.
> > +	 */
> > +	if (unmapped < size)
> > +		__iommu_debug_update_iova(domain, iova + unmapped,
> > +					  size - unmapped, true);
> > +	else
> > +		__iommu_debug_update_iova(domain, iova + size,
> > +					  unmapped - size, false);
> >  }
> 
> I'm a little concerned about this part: when we unmap more than
> requested, __iommu_debug_update_iova relies on
> iommu_iova_to_phys(domain, iova + off) to find the physical page to
> decrement. However, __iommu_debug_unmap_end is called *after* the
> IOMMU driver has removed the mapping (in __iommu_unmap), so
> iommu_iova_to_phys returns 0 (failure), causing the loop in
> update_iova to hit `if (!phys ...)` and silently continue.
> 
> As a result, the refcounts for the physical pages in the range
> [iova + size, iova + unmapped) are never decremented. Won't this
> result in false positives (warnings about page leaks) when those
> pages are eventually freed?
> 
> For example:
> 
> - A driver maps a 2MB region (512 x 4KB). All 512 pgs have refcount = 1.
> 
> - A driver / IOMMU-client calls iommu_unmap(iova, 4KB)
> 
> - unmap_begin(4KB) calls iova_to_phys, succeeds, and decrements the
>   refcount for the 1st page to 0.
> 
> - __iommu_unmap calls the IOMMU driver. The driver (unable to split the
>   block) zaps the entire 2MB range and returns unmapped = 2MB.
> 
> - unmap_end(size=4KB, unmapped=2MB) sees that more was unmapped than
>   requested & attempts to decrement refcounts for the remaining 511 pgs
> 
> - __iommu_debug_update_iova is called for the remaining range, which
>   ends up calling iommu_iova_to_phys. Since the mapping was destroyed,
>   iova_to_phys returns 0.
> 
> - The loop skips the decrement causing the remaining 511 pages to leak
>   with refcount = 1.
> 

Agh, yes, iova_to_phys will always return zero, so
__iommu_debug_update_iova() will do nothing in that case.

I am not aware of which drivers do this; I added this logic
because I saw that the IOMMU core allows it. I vaguely remember
it had something to do with splitting blocks, which might be
related to VFIO, but I don't think that is needed anymore.

I am happy to just drop it, or even preemptively warn in that
case, as it is impossible to retrieve the old addresses.

And maybe that's a chance to re-evaluate whether we should allow
this behaviour at all.

Thanks,
Mostafa

> Thanks,
> Praan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 3/4] iommu: debug-pagealloc: Track IOMMU pages
  2026-01-08 11:06     ` Mostafa Saleh
@ 2026-01-08 11:33       ` Mostafa Saleh
  0 siblings, 0 replies; 20+ messages in thread
From: Mostafa Saleh @ 2026-01-08 11:33 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro, will,
	robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
	ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, xiaqinxin,
	baolu.lu, rdunlap

On Thu, Jan 8, 2026 at 11:06 AM Mostafa Saleh <smostafa@google.com> wrote:
>
> On Wed, Jan 07, 2026 at 03:21:41PM +0000, Pranjal Shrivastava wrote:
> > On Tue, Jan 06, 2026 at 04:21:59PM +0000, Mostafa Saleh wrote:
> > > Using the new calls, use an atomic refcount to track how many times
> > > a page is mapped in any of the IOMMUs.
> > >
> > > For unmap we need to use iova_to_phys() to get the physical address
> > > of the pages.
> > >
> > > We use the smallest supported page size as the granularity of tracking
> > > per domain.
> > > This is important as it is possible to map pages and unmap them with
> > > larger sizes (as in map_sg()) cases.
> > >
> > > Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> > > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > > ---
> > >  drivers/iommu/iommu-debug-pagealloc.c | 91 +++++++++++++++++++++++++++
> > >  1 file changed, 91 insertions(+)
> > >
> > > diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> > > index 1d343421da98..86ccb310a4a8 100644
> > > --- a/drivers/iommu/iommu-debug-pagealloc.c
> > > +++ b/drivers/iommu/iommu-debug-pagealloc.c
> > > @@ -29,19 +29,110 @@ struct page_ext_operations page_iommu_debug_ops = {
> > >     .need = need_iommu_debug,
> > >  };
> > >
> > > +static struct page_ext *get_iommu_page_ext(phys_addr_t phys)
> > > +{
> > > +   struct page *page = phys_to_page(phys);
> > > +   struct page_ext *page_ext = page_ext_get(page);
> > > +
> > > +   return page_ext;
> > > +}
> > > +
> > > +static struct iommu_debug_metadata *get_iommu_data(struct page_ext *page_ext)
> > > +{
> > > +   return page_ext_data(page_ext, &page_iommu_debug_ops);
> > > +}
> > > +
> > > +static void iommu_debug_inc_page(phys_addr_t phys)
> > > +{
> > > +   struct page_ext *page_ext = get_iommu_page_ext(phys);
> > > +   struct iommu_debug_metadata *d = get_iommu_data(page_ext);
> > > +
> > > +   WARN_ON(atomic_inc_return_relaxed(&d->ref) <= 0);
> > > +   page_ext_put(page_ext);
> > > +}
> > > +
> > > +static void iommu_debug_dec_page(phys_addr_t phys)
> > > +{
> > > +   struct page_ext *page_ext = get_iommu_page_ext(phys);
> > > +   struct iommu_debug_metadata *d = get_iommu_data(page_ext);
> > > +
> > > +   WARN_ON(atomic_dec_return_relaxed(&d->ref) < 0);
> > > +   page_ext_put(page_ext);
> > > +}
> > > +
> > > +/*
> > > + * IOMMU page size doesn't have to match the CPU page size. So, we use
> > > + * the smallest IOMMU page size to refcount the pages in the vmemmap.
> > > + * That is important as both map and unmap has to use the same page size
> > > + * to update the refcount to avoid double counting the same page.
> > > + * And as we can't know from iommu_unmap() what was the original page size
> > > + * used for map, we just use the minimum supported one for both.
> > > + */
> > > +static size_t iommu_debug_page_size(struct iommu_domain *domain)
> > > +{
> > > +   return 1UL << __ffs(domain->pgsize_bitmap);
> > > +}
> > > +
> > >  void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
> > >  {
> > > +   size_t off, end;
> > > +   size_t page_size = iommu_debug_page_size(domain);
> > > +
> > > +   if (WARN_ON(!phys || check_add_overflow(phys, size, &end)))
> > > +           return;
> > > +
> > > +   for (off = 0 ; off < size ; off += page_size) {
> > > +           if (!pfn_valid(__phys_to_pfn(phys + off)))
> > > +                   continue;
> > > +           iommu_debug_inc_page(phys + off);
> > > +   }
> > > +}
> > > +
> > > +static void __iommu_debug_update_iova(struct iommu_domain *domain,
> > > +                                 unsigned long iova, size_t size, bool inc)
> > > +{
> > > +   size_t off, end;
> > > +   size_t page_size = iommu_debug_page_size(domain);
> > > +
> > > +   if (WARN_ON(check_add_overflow(iova, size, &end)))
> > > +           return;
> > > +
> > > +   for (off = 0 ; off < size ; off += page_size) {
> > > +           phys_addr_t phys = iommu_iova_to_phys(domain, iova + off);
> > > +
> > > +           if (!phys || !pfn_valid(__phys_to_pfn(phys)))
> > > +                   continue;
> > > +
> > > +           if (inc)
> > > +                   iommu_debug_inc_page(phys);
> > > +           else
> > > +                   iommu_debug_dec_page(phys);
> > > +   }
> >
> > This might loop for too long when we're unmapping a big buffer (say
> > 1GB) that is backed by many 4K mappings (i.e. not mapped using large
> > mappings) and hold the CPU, as the following example shows:
> >
> > 1,073,741,824 / 4096 = 262,144 iterations in a tight loop, each with
> > an iova_to_phys walk, could hold the CPU for a little too long and
> > potentially result in soft lockups (painful to see in a debug kernel).
> > Since iommu_unmap can be called in atomic contexts (i.e. interrupts,
> > or under spinlocks with preemption disabled), we cannot simply add
> > cond_resched() here either.
> >
> > Maybe we can cross that bridge once we get there, but if we can't solve
> > the latency now, it'd be nice to explicitly document this risk
> > (potential soft lockups on large unmaps) in the Kconfig or cmdline help text?
> >
>
> Yes, I am not sure how bad that would be. Looking at the code, the
> closest pattern I see in that path is SWIOTLB, which, when enabled,
> also does a lot of per-page operations on unmap.
> There is already a disclaimer in dmesg and the Kconfig about the
> performance overhead, and you would need to enable both a config and
> a cmdline option to get this, so I'd expect someone enabling it to
> know what they are doing. But I can add more info to the Kconfig
> help if that makes sense.
>
> > >  }
> > >
> > >  void __iommu_debug_unmap_begin(struct iommu_domain *domain,
> > >                            unsigned long iova, size_t size)
> > >  {
> > > +   __iommu_debug_update_iova(domain, iova, size, false);
> > >  }
> > >
> > >  void __iommu_debug_unmap_end(struct iommu_domain *domain,
> > >                          unsigned long iova, size_t size,
> > >                          size_t unmapped)
> > >  {
> > > +   if (unmapped == size)
> > > +           return;
> > > +
> > > +   /*
> > > +    * If unmap failed, re-increment the refcount, but if it unmapped
> > > +    * larger size, decrement the extra part.
> > > +    */
> > > +   if (unmapped < size)
> > > +           __iommu_debug_update_iova(domain, iova + unmapped,
> > > +                                     size - unmapped, true);
> > > +   else
> > > +           __iommu_debug_update_iova(domain, iova + size,
> > > +                                     unmapped - size, false);
> > >  }
> >
> > I'm a little concerned about this part: when we unmap more than
> > requested, __iommu_debug_update_iova relies on
> > iommu_iova_to_phys(domain, iova + off) to find the physical page to
> > decrement. However, __iommu_debug_unmap_end is called *after* the
> > IOMMU driver has removed the mapping (in __iommu_unmap), so
> > iommu_iova_to_phys returns 0 (failure), causing the loop in
> > update_iova to hit `if (!phys ...)` and silently continue.
> >
> > As a result, the refcounts for the physical pages in the range
> > [iova + size, iova + unmapped) are never decremented. Won't this
> > result in false positives (warnings about page leaks) when those
> > pages are eventually freed?
> >
> > For example:
> >
> > - A driver maps a 2MB region (512 x 4KB). All 512 pgs have refcount = 1.
> >
> > - A driver / IOMMU-client calls iommu_unmap(iova, 4KB)
> >
> > - unmap_begin(4KB) calls iova_to_phys, succeeds, and decrements the
> >   refcount for the 1st page to 0.
> >
> > - __iommu_unmap calls the IOMMU driver. The driver (unable to split the
> >   block) zaps the entire 2MB range and returns unmapped = 2MB.
> >
> > - unmap_end(size=4KB, unmapped=2MB) sees that more was unmapped than
> >   requested & attempts to decrement refcounts for the remaining 511 pgs
> >
> > - __iommu_debug_update_iova is called for the remaining range, which
> >   ends up calling iommu_iova_to_phys. Since the mapping was destroyed,
> >   iova_to_phys returns 0.
> >
> > - The loop skips the decrement causing the remaining 511 pages to leak
> >   with refcount = 1.
> >
>
> Agh, yes, iova_to_phys will always return zero, so
> __iommu_debug_update_iova() will do nothing in that case.
>
> I am not aware of which drivers do this; I added this logic
> because I saw that the IOMMU core allows it. I vaguely remember
> it had something to do with splitting blocks, which might be
> related to VFIO, but I don't think that is needed anymore.
>
> I am happy to just drop it, or even preemptively warn in that
> case, as it is impossible to retrieve the old addresses.
>
> And maybe that's a chance to re-evaluate whether we should allow
> this behaviour at all.
>

I have the diff below; it should have the same effect plus a WARN.
I will include it in the new version:

diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
index 5353417e64f9..64ec0795fe4c 100644
--- a/drivers/iommu/iommu-debug-pagealloc.c
+++ b/drivers/iommu/iommu-debug-pagealloc.c
@@ -146,16 +146,12 @@ void __iommu_debug_unmap_end(struct iommu_domain *domain,
        if (unmapped == size)
                return;

-       /*
-        * If unmap failed, re-increment the refcount, but if it unmapped
-        * larger size, decrement the extra part.
-        */
+       /* If unmap failed, re-increment the refcount. */
        if (unmapped < size)
                __iommu_debug_update_iova(domain, iova + unmapped,
                                          size - unmapped, true);
        else
-               __iommu_debug_update_iova(domain, iova + size,
-                                         unmapped - size, false);
+               WARN_ONCE(1, "iommu: unmap larger than requested is not supported in debug_pagealloc\n");
 }

 void iommu_debug_init(void)

Thanks,
Mostafa

> Thanks,
> Mostafa
>
> > Thanks,
> > Praan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer
  2026-01-07 15:24 ` [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Pranjal Shrivastava
@ 2026-01-08 11:37   ` Mostafa Saleh
  0 siblings, 0 replies; 20+ messages in thread
From: Mostafa Saleh @ 2026-01-08 11:37 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro, will,
	robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
	ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, xiaqinxin,
	baolu.lu, rdunlap

On Wed, Jan 07, 2026 at 03:24:59PM +0000, Pranjal Shrivastava wrote:
> On Tue, Jan 06, 2026 at 04:21:56PM +0000, Mostafa Saleh wrote:
> > Overview
> > --------
> > This patch series introduces a new debugging feature,
> > IOMMU_DEBUG_PAGEALLOC, designed to catch DMA use-after-free bugs
> > and IOMMU mapping leaks from buggy drivers.
> > 
> > The kernel has powerful sanitizers like KASAN and DEBUG_PAGEALLOC
> > for catching CPU-side memory corruption. However, there is limited
> > runtime sanitization for DMA mappings managed by the IOMMU. A buggy
> > driver can free a page while it is still mapped for DMA, leading to
> > memory corruption or use-after-free vulnerabilities when that page is
> > reallocated and used for a different purpose.
> > 
> 
> Thanks for this series! This is really helpful!
> 
> > Inspired by DEBUG_PAGEALLOC, this sanitizer tracks IOMMU mappings on a
> > per-page basis, as it’s not possible to unmap the pages, because it
> > requires to lock and walk all domains on every kernel free, instead we
> > rely on page_ext to add an IOMMU-specific mapping reference count for
> > each page.
> > And on each page allocated/freed from the kernel we simply check the
> > count and WARN if it is not zero, and dumping page owner information
> > if enabled.
> > 
> > Concurrency
> > -----------
> > By design this check is racy where one caller can map pages just after
> > the check, which can lead to false negatives.
> > In my opinion this is acceptable for sanitizers (for ex KCSAN have
> > that property).
> > Otherwise we have to implement locks in iommu_map/unmap for all domains
> > which is not favourable even for a debug feature.
> > The sanitizer only guarantees that the refcount itself doesn’t get
> > corrupted using atomics. And there are no false positives.
> > 
> > CPU vs IOMMU Page Size
> > ----------------------
> > IOMMUs can use different page sizes and which can be non-homogeneous;
> > not even all of them have the same page size.
> > 
> > To solve this, the refcount is always incremented and decremented in
> > units of the smallest page size supported by the IOMMU domain. This
> > ensures the accounting remains consistent regardless of the size of
> > the map or unmap operation, otherwise double counting can happen.
> > 
> > Testing & Performance
> > ---------------------
> > This was tested on Morello with Arm64 + SMMUv3
> > Did some testing Lenovo IdeaCentre X Gen 10 Snapdragon
> > Did some testing on Qemu including different SMMUv3/CPU page size (arm64).
> > 
> > I also ran dma_map_benchmark on Morello:
> > 
> > echo dma_map_benchmark > /sys/bus/pci/devices/0000\:06\:00.0/driver_override
> > echo 0000:06:00.0 >  /sys/bus/pci/devices/0000\:06\:00.0/driver/unbind
> > echo 0000:06:00.0 > /sys/bus/pci/drivers/dma_map_benchmark/bind
> > ./dma_map_benchmark -t $threads -g $nr_pages
> > 
> > CONFIG refers to "CONFIG_IOMMU_DEBUG_PAGEALLOC"
> > cmdline refers to "iommu.debug_pagealloc"
> > Numbers are (map latency)/(unmap latency), lower is better.
> > 
> > 			CONFIG=n    CONFIG=y    CONFIG=y
> > 			            cmdline=0   cmdline=1
> > 4K - 1 thread		0.1/0.6     0.1/0.6     0.1/0.7
> > 4K - 4 threads		0.1/1.1     0.1/1.0     0.2/1.1
> > 1M - 1 thread		0.8/21.2    0.7/21.2    5.4/42.3
> > 1M - 4 threads		1.1/45.9    1.1/46.0    5.9/45.1
> > 
> 
> Just curious: have we also measured the latency for larger
> mappings? e.g. a 1G mapping backed by `n` 4K mappings?

No, the max granule supported by dma_map_benchmark is 1024 pages,
which is 4M for 4K kernels.
I thought 1M would be better for my setup: as I am using SMMUv3,
a 1MB mapping already covers many PTEs (so 4M adds little), while
the 4K test covers the single-PTE case, so we get more coverage.

Thanks,
Mostafa

> 
> > Changes in v5:
> > v4: https://lore.kernel.org/all/20251211125928.3258905-1-smostafa@google.com/
> > - Fix typo in comment
> > - Collect Baolu R-bs
> > 
> > Main changes in v4:
> > v3: https://lore.kernel.org/all/20251124200811.2942432-1-smostafa@google.com/
> > - Update the kernel parameter format in docs based on Randy feedback
> > - Update commit subjects
> > - Add IOMMU only functions in iommu-priv.h based on Baolu feedback
> > 
> > Main changes in v3: (Most of them addressing Will comments)
> > v2: https://lore.kernel.org/linux-iommu/20251106163953.1971067-1-smostafa@google.com/
> > - Reword the Kconfig help
> > - Use unmap_begin/end instead of unmap/remap
> > - Use relaxed accessors when refcounting
> > - Fix a bug with checking the returned address from iova_to_phys
> > - Add more hardening checks (overflow)
> > - Add more debug info on assertions (dump_page_owner())
> > - Handle cases where unmap returns larger size as the core code seems
> >   to tolerate that.
> > - Drop Tested-by tags from Qinxin as the code logic changed
> > 
> > Main changes in v2:
> > v1: https://lore.kernel.org/linux-iommu/20251003173229.1533640-1-smostafa@google.com/
> > - Address Jörg comments about #ifdefs and static keys
> > - Reword the Kconfig help
> > - Drop RFC
> > - Collect t-b from Qinxin
> > - Minor cleanups
> > 
> > Mostafa Saleh (4):
> >   iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
> >   iommu: Add calls for IOMMU_DEBUG_PAGEALLOC
> >   iommu: debug-pagealloc: Track IOMMU pages
> >   iommu: debug-pagealloc: Check mapped/unmapped kernel memory
> > 
> >  .../admin-guide/kernel-parameters.txt         |   9 +
> >  drivers/iommu/Kconfig                         |  19 ++
> >  drivers/iommu/Makefile                        |   1 +
> >  drivers/iommu/iommu-debug-pagealloc.c         | 174 ++++++++++++++++++
> >  drivers/iommu/iommu-priv.h                    |  58 ++++++
> >  drivers/iommu/iommu.c                         |  11 +-
> >  include/linux/iommu-debug-pagealloc.h         |  32 ++++
> >  include/linux/mm.h                            |   5 +
> >  mm/page_ext.c                                 |   4 +
> >  9 files changed, 311 insertions(+), 2 deletions(-)
> >  create mode 100644 drivers/iommu/iommu-debug-pagealloc.c
> >  create mode 100644 include/linux/iommu-debug-pagealloc.h
> > 
> > -- 
> > 2.52.0.351.gbe84eed79e-goog
> > 
> > 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 1/4] iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
  2026-01-08 10:42     ` Mostafa Saleh
@ 2026-01-08 11:53       ` David Hildenbrand (Red Hat)
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-08 11:53 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro, will,
	robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
	ziy, lorenzo.stoakes, Liam.Howlett, rppt, xiaqinxin, baolu.lu,
	rdunlap

On 1/8/26 11:42, Mostafa Saleh wrote:
> On Wed, Jan 07, 2026 at 05:53:50PM +0100, David Hildenbrand (Red Hat) wrote:
>> On 1/6/26 17:21, Mostafa Saleh wrote:
>>> Add a new config IOMMU_DEBUG_PAGEALLOC, which registers new data to
>>> page_ext.
>>>
>>> This config will be used by the IOMMU API to track pages mapped in
>>> the IOMMU to catch drivers trying to free kernel memory that they
>>> still map in their domains, causing all types of memory corruption.
>>>
>>> This behaviour is disabled by default and can be enabled using
>>> kernel cmdline iommu.debug_pagealloc.
>>>
>>> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
>>> Signed-off-by: Mostafa Saleh <smostafa@google.com>
>>> ---
>>
>> I think I acked it, but maybe too late for you to spot it
>>
>> for the MM bits
>>
>> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
> 
> Thanks! If my mail client is not acting up, it seems that was on
> the same version (v5) as well.

Maybe my client is having problems :)

-- 
Cheers

David


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2026-01-08 11:53 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-06 16:21 [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Mostafa Saleh
2026-01-06 16:21 ` [PATCH v5 1/4] iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC Mostafa Saleh
2026-01-06 18:50   ` David Hildenbrand (Red Hat)
2026-01-07 15:26   ` Pranjal Shrivastava
2026-01-07 16:53   ` David Hildenbrand (Red Hat)
2026-01-08 10:42     ` Mostafa Saleh
2026-01-08 11:53       ` David Hildenbrand (Red Hat)
2026-01-06 16:21 ` [PATCH v5 2/4] iommu: Add calls " Mostafa Saleh
2026-01-06 21:17   ` Samiullah Khawaja
2026-01-07  5:48   ` Baolu Lu
2026-01-07 15:28   ` Pranjal Shrivastava
2026-01-06 16:21 ` [PATCH v5 3/4] iommu: debug-pagealloc: Track IOMMU pages Mostafa Saleh
2026-01-06 21:18   ` Samiullah Khawaja
2026-01-07 15:21   ` Pranjal Shrivastava
2026-01-08 11:06     ` Mostafa Saleh
2026-01-08 11:33       ` Mostafa Saleh
2026-01-06 16:22 ` [PATCH v5 4/4] iommu: debug-pagealloc: Check mapped/unmapped kernel memory Mostafa Saleh
2026-01-06 21:19   ` Samiullah Khawaja
2026-01-07 15:24 ` [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Pranjal Shrivastava
2026-01-08 11:37   ` Mostafa Saleh
