* [PATCH v2 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer
@ 2025-11-06 16:39 Mostafa Saleh
2025-11-06 16:39 ` [PATCH v2 1/4] drivers/iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC Mostafa Saleh
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Mostafa Saleh @ 2025-11-06 16:39 UTC (permalink / raw)
To: linux-mm, iommu, linux-kernel, linux-doc
Cc: corbet, joro, will, robin.murphy, akpm, vbabka, surenb, mhocko,
jackmanb, hannes, ziy, david, lorenzo.stoakes, Liam.Howlett,
rppt, Mostafa Saleh
Overview
--------
This patch series introduces a new debugging feature,
IOMMU_DEBUG_PAGEALLOC, designed to catch DMA use-after-free bugs
and IOMMU mapping leaks from buggy drivers.
The kernel has powerful sanitizers like KASAN and DEBUG_PAGEALLOC
for catching CPU-side memory corruption. However, there is limited
runtime sanitization for DMA mappings managed by the IOMMU. A buggy
driver can free a page while it is still mapped for DMA, leading to
memory corruption or use-after-free vulnerabilities when that page is
reallocated and used for a different purpose.
Inspired by DEBUG_PAGEALLOC, this sanitizer tracks IOMMU mappings on a
per-page basis, as it’s not possible to unmap the pages, because it
requires to lock and walk all domains on every kernel free, instead we
rely on page_ext to add an IOMMU-specific mapping reference count for
each page.
And on each page allocated/freed from the kernel we simply check the
count and WARN if it is not zero.
Concurrency
-----------
By design this check is racy where one caller can map pages just after
the check, which can lead to false negatives.
In my opinion this is acceptable for sanitizers (for ex KCSAN have
that property).
Otherwise we have to implement locks in iommu_map/unmap for all domains
which is not favourable even for a debug feature.
The sanitizer only guarantees that the refcount itself doesn’t get
corrupted using atomics. And there are no false positives.
CPU vs IOMMU Page Size
----------------------
IOMMUs can use different page sizes and which can be non-homogeneous;
not even all of them have the same page size.
To solve this, the refcount is always incremented and decremented in
units of the smallest page size supported by the IOMMU domain. This
ensures the accounting remains consistent regardless of the size of
the map or unmap operation, otherwise double counting can happen.
Testing & Performance
---------------------
This was tested on Morello with Arm64 + SMMUv3
Also I booted RockPi-4b with Rockchip IOMMU.
Did some tests on Qemu including different SMMUv3/CPU page size (arm64).
I also ran dma_map_benchmark on Morello:
echo dma_map_benchmark > /sys/bus/pci/devices/0000\:06\:00.0/driver_override
echo 0000:06:00.0 > /sys/bus/pci/devices/0000\:06\:00.0/driver/unbind
echo 0000:06:00.0 > /sys/bus/pci/drivers/dma_map_benchmark/bind
./dma_map_benchmark -t $threads -g $nr_pages
CONFIG refers to "CONFIG_IOMMU_DEBUG_PAGEALLOC"
cmdline refers to "iommu.debug_pagealloc"
Numbers are (map latency)/(unmap latency), lower is better.
CONFIG=n CONFIG=y CONFIG=y
cmdline=0 cmdline=1
4K - 1 thread 0.1/0.6 0.1/0.6 0.1/0.7
4K - 4 threads 0.1/1.0 0.1/1.0 0.1/1.1
1M - 1 thread 0.8/21.2 0.8/21.2 5.6/42.4
1M - 4 threads 1.1/45.9 1.1/46.0 6.0/45.4
Main changes v2:
v1: https://lore.kernel.org/linux-iommu/20251003173229.1533640-1-smostafa@google.com/
- Address Jörg comments about #ifdefs and static keys
- Reword the KCONFIG help
- Drop RFC
- Collect t-b from Qinxin
- Minor cleanups
Mostafa Saleh (4):
drivers/iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
drivers/iommu: Add calls for IOMMU_DEBUG_PAGEALLOC
drivers/iommu-debug-pagealloc: Track IOMMU pages
drivers/iommu-debug-pagealloc: Check mapped/unmapped kernel memory
.../admin-guide/kernel-parameters.txt | 6 +
drivers/iommu/Kconfig | 15 ++
drivers/iommu/Makefile | 1 +
drivers/iommu/iommu-debug-pagealloc.c | 148 ++++++++++++++++++
drivers/iommu/iommu.c | 14 +-
include/linux/iommu-debug-pagealloc.h | 83 ++++++++++
include/linux/mm.h | 5 +
mm/page_ext.c | 4 +
8 files changed, 274 insertions(+), 2 deletions(-)
create mode 100644 drivers/iommu/iommu-debug-pagealloc.c
create mode 100644 include/linux/iommu-debug-pagealloc.h
base-commit: dc77806cf3b4788d328fddf245e86c5b529f31a2
--
2.51.2.1026.g39e6a42477-goog
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/4] drivers/iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
2025-11-06 16:39 [PATCH v2 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Mostafa Saleh
@ 2025-11-06 16:39 ` Mostafa Saleh
2025-11-06 19:50 ` Randy Dunlap
2025-11-13 10:05 ` Will Deacon
2025-11-06 16:39 ` [PATCH v2 2/4] drivers/iommu: Add calls " Mostafa Saleh
` (2 subsequent siblings)
3 siblings, 2 replies; 18+ messages in thread
From: Mostafa Saleh @ 2025-11-06 16:39 UTC (permalink / raw)
To: linux-mm, iommu, linux-kernel, linux-doc
Cc: corbet, joro, will, robin.murphy, akpm, vbabka, surenb, mhocko,
jackmanb, hannes, ziy, david, lorenzo.stoakes, Liam.Howlett,
rppt, Mostafa Saleh, Qinxin Xia
Add a new config IOMMU_DEBUG_PAGEALLOC, which registers new data to
page_ext.
This config will be used by the IOMMU API to track pages mapped in
the IOMMU to catch drivers trying to free kernel memory that they
still map in their domains, causing all types of memory corruption.
This behaviour is disabled by default and can be enabled using
kernel cmdline iommu.debug_pagealloc.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
---
.../admin-guide/kernel-parameters.txt | 6 ++++
drivers/iommu/Kconfig | 15 +++++++++
drivers/iommu/Makefile | 1 +
drivers/iommu/iommu-debug-pagealloc.c | 32 +++++++++++++++++++
include/linux/iommu-debug-pagealloc.h | 17 ++++++++++
mm/page_ext.c | 4 +++
6 files changed, 75 insertions(+)
create mode 100644 drivers/iommu/iommu-debug-pagealloc.c
create mode 100644 include/linux/iommu-debug-pagealloc.h
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6c42061ca20e..9a1c4ac8ba96 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2557,6 +2557,12 @@
1 - Bypass the IOMMU for DMA.
unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
+ iommu.debug_pagealloc=
+ [KNL,EARLY] When CONFIG_IOMMU_DEBUG_PAGEALLOC is set, this
+ parameter enables the feature at boot time. By default, it
+ is disabled and the system will work mostly the same as a
+ kernel built without CONFIG_IOMMU_DEBUG_PAGEALLOC.
+
io7= [HW] IO7 for Marvel-based Alpha systems
See comment before marvel_specify_io7 in
arch/alpha/kernel/core_marvel.c.
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 70d29b14d851..6b5e9a2d936a 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -383,4 +383,19 @@ config SPRD_IOMMU
Say Y here if you want to use the multimedia devices listed above.
+config IOMMU_DEBUG_PAGEALLOC
+ bool "Debug page memory allocations against IOMMU"
+ depends on DEBUG_PAGEALLOC && IOMMU_API && PAGE_EXTENSION
+ help
+ This config checks that a page is freed(unmapped) or mapped by the
+ kernel is not mapped in any IOMMU domain. It can help with debugging
+ use-after-free or out-of-bound maps from drivers doing DMA through
+ the IOMMU API.
+ This santaizer can have false-negative cases where some problems
+ won't be detected.
+ Expect overhead when enabling this and enabling the kernel command
+ line iommu.debug_pagealloc.
+
+ If unsure, say N here.
+
endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 355294fa9033..8f5130b6a671 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -34,3 +34,4 @@ obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o
obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
obj-$(CONFIG_APPLE_DART) += apple-dart.o
+obj-$(CONFIG_IOMMU_DEBUG_PAGEALLOC) += iommu-debug-pagealloc.o
diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
new file mode 100644
index 000000000000..385c8bfae02b
--- /dev/null
+++ b/drivers/iommu/iommu-debug-pagealloc.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 - Google Inc
+ * Author: Mostafa Saleh <smostafa@google.com>
+ * IOMMU API debug page alloc sanitizer
+ */
+#include <linux/atomic.h>
+#include <linux/iommu-debug-pagealloc.h>
+#include <linux/kernel.h>
+#include <linux/page_ext.h>
+
+static bool needed;
+
+struct iommu_debug_metadate {
+ atomic_t ref;
+};
+
+static __init bool need_iommu_debug(void)
+{
+ return needed;
+}
+
+struct page_ext_operations page_iommu_debug_ops = {
+ .size = sizeof(struct iommu_debug_metadate),
+ .need = need_iommu_debug,
+};
+
+static int __init iommu_debug_pagealloc(char *str)
+{
+ return kstrtobool(str, &needed);
+}
+early_param("iommu.debug_pagealloc", iommu_debug_pagealloc);
diff --git a/include/linux/iommu-debug-pagealloc.h b/include/linux/iommu-debug-pagealloc.h
new file mode 100644
index 000000000000..83e64d70bf6c
--- /dev/null
+++ b/include/linux/iommu-debug-pagealloc.h
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 - Google Inc
+ * Author: Mostafa Saleh <smostafa@google.com>
+ * IOMMU API debug page alloc sanitizer
+ */
+
+#ifndef __LINUX_IOMMU_DEBUG_PAGEALLOC_H
+#define __LINUX_IOMMU_DEBUG_PAGEALLOC_H
+
+#ifdef CONFIG_IOMMU_DEBUG_PAGEALLOC
+
+extern struct page_ext_operations page_iommu_debug_ops;
+
+#endif /* CONFIG_IOMMU_DEBUG_PAGEALLOC */
+
+#endif /* __LINUX_IOMMU_DEBUG_PAGEALLOC_H */
diff --git a/mm/page_ext.c b/mm/page_ext.c
index d7396a8970e5..297e4cd8ce90 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -11,6 +11,7 @@
#include <linux/page_table_check.h>
#include <linux/rcupdate.h>
#include <linux/pgalloc_tag.h>
+#include <linux/iommu-debug-pagealloc.h>
/*
* struct page extension
@@ -89,6 +90,9 @@ static struct page_ext_operations *page_ext_ops[] __initdata = {
#ifdef CONFIG_PAGE_TABLE_CHECK
&page_table_check_ops,
#endif
+#ifdef CONFIG_IOMMU_DEBUG_PAGEALLOC
+ &page_iommu_debug_ops,
+#endif
};
unsigned long page_ext_size;
--
2.51.2.1026.g39e6a42477-goog
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 2/4] drivers/iommu: Add calls for IOMMU_DEBUG_PAGEALLOC
2025-11-06 16:39 [PATCH v2 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Mostafa Saleh
2025-11-06 16:39 ` [PATCH v2 1/4] drivers/iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC Mostafa Saleh
@ 2025-11-06 16:39 ` Mostafa Saleh
2025-11-13 11:00 ` Will Deacon
2025-11-06 16:39 ` [PATCH v2 3/4] drivers/iommu-debug-pagealloc: Track IOMMU pages Mostafa Saleh
2025-11-06 16:39 ` [PATCH v2 4/4] drivers/iommu-debug-pagealloc: Check mapped/unmapped kernel memory Mostafa Saleh
3 siblings, 1 reply; 18+ messages in thread
From: Mostafa Saleh @ 2025-11-06 16:39 UTC (permalink / raw)
To: linux-mm, iommu, linux-kernel, linux-doc
Cc: corbet, joro, will, robin.murphy, akpm, vbabka, surenb, mhocko,
jackmanb, hannes, ziy, david, lorenzo.stoakes, Liam.Howlett,
rppt, Mostafa Saleh, Qinxin Xia
Add calls for the new iommu debug config IOMMU_DEBUG_PAGEALLOC:
- iommu_debug_init: Enable the debug mode if configured by the user.
- iommu_debug_map: Track iommu pages mapped, using physical address.
- iommu_debug_unmap: Track iommu pages unmapped, using IO virtual
address.
- iommu_debug_remap: Track iommu pages, already mapped using IOVA.
We have to do the unmap/remap as once pages are unmapped we lose the
information of the physical address.
This is racy, but the API is racy by construction as it uses refcounts
and doesn't attempt to lock/synchronize with the IOMMU API as that will
be costly, meaning that possibility of false negative exists.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
---
drivers/iommu/iommu-debug-pagealloc.c | 23 ++++++++++++
drivers/iommu/iommu.c | 14 ++++++-
include/linux/iommu-debug-pagealloc.h | 54 +++++++++++++++++++++++++++
3 files changed, 89 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
index 385c8bfae02b..a6a2f844b09d 100644
--- a/drivers/iommu/iommu-debug-pagealloc.c
+++ b/drivers/iommu/iommu-debug-pagealloc.c
@@ -5,11 +5,13 @@
* IOMMU API debug page alloc sanitizer
*/
#include <linux/atomic.h>
+#include <linux/iommu.h>
#include <linux/iommu-debug-pagealloc.h>
#include <linux/kernel.h>
#include <linux/page_ext.h>
static bool needed;
+DEFINE_STATIC_KEY_FALSE(iommu_debug_initialized);
struct iommu_debug_metadate {
atomic_t ref;
@@ -25,6 +27,27 @@ struct page_ext_operations page_iommu_debug_ops = {
.need = need_iommu_debug,
};
+void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
+{
+}
+
+void __iommu_debug_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
+{
+}
+
+void __iommu_debug_remap(struct iommu_domain *domain, unsigned long iova, size_t size)
+{
+}
+
+void iommu_debug_init(void)
+{
+ if (!needed)
+ return;
+
+ pr_info("iommu: Debugging page allocations, expect overhead or disable iommu.debug_pagealloc");
+ static_branch_enable(&iommu_debug_initialized);
+}
+
static int __init iommu_debug_pagealloc(char *str)
{
return kstrtobool(str, &needed);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 59244c744eab..f374ac0fdf95 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -18,6 +18,7 @@
#include <linux/errno.h>
#include <linux/host1x_context_bus.h>
#include <linux/iommu.h>
+#include <linux/iommu-debug-pagealloc.h>
#include <linux/iommufd.h>
#include <linux/idr.h>
#include <linux/err.h>
@@ -231,6 +232,8 @@ static int __init iommu_subsys_init(void)
if (!nb)
return -ENOMEM;
+ iommu_debug_init();
+
for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++) {
nb[i].notifier_call = iommu_bus_notifier;
bus_register_notifier(iommu_buses[i], &nb[i]);
@@ -2544,10 +2547,12 @@ int iommu_map_nosync(struct iommu_domain *domain, unsigned long iova,
}
/* unroll mapping in case something went wrong */
- if (ret)
+ if (ret) {
iommu_unmap(domain, orig_iova, orig_size - size);
- else
+ } else {
trace_map(orig_iova, orig_paddr, orig_size);
+ iommu_debug_map(domain, orig_paddr, orig_size);
+ }
return ret;
}
@@ -2609,6 +2614,8 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
pr_debug("unmap this: iova 0x%lx size 0x%zx\n", iova, size);
+ iommu_debug_unmap(domain, iova, size);
+
/*
* Keep iterating until we either unmap 'size' bytes (or more)
* or we hit an area that isn't mapped.
@@ -2628,6 +2635,9 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
unmapped += unmapped_page;
}
+ if (unmapped < size)
+ iommu_debug_remap(domain, iova, size - unmapped);
+
trace_unmap(orig_iova, size, unmapped);
return unmapped;
}
diff --git a/include/linux/iommu-debug-pagealloc.h b/include/linux/iommu-debug-pagealloc.h
index 83e64d70bf6c..180446d6d86f 100644
--- a/include/linux/iommu-debug-pagealloc.h
+++ b/include/linux/iommu-debug-pagealloc.h
@@ -8,10 +8,64 @@
#ifndef __LINUX_IOMMU_DEBUG_PAGEALLOC_H
#define __LINUX_IOMMU_DEBUG_PAGEALLOC_H
+struct iommu_domain;
+
#ifdef CONFIG_IOMMU_DEBUG_PAGEALLOC
+DECLARE_STATIC_KEY_FALSE(iommu_debug_initialized);
+
extern struct page_ext_operations page_iommu_debug_ops;
+void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys,
+ size_t size);
+void __iommu_debug_unmap(struct iommu_domain *domain, unsigned long iova,
+ size_t size);
+void __iommu_debug_remap(struct iommu_domain *domain, unsigned long iova,
+ size_t size);
+
+static inline void iommu_debug_map(struct iommu_domain *domain,
+ phys_addr_t phys, size_t size)
+{
+ if (static_branch_unlikely(&iommu_debug_initialized))
+ __iommu_debug_map(domain, phys, size);
+}
+
+static inline void iommu_debug_unmap(struct iommu_domain *domain,
+ unsigned long iova, size_t size)
+{
+ if (static_branch_unlikely(&iommu_debug_initialized))
+ __iommu_debug_unmap(domain, iova, size);
+}
+
+static inline void iommu_debug_remap(struct iommu_domain *domain,
+ unsigned long iova, size_t size)
+{
+ if (static_branch_unlikely(&iommu_debug_initialized))
+ __iommu_debug_remap(domain, iova, size);
+}
+
+void iommu_debug_init(void);
+
+#else
+static inline void iommu_debug_map(struct iommu_domain *domain,
+ phys_addr_t phys, size_t size)
+{
+}
+
+static inline void iommu_debug_unmap(struct iommu_domain *domain,
+ unsigned long iova, size_t size)
+{
+}
+
+static inline void iommu_debug_remap(struct iommu_domain *domain,
+ unsigned long iova, size_t size)
+{
+}
+
+static inline void iommu_debug_init(void)
+{
+}
+
#endif /* CONFIG_IOMMU_DEBUG_PAGEALLOC */
#endif /* __LINUX_IOMMU_DEBUG_PAGEALLOC_H */
--
2.51.2.1026.g39e6a42477-goog
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 3/4] drivers/iommu-debug-pagealloc: Track IOMMU pages
2025-11-06 16:39 [PATCH v2 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Mostafa Saleh
2025-11-06 16:39 ` [PATCH v2 1/4] drivers/iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC Mostafa Saleh
2025-11-06 16:39 ` [PATCH v2 2/4] drivers/iommu: Add calls " Mostafa Saleh
@ 2025-11-06 16:39 ` Mostafa Saleh
2025-11-13 11:00 ` Will Deacon
2025-11-06 16:39 ` [PATCH v2 4/4] drivers/iommu-debug-pagealloc: Check mapped/unmapped kernel memory Mostafa Saleh
3 siblings, 1 reply; 18+ messages in thread
From: Mostafa Saleh @ 2025-11-06 16:39 UTC (permalink / raw)
To: linux-mm, iommu, linux-kernel, linux-doc
Cc: corbet, joro, will, robin.murphy, akpm, vbabka, surenb, mhocko,
jackmanb, hannes, ziy, david, lorenzo.stoakes, Liam.Howlett,
rppt, Mostafa Saleh, Qinxin Xia
Using the new calls, use an atomic refcount to track how many times
a page is mapped in any of the IOMMUs.
For unmap we need to use iova_to_phys() to get the physical address
of the pages.
We use the smallest supported page size as the granularity of tracking
per domain.
This is important as it possible to map pages and unmap them with
larger sizes (as in map_sg()) cases.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
---
drivers/iommu/iommu-debug-pagealloc.c | 74 +++++++++++++++++++++++++++
1 file changed, 74 insertions(+)
diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
index a6a2f844b09d..0e14104b971c 100644
--- a/drivers/iommu/iommu-debug-pagealloc.c
+++ b/drivers/iommu/iommu-debug-pagealloc.c
@@ -27,16 +27,90 @@ struct page_ext_operations page_iommu_debug_ops = {
.need = need_iommu_debug,
};
+static struct page_ext *get_iommu_page_ext(phys_addr_t phys)
+{
+ struct page *page = phys_to_page(phys);
+ struct page_ext *page_ext = page_ext_get(page);
+
+ return page_ext;
+}
+
+static struct iommu_debug_metadate *get_iommu_data(struct page_ext *page_ext)
+{
+ return page_ext_data(page_ext, &page_iommu_debug_ops);
+}
+
+static void iommu_debug_inc_page(phys_addr_t phys)
+{
+ struct page_ext *page_ext = get_iommu_page_ext(phys);
+ struct iommu_debug_metadate *d = get_iommu_data(page_ext);
+
+ WARN_ON(atomic_inc_return(&d->ref) <= 0);
+ page_ext_put(page_ext);
+}
+
+static void iommu_debug_dec_page(phys_addr_t phys)
+{
+ struct page_ext *page_ext = get_iommu_page_ext(phys);
+ struct iommu_debug_metadate *d = get_iommu_data(page_ext);
+
+ WARN_ON(atomic_dec_return(&d->ref) < 0);
+ page_ext_put(page_ext);
+}
+
+/*
+ * IOMMU page size might not match the CPU page size, in that case, we use
+ * the smallest IOMMU page size to refcount the pages in the vmemmap.
+ * That is important as both map and unmap has to use the same page size
+ * to update the refcount to avoid double counting the same page.
+ * And as we can't know from iommu_unmap() what was the original page size
+ * used for map, we just use the minimum supported one for both.
+ */
+static size_t iommu_debug_page_size(struct iommu_domain *domain)
+{
+ return 1UL << __ffs(domain->pgsize_bitmap);
+}
+
void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
{
+ size_t off;
+ size_t page_size = iommu_debug_page_size(domain);
+
+ for (off = 0 ; off < size ; off += page_size) {
+ if (!pfn_valid(__phys_to_pfn(phys + off)))
+ continue;
+ iommu_debug_inc_page(phys + off);
+ }
}
void __iommu_debug_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
{
+ size_t off;
+ size_t page_size = iommu_debug_page_size(domain);
+
+ for (off = 0 ; off < size ; off += page_size) {
+ phys_addr_t phys = iommu_iova_to_phys(domain, iova + off);
+
+ if (!phys || !pfn_valid(__phys_to_pfn(phys + off)))
+ continue;
+
+ iommu_debug_dec_page(phys);
+ }
}
void __iommu_debug_remap(struct iommu_domain *domain, unsigned long iova, size_t size)
{
+ size_t off;
+ size_t page_size = iommu_debug_page_size(domain);
+
+ for (off = 0 ; off < size ; off += page_size) {
+ phys_addr_t phys = iommu_iova_to_phys(domain, iova + off);
+
+ if (!phys || !pfn_valid(__phys_to_pfn(phys + off)))
+ continue;
+
+ iommu_debug_inc_page(phys);
+ }
}
void iommu_debug_init(void)
--
2.51.2.1026.g39e6a42477-goog
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 4/4] drivers/iommu-debug-pagealloc: Check mapped/unmapped kernel memory
2025-11-06 16:39 [PATCH v2 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Mostafa Saleh
` (2 preceding siblings ...)
2025-11-06 16:39 ` [PATCH v2 3/4] drivers/iommu-debug-pagealloc: Track IOMMU pages Mostafa Saleh
@ 2025-11-06 16:39 ` Mostafa Saleh
2025-11-13 10:59 ` Will Deacon
3 siblings, 1 reply; 18+ messages in thread
From: Mostafa Saleh @ 2025-11-06 16:39 UTC (permalink / raw)
To: linux-mm, iommu, linux-kernel, linux-doc
Cc: corbet, joro, will, robin.murphy, akpm, vbabka, surenb, mhocko,
jackmanb, hannes, ziy, david, lorenzo.stoakes, Liam.Howlett,
rppt, Mostafa Saleh, Qinxin Xia
Now, as the page_ext holds count of IOMMU mappings, we can use it to
assert that any page allocated/freed is indeed not in the IOMMU.
The sanitizer doesn’t protect against mapping/unmapping during this
period. However, that’s less harmful as the page is not used by the
kernel.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
---
drivers/iommu/iommu-debug-pagealloc.c | 19 +++++++++++++++++++
include/linux/iommu-debug-pagealloc.h | 12 ++++++++++++
include/linux/mm.h | 5 +++++
3 files changed, 36 insertions(+)
diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
index 0e14104b971c..5b26c84d3a0e 100644
--- a/drivers/iommu/iommu-debug-pagealloc.c
+++ b/drivers/iommu/iommu-debug-pagealloc.c
@@ -71,6 +71,25 @@ static size_t iommu_debug_page_size(struct iommu_domain *domain)
return 1UL << __ffs(domain->pgsize_bitmap);
}
+static unsigned int iommu_debug_page_count(unsigned long phys)
+{
+ unsigned int ref;
+ struct page_ext *page_ext = get_iommu_page_ext(phys);
+ struct iommu_debug_metadate *d = get_iommu_data(page_ext);
+
+ ref = atomic_read(&d->ref);
+ page_ext_put(page_ext);
+ return ref;
+}
+
+void __iommu_debug_check_unmapped(const struct page *page, int numpages)
+{
+ while (numpages--) {
+ WARN_ON(iommu_debug_page_count(page_to_phys(page)));
+ page++;
+ }
+}
+
void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
{
size_t off;
diff --git a/include/linux/iommu-debug-pagealloc.h b/include/linux/iommu-debug-pagealloc.h
index 180446d6d86f..84110e4ecfaa 100644
--- a/include/linux/iommu-debug-pagealloc.h
+++ b/include/linux/iommu-debug-pagealloc.h
@@ -22,6 +22,7 @@ void __iommu_debug_unmap(struct iommu_domain *domain, unsigned long iova,
size_t size);
void __iommu_debug_remap(struct iommu_domain *domain, unsigned long iova,
size_t size);
+void __iommu_debug_check_unmapped(const struct page *page, int numpages);
static inline void iommu_debug_map(struct iommu_domain *domain,
phys_addr_t phys, size_t size)
@@ -44,6 +45,12 @@ static inline void iommu_debug_remap(struct iommu_domain *domain,
__iommu_debug_remap(domain, iova, size);
}
+static inline void iommu_debug_check_unmapped(const struct page *page, int numpages)
+{
+ if (static_branch_unlikely(&iommu_debug_initialized))
+ __iommu_debug_check_unmapped(page, numpages);
+}
+
void iommu_debug_init(void);
#else
@@ -66,6 +73,11 @@ static inline void iommu_debug_init(void)
{
}
+static inline void iommu_debug_check_unmapped(const struct page *page,
+ int numpages)
+{
+}
+
#endif /* CONFIG_IOMMU_DEBUG_PAGEALLOC */
#endif /* __LINUX_IOMMU_DEBUG_PAGEALLOC_H */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d16b33bacc32..895a60a49120 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -36,6 +36,7 @@
#include <linux/rcuwait.h>
#include <linux/bitmap.h>
#include <linux/bitops.h>
+#include <linux/iommu-debug-pagealloc.h>
struct mempolicy;
struct anon_vma;
@@ -3811,12 +3812,16 @@ extern void __kernel_map_pages(struct page *page, int numpages, int enable);
#ifdef CONFIG_DEBUG_PAGEALLOC
static inline void debug_pagealloc_map_pages(struct page *page, int numpages)
{
+ iommu_debug_check_unmapped(page, numpages);
+
if (debug_pagealloc_enabled_static())
__kernel_map_pages(page, numpages, 1);
}
static inline void debug_pagealloc_unmap_pages(struct page *page, int numpages)
{
+ iommu_debug_check_unmapped(page, numpages);
+
if (debug_pagealloc_enabled_static())
__kernel_map_pages(page, numpages, 0);
}
--
2.51.2.1026.g39e6a42477-goog
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/4] drivers/iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
2025-11-06 16:39 ` [PATCH v2 1/4] drivers/iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC Mostafa Saleh
@ 2025-11-06 19:50 ` Randy Dunlap
2025-11-24 11:04 ` Mostafa Saleh
2025-11-13 10:05 ` Will Deacon
1 sibling, 1 reply; 18+ messages in thread
From: Randy Dunlap @ 2025-11-06 19:50 UTC (permalink / raw)
To: Mostafa Saleh, linux-mm, iommu, linux-kernel, linux-doc
Cc: corbet, joro, will, robin.murphy, akpm, vbabka, surenb, mhocko,
jackmanb, hannes, ziy, david, lorenzo.stoakes, Liam.Howlett,
rppt, Qinxin Xia
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 70d29b14d851..6b5e9a2d936a 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -383,4 +383,19 @@ config SPRD_IOMMU
>
> Say Y here if you want to use the multimedia devices listed above.
>
> +config IOMMU_DEBUG_PAGEALLOC
> + bool "Debug page memory allocations against IOMMU"
> + depends on DEBUG_PAGEALLOC && IOMMU_API && PAGE_EXTENSION
> + help
> + This config checks that a page is freed(unmapped) or mapped by the
> + kernel is not mapped in any IOMMU domain. It can help with debugging
> + use-after-free or out-of-bound maps from drivers doing DMA through
> + the IOMMU API.
> + This santaizer can have false-negative cases where some problems
sanitizer
> + won't be detected.
> + Expect overhead when enabling this and enabling the kernel command
> + line iommu.debug_pagealloc.
> +
> + If unsure, say N here.
> +
> endif # IOMMU_SUPPORT
--
~Randy
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/4] drivers/iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
2025-11-06 16:39 ` [PATCH v2 1/4] drivers/iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC Mostafa Saleh
2025-11-06 19:50 ` Randy Dunlap
@ 2025-11-13 10:05 ` Will Deacon
2025-11-24 11:10 ` Mostafa Saleh
1 sibling, 1 reply; 18+ messages in thread
From: Will Deacon @ 2025-11-13 10:05 UTC (permalink / raw)
To: Mostafa Saleh
Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro,
robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, Qinxin Xia
Hi Mostafa,
On Thu, Nov 06, 2025 at 04:39:50PM +0000, Mostafa Saleh wrote:
> Add a new config IOMMU_DEBUG_PAGEALLOC, which registers new data to
> page_ext.
> This config will be used by the IOMMU API to track pages mapped in
> the IOMMU to catch drivers trying to free kernel memory that they
> still map in their domains, causing all types of memory corruption.
> This behaviour is disabled by default and can be enabled using
> kernel cmdline iommu.debug_pagealloc.
>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
> ---
> .../admin-guide/kernel-parameters.txt | 6 ++++
> drivers/iommu/Kconfig | 15 +++++++++
> drivers/iommu/Makefile | 1 +
> drivers/iommu/iommu-debug-pagealloc.c | 32 +++++++++++++++++++
> include/linux/iommu-debug-pagealloc.h | 17 ++++++++++
> mm/page_ext.c | 4 +++
> 6 files changed, 75 insertions(+)
> create mode 100644 drivers/iommu/iommu-debug-pagealloc.c
> create mode 100644 include/linux/iommu-debug-pagealloc.h
This looks like a pretty handy feature to me, but I have some nits below.
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 6c42061ca20e..9a1c4ac8ba96 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2557,6 +2557,12 @@
> 1 - Bypass the IOMMU for DMA.
> unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
>
> + iommu.debug_pagealloc=
> + [KNL,EARLY] When CONFIG_IOMMU_DEBUG_PAGEALLOC is set, this
> + parameter enables the feature at boot time. By default, it
> + is disabled and the system will work mostly the same as a
> + kernel built without CONFIG_IOMMU_DEBUG_PAGEALLOC.
Can you be more specific about "mostly the same"?
> +
> io7= [HW] IO7 for Marvel-based Alpha systems
> See comment before marvel_specify_io7 in
> arch/alpha/kernel/core_marvel.c.
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 70d29b14d851..6b5e9a2d936a 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -383,4 +383,19 @@ config SPRD_IOMMU
>
> Say Y here if you want to use the multimedia devices listed above.
>
> +config IOMMU_DEBUG_PAGEALLOC
> + bool "Debug page memory allocations against IOMMU"
Perhaps "IOMMU mappings" would make this a little clearer?
> + depends on DEBUG_PAGEALLOC && IOMMU_API && PAGE_EXTENSION
> + help
> + This config checks that a page is freed(unmapped) or mapped by the
> + kernel is not mapped in any IOMMU domain.
I can't really parse this sentence :/
> It can help with debugging
> + use-after-free or out-of-bound maps from drivers doing DMA through
> + the IOMMU API.
> + This santaizer can have false-negative cases where some problems
> + won't be detected.
Maybe just say "The sanitizer is best-effort and can fail to detect problems
in the case that ...".
> + Expect overhead when enabling this and enabling the kernel command
> + line iommu.debug_pagealloc.
I'd reword this to say something like "Due to the overhead of the sanitiser,
iommu.debug_pagealloc must also be passed on the kernel command-line to
enable this feature".
> +
> + If unsure, say N here.
> +
> endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 355294fa9033..8f5130b6a671 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -34,3 +34,4 @@ obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
> obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o
> obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
> obj-$(CONFIG_APPLE_DART) += apple-dart.o
> +obj-$(CONFIG_IOMMU_DEBUG_PAGEALLOC) += iommu-debug-pagealloc.o
> diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> new file mode 100644
> index 000000000000..385c8bfae02b
> --- /dev/null
> +++ b/drivers/iommu/iommu-debug-pagealloc.c
> @@ -0,0 +1,32 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2025 - Google Inc
> + * Author: Mostafa Saleh <smostafa@google.com>
> + * IOMMU API debug page alloc sanitizer
> + */
> +#include <linux/atomic.h>
> +#include <linux/iommu-debug-pagealloc.h>
> +#include <linux/kernel.h>
> +#include <linux/page_ext.h>
> +
> +static bool needed;
> +
> +struct iommu_debug_metadate {
> + atomic_t ref;
> +};
s/metadate/metadata/
Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 4/4] drivers/iommu-debug-pagealloc: Check mapped/unmapped kernel memory
2025-11-06 16:39 ` [PATCH v2 4/4] drivers/iommu-debug-pagealloc: Check mapped/unmapped kernel memory Mostafa Saleh
@ 2025-11-13 10:59 ` Will Deacon
2025-11-24 12:38 ` Mostafa Saleh
0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2025-11-13 10:59 UTC (permalink / raw)
To: Mostafa Saleh
Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro,
robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, Qinxin Xia
On Thu, Nov 06, 2025 at 04:39:53PM +0000, Mostafa Saleh wrote:
> Now, as the page_ext holds count of IOMMU mappings, we can use it to
> assert that any page allocated/freed is indeed not in the IOMMU.
>
> The sanitizer doesn’t protect against mapping/unmapping during this
> period. However, that’s less harmful as the page is not used by the
> kernel.
>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
> ---
> drivers/iommu/iommu-debug-pagealloc.c | 19 +++++++++++++++++++
> include/linux/iommu-debug-pagealloc.h | 12 ++++++++++++
> include/linux/mm.h | 5 +++++
> 3 files changed, 36 insertions(+)
>
> diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> index 0e14104b971c..5b26c84d3a0e 100644
> --- a/drivers/iommu/iommu-debug-pagealloc.c
> +++ b/drivers/iommu/iommu-debug-pagealloc.c
> @@ -71,6 +71,25 @@ static size_t iommu_debug_page_size(struct iommu_domain *domain)
> return 1UL << __ffs(domain->pgsize_bitmap);
> }
>
> +static unsigned int iommu_debug_page_count(unsigned long phys)
'phys_addr_t phys' ?
But having said that, wouldn't you be better off taking the
'struct page *' here rather than converting it to a physical address
only for get_iommu_page_ext() to convert it straight back again?
> +{
> + unsigned int ref;
> + struct page_ext *page_ext = get_iommu_page_ext(phys);
> + struct iommu_debug_metadate *d = get_iommu_data(page_ext);
> +
> + ref = atomic_read(&d->ref);
> + page_ext_put(page_ext);
> + return ref;
> +}
> +
> +void __iommu_debug_check_unmapped(const struct page *page, int numpages)
> +{
> + while (numpages--) {
> + WARN_ON(iommu_debug_page_count(page_to_phys(page)));
Since you only care about the count being non-zero, perhaps tweak
iommu_debug_page_count() to be something like:
bool iommu_debug_page_referenced(struct page *);
Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/4] drivers/iommu: Add calls for IOMMU_DEBUG_PAGEALLOC
2025-11-06 16:39 ` [PATCH v2 2/4] drivers/iommu: Add calls " Mostafa Saleh
@ 2025-11-13 11:00 ` Will Deacon
2025-11-24 11:23 ` Mostafa Saleh
0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2025-11-13 11:00 UTC (permalink / raw)
To: Mostafa Saleh
Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro,
robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, Qinxin Xia
On Thu, Nov 06, 2025 at 04:39:51PM +0000, Mostafa Saleh wrote:
> Add calls for the new iommu debug config IOMMU_DEBUG_PAGEALLOC:
> - iommu_debug_init: Enable the debug mode if configured by the user.
> - iommu_debug_map: Track iommu pages mapped, using physical address.
> - iommu_debug_unmap: Track iommu pages unmapped, using IO virtual
> address.
> - iommu_debug_remap: Track iommu pages, already mapped using IOVA.
>
> We have to do the unmap/remap as once pages are unmapped we lose the
> information of the physical address.
> This is racy, but the API is racy by construction as it uses refcounts
> and doesn't attempt to lock/synchronize with the IOMMU API as that will
> be costly, meaning that possibility of false negative exists.
>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
> ---
> drivers/iommu/iommu-debug-pagealloc.c | 23 ++++++++++++
> drivers/iommu/iommu.c | 14 ++++++-
> include/linux/iommu-debug-pagealloc.h | 54 +++++++++++++++++++++++++++
> 3 files changed, 89 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> index 385c8bfae02b..a6a2f844b09d 100644
> --- a/drivers/iommu/iommu-debug-pagealloc.c
> +++ b/drivers/iommu/iommu-debug-pagealloc.c
> @@ -5,11 +5,13 @@
> * IOMMU API debug page alloc sanitizer
> */
> #include <linux/atomic.h>
> +#include <linux/iommu.h>
> #include <linux/iommu-debug-pagealloc.h>
> #include <linux/kernel.h>
> #include <linux/page_ext.h>
>
> static bool needed;
> +DEFINE_STATIC_KEY_FALSE(iommu_debug_initialized);
>
> struct iommu_debug_metadate {
> atomic_t ref;
> @@ -25,6 +27,27 @@ struct page_ext_operations page_iommu_debug_ops = {
> .need = need_iommu_debug,
> };
>
> +void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
> +{
> +}
> +
> +void __iommu_debug_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
> +{
> +}
> +
> +void __iommu_debug_remap(struct iommu_domain *domain, unsigned long iova, size_t size)
> +{
> +}
Since the IOMMU API doesn't really have a "remap" operation, I wonder
whether it would be clearer to have unmap_begin() and unmap_end()
functions instead? You'd probably want to call them as a pair, so the
check for unmapped < size would move into unmap_end().
Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 3/4] drivers/iommu-debug-pagealloc: Track IOMMU pages
2025-11-06 16:39 ` [PATCH v2 3/4] drivers/iommu-debug-pagealloc: Track IOMMU pages Mostafa Saleh
@ 2025-11-13 11:00 ` Will Deacon
2025-11-24 12:37 ` Mostafa Saleh
0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2025-11-13 11:00 UTC (permalink / raw)
To: Mostafa Saleh
Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro,
robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, Qinxin Xia
On Thu, Nov 06, 2025 at 04:39:52PM +0000, Mostafa Saleh wrote:
> Using the new calls, use an atomic refcount to track how many times
> a page is mapped in any of the IOMMUs.
>
> For unmap we need to use iova_to_phys() to get the physical address
> of the pages.
>
> We use the smallest supported page size as the granularity of tracking
> per domain.
> This is important as it possible to map pages and unmap them with
> larger sizes (as in map_sg()) cases.
>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
> ---
> drivers/iommu/iommu-debug-pagealloc.c | 74 +++++++++++++++++++++++++++
> 1 file changed, 74 insertions(+)
>
> diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> index a6a2f844b09d..0e14104b971c 100644
> --- a/drivers/iommu/iommu-debug-pagealloc.c
> +++ b/drivers/iommu/iommu-debug-pagealloc.c
> @@ -27,16 +27,90 @@ struct page_ext_operations page_iommu_debug_ops = {
> .need = need_iommu_debug,
> };
>
> +static struct page_ext *get_iommu_page_ext(phys_addr_t phys)
> +{
> + struct page *page = phys_to_page(phys);
> + struct page_ext *page_ext = page_ext_get(page);
> +
> + return page_ext;
> +}
> +
> +static struct iommu_debug_metadate *get_iommu_data(struct page_ext *page_ext)
> +{
> + return page_ext_data(page_ext, &page_iommu_debug_ops);
> +}
> +
> +static void iommu_debug_inc_page(phys_addr_t phys)
> +{
> + struct page_ext *page_ext = get_iommu_page_ext(phys);
> + struct iommu_debug_metadate *d = get_iommu_data(page_ext);
> +
> + WARN_ON(atomic_inc_return(&d->ref) <= 0);
Is it worth dumping some information about the page in addition to the
WARN_ON()? That way, you might be able to benefit from other debug
options (e.g. PAGE_OWNER) if they are enabled.
> + page_ext_put(page_ext);
> +}
> +
> +static void iommu_debug_dec_page(phys_addr_t phys)
> +{
> + struct page_ext *page_ext = get_iommu_page_ext(phys);
> + struct iommu_debug_metadate *d = get_iommu_data(page_ext);
> +
> + WARN_ON(atomic_dec_return(&d->ref) < 0);
nit: I can't see why you need memory ordering guarantees for the refcount,
so you could use the relaxed variants for the inc/dec operations.
> + page_ext_put(page_ext);
> +}
> +
> +/*
> + * IOMMU page size might not match the CPU page size, in that case, we use
> + * the smallest IOMMU page size to refcount the pages in the vmemmap.
> + * That is important as both map and unmap has to use the same page size
> + * to update the refcount to avoid double counting the same page.
> + * And as we can't know from iommu_unmap() what was the original page size
> + * used for map, we just use the minimum supported one for both.
> + */
> +static size_t iommu_debug_page_size(struct iommu_domain *domain)
> +{
> + return 1UL << __ffs(domain->pgsize_bitmap);
> +}
> +
> void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
> {
> + size_t off;
> + size_t page_size = iommu_debug_page_size(domain);
Since this is a debug feature, is it worth checking other properties of
the arguments too? For example, that phys is non-zero and that phys +
size doesn't overflow?
> + for (off = 0 ; off < size ; off += page_size) {
> + if (!pfn_valid(__phys_to_pfn(phys + off)))
> + continue;
> + iommu_debug_inc_page(phys + off);
> + }
> }
>
> void __iommu_debug_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
> {
> + size_t off;
> + size_t page_size = iommu_debug_page_size(domain);
> +
> + for (off = 0 ; off < size ; off += page_size) {
> + phys_addr_t phys = iommu_iova_to_phys(domain, iova + off);
> +
> + if (!phys || !pfn_valid(__phys_to_pfn(phys + off)))
> + continue;
Hmm, it looks weird to add 'off' to both 'iova' _and_ the resulting
physical address. Is that correct?
> + iommu_debug_dec_page(phys);
> + }
> }
>
> void __iommu_debug_remap(struct iommu_domain *domain, unsigned long iova, size_t size)
> {
> + size_t off;
> + size_t page_size = iommu_debug_page_size(domain);
> +
> + for (off = 0 ; off < size ; off += page_size) {
> + phys_addr_t phys = iommu_iova_to_phys(domain, iova + off);
> +
> + if (!phys || !pfn_valid(__phys_to_pfn(phys + off)))
> + continue;
> +
> + iommu_debug_inc_page(phys);
> + }
You can make the bulk of this code common with the unmap function.
Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/4] drivers/iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
2025-11-06 19:50 ` Randy Dunlap
@ 2025-11-24 11:04 ` Mostafa Saleh
0 siblings, 0 replies; 18+ messages in thread
From: Mostafa Saleh @ 2025-11-24 11:04 UTC (permalink / raw)
To: Randy Dunlap
Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro, will,
robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, Qinxin Xia
On Thu, Nov 06, 2025 at 11:50:11AM -0800, Randy Dunlap wrote:
>
>
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 70d29b14d851..6b5e9a2d936a 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -383,4 +383,19 @@ config SPRD_IOMMU
> >
> > Say Y here if you want to use the multimedia devices listed above.
> >
> > +config IOMMU_DEBUG_PAGEALLOC
> > + bool "Debug page memory allocations against IOMMU"
> > + depends on DEBUG_PAGEALLOC && IOMMU_API && PAGE_EXTENSION
> > + help
> > + This config checks that a page is freed(unmapped) or mapped by the
> > + kernel is not mapped in any IOMMU domain. It can help with debugging
> > + use-after-free or out-of-bound maps from drivers doing DMA through
> > + the IOMMU API.
> > + This santaizer can have false-negative cases where some problems
>
> sanitizer
I will fix it.
Thanks,
Mostafa
>
> > + won't be detected.
> > + Expect overhead when enabling this and enabling the kernel command
> > + line iommu.debug_pagealloc.
> > +
> > + If unsure, say N here.
> > +
> > endif # IOMMU_SUPPORT
>
>
> --
> ~Randy
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/4] drivers/iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
2025-11-13 10:05 ` Will Deacon
@ 2025-11-24 11:10 ` Mostafa Saleh
2025-11-24 12:45 ` Mostafa Saleh
0 siblings, 1 reply; 18+ messages in thread
From: Mostafa Saleh @ 2025-11-24 11:10 UTC (permalink / raw)
To: Will Deacon
Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro,
robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, Qinxin Xia
On Thu, Nov 13, 2025 at 10:05:19AM +0000, Will Deacon wrote:
> Hi Mostafa,
>
> On Thu, Nov 06, 2025 at 04:39:50PM +0000, Mostafa Saleh wrote:
> > Add a new config IOMMU_DEBUG_PAGEALLOC, which registers new data to
> > page_ext.
> > This config will be used by the IOMMU API to track pages mapped in
> > the IOMMU to catch drivers trying to free kernel memory that they
> > still map in their domains, causing all types of memory corruption.
> > This behaviour is disabled by default and can be enabled using
> > kernel cmdline iommu.debug_pagealloc.
> >
> > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
> > ---
> > .../admin-guide/kernel-parameters.txt | 6 ++++
> > drivers/iommu/Kconfig | 15 +++++++++
> > drivers/iommu/Makefile | 1 +
> > drivers/iommu/iommu-debug-pagealloc.c | 32 +++++++++++++++++++
> > include/linux/iommu-debug-pagealloc.h | 17 ++++++++++
> > mm/page_ext.c | 4 +++
> > 6 files changed, 75 insertions(+)
> > create mode 100644 drivers/iommu/iommu-debug-pagealloc.c
> > create mode 100644 include/linux/iommu-debug-pagealloc.h
>
> This looks like a pretty handy feature to me, but I have some nits below.
Thanks for taking the time to review the patches!
>
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 6c42061ca20e..9a1c4ac8ba96 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -2557,6 +2557,12 @@
> > 1 - Bypass the IOMMU for DMA.
> > unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
> >
> > + iommu.debug_pagealloc=
> > + [KNL,EARLY] When CONFIG_IOMMU_DEBUG_PAGEALLOC is set, this
> > + parameter enables the feature at boot time. By default, it
> > + is disabled and the system will work mostly the same as a
> > + kernel built without CONFIG_IOMMU_DEBUG_PAGEALLOC.
>
> Can you be more specific about "mostly the same"?
The only difference is that the static key to gate the calls, I was not sure if
saying “exactly the same” is correct, but I think it’s better avoid “mostly” as
it might be confusing and as the data in the cover letter shows no overhead,
I will re-write the whole help anyway.
>
> > +
> > io7= [HW] IO7 for Marvel-based Alpha systems
> > See comment before marvel_specify_io7 in
> > arch/alpha/kernel/core_marvel.c.
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 70d29b14d851..6b5e9a2d936a 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -383,4 +383,19 @@ config SPRD_IOMMU
> >
> > Say Y here if you want to use the multimedia devices listed above.
> >
> > +config IOMMU_DEBUG_PAGEALLOC
> > + bool "Debug page memory allocations against IOMMU"
>
> Perhaps "IOMMU mappings" would make this a little clearer?
Will do.
>
> > + depends on DEBUG_PAGEALLOC && IOMMU_API && PAGE_EXTENSION
> > + help
> > + This config checks that a page is freed(unmapped) or mapped by the
> > + kernel is not mapped in any IOMMU domain.
>
> I can't really parse this sentence :/
I will re-write it.
>
> > It can help with debugging
> > + use-after-free or out-of-bound maps from drivers doing DMA through
> > + the IOMMU API.
> > + This santaizer can have false-negative cases where some problems
> > + won't be detected.
>
> Maybe just say "The sanitizer is best-effort and can fail to detect problems
> in the case that ...".
Makes sense, will do.
>
> > + Expect overhead when enabling this and enabling the kernel command
> > + line iommu.debug_pagealloc.
>
> I'd reword this to say something like "Due to the overhead of the sanitiser,
> iommu.debug_pagealloc must also be passed on the kernel command-line to
> enable this feature".
Will do.
>
> > +
> > + If unsure, say N here.
> > +
> > endif # IOMMU_SUPPORT
> > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> > index 355294fa9033..8f5130b6a671 100644
> > --- a/drivers/iommu/Makefile
> > +++ b/drivers/iommu/Makefile
> > @@ -34,3 +34,4 @@ obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
> > obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o
> > obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
> > obj-$(CONFIG_APPLE_DART) += apple-dart.o
> > +obj-$(CONFIG_IOMMU_DEBUG_PAGEALLOC) += iommu-debug-pagealloc.o
> > diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> > new file mode 100644
> > index 000000000000..385c8bfae02b
> > --- /dev/null
> > +++ b/drivers/iommu/iommu-debug-pagealloc.c
> > @@ -0,0 +1,32 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (C) 2025 - Google Inc
> > + * Author: Mostafa Saleh <smostafa@google.com>
> > + * IOMMU API debug page alloc sanitizer
> > + */
> > +#include <linux/atomic.h>
> > +#include <linux/iommu-debug-pagealloc.h>
> > +#include <linux/kernel.h>
> > +#include <linux/page_ext.h>
> > +
> > +static bool needed;
> > +
> > +struct iommu_debug_metadate {
> > + atomic_t ref;
> > +};
>
> s/metadate/metadata/
Ah, that's embarrassing, I will fix it.
Thanks,
Mostafa
>
> Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/4] drivers/iommu: Add calls for IOMMU_DEBUG_PAGEALLOC
2025-11-13 11:00 ` Will Deacon
@ 2025-11-24 11:23 ` Mostafa Saleh
0 siblings, 0 replies; 18+ messages in thread
From: Mostafa Saleh @ 2025-11-24 11:23 UTC (permalink / raw)
To: Will Deacon
Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro,
robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, Qinxin Xia
On Thu, Nov 13, 2025 at 11:00:15AM +0000, Will Deacon wrote:
> On Thu, Nov 06, 2025 at 04:39:51PM +0000, Mostafa Saleh wrote:
> > Add calls for the new iommu debug config IOMMU_DEBUG_PAGEALLOC:
> > - iommu_debug_init: Enable the debug mode if configured by the user.
> > - iommu_debug_map: Track iommu pages mapped, using physical address.
> > - iommu_debug_unmap: Track iommu pages unmapped, using IO virtual
> > address.
> > - iommu_debug_remap: Track iommu pages, already mapped using IOVA.
> >
> > We have to do the unmap/remap as once pages are unmapped we lose the
> > information of the physical address.
> > This is racy, but the API is racy by construction as it uses refcounts
> > and doesn't attempt to lock/synchronize with the IOMMU API as that will
> > be costly, meaning that possibility of false negative exists.
> >
> > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
> > ---
> > drivers/iommu/iommu-debug-pagealloc.c | 23 ++++++++++++
> > drivers/iommu/iommu.c | 14 ++++++-
> > include/linux/iommu-debug-pagealloc.h | 54 +++++++++++++++++++++++++++
> > 3 files changed, 89 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> > index 385c8bfae02b..a6a2f844b09d 100644
> > --- a/drivers/iommu/iommu-debug-pagealloc.c
> > +++ b/drivers/iommu/iommu-debug-pagealloc.c
> > @@ -5,11 +5,13 @@
> > * IOMMU API debug page alloc sanitizer
> > */
> > #include <linux/atomic.h>
> > +#include <linux/iommu.h>
> > #include <linux/iommu-debug-pagealloc.h>
> > #include <linux/kernel.h>
> > #include <linux/page_ext.h>
> >
> > static bool needed;
> > +DEFINE_STATIC_KEY_FALSE(iommu_debug_initialized);
> >
> > struct iommu_debug_metadate {
> > atomic_t ref;
> > @@ -25,6 +27,27 @@ struct page_ext_operations page_iommu_debug_ops = {
> > .need = need_iommu_debug,
> > };
> >
> > +void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
> > +{
> > +}
> > +
> > +void __iommu_debug_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
> > +{
> > +}
> > +
> > +void __iommu_debug_remap(struct iommu_domain *domain, unsigned long iova, size_t size)
> > +{
> > +}
>
> Since the IOMMU API doesn't really have a "remap" operation, I wonder
> whether it would be clearer to have unmap_begin() and unmap_end()
> functions instead? You'd probably want to call them as a pair, so the
> check for unmapped < size would move into unmap_end().
I guess that would be cleaner, but we have to pass both size and
unmapped to unmap_end, I will give it a try.
Thanks,
Mostafa
>
> Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 3/4] drivers/iommu-debug-pagealloc: Track IOMMU pages
2025-11-13 11:00 ` Will Deacon
@ 2025-11-24 12:37 ` Mostafa Saleh
2025-11-24 15:35 ` Will Deacon
0 siblings, 1 reply; 18+ messages in thread
From: Mostafa Saleh @ 2025-11-24 12:37 UTC (permalink / raw)
To: Will Deacon
Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro,
robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, Qinxin Xia
On Thu, Nov 13, 2025 at 11:00:29AM +0000, Will Deacon wrote:
> On Thu, Nov 06, 2025 at 04:39:52PM +0000, Mostafa Saleh wrote:
> > Using the new calls, use an atomic refcount to track how many times
> > a page is mapped in any of the IOMMUs.
> >
> > For unmap we need to use iova_to_phys() to get the physical address
> > of the pages.
> >
> > We use the smallest supported page size as the granularity of tracking
> > per domain.
> > This is important as it possible to map pages and unmap them with
> > larger sizes (as in map_sg()) cases.
> >
> > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
> > ---
> > drivers/iommu/iommu-debug-pagealloc.c | 74 +++++++++++++++++++++++++++
> > 1 file changed, 74 insertions(+)
> >
> > diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> > index a6a2f844b09d..0e14104b971c 100644
> > --- a/drivers/iommu/iommu-debug-pagealloc.c
> > +++ b/drivers/iommu/iommu-debug-pagealloc.c
> > @@ -27,16 +27,90 @@ struct page_ext_operations page_iommu_debug_ops = {
> > .need = need_iommu_debug,
> > };
> >
> > +static struct page_ext *get_iommu_page_ext(phys_addr_t phys)
> > +{
> > + struct page *page = phys_to_page(phys);
> > + struct page_ext *page_ext = page_ext_get(page);
> > +
> > + return page_ext;
> > +}
> > +
> > +static struct iommu_debug_metadate *get_iommu_data(struct page_ext *page_ext)
> > +{
> > + return page_ext_data(page_ext, &page_iommu_debug_ops);
> > +}
> > +
> > +static void iommu_debug_inc_page(phys_addr_t phys)
> > +{
> > + struct page_ext *page_ext = get_iommu_page_ext(phys);
> > + struct iommu_debug_metadate *d = get_iommu_data(page_ext);
> > +
> > + WARN_ON(atomic_inc_return(&d->ref) <= 0);
>
> Is it worth dumping some information about the page in addition to the
> WARN_ON()? That way, you might be able to benefit from other debug
> options (e.g. PAGE_OWNER) if they are enabled.
These WARN_ON are for overflows, which should never happen.
I initially thought about using the refcount_t, but it didn’t seem
suitable as refcount_add() expects that the refcount is already “1”
indicating that an object was already created which doesn’t fit
in the semantics of what this is. Similar for refcount_dec().
In the next patch there is a WARN_ON for the refcount check
to capture the mis-behaving context, I will add a debug print with
the leaked physical address in that case as this is the important one.
>
> > + page_ext_put(page_ext);
> > +}
> > +
> > +static void iommu_debug_dec_page(phys_addr_t phys)
> > +{
> > + struct page_ext *page_ext = get_iommu_page_ext(phys);
> > + struct iommu_debug_metadate *d = get_iommu_data(page_ext);
> > +
> > + WARN_ON(atomic_dec_return(&d->ref) < 0);
>
> nit: I can't see why you need memory ordering guarantees for the refcount,
> so you could use the relaxed variants for the inc/dec operations.
Will do.
>
> > + page_ext_put(page_ext);
> > +}
> > +
> > +/*
> > + * IOMMU page size might not match the CPU page size, in that case, we use
> > + * the smallest IOMMU page size to refcount the pages in the vmemmap.
> > + * That is important as both map and unmap has to use the same page size
> > + * to update the refcount to avoid double counting the same page.
> > + * And as we can't know from iommu_unmap() what was the original page size
> > + * used for map, we just use the minimum supported one for both.
> > + */
> > +static size_t iommu_debug_page_size(struct iommu_domain *domain)
> > +{
> > + return 1UL << __ffs(domain->pgsize_bitmap);
> > +}
> > +
> > void __iommu_debug_map(struct iommu_domain *domain, phys_addr_t phys, size_t size)
> > {
> > + size_t off;
> > + size_t page_size = iommu_debug_page_size(domain);
>
> Since this is a debug feature, is it worth checking other properties of
> the arguments too? For example, that phys is non-zero and that phys +
> size doesn't overflow?
>
Makes sense, I will add some more checks.
> > + for (off = 0 ; off < size ; off += page_size) {
> > + if (!pfn_valid(__phys_to_pfn(phys + off)))
> > + continue;
> > + iommu_debug_inc_page(phys + off);
> > + }
> > }
> >
> > void __iommu_debug_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
> > {
> > + size_t off;
> > + size_t page_size = iommu_debug_page_size(domain);
> > +
> > + for (off = 0 ; off < size ; off += page_size) {
> > + phys_addr_t phys = iommu_iova_to_phys(domain, iova + off);
> > +
> > + if (!phys || !pfn_valid(__phys_to_pfn(phys + off)))
> > + continue;
>
> Hmm, it looks weird to add 'off' to both 'iova' _and_ the resulting
> physical address. Is that correct?
>
Yes, that's a bug, I will fix it.
> > + iommu_debug_dec_page(phys);
> > + }
> > }
> >
> > void __iommu_debug_remap(struct iommu_domain *domain, unsigned long iova, size_t size)
> > {
> > + size_t off;
> > + size_t page_size = iommu_debug_page_size(domain);
> > +
> > + for (off = 0 ; off < size ; off += page_size) {
> > + phys_addr_t phys = iommu_iova_to_phys(domain, iova + off);
> > +
> > + if (!phys || !pfn_valid(__phys_to_pfn(phys + off)))
> > + continue;
> > +
> > + iommu_debug_inc_page(phys);
> > + }
>
> You can make the bulk of this code common with the unmap function.
Will do.
Thank,
Mostafa
>
> Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 4/4] drivers/iommu-debug-pagealloc: Check mapped/unmapped kernel memory
2025-11-13 10:59 ` Will Deacon
@ 2025-11-24 12:38 ` Mostafa Saleh
0 siblings, 0 replies; 18+ messages in thread
From: Mostafa Saleh @ 2025-11-24 12:38 UTC (permalink / raw)
To: Will Deacon
Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro,
robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, Qinxin Xia
On Thu, Nov 13, 2025 at 10:59:57AM +0000, Will Deacon wrote:
> On Thu, Nov 06, 2025 at 04:39:53PM +0000, Mostafa Saleh wrote:
> > Now, as the page_ext holds count of IOMMU mappings, we can use it to
> > assert that any page allocated/freed is indeed not in the IOMMU.
> >
> > The sanitizer doesn’t protect against mapping/unmapping during this
> > period. However, that’s less harmful as the page is not used by the
> > kernel.
> >
> > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
> > ---
> > drivers/iommu/iommu-debug-pagealloc.c | 19 +++++++++++++++++++
> > include/linux/iommu-debug-pagealloc.h | 12 ++++++++++++
> > include/linux/mm.h | 5 +++++
> > 3 files changed, 36 insertions(+)
> >
> > diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> > index 0e14104b971c..5b26c84d3a0e 100644
> > --- a/drivers/iommu/iommu-debug-pagealloc.c
> > +++ b/drivers/iommu/iommu-debug-pagealloc.c
> > @@ -71,6 +71,25 @@ static size_t iommu_debug_page_size(struct iommu_domain *domain)
> > return 1UL << __ffs(domain->pgsize_bitmap);
> > }
> >
> > +static unsigned int iommu_debug_page_count(unsigned long phys)
>
> 'phys_addr_t phys' ?
>
> But having said that, wouldn't you be better off taking the
> 'struct page *' here rather than converting it to a physical address
> only for get_iommu_page_ext() to convert it straight back again?
Will do, we will need the physical address anyway for the error message.
>
> > +{
> > + unsigned int ref;
> > + struct page_ext *page_ext = get_iommu_page_ext(phys);
> > + struct iommu_debug_metadate *d = get_iommu_data(page_ext);
> > +
> > + ref = atomic_read(&d->ref);
> > + page_ext_put(page_ext);
> > + return ref;
> > +}
> > +
> > +void __iommu_debug_check_unmapped(const struct page *page, int numpages)
> > +{
> > + while (numpages--) {
> > + WARN_ON(iommu_debug_page_count(page_to_phys(page)));
>
> Since you only care about the count being non-zero, perhaps tweak
> iommu_debug_page_count() to be something like:
>
> bool iommu_debug_page_referenced(struct page *);
>
Will do.
Thanks,
Mostafa
> Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/4] drivers/iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
2025-11-24 11:10 ` Mostafa Saleh
@ 2025-11-24 12:45 ` Mostafa Saleh
0 siblings, 0 replies; 18+ messages in thread
From: Mostafa Saleh @ 2025-11-24 12:45 UTC (permalink / raw)
To: Will Deacon
Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro,
robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, Qinxin Xia
On Mon, Nov 24, 2025 at 11:10 AM Mostafa Saleh <smostafa@google.com> wrote:
>
> On Thu, Nov 13, 2025 at 10:05:19AM +0000, Will Deacon wrote:
> > Hi Mostafa,
> >
> > On Thu, Nov 06, 2025 at 04:39:50PM +0000, Mostafa Saleh wrote:
> > > Add a new config IOMMU_DEBUG_PAGEALLOC, which registers new data to
> > > page_ext.
> > > This config will be used by the IOMMU API to track pages mapped in
> > > the IOMMU to catch drivers trying to free kernel memory that they
> > > still map in their domains, causing all types of memory corruption.
> > > This behaviour is disabled by default and can be enabled using
> > > kernel cmdline iommu.debug_pagealloc.
> > >
> > > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > > Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
> > > ---
> > > .../admin-guide/kernel-parameters.txt | 6 ++++
> > > drivers/iommu/Kconfig | 15 +++++++++
> > > drivers/iommu/Makefile | 1 +
> > > drivers/iommu/iommu-debug-pagealloc.c | 32 +++++++++++++++++++
> > > include/linux/iommu-debug-pagealloc.h | 17 ++++++++++
> > > mm/page_ext.c | 4 +++
> > > 6 files changed, 75 insertions(+)
> > > create mode 100644 drivers/iommu/iommu-debug-pagealloc.c
> > > create mode 100644 include/linux/iommu-debug-pagealloc.h
> >
> > This looks like a pretty handy feature to me, but I have some nits below.
>
> Thanks for taking the time to review the patches!
>
> >
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > index 6c42061ca20e..9a1c4ac8ba96 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -2557,6 +2557,12 @@
> > > 1 - Bypass the IOMMU for DMA.
> > > unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
> > >
> > > + iommu.debug_pagealloc=
> > > + [KNL,EARLY] When CONFIG_IOMMU_DEBUG_PAGEALLOC is set, this
> > > + parameter enables the feature at boot time. By default, it
> > > + is disabled and the system will work mostly the same as a
> > > + kernel built without CONFIG_IOMMU_DEBUG_PAGEALLOC.
> >
> > Can you be more specific about "mostly the same"?
>
> The only difference is that the static key to gate the calls, I was not sure if
> saying “exactly the same” is correct, but I think it’s better avoid “mostly” as
> it might be confusing and as the data in the cover letter shows no overhead,
> I will re-write the whole help anyway.
>
Ah, I think I copied the base of this from the documenation of
"debug_pagealloc="
which actually slightly changes system behviour by not using THP on some archs,
which is not relevant in our case, I will drop it then.
Thanks,
Mostafa
> >
> > > +
> > > io7= [HW] IO7 for Marvel-based Alpha systems
> > > See comment before marvel_specify_io7 in
> > > arch/alpha/kernel/core_marvel.c.
> > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > > index 70d29b14d851..6b5e9a2d936a 100644
> > > --- a/drivers/iommu/Kconfig
> > > +++ b/drivers/iommu/Kconfig
> > > @@ -383,4 +383,19 @@ config SPRD_IOMMU
> > >
> > > Say Y here if you want to use the multimedia devices listed above.
> > >
> > > +config IOMMU_DEBUG_PAGEALLOC
> > > + bool "Debug page memory allocations against IOMMU"
> >
> > Perhaps "IOMMU mappings" would make this a little clearer?
>
> Will do.
>
> >
> > > + depends on DEBUG_PAGEALLOC && IOMMU_API && PAGE_EXTENSION
> > > + help
> > > + This config checks that a page is freed(unmapped) or mapped by the
> > > + kernel is not mapped in any IOMMU domain.
> >
> > I can't really parse this sentence :/
>
> I will re-write it.
>
> >
> > > It can help with debugging
> > > + use-after-free or out-of-bound maps from drivers doing DMA through
> > > + the IOMMU API.
> > > + This santaizer can have false-negative cases where some problems
> > > + won't be detected.
> >
> > Maybe just say "The sanitizer is best-effort and can fail to detect problems
> > in the case that ...".
>
> Makes sense, will do.
>
> >
> > > + Expect overhead when enabling this and enabling the kernel command
> > > + line iommu.debug_pagealloc.
> >
> > I'd reword this to say something like "Due to the overhead of the sanitiser,
> > iommu.debug_pagealloc must also be passed on the kernel command-line to
> > enable this feature".
>
> Will do.
>
> >
> > > +
> > > + If unsure, say N here.
> > > +
> > > endif # IOMMU_SUPPORT
> > > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> > > index 355294fa9033..8f5130b6a671 100644
> > > --- a/drivers/iommu/Makefile
> > > +++ b/drivers/iommu/Makefile
> > > @@ -34,3 +34,4 @@ obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
> > > obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o
> > > obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
> > > obj-$(CONFIG_APPLE_DART) += apple-dart.o
> > > +obj-$(CONFIG_IOMMU_DEBUG_PAGEALLOC) += iommu-debug-pagealloc.o
> > > diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> > > new file mode 100644
> > > index 000000000000..385c8bfae02b
> > > --- /dev/null
> > > +++ b/drivers/iommu/iommu-debug-pagealloc.c
> > > @@ -0,0 +1,32 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * Copyright (C) 2025 - Google Inc
> > > + * Author: Mostafa Saleh <smostafa@google.com>
> > > + * IOMMU API debug page alloc sanitizer
> > > + */
> > > +#include <linux/atomic.h>
> > > +#include <linux/iommu-debug-pagealloc.h>
> > > +#include <linux/kernel.h>
> > > +#include <linux/page_ext.h>
> > > +
> > > +static bool needed;
> > > +
> > > +struct iommu_debug_metadate {
> > > + atomic_t ref;
> > > +};
> >
> > s/metadate/metadata/
>
> Ah, that's embarrassing, I will fix it.
>
> Thanks,
> Mostafa
>
> >
> > Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 3/4] drivers/iommu-debug-pagealloc: Track IOMMU pages
2025-11-24 12:37 ` Mostafa Saleh
@ 2025-11-24 15:35 ` Will Deacon
2025-11-24 16:01 ` Mostafa Saleh
0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2025-11-24 15:35 UTC (permalink / raw)
To: Mostafa Saleh
Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro,
robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, Qinxin Xia
On Mon, Nov 24, 2025 at 12:37:31PM +0000, Mostafa Saleh wrote:
> On Thu, Nov 13, 2025 at 11:00:29AM +0000, Will Deacon wrote:
> > On Thu, Nov 06, 2025 at 04:39:52PM +0000, Mostafa Saleh wrote:
> > > Using the new calls, use an atomic refcount to track how many times
> > > a page is mapped in any of the IOMMUs.
> > >
> > > For unmap we need to use iova_to_phys() to get the physical address
> > > of the pages.
> > >
> > > We use the smallest supported page size as the granularity of tracking
> > > per domain.
> > > This is important as it possible to map pages and unmap them with
> > > larger sizes (as in map_sg()) cases.
> > >
> > > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > > Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
> > > ---
> > > drivers/iommu/iommu-debug-pagealloc.c | 74 +++++++++++++++++++++++++++
> > > 1 file changed, 74 insertions(+)
> > >
> > > diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> > > index a6a2f844b09d..0e14104b971c 100644
> > > --- a/drivers/iommu/iommu-debug-pagealloc.c
> > > +++ b/drivers/iommu/iommu-debug-pagealloc.c
> > > @@ -27,16 +27,90 @@ struct page_ext_operations page_iommu_debug_ops = {
> > > .need = need_iommu_debug,
> > > };
> > >
> > > +static struct page_ext *get_iommu_page_ext(phys_addr_t phys)
> > > +{
> > > + struct page *page = phys_to_page(phys);
> > > + struct page_ext *page_ext = page_ext_get(page);
> > > +
> > > + return page_ext;
> > > +}
> > > +
> > > +static struct iommu_debug_metadate *get_iommu_data(struct page_ext *page_ext)
> > > +{
> > > + return page_ext_data(page_ext, &page_iommu_debug_ops);
> > > +}
> > > +
> > > +static void iommu_debug_inc_page(phys_addr_t phys)
> > > +{
> > > + struct page_ext *page_ext = get_iommu_page_ext(phys);
> > > + struct iommu_debug_metadate *d = get_iommu_data(page_ext);
> > > +
> > > + WARN_ON(atomic_inc_return(&d->ref) <= 0);
> >
> > Is it worth dumping some information about the page in addition to the
> > WARN_ON()? That way, you might be able to benefit from other debug
> > options (e.g. PAGE_OWNER) if they are enabled.
>
> These WARN_ON are for overflows, which should never happen.
> I initially thought about using the refcount_t, but it didn’t seem
> suitable as refcount_add() expects that the refcount is already “1”
> indicating that an object was already created which doesn’t fit
> in the semantics of what this is. Similar for refcount_dec().
>
> In the next patch there is a WARN_ON for the refcount check
> to capture the mis-behaving context, I will add a debug print with
> the leaked physical address in that case as this is the important one.
I was thinking specifically about calling dump_page_owner().
Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 3/4] drivers/iommu-debug-pagealloc: Track IOMMU pages
2025-11-24 15:35 ` Will Deacon
@ 2025-11-24 16:01 ` Mostafa Saleh
0 siblings, 0 replies; 18+ messages in thread
From: Mostafa Saleh @ 2025-11-24 16:01 UTC (permalink / raw)
To: Will Deacon
Cc: linux-mm, iommu, linux-kernel, linux-doc, corbet, joro,
robin.murphy, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
ziy, david, lorenzo.stoakes, Liam.Howlett, rppt, Qinxin Xia
On Mon, Nov 24, 2025 at 03:35:47PM +0000, Will Deacon wrote:
> On Mon, Nov 24, 2025 at 12:37:31PM +0000, Mostafa Saleh wrote:
> > On Thu, Nov 13, 2025 at 11:00:29AM +0000, Will Deacon wrote:
> > > On Thu, Nov 06, 2025 at 04:39:52PM +0000, Mostafa Saleh wrote:
> > > > Using the new calls, use an atomic refcount to track how many times
> > > > a page is mapped in any of the IOMMUs.
> > > >
> > > > For unmap we need to use iova_to_phys() to get the physical address
> > > > of the pages.
> > > >
> > > > We use the smallest supported page size as the granularity of tracking
> > > > per domain.
> > > > This is important as it possible to map pages and unmap them with
> > > > larger sizes (as in map_sg()) cases.
> > > >
> > > > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > > > Tested-by: Qinxin Xia <xiaqinxin@huawei.com>
> > > > ---
> > > > drivers/iommu/iommu-debug-pagealloc.c | 74 +++++++++++++++++++++++++++
> > > > 1 file changed, 74 insertions(+)
> > > >
> > > > diff --git a/drivers/iommu/iommu-debug-pagealloc.c b/drivers/iommu/iommu-debug-pagealloc.c
> > > > index a6a2f844b09d..0e14104b971c 100644
> > > > --- a/drivers/iommu/iommu-debug-pagealloc.c
> > > > +++ b/drivers/iommu/iommu-debug-pagealloc.c
> > > > @@ -27,16 +27,90 @@ struct page_ext_operations page_iommu_debug_ops = {
> > > > .need = need_iommu_debug,
> > > > };
> > > >
> > > > +static struct page_ext *get_iommu_page_ext(phys_addr_t phys)
> > > > +{
> > > > + struct page *page = phys_to_page(phys);
> > > > + struct page_ext *page_ext = page_ext_get(page);
> > > > +
> > > > + return page_ext;
> > > > +}
> > > > +
> > > > +static struct iommu_debug_metadate *get_iommu_data(struct page_ext *page_ext)
> > > > +{
> > > > + return page_ext_data(page_ext, &page_iommu_debug_ops);
> > > > +}
> > > > +
> > > > +static void iommu_debug_inc_page(phys_addr_t phys)
> > > > +{
> > > > + struct page_ext *page_ext = get_iommu_page_ext(phys);
> > > > + struct iommu_debug_metadate *d = get_iommu_data(page_ext);
> > > > +
> > > > + WARN_ON(atomic_inc_return(&d->ref) <= 0);
> > >
> > > Is it worth dumping some information about the page in addition to the
> > > WARN_ON()? That way, you might be able to benefit from other debug
> > > options (e.g. PAGE_OWNER) if they are enabled.
> >
> > These WARN_ON are for overflows, which should never happen.
> > I initially thought about using the refcount_t, but it didn’t seem
> > suitable as refcount_add() expects that the refcount is already “1”
> > indicating that an object was already created which doesn’t fit
> > in the semantics of what this is. Similar for refcount_dec().
> >
> > In the next patch there is a WARN_ON for the refcount check
> > to capture the mis-behaving context, I will add a debug print with
> > the leaked physical address in that case as this is the important one.
>
> I was thinking specifically about calling dump_page_owner().
I see, that makes sense.
Thanks,
Mostafa
>
> Will
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2025-11-24 16:02 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-06 16:39 [PATCH v2 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer Mostafa Saleh
2025-11-06 16:39 ` [PATCH v2 1/4] drivers/iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC Mostafa Saleh
2025-11-06 19:50 ` Randy Dunlap
2025-11-24 11:04 ` Mostafa Saleh
2025-11-13 10:05 ` Will Deacon
2025-11-24 11:10 ` Mostafa Saleh
2025-11-24 12:45 ` Mostafa Saleh
2025-11-06 16:39 ` [PATCH v2 2/4] drivers/iommu: Add calls " Mostafa Saleh
2025-11-13 11:00 ` Will Deacon
2025-11-24 11:23 ` Mostafa Saleh
2025-11-06 16:39 ` [PATCH v2 3/4] drivers/iommu-debug-pagealloc: Track IOMMU pages Mostafa Saleh
2025-11-13 11:00 ` Will Deacon
2025-11-24 12:37 ` Mostafa Saleh
2025-11-24 15:35 ` Will Deacon
2025-11-24 16:01 ` Mostafa Saleh
2025-11-06 16:39 ` [PATCH v2 4/4] drivers/iommu-debug-pagealloc: Check mapped/unmapped kernel memory Mostafa Saleh
2025-11-13 10:59 ` Will Deacon
2025-11-24 12:38 ` Mostafa Saleh
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox