* Re: [PATCH v3] io: add io_pgtable abstraction
2025-11-12 10:15 [PATCH v3] io: add io_pgtable abstraction Alice Ryhl
@ 2025-11-12 12:57 ` Daniel Almeida
2025-11-17 16:34 ` Alice Ryhl
2025-11-19 10:53 ` Boris Brezillon
2025-11-19 10:56 ` Boris Brezillon
` (2 subsequent siblings)
3 siblings, 2 replies; 12+ messages in thread
From: Daniel Almeida @ 2025-11-12 12:57 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Will Deacon, Boris Brezillon, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Danilo Krummrich, Joerg Roedel, Robin Murphy,
Lorenzo Stoakes, Liam R. Howlett, Asahi Lina, linux-kernel,
rust-for-linux, iommu, linux-mm
Hi Alice,
> On 12 Nov 2025, at 07:15, Alice Ryhl <aliceryhl@google.com> wrote:
>
> From: Asahi Lina <lina+kernel@asahilina.net>
>
> This will be used by the Tyr driver to create and modify the page table
> of each address space on the GPU. Each time a mapping gets created or
> removed by userspace, Tyr will call into GPUVM, which will figure out
> which calls to map_pages and unmap_pages are required to map the data in
> question in the page table so that the GPU may access those pages when
> using that address space.
>
> The Rust type wraps the struct using a raw pointer rather than the usual
> Opaque+ARef approach because Opaque+ARef requires the target type to be
> refcounted.
>
> Signed-off-by: Asahi Lina <lina+kernel@asahilina.net>
> Co-Developed-by: Alice Ryhl <aliceryhl@google.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> This patch is based on [1] but I have rewritten and simplified large
> parts of it. The Asahi driver no longer uses the io-pgtable abstraction,
> and Nova never planned to (since NVIDIA has its own separate memory).
> Therefore, I have simplified these abstractions to fit the needs of the
> Tyr GPU driver.
>
> This series depends on the PhysAddr typedef [2].
>
> [1]: https://lore.kernel.org/all/20250623-io_pgtable-v2-1-fd72daac75f1@collabora.com/
> [2]: https://lore.kernel.org/all/20251112-resource-phys-typedefs-v2-0-538307384f82@google.com/
> ---
> rust/bindings/bindings_helper.h | 3 +-
> rust/kernel/io.rs | 1 +
> rust/kernel/io/pgtable.rs | 254 ++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 257 insertions(+), 1 deletion(-)
>
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 2e43c66635a2c9f31bd99b9817bd2d6ab89fbcf2..faab6bc9463321c092a8bbcb6281175e490caccd 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -56,8 +56,9 @@
> #include <linux/fdtable.h>
> #include <linux/file.h>
> #include <linux/firmware.h>
> -#include <linux/interrupt.h>
> #include <linux/fs.h>
> +#include <linux/interrupt.h>
> +#include <linux/io-pgtable.h>
> #include <linux/ioport.h>
> #include <linux/jiffies.h>
> #include <linux/jump_label.h>
> diff --git a/rust/kernel/io.rs b/rust/kernel/io.rs
> index 56a435eb14e3a1ce72dd58b88cbf296041f1703e..5913e240d5a9814ceed52c6dc1a798e64158d567 100644
> --- a/rust/kernel/io.rs
> +++ b/rust/kernel/io.rs
> @@ -8,6 +8,7 @@
> use crate::{bindings, build_assert, ffi::c_void};
>
> pub mod mem;
> +pub mod pgtable;
> pub mod poll;
> pub mod resource;
>
> diff --git a/rust/kernel/io/pgtable.rs b/rust/kernel/io/pgtable.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..fe05bc1673f9a7741a887a3c9bbad866dd17a2b5
> --- /dev/null
> +++ b/rust/kernel/io/pgtable.rs
> @@ -0,0 +1,254 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! IOMMU page table management.
> +//!
> +//! C header: [`include/io-pgtable.h`](srctree/include/io-pgtable.h)
> +
> +use core::{
> + marker::PhantomData,
> + ptr::NonNull, //
> +};
> +
> +use crate::{
> + alloc,
> + bindings,
> + device::{Bound, Device},
> + devres::Devres,
> + error::to_result,
> + io::PhysAddr,
> + prelude::*, //
> +};
> +
> +use bindings::io_pgtable_fmt;
> +
> +/// Protection flags used with IOMMU mappings.
> +pub mod prot {
> + /// Read access.
> + pub const READ: u32 = bindings::IOMMU_READ;
> + /// Write access.
> + pub const WRITE: u32 = bindings::IOMMU_WRITE;
> + /// Request cache coherency.
> + pub const CACHE: u32 = bindings::IOMMU_CACHE;
> + /// Request no-execute permission.
> + pub const NOEXEC: u32 = bindings::IOMMU_NOEXEC;
> + /// MMIO peripheral mapping.
> + pub const MMIO: u32 = bindings::IOMMU_MMIO;
> + /// Privileged mapping.
> + pub const PRIV: u32 = bindings::IOMMU_PRIV;
> +}
> +
> +/// Represents a requested `io_pgtable` configuration.
> +pub struct Config {
> + /// Quirk bitmask (type-specific).
> + pub quirks: usize,
> + /// Valid page sizes, as a bitmask of powers of two.
> + pub pgsize_bitmap: usize,
> + /// Input address space size in bits.
> + pub ias: u32,
> + /// Output address space size in bits.
> + pub oas: u32,
> + /// IOMMU uses coherent accesses for page table walks.
> + pub coherent_walk: bool,
> +}
> +
> +/// An io page table using a specific format.
> +///
> +/// # Invariants
> +///
> +/// The pointer references a valid io page table.
> +pub struct IoPageTable<F> {
> + ptr: NonNull<bindings::io_pgtable_ops>,
> + _marker: PhantomData<F>,
> +}
> +
> +// SAFETY: `struct io_pgtable_ops` is not restricted to a single thread.
> +unsafe impl<F> Send for IoPageTable<F> {}
> +// SAFETY: `struct io_pgtable_ops` may be accessed concurrently.
> +unsafe impl<F> Sync for IoPageTable<F> {}
> +
> +/// The format used by this page table.
> +pub trait IoPageTableFmt: 'static {
> + /// The value representing this format.
> + const FORMAT: io_pgtable_fmt;
> +}
> +
> +impl<F: IoPageTableFmt> IoPageTable<F> {
> + /// Create a new `IoPageTable` as a device resource.
> + #[inline]
> + pub fn new(
> + dev: &Device<Bound>,
> + config: Config,
> + ) -> impl PinInit<Devres<IoPageTable<F>>, Error> + '_ {
> + // SAFETY: Devres ensures that the value is dropped during device unbind.
> + Devres::new(dev, unsafe { Self::new_raw(dev, config) })
> + }
> +
> + /// Create a new `IoPageTable`.
> + ///
> + /// # Safety
> + ///
> + /// If successful, then the returned value must be dropped before the device is unbound.
> + #[inline]
> + pub unsafe fn new_raw(dev: &Device<Bound>, config: Config) -> Result<IoPageTable<F>> {
> + let mut raw_cfg = bindings::io_pgtable_cfg {
> + quirks: config.quirks,
> + pgsize_bitmap: config.pgsize_bitmap,
> + ias: config.ias,
> + oas: config.oas,
> + coherent_walk: config.coherent_walk,
> + tlb: &raw const NOOP_FLUSH_OPS,
> + iommu_dev: dev.as_raw(),
> + // SAFETY: All zeroes is a valid value for `struct io_pgtable_cfg`.
> + ..unsafe { core::mem::zeroed() }
> + };
> +
> + // SAFETY:
> + // * The raw_cfg pointer is valid for the duration of this call.
> + // * The provided `FLUSH_OPS` contains valid function pointers that accept a null pointer
> + // as cookie.
> + // * The caller ensures that the io pgtable does not outlive the device.
> + let ops = unsafe {
> + bindings::alloc_io_pgtable_ops(F::FORMAT, &mut raw_cfg, core::ptr::null_mut())
> + };
> + // INVARIANT: We successfully created a valid page table.
> + Ok(IoPageTable {
> + ptr: NonNull::new(ops).ok_or(ENOMEM)?,
> + _marker: PhantomData,
> + })
> + }
> +
> + /// Obtain a raw pointer to the underlying `struct io_pgtable_ops`.
> + #[inline]
> + pub fn raw_ops(&self) -> *mut bindings::io_pgtable_ops {
> + self.ptr.as_ptr()
> + }
> +
> + /// Obtain a raw pointer to the underlying `struct io_pgtable`.
> + #[inline]
> + pub fn raw_pgtable(&self) -> *mut bindings::io_pgtable {
> + // SAFETY: The io_pgtable_ops of an io-pgtable is always the ops field of a io_pgtable.
> + unsafe { kernel::container_of!(self.raw_ops(), bindings::io_pgtable, ops) }
> + }
> +
> + /// Obtain a raw pointer to the underlying `struct io_pgtable_cfg`.
> + #[inline]
> + pub fn raw_cfg(&self) -> *mut bindings::io_pgtable_cfg {
> + // SAFETY: The `raw_pgtable()` method returns a valid pointer.
> + unsafe { &raw mut (*self.raw_pgtable()).cfg }
> + }
> +
I don’t think that drivers need these three accessors above to be pub.
> + /// Map a physically contiguous range of pages of the same size.
> + ///
> + /// # Safety
> + ///
> + /// * This page table must not contain any mapping that overlaps with the mapping created by
> + /// this call.
I don't think there is a restriction that forbids you from mapping a region that has already been mapped.
If A->B->C are adjacent regions, and you try to map A -> C when B->C is already mapped, I think this
call will simply return length(A->B).
> + /// * If this page table is live, then the caller must ensure that it's okay to access the
> + /// physical address being mapped for the duration in which it is mapped.
> + #[inline]
> + pub unsafe fn map_pages(
> + &self,
> + iova: usize,
> + paddr: PhysAddr,
> + pgsize: usize,
> + pgcount: usize,
> + prot: u32,
> + flags: alloc::Flags,
> + ) -> Result<usize> {
> + let mut mapped: usize = 0;
> +
> + // SAFETY: The `map_pages` function in `io_pgtable_ops` is never null.
> + let map_pages = unsafe { (*self.raw_ops()).map_pages.unwrap_unchecked() };
> +
> + // SAFETY: The safety requirements of this method are sufficient to call `map_pages`.
> + to_result(unsafe {
> + (map_pages)(
> + self.raw_ops(),
> + iova,
> + paddr,
> + pgsize,
> + pgcount,
> + prot as i32,
> + flags.as_raw(),
> + &mut mapped,
> + )
> + })?;
> +
> + Ok(mapped)
> + }
> +
> + /// Unmap a range of virtually contiguous pages of the same size.
> + ///
> + /// # Safety
> + ///
> + /// This page table must contain a mapping at `iova` that consists of exactly `pgcount` pages
> + /// of size `pgsize`.
Same here. I don’t think the above is necessarily a requirement.
> + #[inline]
> + pub unsafe fn unmap_pages(&self, iova: usize, pgsize: usize, pgcount: usize) -> usize {
> + // SAFETY: The `unmap_pages` function in `io_pgtable_ops` is never null.
> + let unmap_pages = unsafe { (*self.raw_ops()).unmap_pages.unwrap_unchecked() };
> +
> + // SAFETY: The safety requirements of this method are sufficient to call `unmap_pages`.
> + unsafe { (unmap_pages)(self.raw_ops(), iova, pgsize, pgcount, core::ptr::null_mut()) }
> + }
> +}
> +
> +// These bindings are currently designed for use by GPU drivers, which use this page table together
> +// with GPUVM. When using GPUVM, a single mapping operation may be translated into many operations
> +// on the page table, and in that case you generally want to flush the TLB only once per GPUVM
> +// operation. Thus, do not use these callbacks as they would flush more often than needed.
> +static NOOP_FLUSH_OPS: bindings::iommu_flush_ops = bindings::iommu_flush_ops {
> + tlb_flush_all: Some(rust_tlb_flush_all_noop),
> + tlb_flush_walk: Some(rust_tlb_flush_walk_noop),
> + tlb_add_page: None,
> +};
> +
> +#[no_mangle]
> +extern "C" fn rust_tlb_flush_all_noop(_cookie: *mut core::ffi::c_void) {}
> +
> +#[no_mangle]
> +extern "C" fn rust_tlb_flush_walk_noop(
> + _iova: usize,
> + _size: usize,
> + _granule: usize,
> + _cookie: *mut core::ffi::c_void,
> +) {
> +}
> +
> +impl<F> Drop for IoPageTable<F> {
> + fn drop(&mut self) {
> + // SAFETY: The caller of `ttbr` promised that the page table is not live when this
> + // destructor runs.
> + unsafe { bindings::free_io_pgtable_ops(self.0.ops) };
> + }
> +}
> +
> +/// The `ARM_64_LPAE_S1` page table format.
> +pub enum ARM64LPAES1 {}
> +
> +impl IoPageTableFmt for ARM64LPAES1 {
> + const FORMAT: io_pgtable_fmt = bindings::io_pgtable_fmt_ARM_64_LPAE_S1 as io_pgtable_fmt;
> +}
> +
> +impl IoPageTable<ARM64LPAES1> {
> + /// Access the `ttbr` field of the configuration.
> + ///
> + /// This is the physical address of the page table, which may be passed to the device that
> + /// needs to use it.
> + ///
> + /// # Safety
> + ///
> + /// The caller must ensure that the device stops using the page table before dropping it.
> + #[inline]
> + pub unsafe fn ttbr(&self) -> u64 {
> + // SAFETY: `arm_lpae_s1_cfg` is the right cfg type for `ARM64LPAES1`.
> + unsafe { (*self.raw_cfg()).__bindgen_anon_1.arm_lpae_s1_cfg.ttbr }
> + }
> +
> + /// Access the `mair` field of the configuration.
> + #[inline]
> + pub fn mair(&self) -> u64 {
> + // SAFETY: `arm_lpae_s1_cfg` is the right cfg type for `ARM64LPAES1`.
> + unsafe { (*self.raw_cfg()).__bindgen_anon_1.arm_lpae_s1_cfg.mair }
> + }
The two above will indeed be used by drivers, so it makes sense for them to be pub.
> +}
>
> ---
> base-commit: ffee675aceb9f44b0502a8bec912abb0c4f4af62
> change-id: 20251111-io-pgtable-fe0822b4ebdd
> prerequisite-change-id: 20251106-resource-phys-typedefs-6db37927d159:v2
> prerequisite-patch-id: 350421d8dbaf3db51b1243d82077c5eb88f54db5
> prerequisite-patch-id: ac0166fb3cd235de76841789173051191a4d2434
> prerequisite-patch-id: f4bca02c77c40093690b66cdf477f928784bdbf4
> prerequisite-patch-id: 083d1c22b1a7eb0dcae37052b926362543c68e8a
>
> Best regards,
> --
> Alice Ryhl <aliceryhl@google.com>
>
>
This overall looks ok to me, and I quite like how much cleaner it is now.
However, I think a discussion will naturally emerge from making
map/unmap_pages unsafe functions, and it seems like the safety requirements
do not apply.
— Daniel
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v3] io: add io_pgtable abstraction
2025-11-12 12:57 ` Daniel Almeida
@ 2025-11-17 16:34 ` Alice Ryhl
2025-11-19 8:59 ` Boris Brezillon
2025-11-19 10:53 ` Boris Brezillon
1 sibling, 1 reply; 12+ messages in thread
From: Alice Ryhl @ 2025-11-17 16:34 UTC (permalink / raw)
To: Daniel Almeida
Cc: Miguel Ojeda, Will Deacon, Boris Brezillon, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Danilo Krummrich, Joerg Roedel, Robin Murphy,
Lorenzo Stoakes, Liam R. Howlett, Asahi Lina, linux-kernel,
rust-for-linux, iommu, linux-mm
On Wed, Nov 12, 2025 at 09:57:09AM -0300, Daniel Almeida wrote:
> Hi Alice,
>
> > On 12 Nov 2025, at 07:15, Alice Ryhl <aliceryhl@google.com> wrote:
> > + /// Map a physically contiguous range of pages of the same size.
> > + ///
> > + /// # Safety
> > + ///
> > + /// * This page table must not contain any mapping that overlaps with the mapping created by
> > + /// this call.
>
> I don't think there is a restriction that forbids you from mapping a region that has already been mapped.
>
> If A->B->C are adjacent regions, and you try to map A -> C when B->C is already mapped, I think this
> call will simply return length(A->B).
I think that such overlapping ranges are not allowed. It triggers a
warn:
} else if (pte) {
/* We require an unmap first */
WARN_ON(!(cfg->quirks & IO_PGTABLE_QUIRK_NO_WARN));
return -EEXIST;
Alice
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v3] io: add io_pgtable abstraction
2025-11-17 16:34 ` Alice Ryhl
@ 2025-11-19 8:59 ` Boris Brezillon
0 siblings, 0 replies; 12+ messages in thread
From: Boris Brezillon @ 2025-11-19 8:59 UTC (permalink / raw)
To: Alice Ryhl
Cc: Daniel Almeida, Miguel Ojeda, Will Deacon, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Danilo Krummrich, Joerg Roedel, Robin Murphy,
Lorenzo Stoakes, Liam R. Howlett, Asahi Lina, linux-kernel,
rust-for-linux, iommu, linux-mm
On Mon, 17 Nov 2025 16:34:06 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> On Wed, Nov 12, 2025 at 09:57:09AM -0300, Daniel Almeida wrote:
> > Hi Alice,
> >
> > > On 12 Nov 2025, at 07:15, Alice Ryhl <aliceryhl@google.com> wrote:
> > > + /// Map a physically contiguous range of pages of the same size.
> > > + ///
> > > + /// # Safety
> > > + ///
> > > + /// * This page table must not contain any mapping that overlaps with the mapping created by
> > > + /// this call.
> >
> > I don't think there is a restriction that forbids you from mapping a region that has already been mapped.
> >
> > If A->B->C are adjacent regions, and you try to map A -> C when B->C is already mapped, I think this
> > call will simply return length(A->B).
>
> I think that such overlapping ranges are not allowed. It triggers a
> warn:
>
> } else if (pte) {
> /* We require an unmap first */
> WARN_ON(!(cfg->quirks & IO_PGTABLE_QUIRK_NO_WARN));
> return -EEXIST;
I confirm the io_pgtable framework (or at least some of its
implementations) doesn't allow you to do that (remap without an
explicit unmap coming first).
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3] io: add io_pgtable abstraction
2025-11-12 12:57 ` Daniel Almeida
2025-11-17 16:34 ` Alice Ryhl
@ 2025-11-19 10:53 ` Boris Brezillon
1 sibling, 0 replies; 12+ messages in thread
From: Boris Brezillon @ 2025-11-19 10:53 UTC (permalink / raw)
To: Daniel Almeida
Cc: Alice Ryhl, Miguel Ojeda, Will Deacon, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Danilo Krummrich, Joerg Roedel, Robin Murphy,
Lorenzo Stoakes, Liam R. Howlett, Asahi Lina, linux-kernel,
rust-for-linux, iommu, linux-mm
On Wed, 12 Nov 2025 09:57:09 -0300
Daniel Almeida <daniel.almeida@collabora.com> wrote:
> > + /// Unmap a range of virtually contiguous pages of the same size.
> > + ///
> > + /// # Safety
> > + ///
> > + /// This page table must contain a mapping at `iova` that consists of exactly `pgcount` pages
> > + /// of size `pgsize`.
>
> Same here. I don’t think the above is necessarily a requirement.
It's not, indeed, the returned size will tell you how much was unmapped
if there's less to unmap, or the region has holes.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3] io: add io_pgtable abstraction
2025-11-12 10:15 [PATCH v3] io: add io_pgtable abstraction Alice Ryhl
2025-11-12 12:57 ` Daniel Almeida
@ 2025-11-19 10:56 ` Boris Brezillon
2025-11-28 11:56 ` Robin Murphy
2025-11-28 18:02 ` Jason Gunthorpe
3 siblings, 0 replies; 12+ messages in thread
From: Boris Brezillon @ 2025-11-19 10:56 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Will Deacon, Daniel Almeida, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Danilo Krummrich, Joerg Roedel, Robin Murphy,
Lorenzo Stoakes, Liam R. Howlett, Asahi Lina, linux-kernel,
rust-for-linux, iommu, linux-mm
On Wed, 12 Nov 2025 10:15:00 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> From: Asahi Lina <lina+kernel@asahilina.net>
>
> This will be used by the Tyr driver to create and modify the page table
> of each address space on the GPU. Each time a mapping gets created or
> removed by userspace, Tyr will call into GPUVM, which will figure out
> which calls to map_pages and unmap_pages are required to map the data in
> question in the page table so that the GPU may access those pages when
> using that address space.
>
> The Rust type wraps the struct using a raw pointer rather than the usual
> Opaque+ARef approach because Opaque+ARef requires the target type to be
> refcounted.
>
> Signed-off-by: Asahi Lina <lina+kernel@asahilina.net>
> Co-Developed-by: Alice Ryhl <aliceryhl@google.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
This seems to be enough for what we need in Tyr (which is what we
basically have in Panthor, but translated to rust)
I'm no rust expert and I'm not an iommu maintainer either, so I'm not
sure how useful that is, but this is
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
> This patch is based on [1] but I have rewritten and simplified large
> parts of it. The Asahi driver no longer uses the io-pgtable abstraction,
> and Nova never planned to (since NVIDIA has its own separate memory).
> Therefore, I have simplified these abstractions to fit the needs of the
> Tyr GPU driver.
>
> This series depends on the PhysAddr typedef [2].
>
> [1]: https://lore.kernel.org/all/20250623-io_pgtable-v2-1-fd72daac75f1@collabora.com/
> [2]: https://lore.kernel.org/all/20251112-resource-phys-typedefs-v2-0-538307384f82@google.com/
> ---
> rust/bindings/bindings_helper.h | 3 +-
> rust/kernel/io.rs | 1 +
> rust/kernel/io/pgtable.rs | 254 ++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 257 insertions(+), 1 deletion(-)
>
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 2e43c66635a2c9f31bd99b9817bd2d6ab89fbcf2..faab6bc9463321c092a8bbcb6281175e490caccd 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -56,8 +56,9 @@
> #include <linux/fdtable.h>
> #include <linux/file.h>
> #include <linux/firmware.h>
> -#include <linux/interrupt.h>
> #include <linux/fs.h>
> +#include <linux/interrupt.h>
> +#include <linux/io-pgtable.h>
> #include <linux/ioport.h>
> #include <linux/jiffies.h>
> #include <linux/jump_label.h>
> diff --git a/rust/kernel/io.rs b/rust/kernel/io.rs
> index 56a435eb14e3a1ce72dd58b88cbf296041f1703e..5913e240d5a9814ceed52c6dc1a798e64158d567 100644
> --- a/rust/kernel/io.rs
> +++ b/rust/kernel/io.rs
> @@ -8,6 +8,7 @@
> use crate::{bindings, build_assert, ffi::c_void};
>
> pub mod mem;
> +pub mod pgtable;
> pub mod poll;
> pub mod resource;
>
> diff --git a/rust/kernel/io/pgtable.rs b/rust/kernel/io/pgtable.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..fe05bc1673f9a7741a887a3c9bbad866dd17a2b5
> --- /dev/null
> +++ b/rust/kernel/io/pgtable.rs
> @@ -0,0 +1,254 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! IOMMU page table management.
> +//!
> +//! C header: [`include/io-pgtable.h`](srctree/include/io-pgtable.h)
> +
> +use core::{
> + marker::PhantomData,
> + ptr::NonNull, //
> +};
> +
> +use crate::{
> + alloc,
> + bindings,
> + device::{Bound, Device},
> + devres::Devres,
> + error::to_result,
> + io::PhysAddr,
> + prelude::*, //
> +};
> +
> +use bindings::io_pgtable_fmt;
> +
> +/// Protection flags used with IOMMU mappings.
> +pub mod prot {
> + /// Read access.
> + pub const READ: u32 = bindings::IOMMU_READ;
> + /// Write access.
> + pub const WRITE: u32 = bindings::IOMMU_WRITE;
> + /// Request cache coherency.
> + pub const CACHE: u32 = bindings::IOMMU_CACHE;
> + /// Request no-execute permission.
> + pub const NOEXEC: u32 = bindings::IOMMU_NOEXEC;
> + /// MMIO peripheral mapping.
> + pub const MMIO: u32 = bindings::IOMMU_MMIO;
> + /// Privileged mapping.
> + pub const PRIV: u32 = bindings::IOMMU_PRIV;
> +}
> +
> +/// Represents a requested `io_pgtable` configuration.
> +pub struct Config {
> + /// Quirk bitmask (type-specific).
> + pub quirks: usize,
> + /// Valid page sizes, as a bitmask of powers of two.
> + pub pgsize_bitmap: usize,
> + /// Input address space size in bits.
> + pub ias: u32,
> + /// Output address space size in bits.
> + pub oas: u32,
> + /// IOMMU uses coherent accesses for page table walks.
> + pub coherent_walk: bool,
> +}
> +
> +/// An io page table using a specific format.
> +///
> +/// # Invariants
> +///
> +/// The pointer references a valid io page table.
> +pub struct IoPageTable<F> {
> + ptr: NonNull<bindings::io_pgtable_ops>,
> + _marker: PhantomData<F>,
> +}
> +
> +// SAFETY: `struct io_pgtable_ops` is not restricted to a single thread.
> +unsafe impl<F> Send for IoPageTable<F> {}
> +// SAFETY: `struct io_pgtable_ops` may be accessed concurrently.
> +unsafe impl<F> Sync for IoPageTable<F> {}
> +
> +/// The format used by this page table.
> +pub trait IoPageTableFmt: 'static {
> + /// The value representing this format.
> + const FORMAT: io_pgtable_fmt;
> +}
> +
> +impl<F: IoPageTableFmt> IoPageTable<F> {
> + /// Create a new `IoPageTable` as a device resource.
> + #[inline]
> + pub fn new(
> + dev: &Device<Bound>,
> + config: Config,
> + ) -> impl PinInit<Devres<IoPageTable<F>>, Error> + '_ {
> + // SAFETY: Devres ensures that the value is dropped during device unbind.
> + Devres::new(dev, unsafe { Self::new_raw(dev, config) })
> + }
> +
> + /// Create a new `IoPageTable`.
> + ///
> + /// # Safety
> + ///
> + /// If successful, then the returned value must be dropped before the device is unbound.
> + #[inline]
> + pub unsafe fn new_raw(dev: &Device<Bound>, config: Config) -> Result<IoPageTable<F>> {
> + let mut raw_cfg = bindings::io_pgtable_cfg {
> + quirks: config.quirks,
> + pgsize_bitmap: config.pgsize_bitmap,
> + ias: config.ias,
> + oas: config.oas,
> + coherent_walk: config.coherent_walk,
> + tlb: &raw const NOOP_FLUSH_OPS,
> + iommu_dev: dev.as_raw(),
> + // SAFETY: All zeroes is a valid value for `struct io_pgtable_cfg`.
> + ..unsafe { core::mem::zeroed() }
> + };
> +
> + // SAFETY:
> + // * The raw_cfg pointer is valid for the duration of this call.
> + // * The provided `FLUSH_OPS` contains valid function pointers that accept a null pointer
> + // as cookie.
> + // * The caller ensures that the io pgtable does not outlive the device.
> + let ops = unsafe {
> + bindings::alloc_io_pgtable_ops(F::FORMAT, &mut raw_cfg, core::ptr::null_mut())
> + };
> + // INVARIANT: We successfully created a valid page table.
> + Ok(IoPageTable {
> + ptr: NonNull::new(ops).ok_or(ENOMEM)?,
> + _marker: PhantomData,
> + })
> + }
> +
> + /// Obtain a raw pointer to the underlying `struct io_pgtable_ops`.
> + #[inline]
> + pub fn raw_ops(&self) -> *mut bindings::io_pgtable_ops {
> + self.ptr.as_ptr()
> + }
> +
> + /// Obtain a raw pointer to the underlying `struct io_pgtable`.
> + #[inline]
> + pub fn raw_pgtable(&self) -> *mut bindings::io_pgtable {
> + // SAFETY: The io_pgtable_ops of an io-pgtable is always the ops field of a io_pgtable.
> + unsafe { kernel::container_of!(self.raw_ops(), bindings::io_pgtable, ops) }
> + }
> +
> + /// Obtain a raw pointer to the underlying `struct io_pgtable_cfg`.
> + #[inline]
> + pub fn raw_cfg(&self) -> *mut bindings::io_pgtable_cfg {
> + // SAFETY: The `raw_pgtable()` method returns a valid pointer.
> + unsafe { &raw mut (*self.raw_pgtable()).cfg }
> + }
> +
> + /// Map a physically contiguous range of pages of the same size.
> + ///
> + /// # Safety
> + ///
> + /// * This page table must not contain any mapping that overlaps with the mapping created by
> + /// this call.
> + /// * If this page table is live, then the caller must ensure that it's okay to access the
> + /// physical address being mapped for the duration in which it is mapped.
> + #[inline]
> + pub unsafe fn map_pages(
> + &self,
> + iova: usize,
> + paddr: PhysAddr,
> + pgsize: usize,
> + pgcount: usize,
> + prot: u32,
> + flags: alloc::Flags,
> + ) -> Result<usize> {
> + let mut mapped: usize = 0;
> +
> + // SAFETY: The `map_pages` function in `io_pgtable_ops` is never null.
> + let map_pages = unsafe { (*self.raw_ops()).map_pages.unwrap_unchecked() };
> +
> + // SAFETY: The safety requirements of this method are sufficient to call `map_pages`.
> + to_result(unsafe {
> + (map_pages)(
> + self.raw_ops(),
> + iova,
> + paddr,
> + pgsize,
> + pgcount,
> + prot as i32,
> + flags.as_raw(),
> + &mut mapped,
> + )
> + })?;
> +
> + Ok(mapped)
> + }
> +
> + /// Unmap a range of virtually contiguous pages of the same size.
> + ///
> + /// # Safety
> + ///
> + /// This page table must contain a mapping at `iova` that consists of exactly `pgcount` pages
> + /// of size `pgsize`.
> + #[inline]
> + pub unsafe fn unmap_pages(&self, iova: usize, pgsize: usize, pgcount: usize) -> usize {
> + // SAFETY: The `unmap_pages` function in `io_pgtable_ops` is never null.
> + let unmap_pages = unsafe { (*self.raw_ops()).unmap_pages.unwrap_unchecked() };
> +
> + // SAFETY: The safety requirements of this method are sufficient to call `unmap_pages`.
> + unsafe { (unmap_pages)(self.raw_ops(), iova, pgsize, pgcount, core::ptr::null_mut()) }
> + }
> +}
> +
> +// These bindings are currently designed for use by GPU drivers, which use this page table together
> +// with GPUVM. When using GPUVM, a single mapping operation may be translated into many operations
> +// on the page table, and in that case you generally want to flush the TLB only once per GPUVM
> +// operation. Thus, do not use these callbacks as they would flush more often than needed.
> +static NOOP_FLUSH_OPS: bindings::iommu_flush_ops = bindings::iommu_flush_ops {
> + tlb_flush_all: Some(rust_tlb_flush_all_noop),
> + tlb_flush_walk: Some(rust_tlb_flush_walk_noop),
> + tlb_add_page: None,
> +};
> +
> +#[no_mangle]
> +extern "C" fn rust_tlb_flush_all_noop(_cookie: *mut core::ffi::c_void) {}
> +
> +#[no_mangle]
> +extern "C" fn rust_tlb_flush_walk_noop(
> + _iova: usize,
> + _size: usize,
> + _granule: usize,
> + _cookie: *mut core::ffi::c_void,
> +) {
> +}
> +
> +impl<F> Drop for IoPageTable<F> {
> + fn drop(&mut self) {
> + // SAFETY: The caller of `ttbr` promised that the page table is not live when this
> + // destructor runs.
> + unsafe { bindings::free_io_pgtable_ops(self.0.ops) };
> + }
> +}
> +
> +/// The `ARM_64_LPAE_S1` page table format.
> +pub enum ARM64LPAES1 {}
> +
> +impl IoPageTableFmt for ARM64LPAES1 {
> + const FORMAT: io_pgtable_fmt = bindings::io_pgtable_fmt_ARM_64_LPAE_S1 as io_pgtable_fmt;
> +}
> +
> +impl IoPageTable<ARM64LPAES1> {
> + /// Access the `ttbr` field of the configuration.
> + ///
> + /// This is the physical address of the page table, which may be passed to the device that
> + /// needs to use it.
> + ///
> + /// # Safety
> + ///
> + /// The caller must ensure that the device stops using the page table before dropping it.
> + #[inline]
> + pub unsafe fn ttbr(&self) -> u64 {
> + // SAFETY: `arm_lpae_s1_cfg` is the right cfg type for `ARM64LPAES1`.
> + unsafe { (*self.raw_cfg()).__bindgen_anon_1.arm_lpae_s1_cfg.ttbr }
> + }
> +
> + /// Access the `mair` field of the configuration.
> + #[inline]
> + pub fn mair(&self) -> u64 {
> + // SAFETY: `arm_lpae_s1_cfg` is the right cfg type for `ARM64LPAES1`.
> + unsafe { (*self.raw_cfg()).__bindgen_anon_1.arm_lpae_s1_cfg.mair }
> + }
> +}
>
> ---
> base-commit: ffee675aceb9f44b0502a8bec912abb0c4f4af62
> change-id: 20251111-io-pgtable-fe0822b4ebdd
> prerequisite-change-id: 20251106-resource-phys-typedefs-6db37927d159:v2
> prerequisite-patch-id: 350421d8dbaf3db51b1243d82077c5eb88f54db5
> prerequisite-patch-id: ac0166fb3cd235de76841789173051191a4d2434
> prerequisite-patch-id: f4bca02c77c40093690b66cdf477f928784bdbf4
> prerequisite-patch-id: 083d1c22b1a7eb0dcae37052b926362543c68e8a
>
> Best regards,
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v3] io: add io_pgtable abstraction
2025-11-12 10:15 [PATCH v3] io: add io_pgtable abstraction Alice Ryhl
2025-11-12 12:57 ` Daniel Almeida
2025-11-19 10:56 ` Boris Brezillon
@ 2025-11-28 11:56 ` Robin Murphy
2025-11-28 12:27 ` Alice Ryhl
2025-11-28 18:02 ` Jason Gunthorpe
3 siblings, 1 reply; 12+ messages in thread
From: Robin Murphy @ 2025-11-28 11:56 UTC (permalink / raw)
To: Alice Ryhl, Miguel Ojeda, Will Deacon, Daniel Almeida, Boris Brezillon
Cc: Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Trevor Gross, Danilo Krummrich, Joerg Roedel,
Lorenzo Stoakes, Liam R. Howlett, Asahi Lina, linux-kernel,
rust-for-linux, iommu, linux-mm
On 2025-11-12 10:15 am, Alice Ryhl wrote:
> From: Asahi Lina <lina+kernel@asahilina.net>
>
> This will be used by the Tyr driver to create and modify the page table
> of each address space on the GPU. Each time a mapping gets created or
> removed by userspace, Tyr will call into GPUVM, which will figure out
> which calls to map_pages and unmap_pages are required to map the data in
> question in the page table so that the GPU may access those pages when
> using that address space.
>
> The Rust type wraps the struct using a raw pointer rather than the usual
> Opaque+ARef approach because Opaque+ARef requires the target type to be
> refcounted.
>
> Signed-off-by: Asahi Lina <lina+kernel@asahilina.net>
> Co-Developed-by: Alice Ryhl <aliceryhl@google.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> This patch is based on [1] but I have rewritten and simplified large
> parts of it. The Asahi driver no longer uses the io-pgtable abstraction,
> and Nova never planned to (since NVIDIA has its own separate memory).
> Therefore, I have simplified these abstractions to fit the needs of the
> Tyr GPU driver.
>
> This series depends on the PhysAddr typedef [2].
>
> [1]: https://lore.kernel.org/all/20250623-io_pgtable-v2-1-fd72daac75f1@collabora.com/
> [2]: https://lore.kernel.org/all/20251112-resource-phys-typedefs-v2-0-538307384f82@google.com/
> ---
> rust/bindings/bindings_helper.h | 3 +-
> rust/kernel/io.rs | 1 +
> rust/kernel/io/pgtable.rs | 254 ++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 257 insertions(+), 1 deletion(-)
>
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 2e43c66635a2c9f31bd99b9817bd2d6ab89fbcf2..faab6bc9463321c092a8bbcb6281175e490caccd 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -56,8 +56,9 @@
> #include <linux/fdtable.h>
> #include <linux/file.h>
> #include <linux/firmware.h>
> -#include <linux/interrupt.h>
> #include <linux/fs.h>
> +#include <linux/interrupt.h>
> +#include <linux/io-pgtable.h>
> #include <linux/ioport.h>
> #include <linux/jiffies.h>
> #include <linux/jump_label.h>
> diff --git a/rust/kernel/io.rs b/rust/kernel/io.rs
> index 56a435eb14e3a1ce72dd58b88cbf296041f1703e..5913e240d5a9814ceed52c6dc1a798e64158d567 100644
> --- a/rust/kernel/io.rs
> +++ b/rust/kernel/io.rs
> @@ -8,6 +8,7 @@
> use crate::{bindings, build_assert, ffi::c_void};
>
> pub mod mem;
> +pub mod pgtable;
> pub mod poll;
> pub mod resource;
>
> diff --git a/rust/kernel/io/pgtable.rs b/rust/kernel/io/pgtable.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..fe05bc1673f9a7741a887a3c9bbad866dd17a2b5
> --- /dev/null
> +++ b/rust/kernel/io/pgtable.rs
> @@ -0,0 +1,254 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! IOMMU page table management.
> +//!
> +//! C header: [`include/io-pgtable.h`](srctree/include/io-pgtable.h)
> +
> +use core::{
> + marker::PhantomData,
> + ptr::NonNull, //
> +};
> +
> +use crate::{
> + alloc,
> + bindings,
> + device::{Bound, Device},
> + devres::Devres,
> + error::to_result,
> + io::PhysAddr,
> + prelude::*, //
> +};
> +
> +use bindings::io_pgtable_fmt;
> +
> +/// Protection flags used with IOMMU mappings.
> +pub mod prot {
> + /// Read access.
> + pub const READ: u32 = bindings::IOMMU_READ;
> + /// Write access.
> + pub const WRITE: u32 = bindings::IOMMU_WRITE;
> + /// Request cache coherency.
> + pub const CACHE: u32 = bindings::IOMMU_CACHE;
> + /// Request no-execute permission.
> + pub const NOEXEC: u32 = bindings::IOMMU_NOEXEC;
> + /// MMIO peripheral mapping.
> + pub const MMIO: u32 = bindings::IOMMU_MMIO;
> + /// Privileged mapping.
> + pub const PRIV: u32 = bindings::IOMMU_PRIV;
Nit: probably best to call this PRIVILEGED from day 1 for clarity - some
day we may eventually get round to renaming the C symbol too, especially
if we revisit the notion of "private" mappings (that's still on my ideas
list...)
> +}
> +
> +/// Represents a requested `io_pgtable` configuration.
> +pub struct Config {
> + /// Quirk bitmask (type-specific).
> + pub quirks: usize,
> + /// Valid page sizes, as a bitmask of powers of two.
> + pub pgsize_bitmap: usize,
> + /// Input address space size in bits.
> + pub ias: u32,
> + /// Output address space size in bits.
> + pub oas: u32,
> + /// IOMMU uses coherent accesses for page table walks.
> + pub coherent_walk: bool,
> +}
> +
> +/// An io page table using a specific format.
> +///
> +/// # Invariants
> +///
> +/// The pointer references a valid io page table.
> +pub struct IoPageTable<F> {
> + ptr: NonNull<bindings::io_pgtable_ops>,
> + _marker: PhantomData<F>,
> +}
> +
> +// SAFETY: `struct io_pgtable_ops` is not restricted to a single thread.
> +unsafe impl<F> Send for IoPageTable<F> {}
> +// SAFETY: `struct io_pgtable_ops` may be accessed concurrently.
> +unsafe impl<F> Sync for IoPageTable<F> {}
> +
> +/// The format used by this page table.
> +pub trait IoPageTableFmt: 'static {
> + /// The value representing this format.
> + const FORMAT: io_pgtable_fmt;
> +}
> +
> +impl<F: IoPageTableFmt> IoPageTable<F> {
> + /// Create a new `IoPageTable` as a device resource.
> + #[inline]
> + pub fn new(
> + dev: &Device<Bound>,
> + config: Config,
> + ) -> impl PinInit<Devres<IoPageTable<F>>, Error> + '_ {
> + // SAFETY: Devres ensures that the value is dropped during device unbind.
> + Devres::new(dev, unsafe { Self::new_raw(dev, config) })
> + }
> +
> + /// Create a new `IoPageTable`.
> + ///
> + /// # Safety
> + ///
> + /// If successful, then the returned value must be dropped before the device is unbound.
> + #[inline]
> + pub unsafe fn new_raw(dev: &Device<Bound>, config: Config) -> Result<IoPageTable<F>> {
> + let mut raw_cfg = bindings::io_pgtable_cfg {
> + quirks: config.quirks,
> + pgsize_bitmap: config.pgsize_bitmap,
> + ias: config.ias,
> + oas: config.oas,
> + coherent_walk: config.coherent_walk,
> + tlb: &raw const NOOP_FLUSH_OPS,
> + iommu_dev: dev.as_raw(),
> + // SAFETY: All zeroes is a valid value for `struct io_pgtable_cfg`.
> + ..unsafe { core::mem::zeroed() }
> + };
> +
> + // SAFETY:
> + // * The raw_cfg pointer is valid for the duration of this call.
> + // * The provided `FLUSH_OPS` contains valid function pointers that accept a null pointer
> + // as cookie.
> + // * The caller ensures that the io pgtable does not outlive the device.
> + let ops = unsafe {
> + bindings::alloc_io_pgtable_ops(F::FORMAT, &mut raw_cfg, core::ptr::null_mut())
> + };
> + // INVARIANT: We successfully created a valid page table.
> + Ok(IoPageTable {
> + ptr: NonNull::new(ops).ok_or(ENOMEM)?,
> + _marker: PhantomData,
> + })
> + }
> +
> + /// Obtain a raw pointer to the underlying `struct io_pgtable_ops`.
> + #[inline]
> + pub fn raw_ops(&self) -> *mut bindings::io_pgtable_ops {
> + self.ptr.as_ptr()
> + }
> +
> + /// Obtain a raw pointer to the underlying `struct io_pgtable`.
> + #[inline]
> + pub fn raw_pgtable(&self) -> *mut bindings::io_pgtable {
> + // SAFETY: The io_pgtable_ops of an io-pgtable is always the ops field of a io_pgtable.
> + unsafe { kernel::container_of!(self.raw_ops(), bindings::io_pgtable, ops) }
> + }
> +
> + /// Obtain a raw pointer to the underlying `struct io_pgtable_cfg`.
> + #[inline]
> + pub fn raw_cfg(&self) -> *mut bindings::io_pgtable_cfg {
> + // SAFETY: The `raw_pgtable()` method returns a valid pointer.
> + unsafe { &raw mut (*self.raw_pgtable()).cfg }
> + }
> +
> + /// Map a physically contiguous range of pages of the same size.
> + ///
> + /// # Safety
> + ///
> + /// * This page table must not contain any mapping that overlaps with the mapping created by
> + /// this call.
As mentioned this isn't necessarily true of io-pgtable itself, but since
you've not included QUIRK_NO_WARN in the abstraction then it's fair if
this layer wants to be a little stricter toward Rust users.
> + /// * If this page table is live, then the caller must ensure that it's okay to access the
> + /// physical address being mapped for the duration in which it is mapped.
> + #[inline]
> + pub unsafe fn map_pages(
> + &self,
> + iova: usize,
> + paddr: PhysAddr,
> + pgsize: usize,
> + pgcount: usize,
> + prot: u32,
> + flags: alloc::Flags,
> + ) -> Result<usize> {
> + let mut mapped: usize = 0;
> +
> + // SAFETY: The `map_pages` function in `io_pgtable_ops` is never null.
> + let map_pages = unsafe { (*self.raw_ops()).map_pages.unwrap_unchecked() };
> +
> + // SAFETY: The safety requirements of this method are sufficient to call `map_pages`.
> + to_result(unsafe {
> + (map_pages)(
> + self.raw_ops(),
> + iova,
> + paddr,
> + pgsize,
> + pgcount,
> + prot as i32,
> + flags.as_raw(),
> + &mut mapped,
> + )
> + })?;
> +
> + Ok(mapped)
Just to double-check since I'm a bit unclear on the Rust semantics, this
can correctly reflect all 4 outcomes back to the caller, right? I.e.:
- no error, mapped == pgcount * pgsize (success)
- no error, mapped < pgcount * pgsize (call again with the remainder)
- error, mapped > 0 (probably unmap that bit, unless clever trickery
where an error was expected)
- error, mapped == 0 (nothing was done, straightforward failure)
(the only case not permitted is "no error, mapped == 0" - failure to
make any progress must always be an error)
Alternatively you might want to consider encapsulating the
partial-mapping handling in this layer as well - in the C code that's
done at the level of the IOMMU API calls that io-pgtable-using IOMMU
drivers are merely passing through, hence why panfrost/panthor have to
open-code their own equivalents, but there's no particular reason to
follow the *exact* same pattern here.
> + }
> +
> + /// Unmap a range of virtually contiguous pages of the same size.
> + ///
> + /// # Safety
> + ///
> + /// This page table must contain a mapping at `iova` that consists of exactly `pgcount` pages
> + /// of size `pgsize`.
Again, the underlying requirement here is only that pgsize * pgcount
represents the IOVA range of one or more consecutive ranges previously
mapped, i.e.:
map(0, 4KB * 256);
map(1MB, 4KB * 256);
unmap(0, 2MB * 1);
is legal, since it's generally impractical for callers to know and keep
track of the *exact* structure of a given pagetable. In this case there
isn't really any good reason to try to be stricter.
Thanks,
Robin.
> + #[inline]
> + pub unsafe fn unmap_pages(&self, iova: usize, pgsize: usize, pgcount: usize) -> usize {
> + // SAFETY: The `unmap_pages` function in `io_pgtable_ops` is never null.
> + let unmap_pages = unsafe { (*self.raw_ops()).unmap_pages.unwrap_unchecked() };
> +
> + // SAFETY: The safety requirements of this method are sufficient to call `unmap_pages`.
> + unsafe { (unmap_pages)(self.raw_ops(), iova, pgsize, pgcount, core::ptr::null_mut()) }
> + }
> +}
> +
> +// These bindings are currently designed for use by GPU drivers, which use this page table together
> +// with GPUVM. When using GPUVM, a single mapping operation may be translated into many operations
> +// on the page table, and in that case you generally want to flush the TLB only once per GPUVM
> +// operation. Thus, do not use these callbacks as they would flush more often than needed.
> +static NOOP_FLUSH_OPS: bindings::iommu_flush_ops = bindings::iommu_flush_ops {
> + tlb_flush_all: Some(rust_tlb_flush_all_noop),
> + tlb_flush_walk: Some(rust_tlb_flush_walk_noop),
> + tlb_add_page: None,
> +};
> +
> +#[no_mangle]
> +extern "C" fn rust_tlb_flush_all_noop(_cookie: *mut core::ffi::c_void) {}
> +
> +#[no_mangle]
> +extern "C" fn rust_tlb_flush_walk_noop(
> + _iova: usize,
> + _size: usize,
> + _granule: usize,
> + _cookie: *mut core::ffi::c_void,
> +) {
> +}
> +
> +impl<F> Drop for IoPageTable<F> {
> + fn drop(&mut self) {
> + // SAFETY: The caller of `ttbr` promised that the page table is not live when this
> + // destructor runs.
> + unsafe { bindings::free_io_pgtable_ops(self.0.ops) };
> + }
> +}
> +
> +/// The `ARM_64_LPAE_S1` page table format.
> +pub enum ARM64LPAES1 {}
> +
> +impl IoPageTableFmt for ARM64LPAES1 {
> + const FORMAT: io_pgtable_fmt = bindings::io_pgtable_fmt_ARM_64_LPAE_S1 as io_pgtable_fmt;
> +}
> +
> +impl IoPageTable<ARM64LPAES1> {
> + /// Access the `ttbr` field of the configuration.
> + ///
> + /// This is the physical address of the page table, which may be passed to the device that
> + /// needs to use it.
> + ///
> + /// # Safety
> + ///
> + /// The caller must ensure that the device stops using the page table before dropping it.
> + #[inline]
> + pub unsafe fn ttbr(&self) -> u64 {
> + // SAFETY: `arm_lpae_s1_cfg` is the right cfg type for `ARM64LPAES1`.
> + unsafe { (*self.raw_cfg()).__bindgen_anon_1.arm_lpae_s1_cfg.ttbr }
> + }
> +
> + /// Access the `mair` field of the configuration.
> + #[inline]
> + pub fn mair(&self) -> u64 {
> + // SAFETY: `arm_lpae_s1_cfg` is the right cfg type for `ARM64LPAES1`.
> + unsafe { (*self.raw_cfg()).__bindgen_anon_1.arm_lpae_s1_cfg.mair }
> + }
> +}
>
> ---
> base-commit: ffee675aceb9f44b0502a8bec912abb0c4f4af62
> change-id: 20251111-io-pgtable-fe0822b4ebdd
> prerequisite-change-id: 20251106-resource-phys-typedefs-6db37927d159:v2
> prerequisite-patch-id: 350421d8dbaf3db51b1243d82077c5eb88f54db5
> prerequisite-patch-id: ac0166fb3cd235de76841789173051191a4d2434
> prerequisite-patch-id: f4bca02c77c40093690b66cdf477f928784bdbf4
> prerequisite-patch-id: 083d1c22b1a7eb0dcae37052b926362543c68e8a
>
> Best regards,
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v3] io: add io_pgtable abstraction
2025-11-28 11:56 ` Robin Murphy
@ 2025-11-28 12:27 ` Alice Ryhl
2025-11-28 16:47 ` Robin Murphy
0 siblings, 1 reply; 12+ messages in thread
From: Alice Ryhl @ 2025-11-28 12:27 UTC (permalink / raw)
To: Robin Murphy
Cc: Miguel Ojeda, Will Deacon, Daniel Almeida, Boris Brezillon,
Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Trevor Gross, Danilo Krummrich, Joerg Roedel,
Lorenzo Stoakes, Liam R. Howlett, Asahi Lina, linux-kernel,
rust-for-linux, iommu, linux-mm
On Fri, Nov 28, 2025 at 11:56:17AM +0000, Robin Murphy wrote:
> On 2025-11-12 10:15 am, Alice Ryhl wrote:
> > From: Asahi Lina <lina+kernel@asahilina.net>
> >
> > This will be used by the Tyr driver to create and modify the page table
> > of each address space on the GPU. Each time a mapping gets created or
> > removed by userspace, Tyr will call into GPUVM, which will figure out
> > which calls to map_pages and unmap_pages are required to map the data in
> > question in the page table so that the GPU may access those pages when
> > using that address space.
> >
> > The Rust type wraps the struct using a raw pointer rather than the usual
> > Opaque+ARef approach because Opaque+ARef requires the target type to be
> > refcounted.
> >
> > Signed-off-by: Asahi Lina <lina+kernel@asahilina.net>
> > Co-Developed-by: Alice Ryhl <aliceryhl@google.com>
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > +/// Protection flags used with IOMMU mappings.
> > +pub mod prot {
> > + /// Read access.
> > + pub const READ: u32 = bindings::IOMMU_READ;
> > + /// Write access.
> > + pub const WRITE: u32 = bindings::IOMMU_WRITE;
> > + /// Request cache coherency.
> > + pub const CACHE: u32 = bindings::IOMMU_CACHE;
> > + /// Request no-execute permission.
> > + pub const NOEXEC: u32 = bindings::IOMMU_NOEXEC;
> > + /// MMIO peripheral mapping.
> > + pub const MMIO: u32 = bindings::IOMMU_MMIO;
> > + /// Privileged mapping.
> > + pub const PRIV: u32 = bindings::IOMMU_PRIV;
>
> Nit: probably best to call this PRIVILEGED from day 1 for clarity - some day
> we may eventually get round to renaming the C symbol too, especially if we
> revisit the notion of "private" mappings (that's still on my ideas list...)
Sure, will rename.
> > + /// Map a physically contiguous range of pages of the same size.
> > + ///
> > + /// # Safety
> > + ///
> > + /// * This page table must not contain any mapping that overlaps with the mapping created by
> > + /// this call.
>
> As mentioned this isn't necessarily true of io-pgtable itself, but since
> you've not included QUIRK_NO_WARN in the abstraction then it's fair if this
> layer wants to be a little stricter toward Rust users.
Assuming that we don't allow QUICK_NO_WARN, would you say that it's
precise as-is?
> > + /// * If this page table is live, then the caller must ensure that it's okay to access the
> > + /// physical address being mapped for the duration in which it is mapped.
> > + #[inline]
> > + pub unsafe fn map_pages(
> > + &self,
> > + iova: usize,
> > + paddr: PhysAddr,
> > + pgsize: usize,
> > + pgcount: usize,
> > + prot: u32,
> > + flags: alloc::Flags,
> > + ) -> Result<usize> {
> > + let mut mapped: usize = 0;
> > +
> > + // SAFETY: The `map_pages` function in `io_pgtable_ops` is never null.
> > + let map_pages = unsafe { (*self.raw_ops()).map_pages.unwrap_unchecked() };
> > +
> > + // SAFETY: The safety requirements of this method are sufficient to call `map_pages`.
> > + to_result(unsafe {
> > + (map_pages)(
> > + self.raw_ops(),
> > + iova,
> > + paddr,
> > + pgsize,
> > + pgcount,
> > + prot as i32,
> > + flags.as_raw(),
> > + &mut mapped,
> > + )
> > + })?;
> > +
> > + Ok(mapped)
>
> Just to double-check since I'm a bit unclear on the Rust semantics, this can
> correctly reflect all 4 outcomes back to the caller, right? I.e.:
>
> - no error, mapped == pgcount * pgsize (success)
> - no error, mapped < pgcount * pgsize (call again with the remainder)
> - error, mapped > 0 (probably unmap that bit, unless clever trickery where
> an error was expected)
> - error, mapped == 0 (nothing was done, straightforward failure)
>
> (the only case not permitted is "no error, mapped == 0" - failure to make
> any progress must always be an error)
>
> Alternatively you might want to consider encapsulating the partial-mapping
> handling in this layer as well - in the C code that's done at the level of
> the IOMMU API calls that io-pgtable-using IOMMU drivers are merely passing
> through, hence why panfrost/panthor have to open-code their own equivalents,
> but there's no particular reason to follow the *exact* same pattern here.
Ah, no this signature does not reflect all of those cases. The return
type is Result<usize>, which corresponds to:
struct my_return_type {
bool success;
union {
size_t ok;
int err; // an errno
}
};
We need a different signature if it's possible to have mapped != 0 when
returning an error.
> > + }
> > +
> > + /// Unmap a range of virtually contiguous pages of the same size.
> > + ///
> > + /// # Safety
> > + ///
> > + /// This page table must contain a mapping at `iova` that consists of exactly `pgcount` pages
> > + /// of size `pgsize`.
>
> Again, the underlying requirement here is only that pgsize * pgcount
> represents the IOVA range of one or more consecutive ranges previously
> mapped, i.e.:
>
> map(0, 4KB * 256);
> map(1MB, 4KB * 256);
> unmap(0, 2MB * 1);
>
> is legal, since it's generally impractical for callers to know and keep
> track of the *exact* structure of a given pagetable. In this case there
> isn't really any good reason to try to be stricter.
How about this wording?
This page table must contain one or more consecutive mappings starting
at `iova` whose total size is `pgcount*pgsize`.
Alice
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v3] io: add io_pgtable abstraction
2025-11-28 12:27 ` Alice Ryhl
@ 2025-11-28 16:47 ` Robin Murphy
2025-12-01 9:58 ` Alice Ryhl
0 siblings, 1 reply; 12+ messages in thread
From: Robin Murphy @ 2025-11-28 16:47 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Will Deacon, Daniel Almeida, Boris Brezillon,
Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Trevor Gross, Danilo Krummrich, Joerg Roedel,
Lorenzo Stoakes, Liam R. Howlett, Asahi Lina, linux-kernel,
rust-for-linux, iommu, linux-mm
On 2025-11-28 12:27 pm, Alice Ryhl wrote:
[...]
>>> + /// Map a physically contiguous range of pages of the same size.
>>> + ///
>>> + /// # Safety
>>> + ///
>>> + /// * This page table must not contain any mapping that overlaps with the mapping created by
>>> + /// this call.
>>
>> As mentioned this isn't necessarily true of io-pgtable itself, but since
>> you've not included QUIRK_NO_WARN in the abstraction then it's fair if this
>> layer wants to be a little stricter toward Rust users.
>
> Assuming that we don't allow QUICK_NO_WARN, would you say that it's
> precise as-is?
As an assumption of use for the Rust API, like I say it's fine - it's
still not really "unsafe" if a caller did try an overlapping mapping;
the call will still fail gracefully and accurately, it's just it will
also fire a WARN_ON() since ARM_64_LPAE_S1 without
IO_PGTABLE_QUIRK_NO_WARN considers this indicative of a usage error or
race in the caller.
If we do end up wanting to support more opportunistic and/or
userspace-controlled mappings by Rust drivers in future then we can
relax this expectation as appropriate.
>>> + /// * If this page table is live, then the caller must ensure that it's okay to access the
>>> + /// physical address being mapped for the duration in which it is mapped.
>>> + #[inline]
>>> + pub unsafe fn map_pages(
>>> + &self,
>>> + iova: usize,
>>> + paddr: PhysAddr,
>>> + pgsize: usize,
>>> + pgcount: usize,
>>> + prot: u32,
>>> + flags: alloc::Flags,
>>> + ) -> Result<usize> {
>>> + let mut mapped: usize = 0;
>>> +
>>> + // SAFETY: The `map_pages` function in `io_pgtable_ops` is never null.
>>> + let map_pages = unsafe { (*self.raw_ops()).map_pages.unwrap_unchecked() };
>>> +
>>> + // SAFETY: The safety requirements of this method are sufficient to call `map_pages`.
>>> + to_result(unsafe {
>>> + (map_pages)(
>>> + self.raw_ops(),
>>> + iova,
>>> + paddr,
>>> + pgsize,
>>> + pgcount,
>>> + prot as i32,
>>> + flags.as_raw(),
>>> + &mut mapped,
>>> + )
>>> + })?;
>>> +
>>> + Ok(mapped)
>>
>> Just to double-check since I'm a bit unclear on the Rust semantics, this can
>> correctly reflect all 4 outcomes back to the caller, right? I.e.:
>>
>> - no error, mapped == pgcount * pgsize (success)
>> - no error, mapped < pgcount * pgsize (call again with the remainder)
>> - error, mapped > 0 (probably unmap that bit, unless clever trickery where
>> an error was expected)
>> - error, mapped == 0 (nothing was done, straightforward failure)
>>
>> (the only case not permitted is "no error, mapped == 0" - failure to make
>> any progress must always be an error)
>>
>> Alternatively you might want to consider encapsulating the partial-mapping
>> handling in this layer as well - in the C code that's done at the level of
>> the IOMMU API calls that io-pgtable-using IOMMU drivers are merely passing
>> through, hence why panfrost/panthor have to open-code their own equivalents,
>> but there's no particular reason to follow the *exact* same pattern here.
>
> Ah, no this signature does not reflect all of those cases. The return
> type is Result<usize>, which corresponds to:
>
> struct my_return_type {
> bool success;
> union {
> size_t ok;
> int err; // an errno
> }
> };
>
> We need a different signature if it's possible to have mapped != 0 when
> returning an error.
Aha, thanks for clarifying - indeed this is not the common "value or
error" case, it is two (almost) orthogonal return values. However if
we're not permitting callers to try to do anything clever with -EEXIST
then it might make sense to just embed the inevitable cleanup-on-failure
boilerplate here anyway (even if we still leave retry-on-partial-success
to the caller).
Note that it does appear to be the case that io-pgtable-arm in its
current state won't actually do this, since it happens to handle all its
error return cases before any leaf PTEs are touched and "mapped" is
updated, but the abstraction layer shouldn't assume that in general
since other implementations like io-pgtable-arm-v7s definitely *can*
fail with a partial mapping.
>>> + }
>>> +
>>> + /// Unmap a range of virtually contiguous pages of the same size.
>>> + ///
>>> + /// # Safety
>>> + ///
>>> + /// This page table must contain a mapping at `iova` that consists of exactly `pgcount` pages
>>> + /// of size `pgsize`.
>>
>> Again, the underlying requirement here is only that pgsize * pgcount
>> represents the IOVA range of one or more consecutive ranges previously
>> mapped, i.e.:
>>
>> map(0, 4KB * 256);
>> map(1MB, 4KB * 256);
>> unmap(0, 2MB * 1);
>>
>> is legal, since it's generally impractical for callers to know and keep
>> track of the *exact* structure of a given pagetable. In this case there
>> isn't really any good reason to try to be stricter.
>
> How about this wording?
>
> This page table must contain one or more consecutive mappings starting
> at `iova` whose total size is `pgcount*pgsize`.
Yes, that's a nice way to put it.
Thanks,
Robin.
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v3] io: add io_pgtable abstraction
2025-11-28 16:47 ` Robin Murphy
@ 2025-12-01 9:58 ` Alice Ryhl
2025-12-01 13:55 ` Robin Murphy
0 siblings, 1 reply; 12+ messages in thread
From: Alice Ryhl @ 2025-12-01 9:58 UTC (permalink / raw)
To: Robin Murphy
Cc: Miguel Ojeda, Will Deacon, Daniel Almeida, Boris Brezillon,
Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Trevor Gross, Danilo Krummrich, Joerg Roedel,
Lorenzo Stoakes, Liam R. Howlett, Asahi Lina, linux-kernel,
rust-for-linux, iommu, linux-mm
On Fri, Nov 28, 2025 at 04:47:52PM +0000, Robin Murphy wrote:
> On 2025-11-28 12:27 pm, Alice Ryhl wrote:
> [...]
> > > > + /// Map a physically contiguous range of pages of the same size.
> > > > + ///
> > > > + /// # Safety
> > > > + ///
> > > > + /// * This page table must not contain any mapping that overlaps with the mapping created by
> > > > + /// this call.
> > >
> > > As mentioned this isn't necessarily true of io-pgtable itself, but since
> > > you've not included QUIRK_NO_WARN in the abstraction then it's fair if this
> > > layer wants to be a little stricter toward Rust users.
> >
> > Assuming that we don't allow QUICK_NO_WARN, would you say that it's
> > precise as-is?
>
> As an assumption of use for the Rust API, like I say it's fine - it's still
> not really "unsafe" if a caller did try an overlapping mapping; the call
> will still fail gracefully and accurately, it's just it will also fire a
> WARN_ON() since ARM_64_LPAE_S1 without IO_PGTABLE_QUIRK_NO_WARN considers
> this indicative of a usage error or race in the caller.
>
> If we do end up wanting to support more opportunistic and/or
> userspace-controlled mappings by Rust drivers in future then we can relax
> this expectation as appropriate.
Yeah, let's just say that it's an unsupported use-case. These bindings
can be expanded in the future if anyone needs QUICK_NO_WARN.
> > > > + /// * If this page table is live, then the caller must ensure that it's okay to access the
> > > > + /// physical address being mapped for the duration in which it is mapped.
> > > > + #[inline]
> > > > + pub unsafe fn map_pages(
> > > > + &self,
> > > > + iova: usize,
> > > > + paddr: PhysAddr,
> > > > + pgsize: usize,
> > > > + pgcount: usize,
> > > > + prot: u32,
> > > > + flags: alloc::Flags,
> > > > + ) -> Result<usize> {
> > > > + let mut mapped: usize = 0;
> > > > +
> > > > + // SAFETY: The `map_pages` function in `io_pgtable_ops` is never null.
> > > > + let map_pages = unsafe { (*self.raw_ops()).map_pages.unwrap_unchecked() };
> > > > +
> > > > + // SAFETY: The safety requirements of this method are sufficient to call `map_pages`.
> > > > + to_result(unsafe {
> > > > + (map_pages)(
> > > > + self.raw_ops(),
> > > > + iova,
> > > > + paddr,
> > > > + pgsize,
> > > > + pgcount,
> > > > + prot as i32,
> > > > + flags.as_raw(),
> > > > + &mut mapped,
> > > > + )
> > > > + })?;
> > > > +
> > > > + Ok(mapped)
> > >
> > > Just to double-check since I'm a bit unclear on the Rust semantics, this can
> > > correctly reflect all 4 outcomes back to the caller, right? I.e.:
> > >
> > > - no error, mapped == pgcount * pgsize (success)
> > > - no error, mapped < pgcount * pgsize (call again with the remainder)
> > > - error, mapped > 0 (probably unmap that bit, unless clever trickery where
> > > an error was expected)
> > > - error, mapped == 0 (nothing was done, straightforward failure)
> > >
> > > (the only case not permitted is "no error, mapped == 0" - failure to make
> > > any progress must always be an error)
> > >
> > > Alternatively you might want to consider encapsulating the partial-mapping
> > > handling in this layer as well - in the C code that's done at the level of
> > > the IOMMU API calls that io-pgtable-using IOMMU drivers are merely passing
> > > through, hence why panfrost/panthor have to open-code their own equivalents,
> > > but there's no particular reason to follow the *exact* same pattern here.
> >
> > Ah, no this signature does not reflect all of those cases. The return
> > type is Result<usize>, which corresponds to:
> >
> > struct my_return_type {
> > bool success;
> > union {
> > size_t ok;
> > int err; // an errno
> > }
> > };
> >
> > We need a different signature if it's possible to have mapped != 0 when
> > returning an error.
>
> Aha, thanks for clarifying - indeed this is not the common "value or error"
> case, it is two (almost) orthogonal return values. However if we're not
> permitting callers to try to do anything clever with -EEXIST then it might
> make sense to just embed the inevitable cleanup-on-failure boilerplate here
> anyway (even if we still leave retry-on-partial-success to the caller).
Is the only possible error -EEXIST? I could encode that in the API if
that is the case.
> Note that it does appear to be the case that io-pgtable-arm in its current
> state won't actually do this, since it happens to handle all its error
> return cases before any leaf PTEs are touched and "mapped" is updated, but
> the abstraction layer shouldn't assume that in general since other
> implementations like io-pgtable-arm-v7s definitely *can* fail with a partial
> mapping.
Agreed, I will update the API accordingly.
> > > > + }
> > > > +
> > > > + /// Unmap a range of virtually contiguous pages of the same size.
> > > > + ///
> > > > + /// # Safety
> > > > + ///
> > > > + /// This page table must contain a mapping at `iova` that consists of exactly `pgcount` pages
> > > > + /// of size `pgsize`.
> > >
> > > Again, the underlying requirement here is only that pgsize * pgcount
> > > represents the IOVA range of one or more consecutive ranges previously
> > > mapped, i.e.:
> > >
> > > map(0, 4KB * 256);
> > > map(1MB, 4KB * 256);
> > > unmap(0, 2MB * 1);
> > >
> > > is legal, since it's generally impractical for callers to know and keep
> > > track of the *exact* structure of a given pagetable. In this case there
> > > isn't really any good reason to try to be stricter.
> >
> > How about this wording?
> >
> > This page table must contain one or more consecutive mappings starting
> > at `iova` whose total size is `pgcount*pgsize`.
>
> Yes, that's a nice way to put it.
Perfect thanks.
Alice
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v3] io: add io_pgtable abstraction
2025-12-01 9:58 ` Alice Ryhl
@ 2025-12-01 13:55 ` Robin Murphy
0 siblings, 0 replies; 12+ messages in thread
From: Robin Murphy @ 2025-12-01 13:55 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Will Deacon, Daniel Almeida, Boris Brezillon,
Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Trevor Gross, Danilo Krummrich, Joerg Roedel,
Lorenzo Stoakes, Liam R. Howlett, Asahi Lina, linux-kernel,
rust-for-linux, iommu, linux-mm
On 2025-12-01 9:58 am, Alice Ryhl wrote:
[...]
>>> We need a different signature if it's possible to have mapped != 0 when
>>> returning an error.
>>
>> Aha, thanks for clarifying - indeed this is not the common "value or error"
>> case, it is two (almost) orthogonal return values. However if we're not
>> permitting callers to try to do anything clever with -EEXIST then it might
>> make sense to just embed the inevitable cleanup-on-failure boilerplate here
>> anyway (even if we still leave retry-on-partial-success to the caller).
>
> Is the only possible error -EEXIST? I could encode that in the API if
> that is the case.
No, I was just calling out -EEXIST as the only error where I imagine a
caller *might* want to continue without cleaning up a partial mapping,
if for instance they were playing clever tricks like a background
mapping of a large buffer while already allowing other threads to
eagerly demand-page bits of it, so the "main" mapping thread just
adjusts and restarts to skip over already-present pages. Other errors
are still possible, but generally represent terminal failure conditions
at the caller's level too - in practice things like -EINVAL and -ENOMEM
are likely to happen before any mappings can be made, but io-pgtable
doesn't guarantee any particular behaviour here, so a well-behaved
caller should still generally handle cleaning up after an error (at
least if they intend to keep trying to use the pagetable beyond that point).
Cheers,
Robin.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3] io: add io_pgtable abstraction
2025-11-12 10:15 [PATCH v3] io: add io_pgtable abstraction Alice Ryhl
` (2 preceding siblings ...)
2025-11-28 11:56 ` Robin Murphy
@ 2025-11-28 18:02 ` Jason Gunthorpe
3 siblings, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2025-11-28 18:02 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Will Deacon, Daniel Almeida, Boris Brezillon,
Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Trevor Gross, Danilo Krummrich, Joerg Roedel,
Robin Murphy, Lorenzo Stoakes, Liam R. Howlett, Asahi Lina,
linux-kernel, rust-for-linux, iommu, linux-mm
On Wed, Nov 12, 2025 at 10:15:00AM +0000, Alice Ryhl wrote:
> + /// Map a physically contiguous range of pages of the same size.
> + ///
> + /// # Safety
> + ///
> + /// * This page table must not contain any mapping that overlaps with the mapping created by
> + /// this call.
> + /// * If this page table is live, then the caller must ensure that it's okay to access the
> + /// physical address being mapped for the duration in which it is mapped.
All the iommu page table code runs with a caller provided locking
regime.
The caller must effectively hold a range lock over the IOVA it is
mapping/unmapping/iova_to_phys'ing and it is illegal to race those
three functions with overlapping IOVA.
For example the caller cannot map to IOVA A-B while unmapping C-B or
iova_to_physing(A).
This locking detail should be a safety statement.
> +// These bindings are currently designed for use by GPU drivers, which use this page table together
> +// with GPUVM. When using GPUVM, a single mapping operation may be translated into many operations
Now that we have the generic pt stuff I wonder if GPUVM should be
providing its own version of the page table implementation that
matches its semantics better.
> +// on the page table, and in that case you generally want to flush the TLB only once per GPUVM
> +// operation. Thus, do not use these callbacks as they would flush more often than needed.
This doesn't sound quite right to me, the flushing requirements are a
consequence of what flushing HW is available in the xTLB that is
caching this. The flushes generated by the common arm code match the
ARM64 architecture semantics. If the GPU has different semantics then
it can do something else, but one must be careful not to match a GPU
that is expecting ARM semantics with "do not use these callbacks"
IOW it doesn't seem right that common code would be making decisions
like this, the nature and requirements of the flushing are entirely up
to the driver binding to HW.
Jason
^ permalink raw reply [flat|nested] 12+ messages in thread