* [PATCH v9 1/8] mm: rust: add abstraction for struct mm_struct
From: Alice Ryhl @ 2024-11-22 15:40 UTC
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux, Alice Ryhl,
Andreas Hindborg
These abstractions allow you to reference a `struct mm_struct` using
both mmgrab and mmget refcounts. This is done using two Rust types:
* Mm - represents an mm_struct where you don't know anything about the
value of mm_users.
* MmWithUser - represents an mm_struct where you know at compile time
that mm_users is non-zero.
This allows us to encode in the type system whether a method requires
that mm_users is non-zero. For instance, you can always call
`mmget_not_zero`, but you can only call `mmap_read_lock` when mm_users is
non-zero.
It's possible to access current->mm without a refcount increment, but
that is added in a later patch of this series.
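As an illustration, a minimal usage sketch (an editor's addition, not part of
the patch; the calling function is hypothetical, but it uses only the API
introduced here):

    // Take an mmgrab reference to current->mm, upgrade it to an mmget
    // reference, and only then touch the address space.
    fn sketch() -> Option<()> {
        let mm = Mm::mmgrab_current()?;  // ARef<Mm>: holds an mmgrab refcount
        let mm = mm.mmget_not_zero()?;   // ARef<MmWithUser>: holds an mmget refcount
        let guard = mm.mmap_read_lock(); // only callable via MmWithUser
        drop(guard);                     // mmap_read_unlock happens on drop
        Some(())
    }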
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 rust/helpers/helpers.c |   1 +
 rust/helpers/mm.c      |  39 +++++++++
 rust/kernel/lib.rs     |   1 +
 rust/kernel/mm.rs      | 219 +++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 260 insertions(+)
diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index 6d90afd38c40..2ee3af594633 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -15,6 +15,7 @@
#include "err.c"
#include "fs.c"
#include "kunit.c"
+#include "mm.c"
#include "mutex.c"
#include "page.c"
#include "rbtree.c"
diff --git a/rust/helpers/mm.c b/rust/helpers/mm.c
new file mode 100644
index 000000000000..7201747a5d31
--- /dev/null
+++ b/rust/helpers/mm.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/mm.h>
+#include <linux/sched/mm.h>
+
+void rust_helper_mmgrab(struct mm_struct *mm)
+{
+        mmgrab(mm);
+}
+
+void rust_helper_mmdrop(struct mm_struct *mm)
+{
+        mmdrop(mm);
+}
+
+void rust_helper_mmget(struct mm_struct *mm)
+{
+        mmget(mm);
+}
+
+bool rust_helper_mmget_not_zero(struct mm_struct *mm)
+{
+        return mmget_not_zero(mm);
+}
+
+void rust_helper_mmap_read_lock(struct mm_struct *mm)
+{
+        mmap_read_lock(mm);
+}
+
+bool rust_helper_mmap_read_trylock(struct mm_struct *mm)
+{
+        return mmap_read_trylock(mm);
+}
+
+void rust_helper_mmap_read_unlock(struct mm_struct *mm)
+{
+        mmap_read_unlock(mm);
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index bf98caa6d6a5..104e619f5dbd 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -45,6 +45,7 @@
 pub mod kunit;
 pub mod list;
 pub mod miscdevice;
+pub mod mm;
 #[cfg(CONFIG_NET)]
 pub mod net;
 pub mod page;
diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
new file mode 100644
index 000000000000..84cba581edaa
--- /dev/null
+++ b/rust/kernel/mm.rs
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (C) 2024 Google LLC.
+
+//! Memory management.
+//!
+//! C header: [`include/linux/mm.h`](srctree/include/linux/mm.h)
+
+use crate::{
+    bindings,
+    types::{ARef, AlwaysRefCounted, NotThreadSafe, Opaque},
+};
+use core::{ops::Deref, ptr::NonNull};
+
+/// A wrapper for the kernel's `struct mm_struct`.
+///
+/// Since `mm_users` may be zero, the associated address space may not exist anymore. You can use
+/// [`mmget_not_zero`] to be able to access the address space.
+///
+/// The `ARef<Mm>` smart pointer holds an `mmgrab` refcount. Its destructor may sleep.
+///
+/// # Invariants
+///
+/// Values of this type are always refcounted using `mmgrab`.
+///
+/// [`mmget_not_zero`]: Mm::mmget_not_zero
+#[repr(transparent)]
+pub struct Mm {
+    mm: Opaque<bindings::mm_struct>,
+}
+
+// SAFETY: It is safe to call `mmdrop` from a thread other than the one where `mmgrab` was called.
+unsafe impl Send for Mm {}
+// SAFETY: All methods on `Mm` can be called in parallel from several threads.
+unsafe impl Sync for Mm {}
+
+// SAFETY: By the type invariants, this type is always refcounted.
+unsafe impl AlwaysRefCounted for Mm {
+    #[inline]
+    fn inc_ref(&self) {
+        // SAFETY: The pointer is valid since self is a reference.
+        unsafe { bindings::mmgrab(self.as_raw()) };
+    }
+
+    #[inline]
+    unsafe fn dec_ref(obj: NonNull<Self>) {
+        // SAFETY: The caller is giving up their refcount.
+        unsafe { bindings::mmdrop(obj.cast().as_ptr()) };
+    }
+}
+
+/// A wrapper for the kernel's `struct mm_struct`.
+///
+/// This type is like [`Mm`], but with non-zero `mm_users`. It can only be used when `mm_users` can
+/// be proven to be non-zero at compile-time, usually because the relevant code holds an `mmget`
+/// refcount. It can be used to access the associated address space.
+///
+/// The `ARef<MmWithUser>` smart pointer holds an `mmget` refcount. Its destructor may sleep.
+///
+/// # Invariants
+///
+/// Values of this type are always refcounted using `mmget`. The value of `mm_users` is non-zero.
+#[repr(transparent)]
+pub struct MmWithUser {
+    mm: Mm,
+}
+
+// SAFETY: It is safe to call `mmput` from a thread other than the one where `mmget` was called.
+unsafe impl Send for MmWithUser {}
+// SAFETY: All methods on `MmWithUser` can be called in parallel from several threads.
+unsafe impl Sync for MmWithUser {}
+
+// SAFETY: By the type invariants, this type is always refcounted.
+unsafe impl AlwaysRefCounted for MmWithUser {
+    #[inline]
+    fn inc_ref(&self) {
+        // SAFETY: The pointer is valid since self is a reference.
+        unsafe { bindings::mmget(self.as_raw()) };
+    }
+
+    #[inline]
+    unsafe fn dec_ref(obj: NonNull<Self>) {
+        // SAFETY: The caller is giving up their refcount.
+        unsafe { bindings::mmput(obj.cast().as_ptr()) };
+    }
+}
+
+// Make all `Mm` methods available on `MmWithUser`.
+impl Deref for MmWithUser {
+    type Target = Mm;
+
+    #[inline]
+    fn deref(&self) -> &Mm {
+        &self.mm
+    }
+}
+
+// These methods are safe to call even if `mm_users` is zero.
+impl Mm {
+    /// Call `mmgrab` on `current.mm`.
+    #[inline]
+    pub fn mmgrab_current() -> Option<ARef<Mm>> {
+        // SAFETY: It's safe to get the `mm` field from current.
+        let mm = unsafe {
+            let current = bindings::get_current();
+            (*current).mm
+        };
+
+        if mm.is_null() {
+            return None;
+        }
+
+        // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
+        // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
+        // duration of this function, and `current->mm` will stay valid for that long.
+        let mm = unsafe { Mm::from_raw(mm) };
+
+        // This increments the refcount using `mmgrab`.
+        Some(ARef::from(mm))
+    }
+
+    /// Returns a raw pointer to the inner `mm_struct`.
+    #[inline]
+    pub fn as_raw(&self) -> *mut bindings::mm_struct {
+        self.mm.get()
+    }
+
+    /// Obtain a reference from a raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// The caller must ensure that `ptr` points at an `mm_struct`, and that it is not deallocated
+    /// during the lifetime 'a.
+    #[inline]
+    pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_struct) -> &'a Mm {
+        // SAFETY: Caller promises that the pointer is valid for 'a. Layouts are compatible due to
+        // repr(transparent).
+        unsafe { &*ptr.cast() }
+    }
+
+    /// Calls `mmget_not_zero` and returns a handle if it succeeds.
+    #[inline]
+    pub fn mmget_not_zero(&self) -> Option<ARef<MmWithUser>> {
+        // SAFETY: The pointer is valid since self is a reference.
+        let success = unsafe { bindings::mmget_not_zero(self.as_raw()) };
+
+        if success {
+            // SAFETY: We just created an `mmget` refcount.
+            Some(unsafe { ARef::from_raw(NonNull::new_unchecked(self.as_raw().cast())) })
+        } else {
+            None
+        }
+    }
+}
+
+// These methods require `mm_users` to be non-zero.
+impl MmWithUser {
+    /// Obtain a reference from a raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// The caller must ensure that `ptr` points at an `mm_struct`, and that `mm_users` remains
+    /// non-zero for the duration of the lifetime 'a.
+    #[inline]
+    pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_struct) -> &'a MmWithUser {
+        // SAFETY: Caller promises that the pointer is valid for 'a. The layout is compatible due
+        // to repr(transparent).
+        unsafe { &*ptr.cast() }
+    }
+
+    /// Lock the mmap read lock.
+    #[inline]
+    pub fn mmap_read_lock(&self) -> MmapReadGuard<'_> {
+        // SAFETY: The pointer is valid since self is a reference.
+        unsafe { bindings::mmap_read_lock(self.as_raw()) };
+
+        // INVARIANT: We just acquired the read lock.
+        MmapReadGuard {
+            mm: self,
+            _nts: NotThreadSafe,
+        }
+    }
+
+    /// Try to lock the mmap read lock.
+    #[inline]
+    pub fn mmap_read_trylock(&self) -> Option<MmapReadGuard<'_>> {
+        // SAFETY: The pointer is valid since self is a reference.
+        let success = unsafe { bindings::mmap_read_trylock(self.as_raw()) };
+
+        if success {
+            // INVARIANT: We just acquired the read lock.
+            Some(MmapReadGuard {
+                mm: self,
+                _nts: NotThreadSafe,
+            })
+        } else {
+            None
+        }
+    }
+}
+
+/// A guard for the mmap read lock.
+///
+/// # Invariants
+///
+/// This `MmapReadGuard` guard owns the mmap read lock.
+pub struct MmapReadGuard<'a> {
+    mm: &'a MmWithUser,
+    // `mmap_read_lock` and `mmap_read_unlock` must be called on the same thread
+    _nts: NotThreadSafe,
+}
+
+impl Drop for MmapReadGuard<'_> {
+    #[inline]
+    fn drop(&mut self) {
+        // SAFETY: We hold the read lock by the type invariants.
+        unsafe { bindings::mmap_read_unlock(self.mm.as_raw()) };
+    }
+}
--
2.47.0.371.ga323438b13-goog
* Re: [PATCH v9 1/8] mm: rust: add abstraction for struct mm_struct
From: Lorenzo Stoakes @ 2024-11-22 17:27 UTC
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Vlastimil Babka, John Hubbard,
Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Fri, Nov 22, 2024 at 03:40:26PM +0000, Alice Ryhl wrote:
> These abstractions allow you to reference a `struct mm_struct` using
> both mmgrab and mmget refcounts. This is done using two Rust types:
>
> * Mm - represents an mm_struct where you don't know anything about the
> value of mm_users.
> * MmWithUser - represents an mm_struct where you know at compile time
> that mm_users is non-zero.
>
> This allows us to encode in the type system whether a method requires
> that mm_users is non-zero or not. For instance, you can always call
> `mmget_not_zero` but you can only call `mmap_read_lock` when mm_users is
> non-zero.
>
> It's possible to access current->mm without a refcount increment, but
> that is added in a later patch of this series.
>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
I see you address the current->mm thing in a later commit, and the rest of the
patch looked fine previously (with help from your kind explanations on the Rust
side :), so:
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> [...]
* [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
From: Alice Ryhl @ 2024-11-22 15:40 UTC
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux, Alice Ryhl,
Andreas Hindborg
This adds a type called VmAreaRef which is used when referencing a vma
that you have read access to. Here, read access means that you hold
either the mmap read lock or the vma read lock (or stronger).
Additionally, a vma_lookup method is added to the mmap read guard, which
enables you to obtain a &VmAreaRef in safe Rust code.
This patch only provides a way to lock the mmap read lock, but a
follow-up patch also provides a way to just lock the vma read lock.
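For illustration, a minimal sketch of how the pieces fit together (an editor's
addition, not part of the patch; the caller is hypothetical, but it uses only
methods from this series):

    fn dump_vma_range(mm: &MmWithUser, addr: usize) -> Option<()> {
        let guard = mm.mmap_read_lock();   // take the mmap read lock
        let vma = guard.vma_lookup(addr)?; // &VmAreaRef borrows from the guard
        pr_info!("vma: {:#x}..{:#x}\n", vma.start(), vma.end());
        Some(())
    }                                      // guard dropped: mmap read lock released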
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 rust/helpers/mm.c      |   6 ++
 rust/kernel/mm.rs      |  21 ++++++
 rust/kernel/mm/virt.rs | 176 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 203 insertions(+)
diff --git a/rust/helpers/mm.c b/rust/helpers/mm.c
index 7201747a5d31..7b72eb065a3e 100644
--- a/rust/helpers/mm.c
+++ b/rust/helpers/mm.c
@@ -37,3 +37,9 @@ void rust_helper_mmap_read_unlock(struct mm_struct *mm)
 {
         mmap_read_unlock(mm);
 }
+
+struct vm_area_struct *rust_helper_vma_lookup(struct mm_struct *mm,
+                                              unsigned long addr)
+{
+        return vma_lookup(mm, addr);
+}
diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
index 84cba581edaa..ace8e7d57afe 100644
--- a/rust/kernel/mm.rs
+++ b/rust/kernel/mm.rs
@@ -12,6 +12,8 @@
 };
 use core::{ops::Deref, ptr::NonNull};
 
+pub mod virt;
+
 /// A wrapper for the kernel's `struct mm_struct`.
 ///
 /// Since `mm_users` may be zero, the associated address space may not exist anymore. You can use
@@ -210,6 +212,25 @@ pub struct MmapReadGuard<'a> {
     _nts: NotThreadSafe,
 }
 
+impl<'a> MmapReadGuard<'a> {
+    /// Look up a vma at the given address.
+    #[inline]
+    pub fn vma_lookup(&self, vma_addr: usize) -> Option<&virt::VmAreaRef> {
+        // SAFETY: We hold a reference to the mm, so the pointer must be valid. Any value is okay
+        // for `vma_addr`.
+        let vma = unsafe { bindings::vma_lookup(self.mm.as_raw(), vma_addr as _) };
+
+        if vma.is_null() {
+            None
+        } else {
+            // SAFETY: We just checked that a vma was found, so the pointer is valid. Furthermore,
+            // the returned area will borrow from this read lock guard, so it can only be used
+            // while the mmap read lock is still held.
+            unsafe { Some(virt::VmAreaRef::from_raw(vma)) }
+        }
+    }
+}
+
 impl Drop for MmapReadGuard<'_> {
     #[inline]
     fn drop(&mut self) {
diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
new file mode 100644
index 000000000000..6df145fea128
--- /dev/null
+++ b/rust/kernel/mm/virt.rs
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (C) 2024 Google LLC.
+
+//! Virtual memory.
+
+use crate::{bindings, types::Opaque};
+
+/// A wrapper for the kernel's `struct vm_area_struct` with read access.
+///
+/// It represents an area of virtual memory.
+///
+/// # Invariants
+///
+/// The caller must hold the mmap read lock or the vma read lock.
+#[repr(transparent)]
+pub struct VmAreaRef {
+    vma: Opaque<bindings::vm_area_struct>,
+}
+
+// Methods you can call when holding the mmap or vma read lock (or stronger). They must be usable
+// no matter what the vma flags are.
+impl VmAreaRef {
+    /// Access a virtual memory area given a raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that `vma` is valid for the duration of 'a, and that the mmap or vma
+    /// read lock (or stronger) is held for at least the duration of 'a.
+    #[inline]
+    pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
+        // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
+        unsafe { &*vma.cast() }
+    }
+
+    /// Returns a raw pointer to this area.
+    #[inline]
+    pub fn as_ptr(&self) -> *mut bindings::vm_area_struct {
+        self.vma.get()
+    }
+
+    /// Returns the flags associated with the virtual memory area.
+    ///
+    /// The possible flags are a combination of the constants in [`flags`].
+    #[inline]
+    pub fn flags(&self) -> vm_flags_t {
+        // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
+        // access is not a data race.
+        unsafe { (*self.as_ptr()).__bindgen_anon_2.vm_flags as _ }
+    }
+
+    /// Returns the (inclusive) start address of the virtual memory area.
+    #[inline]
+    pub fn start(&self) -> usize {
+        // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
+        // access is not a data race.
+        unsafe { (*self.as_ptr()).__bindgen_anon_1.__bindgen_anon_1.vm_start as _ }
+    }
+
+    /// Returns the (exclusive) end address of the virtual memory area.
+    #[inline]
+    pub fn end(&self) -> usize {
+        // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
+        // access is not a data race.
+        unsafe { (*self.as_ptr()).__bindgen_anon_1.__bindgen_anon_1.vm_end as _ }
+    }
+
+    /// Zap pages in the given page range.
+    ///
+    /// This clears page table mappings for the range at the leaf level, leaving all other page
+    /// tables intact, and freeing any memory referenced by the VMA in this range. That is,
+    /// anonymous memory is completely freed, file-backed memory has its reference count on the
+    /// page cache folios dropped, and any dirty data will still be written back to disk as usual.
+    #[inline]
+    pub fn zap_page_range_single(&self, address: usize, size: usize) {
+        // SAFETY: By the type invariants, the caller has read access to this VMA, which is
+        // sufficient for this method call. This method has no requirements on the vma flags. Any
+        // value of `address` and `size` is allowed.
+        unsafe {
+            bindings::zap_page_range_single(
+                self.as_ptr(),
+                address as _,
+                size as _,
+                core::ptr::null_mut(),
+            )
+        };
+    }
+}
+
+/// The integer type used for vma flags.
+#[doc(inline)]
+pub use bindings::vm_flags_t;
+
+/// All possible flags for [`VmAreaRef`].
+pub mod flags {
+    use super::vm_flags_t;
+    use crate::bindings;
+
+    /// No flags are set.
+    pub const NONE: vm_flags_t = bindings::VM_NONE as _;
+
+    /// Mapping allows reads.
+    pub const READ: vm_flags_t = bindings::VM_READ as _;
+
+    /// Mapping allows writes.
+    pub const WRITE: vm_flags_t = bindings::VM_WRITE as _;
+
+    /// Mapping allows execution.
+    pub const EXEC: vm_flags_t = bindings::VM_EXEC as _;
+
+    /// Mapping is shared.
+    pub const SHARED: vm_flags_t = bindings::VM_SHARED as _;
+
+    /// Mapping may be updated to allow reads.
+    pub const MAYREAD: vm_flags_t = bindings::VM_MAYREAD as _;
+
+    /// Mapping may be updated to allow writes.
+    pub const MAYWRITE: vm_flags_t = bindings::VM_MAYWRITE as _;
+
+    /// Mapping may be updated to allow execution.
+    pub const MAYEXEC: vm_flags_t = bindings::VM_MAYEXEC as _;
+
+    /// Mapping may be updated to be shared.
+    pub const MAYSHARE: vm_flags_t = bindings::VM_MAYSHARE as _;
+
+    /// Page-ranges managed without `struct page`, just pure PFN.
+    pub const PFNMAP: vm_flags_t = bindings::VM_PFNMAP as _;
+
+    /// Memory mapped I/O or similar.
+    pub const IO: vm_flags_t = bindings::VM_IO as _;
+
+    /// Do not copy this vma on fork.
+    pub const DONTCOPY: vm_flags_t = bindings::VM_DONTCOPY as _;
+
+    /// Cannot expand with mremap().
+    pub const DONTEXPAND: vm_flags_t = bindings::VM_DONTEXPAND as _;
+
+    /// Lock the pages covered when they are faulted in.
+    pub const LOCKONFAULT: vm_flags_t = bindings::VM_LOCKONFAULT as _;
+
+    /// Is a VM accounted object.
+    pub const ACCOUNT: vm_flags_t = bindings::VM_ACCOUNT as _;
+
+    /// Should the VM suppress accounting.
+    pub const NORESERVE: vm_flags_t = bindings::VM_NORESERVE as _;
+
+    /// Huge TLB Page VM.
+    pub const HUGETLB: vm_flags_t = bindings::VM_HUGETLB as _;
+
+    /// Synchronous page faults. (DAX-specific)
+    pub const SYNC: vm_flags_t = bindings::VM_SYNC as _;
+
+    /// Architecture-specific flag.
+    pub const ARCH_1: vm_flags_t = bindings::VM_ARCH_1 as _;
+
+    /// Wipe VMA contents in child on fork.
+    pub const WIPEONFORK: vm_flags_t = bindings::VM_WIPEONFORK as _;
+
+    /// Do not include in the core dump.
+    pub const DONTDUMP: vm_flags_t = bindings::VM_DONTDUMP as _;
+
+    /// Not soft dirty clean area.
+    pub const SOFTDIRTY: vm_flags_t = bindings::VM_SOFTDIRTY as _;
+
+    /// Can contain `struct page` and pure PFN pages.
+    pub const MIXEDMAP: vm_flags_t = bindings::VM_MIXEDMAP as _;
+
+    /// MADV_HUGEPAGE marked this vma.
+    pub const HUGEPAGE: vm_flags_t = bindings::VM_HUGEPAGE as _;
+
+    /// MADV_NOHUGEPAGE marked this vma.
+    pub const NOHUGEPAGE: vm_flags_t = bindings::VM_NOHUGEPAGE as _;
+
+    /// KSM may merge identical pages.
+    pub const MERGEABLE: vm_flags_t = bindings::VM_MERGEABLE as _;
+}
--
2.47.0.371.ga323438b13-goog
* Re: [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
From: Jann Horn @ 2024-11-26 22:09 UTC
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> This adds a type called VmAreaRef which is used when referencing a vma
> that you have read access to. Here, read access means that you hold
> either the mmap read lock or the vma read lock (or stronger).
>
> Additionally, a vma_lookup method is added to the mmap read guard, which
> enables you to obtain a &VmAreaRef in safe Rust code.
>
> This patch only provides a way to lock the mmap read lock, but a
> follow-up patch also provides a way to just lock the vma read lock.
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
with one comment:
> +    /// Zap pages in the given page range.
> +    ///
> +    /// This clears page table mappings for the range at the leaf level, leaving all other page
> +    /// tables intact, and freeing any memory referenced by the VMA in this range. That is,
> +    /// anonymous memory is completely freed, file-backed memory has its reference count on the
> +    /// page cache folios dropped, and any dirty data will still be written back to disk as usual.
> +    #[inline]
> +    pub fn zap_page_range_single(&self, address: usize, size: usize) {
> +        // SAFETY: By the type invariants, the caller has read access to this VMA, which is
> +        // sufficient for this method call. This method has no requirements on the vma flags. Any
> +        // value of `address` and `size` is allowed.
If we really want to allow any address and size, we might want to add
an early bailout in zap_page_range_single(). The comment on top of
zap_page_range_single() currently says "The range must fit into one
VMA", and it looks like by the point we reach a bailout, we could have
gone through an interval tree walk via
mmu_notifier_invalidate_range_start()->__mmu_notifier_invalidate_range_start()->mn_itree_invalidate()
for a range that ends before it starts; I don't know how safe that is.
> +        unsafe {
> +            bindings::zap_page_range_single(
> +                self.as_ptr(),
> +                address as _,
> +                size as _,
> +                core::ptr::null_mut(),
> +            )
> +        };
> +    }
> +}
* Re: [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
From: Alice Ryhl @ 2024-11-27 12:01 UTC
To: Jann Horn
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Tue, Nov 26, 2024 at 11:10 PM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > [...]
>
> Reviewed-by: Jann Horn <jannh@google.com>
Thanks!
> with one comment:
>
> > +    /// Zap pages in the given page range.
> > +    ///
> > +    /// This clears page table mappings for the range at the leaf level, leaving all other page
> > +    /// tables intact, and freeing any memory referenced by the VMA in this range. That is,
> > +    /// anonymous memory is completely freed, file-backed memory has its reference count on the
> > +    /// page cache folios dropped, and any dirty data will still be written back to disk as usual.
> > +    #[inline]
> > +    pub fn zap_page_range_single(&self, address: usize, size: usize) {
> > +        // SAFETY: By the type invariants, the caller has read access to this VMA, which is
> > +        // sufficient for this method call. This method has no requirements on the vma flags. Any
> > +        // value of `address` and `size` is allowed.
>
> If we really want to allow any address and size, we might want to add
> an early bailout in zap_page_range_single(). The comment on top of
> zap_page_range_single() currently says "The range must fit into one
> VMA", and it looks like by the point we reach a bailout, we could have
> gone through an interval tree walk via
> mmu_notifier_invalidate_range_start()->__mmu_notifier_invalidate_range_start()->mn_itree_invalidate()
> for a range that ends before it starts; I don't know how safe that is.
I could change the comment on zap_page_range_single() to say:
"The range should be contained within a single VMA. Otherwise an error
is returned."
And then I can add an overflow check at the top of
zap_page_range_single(). Sounds ok?
Alice
* Re: [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
From: Jann Horn @ 2024-11-27 15:40 UTC
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Wed, Nov 27, 2024 at 1:01 PM Alice Ryhl <aliceryhl@google.com> wrote:
> On Tue, Nov 26, 2024 at 11:10 PM Jann Horn <jannh@google.com> wrote:
> >
> > On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > [...]
> >
> > Reviewed-by: Jann Horn <jannh@google.com>
>
> Thanks!
>
> > with one comment:
> >
> > > [...]
> >
> > If we really want to allow any address and size, we might want to add
> > an early bailout in zap_page_range_single(). The comment on top of
> > zap_page_range_single() currently says "The range must fit into one
> > VMA", and it looks like by the point we reach a bailout, we could have
> > gone through an interval tree walk via
> > mmu_notifier_invalidate_range_start()->__mmu_notifier_invalidate_range_start()->mn_itree_invalidate()
> > for a range that ends before it starts; I don't know how safe that is.
>
> I could change the comment on zap_page_range_single() to say:
>
> "The range should be contained within a single VMA. Otherwise an error
> is returned."
>
> And then I can add an overflow check at the top of
> zap_page_range_single(). Sounds ok?
Yes, I think changing the comment like that and adding a check for
whether address+size wraps around there addresses this.
* Re: [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
From: Alice Ryhl @ 2024-11-27 15:45 UTC
To: Jann Horn
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Wed, Nov 27, 2024 at 4:41 PM Jann Horn <jannh@google.com> wrote:
>
> On Wed, Nov 27, 2024 at 1:01 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > On Tue, Nov 26, 2024 at 11:10 PM Jann Horn <jannh@google.com> wrote:
> > >
> > > On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > > [...]
> > >
> > > Reviewed-by: Jann Horn <jannh@google.com>
> >
> > Thanks!
> >
> > > with one comment:
> > >
> > > > [...]
> > >
> > > If we really want to allow any address and size, we might want to add
> > > an early bailout in zap_page_range_single(). The comment on top of
> > > zap_page_range_single() currently says "The range must fit into one
> > > VMA", and it looks like by the point we reach a bailout, we could have
> > > gone through an interval tree walk via
> > > mmu_notifier_invalidate_range_start()->__mmu_notifier_invalidate_range_start()->mn_itree_invalidate()
> > > for a range that ends before it starts; I don't know how safe that is.
> >
> > I could change the comment on zap_page_range_single() to say:
> >
> > "The range should be contained within a single VMA. Otherwise an error
> > is returned."
> >
> > And then I can add an overflow check at the top of
> > zap_page_range_single(). Sounds ok?
>
> Yes, I think changing the comment like that and adding a check for
> whether address+size wraps around there addresses this.
Can there be a page at the top of the address space? If so, I have to
be a bit careful in the wrap-around check, because it should only fail
if the addition wraps around *and* the sum is non-zero.
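A minimal sketch of that edge case (an editor's addition; the helper name is
hypothetical):

    // A range ending exactly at the top of the address space wraps to 0;
    // per the above, only a wrap with a non-zero sum should be rejected.
    fn range_wraps_badly(address: usize, size: usize) -> bool {
        let end = address.wrapping_add(size);
        end < address && end != 0
    }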
Alice
* Re: [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
From: Jann Horn @ 2024-11-27 16:16 UTC
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Wed, Nov 27, 2024 at 4:46 PM Alice Ryhl <aliceryhl@google.com> wrote:
>
> On Wed, Nov 27, 2024 at 4:41 PM Jann Horn <jannh@google.com> wrote:
> >
> > On Wed, Nov 27, 2024 at 1:01 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > On Tue, Nov 26, 2024 at 11:10 PM Jann Horn <jannh@google.com> wrote:
> > > >
> > > > On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > > > [...]
> > > >
> > > > Reviewed-by: Jann Horn <jannh@google.com>
> > >
> > > Thanks!
> > >
> > > > with one comment:
> > > >
> > > > > [...]
> > > >
> > > > If we really want to allow any address and size, we might want to add
> > > > an early bailout in zap_page_range_single(). The comment on top of
> > > > zap_page_range_single() currently says "The range must fit into one
> > > > VMA", and it looks like by the point we reach a bailout, we could have
> > > > gone through an interval tree walk via
> > > > mmu_notifier_invalidate_range_start()->__mmu_notifier_invalidate_range_start()->mn_itree_invalidate()
> > > > for a range that ends before it starts; I don't know how safe that is.
> > >
> > > I could change the comment on zap_page_range_single() to say:
> > >
> > > "The range should be contained within a single VMA. Otherwise an error
> > > is returned."
> > >
> > > And then I can add an overflow check at the top of
> > > zap_page_range_single(). Sounds ok?
> >
> > Yes, I think changing the comment like that and adding a check for
> > whether address+size wraps around there addresses this.
>
> Can there be a page at the top of the address space? If so, I have to
> be a bit careful in the wrap-around check, because it should only fail
> if the addition wraps around *and* the sum is non-zero.
Uh, good question... I can't think of any architectures that have
userspace mappings right at the top of the address space, and having
objects allocated right at the end of the address space would violate
the C standard (because C guarantees that pointers pointing directly
behind an array behave reasonably, and this would not work if a
pointer pointing directly behind an array could wrap around to NULL).
And unless the userspace allocator takes responsibility for enforcing
this edge case, the kernel has to do it by preventing the last page of
the virtual address space from being mapped. (This is the reason why a
32-bit process on an arm64 kernel is normally only allowed to map
addresses up to 0xfffff000, see
<https://git.kernel.org/linus/d263119387de>.)
Allowing userspace to map the top of the address space would also be a
bad idea because then ERR_PTR() could return valid userspace pointers.
Looking at the current implementation of zap_page_range_single(), in
the case you're describing, unmap_single_vma() would get called with
end_addr==0, and then we'd bail out on the "if (end <= vma->vm_start)"
check. So calling zap_page_range_single() on such a range would
already fail.
* Re: [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
From: Alice Ryhl @ 2024-11-29 11:44 UTC
To: Jann Horn
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Wed, Nov 27, 2024 at 4:41 PM Jann Horn <jannh@google.com> wrote:
>
> On Wed, Nov 27, 2024 at 1:01 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > On Tue, Nov 26, 2024 at 11:10 PM Jann Horn <jannh@google.com> wrote:
> > >
> > > On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > > [...]
> > >
> > > Reviewed-by: Jann Horn <jannh@google.com>
> >
> > Thanks!
> >
> > > with one comment:
> > >
> > > > [...]
> > >
> > > If we really want to allow any address and size, we might want to add
> > > an early bailout in zap_page_range_single(). The comment on top of
> > > zap_page_range_single() currently says "The range must fit into one
> > > VMA", and it looks like by the point we reach a bailout, we could have
> > > gone through an interval tree walk via
> > > mmu_notifier_invalidate_range_start()->__mmu_notifier_invalidate_range_start()->mn_itree_invalidate()
> > > for a range that ends before it starts; I don't know how safe that is.
> >
> > I could change the comment on zap_page_range_single() to say:
> >
> > "The range should be contained within a single VMA. Otherwise an error
> > is returned."
> >
> > And then I can add an overflow check at the top of
> > zap_page_range_single(). Sounds ok?
>
> Yes, I think changing the comment like that and adding a check for
> whether address+size wraps around there addresses this.
Hmm. Looking at this again now ...
For one, the function returns void so we can at best handle overflow
by performing a no-op.
Another question is, are we actually okay to call it with ranges
outside the vma? Does that just no-op or what? Maybe the Rust side
should just bail out early if the address range is not a subset of the
vma?
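A sketch of what a Rust-side early bailout could look like (an editor's
addition; a hypothetical variant of this series' method that returns Result
instead of unit, with kernel::error::code::EINVAL assumed in scope):

    pub fn zap_page_range_single(&self, address: usize, size: usize) -> Result {
        let end = address.checked_add(size).ok_or(EINVAL)?;
        // Bail out unless [address, end) is a subset of [vm_start, vm_end).
        if address < self.start() || end > self.end() {
            return Err(EINVAL);
        }
        // SAFETY: The range lies within this VMA, to which the caller has read
        // access by the type invariants.
        unsafe {
            bindings::zap_page_range_single(
                self.as_ptr(),
                address as _,
                size as _,
                core::ptr::null_mut(),
            )
        };
        Ok(())
    }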
Alice
* Re: [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
From: Lorenzo Stoakes @ 2024-11-29 11:58 UTC
To: Alice Ryhl
Cc: Jann Horn, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Fri, Nov 29, 2024 at 12:44:10PM +0100, Alice Ryhl wrote:
> On Wed, Nov 27, 2024 at 4:41 PM Jann Horn <jannh@google.com> wrote:
> >
> > On Wed, Nov 27, 2024 at 1:01 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > On Tue, Nov 26, 2024 at 11:10 PM Jann Horn <jannh@google.com> wrote:
> > > >
> > > > On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > > > [...]
> > > >
> > > > Reviewed-by: Jann Horn <jannh@google.com>
> > >
> > > Thanks!
> > >
> > > > with one comment:
> > > >
> > > > > [...]
> > > >
> > > > If we really want to allow any address and size, we might want to add
> > > > an early bailout in zap_page_range_single(). The comment on top of
> > > > zap_page_range_single() currently says "The range must fit into one
> > > > VMA", and it looks like by the point we reach a bailout, we could have
> > > > gone through an interval tree walk via
> > > > mmu_notifier_invalidate_range_start()->__mmu_notifier_invalidate_range_start()->mn_itree_invalidate()
> > > > for a range that ends before it starts; I don't know how safe that is.
> > >
> > > I could change the comment on zap_page_range_single() to say:
> > >
> > > "The range should be contained within a single VMA. Otherwise an error
> > > is returned."
> > >
> > > And then I can add an overflow check at the top of
> > > zap_page_range_single(). Sounds ok?
> >
> > Yes, I think changing the comment like that and adding a check for
> > whether address+size wraps around there addresses this.
>
> Hmm. Looking at this again now ...
>
> For one, the function returns void so we can at best handle overflow
> by performing a no-op.
>
> Another question is, are we actually okay to call it with ranges
> outside the vma? Does that just no-op or what? Maybe the Rust side
> should just bail out early if the address range is not a subset of the
> vma?
In unmap_single_vma() invoked by zap_page_range_single() we check this:
        unsigned long start = max(vma->vm_start, start_addr);
        ...
        if (start >= vma->vm_end)
                return;
        end = min(vma->vm_end, end_addr);
        if (end <= vma->vm_start)
                return;
So you could specify start <= vma->vm_end and end > vma->vm_start and it'll get
clamped like:
0                                            MAX
range:       <------------------------------->
VMA:                  |--------|
clamped to:           <-------->
However, note that we will tell mmu_notifier_range_init() and
hugetlb_zap_begin() incorrect values in this case... not sure.

I'd prefer it if the Rust side strictly allowed only ranges within the VMA.
>
> Alice
* [PATCH v9 3/8] mm: rust: add vm_insert_page
From: Alice Ryhl @ 2024-11-22 15:40 UTC
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux, Alice Ryhl,
Andreas Hindborg
The vm_insert_page method is only usable on vmas with the VM_MIXEDMAP
flag, so we introduce a new type to keep track of such vmas.
The approach used in this patch assumes that we will not need to encode
many flag combinations in the type. I don't think we need to encode more
than VM_MIXEDMAP and VM_PFNMAP as things are now. However, if that
becomes necessary, using generic parameters in a single type would scale
better as the number of flags increases.
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/kernel/mm/virt.rs | 70 +++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 69 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
index 6df145fea128..3e494e40b530 100644
--- a/rust/kernel/mm/virt.rs
+++ b/rust/kernel/mm/virt.rs
@@ -4,7 +4,14 @@
//! Virtual memory.
-use crate::{bindings, types::Opaque};
+use crate::{
+ bindings,
+ error::{to_result, Result},
+ page::Page,
+ types::Opaque,
+};
+
+use core::ops::Deref;
/// A wrapper for the kernel's `struct vm_area_struct` with read access.
///
@@ -85,6 +92,67 @@ pub fn zap_page_range_single(&self, address: usize, size: usize) {
)
};
}
+
+ /// Check whether the `VM_MIXEDMAP` flag is set.
+ ///
+ /// This can be used to access methods that require `VM_MIXEDMAP` to be set.
+ #[inline]
+ pub fn as_mixedmap_vma(&self) -> Option<&VmAreaMixedMap> {
+ if self.flags() & flags::MIXEDMAP != 0 {
+ // SAFETY: We just checked that `VM_MIXEDMAP` is set. All other requirements are
+ // satisfied by the type invariants of `VmAreaRef`.
+ Some(unsafe { VmAreaMixedMap::from_raw(self.as_ptr()) })
+ } else {
+ None
+ }
+ }
+}
+
+/// A wrapper for the kernel's `struct vm_area_struct` with read access and `VM_MIXEDMAP` set.
+///
+/// It represents an area of virtual memory.
+///
+/// # Invariants
+///
+/// The caller must hold the mmap read lock or the vma read lock. The `VM_MIXEDMAP` flag must be
+/// set.
+#[repr(transparent)]
+pub struct VmAreaMixedMap {
+ vma: VmAreaRef,
+}
+
+// Make all `VmAreaRef` methods available on `VmAreaMixedMap`.
+impl Deref for VmAreaMixedMap {
+ type Target = VmAreaRef;
+
+ #[inline]
+ fn deref(&self) -> &VmAreaRef {
+ &self.vma
+ }
+}
+
+impl VmAreaMixedMap {
+ /// Access a virtual memory area given a raw pointer.
+ ///
+ /// # Safety
+ ///
+ /// Callers must ensure that `vma` is valid for the duration of 'a, and that the mmap read lock
+ /// (or stronger) is held for at least the duration of 'a. The `VM_MIXEDMAP` flag must be set.
+ #[inline]
+ pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
+ // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
+ unsafe { &*vma.cast() }
+ }
+
+ /// Maps a single page at the given address within the virtual memory area.
+ ///
+ /// This operation does not take ownership of the page.
+ #[inline]
+ pub fn vm_insert_page(&self, address: usize, page: &Page) -> Result {
+ // SAFETY: The caller has read access and has verified that `VM_MIXEDMAP` is set. The page
+ // is order 0. The address is checked on the C side so it can take any value.
+ to_result(unsafe { bindings::vm_insert_page(self.as_ptr(), address as _, page.as_ptr()) })
+ }
}
/// The integer type used for vma flags.
--
2.47.0.371.ga323438b13-goog
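(As a usage sketch, not part of the series: a hypothetical caller that holds
the mmap read lock can insert one of its pages once `VM_MIXEDMAP` has been
proven. The `start()` accessor is assumed from patch 2/8.)

    use kernel::error::code::EINVAL;
    use kernel::mm::virt::VmAreaRef;
    use kernel::page::Page;
    use kernel::prelude::*;

    fn insert_at_start(vma: &VmAreaRef, page: &Page) -> Result {
        // `as_mixedmap_vma` returns `None` unless VM_MIXEDMAP is set, so the
        // VM_MIXEDMAP requirement of `vm_insert_page` is enforced by the types.
        let mixed = vma.as_mixedmap_vma().ok_or(EINVAL)?;
        // Insert the page at the first address of the mapping. The vma does
        // not take ownership: the C side grabs its own reference to the page.
        mixed.vm_insert_page(vma.start(), page)
    }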
* [PATCH v9 4/8] mm: rust: add lock_vma_under_rcu
2024-11-22 15:40 [PATCH v9 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
` (2 preceding siblings ...)
2024-11-22 15:40 ` [PATCH v9 3/8] mm: rust: add vm_insert_page Alice Ryhl
@ 2024-11-22 15:40 ` Alice Ryhl
2024-11-26 21:50 ` Jann Horn
2024-11-22 15:40 ` [PATCH v9 5/8] mm: rust: add mmput_async support Alice Ryhl
` (3 subsequent siblings)
7 siblings, 1 reply; 38+ messages in thread
From: Alice Ryhl @ 2024-11-22 15:40 UTC (permalink / raw)
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux, Alice Ryhl,
Andreas Hindborg
Currently, the binder driver always uses the mmap lock to make changes
to its vma. Because the mmap lock is global to the process, this can
involve significant contention. However, the kernel has a feature called
per-vma locks, which can significantly reduce contention. For example,
you can take a vma lock in parallel with an mmap write lock. This is
important because contention on the mmap lock has been a long-term
recurring challenge for the Binder driver.
This patch introduces support for using `lock_vma_under_rcu` from Rust.
The Rust Binder driver will be able to use this to reduce contention on
the mmap lock.
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/helpers/mm.c | 5 +++++
rust/kernel/mm.rs | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 61 insertions(+)
diff --git a/rust/helpers/mm.c b/rust/helpers/mm.c
index 7b72eb065a3e..81b510c96fd2 100644
--- a/rust/helpers/mm.c
+++ b/rust/helpers/mm.c
@@ -43,3 +43,8 @@ struct vm_area_struct *rust_helper_vma_lookup(struct mm_struct *mm,
{
return vma_lookup(mm, addr);
}
+
+void rust_helper_vma_end_read(struct vm_area_struct *vma)
+{
+ vma_end_read(vma);
+}
diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
index ace8e7d57afe..425b73a9dfe6 100644
--- a/rust/kernel/mm.rs
+++ b/rust/kernel/mm.rs
@@ -13,6 +13,7 @@
use core::{ops::Deref, ptr::NonNull};
pub mod virt;
+use virt::VmAreaRef;
/// A wrapper for the kernel's `struct mm_struct`.
///
@@ -170,6 +171,32 @@ pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_struct) -> &'a MmWithUser {
unsafe { &*ptr.cast() }
}
+ /// Attempt to access a vma using the vma read lock.
+ ///
+ /// This is an optimistic trylock operation, so it may fail if there is contention. In that
+ /// case, you should fall back to taking the mmap read lock.
+ ///
+ /// When per-vma locks are disabled, this always returns `None`.
+ #[inline]
+ pub fn lock_vma_under_rcu(&self, vma_addr: usize) -> Option<VmaReadGuard<'_>> {
+ #[cfg(CONFIG_PER_VMA_LOCK)]
+ {
+ // SAFETY: Calling `bindings::lock_vma_under_rcu` is always okay given an mm where
+ // `mm_users` is non-zero.
+ let vma = unsafe { bindings::lock_vma_under_rcu(self.as_raw(), vma_addr as _) };
+ if !vma.is_null() {
+ return Some(VmaReadGuard {
+ // SAFETY: If `lock_vma_under_rcu` returns a non-null ptr, then it points at a
+ // valid vma. The vma is stable for as long as the vma read lock is held.
+ vma: unsafe { VmAreaRef::from_raw(vma) },
+ _nts: NotThreadSafe,
+ });
+ }
+ }
+
+ None
+ }
+
/// Lock the mmap read lock.
#[inline]
pub fn mmap_read_lock(&self) -> MmapReadGuard<'_> {
@@ -238,3 +265,32 @@ fn drop(&mut self) {
unsafe { bindings::mmap_read_unlock(self.mm.as_raw()) };
}
}
+
+/// A guard for the vma read lock.
+///
+/// # Invariants
+///
+/// This `VmaReadGuard` guard owns the vma read lock.
+pub struct VmaReadGuard<'a> {
+ vma: &'a VmAreaRef,
+ // `vma_end_read` must be called on the same thread as where the lock was taken
+ _nts: NotThreadSafe,
+}
+
+// Make all `VmAreaRef` methods available on `VmaReadGuard`.
+impl Deref for VmaReadGuard<'_> {
+ type Target = VmAreaRef;
+
+ #[inline]
+ fn deref(&self) -> &VmAreaRef {
+ self.vma
+ }
+}
+
+impl Drop for VmaReadGuard<'_> {
+ #[inline]
+ fn drop(&mut self) {
+ // SAFETY: We hold the read lock by the type invariants.
+ unsafe { bindings::vma_end_read(self.vma.as_ptr()) };
+ }
+}
--
2.47.0.371.ga323438b13-goog
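(The intended calling pattern is optimistic: try the per-vma lock, fall back
to the mmap read lock on failure. A sketch, not from the series; it assumes a
`vma_lookup()` method on the mmap read guard from patch 2/8.)

    use kernel::error::code::EFAULT;
    use kernel::mm::{virt::VmAreaRef, MmWithUser};
    use kernel::prelude::*;

    fn for_vma_at(mm: &MmWithUser, addr: usize, f: impl FnOnce(&VmAreaRef) -> Result) -> Result {
        // Fast path: the per-vma read lock. This fails under contention and
        // always fails when CONFIG_PER_VMA_LOCK is disabled.
        if let Some(vma) = mm.lock_vma_under_rcu(addr) {
            return f(&vma);
        }
        // Slow path: the process-wide mmap read lock.
        let mmap = mm.mmap_read_lock();
        let vma = mmap.vma_lookup(addr).ok_or(EFAULT)?;
        f(vma)
    }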
* Re: [PATCH v9 4/8] mm: rust: add lock_vma_under_rcu
2024-11-22 15:40 ` [PATCH v9 4/8] mm: rust: add lock_vma_under_rcu Alice Ryhl
@ 2024-11-26 21:50 ` Jann Horn
0 siblings, 0 replies; 38+ messages in thread
From: Jann Horn @ 2024-11-26 21:50 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> Currently, the binder driver always uses the mmap lock to make changes
> to its vma. Because the mmap lock is global to the process, this can
> involve significant contention. However, the kernel has a feature called
> per-vma locks, which can significantly reduce contention. For example,
> you can take a vma lock in parallel with an mmap write lock. This is
> important because contention on the mmap lock has been a long-term
> recurring challenge for the Binder driver.
>
> This patch introduces support for using `lock_vma_under_rcu` from Rust.
> The Rust Binder driver will be able to use this to reduce contention on
> the mmap lock.
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
* [PATCH v9 5/8] mm: rust: add mmput_async support
2024-11-22 15:40 [PATCH v9 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
` (3 preceding siblings ...)
2024-11-22 15:40 ` [PATCH v9 4/8] mm: rust: add lock_vma_under_rcu Alice Ryhl
@ 2024-11-22 15:40 ` Alice Ryhl
2024-11-22 15:40 ` [PATCH v9 6/8] mm: rust: add VmAreaNew for f_ops->mmap() Alice Ryhl
` (2 subsequent siblings)
7 siblings, 0 replies; 38+ messages in thread
From: Alice Ryhl @ 2024-11-22 15:40 UTC (permalink / raw)
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux, Alice Ryhl,
Andreas Hindborg
Adds an MmWithUserAsync type that uses mmput_async when dropped but is
otherwise identical to MmWithUser. This has to be done using a separate
type because the thing we are changing is the destructor.
Rust Binder needs this to avoid a certain deadlock. See commit
9a9ab0d96362 ("binder: fix race between mmput() and do_exit()") for
details. It's also needed in the shrinker to avoid cleaning up the mm in
the shrinker's context.
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/kernel/mm.rs | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
index 425b73a9dfe6..50f4861ae4b9 100644
--- a/rust/kernel/mm.rs
+++ b/rust/kernel/mm.rs
@@ -98,6 +98,48 @@ fn deref(&self) -> &Mm {
}
}
+/// A wrapper for the kernel's `struct mm_struct`.
+///
+/// This type is identical to `MmWithUser` except that it uses `mmput_async` when dropping a
+/// refcount. This means that the destructor of `ARef<MmWithUserAsync>` is safe to call in atomic
+/// context.
+///
+/// # Invariants
+///
+/// Values of this type are always refcounted using `mmget`. The value of `mm_users` is non-zero.
+#[repr(transparent)]
+pub struct MmWithUserAsync {
+ mm: MmWithUser,
+}
+
+// SAFETY: It is safe to call `mmput_async` on another thread than where `mmget` was called.
+unsafe impl Send for MmWithUserAsync {}
+// SAFETY: All methods on `MmWithUserAsync` can be called in parallel from several threads.
+unsafe impl Sync for MmWithUserAsync {}
+
+// SAFETY: By the type invariants, this type is always refcounted.
+unsafe impl AlwaysRefCounted for MmWithUserAsync {
+ fn inc_ref(&self) {
+ // SAFETY: The pointer is valid since self is a reference.
+ unsafe { bindings::mmget(self.as_raw()) };
+ }
+
+ unsafe fn dec_ref(obj: NonNull<Self>) {
+ // SAFETY: The caller is giving up their refcount.
+ unsafe { bindings::mmput_async(obj.cast().as_ptr()) };
+ }
+}
+
+// Make all `MmWithUser` methods available on `MmWithUserAsync`.
+impl Deref for MmWithUserAsync {
+ type Target = MmWithUser;
+
+ #[inline]
+ fn deref(&self) -> &MmWithUser {
+ &self.mm
+ }
+}
+
// These methods are safe to call even if `mm_users` is zero.
impl Mm {
/// Call `mmgrab` on `current.mm`.
@@ -171,6 +213,13 @@ pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_struct) -> &'a MmWithUser {
unsafe { &*ptr.cast() }
}
+ /// Use `mmput_async` when dropping this refcount.
+ #[inline]
+ pub fn into_mmput_async(me: ARef<MmWithUser>) -> ARef<MmWithUserAsync> {
+ // SAFETY: The layouts and invariants are compatible.
+ unsafe { ARef::from_raw(ARef::into_raw(me).cast()) }
+ }
+
/// Attempt to access a vma using the vma read lock.
///
/// This is an optimistic trylock operation, so it may fail if there is contention. In that
--
2.47.0.371.ga323438b13-goog
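(Usage sketch, with a hypothetical helper name: convert an owned reference
before handing it to a context that may drop it atomically.)

    use kernel::mm::{MmWithUser, MmWithUserAsync};
    use kernel::types::ARef;

    fn make_droppable_in_atomic(mm: ARef<MmWithUser>) -> ARef<MmWithUserAsync> {
        // Same mm, same refcount; only the destructor changes. Dropping the
        // returned reference calls mmput_async, which defers the heavyweight
        // teardown instead of running it inline, so the drop is safe in
        // atomic context (e.g. from a shrinker callback).
        MmWithUser::into_mmput_async(mm)
    }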
* [PATCH v9 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2024-11-22 15:40 [PATCH v9 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
` (4 preceding siblings ...)
2024-11-22 15:40 ` [PATCH v9 5/8] mm: rust: add mmput_async support Alice Ryhl
@ 2024-11-22 15:40 ` Alice Ryhl
2024-11-22 17:33 ` Lorenzo Stoakes
2024-11-26 21:29 ` Jann Horn
2024-11-22 15:40 ` [PATCH v9 7/8] rust: miscdevice: add mmap support Alice Ryhl
2024-11-22 15:40 ` [PATCH v9 8/8] task: rust: rework how current is accessed Alice Ryhl
7 siblings, 2 replies; 38+ messages in thread
From: Alice Ryhl @ 2024-11-22 15:40 UTC (permalink / raw)
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux, Alice Ryhl,
Andreas Hindborg
This type will be used when setting up a new vma in an f_ops->mmap()
hook. Using a separate type from VmAreaRef allows us to have a separate
set of operations that you are only able to use during the mmap() hook.
For example, the VM_MIXEDMAP flag must not be changed after the initial
setup that happens during the f_ops->mmap() hook.
To avoid setting invalid flag values, the methods for clearing
VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
the return value results in a compilation error because the `Result`
type is marked #[must_use].
For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
we add a VM_PFNMAP method, we will need some way to prevent you from
setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/kernel/mm/virt.rs | 179 ++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 178 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
index 3e494e40b530..2a49c29a49c7 100644
--- a/rust/kernel/mm/virt.rs
+++ b/rust/kernel/mm/virt.rs
@@ -6,7 +6,7 @@
use crate::{
bindings,
- error::{to_result, Result},
+ error::{code::EINVAL, to_result, Result},
page::Page,
types::Opaque,
};
@@ -155,6 +155,183 @@ pub fn vm_insert_page(&self, address: usize, page: &Page) -> Result {
}
}
+/// A builder for setting up a vma in an `f_ops->mmap()` hook.
+///
+/// # Invariants
+///
+/// For the duration of 'a, the referenced vma must be undergoing initialization in an
+/// `f_ops->mmap()` hook.
+pub struct VmAreaNew {
+ vma: VmAreaRef,
+}
+
+// Make all `VmAreaRef` methods available on `VmAreaNew`.
+impl Deref for VmAreaNew {
+ type Target = VmAreaRef;
+
+ #[inline]
+ fn deref(&self) -> &VmAreaRef {
+ &self.vma
+ }
+}
+
+impl VmAreaNew {
+ /// Access a virtual memory area given a raw pointer.
+ ///
+ /// # Safety
+ ///
+ /// Callers must ensure that `vma` is undergoing initial vma setup for the duration of 'a.
+ #[inline]
+ pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
+ // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
+ unsafe { &*vma.cast() }
+ }
+
+ /// Internal method for updating the vma flags.
+ ///
+ /// # Safety
+ ///
+ /// This must not be used to set the flags to an invalid value.
+ #[inline]
+ unsafe fn update_flags(&self, set: vm_flags_t, unset: vm_flags_t) {
+ let mut flags = self.flags();
+ flags |= set;
+ flags &= !unset;
+
+ // SAFETY: This is not a data race: the vma is undergoing initial setup, so it's not yet
+ // shared. Additionally, `VmAreaNew` is `!Sync`, so it cannot be used to write in parallel.
+ // The caller promises that this does not set the flags to an invalid value.
+ unsafe { (*self.as_ptr()).__bindgen_anon_2.__vm_flags = flags };
+ }
+
+ /// Set the `VM_MIXEDMAP` flag on this vma.
+ ///
+ /// This enables the vma to contain both `struct page` and pure PFN pages. Returns a reference
+ /// that can be used to call `vm_insert_page` on the vma.
+ #[inline]
+ pub fn set_mixedmap(&self) -> &VmAreaMixedMap {
+ // SAFETY: We don't yet provide a way to set VM_PFNMAP, so this cannot put the flags in an
+ // invalid state.
+ unsafe { self.update_flags(flags::MIXEDMAP, 0) };
+
+ // SAFETY: We just set `VM_MIXEDMAP` on the vma.
+ unsafe { VmAreaMixedMap::from_raw(self.vma.as_ptr()) }
+ }
+
+ /// Set the `VM_IO` flag on this vma.
+ ///
+ /// This marks the vma as being a memory-mapped I/O region.
+ #[inline]
+ pub fn set_io(&self) {
+ // SAFETY: Setting the VM_IO flag is always okay.
+ unsafe { self.update_flags(flags::IO, 0) };
+ }
+
+ /// Set the `VM_DONTEXPAND` flag on this vma.
+ ///
+ /// This prevents the vma from being expanded with `mremap()`.
+ #[inline]
+ pub fn set_dontexpand(&self) {
+ // SAFETY: Setting the VM_DONTEXPAND flag is always okay.
+ unsafe { self.update_flags(flags::DONTEXPAND, 0) };
+ }
+
+ /// Set the `VM_DONTCOPY` flag on this vma.
+ ///
+ /// This prevents the vma from being copied on fork. This option is only permanent if `VM_IO`
+ /// is set.
+ #[inline]
+ pub fn set_dontcopy(&self) {
+ // SAFETY: Setting the VM_DONTCOPY flag is always okay.
+ unsafe { self.update_flags(flags::DONTCOPY, 0) };
+ }
+
+ /// Set the `VM_DONTDUMP` flag on this vma.
+ ///
+ /// This prevents the vma from being included in core dumps. This option is only permanent if
+ /// `VM_IO` is set.
+ #[inline]
+ pub fn set_dontdump(&self) {
+ // SAFETY: Setting the VM_DONTDUMP flag is always okay.
+ unsafe { self.update_flags(flags::DONTDUMP, 0) };
+ }
+
+ /// Returns whether `VM_READ` is set.
+ ///
+ /// This flag indicates whether userspace is mapping this vma as readable.
+ #[inline]
+ pub fn get_read(&self) -> bool {
+ (self.flags() & flags::READ) != 0
+ }
+
+ /// Try to clear the `VM_MAYREAD` flag, failing if `VM_READ` is set.
+ ///
+ /// This flag indicates whether userspace is allowed to make this vma readable with
+ /// `mprotect()`.
+ ///
+ /// Note that this operation is irreversible. Once `VM_MAYREAD` has been cleared, it can never
+ /// be set again.
+ #[inline]
+ pub fn try_clear_mayread(&self) -> Result {
+ if self.get_read() {
+ return Err(EINVAL);
+ }
+ // SAFETY: Clearing `VM_MAYREAD` is okay when `VM_READ` is not set.
+ unsafe { self.update_flags(0, flags::MAYREAD) };
+ Ok(())
+ }
+
+ /// Returns whether `VM_WRITE` is set.
+ ///
+ /// This flag indicates whether userspace is mapping this vma as writable.
+ #[inline]
+ pub fn get_write(&self) -> bool {
+ (self.flags() & flags::WRITE) != 0
+ }
+
+ /// Try to clear the `VM_MAYWRITE` flag, failing if `VM_WRITE` is set.
+ ///
+ /// This flag indicates whether userspace is allowed to make this vma writable with
+ /// `mprotect()`.
+ ///
+ /// Note that this operation is irreversible. Once `VM_MAYWRITE` has been cleared, it can never
+ /// be set again.
+ #[inline]
+ pub fn try_clear_maywrite(&self) -> Result {
+ if self.get_write() {
+ return Err(EINVAL);
+ }
+ // SAFETY: Clearing `VM_MAYWRITE` is okay when `VM_WRITE` is not set.
+ unsafe { self.update_flags(0, flags::MAYWRITE) };
+ Ok(())
+ }
+
+ /// Returns whether `VM_EXEC` is set.
+ ///
+ /// This flag indicates whether userspace is mapping this vma as executable.
+ #[inline]
+ pub fn get_exec(&self) -> bool {
+ (self.flags() & flags::EXEC) != 0
+ }
+
+ /// Try to clear the `VM_MAYEXEC` flag, failing if `VM_EXEC` is set.
+ ///
+ /// This flag indicates whether userspace is allowed to make this vma executable with
+ /// `mprotect()`.
+ ///
+ /// Note that this operation is irreversible. Once `VM_MAYEXEC` has been cleared, it can never
+ /// be set again.
+ #[inline]
+ pub fn try_clear_mayexec(&self) -> Result {
+ if self.get_exec() {
+ return Err(EINVAL);
+ }
+ // SAFETY: Clearing `VM_MAYEXEC` is okay when `VM_EXEC` is not set.
+ unsafe { self.update_flags(0, flags::MAYEXEC) };
+ Ok(())
+ }
+}
+
/// The integer type used for vma flags.
#[doc(inline)]
pub use bindings::vm_flags_t;
--
2.47.0.371.ga323438b13-goog
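(Putting the methods together, an mmap hook might configure a fresh vma like
this sketch; the policy shown is hypothetical, but every call is from the
patch above.)

    use kernel::mm::virt::VmAreaNew;
    use kernel::prelude::*;

    fn configure(vma: &VmAreaNew) -> Result {
        // Fails with EINVAL if userspace mapped with PROT_WRITE; afterwards
        // the mapping can never be made writable, not even via mprotect().
        vma.try_clear_maywrite()?;
        // Accesses may have side effects; keep expansion and core dumps away.
        vma.set_io();
        vma.set_dontexpand();
        vma.set_dontdump();
        // Permit vm_insert_page() on this vma and get the capability type back.
        let _mixed = vma.set_mixedmap();
        Ok(())
    }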
* Re: [PATCH v9 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2024-11-22 15:40 ` [PATCH v9 6/8] mm: rust: add VmAreaNew for f_ops->mmap() Alice Ryhl
@ 2024-11-22 17:33 ` Lorenzo Stoakes
2024-11-26 21:29 ` Jann Horn
1 sibling, 0 replies; 38+ messages in thread
From: Lorenzo Stoakes @ 2024-11-22 17:33 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Vlastimil Babka, John Hubbard,
Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Fri, Nov 22, 2024 at 03:40:31PM +0000, Alice Ryhl wrote:
> This type will be used when setting up a new vma in an f_ops->mmap()
> hook. Using a separate type from VmAreaRef allows us to have a separate
> set of operations that you are only able to use during the mmap() hook.
> For example, the VM_MIXEDMAP flag must not be changed after the initial
> setup that happens during the f_ops->mmap() hook.
Nice, thanks!
>
> To avoid setting invalid flag values, the methods for clearing
> VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
> if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
> the return value results in a compilation error because the `Result`
> type is marked #[must_use].
>
> For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
> we add a VM_PFNMAP method, we will need some way to prevent you from
> setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
LGTM, so:
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> ---
> rust/kernel/mm/virt.rs | 179 ++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 178 insertions(+), 1 deletion(-)
>
> diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
> index 3e494e40b530..2a49c29a49c7 100644
> --- a/rust/kernel/mm/virt.rs
> +++ b/rust/kernel/mm/virt.rs
> @@ -6,7 +6,7 @@
>
> use crate::{
> bindings,
> - error::{to_result, Result},
> + error::{code::EINVAL, to_result, Result},
> page::Page,
> types::Opaque,
> };
> @@ -155,6 +155,183 @@ pub fn vm_insert_page(&self, address: usize, page: &Page) -> Result {
> }
> }
>
> +/// A builder for setting up a vma in an `f_ops->mmap()` hook.
> +///
> +/// # Invariants
> +///
> +/// For the duration of 'a, the referenced vma must be undergoing initialization in an
> +/// `f_ops->mmap()` hook.
> +pub struct VmAreaNew {
> + vma: VmAreaRef,
> +}
> +
> +// Make all `VmAreaRef` methods available on `VmAreaNew`.
> +impl Deref for VmAreaNew {
> + type Target = VmAreaRef;
> +
> + #[inline]
> + fn deref(&self) -> &VmAreaRef {
> + &self.vma
> + }
> +}
> +
> +impl VmAreaNew {
> + /// Access a virtual memory area given a raw pointer.
> + ///
> + /// # Safety
> + ///
> + /// Callers must ensure that `vma` is undergoing initial vma setup for the duration of 'a.
> + #[inline]
> + pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
> + // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
> + unsafe { &*vma.cast() }
> + }
> +
> + /// Internal method for updating the vma flags.
> + ///
> + /// # Safety
> + ///
> + /// This must not be used to set the flags to an invalid value.
> + #[inline]
> + unsafe fn update_flags(&self, set: vm_flags_t, unset: vm_flags_t) {
> + let mut flags = self.flags();
> + flags |= set;
> + flags &= !unset;
> +
> + // SAFETY: This is not a data race: the vma is undergoing initial setup, so it's not yet
> + // shared. Additionally, `VmAreaNew` is `!Sync`, so it cannot be used to write in parallel.
> + // The caller promises that this does not set the flags to an invalid value.
> + unsafe { (*self.as_ptr()).__bindgen_anon_2.__vm_flags = flags };
Thanks.
> + }
> +
> + /// Set the `VM_MIXEDMAP` flag on this vma.
> + ///
> + /// This enables the vma to contain both `struct page` and pure PFN pages. Returns a reference
> + /// that can be used to call `vm_insert_page` on the vma.
> + #[inline]
> + pub fn set_mixedmap(&self) -> &VmAreaMixedMap {
> + // SAFETY: We don't yet provide a way to set VM_PFNMAP, so this cannot put the flags in an
> + // invalid state.
> + unsafe { self.update_flags(flags::MIXEDMAP, 0) };
> +
> + // SAFETY: We just set `VM_MIXEDMAP` on the vma.
> + unsafe { VmAreaMixedMap::from_raw(self.vma.as_ptr()) }
> + }
> +
> + /// Set the `VM_IO` flag on this vma.
> + ///
> + /// This marks the vma as being a memory-mapped I/O region.
> + #[inline]
> + pub fn set_io(&self) {
> + // SAFETY: Setting the VM_IO flag is always okay.
> + unsafe { self.update_flags(flags::IO, 0) };
> + }
> +
> + /// Set the `VM_DONTEXPAND` flag on this vma.
> + ///
> + /// This prevents the vma from being expanded with `mremap()`.
> + #[inline]
> + pub fn set_dontexpand(&self) {
> + // SAFETY: Setting the VM_DONTEXPAND flag is always okay.
> + unsafe { self.update_flags(flags::DONTEXPAND, 0) };
> + }
> +
> + /// Set the `VM_DONTCOPY` flag on this vma.
> + ///
> + /// This prevents the vma from being copied on fork. This option is only permanent if `VM_IO`
> + /// is set.
> + #[inline]
> + pub fn set_dontcopy(&self) {
> + // SAFETY: Setting the VM_DONTCOPY flag is always okay.
> + unsafe { self.update_flags(flags::DONTCOPY, 0) };
> + }
> +
> + /// Set the `VM_DONTDUMP` flag on this vma.
> + ///
> + /// This prevents the vma from being included in core dumps. This option is only permanent if
> + /// `VM_IO` is set.
> + #[inline]
> + pub fn set_dontdump(&self) {
> + // SAFETY: Setting the VM_DONTDUMP flag is always okay.
> + unsafe { self.update_flags(flags::DONTDUMP, 0) };
> + }
> +
> + /// Returns whether `VM_READ` is set.
> + ///
> + /// This flag indicates whether userspace is mapping this vma as readable.
> + #[inline]
> + pub fn get_read(&self) -> bool {
> + (self.flags() & flags::READ) != 0
> + }
> +
> + /// Try to clear the `VM_MAYREAD` flag, failing if `VM_READ` is set.
> + ///
> + /// This flag indicates whether userspace is allowed to make this vma readable with
> + /// `mprotect()`.
> + ///
> + /// Note that this operation is irreversible. Once `VM_MAYREAD` has been cleared, it can never
> + /// be set again.
> + #[inline]
> + pub fn try_clear_mayread(&self) -> Result {
> + if self.get_read() {
> + return Err(EINVAL);
> + }
> + // SAFETY: Clearing `VM_MAYREAD` is okay when `VM_READ` is not set.
> + unsafe { self.update_flags(0, flags::MAYREAD) };
> + Ok(())
> + }
> +
> + /// Returns whether `VM_WRITE` is set.
> + ///
> + /// This flag indicates whether userspace is mapping this vma as writable.
> + #[inline]
> + pub fn get_write(&self) -> bool {
> + (self.flags() & flags::WRITE) != 0
> + }
> +
> + /// Try to clear the `VM_MAYWRITE` flag, failing if `VM_WRITE` is set.
> + ///
> + /// This flag indicates whether userspace is allowed to make this vma writable with
> + /// `mprotect()`.
> + ///
> + /// Note that this operation is irreversible. Once `VM_MAYWRITE` has been cleared, it can never
> + /// be set again.
> + #[inline]
> + pub fn try_clear_maywrite(&self) -> Result {
> + if self.get_write() {
> + return Err(EINVAL);
> + }
> + // SAFETY: Clearing `VM_MAYWRITE` is okay when `VM_WRITE` is not set.
> + unsafe { self.update_flags(0, flags::MAYWRITE) };
> + Ok(())
> + }
> +
> + /// Returns whether `VM_EXEC` is set.
> + ///
> + /// This flag indicates whether userspace is mapping this vma as executable.
> + #[inline]
> + pub fn get_exec(&self) -> bool {
> + (self.flags() & flags::EXEC) != 0
> + }
> +
> + /// Try to clear the `VM_MAYEXEC` flag, failing if `VM_EXEC` is set.
> + ///
> + /// This flag indicates whether userspace is allowed to make this vma executable with
> + /// `mprotect()`.
> + ///
> + /// Note that this operation is irreversible. Once `VM_MAYEXEC` has been cleared, it can never
> + /// be set again.
> + #[inline]
> + pub fn try_clear_mayexec(&self) -> Result {
> + if self.get_exec() {
> + return Err(EINVAL);
> + }
> + // SAFETY: Clearing `VM_MAYEXEC` is okay when `VM_EXEC` is not set.
> + unsafe { self.update_flags(0, flags::MAYEXEC) };
> + Ok(())
> + }
> +}
> +
> /// The integer type used for vma flags.
> #[doc(inline)]
> pub use bindings::vm_flags_t;
>
> --
> 2.47.0.371.ga323438b13-goog
>
2024-11-22 15:40 ` [PATCH v9 6/8] mm: rust: add VmAreaNew for f_ops->mmap() Alice Ryhl
2024-11-22 17:33 ` Lorenzo Stoakes
@ 2024-11-26 21:29 ` Jann Horn
2024-11-27 12:38 ` Alice Ryhl
1 sibling, 1 reply; 38+ messages in thread
From: Jann Horn @ 2024-11-26 21:29 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> This type will be used when setting up a new vma in an f_ops->mmap()
> hook. Using a separate type from VmAreaRef allows us to have a separate
> set of operations that you are only able to use during the mmap() hook.
> For example, the VM_MIXEDMAP flag must not be changed after the initial
> setup that happens during the f_ops->mmap() hook.
>
> To avoid setting invalid flag values, the methods for clearing
> VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
> if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
> the return value results in a compilation error because the `Result`
> type is marked #[must_use].
>
> For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
> we add a VM_PFNMAP method, we will need some way to prevent you from
> setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Thanks, this looks really neat!
Reviewed-by: Jann Horn <jannh@google.com>
> + /// Set the `VM_IO` flag on this vma.
> + ///
> + /// This marks the vma as being a memory-mapped I/O region.
nit: VM_IO isn't really exclusively used for MMIO; the header comment
says "Memory mapped I/O or similar", while the comment in
remap_pfn_range_internal() says "VM_IO tells people not to look at
these pages (accesses can have side effects)". But I don't really have
a good definition of what VM_IO actually means; so I don't really have
a concrete suggestion for what to do here. So my comment isn't very
actionable, I guess it's fine to leave this as-is unless someone
actually has a good definition...
> + #[inline]
> + pub fn set_io(&self) {
> + // SAFETY: Setting the VM_IO flag is always okay.
> + unsafe { self.update_flags(flags::IO, 0) };
> + }
^ permalink raw reply [flat|nested] 38+ messages in thread* Re: [PATCH v9 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2024-11-26 21:29 ` Jann Horn
@ 2024-11-27 12:38 ` Alice Ryhl
2024-11-27 16:19 ` Jann Horn
0 siblings, 1 reply; 38+ messages in thread
From: Alice Ryhl @ 2024-11-27 12:38 UTC (permalink / raw)
To: Jann Horn
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Tue, Nov 26, 2024 at 10:30 PM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > This type will be used when setting up a new vma in an f_ops->mmap()
> > hook. Using a separate type from VmAreaRef allows us to have a separate
> > set of operations that you are only able to use during the mmap() hook.
> > For example, the VM_MIXEDMAP flag must not be changed after the initial
> > setup that happens during the f_ops->mmap() hook.
> >
> > To avoid setting invalid flag values, the methods for clearing
> > VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
> > if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
> > the return value results in a compilation error because the `Result`
> > type is marked #[must_use].
> >
> > For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
> > we add a VM_PFNMAP method, we will need some way to prevent you from
> > setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
> >
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>
> Thanks, this looks really neat!
>
> Reviewed-by: Jann Horn <jannh@google.com>
>
> > + /// Set the `VM_IO` flag on this vma.
> > + ///
> > + /// This marks the vma as being a memory-mapped I/O region.
>
> nit: VM_IO isn't really exclusively used for MMIO; the header comment
> says "Memory mapped I/O or similar", while the comment in
> remap_pfn_range_internal() says "VM_IO tells people not to look at
> these pages (accesses can have side effects)". But I don't really have
> a good definition of what VM_IO actually means; so I don't really have
> a concrete suggestion for what to do here. So my comment isn't very
> actionable, I guess it's fine to leave this as-is unless someone
> actually has a good definition...
I can use this comment?
This is used for memory mapped IO and similar. The flag tells other
parts of the kernel to not look at the pages. For memory mapped IO
this is useful as accesses to the pages could have side effects.
Alice
* Re: [PATCH v9 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2024-11-27 12:38 ` Alice Ryhl
@ 2024-11-27 16:19 ` Jann Horn
0 siblings, 0 replies; 38+ messages in thread
From: Jann Horn @ 2024-11-27 16:19 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Wed, Nov 27, 2024 at 1:38 PM Alice Ryhl <aliceryhl@google.com> wrote:
> On Tue, Nov 26, 2024 at 10:30 PM Jann Horn <jannh@google.com> wrote:
> > On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > This type will be used when setting up a new vma in an f_ops->mmap()
> > > hook. Using a separate type from VmAreaRef allows us to have a separate
> > > set of operations that you are only able to use during the mmap() hook.
> > > For example, the VM_MIXEDMAP flag must not be changed after the initial
> > > setup that happens during the f_ops->mmap() hook.
> > >
> > > To avoid setting invalid flag values, the methods for clearing
> > > VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
> > > if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
> > > the return value results in a compilation error because the `Result`
> > > type is marked #[must_use].
> > >
> > > For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
> > > we add a VM_PFNMAP method, we will need some way to prevent you from
> > > setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
> > >
> > > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> >
> > Thanks, this looks really neat!
> >
> > Reviewed-by: Jann Horn <jannh@google.com>
> >
> > > + /// Set the `VM_IO` flag on this vma.
> > > + ///
> > > + /// This marks the vma as being a memory-mapped I/O region.
> >
> > nit: VM_IO isn't really exclusively used for MMIO; the header comment
> > says "Memory mapped I/O or similar", while the comment in
> > remap_pfn_range_internal() says "VM_IO tells people not to look at
> > these pages (accesses can have side effects)". But I don't really have
> > a good definition of what VM_IO actually means; so I don't really have
> > a concrete suggestion for what to do here. So my comment isn't very
> > actionable, I guess it's fine to leave this as-is unless someone
> > actually has a good definition...
>
> I can use this comment?
>
> This is used for memory mapped IO and similar. The flag tells other
> parts of the kernel to not look at the pages. For memory mapped IO
> this is useful as accesses to the pages could have side effects.
Yeah, sounds reasonable.
* [PATCH v9 7/8] rust: miscdevice: add mmap support
2024-11-22 15:40 [PATCH v9 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
` (5 preceding siblings ...)
2024-11-22 15:40 ` [PATCH v9 6/8] mm: rust: add VmAreaNew for f_ops->mmap() Alice Ryhl
@ 2024-11-22 15:40 ` Alice Ryhl
2024-11-22 15:40 ` [PATCH v9 8/8] task: rust: rework how current is accessed Alice Ryhl
7 siblings, 0 replies; 38+ messages in thread
From: Alice Ryhl @ 2024-11-22 15:40 UTC (permalink / raw)
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux, Alice Ryhl,
Andreas Hindborg
Add the ability to write a file_operations->mmap hook in Rust when using
the miscdevice abstraction. The `vma` argument to the `mmap` hook uses
the `VmAreaNew` type from the previous commit; this type provides the
correct set of operations for a file_operations->mmap hook.
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/kernel/miscdevice.rs | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs
index 7e2a79b3ae26..4e4b9476e092 100644
--- a/rust/kernel/miscdevice.rs
+++ b/rust/kernel/miscdevice.rs
@@ -11,6 +11,7 @@
use crate::{
bindings,
error::{to_result, Error, Result, VTABLE_DEFAULT_ERROR},
+ mm::virt::VmAreaNew,
prelude::*,
str::CStr,
types::{ForeignOwnable, Opaque},
@@ -110,6 +111,11 @@ fn release(device: Self::Ptr) {
drop(device);
}
+ /// Handle for mmap.
+ fn mmap(_device: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, _vma: &VmAreaNew) -> Result {
+ kernel::build_error(VTABLE_DEFAULT_ERROR)
+ }
+
/// Handler for ioctls.
///
/// The `cmd` argument is usually manipulated using the utilities in [`kernel::ioctl`].
@@ -156,6 +162,7 @@ impl<T: MiscDevice> VtableHelper<T> {
const VTABLE: bindings::file_operations = bindings::file_operations {
open: Some(fops_open::<T>),
release: Some(fops_release::<T>),
+ mmap: maybe_fn(T::HAS_MMAP, fops_mmap::<T>),
unlocked_ioctl: maybe_fn(T::HAS_IOCTL, fops_ioctl::<T>),
#[cfg(CONFIG_COMPAT)]
compat_ioctl: if T::HAS_COMPAT_IOCTL {
@@ -216,6 +223,27 @@ impl<T: MiscDevice> VtableHelper<T> {
0
}
+/// # Safety
+///
+/// `file` must be a valid file that is associated with a `MiscDeviceRegistration<T>`.
+/// `vma` must be a vma that is currently being mmap'ed with this file.
+unsafe extern "C" fn fops_mmap<T: MiscDevice>(
+ file: *mut bindings::file,
+ vma: *mut bindings::vm_area_struct,
+) -> c_int {
+ // SAFETY: The mmap call of a file can access the private data.
+ let private = unsafe { (*file).private_data };
+ // SAFETY: Mmap calls can borrow the private data of the file.
+ let device = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) };
+ // SAFETY: The caller provides a vma that is undergoing initial VMA setup.
+ let area = unsafe { VmAreaNew::from_raw(vma) };
+
+ match T::mmap(device, area) {
+ Ok(()) => 0,
+ Err(err) => err.to_errno() as c_int,
+ }
+}
+
/// # Safety
///
/// `file` must be a valid file that is associated with a `MiscDeviceRegistration<T>`.
--
2.47.0.371.ga323438b13-goog
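(For a complete picture, a minimal device using the hook could look like the
sketch below. `Sample` is hypothetical, the `open()` signature is assumed
from the base miscdevice abstraction this patch extends, and registration
boilerplate is omitted.)

    use kernel::miscdevice::MiscDevice;
    use kernel::mm::virt::VmAreaNew;
    use kernel::prelude::*;

    struct Sample;

    #[vtable]
    impl MiscDevice for Sample {
        type Ptr = KBox<Self>;

        fn open() -> Result<KBox<Self>> {
            Ok(KBox::new(Sample, GFP_KERNEL)?)
        }

        fn mmap(_device: &Sample, vma: &VmAreaNew) -> Result {
            // Reject writable mappings of the device and pin the vma's size.
            vma.try_clear_maywrite()?;
            vma.set_dontexpand();
            Ok(())
        }
    }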
* [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-22 15:40 [PATCH v9 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
` (6 preceding siblings ...)
2024-11-22 15:40 ` [PATCH v9 7/8] rust: miscdevice: add mmap support Alice Ryhl
@ 2024-11-22 15:40 ` Alice Ryhl
2024-11-22 15:53 ` Alice Ryhl
` (3 more replies)
7 siblings, 4 replies; 38+ messages in thread
From: Alice Ryhl @ 2024-11-22 15:40 UTC (permalink / raw)
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux, Alice Ryhl,
Andreas Hindborg
Introduce a new type called `CurrentTask` that lets you perform various
operations that are only safe on the `current` task. Use the new type to
provide a way to access the current mm without incrementing its
refcount.
With this change, you can write stuff such as
let vma = current!().mm().lock_vma_under_rcu(addr);
without incrementing any refcounts.
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
Reviewers: Does accessing task->mm on a non-current task require rcu
protection?
Christian: If you submit the PidNamespace abstractions this cycle, I'll
update this to also apply to PidNamespace.
---
rust/kernel/mm.rs | 22 ------------------
rust/kernel/task.rs | 64 ++++++++++++++++++++++++++++++++++++++++++-----------
2 files changed, 51 insertions(+), 35 deletions(-)
diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
index 50f4861ae4b9..f7d1079391ef 100644
--- a/rust/kernel/mm.rs
+++ b/rust/kernel/mm.rs
@@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
// These methods are safe to call even if `mm_users` is zero.
impl Mm {
- /// Call `mmgrab` on `current.mm`.
- #[inline]
- pub fn mmgrab_current() -> Option<ARef<Mm>> {
- // SAFETY: It's safe to get the `mm` field from current.
- let mm = unsafe {
- let current = bindings::get_current();
- (*current).mm
- };
-
- if mm.is_null() {
- return None;
- }
-
- // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
- // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
- // duration of this function, and `current->mm` will stay valid for that long.
- let mm = unsafe { Mm::from_raw(mm) };
-
- // This increments the refcount using `mmgrab`.
- Some(ARef::from(mm))
- }
-
/// Returns a raw pointer to the inner `mm_struct`.
#[inline]
pub fn as_raw(&self) -> *mut bindings::mm_struct {
diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
index 9e59d86da42d..103d235eb844 100644
--- a/rust/kernel/task.rs
+++ b/rust/kernel/task.rs
@@ -94,6 +94,26 @@ unsafe impl Send for Task {}
// synchronised by C code (e.g., `signal_pending`).
unsafe impl Sync for Task {}
+/// Represents a [`Task`] obtained from the `current` global.
+///
+/// This type exists to provide more efficient operations that are only valid on the current task.
+/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
+/// the current task.
+///
+/// # Invariants
+///
+/// Must be equal to `current` of some thread that is currently running somewhere.
+pub struct CurrentTask(Task);
+
+// Make all `Task` methods available on `CurrentTask`.
+impl Deref for CurrentTask {
+ type Target = Task;
+ #[inline]
+ fn deref(&self) -> &Task {
+ &self.0
+ }
+}
+
/// The type of process identifiers (PIDs).
type Pid = bindings::pid_t;
@@ -121,27 +141,25 @@ pub fn current_raw() -> *mut bindings::task_struct {
/// # Safety
///
/// Callers must ensure that the returned object doesn't outlive the current task/thread.
- pub unsafe fn current() -> impl Deref<Target = Task> {
- struct TaskRef<'a> {
- task: &'a Task,
- _not_send: NotThreadSafe,
+ pub unsafe fn current() -> impl Deref<Target = CurrentTask> {
+ struct TaskRef {
+ task: *const CurrentTask,
}
- impl Deref for TaskRef<'_> {
- type Target = Task;
+ impl Deref for TaskRef {
+ type Target = CurrentTask;
fn deref(&self) -> &Self::Target {
- self.task
+ // SAFETY: The returned reference borrows from this `TaskRef`, so it cannot outlive
+ // the `TaskRef`, which the caller of `Task::current()` has promised will not
+ // outlive the task/thread for which `self.task` is the `current` pointer. Thus, it
+ // is okay to return a `CurrentTask` reference here.
+ unsafe { &*self.task }
}
}
- let current = Task::current_raw();
TaskRef {
- // SAFETY: If the current thread is still running, the current task is valid. Given
- // that `TaskRef` is not `Send`, we know it cannot be transferred to another thread
- // (where it could potentially outlive the caller).
- task: unsafe { &*current.cast() },
- _not_send: NotThreadSafe,
+ task: Task::current_raw().cast(),
}
}
@@ -203,6 +221,26 @@ pub fn wake_up(&self) {
}
}
+impl CurrentTask {
+ /// Access the address space of this task.
+ ///
+ /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
+ #[inline]
+ pub fn mm(&self) -> Option<&MmWithUser> {
+ // SAFETY: It is safe to read `current->mm`: the field is only modified by
+ // the current task itself, and `self` is the current task by the type
+ // invariants.
+ let mm = unsafe { (*self.as_ptr()).mm };
+
+ if mm.is_null() {
+ None
+ } else {
+ // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
+ // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
+ // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
+ // while the reference is still live.
+ Some(unsafe { MmWithUser::from_raw(mm) })
+ }
+ }
+}
+
// SAFETY: The type invariants guarantee that `Task` is always refcounted.
unsafe impl crate::types::AlwaysRefCounted for Task {
fn inc_ref(&self) {
--
2.47.0.371.ga323438b13-goog
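(The end result in driver code looks like this sketch; the caller and `addr`
are placeholders.)

    fn try_lock_vma(addr: usize) -> bool {
        // Borrow `current` for the duration of this function.
        let task = kernel::current!();
        // Borrow current->mm with no refcount traffic; the borrow cannot
        // escape the current task, so `mm_users` stays non-zero while it lives.
        let Some(mm) = task.mm() else {
            return false; // kernel threads have no userspace mm
        };
        if let Some(_vma) = mm.lock_vma_under_rcu(addr) {
            // ... operate on `_vma`; the vma read lock is released on drop ...
            true
        } else {
            false // contended: a real caller would fall back to mmap_read_lock()
        }
    }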
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-22 15:40 ` [PATCH v9 8/8] task: rust: rework how current is accessed Alice Ryhl
@ 2024-11-22 15:53 ` Alice Ryhl
2024-11-22 17:34 ` Lorenzo Stoakes
2024-11-22 17:54 ` Lorenzo Stoakes
` (2 subsequent siblings)
3 siblings, 1 reply; 38+ messages in thread
From: Alice Ryhl @ 2024-11-22 15:53 UTC (permalink / raw)
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
>
> Introduce a new type called `CurrentTask` that lets you perform various
> operations that are only safe on the `current` task. Use the new type to
> provide a way to access the current mm without incrementing its
> refcount.
>
> With this change, you can write stuff such as
>
> let vma = current!().mm().lock_vma_under_rcu(addr);
>
> without incrementing any refcounts.
>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Oh, that's awkward, I was testing this change using a config file that
was missing CONFIG_RUST=y, so it didn't compile the code at all. You
need the following imports for this to work:
diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
index 103d235eb844..60659076997a 100644
--- a/rust/kernel/task.rs
+++ b/rust/kernel/task.rs
@@ -7,7 +7,8 @@
use crate::{
bindings,
ffi::{c_int, c_long, c_uint},
- types::{NotThreadSafe, Opaque},
+ mm::MmWithUser,
+ types::Opaque,
};
use core::{
cmp::{Eq, PartialEq},
Otherwise the code should be correct. You can fetch the tree with this fixed at:
https://github.com/Darksonn/linux/commits/b4/vma/
I'll fix it in the next version, but I will wait for review before I send that.
Alice
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-22 15:53 ` Alice Ryhl
@ 2024-11-22 17:34 ` Lorenzo Stoakes
0 siblings, 0 replies; 38+ messages in thread
From: Lorenzo Stoakes @ 2024-11-22 17:34 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Vlastimil Babka, John Hubbard,
Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Fri, Nov 22, 2024 at 04:53:58PM +0100, Alice Ryhl wrote:
> On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> >
> > Introduce a new type called `CurrentTask` that lets you perform various
> > operations that are only safe on the `current` task. Use the new type to
> > provide a way to access the current mm without incrementing its
> > refcount.
> >
> > With this change, you can write stuff such as
> >
> > let vma = current!().mm().lock_vma_under_rcu(addr);
> >
> > without incrementing any refcounts.
> >
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>
> Oh, that's awkward, I was testing this change using a config file that
> was missing CONFIG_RUST=y, so it didn't compile the code at all. You
> need the following imports for this to work:
>
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index 103d235eb844..60659076997a 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -7,7 +7,8 @@
> use crate::{
> bindings,
> ffi::{c_int, c_long, c_uint},
> - types::{NotThreadSafe, Opaque},
> + mm::MmWithUser,
> + types::Opaque,
> };
> use core::{
> cmp::{Eq, PartialEq},
>
> Otherwise the code should be correct. You can fetch the tree with this fixed at:
> https://github.com/Darksonn/linux/commits/b4/vma/
>
> I'll fix it in the next version, but I will wait for review before I send that.
Sure, no problem, we can just make this a predicate for the ack.
>
> Alice
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-22 15:40 ` [PATCH v9 8/8] task: rust: rework how current is accessed Alice Ryhl
2024-11-22 15:53 ` Alice Ryhl
@ 2024-11-22 17:54 ` Lorenzo Stoakes
2024-11-22 18:51 ` Alice Ryhl
2024-11-22 18:03 ` Boqun Feng
2024-11-26 17:14 ` Jann Horn
3 siblings, 1 reply; 38+ messages in thread
From: Lorenzo Stoakes @ 2024-11-22 17:54 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Vlastimil Babka, John Hubbard,
Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Fri, Nov 22, 2024 at 03:40:33PM +0000, Alice Ryhl wrote:
> Introduce a new type called `CurrentTask` that lets you perform various
> operations that are only safe on the `current` task. Use the new type to
> provide a way to access the current mm without incrementing its
> refcount.
Nice!
>
> With this change, you can write stuff such as
>
> let vma = current!().mm().lock_vma_under_rcu(addr);
>
> without incrementing any refcounts.
>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
On the assumption that the problem you reference with the Rust imports is
corrected in v10, and that what you are doing with current_raw() is
sensible, then:
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Thanks!
> ---
> Reviewers: Does accessing task->mm on a non-current task require rcu
> protection?
Hm I am not actually sure, but it seems like you probably do, and I would say
you need the task lock right?
Looking at find_lock_task_mm() as used by the OOM killer, for instance, suggests as much.
>
> Christian: If you submit the PidNamespace abstractions this cycle, I'll
> update this to also apply to PidNamespace.
> ---
> rust/kernel/mm.rs | 22 ------------------
> rust/kernel/task.rs | 64 ++++++++++++++++++++++++++++++++++++++++++-----------
> 2 files changed, 51 insertions(+), 35 deletions(-)
>
> diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> index 50f4861ae4b9..f7d1079391ef 100644
> --- a/rust/kernel/mm.rs
> +++ b/rust/kernel/mm.rs
> @@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
>
> // These methods are safe to call even if `mm_users` is zero.
> impl Mm {
> - /// Call `mmgrab` on `current.mm`.
> - #[inline]
> - pub fn mmgrab_current() -> Option<ARef<Mm>> {
> - // SAFETY: It's safe to get the `mm` field from current.
> - let mm = unsafe {
> - let current = bindings::get_current();
> - (*current).mm
> - };
> -
> - if mm.is_null() {
> - return None;
> - }
> -
> - // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
> - // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
> - // duration of this function, and `current->mm` will stay valid for that long.
> - let mm = unsafe { Mm::from_raw(mm) };
> -
> - // This increments the refcount using `mmgrab`.
> - Some(ARef::from(mm))
> - }
> -
It's nice to drop this to discourage the unusual pattern of grabbing current's
mm and incrementing its reference count.
> /// Returns a raw pointer to the inner `mm_struct`.
> #[inline]
> pub fn as_raw(&self) -> *mut bindings::mm_struct {
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index 9e59d86da42d..103d235eb844 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -94,6 +94,26 @@ unsafe impl Send for Task {}
> // synchronised by C code (e.g., `signal_pending`).
> unsafe impl Sync for Task {}
>
> +/// Represents a [`Task`] obtained from the `current` global.
> +///
> +/// This type exists to provide more efficient operations that are only valid on the current task.
> +/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
> +/// the current task.
> +///
> +/// # Invariants
> +///
> +/// Must be equal to `current` of some thread that is currently running somewhere.
> +pub struct CurrentTask(Task);
Nice, I do like the ability to express abstractions like this...
> +
> +// Make all `Task` methods available on `CurrentTask`.
> +impl Deref for CurrentTask {
> + type Target = Task;
> + #[inline]
> + fn deref(&self) -> &Task {
> + &self.0
> + }
> +}
> +
It's nice to be able to 'alias' types like this too (OK, I'm sure that's not
quite the right way of describing it, but you know what I mean), so you can
abstract something, then very simply create variants that have the same
methods but different attributes otherwise.
> /// The type of process identifiers (PIDs).
> type Pid = bindings::pid_t;
>
> @@ -121,27 +141,25 @@ pub fn current_raw() -> *mut bindings::task_struct {
> /// # Safety
> ///
> /// Callers must ensure that the returned object doesn't outlive the current task/thread.
> - pub unsafe fn current() -> impl Deref<Target = Task> {
> - struct TaskRef<'a> {
> - task: &'a Task,
> - _not_send: NotThreadSafe,
> + pub unsafe fn current() -> impl Deref<Target = CurrentTask> {
> + struct TaskRef {
> + task: *const CurrentTask,
> }
Why do we drop the NotThreadSafe bit here? And it seems like the 'a lifetime
stuff has gone too?
I'm guessing the lifetime stuff is because of the SAFETY comment below about
assumptions about lifetime?
>
> - impl Deref for TaskRef<'_> {
> - type Target = Task;
> + impl Deref for TaskRef {
> + type Target = CurrentTask;
>
> fn deref(&self) -> &Self::Target {
> - self.task
> + // SAFETY: The returned reference borrows from this `TaskRef`, so it cannot outlive
> + // the `TaskRef`, which the caller of `Task::current()` has promised will not
> + // outlive the task/thread for which `self.task` is the `current` pointer. Thus, it
> + // is okay to return a `CurrentTask` reference here.
> + unsafe { &*self.task }
> }
> }
>
> - let current = Task::current_raw();
> TaskRef {
> - // SAFETY: If the current thread is still running, the current task is valid. Given
> - // that `TaskRef` is not `Send`, we know it cannot be transferred to another thread
> - // (where it could potentially outlive the caller).
> - task: unsafe { &*current.cast() },
> - _not_send: NotThreadSafe,
> + task: Task::current_raw().cast(),
> }
I guess these changes align with the changes above?
> }
>
> @@ -203,6 +221,26 @@ pub fn wake_up(&self) {
> }
> }
>
> +impl CurrentTask {
> + /// Access the address space of this task.
> + ///
> + /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
> + #[inline]
> + pub fn mm(&self) -> Option<&MmWithUser> {
> + let mm = unsafe { (*self.as_ptr()).mm };
> +
> + if mm.is_null() {
> + None
> + } else {
> + // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> + // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
> + // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
> + // while the reference is still live.
> + Some(unsafe { MmWithUser::from_raw(mm) })
> + }
> + }
> +}
Nice!
> +
> // SAFETY: The type invariants guarantee that `Task` is always refcounted.
> unsafe impl crate::types::AlwaysRefCounted for Task {
> fn inc_ref(&self) {
>
> --
> 2.47.0.371.ga323438b13-goog
>
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-22 17:54 ` Lorenzo Stoakes
@ 2024-11-22 18:51 ` Alice Ryhl
0 siblings, 0 replies; 38+ messages in thread
From: Alice Ryhl @ 2024-11-22 18:51 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Miguel Ojeda, Matthew Wilcox, Vlastimil Babka, John Hubbard,
Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Fri, Nov 22, 2024 at 6:55 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Fri, Nov 22, 2024 at 03:40:33PM +0000, Alice Ryhl wrote:
> > Introduce a new type called `CurrentTask` that lets you perform various
> > operations that are only safe on the `current` task. Use the new type to
> > provide a way to access the current mm without incrementing its
> > refcount.
>
> Nice!
>
> >
> > With this change, you can write stuff such as
> >
> > let vma = current!().mm().lock_vma_under_rcu(addr);
> >
> > without incrementing any refcounts.
> >
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>
> On assumption that the problem you reference with the rust imports is
> corrected in v10, and that what you are doing with current_raw() is
> sensible, then:
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Thanks!
>
> > ---
> > Reviewers: Does accessing task->mm on a non-current task require rcu
> > protection?
>
> Hm I am not actually sure, but it seems like you probably do, and I would say
> you need the task lock right?
>
> Looking at find_lock_task_mm() as used by the oomk for instance suggests as much.
Okay, sounds complicated. I'm not going to bother with that right now.
> > /// The type of process identifiers (PIDs).
> > type Pid = bindings::pid_t;
> >
> > @@ -121,27 +141,25 @@ pub fn current_raw() -> *mut bindings::task_struct {
> > /// # Safety
> > ///
> > /// Callers must ensure that the returned object doesn't outlive the current task/thread.
> > - pub unsafe fn current() -> impl Deref<Target = Task> {
> > - struct TaskRef<'a> {
> > - task: &'a Task,
> > - _not_send: NotThreadSafe,
> > + pub unsafe fn current() -> impl Deref<Target = CurrentTask> {
> > + struct TaskRef {
> > + task: *const CurrentTask,
> > }
>
> Why do we drop the NotThreadSafe bit here? And it seems like the 'a lifetime
> stuff has gone too?
>
> I'm guessing the lifetime stuff is because of the SAFETY comment below about
> assumptions about lifetime?
I dropped the lifetime because it's not doing anything. As for NotThreadSafe:
1. See thread with Boqun.
2. Raw pointers are already considered not thread safe by default, so
the *const CurrentTask field has the same effect.
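To make (2) concrete, a tiny sketch (not from the patch):

    struct TaskRef {
        task: *const CurrentTask,
    }

    fn assert_send<T: Send>() {}
    fn assert_sync<T: Sync>() {}

    // Both of these fail to compile: raw pointer fields suppress the
    // `Send` and `Sync` auto traits, so no `NotThreadSafe` marker
    // field is needed.
    // assert_send::<TaskRef>();
    // assert_sync::<TaskRef>();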
Alice
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-22 15:40 ` [PATCH v9 8/8] task: rust: rework how current is accessed Alice Ryhl
2024-11-22 15:53 ` Alice Ryhl
2024-11-22 17:54 ` Lorenzo Stoakes
@ 2024-11-22 18:03 ` Boqun Feng
2024-11-22 18:48 ` Alice Ryhl
2024-11-26 17:14 ` Jann Horn
3 siblings, 1 reply; 38+ messages in thread
From: Boqun Feng @ 2024-11-22 18:03 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
linux-kernel, linux-mm, rust-for-linux, Andreas Hindborg
On Fri, Nov 22, 2024 at 03:40:33PM +0000, Alice Ryhl wrote:
> Introduce a new type called `CurrentTask` that lets you perform various
> operations that are only safe on the `current` task. Use the new type to
> provide a way to access the current mm without incrementing its
> refcount.
>
> With this change, you can write stuff such as
>
> let vma = current!().mm().lock_vma_under_rcu(addr);
>
> without incrementing any refcounts.
>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> Reviewers: Does accessing task->mm on a non-current task require rcu
> protection?
>
> Christian: If you submit the PidNamespace abstractions this cycle, I'll
> update this to also apply to PidNamespace.
> ---
> rust/kernel/mm.rs | 22 ------------------
> rust/kernel/task.rs | 64 ++++++++++++++++++++++++++++++++++++++++++-----------
> 2 files changed, 51 insertions(+), 35 deletions(-)
>
> diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> index 50f4861ae4b9..f7d1079391ef 100644
> --- a/rust/kernel/mm.rs
> +++ b/rust/kernel/mm.rs
> @@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
>
> // These methods are safe to call even if `mm_users` is zero.
> impl Mm {
> - /// Call `mmgrab` on `current.mm`.
> - #[inline]
> - pub fn mmgrab_current() -> Option<ARef<Mm>> {
> - // SAFETY: It's safe to get the `mm` field from current.
> - let mm = unsafe {
> - let current = bindings::get_current();
> - (*current).mm
> - };
> -
> - if mm.is_null() {
> - return None;
> - }
> -
> - // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
> - // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
> - // duration of this function, and `current->mm` will stay valid for that long.
> - let mm = unsafe { Mm::from_raw(mm) };
> -
> - // This increments the refcount using `mmgrab`.
> - Some(ARef::from(mm))
> - }
> -
> /// Returns a raw pointer to the inner `mm_struct`.
> #[inline]
> pub fn as_raw(&self) -> *mut bindings::mm_struct {
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index 9e59d86da42d..103d235eb844 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -94,6 +94,26 @@ unsafe impl Send for Task {}
> // synchronised by C code (e.g., `signal_pending`).
> unsafe impl Sync for Task {}
>
> +/// Represents a [`Task`] obtained from the `current` global.
> +///
> +/// This type exists to provide more efficient operations that are only valid on the current task.
> +/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
> +/// the current task.
> +///
> +/// # Invariants
> +///
> +/// Must be equal to `current` of some thread that is currently running somewhere.
> +pub struct CurrentTask(Task);
> +
I think you need to make `CurrentTask` `!Sync`, right? Otherwise, other
threads can access the shared reference of a `CurrentTask` and the ->mm
field. I'm thinking if we have a scoped thread/workqueue support in the
future:
let task = current!();
Workqueue::scoped(|s| {
    s.spawn(|| {
        let mm = task.mm();
        // do something with the mm
    });
});
> +// Make all `Task` methods available on `CurrentTask`.
> +impl Deref for CurrentTask {
> + type Target = Task;
> + #[inline]
> + fn deref(&self) -> &Task {
> + &self.0
> + }
> +}
> +
> /// The type of process identifiers (PIDs).
> type Pid = bindings::pid_t;
>
> @@ -121,27 +141,25 @@ pub fn current_raw() -> *mut bindings::task_struct {
> /// # Safety
> ///
> /// Callers must ensure that the returned object doesn't outlive the current task/thread.
> - pub unsafe fn current() -> impl Deref<Target = Task> {
> - struct TaskRef<'a> {
> - task: &'a Task,
> - _not_send: NotThreadSafe,
> + pub unsafe fn current() -> impl Deref<Target = CurrentTask> {
> + struct TaskRef {
> + task: *const CurrentTask,
> }
>
> - impl Deref for TaskRef<'_> {
> - type Target = Task;
> + impl Deref for TaskRef {
> + type Target = CurrentTask;
>
> fn deref(&self) -> &Self::Target {
> - self.task
> + // SAFETY: The returned reference borrows from this `TaskRef`, so it cannot outlive
> + // the `TaskRef`, which the caller of `Task::current()` has promised will not
> + // outlive the task/thread for which `self.task` is the `current` pointer. Thus, it
> + // is okay to return a `CurrentTask` reference here.
> + unsafe { &*self.task }
> }
> }
>
> - let current = Task::current_raw();
> TaskRef {
> - // SAFETY: If the current thread is still running, the current task is valid. Given
> - // that `TaskRef` is not `Send`, we know it cannot be transferred to another thread
> - // (where it could potentially outlive the caller).
> - task: unsafe { &*current.cast() },
> - _not_send: NotThreadSafe,
> + task: Task::current_raw().cast(),
> }
> }
>
> @@ -203,6 +221,26 @@ pub fn wake_up(&self) {
> }
> }
>
> +impl CurrentTask {
> + /// Access the address space of this task.
> + ///
> + /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
> + #[inline]
> + pub fn mm(&self) -> Option<&MmWithUser> {
Hmm... similar issue, `MmWithUser` is `Sync`.
> + let mm = unsafe { (*self.as_ptr()).mm };
> +
> + if mm.is_null() {
> + None
> + } else {
> + // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> + // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
> + // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
> + // while the reference is still live.
Regards,
Boqun
> + Some(unsafe { MmWithUser::from_raw(mm) })
> + }
> + }
> +}
> +
> // SAFETY: The type invariants guarantee that `Task` is always refcounted.
> unsafe impl crate::types::AlwaysRefCounted for Task {
> fn inc_ref(&self) {
>
> --
> 2.47.0.371.ga323438b13-goog
>
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-22 18:03 ` Boqun Feng
@ 2024-11-22 18:48 ` Alice Ryhl
2024-11-22 19:17 ` Boqun Feng
0 siblings, 1 reply; 38+ messages in thread
From: Alice Ryhl @ 2024-11-22 18:48 UTC (permalink / raw)
To: Boqun Feng
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
linux-kernel, linux-mm, rust-for-linux, Andreas Hindborg
On Fri, Nov 22, 2024 at 7:03 PM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> On Fri, Nov 22, 2024 at 03:40:33PM +0000, Alice Ryhl wrote:
> > +/// Represents a [`Task`] obtained from the `current` global.
> > +///
> > +/// This type exists to provide more efficient operations that are only valid on the current task.
> > +/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
> > +/// the current task.
> > +///
> > +/// # Invariants
> > +///
> > +/// Must be equal to `current` of some thread that is currently running somewhere.
> > +pub struct CurrentTask(Task);
> > +
>
> I think you need to make `CurrentTask` `!Sync`, right? Otherwise, other
> threads can access the shared reference of a `CurrentTask` and the ->mm
> field. I'm thinking if we have a scoped thread/workqueue support in the
> future:
>
> let task = current!();
> Workqueue::scoped(|s| {
>     s.spawn(|| {
>         let mm = task.mm();
>         // do something with the mm
>     });
> });
I don't think this is a problem? As long as a thread exists somewhere
with `current` being equal to the task, we should be fine?
> > +impl CurrentTask {
> > + /// Access the address space of this task.
> > + ///
> > + /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
> > + #[inline]
> > + pub fn mm(&self) -> Option<&MmWithUser> {
>
> Hmm... similar issue, `MmWithUser` is `Sync`.
What is the problem with that?
> > + let mm = unsafe { (*self.as_ptr()).mm };
> > +
> > + if mm.is_null() {
> > + None
> > + } else {
> > + // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> > + // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
> > + // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
> > + // while the reference is still live.
>
> Regards,
> Boqun
>
> > + Some(unsafe { MmWithUser::from_raw(mm) })
> > + }
> > + }
> > +}
> > +
> > // SAFETY: The type invariants guarantee that `Task` is always refcounted.
> > unsafe impl crate::types::AlwaysRefCounted for Task {
> > fn inc_ref(&self) {
> >
> > --
> > 2.47.0.371.ga323438b13-goog
> >
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-22 18:48 ` Alice Ryhl
@ 2024-11-22 19:17 ` Boqun Feng
2024-11-22 19:30 ` Matthew Wilcox
0 siblings, 1 reply; 38+ messages in thread
From: Boqun Feng @ 2024-11-22 19:17 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
linux-kernel, linux-mm, rust-for-linux, Andreas Hindborg
On Fri, Nov 22, 2024 at 07:48:16PM +0100, Alice Ryhl wrote:
> On Fri, Nov 22, 2024 at 7:03 PM Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > On Fri, Nov 22, 2024 at 03:40:33PM +0000, Alice Ryhl wrote:
> > > +/// Represents a [`Task`] obtained from the `current` global.
> > > +///
> > > +/// This type exists to provide more efficient operations that are only valid on the current task.
> > > +/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
> > > +/// the current task.
> > > +///
> > > +/// # Invariants
> > > +///
> > > +/// Must be equal to `current` of some thread that is currently running somewhere.
> > > +pub struct CurrentTask(Task);
> > > +
> >
> > I think you need to make `CurrentTask` `!Sync`, right? Otherwise, other
> > threads can access the shared reference of a `CurrentTask` and the ->mm
> > field. I'm thinking if we have a scoped thread/workqueue support in the
> > future:
> >
> > let task = current!();
> > Workqueue::scoped(|s| {
> >     s.spawn(|| {
> >         let mm = task.mm();
> >         // do something with the mm
> >     });
> > });
>
> I don't think this is a problem? As long as a thread exists somewhere
> with `current` being equal to the task, we should be fine?
>
I think I had a misunderstanding on what you meant by "operations
that are only valid on the current task", you mean these operations can
be run by other threads, but it has to be *on* a task_struct that's
"currently running", right? BTW, you probably want to reword a bit,
because the "current" task may be blocked, so technically it's not
"running".
Basically, the operations that `CurrentTask` have are the methods that
are safe to call (even on a different thread) for the "current" task, as
long as it exists (not dead or exited). In that definition, not being
`Sync` is fine.
But I have to admit I'm a bit worried that people may be confused, and
add new methods that can be only run by the current thread in the
future.
> > > +impl CurrentTask {
> > > + /// Access the address space of this task.
> > > + ///
> > > + /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
> > > + #[inline]
> > > + pub fn mm(&self) -> Option<&MmWithUser> {
> >
> > Hmm... similar issue, `MmWithUser` is `Sync`.
>
> What is the problem with that?
>
It should be no problem under your definition of `CurrentTask`.
Regards,
Boqun
> > > + let mm = unsafe { (*self.as_ptr()).mm };
> > > +
> > > + if mm.is_null() {
> > > + None
> > > + } else {
> > > + // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> > > + // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
> > > + // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
> > > + // while the reference is still live.
> >
> > Regards,
> > Boqun
> >
> > > + Some(unsafe { MmWithUser::from_raw(mm) })
> > > + }
> > > + }
> > > +}
> > > +
> > > // SAFETY: The type invariants guarantee that `Task` is always refcounted.
> > > unsafe impl crate::types::AlwaysRefCounted for Task {
> > > fn inc_ref(&self) {
> > >
> > > --
> > > 2.47.0.371.ga323438b13-goog
> > >
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-22 19:17 ` Boqun Feng
@ 2024-11-22 19:30 ` Matthew Wilcox
2024-11-22 19:43 ` Alice Ryhl
0 siblings, 1 reply; 38+ messages in thread
From: Matthew Wilcox @ 2024-11-22 19:30 UTC (permalink / raw)
To: Boqun Feng
Cc: Alice Ryhl, Miguel Ojeda, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
linux-kernel, linux-mm, rust-for-linux, Andreas Hindborg
On Fri, Nov 22, 2024 at 11:17:15AM -0800, Boqun Feng wrote:
> > I don't think this is a problem? As long as a thread exists somewhere
> > with `current` being equal to the task, we should be fine?
> >
>
> I think I had a misunderstanding on what you meant by "operations
> that are only valid on the current task", you mean these operations can
> be run by other threads, but it has to be *on* a task_struct that's
> "currently running", right? BTW, you probably want to reword a bit,
> because the "current" task may be blocked, so technically it's not
> "running".
>
> Basically, the operations that `CurrentTask` have are the methods that
> are safe to call (even on a different thread) for the "current" task, as
> long as it exists (not dead or exited). In that definition, not being
> `Sync` is fine.
>
> But I have to admit I'm a bit worried that people may be confused, and
> add new methods that can be only run by the current thread in the
> future.
I agree, I think CurrentTask should refer to "current". Or we'll
confuse everyone. Would ActiveTask be a good name for this CurrentTask?
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-22 19:30 ` Matthew Wilcox
@ 2024-11-22 19:43 ` Alice Ryhl
2024-11-22 19:54 ` Matthew Wilcox
0 siblings, 1 reply; 38+ messages in thread
From: Alice Ryhl @ 2024-11-22 19:43 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Boqun Feng, Miguel Ojeda, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
linux-kernel, linux-mm, rust-for-linux, Andreas Hindborg
On Fri, Nov 22, 2024 at 8:30 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Nov 22, 2024 at 11:17:15AM -0800, Boqun Feng wrote:
> > > I don't think this is a problem? As long as a thread exists somewhere
> > > with `current` being equal to the task, we should be fine?
> > >
> >
> > I think I had a misunderstanding on what you meant by "operations
> > that are only valid on the current task", you mean these operations can
> > be run by other threads, but it has to be *on* a task_struct that's
> > "currently running", right? BTW, you probably want to reword a bit,
> > because the "current" task may be blocked, so technically it's not
> > "running".
> >
> > Basically, the operations that `CurrentTask` have are the methods that
> > are safe to call (even on a different thread) for the "current" task, as
> > long as it exists (not dead or exited). In that definition, not being
> > `Sync` is fine.
> >
> > But I have to admit I'm a bit worried that people may be confused, and
> > add new methods that can be only run by the current thread in the
> > future.
>
> I agree, I think CurrentTask should refer to "current". Or we'll
> confuse everyone. Would ActiveTask be a good name for this CurrentTask?
I mean, it does refer to current. Any time you have a `&CurrentTask`,
then you know that you got the pointer by reading the value of
`current`, and that the task you got it from hasn't returned to
userspace (or otherwise exited) yet.
But the name ActiveTask also makes sense I guess.
Alice
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-22 19:43 ` Alice Ryhl
@ 2024-11-22 19:54 ` Matthew Wilcox
2024-11-22 20:16 ` Alice Ryhl
0 siblings, 1 reply; 38+ messages in thread
From: Matthew Wilcox @ 2024-11-22 19:54 UTC (permalink / raw)
To: Alice Ryhl
Cc: Boqun Feng, Miguel Ojeda, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
linux-kernel, linux-mm, rust-for-linux, Andreas Hindborg
On Fri, Nov 22, 2024 at 08:43:33PM +0100, Alice Ryhl wrote:
> On Fri, Nov 22, 2024 at 8:30 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, Nov 22, 2024 at 11:17:15AM -0800, Boqun Feng wrote:
> > > > I don't think this is a problem? As long as a thread exists somewhere
> > > > with `current` being equal to the task, we should be fine?
> > > >
> > >
> > > I think I had a misunderstanding on what you meant by "operations
> > > that are only valid on the current task", you mean these operations can
> > > be run by other threads, but it has to be *on* a task_struct that's
> > > "currently running", right? BTW, you probably want to reword a bit,
> > > because the "current" task may be blocked, so technically it's not
> > > "running".
> > >
> > > Basically, the operations that `CurrentTask` have are the methods that
> > > are safe to call (even on a different thread) for the "current" task, as
> > > long as it exists (not dead or exited). In that definition, not being
> > > `Sync` is fine.
> > >
> > > But I have to admit I'm a bit worried that people may be confused, and
> > > add new methods that can be only run by the current thread in the
> > > future.
> >
> > I agree, I think CurrentTask should refer to "current". Or we'll
> > confuse everyone. Would ActiveTask be a good name for this CurrentTask?
>
> I mean, it does refer to current. Any time you have a `&CurrentTask`,
> then you know that you got the pointer by reading the value of
> `current`, and that the task you got it from hasn't returned to
> userspace (or otherwise exited) yet.
>
> But the name ActiveTask also makes sense I guess.
OK, I'm going to be all rust newbie about this (zoea?)
Given that there are operations that we can do on 'current' that aren't
safe to do if we pass current to another thread, is the right thing
to say that we have Task, and you can get a (Rust) reference to Task
either by it being 'current', or by getting a refcount on it using
get_task_struct()? And I think that's called a typestate?
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-22 19:54 ` Matthew Wilcox
@ 2024-11-22 20:16 ` Alice Ryhl
0 siblings, 0 replies; 38+ messages in thread
From: Alice Ryhl @ 2024-11-22 20:16 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Boqun Feng, Miguel Ojeda, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
linux-kernel, linux-mm, rust-for-linux, Andreas Hindborg
On Fri, Nov 22, 2024 at 8:54 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Nov 22, 2024 at 08:43:33PM +0100, Alice Ryhl wrote:
> > On Fri, Nov 22, 2024 at 8:30 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Fri, Nov 22, 2024 at 11:17:15AM -0800, Boqun Feng wrote:
> > > > > I don't think this is a problem? As long as a thread exists somewhere
> > > > > with `current` being equal to the task, we should be fine?
> > > > >
> > > >
> > > > I think I had a misunderstanding on what you meant by "operations
> > > > that are only valid on the current task", you mean these operations can
> > > > be run by other threads, but it has to be *on* a task_struct that's
> > > > "currently running", right? BTW, you probably want to reword a bit,
> > > > because the "current" task may be blocked, so technically it's not
> > > > "running".
> > > >
> > > > Basically, the operations that `CurrentTask` have are the methods that
> > > > are safe to call (even on a different thread) for the "current" task, as
> > > > long as it exists (not dead or exited). In that definition, not being
> > > > `Sync` is fine.
> > > >
> > > > But I have to admit I'm a bit worried that people may be confused, and
> > > > add new methods that can be only run by the current thread in the
> > > > future.
> > >
> > > I agree, I think CurrentTask should refer to "current". Or we'll
> > > confuse everyone. Would ActiveTask be a good name for this CurrentTask?
> >
> > I mean, it does refer to current. Any time you have a `&CurrentTask`,
> > then you know that you got the pointer by reading the value of
> > `current`, and that the task you got it from hasn't returned to
> > userspace (or otherwise exited) yet.
> >
> > But the name ActiveTask also makes sense I guess.
>
> OK, I'm going to be all rust newbie about this (zoea?)
>
> Given that there are operations that we can do on 'current' that aren't
> safe to do if we pass current to another thread, is the right thing
> to say that we have Task, and you can get a (Rust) reference to Task
> either by it being 'current', or by getting a refcount on it using
> get_task_struct()? And I think that's called a typestate?
There are essentially three important types here:
* ARef<Task>
* &Task
* &CurrentTask
The first one is using the pointer type ARef<_> to hold ownership over
a refcount to the task. When variables of type ARef<Task> go out of
scope, put_task_struct() is called in the destructor. And constructing
a new ARef<Task> involves a call to get_task_struct().
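For example (a sketch; assume `task` is an existing ARef<Task>):

    // Cloning an ARef<Task> bumps the refcount via get_task_struct();
    // dropping the clone calls put_task_struct() in its destructor.
    let extra: ARef<Task> = task.clone();
    drop(extra); // put_task_struct() runs here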
Now, the second type &Task is a reference to a task. A reference is
when you *don't* have ownership over a refcount. They're used whenever
there is *any* mechanism keeping the value alive; the actual mechanism
in question is not part of the type. The way references work is that
they are annotated with a compile-time concept called a lifetime,
which is essentially a region of code that the reference is not
allowed to escape. The compiler generally enforces this. For example,
given an ARef<Task>, you can obtain a &Task without touching the
refcount. Attempting to use the &Task after the ARef<Task> is
destroyed is a hard compiler error. In this case, the ARef<_> keeps a
refcount alive, so accessing the task through the &Task is always okay
even though the &Task doesn't have a refcount.
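In code, that looks roughly like this (a sketch; `get_owned_task()` is a
made-up helper that returns an ARef<Task>):

    let owned: ARef<Task> = get_owned_task();
    let borrowed: &Task = &owned; // borrow, no refcount change
    drop(owned);                  // put_task_struct() runs here
    // borrowed.signal_pending(); // compile error: `owned` is gone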
Another mechanism is `current`. The way that Rust's `current!()` macro
works is essentially by defining a local variable which goes out of
scope at the end of the current function, and giving you a &Task where
the reference's lifetime is bounded by that local variable. So by
ensuring that the &Task is bounded by the current function scope, we
ensure that we're not using it after current becomes invalid.
With this change, the `current!()` macro instead gives you a
`&CurrentTask` whose lifetime is bounded to the function scope in the
same way. Given a &CurrentTask, you can deref it to a normal &Task
with the same lifetime, so you can also access the normal Task methods
on a &CurrentTask.
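So, roughly (a sketch):

    let task = current!();               // &CurrentTask, scope-bounded
    let mm = task.mm();                  // CurrentTask-only method
    let pending = task.signal_pending(); // Task method, via Deref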
And what's happening here is basically that ... you can use the
&CurrentTask as long as the function you created it in has not yet
returned. So if you spawn something on the workqueue and sleep until
the workqueue finishes executing the job, then technically that
satisfies the requirement and Rust will not prevent you from using the
&CurrentTask within that workqueue job. And this is technically also
okay from a C perspective, since the address space isn't going to go
away as long as the function doesn't return.
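In pseudocode, with `run_on_workqueue_and_wait()` as a hypothetical helper
that blocks until the job completes:

    let task = current!();
    // OK only because the caller sleeps until the job is done, so the
    // `&CurrentTask` never outlives the function that created it.
    run_on_workqueue_and_wait(|| {
        let _mm = task.mm();
    });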
Alice
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-22 15:40 ` [PATCH v9 8/8] task: rust: rework how current is accessed Alice Ryhl
` (2 preceding siblings ...)
2024-11-22 18:03 ` Boqun Feng
@ 2024-11-26 17:14 ` Jann Horn
2024-11-27 12:35 ` Alice Ryhl
3 siblings, 1 reply; 38+ messages in thread
From: Jann Horn @ 2024-11-26 17:14 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> +impl CurrentTask {
> + /// Access the address space of this task.
> + ///
> + /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
> + #[inline]
> + pub fn mm(&self) -> Option<&MmWithUser> {
> + let mm = unsafe { (*self.as_ptr()).mm };
> +
> + if mm.is_null() {
> + None
> + } else {
> + // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> + // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
> + // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
> + // while the reference is still live.
> + Some(unsafe { MmWithUser::from_raw(mm) })
Maybe also add safety comments for these nitpicky details:
kthreads can use kthread_use_mm()/kthread_unuse_mm() to change
current->mm (which allows kthreads to access arbitrary userspace
address spaces with copy_from_user/copy_to_user), but as long as you
can't call into kthread_use_mm()/kthread_unuse_mm() from Rust code,
this should be correct. If you do want to allow calls into
kthread_use_mm()/kthread_unuse_mm() later on, you might have to gate
this on a check for PF_KTHREAD, or something like that.
Binary formats' .load_binary implementations can change current->mm by
calling begin_new_exec(), but that's not an issue as long as no binary
format loaders are implemented in Rust.
> + }
> + }
> +}
> +
> // SAFETY: The type invariants guarantee that `Task` is always refcounted.
> unsafe impl crate::types::AlwaysRefCounted for Task {
> fn inc_ref(&self) {
>
> --
> 2.47.0.371.ga323438b13-goog
>
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-26 17:14 ` Jann Horn
@ 2024-11-27 12:35 ` Alice Ryhl
2024-11-27 15:52 ` Jann Horn
0 siblings, 1 reply; 38+ messages in thread
From: Alice Ryhl @ 2024-11-27 12:35 UTC (permalink / raw)
To: Jann Horn
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Tue, Nov 26, 2024 at 6:15 PM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > +impl CurrentTask {
> > + /// Access the address space of this task.
> > + ///
> > + /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
> > + #[inline]
> > + pub fn mm(&self) -> Option<&MmWithUser> {
> > + let mm = unsafe { (*self.as_ptr()).mm };
> > +
> > + if mm.is_null() {
> > + None
> > + } else {
> > + // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> > + // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
> > + // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
> > + // while the reference is still live.
> > + Some(unsafe { MmWithUser::from_raw(mm) })
>
> Maybe also add safety comments for these nitpicky details:
>
> kthreads can use kthread_use_mm()/kthread_unuse_mm() to change
> current->mm (which allows kthreads to access arbitrary userspace
> address spaces with copy_from_user/copy_to_user), but as long as you
> can't call into kthread_use_mm()/kthread_unuse_mm() from Rust code,
> this should be correct. If you do want to allow calls into
> kthread_use_mm()/kthread_unuse_mm() later on, you might have to gate
> this on a check for PF_KTHREAD, or something like that.
Huh ... is it possible to use kthread_use_mm() to create a situation
where current->mm has mm_users equal to zero? If not, then I don't
think it's a problem.
> Binary formats' .load_binary implementations can change current->mm by
> calling begin_new_exec(), but that's not an issue as long as no binary
> format loaders are implemented in Rust.
I think we can allow such loaders by having them involve an unsafe
operation asserting that you're not holding any references into
current when you start the new process.
Alice
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-27 12:35 ` Alice Ryhl
@ 2024-11-27 15:52 ` Jann Horn
2024-11-27 15:57 ` Alice Ryhl
0 siblings, 1 reply; 38+ messages in thread
From: Jann Horn @ 2024-11-27 15:52 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Wed, Nov 27, 2024 at 1:36 PM Alice Ryhl <aliceryhl@google.com> wrote:
> On Tue, Nov 26, 2024 at 6:15 PM Jann Horn <jannh@google.com> wrote:
> >
> > On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > +impl CurrentTask {
> > > + /// Access the address space of this task.
> > > + ///
> > > + /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
> > > + #[inline]
> > > + pub fn mm(&self) -> Option<&MmWithUser> {
> > > + let mm = unsafe { (*self.as_ptr()).mm };
> > > +
> > > + if mm.is_null() {
> > > + None
> > > + } else {
> > > + // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> > > + // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
> > > + // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
> > > + // while the reference is still live.
> > > + Some(unsafe { MmWithUser::from_raw(mm) })
> >
> > Maybe also add safety comments for these nitpicky details:
> >
> > kthreads can use kthread_use_mm()/kthread_unuse_mm() to change
> > current->mm (which allows kthreads to access arbitrary userspace
> > address spaces with copy_from_user/copy_to_user), but as long as you
> > can't call into kthread_use_mm()/kthread_unuse_mm() from Rust code,
> > this should be correct. If you do want to allow calls into
> > kthread_use_mm()/kthread_unuse_mm() later on, you might have to gate
> > this on a check for PF_KTHREAD, or something like that.
>
> Huh ... is it possible to use kthread_use_mm() to create a situation
> where current->mm has mm_users equal to zero? If not, then I don't
> think it's a problem.
Ah, no, I don't think so. I think the only problematic scenario would
be if rust code created a borrow of current->mm, then called
kthread_unuse_mm() and dropped the reference that was held on the MM,
and then accessed the borrowed old mm_struct. Which isn't possible
unless a Rust binding is created for
kthread_use_mm()/kthread_unuse_mm().
> > Binary formats' .load_binary implementations can change current->mm by
> > calling begin_new_exec(), but that's not an issue as long as no binary
> > format loaders are implemented in Rust.
>
> I think we can allow such loaders by having them involve an unsafe
> operation asserting that you're not holding any references into
> current when you start the new process.
Sounds reasonable.
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-27 15:52 ` Jann Horn
@ 2024-11-27 15:57 ` Alice Ryhl
2024-11-27 16:18 ` Jann Horn
0 siblings, 1 reply; 38+ messages in thread
From: Alice Ryhl @ 2024-11-27 15:57 UTC (permalink / raw)
To: Jann Horn
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Wed, Nov 27, 2024 at 4:52 PM Jann Horn <jannh@google.com> wrote:
>
> On Wed, Nov 27, 2024 at 1:36 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > On Tue, Nov 26, 2024 at 6:15 PM Jann Horn <jannh@google.com> wrote:
> > >
> > > On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > > +impl CurrentTask {
> > > > + /// Access the address space of this task.
> > > > + ///
> > > > + /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
> > > > + #[inline]
> > > > + pub fn mm(&self) -> Option<&MmWithUser> {
> > > > + let mm = unsafe { (*self.as_ptr()).mm };
> > > > +
> > > > + if mm.is_null() {
> > > > + None
> > > > + } else {
> > > > + // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> > > > + // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
> > > > + // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
> > > > + // while the reference is still live.
> > > > + Some(unsafe { MmWithUser::from_raw(mm) })
> > >
> > > Maybe also add safety comments for these nitpicky details:
> > >
> > > kthreads can use kthread_use_mm()/kthread_unuse_mm() to change
> > > current->mm (which allows kthreads to access arbitrary userspace
> > > address spaces with copy_from_user/copy_to_user), but as long as you
> > > can't call into kthread_use_mm()/kthread_unuse_mm() from Rust code,
> > > this should be correct. If you do want to allow calls into
> > > kthread_use_mm()/kthread_unuse_mm() later on, you might have to gate
> > > this on a check for PF_KTHREAD, or something like that.
> >
> > Huh ... is it possible to use kthread_use_mm() to create a situation
> > where current->mm has mm_users equal to zero? If not, then I don't
> > think it's a problem.
>
> Ah, no, I don't think so. I think the only problematic scenario would
> be if rust code created a borrow of current->mm, then called
> kthread_unuse_mm() and dropped the reference that was held on the MM,
> and then accessed the borrowed old mm_struct. Which isn't possible
> unless a Rust binding is created for
> kthread_use_mm()/kthread_unuse_mm().
Ah, ok.
The way that the current abstraction works is that it enforces that
the current pointer cannot escape the scope you were in when you
obtained it. If we enforce that kthread_use_mm() and
kthread_unuse_mm() involve a scope, then that should solve that.
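Such a scoped API could look something like this (purely hypothetical
sketch; no such binding exists in this series, and the bindings it calls
are assumed to be generated from the C kthread_use_mm()/kthread_unuse_mm()):

    /// Runs `f` with `mm` installed as the address space of the
    /// current kthread, then uninstalls it before returning.
    pub fn with_kthread_mm<R>(mm: &MmWithUser, f: impl FnOnce() -> R) -> R {
        // SAFETY (sketch): the `&MmWithUser` borrow guarantees that
        // `mm_users` stays non-zero for the duration of this call.
        unsafe { bindings::kthread_use_mm(mm.as_raw()) };
        let ret = f();
        // SAFETY (sketch): paired with the kthread_use_mm() above.
        unsafe { bindings::kthread_unuse_mm(mm.as_raw()) };
        ret
    }

Any `current!()` taken inside `f` is bounded by the closure's scope, so a
borrow of the temporarily-installed mm can't outlive the
kthread_unuse_mm() call.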
Alice
* Re: [PATCH v9 8/8] task: rust: rework how current is accessed
2024-11-27 15:57 ` Alice Ryhl
@ 2024-11-27 16:18 ` Jann Horn
0 siblings, 0 replies; 38+ messages in thread
From: Jann Horn @ 2024-11-27 16:18 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, linux-kernel, linux-mm, rust-for-linux,
Andreas Hindborg
On Wed, Nov 27, 2024 at 4:57 PM Alice Ryhl <aliceryhl@google.com> wrote:
> On Wed, Nov 27, 2024 at 4:52 PM Jann Horn <jannh@google.com> wrote:
> > On Wed, Nov 27, 2024 at 1:36 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > On Tue, Nov 26, 2024 at 6:15 PM Jann Horn <jannh@google.com> wrote:
> > > >
> > > > On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > > > +impl CurrentTask {
> > > > > + /// Access the address space of this task.
> > > > > + ///
> > > > > + /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
> > > > > + #[inline]
> > > > > + pub fn mm(&self) -> Option<&MmWithUser> {
> > > > > + let mm = unsafe { (*self.as_ptr()).mm };
> > > > > +
> > > > > + if mm.is_null() {
> > > > > + None
> > > > > + } else {
> > > > > + // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> > > > > + // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
> > > > > + // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
> > > > > + // while the reference is still live.
> > > > > + Some(unsafe { MmWithUser::from_raw(mm) })
> > > >
> > > > Maybe also add safety comments for these nitpicky details:
> > > >
> > > > kthreads can use kthread_use_mm()/kthread_unuse_mm() to change
> > > > current->mm (which allows kthreads to access arbitrary userspace
> > > > address spaces with copy_from_user/copy_to_user), but as long as you
> > > > can't call into kthread_use_mm()/kthread_unuse_mm() from Rust code,
> > > > this should be correct. If you do want to allow calls into
> > > > kthread_use_mm()/kthread_unuse_mm() later on, you might have to gate
> > > > this on a check for PF_KTHREAD, or something like that.
> > >
> > > Huh ... is it possible to use kthread_use_mm() to create a situation
> > > where current->mm has mm_users equal to zero? If not, then I don't
> > > think it's a problem.
> >
> > Ah, no, I don't think so. I think the only problematic scenario would
> > be if rust code created a borrow of current->mm, then called
> > kthread_unuse_mm() and dropped the reference that was held on the MM,
> > and then accessed the borrowed old mm_struct. Which isn't possible
> > unless a Rust binding is created for
> > kthread_use_mm()/kthread_unuse_mm().
>
> Ah, ok.
>
> The way that the current abstraction works is that it enforces that
> the current pointer cannot escape the scope you were in when you
> obtained it. If we enforce that kthread_use_mm() and
> kthread_unuse_mm() involve a scope, then that should solve that.
Oooh, that's neat, thanks for the explanation.