* [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct
2024-12-11 10:37 ` [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
@ 2024-12-11 10:37 ` Alice Ryhl
2024-12-16 11:31 ` Andreas Hindborg
2025-01-17 0:45 ` Balbir Singh
2024-12-11 10:37 ` [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access Alice Ryhl
` (8 subsequent siblings)
9 siblings, 2 replies; 65+ messages in thread
From: Alice Ryhl @ 2024-12-11 10:37 UTC (permalink / raw)
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, linux-kernel,
linux-mm, rust-for-linux, Alice Ryhl
These abstractions allow you to reference a `struct mm_struct` using
both mmgrab and mmget refcounts. This is done using two Rust types:
* Mm - represents an mm_struct where you don't know anything about the
value of mm_users.
* MmWithUser - represents an mm_struct where you know at compile time
that mm_users is non-zero.
This allows us to encode in the type system whether a method requires
that mm_users is non-zero or not. For instance, you can always call
`mmget_not_zero` but you can only call `mmap_read_lock` when mm_users is
non-zero.
It's possible to access current->mm without a refcount increment, but
that is added in a later patch of this series.
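As a rough userspace illustration of the encoding described above (a toy model, not kernel code: `FakeMm` and its counter are invented here, and only `mm_users` is modeled), the two-type split might look like:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for `struct mm_struct`; only `mm_users` is modeled.
struct FakeMm {
    users: AtomicUsize,
}

// Plays the role of `Mm`: `mm_users` may be zero, so the only way to reach
// the address space is the fallible `mmget_not_zero`.
struct Mm<'a>(&'a FakeMm);

// Plays the role of `MmWithUser`: it can only be constructed after
// `mm_users` was observed non-zero (and incremented), so methods that need
// a live address space can be exposed safely.
struct MmWithUser<'a>(&'a FakeMm);

impl<'a> Mm<'a> {
    // Mirrors mmget_not_zero(): increment `users` only if it is non-zero,
    // using a compare-exchange loop.
    fn mmget_not_zero(&self) -> Option<MmWithUser<'a>> {
        let mut cur = self.0.users.load(Ordering::Relaxed);
        loop {
            if cur == 0 {
                return None;
            }
            match self.0.users.compare_exchange(
                cur,
                cur + 1,
                Ordering::Acquire,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Some(MmWithUser(self.0)),
                Err(actual) => cur = actual,
            }
        }
    }
}

impl<'a> MmWithUser<'a> {
    // Stands in for mmap_read_lock(): only callable once the type system
    // has proven `mm_users` non-zero.
    fn mmap_read_lock(&self) -> &'a FakeMm {
        self.0
    }
}

fn main() {
    let live = FakeMm { users: AtomicUsize::new(1) };
    let with_user = Mm(&live).mmget_not_zero().expect("users was non-zero");
    let _mm = with_user.mmap_read_lock();
    assert_eq!(live.users.load(Ordering::Relaxed), 2);

    let dead = FakeMm { users: AtomicUsize::new(0) };
    assert!(Mm(&dead).mmget_not_zero().is_none());
}
```

The point of the pattern is that `mmap_read_lock` simply does not exist on the `Mm`-like type, so "forgot to check `mm_users`" becomes a compile error rather than a runtime bug.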
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/helpers/helpers.c | 1 +
rust/helpers/mm.c | 39 +++++++++
rust/kernel/lib.rs | 1 +
rust/kernel/mm.rs | 219 +++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 260 insertions(+)
diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index dcf827a61b52..9d748ec845b3 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -16,6 +16,7 @@
#include "fs.c"
#include "jump_label.c"
#include "kunit.c"
+#include "mm.c"
#include "mutex.c"
#include "page.c"
#include "pid_namespace.c"
diff --git a/rust/helpers/mm.c b/rust/helpers/mm.c
new file mode 100644
index 000000000000..7201747a5d31
--- /dev/null
+++ b/rust/helpers/mm.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/mm.h>
+#include <linux/sched/mm.h>
+
+void rust_helper_mmgrab(struct mm_struct *mm)
+{
+ mmgrab(mm);
+}
+
+void rust_helper_mmdrop(struct mm_struct *mm)
+{
+ mmdrop(mm);
+}
+
+void rust_helper_mmget(struct mm_struct *mm)
+{
+ mmget(mm);
+}
+
+bool rust_helper_mmget_not_zero(struct mm_struct *mm)
+{
+ return mmget_not_zero(mm);
+}
+
+void rust_helper_mmap_read_lock(struct mm_struct *mm)
+{
+ mmap_read_lock(mm);
+}
+
+bool rust_helper_mmap_read_trylock(struct mm_struct *mm)
+{
+ return mmap_read_trylock(mm);
+}
+
+void rust_helper_mmap_read_unlock(struct mm_struct *mm)
+{
+ mmap_read_unlock(mm);
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index e1065a7551a3..6555e0847192 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -46,6 +46,7 @@
pub mod kunit;
pub mod list;
pub mod miscdevice;
+pub mod mm;
#[cfg(CONFIG_NET)]
pub mod net;
pub mod page;
diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
new file mode 100644
index 000000000000..84cba581edaa
--- /dev/null
+++ b/rust/kernel/mm.rs
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (C) 2024 Google LLC.
+
+//! Memory management.
+//!
+//! C header: [`include/linux/mm.h`](srctree/include/linux/mm.h)
+
+use crate::{
+ bindings,
+ types::{ARef, AlwaysRefCounted, NotThreadSafe, Opaque},
+};
+use core::{ops::Deref, ptr::NonNull};
+
+/// A wrapper for the kernel's `struct mm_struct`.
+///
+/// Since `mm_users` may be zero, the associated address space may not exist anymore. You can use
+/// [`mmget_not_zero`] to be able to access the address space.
+///
+/// The `ARef<Mm>` smart pointer holds an `mmgrab` refcount. Its destructor may sleep.
+///
+/// # Invariants
+///
+/// Values of this type are always refcounted using `mmgrab`.
+///
+/// [`mmget_not_zero`]: Mm::mmget_not_zero
+#[repr(transparent)]
+pub struct Mm {
+ mm: Opaque<bindings::mm_struct>,
+}
+
+// SAFETY: It is safe to call `mmdrop` on another thread than where `mmgrab` was called.
+unsafe impl Send for Mm {}
+// SAFETY: All methods on `Mm` can be called in parallel from several threads.
+unsafe impl Sync for Mm {}
+
+// SAFETY: By the type invariants, this type is always refcounted.
+unsafe impl AlwaysRefCounted for Mm {
+ #[inline]
+ fn inc_ref(&self) {
+ // SAFETY: The pointer is valid since self is a reference.
+ unsafe { bindings::mmgrab(self.as_raw()) };
+ }
+
+ #[inline]
+ unsafe fn dec_ref(obj: NonNull<Self>) {
+ // SAFETY: The caller is giving up their refcount.
+ unsafe { bindings::mmdrop(obj.cast().as_ptr()) };
+ }
+}
+
+/// A wrapper for the kernel's `struct mm_struct`.
+///
+/// This type is like [`Mm`], but with non-zero `mm_users`. It can only be used when `mm_users` can
+/// be proven to be non-zero at compile-time, usually because the relevant code holds an `mmget`
+/// refcount. It can be used to access the associated address space.
+///
+/// The `ARef<MmWithUser>` smart pointer holds an `mmget` refcount. Its destructor may sleep.
+///
+/// # Invariants
+///
+/// Values of this type are always refcounted using `mmget`. The value of `mm_users` is non-zero.
+#[repr(transparent)]
+pub struct MmWithUser {
+ mm: Mm,
+}
+
+// SAFETY: It is safe to call `mmput` on another thread than where `mmget` was called.
+unsafe impl Send for MmWithUser {}
+// SAFETY: All methods on `MmWithUser` can be called in parallel from several threads.
+unsafe impl Sync for MmWithUser {}
+
+// SAFETY: By the type invariants, this type is always refcounted.
+unsafe impl AlwaysRefCounted for MmWithUser {
+ #[inline]
+ fn inc_ref(&self) {
+ // SAFETY: The pointer is valid since self is a reference.
+ unsafe { bindings::mmget(self.as_raw()) };
+ }
+
+ #[inline]
+ unsafe fn dec_ref(obj: NonNull<Self>) {
+ // SAFETY: The caller is giving up their refcount.
+ unsafe { bindings::mmput(obj.cast().as_ptr()) };
+ }
+}
+
+// Make all `Mm` methods available on `MmWithUser`.
+impl Deref for MmWithUser {
+ type Target = Mm;
+
+ #[inline]
+ fn deref(&self) -> &Mm {
+ &self.mm
+ }
+}
+
+// These methods are safe to call even if `mm_users` is zero.
+impl Mm {
+ /// Call `mmgrab` on `current.mm`.
+ #[inline]
+ pub fn mmgrab_current() -> Option<ARef<Mm>> {
+ // SAFETY: It's safe to get the `mm` field from current.
+ let mm = unsafe {
+ let current = bindings::get_current();
+ (*current).mm
+ };
+
+ if mm.is_null() {
+ return None;
+ }
+
+ // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
+ // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
+ // duration of this function, and `current->mm` will stay valid for that long.
+ let mm = unsafe { Mm::from_raw(mm) };
+
+ // This increments the refcount using `mmgrab`.
+ Some(ARef::from(mm))
+ }
+
+ /// Returns a raw pointer to the inner `mm_struct`.
+ #[inline]
+ pub fn as_raw(&self) -> *mut bindings::mm_struct {
+ self.mm.get()
+ }
+
+ /// Obtain a reference from a raw pointer.
+ ///
+ /// # Safety
+ ///
+ /// The caller must ensure that `ptr` points at an `mm_struct`, and that it is not deallocated
+ /// during the lifetime 'a.
+ #[inline]
+ pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_struct) -> &'a Mm {
+ // SAFETY: Caller promises that the pointer is valid for 'a. Layouts are compatible due to
+ // repr(transparent).
+ unsafe { &*ptr.cast() }
+ }
+
+ /// Calls `mmget_not_zero` and returns a handle if it succeeds.
+ #[inline]
+ pub fn mmget_not_zero(&self) -> Option<ARef<MmWithUser>> {
+ // SAFETY: The pointer is valid since self is a reference.
+ let success = unsafe { bindings::mmget_not_zero(self.as_raw()) };
+
+ if success {
+ // SAFETY: We just created an `mmget` refcount.
+ Some(unsafe { ARef::from_raw(NonNull::new_unchecked(self.as_raw().cast())) })
+ } else {
+ None
+ }
+ }
+}
+
+// These methods require `mm_users` to be non-zero.
+impl MmWithUser {
+ /// Obtain a reference from a raw pointer.
+ ///
+ /// # Safety
+ ///
+ /// The caller must ensure that `ptr` points at an `mm_struct`, and that `mm_users` remains
+ /// non-zero for the duration of the lifetime 'a.
+ #[inline]
+ pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_struct) -> &'a MmWithUser {
+ // SAFETY: Caller promises that the pointer is valid for 'a. The layout is compatible due
+ // to repr(transparent).
+ unsafe { &*ptr.cast() }
+ }
+
+ /// Lock the mmap read lock.
+ #[inline]
+ pub fn mmap_read_lock(&self) -> MmapReadGuard<'_> {
+ // SAFETY: The pointer is valid since self is a reference.
+ unsafe { bindings::mmap_read_lock(self.as_raw()) };
+
+ // INVARIANT: We just acquired the read lock.
+ MmapReadGuard {
+ mm: self,
+ _nts: NotThreadSafe,
+ }
+ }
+
+ /// Try to lock the mmap read lock.
+ #[inline]
+ pub fn mmap_read_trylock(&self) -> Option<MmapReadGuard<'_>> {
+ // SAFETY: The pointer is valid since self is a reference.
+ let success = unsafe { bindings::mmap_read_trylock(self.as_raw()) };
+
+ if success {
+ // INVARIANT: We just acquired the read lock.
+ Some(MmapReadGuard {
+ mm: self,
+ _nts: NotThreadSafe,
+ })
+ } else {
+ None
+ }
+ }
+}
+
+/// A guard for the mmap read lock.
+///
+/// # Invariants
+///
+/// This `MmapReadGuard` guard owns the mmap read lock.
+pub struct MmapReadGuard<'a> {
+ mm: &'a MmWithUser,
+ // `mmap_read_lock` and `mmap_read_unlock` must be called on the same thread
+ _nts: NotThreadSafe,
+}
+
+impl Drop for MmapReadGuard<'_> {
+ #[inline]
+ fn drop(&mut self) {
+ // SAFETY: We hold the read lock by the type invariants.
+ unsafe { bindings::mmap_read_unlock(self.mm.as_raw()) };
+ }
+}
--
2.47.1.613.gc27f4b7a9f-goog
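The `MmapReadGuard` RAII pattern in the patch above can be sketched in plain userspace Rust (again a hedged toy model, not the kernel types: `FakeMm` and its reader counter are made up for illustration):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Toy stand-in for an mm_struct with an mmap lock; only a reader count
// is modeled.
struct FakeMm {
    readers: AtomicUsize,
}

// Mirrors MmapReadGuard: holding the guard means holding the read lock.
struct MmapReadGuard<'a> {
    mm: &'a FakeMm,
}

impl FakeMm {
    // Mirrors mmap_read_lock(): taking the lock hands back a guard.
    fn mmap_read_lock(&self) -> MmapReadGuard<'_> {
        self.readers.fetch_add(1, Ordering::Acquire);
        MmapReadGuard { mm: self }
    }
}

impl Drop for MmapReadGuard<'_> {
    // Mirrors mmap_read_unlock() running automatically when the guard goes
    // out of scope, so the unlock cannot be forgotten.
    fn drop(&mut self) {
        self.mm.readers.fetch_sub(1, Ordering::Release);
    }
}

fn main() {
    let mm = FakeMm { readers: AtomicUsize::new(0) };
    {
        let _guard = mm.mmap_read_lock();
        assert_eq!(mm.readers.load(Ordering::Relaxed), 1);
    } // _guard dropped here; the "lock" is released
    assert_eq!(mm.readers.load(Ordering::Relaxed), 0);
}
```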
^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct
2024-12-11 10:37 ` [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct Alice Ryhl
@ 2024-12-16 11:31 ` Andreas Hindborg
2025-01-13 9:53 ` Alice Ryhl
2025-01-17 0:45 ` Balbir Singh
1 sibling, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2024-12-16 11:31 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo,
=?us-ascii?Q?=3D=3Fus-ascii=3FQ=3F=3D3D?=
=?us-ascii?Q?=3D3Fus-ascii=3D3FQ=3D3F=3D3D3D=3F=3D_=3D=3Fus-ascii=3FQ=3F?=
=?us-ascii?Q?=3D3D3Futf-8=3D3D3FQ=3D3D3FBj=3D3D3DC3=3D3F=3D3D=5F=3D3D=3D3?=
=?us-ascii?Q?Fus-ascii=3D3FQ=3D3F=3F=3D_=3D=3Fus-ascii=3FQ=3F=3D3D3DB6rn?=
=?us-ascii?Q?=3D3D3F=3D3D3D=3D3F=3D3D=3F=3D?= Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> These abstractions allow you to reference a `struct mm_struct` using
> both mmgrab and mmget refcounts. This is done using two Rust types:
>
> * Mm - represents an mm_struct where you don't know anything about the
> value of mm_users.
> * MmWithUser - represents an mm_struct where you know at compile time
> that mm_users is non-zero.
>
> This allows us to encode in the type system whether a method requires
> that mm_users is non-zero or not. For instance, you can always call
> `mmget_not_zero` but you can only call `mmap_read_lock` when mm_users is
> non-zero.
>
> It's possible to access current->mm without a refcount increment, but
> that is added in a later patch of this series.
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/helpers/helpers.c | 1 +
> rust/helpers/mm.c | 39 +++++++++
> rust/kernel/lib.rs | 1 +
> rust/kernel/mm.rs | 219 +++++++++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 260 insertions(+)
>
> diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> new file mode 100644
> index 000000000000..84cba581edaa
> --- /dev/null
> +++ b/rust/kernel/mm.rs
> @@ -0,0 +1,219 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +// Copyright (C) 2024 Google LLC.
> +
> +//! Memory management.
Could you add a little more context here?
> +//!
> +//! C header: [`include/linux/mm.h`](srctree/include/linux/mm.h)
> +
> +use crate::{
> + bindings,
> + types::{ARef, AlwaysRefCounted, NotThreadSafe, Opaque},
> +};
> +use core::{ops::Deref, ptr::NonNull};
> +
> +/// A wrapper for the kernel's `struct mm_struct`.
Could you elaborate the data structure use case? When do I need it, what
does it do?
> +///
> +/// Since `mm_users` may be zero, the associated address space may not exist anymore. You can use
> +/// [`mmget_not_zero`] to be able to access the address space.
> +///
> +/// The `ARef<Mm>` smart pointer holds an `mmgrab` refcount. Its destructor may sleep.
> +///
> +/// # Invariants
> +///
> +/// Values of this type are always refcounted using `mmgrab`.
> +///
> +/// [`mmget_not_zero`]: Mm::mmget_not_zero
> +#[repr(transparent)]
> +pub struct Mm {
Could we come up with a better name? `MemoryMap` or `MemoryMapping`? You
use `MmapReadGuard` later.
> + mm: Opaque<bindings::mm_struct>,
> +}
> +
> +// SAFETY: It is safe to call `mmdrop` on another thread than where `mmgrab` was called.
> +unsafe impl Send for Mm {}
> +// SAFETY: All methods on `Mm` can be called in parallel from several threads.
> +unsafe impl Sync for Mm {}
> +
> +// SAFETY: By the type invariants, this type is always refcounted.
> +unsafe impl AlwaysRefCounted for Mm {
> + #[inline]
> + fn inc_ref(&self) {
> + // SAFETY: The pointer is valid since self is a reference.
> + unsafe { bindings::mmgrab(self.as_raw()) };
> + }
> +
> + #[inline]
> + unsafe fn dec_ref(obj: NonNull<Self>) {
> + // SAFETY: The caller is giving up their refcount.
> + unsafe { bindings::mmdrop(obj.cast().as_ptr()) };
> + }
> +}
> +
> +/// A wrapper for the kernel's `struct mm_struct`.
> +///
> +/// This type is like [`Mm`], but with non-zero `mm_users`. It can only be used when `mm_users` can
> +/// be proven to be non-zero at compile-time, usually because the relevant code holds an `mmget`
> +/// refcount. It can be used to access the associated address space.
> +///
> +/// The `ARef<MmWithUser>` smart pointer holds an `mmget` refcount. Its destructor may sleep.
> +///
> +/// # Invariants
> +///
> +/// Values of this type are always refcounted using `mmget`. The value of `mm_users` is non-zero.
> +#[repr(transparent)]
> +pub struct MmWithUser {
> + mm: Mm,
> +}
> +
> +// SAFETY: It is safe to call `mmput` on another thread than where `mmget` was called.
> +unsafe impl Send for MmWithUser {}
> +// SAFETY: All methods on `MmWithUser` can be called in parallel from several threads.
> +unsafe impl Sync for MmWithUser {}
> +
> +// SAFETY: By the type invariants, this type is always refcounted.
> +unsafe impl AlwaysRefCounted for MmWithUser {
> + #[inline]
> + fn inc_ref(&self) {
> + // SAFETY: The pointer is valid since self is a reference.
> + unsafe { bindings::mmget(self.as_raw()) };
> + }
> +
> + #[inline]
> + unsafe fn dec_ref(obj: NonNull<Self>) {
> + // SAFETY: The caller is giving up their refcount.
> + unsafe { bindings::mmput(obj.cast().as_ptr()) };
> + }
> +}
> +
> +// Make all `Mm` methods available on `MmWithUser`.
> +impl Deref for MmWithUser {
> + type Target = Mm;
> +
> + #[inline]
> + fn deref(&self) -> &Mm {
> + &self.mm
> + }
> +}
> +
> +// These methods are safe to call even if `mm_users` is zero.
> +impl Mm {
> + /// Call `mmgrab` on `current.mm`.
> + #[inline]
> + pub fn mmgrab_current() -> Option<ARef<Mm>> {
> + // SAFETY: It's safe to get the `mm` field from current.
> + let mm = unsafe {
> + let current = bindings::get_current();
> + (*current).mm
> + };
> +
> + if mm.is_null() {
> + return None;
> + }
> +
> + // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
> + // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
> + // duration of this function, and `current->mm` will stay valid for that long.
> + let mm = unsafe { Mm::from_raw(mm) };
> +
> + // This increments the refcount using `mmgrab`.
> + Some(ARef::from(mm))
> + }
> +
> + /// Returns a raw pointer to the inner `mm_struct`.
> + #[inline]
> + pub fn as_raw(&self) -> *mut bindings::mm_struct {
> + self.mm.get()
> + }
> +
> + /// Obtain a reference from a raw pointer.
> + ///
> + /// # Safety
> + ///
> + /// The caller must ensure that `ptr` points at an `mm_struct`, and that it is not deallocated
> + /// during the lifetime 'a.
> + #[inline]
> + pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_struct) -> &'a Mm {
> + // SAFETY: Caller promises that the pointer is valid for 'a. Layouts are compatible due to
> + // repr(transparent).
> + unsafe { &*ptr.cast() }
> + }
> +
> + /// Calls `mmget_not_zero` and returns a handle if it succeeds.
> + #[inline]
> + pub fn mmget_not_zero(&self) -> Option<ARef<MmWithUser>> {
> + // SAFETY: The pointer is valid since self is a reference.
> + let success = unsafe { bindings::mmget_not_zero(self.as_raw()) };
> +
> + if success {
> + // SAFETY: We just created an `mmget` refcount.
> + Some(unsafe { ARef::from_raw(NonNull::new_unchecked(self.as_raw().cast())) })
> + } else {
> + None
> + }
> + }
> +}
Nit: could we put the impl next to the struct definition?
Best regards,
Andreas Hindborg
^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct
2024-12-16 11:31 ` Andreas Hindborg
@ 2025-01-13 9:53 ` Alice Ryhl
2025-01-14 15:48 ` Lorenzo Stoakes
2025-01-15 10:36 ` Andreas Hindborg
0 siblings, 2 replies; 65+ messages in thread
From: Alice Ryhl @ 2025-01-13 9:53 UTC (permalink / raw)
To: Andreas Hindborg, Lorenzo Stoakes
Cc: Miguel Ojeda, Matthew Wilcox, Vlastimil Babka, John Hubbard,
Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin,
Trevor Gross, linux-kernel, linux-mm, rust-for-linux
On Mon, Dec 16, 2024 at 3:50 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> "Alice Ryhl" <aliceryhl@google.com> writes:
>
> > These abstractions allow you to reference a `struct mm_struct` using
> > both mmgrab and mmget refcounts. This is done using two Rust types:
> >
> > * Mm - represents an mm_struct where you don't know anything about the
> > value of mm_users.
> > * MmWithUser - represents an mm_struct where you know at compile time
> > that mm_users is non-zero.
> >
> > This allows us to encode in the type system whether a method requires
> > that mm_users is non-zero or not. For instance, you can always call
> > `mmget_not_zero` but you can only call `mmap_read_lock` when mm_users is
> > non-zero.
> >
> > It's possible to access current->mm without a refcount increment, but
> > that is added in a later patch of this series.
> >
> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > ---
> > rust/helpers/helpers.c | 1 +
> > rust/helpers/mm.c | 39 +++++++++
> > rust/kernel/lib.rs | 1 +
> > rust/kernel/mm.rs | 219 +++++++++++++++++++++++++++++++++++++++++++++++++
> > 4 files changed, 260 insertions(+)
> >
> > diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> > new file mode 100644
> > index 000000000000..84cba581edaa
> > --- /dev/null
> > +++ b/rust/kernel/mm.rs
> > @@ -0,0 +1,219 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +// Copyright (C) 2024 Google LLC.
> > +
> > +//! Memory management.
>
> Could you add a little more context here?
How about this?
//! Memory management.
//!
//! This module deals with managing the address space of userspace processes. Each process has an
//! instance of [`Mm`], which keeps track of multiple VMAs (virtual memory areas). Each VMA
//! corresponds to a region of memory that the userspace process can access, and the VMA lets you
//! control what happens when userspace reads or writes to that region of memory.
//!
//! C header: [`include/linux/mm.h`](srctree/include/linux/mm.h)
> > +//!
> > +//! C header: [`include/linux/mm.h`](srctree/include/linux/mm.h)
> > +
> > +use crate::{
> > + bindings,
> > + types::{ARef, AlwaysRefCounted, NotThreadSafe, Opaque},
> > +};
> > +use core::{ops::Deref, ptr::NonNull};
> > +
> > +/// A wrapper for the kernel's `struct mm_struct`.
>
> Could you elaborate the data structure use case? When do I need it, what
> does it do?
How about this?
/// A wrapper for the kernel's `struct mm_struct`.
///
/// This represents the address space of a userspace process, so each process has one `Mm`
/// instance. It may hold many VMAs internally.
///
/// There is a counter called `mm_users` that counts the users of the address space; this includes
/// the userspace process itself, but can also include kernel threads accessing the address space.
/// Once `mm_users` reaches zero, this indicates that the address space can be destroyed. To access
/// the address space, you must prevent `mm_users` from reaching zero while you are accessing it.
/// The [`MmWithUser`] type represents an address space where this is guaranteed, and you can
/// create one using [`mmget_not_zero`].
///
/// The `ARef<Mm>` smart pointer holds an `mmgrab` refcount. Its destructor may sleep.
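The two counters in play here could be modeled in userspace like so (a toy sketch under invented names, not kernel code: `mm_count` stands for the `mmgrab`/`mmdrop` refcount that keeps the struct alive, `mm_users` for the `mmget`/`mmput` refcount that keeps the address space alive):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Toy model of the two refcounts on an mm_struct.
struct FakeMm {
    mm_count: AtomicUsize,
    mm_users: AtomicUsize,
}

impl FakeMm {
    fn new() -> FakeMm {
        // A freshly created process holds one reference of each kind.
        FakeMm {
            mm_count: AtomicUsize::new(1),
            mm_users: AtomicUsize::new(1),
        }
    }

    // Mirrors mmgrab(): keep the struct itself alive.
    fn mmgrab(&self) {
        self.mm_count.fetch_add(1, Ordering::Relaxed);
    }

    // Mirrors mmdrop(): returns true when the struct could now be freed.
    fn mmdrop(&self) -> bool {
        self.mm_count.fetch_sub(1, Ordering::Release) == 1
    }

    // Mirrors mmput(): when the last user drops, the address space can be
    // torn down, but the struct stays alive while mm_count is still held.
    fn mmput(&self) {
        self.mm_users.fetch_sub(1, Ordering::Release);
    }

    fn address_space_alive(&self) -> bool {
        self.mm_users.load(Ordering::Relaxed) > 0
    }
}

fn main() {
    let mm = FakeMm::new();
    mm.mmgrab(); // e.g. an ARef<Mm>-style reference taken by kernel code
    mm.mmput(); // the process exits: the last user drops
    assert!(!mm.address_space_alive()); // address space is gone...
    assert!(!mm.mmdrop()); // ...but our mmgrab still keeps the struct
    assert!(mm.mmdrop()); // final reference: the struct can be freed
}
```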
> > +///
> > +/// Since `mm_users` may be zero, the associated address space may not exist anymore. You can use
> > +/// [`mmget_not_zero`] to be able to access the address space.
> > +///
> > +/// The `ARef<Mm>` smart pointer holds an `mmgrab` refcount. Its destructor may sleep.
> > +///
> > +/// # Invariants
> > +///
> > +/// Values of this type are always refcounted using `mmgrab`.
> > +///
> > +/// [`mmget_not_zero`]: Mm::mmget_not_zero
> > +#[repr(transparent)]
> > +pub struct Mm {
>
> Could we come up with a better name? `MemoryMap` or `MemoryMapping`?. You
> use `MMapReadGuard` later.
Those names seem really confusing to me. The mmap syscall creates a
new VMA, but MemoryMap sounds like it's the thing that mmap creates.
Lorenzo, what do you think? I'm inclined to just call it Mm since
that's what C calls it.
Alice
^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct
2025-01-13 9:53 ` Alice Ryhl
@ 2025-01-14 15:48 ` Lorenzo Stoakes
2025-01-15 1:54 ` John Hubbard
2025-01-15 10:36 ` Andreas Hindborg
1 sibling, 1 reply; 65+ messages in thread
From: Lorenzo Stoakes @ 2025-01-14 15:48 UTC (permalink / raw)
To: Alice Ryhl
Cc: Andreas Hindborg, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin,
Trevor Gross, linux-kernel, linux-mm, rust-for-linux
On Mon, Jan 13, 2025 at 10:53:33AM +0100, Alice Ryhl wrote:
> On Mon, Dec 16, 2024 at 3:50 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
> >
> > "Alice Ryhl" <aliceryhl@google.com> writes:
> >
> > > These abstractions allow you to reference a `struct mm_struct` using
> > > both mmgrab and mmget refcounts. This is done using two Rust types:
> > >
> > > * Mm - represents an mm_struct where you don't know anything about the
> > > value of mm_users.
> > > * MmWithUser - represents an mm_struct where you know at compile time
> > > that mm_users is non-zero.
> > >
> > > This allows us to encode in the type system whether a method requires
> > > that mm_users is non-zero or not. For instance, you can always call
> > > `mmget_not_zero` but you can only call `mmap_read_lock` when mm_users is
> > > non-zero.
> > >
> > > It's possible to access current->mm without a refcount increment, but
> > > that is added in a later patch of this series.
> > >
> > > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > > ---
> > > rust/helpers/helpers.c | 1 +
> > > rust/helpers/mm.c | 39 +++++++++
> > > rust/kernel/lib.rs | 1 +
> > > rust/kernel/mm.rs | 219 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > 4 files changed, 260 insertions(+)
> > >
> > > diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> > > new file mode 100644
> > > index 000000000000..84cba581edaa
> > > --- /dev/null
> > > +++ b/rust/kernel/mm.rs
> > > @@ -0,0 +1,219 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +
> > > +// Copyright (C) 2024 Google LLC.
> > > +
> > > +//! Memory management.
> >
> > Could you add a little more context here?
>
> How about this?
>
> //! Memory management.
> //!
> //! This module deals with managing the address space of userspace processes. Each process has an
> //! instance of [`Mm`], which keeps track of multiple VMAs (virtual memory areas). Each VMA
> //! corresponds to a region of memory that the userspace process can access, and the VMA lets you
> //! control what happens when userspace reads or writes to that region of memory.
> //!
> //! C header: [`include/linux/mm.h`](srctree/include/linux/mm.h)
>
> > > +//!
> > > +//! C header: [`include/linux/mm.h`](srctree/include/linux/mm.h)
> > > +
> > > +use crate::{
> > > + bindings,
> > > + types::{ARef, AlwaysRefCounted, NotThreadSafe, Opaque},
> > > +};
> > > +use core::{ops::Deref, ptr::NonNull};
> > > +
> > > +/// A wrapper for the kernel's `struct mm_struct`.
> >
> > Could you elaborate the data structure use case? When do I need it, what
> > does it do?
>
> How about this?
>
> /// A wrapper for the kernel's `struct mm_struct`.
> ///
> /// This represents the address space of a userspace process, so each process has one `Mm`
> /// instance. It may hold many VMAs internally.
> ///
> /// There is a counter called `mm_users` that counts the users of the address space; this includes
> /// the userspace process itself, but can also include kernel threads accessing the address space.
> /// Once `mm_users` reaches zero, this indicates that the address space can be destroyed. To access
> /// the address space, you must prevent `mm_users` from reaching zero while you are accessing it.
> /// The [`MmWithUser`] type represents an address space where this is guaranteed, and you can
> /// create one using [`mmget_not_zero`].
> ///
> /// The `ARef<Mm>` smart pointer holds an `mmgrab` refcount. Its destructor may sleep.
>
> > > +///
> > > +/// Since `mm_users` may be zero, the associated address space may not exist anymore. You can use
> > > +/// [`mmget_not_zero`] to be able to access the address space.
> > > +///
> > > +/// The `ARef<Mm>` smart pointer holds an `mmgrab` refcount. Its destructor may sleep.
> > > +///
> > > +/// # Invariants
> > > +///
> > > +/// Values of this type are always refcounted using `mmgrab`.
> > > +///
> > > +/// [`mmget_not_zero`]: Mm::mmget_not_zero
> > > +#[repr(transparent)]
> > > +pub struct Mm {
> >
> > Could we come up with a better name? `MemoryMap` or `MemoryMapping`?. You
> > use `MMapReadGuard` later.
>
> Those names seem really confusing to me. The mmap syscall creates a
> new VMA, but MemoryMap sounds like it's the thing that mmap creates.
>
> Lorenzo, what do you think? I'm inclined to just call it Mm since
> that's what C calls it.
I think Mm is better just for alignment with the C stuff, I mean the alternative
is MmStruct or something and... yuck.
And like, here I am TOTALLY onboard with Andreas here, because this naming
SUCKS. But it sucks on the C side too (we're experts at bad naming :). So for
consistency, let's suck everywhere...
Feel free to put a comment about this being a bad name if you like
though... (not obligatory :)
>
> Alice
^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct
2025-01-14 15:48 ` Lorenzo Stoakes
@ 2025-01-15 1:54 ` John Hubbard
2025-01-15 12:13 ` Lorenzo Stoakes
0 siblings, 1 reply; 65+ messages in thread
From: John Hubbard @ 2025-01-15 1:54 UTC (permalink / raw)
To: Lorenzo Stoakes, Alice Ryhl
Cc: Andreas Hindborg, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On 1/14/25 7:48 AM, Lorenzo Stoakes wrote:
> On Mon, Jan 13, 2025 at 10:53:33AM +0100, Alice Ryhl wrote:
>> On Mon, Dec 16, 2024 at 3:50 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>> "Alice Ryhl" <aliceryhl@google.com> writes:
...
>>>> +/// [`mmget_not_zero`]: Mm::mmget_not_zero
>>>> +#[repr(transparent)]
>>>> +pub struct Mm {
>>>
>>> Could we come up with a better name? `MemoryMap` or `MemoryMapping`? You
>>> use `MmapReadGuard` later.
>>
>> Those names seem really confusing to me. The mmap syscall creates a
>> new VMA, but MemoryMap sounds like it's the thing that mmap creates.
>>
>> Lorenzo, what do you think? I'm inclined to just call it Mm since
>> that's what C calls it.
>
> I think Mm is better just for alignment with the C stuff, I mean the alternative
> is MmStruct or something and... yuck.
For what it's worth, I think using the C naming here is a very good approach.
Because if you come up with a "good" name that is different than what C has
been calling it for 30+ years, then we have to be very thorough in associating
that new name with the C name. And it's hard.
And "mm struct" goes waaay back. Just use that name and everyone will know
what it means.
For less well-established areas, with fewer callers, there is much more
freedom to come up with new, better names.
>
> And like, here I am TOTALLY onboard with Andreas here, because this naming
> SUCKS. But it sucks on the C side too (we're experts at bad naming :). So for
> consistency, let's suck everywhere...
>
> Feel free to put a comment about this being a bad name if you like
> though... (not obligatory :)
For mm struct? Maybe let's not! Explanation without the criticism seems
more appropriate imho. :)
btw, I'm very excited to see all of this Rust for Linux progress, it is
wonderful! Thank you for this!
thanks,
--
John Hubbard
^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct
2025-01-15 1:54 ` John Hubbard
@ 2025-01-15 12:13 ` Lorenzo Stoakes
0 siblings, 0 replies; 65+ messages in thread
From: Lorenzo Stoakes @ 2025-01-15 12:13 UTC (permalink / raw)
To: John Hubbard
Cc: Alice Ryhl, Andreas Hindborg, Miguel Ojeda, Matthew Wilcox,
Vlastimil Babka, Liam R. Howlett, Andrew Morton,
Greg Kroah-Hartman, Arnd Bergmann, Christian Brauner, Jann Horn,
Suren Baghdasaryan, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Trevor Gross, linux-kernel,
linux-mm, rust-for-linux
On Tue, Jan 14, 2025 at 05:54:15PM -0800, John Hubbard wrote:
> On 1/14/25 7:48 AM, Lorenzo Stoakes wrote:
> > On Mon, Jan 13, 2025 at 10:53:33AM +0100, Alice Ryhl wrote:
> > > On Mon, Dec 16, 2024 at 3:50 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
> > > > "Alice Ryhl" <aliceryhl@google.com> writes:
> ...
> > > > > +/// [`mmget_not_zero`]: Mm::mmget_not_zero
> > > > > +#[repr(transparent)]
> > > > > +pub struct Mm {
> > > >
> > > > Could we come up with a better name? `MemoryMap` or `MemoryMapping`? You
> > > > use `MmapReadGuard` later.
> > >
> > > Those names seem really confusing to me. The mmap syscall creates a
> > > new VMA, but MemoryMap sounds like it's the thing that mmap creates.
> > >
> > > Lorenzo, what do you think? I'm inclined to just call it Mm since
> > > that's what C calls it.
> >
> > I think Mm is better just for alignment with the C stuff, I mean the alternative
> > is MmStruct or something and... yuck.
>
> For what it's worth, I think using the C naming here is a very good approach.
> Because if you come up with a "good" name that is different than what C has
> been calling it for 30+ years, then we have to be very thorough in associating
> that new name with the C name. And it's hard.
100% agree!
>
> And "mm struct" goes waaay back. Just use that name and everyone will know
> what it means.
>
> For less well-established areas, with fewer callers, there is much more
> freedom to come up with new, better names.
>
> >
> > And like, here I am TOTALLY onboard with Andreas here, because this naming
> > SUCKS. But it sucks on the C side too (we're experts at bad naming :). So for
> > consistency, let's suck everywhere...
> >
> > Feel free to put a comment about this being a bad name if you like
> > though... (not obligatory :)
>
> For mm struct? Maybe let's not! Explanation without the criticism seems
> more appropriate imho. :)
;) Well one could phrase this in a relatively benign way, for instance 'while
this name may seem a little unclear, historically it has been used as a
shorthand within the kernel since time immemorial' or such.
>
> btw, I'm very excited to see all of this Rust for Linux progress, it is
> wonderful! Thank you for this!
+1 to this sentiment, and I'm very happy to do my best to help get this
series in - from my perspective, I want the compiler to tell me when I
make mistakes nice and early :)
Thanks Alice, Andreas and all involved!
>
>
> thanks,
> --
> John Hubbard
>
* Re: [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct
2025-01-13 9:53 ` Alice Ryhl
2025-01-14 15:48 ` Lorenzo Stoakes
@ 2025-01-15 10:36 ` Andreas Hindborg
2025-01-15 20:20 ` John Hubbard
1 sibling, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2025-01-15 10:36 UTC (permalink / raw)
To: Alice Ryhl
Cc: Lorenzo Stoakes, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> On Mon, Dec 16, 2024 at 3:50 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>>
>> > These abstractions allow you to reference a `struct mm_struct` using
>> > both mmgrab and mmget refcounts. This is done using two Rust types:
>> >
>> > * Mm - represents an mm_struct where you don't know anything about the
>> > value of mm_users.
>> > * MmWithUser - represents an mm_struct where you know at compile time
>> > that mm_users is non-zero.
>> >
>> > This allows us to encode in the type system whether a method requires
>> > that mm_users is non-zero or not. For instance, you can always call
>> > `mmget_not_zero` but you can only call `mmap_read_lock` when mm_users is
>> > non-zero.
>> >
>> > It's possible to access current->mm without a refcount increment, but
>> > that is added in a later patch of this series.
>> >
>> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
>> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>> > ---
>> > rust/helpers/helpers.c | 1 +
>> > rust/helpers/mm.c | 39 +++++++++
>> > rust/kernel/lib.rs | 1 +
>> > rust/kernel/mm.rs | 219 +++++++++++++++++++++++++++++++++++++++++++++++++
>> > 4 files changed, 260 insertions(+)
>> >
>> > diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
>> > new file mode 100644
>> > index 000000000000..84cba581edaa
>> > --- /dev/null
>> > +++ b/rust/kernel/mm.rs
>> > @@ -0,0 +1,219 @@
>> > +// SPDX-License-Identifier: GPL-2.0
>> > +
>> > +// Copyright (C) 2024 Google LLC.
>> > +
>> > +//! Memory management.
>>
>> Could you add a little more context here?
>
> How about this?
>
> //! Memory management.
> //!
> //! This module deals with managing the address space of userspace
> processes. Each process has an
> //! instance of [`Mm`], which keeps track of multiple VMAs (virtual
> memory areas). Each VMA
> //! corresponds to a region of memory that the userspace process can
> access, and the VMA lets you
> //! control what happens when userspace reads or writes to that region
> of memory.
> //!
> //! C header: [`include/linux/mm.h`](srctree/include/linux/mm.h)
Nice 👍
>
>> > +//!
>> > +//! C header: [`include/linux/mm.h`](srctree/include/linux/mm.h)
>> > +
>> > +use crate::{
>> > + bindings,
>> > + types::{ARef, AlwaysRefCounted, NotThreadSafe, Opaque},
>> > +};
>> > +use core::{ops::Deref, ptr::NonNull};
>> > +
>> > +/// A wrapper for the kernel's `struct mm_struct`.
>>
>> Could you elaborate the data structure use case? When do I need it, what
>> does it do?
>
> How about this?
>
> /// A wrapper for the kernel's `struct mm_struct`.
> ///
> /// This represents the address space of a userspace process, so each
> process has one `Mm`
> /// instance. It may hold many VMAs internally.
> ///
> /// There is a counter called `mm_users` that counts the users of the
> address space; this includes
> /// the userspace process itself, but can also include kernel threads
> accessing the address space.
> /// Once `mm_users` reaches zero, this indicates that the address
> space can be destroyed. To access
> /// the address space, you must prevent `mm_users` from reaching zero
> while you are accessing it.
> /// The [`MmWithUser`] type represents an address space where this is
> guaranteed, and you can
> /// create one using [`mmget_not_zero`].
> ///
> /// The `ARef<Mm>` smart pointer holds an `mmgrab` refcount. Its
> destructor may sleep.
Cool 👍
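As a standalone illustration of the `mm_users` semantics described above (plain userspace Rust, not the kernel crate - the `Users` type and method names are hypothetical stand-ins), the increment-if-nonzero pattern behind `mmget_not_zero` can be sketched like this:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for the `mm_users` counter on an address space.
struct Users(AtomicUsize);

impl Users {
    // Mirrors `mmget_not_zero`: take a reference only while at least one
    // user still exists; refuse to resurrect a counter that hit zero.
    fn get_not_zero(&self) -> bool {
        let mut cur = self.0.load(Ordering::Relaxed);
        while cur != 0 {
            match self.0.compare_exchange_weak(
                cur,
                cur + 1,
                Ordering::Acquire,
                Ordering::Relaxed,
            ) {
                Ok(_) => return true,
                Err(actual) => cur = actual,
            }
        }
        false
    }

    // Mirrors `mmput`: drop one user reference.
    fn put(&self) {
        self.0.fetch_sub(1, Ordering::Release);
    }
}

fn main() {
    let users = Users(AtomicUsize::new(1));
    assert!(users.get_not_zero()); // 1 -> 2
    users.put();                   // 2 -> 1
    users.put();                   // 1 -> 0: address space may be torn down
    assert!(!users.get_not_zero()); // resurrection is refused
}
```

This is exactly why `MmWithUser` can exist as a separate type: once `get_not_zero` has succeeded, the counter is known non-zero for as long as the caller holds its reference.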
>
>> > +///
>> > +/// Since `mm_users` may be zero, the associated address space may not exist anymore. You can use
>> > +/// [`mmget_not_zero`] to be able to access the address space.
>> > +///
>> > +/// The `ARef<Mm>` smart pointer holds an `mmgrab` refcount. Its destructor may sleep.
>> > +///
>> > +/// # Invariants
>> > +///
>> > +/// Values of this type are always refcounted using `mmgrab`.
>> > +///
>> > +/// [`mmget_not_zero`]: Mm::mmget_not_zero
>> > +#[repr(transparent)]
>> > +pub struct Mm {
>>
>> Could we come up with a better name? `MemoryMap` or `MemoryMapping`?. You
>> use `MMapReadGuard` later.
>
> Those names seem really confusing to me. The mmap syscall creates a
> new VMA, but MemoryMap sounds like it's the thing that mmap creates.
>
> Lorenzo, what do you think? I'm inclined to just call it Mm since
> that's what C calls it.
Well I guess there is value in using same names as C. The additional
docs you sent help a lot so I guess it is fine.
If we were writing from scratch I would have held hard on `AddressSpace`
or `MemoryMap` over `Mm`. `Mm` has got to be one of the least
descriptive names we can come up with.
Best regards,
Andreas Hindborg
* Re: [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct
2025-01-15 10:36 ` Andreas Hindborg
@ 2025-01-15 20:20 ` John Hubbard
0 siblings, 0 replies; 65+ messages in thread
From: John Hubbard @ 2025-01-15 20:20 UTC (permalink / raw)
To: Andreas Hindborg, Alice Ryhl
Cc: Lorenzo Stoakes, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On 1/15/25 2:36 AM, Andreas Hindborg wrote:
> "Alice Ryhl" <aliceryhl@google.com> writes:
>> On Mon, Dec 16, 2024 at 3:50 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>> "Alice Ryhl" <aliceryhl@google.com> writes:
...
>>>> +/// [`mmget_not_zero`]: Mm::mmget_not_zero
>>>> +#[repr(transparent)]
>>>> +pub struct Mm {
>>>
>>> Could we come up with a better name? `MemoryMap` or `MemoryMapping`?. You
>>> use `MMapReadGuard` later.
>>
>> Those names seem really confusing to me. The mmap syscall creates a
>> new VMA, but MemoryMap sounds like it's the thing that mmap creates.
>>
>> Lorenzo, what do you think? I'm inclined to just call it Mm since
>> that's what C calls it.
>
> Well I guess there is value in using same names as C. The additional
> docs you sent help a lot so I guess it is fine.
Hi Andreas!
>
> If we were writing from scratch I would have held hard on `AddressSpace`
> or `MemoryMap` over `Mm`. `Mm` has got to be one of the least
> descriptive names we can come up with.
>
...but, see the other thread: "Mm" is actually very effective in the context
of kernel development. And we are doing a perfect mix of kernel and Rust
development here. So it's not from scratch at all.
Kernel engineers will immediately know what "Mm" means! Really.
thanks,
--
John Hubbard
* Re: [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct
2024-12-11 10:37 ` [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct Alice Ryhl
2024-12-16 11:31 ` Andreas Hindborg
@ 2025-01-17 0:45 ` Balbir Singh
2025-01-17 12:47 ` Alice Ryhl
1 sibling, 1 reply; 65+ messages in thread
From: Balbir Singh @ 2025-01-17 0:45 UTC (permalink / raw)
To: Alice Ryhl, Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes,
Vlastimil Babka, John Hubbard, Liam R. Howlett, Andrew Morton,
Greg Kroah-Hartman, Arnd Bergmann, Christian Brauner, Jann Horn,
Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, linux-kernel,
linux-mm, rust-for-linux
On 12/11/24 21:37, Alice Ryhl wrote:
> These abstractions allow you to reference a `struct mm_struct` using
> both mmgrab and mmget refcounts. This is done using two Rust types:
>
> * Mm - represents an mm_struct where you don't know anything about the
> value of mm_users.
> * MmWithUser - represents an mm_struct where you know at compile time
> that mm_users is non-zero.
>
> This allows us to encode in the type system whether a method requires
> that mm_users is non-zero or not. For instance, you can always call
> `mmget_not_zero` but you can only call `mmap_read_lock` when mm_users is
> non-zero.
>
> It's possible to access current->mm without a refcount increment, but
> that is added in a later patch of this series.
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
It might be good to add some #Examples similar to kernel/task.rs
Acked-by: Balbir Singh <balbirs@nvidia.com>
* Re: [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct
2025-01-17 0:45 ` Balbir Singh
@ 2025-01-17 12:47 ` Alice Ryhl
0 siblings, 0 replies; 65+ messages in thread
From: Alice Ryhl @ 2025-01-17 12:47 UTC (permalink / raw)
To: Balbir Singh
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, linux-kernel,
linux-mm, rust-for-linux
On Fri, Jan 17, 2025 at 1:45 AM Balbir Singh <balbirs@nvidia.com> wrote:
>
> On 12/11/24 21:37, Alice Ryhl wrote:
> > These abstractions allow you to reference a `struct mm_struct` using
> > both mmgrab and mmget refcounts. This is done using two Rust types:
> >
> > * Mm - represents an mm_struct where you don't know anything about the
> > value of mm_users.
> > * MmWithUser - represents an mm_struct where you know at compile time
> > that mm_users is non-zero.
> >
> > This allows us to encode in the type system whether a method requires
> > that mm_users is non-zero or not. For instance, you can always call
> > `mmget_not_zero` but you can only call `mmap_read_lock` when mm_users is
> > non-zero.
> >
> > It's possible to access current->mm without a refcount increment, but
> > that is added in a later patch of this series.
> >
> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > ---
>
> It might be good to add some #Examples similar to kernel/task.rs
You're probably right.
> Acked-by: Balbir Singh <balbirs@nvidia.com>
I'll pick this up for v13 since I already sent v12. Thanks!
Alice
* [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
2024-12-11 10:37 ` [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
2024-12-11 10:37 ` [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct Alice Ryhl
@ 2024-12-11 10:37 ` Alice Ryhl
2024-12-16 12:12 ` Andreas Hindborg
2024-12-11 10:37 ` [PATCH v11 3/8] mm: rust: add vm_insert_page Alice Ryhl
` (7 subsequent siblings)
9 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2024-12-11 10:37 UTC (permalink / raw)
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, linux-kernel,
linux-mm, rust-for-linux, Alice Ryhl
This adds a type called VmAreaRef which is used when referencing a vma
that you have read access to. Here, read access means that you hold
either the mmap read lock or the vma read lock (or stronger).
Additionally, a vma_lookup method is added to the mmap read guard, which
enables you to obtain a &VmAreaRef in safe Rust code.
This patch only provides a way to lock the mmap read lock, but a
follow-up patch also provides a way to just lock the vma read lock.
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/helpers/mm.c | 6 ++
rust/kernel/mm.rs | 21 ++++++
rust/kernel/mm/virt.rs | 191 +++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 218 insertions(+)
diff --git a/rust/helpers/mm.c b/rust/helpers/mm.c
index 7201747a5d31..7b72eb065a3e 100644
--- a/rust/helpers/mm.c
+++ b/rust/helpers/mm.c
@@ -37,3 +37,9 @@ void rust_helper_mmap_read_unlock(struct mm_struct *mm)
{
mmap_read_unlock(mm);
}
+
+struct vm_area_struct *rust_helper_vma_lookup(struct mm_struct *mm,
+ unsigned long addr)
+{
+ return vma_lookup(mm, addr);
+}
diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
index 84cba581edaa..ace8e7d57afe 100644
--- a/rust/kernel/mm.rs
+++ b/rust/kernel/mm.rs
@@ -12,6 +12,8 @@
};
use core::{ops::Deref, ptr::NonNull};
+pub mod virt;
+
/// A wrapper for the kernel's `struct mm_struct`.
///
/// Since `mm_users` may be zero, the associated address space may not exist anymore. You can use
@@ -210,6 +212,25 @@ pub struct MmapReadGuard<'a> {
_nts: NotThreadSafe,
}
+impl<'a> MmapReadGuard<'a> {
+ /// Look up a vma at the given address.
+ #[inline]
+ pub fn vma_lookup(&self, vma_addr: usize) -> Option<&virt::VmAreaRef> {
+ // SAFETY: We hold a reference to the mm, so the pointer must be valid. Any value is okay
+ // for `vma_addr`.
+ let vma = unsafe { bindings::vma_lookup(self.mm.as_raw(), vma_addr as _) };
+
+ if vma.is_null() {
+ None
+ } else {
+ // SAFETY: We just checked that a vma was found, so the pointer is valid. Furthermore,
+ // the returned area will borrow from this read lock guard, so it can only be used
+ // while the mmap read lock is still held.
+ unsafe { Some(virt::VmAreaRef::from_raw(vma)) }
+ }
+ }
+}
+
impl Drop for MmapReadGuard<'_> {
#[inline]
fn drop(&mut self) {
diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
new file mode 100644
index 000000000000..68c763169cf0
--- /dev/null
+++ b/rust/kernel/mm/virt.rs
@@ -0,0 +1,191 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (C) 2024 Google LLC.
+
+//! Virtual memory.
+
+use crate::{bindings, mm::MmWithUser, types::Opaque};
+
+/// A wrapper for the kernel's `struct vm_area_struct` with read access.
+///
+/// It represents an area of virtual memory.
+///
+/// # Invariants
+///
+/// The caller must hold the mmap read lock or the vma read lock.
+#[repr(transparent)]
+pub struct VmAreaRef {
+ vma: Opaque<bindings::vm_area_struct>,
+}
+
+// Methods you can call when holding the mmap or vma read lock (or strong). They must be usable no
+// matter what the vma flags are.
+impl VmAreaRef {
+ /// Access a virtual memory area given a raw pointer.
+ ///
+ /// # Safety
+ ///
+ /// Callers must ensure that `vma` is valid for the duration of 'a, and that the mmap or vma
+ /// read lock (or stronger) is held for at least the duration of 'a.
+ #[inline]
+ pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
+ // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
+ unsafe { &*vma.cast() }
+ }
+
+ /// Returns a raw pointer to this area.
+ #[inline]
+ pub fn as_ptr(&self) -> *mut bindings::vm_area_struct {
+ self.vma.get()
+ }
+
+ /// Access the underlying `mm_struct`.
+ #[inline]
+ pub fn mm(&self) -> &MmWithUser {
+ // SAFETY: By the type invariants, this `vm_area_struct` is valid and we hold the mmap/vma
+ // read lock or stronger. This implies that the underlying mm has a non-zero value of
+ // `mm_users`.
+ unsafe { MmWithUser::from_raw((*self.as_ptr()).vm_mm) }
+ }
+
+ /// Returns the flags associated with the virtual memory area.
+ ///
+ /// The possible flags are a combination of the constants in [`flags`].
+ #[inline]
+ pub fn flags(&self) -> vm_flags_t {
+ // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
+ // access is not a data race.
+ unsafe { (*self.as_ptr()).__bindgen_anon_2.vm_flags as _ }
+ }
+
+ /// Returns the (inclusive) start address of the virtual memory area.
+ #[inline]
+ pub fn start(&self) -> usize {
+ // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
+ // access is not a data race.
+ unsafe { (*self.as_ptr()).__bindgen_anon_1.__bindgen_anon_1.vm_start as _ }
+ }
+
+ /// Returns the (exclusive) end address of the virtual memory area.
+ #[inline]
+ pub fn end(&self) -> usize {
+ // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
+ // access is not a data race.
+ unsafe { (*self.as_ptr()).__bindgen_anon_1.__bindgen_anon_1.vm_end as _ }
+ }
+
+ /// Zap pages in the given page range.
+ ///
+ /// This clears page table mappings for the range at the leaf level, leaving all other page
+ /// tables intact, and freeing any memory referenced by the VMA in this range. That is,
+ /// anonymous memory is completely freed, file-backed memory has its reference count on page
+ /// cache folios dropped, and any dirty data will still be written back to disk as usual.
+ #[inline]
+ pub fn zap_page_range_single(&self, address: usize, size: usize) {
+ let (end, did_overflow) = address.overflowing_add(size);
+ if did_overflow || address < self.start() || self.end() < end {
+ // TODO: call WARN_ONCE once Rust version of it is added
+ return;
+ }
+
+ // SAFETY: By the type invariants, the caller has read access to this VMA, which is
+ // sufficient for this method call. This method has no requirements on the vma flags. The
+ // address range is checked to be within the vma.
+ unsafe {
+ bindings::zap_page_range_single(
+ self.as_ptr(),
+ address as _,
+ size as _,
+ core::ptr::null_mut(),
+ )
+ };
+ }
+}
+
+/// The integer type used for vma flags.
+#[doc(inline)]
+pub use bindings::vm_flags_t;
+
+/// All possible flags for [`VmAreaRef`].
+pub mod flags {
+ use super::vm_flags_t;
+ use crate::bindings;
+
+ /// No flags are set.
+ pub const NONE: vm_flags_t = bindings::VM_NONE as _;
+
+ /// Mapping allows reads.
+ pub const READ: vm_flags_t = bindings::VM_READ as _;
+
+ /// Mapping allows writes.
+ pub const WRITE: vm_flags_t = bindings::VM_WRITE as _;
+
+ /// Mapping allows execution.
+ pub const EXEC: vm_flags_t = bindings::VM_EXEC as _;
+
+ /// Mapping is shared.
+ pub const SHARED: vm_flags_t = bindings::VM_SHARED as _;
+
+ /// Mapping may be updated to allow reads.
+ pub const MAYREAD: vm_flags_t = bindings::VM_MAYREAD as _;
+
+ /// Mapping may be updated to allow writes.
+ pub const MAYWRITE: vm_flags_t = bindings::VM_MAYWRITE as _;
+
+ /// Mapping may be updated to allow execution.
+ pub const MAYEXEC: vm_flags_t = bindings::VM_MAYEXEC as _;
+
+ /// Mapping may be updated to be shared.
+ pub const MAYSHARE: vm_flags_t = bindings::VM_MAYSHARE as _;
+
+ /// Page-ranges managed without `struct page`, just pure PFN.
+ pub const PFNMAP: vm_flags_t = bindings::VM_PFNMAP as _;
+
+ /// Memory mapped I/O or similar.
+ pub const IO: vm_flags_t = bindings::VM_IO as _;
+
+ /// Do not copy this vma on fork.
+ pub const DONTCOPY: vm_flags_t = bindings::VM_DONTCOPY as _;
+
+ /// Cannot expand with mremap().
+ pub const DONTEXPAND: vm_flags_t = bindings::VM_DONTEXPAND as _;
+
+ /// Lock the pages covered when they are faulted in.
+ pub const LOCKONFAULT: vm_flags_t = bindings::VM_LOCKONFAULT as _;
+
+ /// Is a VM accounted object.
+ pub const ACCOUNT: vm_flags_t = bindings::VM_ACCOUNT as _;
+
+ /// Should the VM suppress accounting.
+ pub const NORESERVE: vm_flags_t = bindings::VM_NORESERVE as _;
+
+ /// Huge TLB Page VM.
+ pub const HUGETLB: vm_flags_t = bindings::VM_HUGETLB as _;
+
+ /// Synchronous page faults. (DAX-specific)
+ pub const SYNC: vm_flags_t = bindings::VM_SYNC as _;
+
+ /// Architecture-specific flag.
+ pub const ARCH_1: vm_flags_t = bindings::VM_ARCH_1 as _;
+
+ /// Wipe VMA contents in child on fork.
+ pub const WIPEONFORK: vm_flags_t = bindings::VM_WIPEONFORK as _;
+
+ /// Do not include in the core dump.
+ pub const DONTDUMP: vm_flags_t = bindings::VM_DONTDUMP as _;
+
+ /// Not soft dirty clean area.
+ pub const SOFTDIRTY: vm_flags_t = bindings::VM_SOFTDIRTY as _;
+
+ /// Can contain `struct page` and pure PFN pages.
+ pub const MIXEDMAP: vm_flags_t = bindings::VM_MIXEDMAP as _;
+
+ /// MADV_HUGEPAGE marked this vma.
+ pub const HUGEPAGE: vm_flags_t = bindings::VM_HUGEPAGE as _;
+
+ /// MADV_NOHUGEPAGE marked this vma.
+ pub const NOHUGEPAGE: vm_flags_t = bindings::VM_NOHUGEPAGE as _;
+
+ /// KSM may merge identical pages.
+ pub const MERGEABLE: vm_flags_t = bindings::VM_MERGEABLE as _;
+}
--
2.47.1.613.gc27f4b7a9f-goog
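The two core idioms in this patch - the `#[repr(transparent)]` wrapper over an opaque C struct, and a lookup method whose result borrows from the lock guard - can be sketched in plain userspace Rust (the struct contents and guard here are simplified stand-ins, not the real kernel layout):

```rust
use std::marker::PhantomData;

#[allow(non_camel_case_types)]
#[repr(C)]
struct vm_area_struct {
    vm_start: usize,
    vm_end: usize,
}

// Transparent wrapper: identical layout to the wrapped struct, so a
// `*const vm_area_struct` can be reinterpreted as `&VmAreaRef`.
#[repr(transparent)]
struct VmAreaRef {
    vma: vm_area_struct,
}

impl VmAreaRef {
    // Caller promises the pointee is valid (and "locked") for 'a.
    unsafe fn from_raw<'a>(vma: *const vm_area_struct) -> &'a Self {
        &*vma.cast()
    }
    fn start(&self) -> usize {
        self.vma.vm_start
    }
    fn end(&self) -> usize {
        self.vma.vm_end
    }
}

// Stand-in for MmapReadGuard: lookups borrow from the guard, so the
// returned &VmAreaRef cannot outlive the "read lock".
struct MmapReadGuard<'a> {
    area: &'a vm_area_struct,
    _nts: PhantomData<*mut ()>, // not Send/Sync, like NotThreadSafe
}

impl<'a> MmapReadGuard<'a> {
    fn vma_lookup(&self, addr: usize) -> Option<&VmAreaRef> {
        if self.area.vm_start <= addr && addr < self.area.vm_end {
            // SAFETY: the area outlives the borrow of this guard.
            Some(unsafe { VmAreaRef::from_raw(self.area) })
        } else {
            None
        }
    }
}

fn main() {
    let raw = vm_area_struct { vm_start: 0x1000, vm_end: 0x2000 };
    let guard = MmapReadGuard { area: &raw, _nts: PhantomData };
    let vma = guard.vma_lookup(0x1800).expect("address is mapped");
    assert_eq!(vma.start(), 0x1000);
    assert_eq!(vma.end(), 0x2000);
    assert!(guard.vma_lookup(0x3000).is_none());
}
```

Because `vma_lookup` ties its result to `&self`, the borrow checker enforces in safe code what the C API only documents: the vma reference is unusable once the guard (and with it the lock) is gone.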
* Re: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
2024-12-11 10:37 ` [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access Alice Ryhl
@ 2024-12-16 12:12 ` Andreas Hindborg
2025-01-08 12:21 ` Alice Ryhl
0 siblings, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2024-12-16 12:12 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin,
Trevor Gross, linux-kernel, linux-mm, rust-for-linux
Hi Alice,
In general, can we avoid the `as _` casts? If not, could you elaborate
why they are the right choice here, rather than `try_into`?
Other comments inline below.
"Alice Ryhl" <aliceryhl@google.com> writes:
> This adds a type called VmAreaRef which is used when referencing a vma
> that you have read access to. Here, read access means that you hold
> either the mmap read lock or the vma read lock (or stronger).
>
> Additionally, a vma_lookup method is added to the mmap read guard, which
> enables you to obtain a &VmAreaRef in safe Rust code.
>
> This patch only provides a way to lock the mmap read lock, but a
> follow-up patch also provides a way to just lock the vma read lock.
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> Reviewed-by: Jann Horn <jannh@google.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/helpers/mm.c | 6 ++
> rust/kernel/mm.rs | 21 ++++++
> rust/kernel/mm/virt.rs | 191 +++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 218 insertions(+)
>
[cut]
> diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
> new file mode 100644
> index 000000000000..68c763169cf0
> --- /dev/null
> +++ b/rust/kernel/mm/virt.rs
> @@ -0,0 +1,191 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +// Copyright (C) 2024 Google LLC.
> +
> +//! Virtual memory.
Could you add a bit more context here?
> +
> +use crate::{bindings, mm::MmWithUser, types::Opaque};
> +
> +/// A wrapper for the kernel's `struct vm_area_struct` with read access.
> +///
> +/// It represents an area of virtual memory.
> +///
> +/// # Invariants
> +///
> +/// The caller must hold the mmap read lock or the vma read lock.
> +#[repr(transparent)]
> +pub struct VmAreaRef {
> + vma: Opaque<bindings::vm_area_struct>,
> +}
> +
> +// Methods you can call when holding the mmap or vma read lock (or
> strong). They must be usable no
typo "strong".
> +// matter what the vma flags are.
> +impl VmAreaRef {
> + /// Access a virtual memory area given a raw pointer.
> + ///
> + /// # Safety
> + ///
> + /// Callers must ensure that `vma` is valid for the duration of 'a, and that the mmap or vma
> + /// read lock (or stronger) is held for at least the duration of 'a.
> + #[inline]
> + pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
> + // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
> + unsafe { &*vma.cast() }
> + }
> +
> + /// Returns a raw pointer to this area.
> + #[inline]
> + pub fn as_ptr(&self) -> *mut bindings::vm_area_struct {
> + self.vma.get()
> + }
> +
> + /// Access the underlying `mm_struct`.
> + #[inline]
> + pub fn mm(&self) -> &MmWithUser {
> + // SAFETY: By the type invariants, this `vm_area_struct` is valid and we hold the mmap/vma
> + // read lock or stronger. This implies that the underlying mm has a non-zero value of
> + // `mm_users`.
> + unsafe { MmWithUser::from_raw((*self.as_ptr()).vm_mm) }
> + }
> +
> + /// Returns the flags associated with the virtual memory area.
> + ///
> + /// The possible flags are a combination of the constants in [`flags`].
> + #[inline]
> + pub fn flags(&self) -> vm_flags_t {
> + // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
> + // access is not a data race.
> + unsafe { (*self.as_ptr()).__bindgen_anon_2.vm_flags as _ }
> + }
> +
> + /// Returns the (inclusive) start address of the virtual memory area.
> + #[inline]
> + pub fn start(&self) -> usize {
> + // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
> + // access is not a data race.
> + unsafe { (*self.as_ptr()).__bindgen_anon_1.__bindgen_anon_1.vm_start as _ }
> + }
> +
> + /// Returns the (exclusive) end address of the virtual memory area.
> + #[inline]
> + pub fn end(&self) -> usize {
> + // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
> + // access is not a data race.
> + unsafe { (*self.as_ptr()).__bindgen_anon_1.__bindgen_anon_1.vm_end as _ }
> + }
> +
> + /// Zap pages in the given page range.
> + ///
> + /// This clears page table mappings for the range at the leaf level, leaving all other page
> + /// tables intact,
I don't fully understand this docstring. Is it correct that the function
will unmap the address range given by `start` and `size`, _and_ free the
pages used to hold the mappings at the leaf level of the page table?
> and freeing any memory referenced by the VMA in this range. That is,
> + /// anonymous memory is completely freed, file-backed memory has its reference count on page
> + /// cache folio's dropped, any dirty data will still be written back to disk as usual.
> + #[inline]
> + pub fn zap_page_range_single(&self, address: usize, size: usize) {
Best regards,
Andreas Hindborg
2024-12-16 12:12 ` Andreas Hindborg
@ 2025-01-08 12:21 ` Alice Ryhl
2025-01-09 8:02 ` Andreas Hindborg
0 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2025-01-08 12:21 UTC (permalink / raw)
To: Andreas Hindborg
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
>
> Hi Alice,
>
> In general, can we avoid the `as _` casts? If not, could you elaborate
> why they are the right choice here, rather than `try_into`?
They're not fallible and will go away once we merge the patch that
makes integer types match better.
> Other comments inline below.
>
> "Alice Ryhl" <aliceryhl@google.com> writes:
>
> > This adds a type called VmAreaRef which is used when referencing a vma
> > that you have read access to. Here, read access means that you hold
> > either the mmap read lock or the vma read lock (or stronger).
> >
> > Additionally, a vma_lookup method is added to the mmap read guard, which
> > enables you to obtain a &VmAreaRef in safe Rust code.
> >
> > This patch only provides a way to lock the mmap read lock, but a
> > follow-up patch also provides a way to just lock the vma read lock.
> >
> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > Reviewed-by: Jann Horn <jannh@google.com>
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > ---
> > rust/helpers/mm.c | 6 ++
> > rust/kernel/mm.rs | 21 ++++++
> > rust/kernel/mm/virt.rs | 191 +++++++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 218 insertions(+)
> >
>
> [cut]
>
> > diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
> > new file mode 100644
> > index 000000000000..68c763169cf0
> > --- /dev/null
> > +++ b/rust/kernel/mm/virt.rs
> > @@ -0,0 +1,191 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +// Copyright (C) 2024 Google LLC.
> > +
> > +//! Virtual memory.
>
> Could you add a bit more context here?
>
> > +
> > +use crate::{bindings, mm::MmWithUser, types::Opaque};
> > +
> > +/// A wrapper for the kernel's `struct vm_area_struct` with read access.
> > +///
> > +/// It represents an area of virtual memory.
> > +///
> > +/// # Invariants
> > +///
> > +/// The caller must hold the mmap read lock or the vma read lock.
> > +#[repr(transparent)]
> > +pub struct VmAreaRef {
> > + vma: Opaque<bindings::vm_area_struct>,
> > +}
> > +
> > +// Methods you can call when holding the mmap or vma read lock (or
> > strong). They must be usable no
>
> typo "strong".
>
> > +// matter what the vma flags are.
> > +impl VmAreaRef {
> > + /// Access a virtual memory area given a raw pointer.
> > + ///
> > + /// # Safety
> > + ///
> > + /// Callers must ensure that `vma` is valid for the duration of 'a, and that the mmap or vma
> > + /// read lock (or stronger) is held for at least the duration of 'a.
> > + #[inline]
> > + pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
> > + // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
> > + unsafe { &*vma.cast() }
> > + }
> > +
> > + /// Returns a raw pointer to this area.
> > + #[inline]
> > + pub fn as_ptr(&self) -> *mut bindings::vm_area_struct {
> > + self.vma.get()
> > + }
> > +
> > + /// Access the underlying `mm_struct`.
> > + #[inline]
> > + pub fn mm(&self) -> &MmWithUser {
> > + // SAFETY: By the type invariants, this `vm_area_struct` is valid and we hold the mmap/vma
> > + // read lock or stronger. This implies that the underlying mm has a non-zero value of
> > + // `mm_users`.
> > + unsafe { MmWithUser::from_raw((*self.as_ptr()).vm_mm) }
> > + }
> > +
> > + /// Returns the flags associated with the virtual memory area.
> > + ///
> > + /// The possible flags are a combination of the constants in [`flags`].
> > + #[inline]
> > + pub fn flags(&self) -> vm_flags_t {
> > + // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
> > + // access is not a data race.
> > + unsafe { (*self.as_ptr()).__bindgen_anon_2.vm_flags as _ }
> > + }
> > +
> > + /// Returns the (inclusive) start address of the virtual memory area.
> > + #[inline]
> > + pub fn start(&self) -> usize {
> > + // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
> > + // access is not a data race.
> > + unsafe { (*self.as_ptr()).__bindgen_anon_1.__bindgen_anon_1.vm_start as _ }
> > + }
> > +
> > + /// Returns the (exclusive) end address of the virtual memory area.
> > + #[inline]
> > + pub fn end(&self) -> usize {
> > + // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
> > + // access is not a data race.
> > + unsafe { (*self.as_ptr()).__bindgen_anon_1.__bindgen_anon_1.vm_end as _ }
> > + }
> > +
> > + /// Zap pages in the given page range.
> > + ///
> > + /// This clears page table mappings for the range at the leaf level, leaving all other page
> > + /// tables intact,
>
> I don't fully understand this docstring. Is it correct that the function
> will unmap the address range given by `start` and `size`, _and_ free the
> pages used to hold the mappings at the leaf level of the page table?
If the vma owns a refcount on those pages, then the refcounts are dropped.
> > and freeing any memory referenced by the VMA in this range. That is,
> > + /// anonymous memory is completely freed, file-backed memory has its reference count on page
> > + /// cache folio's dropped, any dirty data will still be written back to disk as usual.
> > + #[inline]
> > + pub fn zap_page_range_single(&self, address: usize, size: usize) {
>
>
> Best regards,
> Andreas Hindborg
>
>
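As a standalone illustration of the borrowed-wrapper pattern `VmAreaRef` uses (this is a toy sketch, not kernel code: the `vm_area_struct`, `Opaque`, and lock here are stand-ins, and no real lock exists in the model), the idea is that merely holding a `&'a VmAreaRef` encodes the invariant "the read lock is held for at least `'a`":

```rust
use core::cell::UnsafeCell;

// Stand-in for a C struct exposed through bindings; fields are illustrative.
#[allow(non_camel_case_types, dead_code)]
#[repr(C)]
struct vm_area_struct {
    vm_start: usize,
    vm_end: usize,
}

// Mirrors the kernel's `Opaque<T>`: interior-mutable storage for C data.
#[repr(transparent)]
struct Opaque<T>(UnsafeCell<T>);

/// Borrowed wrapper: a `&'a VmAreaRef` existing at all encodes the type
/// invariant "the mmap or vma read lock is held for at least `'a`".
#[repr(transparent)]
struct VmAreaRef {
    vma: Opaque<vm_area_struct>,
}

impl VmAreaRef {
    /// # Safety
    /// `vma` must be valid and the read lock held for the duration of `'a`.
    unsafe fn from_raw<'a>(vma: *const vm_area_struct) -> &'a Self {
        // SAFETY: `repr(transparent)` makes the cast layout-compatible;
        // the caller upholds the invariants.
        unsafe { &*vma.cast() }
    }

    fn start(&self) -> usize {
        // SAFETY: by the type invariant the lock is held, so this read is
        // not a data race (in this toy model there is no lock at all).
        unsafe { (*self.vma.0.get()).vm_start }
    }
}

fn demo() -> usize {
    let raw = vm_area_struct { vm_start: 0x1000, vm_end: 0x2000 };
    // Pretend the read lock is held for this scope.
    let vma = unsafe { VmAreaRef::from_raw(&raw) };
    vma.start()
}
```

The zero-sized-cost cast works because every layer is `repr(transparent)`, so `*const vm_area_struct` and `*const VmAreaRef` share a layout.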
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
2025-01-08 12:21 ` Alice Ryhl
@ 2025-01-09 8:02 ` Andreas Hindborg
2025-01-09 8:19 ` Lorenzo Stoakes
0 siblings, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2025-01-09 8:02 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>>
>> > +
>> > + /// Zap pages in the given page range.
>> > + ///
>> > + /// This clears page table mappings for the range at the leaf level, leaving all other page
>> > + /// tables intact,
>>
>> I don't fully understand this docstring. Is it correct that the function
>> will unmap the address range given by `start` and `size`, _and_ free the
>> pages used to hold the mappings at the leaf level of the page table?
>
> If the vma owns a refcount on those pages, then the refcounts are dropped.
Maybe drop the "at the leaf level leaving all other page tables intact".
It confuses me, since when would this not be the case?
How about this:
This clears the virtual memory map for the range given by `start` and
`size`, dropping refcounts to memory held by the mappings in this range. That
is, anonymous memory is completely freed, file-backed memory has its
reference count on page cache folio's dropped, any dirty data will still
be written back to disk as usual.
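The refcount behaviour both wordings are trying to capture can be sketched as a standalone toy model (illustrative Rust only; `get`/`put` and the pfn-keyed map are stand-ins, not kernel APIs): an anonymous page whose only reference is the mapping is freed outright when zapped, while a file-backed page survives because the page cache still holds a reference.

```rust
use std::collections::HashMap;

// Toy page refcounting; keys are page frame numbers.
#[derive(Default)]
struct Pages {
    refcount: HashMap<u64, u32>,
}

impl Pages {
    /// Take a reference on a page, allocating it on first use.
    fn get(&mut self, pfn: u64) {
        *self.refcount.entry(pfn).or_insert(0) += 1;
    }
    /// Drop one reference; free the page when the count hits zero.
    fn put(&mut self, pfn: u64) {
        let n = self.refcount.get_mut(&pfn).expect("page not allocated");
        *n -= 1;
        if *n == 0 {
            self.refcount.remove(&pfn); // page freed
        }
    }
    fn is_allocated(&self, pfn: u64) -> bool {
        self.refcount.contains_key(&pfn)
    }
}

fn demo() -> (bool, bool) {
    let mut pages = Pages::default();

    // Anonymous page: the mapping holds the only reference.
    pages.get(1);
    // File-backed page: the page cache holds a reference too.
    pages.get(2); // mapping
    pages.get(2); // page cache

    // "Zap" both mappings: each mapping's reference is dropped.
    pages.put(1);
    pages.put(2);

    // The anonymous page is freed outright; the file-backed page stays
    // alive in the page cache (and dirty data can still be written back).
    (pages.is_allocated(1), pages.is_allocated(2))
}
```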
>
>> > and freeing any memory referenced by the VMA in this range. That is,
>> > + /// anonymous memory is completely freed, file-backed memory has its reference count on page
>> > + /// cache folio's dropped, any dirty data will still be written back to disk as usual.
>> > + #[inline]
>> > + pub fn zap_page_range_single(&self, address: usize, size: usize) {
Best regards,
Andreas Hindborg
* Re: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
2025-01-09 8:02 ` Andreas Hindborg
@ 2025-01-09 8:19 ` Lorenzo Stoakes
2025-01-09 9:50 ` Andreas Hindborg
0 siblings, 1 reply; 65+ messages in thread
From: Lorenzo Stoakes @ 2025-01-09 8:19 UTC (permalink / raw)
To: Andreas Hindborg
Cc: Alice Ryhl, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Thu, Jan 09, 2025 at 09:02:11AM +0100, Andreas Hindborg wrote:
> "Alice Ryhl" <aliceryhl@google.com> writes:
>
> > On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
> >>
> >>
> >> > +
> >> > + /// Zap pages in the given page range.
> >> > + ///
> >> > + /// This clears page table mappings for the range at the leaf level, leaving all other page
> >> > + /// tables intact,
> >>
> >> I don't fully understand this docstring. Is it correct that the function
> >> will unmap the address range given by `start` and `size`, _and_ free the
> >> pages used to hold the mappings at the leaf level of the page table?
> >
> > If the vma owns a refcount on those pages, then the refcounts are dropped.
>
> Maybe drop the "at the leaf level leaving all other page tables intact".
> It confuses me, since when would this not be the case?
I don't understand your objection. The whole nature of a zap is to traverse
leaf level page table mappings, clearing the entries, leaving the other
page table entries intact.
That is, precisely what is written here. In fact I think this
characterisation is derived from discussions had with us in mm, and it is
one with which I am happy.
Why is it problematic to accurately describe what this does?
For a series at v11 where there is broad agreement with maintainers within
the subsystem which it wraps, perhaps the priority should be to try to have
the series merged unless there is significant technical objection from the
rust side?
>
> How about this:
>
> This clears the virtual memory map for the range given by `start` and
> `size`, dropping refcounts to memory held by the mappings in this range. That
> is, anonymous memory is completely freed, file-backed memory has its
> reference count on page cache folio's dropped, any dirty data will still
> be written back to disk as usual.
Sorry I object to this, 'clears the virtual memory map' is really
vague. What is already there is better.
>
> >
> >> > and freeing any memory referenced by the VMA in this range. That is,
> >> > + /// anonymous memory is completely freed, file-backed memory has its reference count on page
> >> > + /// cache folio's dropped, any dirty data will still be written back to disk as usual.
> >> > + #[inline]
> >> > + pub fn zap_page_range_single(&self, address: usize, size: usize) {
>
>
> Best regards,
> Andreas Hindborg
>
>
Let's please get this series merged. I think Alice has demonstrated
remarkable patience already, and modulo significant technical pushback on
the rust side (on which I defer entirely to the expertise of rust people),
I want to see this go in.
* Re: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
2025-01-09 8:19 ` Lorenzo Stoakes
@ 2025-01-09 9:50 ` Andreas Hindborg
2025-01-09 11:29 ` Lorenzo Stoakes
0 siblings, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2025-01-09 9:50 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Alice Ryhl, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com> writes:
> On Thu, Jan 09, 2025 at 09:02:11AM +0100, Andreas Hindborg wrote:
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>>
>> > On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>> >>
>> >>
>> >> > +
>> >> > + /// Zap pages in the given page range.
>> >> > + ///
>> >> > + /// This clears page table mappings for the range at the leaf level, leaving all other page
>> >> > + /// tables intact,
>> >>
>> >> I don't fully understand this docstring. Is it correct that the function
>> >> will unmap the address range given by `start` and `size`, _and_ free the
>> >> pages used to hold the mappings at the leaf level of the page table?
>> >
>> > If the vma owns a refcount on those pages, then the refcounts are dropped.
>>
>> Maybe drop the "at the leaf level leaving all other page tables intact".
>> It confuses me, since when would this not be the case?
>
> I don't understand your objection. The whole nature of a zap is to traverse
> leaf level page table mappings, clearing the entries, leaving the other
> page table entries intact.
As someone not deeply familiar with this function and its use, I became
uncertain of my understanding when I read this sentence. As I asked
above: When would you not clear mappings at the leaf level and leave all
other mappings alone?
Imagine you have a collection structure backed by a tree and the
`remove_item` function has the sentence "remove item at the leaf level
but leave all other items in the collection alone". That would be over
specifying. It is enough information in the user facing documentation
that the item is removed. You don't need to state that a remove
operation on an item does not remove other items. Does this example
transfer to this function, or am I missing something?
> That is, precisely what is written here. In fact I think this
> characterisation is derived from discussions had with us in mm, and it is
> one with which I am happy.
>
> Why is it problematic to accurately describe what this does?
Again, it might be that I don't properly understand what the function
actually does, but if it is just removing the entries described by the
range - write that. Don't add irrelevant details or specify what the
function does not do. It slows down the user when reading documentation.
>
> For a series at v11 where there is broad agreement with maintainers within
> the subsystem which it wraps, perhaps the priority should be to try to have
> the series merged unless there is significant technical objection from the
> rust side?
>
>>
>> How about this:
>>
>> This clears the virtual memory map for the range given by `start` and
>> `size`, dropping refcounts to memory held by the mappings in this range. That
>> is, anonymous memory is completely freed, file-backed memory has its
>> reference count on page cache folio's dropped, any dirty data will still
>> be written back to disk as usual.
>
> Sorry I object to this, 'clears the virtual memory map' is really
> vague. What is already there is better.
Would you like the proposed paragraph if we replaced "virtual memory
map" with "page table mappings", or do you object to the entirety of the
new suggestion?
>
>>
>> >
>> >> > and freeing any memory referenced by the VMA in this range. That is,
>> >> > + /// anonymous memory is completely freed, file-backed memory has its reference count on page
>> >> > + /// cache folio's dropped, any dirty data will still be written back to disk as usual.
>> >> > + #[inline]
>> >> > + pub fn zap_page_range_single(&self, address: usize, size: usize) {
>>
>>
>> Best regards,
>> Andreas Hindborg
>>
>>
>
> Let's please get this series merged. I think Alice has demonstrated
> remarkable patience already, and modulo significant technical pushback on
> the rust side (on which I defer entirely to the expertise of rust people),
> I want to see this go in.
I am sensing that you don't feel my comments are relevant at the current
stage of this series (v11). Alice asked for reviews of the series. These are my
comments. Feel free do whatever you want with them.
Best regards,
Andreas Hindborg
* Re: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
2025-01-09 9:50 ` Andreas Hindborg
@ 2025-01-09 11:29 ` Lorenzo Stoakes
2025-01-09 15:32 ` Andreas Hindborg
0 siblings, 1 reply; 65+ messages in thread
From: Lorenzo Stoakes @ 2025-01-09 11:29 UTC (permalink / raw)
To: Andreas Hindborg
Cc: Alice Ryhl, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Thu, Jan 09, 2025 at 10:50:13AM +0100, Andreas Hindborg wrote:
> "Lorenzo Stoakes" <lorenzo.stoakes@oracle.com> writes:
>
> > On Thu, Jan 09, 2025 at 09:02:11AM +0100, Andreas Hindborg wrote:
> >> "Alice Ryhl" <aliceryhl@google.com> writes:
> >>
> >> > On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
> >> >>
> >> >>
> >> >> > +
> >> >> > + /// Zap pages in the given page range.
> >> >> > + ///
> >> >> > + /// This clears page table mappings for the range at the leaf level, leaving all other page
> >> >> > + /// tables intact,
> >> >>
> >> >> I don't fully understand this docstring. Is it correct that the function
> >> >> will unmap the address range given by `start` and `size`, _and_ free the
> >> >> pages used to hold the mappings at the leaf level of the page table?
> >> >
> >> > If the vma owns a refcount on those pages, then the refcounts are dropped.
> >>
> >> Maybe drop the "at the leaf level leaving all other page tables intact".
> >> It confuses me, since when would this not be the case?
> >
> > I don't understand your objection. The whole nature of a zap is to traverse
> > leaf level page table mappings, clearing the entries, leaving the other
> > page table entries intact.
>
> As someone not deeply familiar with this function and its use, I became
> uncertain of my understanding when I read this sentence. As I asked
> above: When would you not clear mappings at the leaf level and leave all
> other mappings alone?
Because these are page tables and page tables can span multiple PTE
tables. Correctly removing at the time of clearing would be expensive and
require very careful handling.
>
> Imagine you have a collection structure backed by a tree and the
> `remove_item` function has the sentence "remove item at the leaf level
> but leave all other items in the collection alone". That would be over
> specifying. It is enough information in the user facing documentation
> that the item is removed. You don't need to state that a remove
> operation on an item does not remove other items. Does this example
> transfer to this function, or am I missing something?
No, because we're dealing with page tables and you are explicitly requesting a
page table operation. Knowing what is touched is meaningful.
>
> > That is, precisely what is written here. In fact I think this
> > characterisation is derived from discussions had with us in mm, and it is
> > one with which I am happy.
> >
> > Why is it problematic to accurately describe what this does?
>
> Again, it might be that I don't properly understand what the function
> actually does, but if it is just removing the entries described by the
> range - write that. Don't add irrelevant details or specify what the
> function does not do. It slows down the user when reading documentation.
It is highly pertinent as mentioned above.
I mean we can expand the comment to explicitly add some detail around this
since obviously this is confusing (hey - a lot of mm is confusing - this is
an ongoing problem and why I have gone to lengths to try to improve
documentation and wrote a book about it :)
>
> >
> > For a series at v11 where there is broad agreement with maintainers within
> > the subsystem which it wraps, perhaps the priority should be to try to have
> > the series merged unless there is significant technical objection from the
> > rust side?
> >
> >>
> >> How about this:
> >>
> >> This clears the virtual memory map for the range given by `start` and
> >> `size`, dropping refcounts to memory held by the mappings in this range. That
> >> is, anonymous memory is completely freed, file-backed memory has its
> >> reference count on page cache folio's dropped, any dirty data will still
> >> be written back to disk as usual.
> >
> > Sorry I object to this, 'clears the virtual memory map' is really
> > vague. What is already there is better.
>
> Would you like the proposed paragraph if we replaced "virtual memory
> map" with "page table mappings", or do you object to the entirety of the
> new suggestion?
I object to the suggestion in general. The description is fine as it is.
>
> >
> >>
> >> >
> >> >> > and freeing any memory referenced by the VMA in this range. That is,
> >> >> > + /// anonymous memory is completely freed, file-backed memory has its reference count on page
> >> >> > + /// cache folio's dropped, any dirty data will still be written back to disk as usual.
> >> >> > + #[inline]
> >> >> > + pub fn zap_page_range_single(&self, address: usize, size: usize) {
> >>
> >>
> >> Best regards,
> >> Andreas Hindborg
> >>
> >>
> >
> > Let's please get this series merged. I think Alice has demonstrated
> > remarkable patience already, and modulo significant technical pushback on
> > the rust side (on which I defer entirely to the expertise of rust people),
> > I want to see this go in.
>
> I am sensing that you don't feel my comments are relevant at the current
> stage of this series (v11). Alice asked for reviews of the series. These are my
> comments. Feel free do whatever you want with them.
I think you're getting the wrong end of the stick - you are making comments
on something relevant to mm, as an mm maintainer I'm giving you my point of
view.
Your comments elsewhere seem highly useful, and review is always
appreciated, if you read what I said above - I defer entirely to the rust
community on things of which you are expert - so there is clearly no
disrespect intended.
I'd also ask you to respect that I have gone to great lengths to review
this series from mm side, motivated by a strong desire to help the rust
community.
So where I am coming from is nothing negative, quite the opposite, I simply
feel _on this issue_ it is not worth holding up the series for.
This is in no way intended to do you down, disrespect you, or seem ungrateful for
your review or efforts. Apologies if it seemed that way; that was not the intent.
And to reiterate what I said above - I want to see this series merge :) so
there is no ill will anywhere.
>
>
> Best regards,
> Andreas Hindborg
>
Perhaps the correct approach here, as alluded above, is for Alice to add an
extra commentary pointing out the role of page tables here?
To complicate matters further (of course) there are recent series which
actually _do_ clean up unused page tables, though not (I believe... I have
to check...) on zap. But of course we in mm JUST LOVE to complicate
everything... ;)
Cheers, Lorenzo
* Re: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
2025-01-09 11:29 ` Lorenzo Stoakes
@ 2025-01-09 15:32 ` Andreas Hindborg
2025-01-13 14:45 ` Lorenzo Stoakes
0 siblings, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2025-01-09 15:32 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Alice Ryhl, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com> writes:
> On Thu, Jan 09, 2025 at 10:50:13AM +0100, Andreas Hindborg wrote:
>> "Lorenzo Stoakes" <lorenzo.stoakes@oracle.com> writes:
>>
>> > On Thu, Jan 09, 2025 at 09:02:11AM +0100, Andreas Hindborg wrote:
>> >> "Alice Ryhl" <aliceryhl@google.com> writes:
>> >>
>> >> > On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>> >> >>
>> >> >>
>> >> >> > +
>> >> >> > + /// Zap pages in the given page range.
>> >> >> > + ///
>> >> >> > + /// This clears page table mappings for the range at the leaf level, leaving all other page
>> >> >> > + /// tables intact,
>> >> >>
>> >> >> I don't fully understand this docstring. Is it correct that the function
>> >> >> will unmap the address range given by `start` and `size`, _and_ free the
>> >> >> pages used to hold the mappings at the leaf level of the page table?
>> >> >
>> >> > If the vma owns a refcount on those pages, then the refcounts are dropped.
>> >>
>> >> Maybe drop the "at the leaf level leaving all other page tables intact".
>> >> It confuses me, since when would this not be the case?
>> >
>> > I don't understand your objection. The whole nature of a zap is to traverse
>> > leaf level page table mappings, clearing the entries, leaving the other
>> > page table entries intact.
>>
> >> As someone not deeply familiar with this function and its use, I became
>> uncertain of my understanding when I read this sentence. As I asked
>> above: When would you not clear mappings at the leaf level and leave all
>> other mappings alone?
>
> Because these are page tables and page tables can span multiple PTE
> tables. Correctly removing at the time of clearing would be expensive and
> require very careful handling.
What is the distinction between clearing a PTE and removing it?
I asked above if the leaf page holding the PTEs would be dropped if all
the PTEs it holds are cleared. Alice replied "If the vma owns a refcount on those pages,
then the refcounts are dropped.". But from your message I am guessing
maybe not?
>
>>
>> Imagine you have a collection structure backed by a tree and the
>> `remove_item` function has the sentence "remove item at the leaf level
>> but leave all other items in the collection alone". That would be over
>> specifying. It is enough information in the user facing documentation
>> that the item is removed. You don't need to state that a remove
>> operation on an item does not remove other items. Does this example
>> transfer to this function, or am I missing something?
>
> No, because we're dealing with page tables and you are explicitly requesting a
> page table operation. Knowing what is touched is meaningful.
When would a page table operation to remove (clear?) the PTEs
corresponding to an address range touch PTEs corresponding to addresses
outside of the range?
>
>>
>> > That is, precisely what is written here. In fact I think this
>> > characterisation is derived from discussions had with us in mm, and it is
>> > one with which I am happy.
>> >
>> > Why is it problematic to accurately describe what this does?
>>
>> Again, it might be that I don't properly understand what the function
>> actually does, but if it is just removing the entries described by the
>> range - write that. Don't add irrelevant details or specify what the
>> function does not do. It slows down the user when reading documentation.
>
> It is highly pertinent as mentioned above.
>
> I mean we can expand the comment to explicitly add some detail around this
> since obviously this is confusing (hey - a lot of mm is confusing - this is
> an ongonig problem and why I have gone to lengths to try to improve
> documentation and wrote a book about it :)
That would be nice :)
>
>>
>> >
>> > For a series at v11 where there is broad agreement with maintainers within
>> > the subsystem which it wraps, perhaps the priority should be to try to have
>> > the series merged unless there is significant technical objection from the
>> > rust side?
>> >
>> >>
>> >> How about this:
>> >>
>> >> This clears the virtual memory map for the range given by `start` and
>> >> `size`, dropping refcounts to memory held by the mappings in this range. That
>> >> is, anonymous memory is completely freed, file-backed memory has its
>> >> reference count on page cache folio's dropped, any dirty data will still
>> >> be written back to disk as usual.
>> >
>> > Sorry I object to this, 'clears the virtual memory map' is really
>> > vague. What is already there is better.
>>
>> Would you like the proposed paragraph if we replaced "virtual memory
>> map" with "page table mappings", or do you object to the entirety of the
>> new suggestion?
>
> I object to the suggestion in general. The description is fine as it is.
Ok. I'm raising a flag because I had more questions after reading the
docstring than before.
>
>>
>> >
>> >>
>> >> >
>> >> >> > and freeing any memory referenced by the VMA in this range. That is,
>> >> >> > + /// anonymous memory is completely freed, file-backed memory has its reference count on page
>> >> >> > + /// cache folio's dropped, any dirty data will still be written back to disk as usual.
>> >> >> > + #[inline]
>> >> >> > + pub fn zap_page_range_single(&self, address: usize, size: usize) {
>> >>
>> >>
>> >> Best regards,
>> >> Andreas Hindborg
>> >>
>> >>
>> >
>> > Let's please get this series merged. I think Alice has demonstrated
>> > remarkable patience already, and modulo significant technical pushback on
>> > the rust side (on which I defer entirely to the expertise of rust people),
>> > I want to see this go in.
>>
>> I am sensing that you don't feel my comments are relevant at the current
>> stage of this series (v11). Alice asked for reviews of the series. These are my
>> comments. Feel free do whatever you want with them.
>
> I think you're getting the wrong end of the stick - you are making comments
> on something relevant to mm, as an mm maintainer I'm giving you my point of
> view.
I appreciate that.
>
> Your comments elsewhere seem highly useful, and review is always
> appreciated, if you read what I said above - I defer entirely to the rust
> community on things of which you are expert - so there is clearly no
> disrespect intended.
I did not read any disrespect in your message. I understand if you think
I am late at the party at v11. Normally I would not pick up review of a
series that late.
>
> I'd also ask you to respect that I have gone to great lengths to review
> this series from mm side, motivated by a strong desire to help the rust
> commnuity.
I absolutely appreciate that!
>
> So where I am coming from is nothing negative, quite the opposite, I simply
> feel _on this issue_ it is not worth holding up the series for.
>
> This is no way intended to do down, disrespect or seem ungrateful for your
> review or efforts. Apologies if it seemed that way, was not the intent.
>
> And to reiterate what I said above - I want to see this series merge :) so
> there is no ill will anywhere.
We can always merge this as is and then discuss the finer points of
documentation later - I am fine with that. But obviously I cannot put my
review tag on it, when I don't understand the semantics of the functions
from reading the documentation strings. Perhaps we have someone who is
more well versed in mm that can.
>
>>
>>
>> Best regards,
>> Andreas Hindborg
>>
>
> Perhaps the correct approach here, as alluded above, is for Alice to add an
> extra commentary pointing out the role of page tables here?
That would be nice. Perhaps a bit of module level documentation is also
a good addition.
>
> To complicate matters further (of course) there are recent series which
> actually _do_ unused clean up page tables, though not (I believe... I have
> to check...) on zap. But of course we in mm JUST LOVE to complicate
> everything... ;)
We should make sure to document that :)
Best regards,
Andreas Hindborg
* Re: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
2025-01-09 15:32 ` Andreas Hindborg
@ 2025-01-13 14:45 ` Lorenzo Stoakes
2025-01-14 9:50 ` Alice Ryhl
0 siblings, 1 reply; 65+ messages in thread
From: Lorenzo Stoakes @ 2025-01-13 14:45 UTC (permalink / raw)
To: Andreas Hindborg
Cc: Alice Ryhl, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Thu, Jan 09, 2025 at 04:32:13PM +0100, Andreas Hindborg wrote:
> "Lorenzo Stoakes" <lorenzo.stoakes@oracle.com> writes:
>
> > On Thu, Jan 09, 2025 at 10:50:13AM +0100, Andreas Hindborg wrote:
> >> "Lorenzo Stoakes" <lorenzo.stoakes@oracle.com> writes:
> >>
> >> > On Thu, Jan 09, 2025 at 09:02:11AM +0100, Andreas Hindborg wrote:
> >> >> "Alice Ryhl" <aliceryhl@google.com> writes:
> >> >>
> >> >> > On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
> >> >> >>
> >> >> >>
> >> >> >> > +
> >> >> >> > + /// Zap pages in the given page range.
> >> >> >> > + ///
> >> >> >> > + /// This clears page table mappings for the range at the leaf level, leaving all other page
> >> >> >> > + /// tables intact,
> >> >> >>
> >> >> >> I don't fully understand this docstring. Is it correct that the function
> >> >> >> will unmap the address range given by `start` and `size`, _and_ free the
> >> >> >> pages used to hold the mappings at the leaf level of the page table?
> >> >> >
> >> >> > If the vma owns a refcount on those pages, then the refcounts are dropped.
> >> >>
> >> >> Maybe drop the "at the leaf level leaving all other page tables intact".
> >> >> It confuses me, since when would this not be the case?
> >> >
> >> > I don't understand your objection. The whole nature of a zap is to traverse
> >> > leaf level page table mappings, clearing the entries, leaving the other
> >> > page table entries intact.
> >>
> >> As someone not deeply familiar with this function and its use, I became
> >> uncertain of my understanding when I read this sentence. As I asked
> >> above: When would you not clear mappings at the leaf level and leave all
> >> other mappings alone?
> >
> > Because these are page tables and page tables can span multiple PTE
> > tables. Correctly removing at the time of clearing would be expensive and
> > require very careful handling.
>
> What is the distinction between clearing a PTE and removing it?
>
> I asked above if the leaf page holding the PTEs would be dropped if all
> the PTEs it holds are cleared. Alice replied "If the vma owns a refcount on those pages,
> then the refcounts are dropped.". But from your message I am guessing
> maybe not?
>
No they won't be, though Qi is implementing a series which changes
this :) but for the purposes of this function, assume not.
> >
> >>
> >> Imagine you have a collection structure backed by a tree and the
> >> `remove_item` function has the sentence "remove item at the leaf level
> >> but leave all other items in the collection alone". That would be over
> >> specifying. It is enough information in the user facing documentation
> >> that the item is removed. You don't need to state that a remove
> >> operation on an item does not remove other items. Does this example
> >> transfer to this function, or am I missing something?
> >
> > No, because we're dealing with page tables and you are explicitly requesting a
> > page table operation. Knowing what is touched is meaningful.
>
> When would a page table operation to remove (clear?) the PTEs
> corresponding to an address range touch PTEs corresponding to addresses
> outside of the range?
Well we clear PTE entries (yes PTE is _terribly named_) in the specified
range, which might span entire PTE tables, or might not.
It also might span higher level tables if you are zapping huge pages, but
in that instance the PMD (or even PUD) would be the leaf table.
You don't touch PTE _entries_ corresponding to addresses outside of the
range.
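As a rough sketch of that behaviour (a toy model only, with made-up names and sizes; not the kernel's actual types or unmap logic), clearing leaf entries for a range can walk across several PTE tables while leaving the tables themselves allocated:

```rust
const ENTRIES: usize = 512;
const PAGE_SIZE: usize = 4096;

// Some(pfn) models a present leaf (PTE) entry.
type PteTable = [Option<u64>; ENTRIES];

struct TopLevel {
    // Each slot models a higher-level entry pointing to a PTE table.
    slots: Vec<Option<Box<PteTable>>>,
}

impl TopLevel {
    // Clear leaf entries covering [start, end), leaving the PTE tables
    // themselves allocated even when they end up empty.
    fn zap_range(&mut self, start: usize, end: usize) {
        for addr in (start..end).step_by(PAGE_SIZE) {
            let top_idx = addr / (ENTRIES * PAGE_SIZE);
            let pte_idx = (addr / PAGE_SIZE) % ENTRIES;
            if let Some(Some(table)) = self.slots.get_mut(top_idx) {
                table[pte_idx] = None; // clear the entry, keep the table
            }
        }
    }
}
```

The point of the toy is only that a zap over a range spanning a PTE-table boundary clears entries in two tables, without touching entries outside the range or freeing either table.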
>
> >
> >>
> >> > That is, precisely what is written here. In fact I think this
> >> > characterisation is derived from discussions had with us in mm, and it is
> >> > one with which I am happy.
> >> >
> >> > Why is it problematic to accurately describe what this does?
> >>
> >> Again, it might be that I don't properly understand what the function
> >> actually does, but if it is just removing the entries described by the
> >> range - write that. Don't add irrelevant details or specify what the
> >> function does not do. It slows down the user when reading documentation.
> >
> > It is highly pertinent as mentioned above.
> >
> > I mean we can expand the comment to explicitly add some detail around this
> > since obviously this is confusing (hey - a lot of mm is confusing - this is
> > an ongoing problem and why I have gone to lengths to try to improve
> > documentation and wrote a book about it :)
>
> That would be nice :)
Yes indeed!
>
> >
> >>
> >> >
> >> > For a series at v11 where there is broad agreement with maintainers within
> >> > the subsystem which it wraps, perhaps the priority should be to try to have
> >> > the series merged unless there is significant technical objection from the
> >> > rust side?
> >> >
> >> >>
> >> >> How about this:
> >> >>
> >> >> This clears the virtual memory map for the range given by `start` and
> >> >> `size`, dropping refcounts to memory held by the mappings in this range. That
> >> >> is, anonymous memory is completely freed, file-backed memory has its
> >> >> reference count on page cache folio's dropped, any dirty data will still
> >> >> be written back to disk as usual.
> >> >
> >> > Sorry I object to this, 'clears the virtual memory map' is really
> >> > vague. What is already there is better.
> >>
> >> Would you like the proposed paragraph if we replaced "virtual memory
> >> map" with "page table mappings", or do you object to the entirety of the
> >> new suggestion?
> >
> > I object to the suggestion in general. The description is fine as it is.
>
> Ok. I'm raising a flag because I had more questions after reading the
> docstring than before.
Sure and so I think this is valuable information, and indicates it's
probably worthwhile adding a little extra information mentioning page
tables.
>
> >
> >>
> >> >
> >> >>
> >> >> >
> >> >> >> > and freeing any memory referenced by the VMA in this range. That is,
> >> >> >> > + /// anonymous memory is completely freed, file-backed memory has its reference count on page
> >> >> >> > + /// cache folio's dropped, any dirty data will still be written back to disk as usual.
> >> >> >> > + #[inline]
> >> >> >> > + pub fn zap_page_range_single(&self, address: usize, size: usize) {
> >> >>
> >> >>
> >> >> Best regards,
> >> >> Andreas Hindborg
> >> >>
> >> >>
> >> >
> >> > Let's please get this series merged. I think Alice has demonstrated
> >> > remarkable patience already, and modulo significant technical pushback on
> >> > the rust side (on which I defer entirely to the expertise of rust people),
> >> > I want to see this go in.
> >>
> >> I am sensing that you don't feel my comments are relevant at the current
> >> stage of this series (v11). Alice asked for reviews of the series. These are my
> >> comments. Feel free do whatever you want with them.
> >
> > I think you're getting the wrong end of the stick - you are making comments
> > on something relevant to mm, as an mm maintainer I'm giving you my point of
> > view.
>
> I appreciate that.
Thanks
>
> >
> > Your comments elsewhere seem highly useful, and review is always
> > appreciated, if you read what I said above - I defer entirely to the rust
> > community on things of which you are expert - so there is clearly no
> > disrespect intended.
>
> I did not read any disrespect in your message. I understand if you think
> I am late at the party at v11. Normally I would not pick up review of a
> series that late.
Ah OK, good :) I just wanted to make sure things were clear, text is a poor
medium and things can get misinterpreted :) I very much appreciate your
review!
>
> >
> > I'd also ask you to respect that I have gone to great lengths to review
> > this series from mm side, motivated by a strong desire to help the rust
> > community.
>
> I absolutely appreciate that!
Cool :) I am excited about rust's potential in the kernel, as I know you
are, and I suspect _probably_ Alice is somewhat too :P so I think we're
all on the same page.
>
> >
> > So where I am coming from is nothing negative, quite the opposite, I simply
> > feel _on this issue_ it is not worth holding up the series for.
> >
> > This is in no way intended to do down, disrespect or seem ungrateful for your
> > review or efforts. Apologies if it seemed that way, was not the intent.
> >
> > And to reiterate what I said above - I want to see this series merge :) so
> > there is no ill will anywhere.
>
> We can always merge this as is and then discuss the finer points of
> documentation later - I am fine with that. But obviously I cannot put my
> review tag on it, when I don't understand the semantics of the functions
> from reading the documentation strings. Perhaps we have someone who is
> more well versed in mm that can.
Ack of course, I would never ask you to tag anything you're not comfortable
with.
I think probably we can agree that adding extra detail to the comment
should suffice to address your concerns, right?
>
> >
> >>
> >>
> >> Best regards,
> >> Andreas Hindborg
> >>
> >
> > Perhaps the correct approach here, as alluded above, is for Alice to add an
> > extra commentary pointing out the role of page tables here?
>
> That would be nice. Perhaps a bit of module level documentation is also
> a good addition.
Ack
>
> >
> > To complicate matters further (of course) there are recent series which
> > actually _do_ clean up unused page tables, though not (I believe... I have
> > to check...) on zap. But of course we in mm JUST LOVE to complicate
> > everything... ;)
>
> We should make sure to document that :)
Yeah, I have added documentation around VMA locking and page tables which
relates to this, I can expand as needed depending on this series, when I
finally get round to properly looking at it...
See https://origin.kernel.org/doc/html/latest/mm/process_addrs.html for the
doc, which is being updated constantly also.
>
>
> Best regards,
> Andreas Hindborg
>
>
Cheers!
* Re: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
2025-01-13 14:45 ` Lorenzo Stoakes
@ 2025-01-14 9:50 ` Alice Ryhl
2025-01-14 11:57 ` Lorenzo Stoakes
0 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2025-01-14 9:50 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andreas Hindborg, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Mon, Jan 13, 2025 at 3:45 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> > >> > For a series at v11 where there is broad agreement with maintainers within
> > >> > the subsystem which it wraps, perhaps the priority should be to try to have
> > >> > the series merged unless there is significant technical objection from the
> > >> > rust side?
> > >> >
> > >> >>
> > >> >> How about this:
> > >> >>
> > >> >> This clears the virtual memory map for the range given by `start` and
> > >> >> `size`, dropping refcounts to memory held by the mappings in this range. That
> > >> >> is, anonymous memory is completely freed, file-backed memory has its
> > >> >> reference count on page cache folio's dropped, any dirty data will still
> > >> >> be written back to disk as usual.
> > >> >
> > >> > Sorry I object to this, 'clears the virtual memory map' is really
> > >> > vague. What is already there is better.
> > >>
> > >> Would you like the proposed paragraph if we replaced "virtual memory
> > >> map" with "page table mappings", or do you object to the entirety of the
> > >> new suggestion?
> > >
> > > I object to the suggestion in general. The description is fine as it is.
> >
> > Ok. I'm raising a flag because I had more questions after reading the
> > docstring than before.
>
> Sure and so I think this is valuable information, and indicates it's
> probably worthwhile adding a little extra information on mentioning page
> tables.
Sorry, I'm a bit lost. What would you like me to add? Perhaps there's
an existing file in Documentation/ that I can link to?
Alice
* Re: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
2025-01-14 9:50 ` Alice Ryhl
@ 2025-01-14 11:57 ` Lorenzo Stoakes
2025-01-14 13:42 ` Alice Ryhl
2025-01-15 11:02 ` Andreas Hindborg
0 siblings, 2 replies; 65+ messages in thread
From: Lorenzo Stoakes @ 2025-01-14 11:57 UTC (permalink / raw)
To: Alice Ryhl
Cc: Andreas Hindborg, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Tue, Jan 14, 2025 at 10:50:01AM +0100, Alice Ryhl wrote:
> On Mon, Jan 13, 2025 at 3:45 PM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> > > >> > For a series at v11 where there is broad agreement with maintainers within
> > > >> > the subsystem which it wraps, perhaps the priority should be to try to have
> > > >> > the series merged unless there is significant technical objection from the
> > > >> > rust side?
> > > >> >
> > > >> >>
> > > >> >> How about this:
> > > >> >>
> > > >> >> This clears the virtual memory map for the range given by `start` and
> > > >> >> `size`, dropping refcounts to memory held by the mappings in this range. That
> > > >> >> is, anonymous memory is completely freed, file-backed memory has its
> > > >> >> reference count on page cache folio's dropped, any dirty data will still
> > > >> >> be written back to disk as usual.
> > > >> >
> > > >> > Sorry I object to this, 'clears the virtual memory map' is really
> > > >> > vague. What is already there is better.
> > > >>
> > > >> Would you like the proposed paragraph if we replaced "virtual memory
> > > >> map" with "page table mappings", or do you object to the entirety of the
> > > >> new suggestion?
> > > >
> > > > I object to the suggestion in general. The description is fine as it is.
> > >
> > > Ok. I'm raising a flag because I had more questions after reading the
> > > docstring than before.
> >
> > Sure and so I think this is valuable information, and indicates it's
> > probably worthwhile adding a little extra information on mentioning page
> > tables.
>
> Sorry, I'm a bit lost. What would you like me to add? Perhaps there's
> an existing file in Documentation/ that I can link to?
Sure no problem, I propose expanding:
/// This clears page table mappings for the range at the leaf level, leaving all other page
/// tables intact,
/// anonymous memory is completely freed, file-backed memory has its reference count on page
/// cache folios dropped, any dirty data will still be written back to disk as usual.
To include information on page tables. I suggest something like:
/// It may seem odd that we clear at the leaf level; this is, however, a product
/// of the page table structure used to map physical memory into a virtual
/// address space - each virtual address actually encodes a series of array
/// indices into page tables, which form a hierarchical page table
/// structure.
///
/// As a result, each page table level maps multiple page tables at the level
/// below, and thus spans ever-increasing ranges of pages. At the leaf or PTE
/// level, we map the actual physical memory.
///
/// It is here that a zap operates, as it is the only place we can be certain of
/// clearing without impacting any other virtual mappings. It is an
/// implementation detail as to whether the kernel goes further in freeing
/// unused page tables, but for the purposes of this operation we must only
/// assume that the leaf level is cleared.
Alice, Andreas - please let me know if this makes sense/is clear or needs
further clarification.
>
> Alice
* Re: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
2025-01-14 11:57 ` Lorenzo Stoakes
@ 2025-01-14 13:42 ` Alice Ryhl
2025-01-14 15:33 ` Lorenzo Stoakes
2025-01-15 11:02 ` Andreas Hindborg
1 sibling, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2025-01-14 13:42 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andreas Hindborg, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Tue, Jan 14, 2025 at 12:57 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Tue, Jan 14, 2025 at 10:50:01AM +0100, Alice Ryhl wrote:
> > On Mon, Jan 13, 2025 at 3:45 PM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > > > >> > For a series at v11 where there is broad agreement with maintainers within
> > > > >> > the subsystem which it wraps, perhaps the priority should be to try to have
> > > > >> > the series merged unless there is significant technical objection from the
> > > > >> > rust side?
> > > > >> >
> > > > >> >>
> > > > >> >> How about this:
> > > > >> >>
> > > > >> >> This clears the virtual memory map for the range given by `start` and
> > > > >> >> `size`, dropping refcounts to memory held by the mappings in this range. That
> > > > >> >> is, anonymous memory is completely freed, file-backed memory has its
> > > > >> >> reference count on page cache folio's dropped, any dirty data will still
> > > > >> >> be written back to disk as usual.
> > > > >> >
> > > > >> > Sorry I object to this, 'clears the virtual memory map' is really
> > > > >> > vague. What is already there is better.
> > > > >>
> > > > >> Would you like the proposed paragraph if we replaced "virtual memory
> > > > >> map" with "page table mappings", or do you object to the entirety of the
> > > > >> new suggestion?
> > > > >
> > > > > I object to the suggestion in general. The description is fine as it is.
> > > >
> > > > Ok. I'm raising a flag because I had more questions after reading the
> > > > docstring than before.
> > >
> > > Sure and so I think this is valuable information, and indicates it's
> > > probably worthwhile adding a little extra information on mentioning page
> > > tables.
> >
> > Sorry, I'm a bit lost. What would you like me to add? Perhaps there's
> > an existing file in Documentation/ that I can link to?
>
> Sure no problem, I propose expanding:
>
> /// This clears page table mappings for the range at the leaf level, leaving all other page
> /// tables intact,
> /// anonymous memory is completely freed, file-backed memory has its reference count on page
> /// cache folios dropped, any dirty data will still be written back to disk as usual.
>
> To include information on page tables. I suggest something like:
>
> /// It may seem odd that we clear at the leaf level; this is, however, a product
> /// of the page table structure used to map physical memory into a virtual
> /// address space - each virtual address actually encodes a series of array
> /// indices into page tables, which form a hierarchical page table
> /// structure.
> ///
> /// As a result, each page table level maps multiple page tables at the level
> /// below, and thus spans ever-increasing ranges of pages. At the leaf or PTE
> /// level, we map the actual physical memory.
> ///
> /// It is here that a zap operates, as it is the only place we can be certain of
> /// clearing without impacting any other virtual mappings. It is an
> /// implementation detail as to whether the kernel goes further in freeing
> /// unused page tables, but for the purposes of this operation we must only
> /// assume that the leaf level is cleared.
>
> Alice, Andreas - please let me know if this makes sense/is clear or needs
> further clarification.
That looks reasonable to me. Thanks!
Do you have thoughts on the wordings I proposed here?
https://lore.kernel.org/all/CAH5fLginc=uNPVp1-T-oBrgtE1oi_cBMd65sPkDgqSDjH_CNfA@mail.gmail.com/
Alice
* Re: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
2025-01-14 13:42 ` Alice Ryhl
@ 2025-01-14 15:33 ` Lorenzo Stoakes
0 siblings, 0 replies; 65+ messages in thread
From: Lorenzo Stoakes @ 2025-01-14 15:33 UTC (permalink / raw)
To: Alice Ryhl
Cc: Andreas Hindborg, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Tue, Jan 14, 2025 at 02:42:05PM +0100, Alice Ryhl wrote:
> On Tue, Jan 14, 2025 at 12:57 PM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > On Tue, Jan 14, 2025 at 10:50:01AM +0100, Alice Ryhl wrote:
> > > On Mon, Jan 13, 2025 at 3:45 PM Lorenzo Stoakes
> > > <lorenzo.stoakes@oracle.com> wrote:
> > > > > >> > For a series at v11 where there is broad agreement with maintainers within
> > > > > >> > the subsystem which it wraps, perhaps the priority should be to try to have
> > > > > >> > the series merged unless there is significant technical objection from the
> > > > > >> > rust side?
> > > > > >> >
> > > > > >> >>
> > > > > >> >> How about this:
> > > > > >> >>
> > > > > >> >> This clears the virtual memory map for the range given by `start` and
> > > > > >> >> `size`, dropping refcounts to memory held by the mappings in this range. That
> > > > > >> >> is, anonymous memory is completely freed, file-backed memory has its
> > > > > >> >> reference count on page cache folio's dropped, any dirty data will still
> > > > > >> >> be written back to disk as usual.
> > > > > >> >
> > > > > >> > Sorry I object to this, 'clears the virtual memory map' is really
> > > > > >> > vague. What is already there is better.
> > > > > >>
> > > > > >> Would you like the proposed paragraph if we replaced "virtual memory
> > > > > >> map" with "page table mappings", or do you object to the entirety of the
> > > > > >> new suggestion?
> > > > > >
> > > > > > I object to the suggestion in general. The description is fine as it is.
> > > > >
> > > > > Ok. I'm raising a flag because I had more questions after reading the
> > > > > docstring than before.
> > > >
> > > > Sure and so I think this is valuable information, and indicates it's
> > > > probably worthwhile adding a little extra information on mentioning page
> > > > tables.
> > >
> > > Sorry, I'm a bit lost. What would you like me to add? Perhaps there's
> > > an existing file in Documentation/ that I can link to?
> >
> > Sure no problem, I propose expanding:
> >
> > /// This clears page table mappings for the range at the leaf level, leaving all other page
> > /// tables intact,
> > /// anonymous memory is completely freed, file-backed memory has its reference count on page
> > /// cache folios dropped, any dirty data will still be written back to disk as usual.
> >
> > To include information on page tables. I suggest something like:
> >
> > /// It may seem odd that we clear at the leaf level; this is, however, a product
> > /// of the page table structure used to map physical memory into a virtual
> > /// address space - each virtual address actually encodes a series of array
> > /// indices into page tables, which form a hierarchical page table
> > /// structure.
> > ///
> > /// As a result, each page table level maps multiple page tables at the level
> > /// below, and thus spans ever-increasing ranges of pages. At the leaf or PTE
> > /// level, we map the actual physical memory.
> > ///
> > /// It is here that a zap operates, as it is the only place we can be certain of
> > /// clearing without impacting any other virtual mappings. It is an
> > /// implementation detail as to whether the kernel goes further in freeing
> > /// unused page tables, but for the purposes of this operation we must only
> > /// assume that the leaf level is cleared.
> >
> > Alice, Andreas - please let me know if this makes sense/is clear or needs
> > further clarification.
>
> That looks reasonable to me. Thanks!
Cool!
>
> Do you have thoughts on the wordings I proposed here?
> https://lore.kernel.org/all/CAH5fLginc=uNPVp1-T-oBrgtE1oi_cBMd65sPkDgqSDjH_CNfA@mail.gmail.com/
Oops, missed that. Do always feel free to ping me if I seem to miss things!
Will reply in thread
>
>
> Alice
* Re: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
2025-01-14 11:57 ` Lorenzo Stoakes
2025-01-14 13:42 ` Alice Ryhl
@ 2025-01-15 11:02 ` Andreas Hindborg
2025-01-15 11:04 ` Alice Ryhl
1 sibling, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2025-01-15 11:02 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Alice Ryhl, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com> writes:
> On Tue, Jan 14, 2025 at 10:50:01AM +0100, Alice Ryhl wrote:
>> On Mon, Jan 13, 2025 at 3:45 PM Lorenzo Stoakes
>> <lorenzo.stoakes@oracle.com> wrote:
>> > > >> > For a series at v11 where there is broad agreement with maintainers within
>> > > >> > the subsystem which it wraps, perhaps the priority should be to try to have
>> > > >> > the series merged unless there is significant technical objection from the
>> > > >> > rust side?
>> > > >> >
>> > > >> >>
>> > > >> >> How about this:
>> > > >> >>
>> > > >> >> This clears the virtual memory map for the range given by `start` and
>> > > >> >> `size`, dropping refcounts to memory held by the mappings in this range. That
>> > > >> >> is, anonymous memory is completely freed, file-backed memory has its
>> > > >> >> reference count on page cache folio's dropped, any dirty data will still
>> > > >> >> be written back to disk as usual.
>> > > >> >
>> > > >> > Sorry I object to this, 'clears the virtual memory map' is really
>> > > >> > vague. What is already there is better.
>> > > >>
>> > > >> Would you like the proposed paragraph if we replaced "virtual memory
>> > > >> map" with "page table mappings", or do you object to the entirety of the
>> > > >> new suggestion?
>> > > >
>> > > > I object to the suggestion in general. The description is fine as it is.
>> > >
>> > > Ok. I'm raising a flag because I had more questions after reading the
>> > > docstring than before.
>> >
>> > Sure and so I think this is valuable information, and indicates it's
>> > probably worthwhile adding a little extra information on mentioning page
>> > tables.
>>
>> Sorry, I'm a bit lost. What would you like me to add? Perhaps there's
>> an existing file in Documentation/ that I can link to?
>
> Sure no problem, I propose expanding:
>
> /// This clears page table mappings for the range at the leaf level, leaving all other page
> /// tables intact,
> /// anonymous memory is completely freed, file-backed memory has its reference count on page
> /// cache folios dropped, any dirty data will still be written back to disk as usual.
>
> To include information on page tables. I suggest something like:
>
> /// It may seem odd that we clear at the leaf level; this is, however, a product
> /// of the page table structure used to map physical memory into a virtual
> /// address space - each virtual address actually encodes a series of array
> /// indices into page tables, which form a hierarchical page table
> /// structure.
> ///
> /// As a result, each page table level maps multiple page tables at the level
> /// below, and thus spans ever-increasing ranges of pages. At the leaf or PTE
> /// level, we map the actual physical memory.
> ///
> /// It is here that a zap operates, as it is the only place we can be certain of
> /// clearing without impacting any other virtual mappings. It is an
> /// implementation detail as to whether the kernel goes further in freeing
> /// unused page tables, but for the purposes of this operation we must only
> /// assume that the leaf level is cleared.
>
> Alice, Andreas - please let me know if this makes sense/is clear or needs
> further clarification.
Sounds good to me - thanks.
@Alice - can we add PTE, PTE entry, PMD, PUD to the vocabulary at the
top? Not sure if it should go here in virt.rs or in mm.rs. If you have
no cycles I can try to add it down the road.
Best regards,
Andreas Hindborg
* Re: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
2025-01-15 11:02 ` Andreas Hindborg
@ 2025-01-15 11:04 ` Alice Ryhl
0 siblings, 0 replies; 65+ messages in thread
From: Alice Ryhl @ 2025-01-15 11:04 UTC (permalink / raw)
To: Andreas Hindborg
Cc: Lorenzo Stoakes, Miguel Ojeda, Matthew Wilcox, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Wed, Jan 15, 2025 at 12:03 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> "Lorenzo Stoakes" <lorenzo.stoakes@oracle.com> writes:
>
> > On Tue, Jan 14, 2025 at 10:50:01AM +0100, Alice Ryhl wrote:
> >> On Mon, Jan 13, 2025 at 3:45 PM Lorenzo Stoakes
> >> <lorenzo.stoakes@oracle.com> wrote:
> >> > > >> > For a series at v11 where there is broad agreement with maintainers within
> >> > > >> > the subsystem which it wraps, perhaps the priority should be to try to have
> >> > > >> > the series merged unless there is significant technical objection from the
> >> > > >> > rust side?
> >> > > >> >
> >> > > >> >>
> >> > > >> >> How about this:
> >> > > >> >>
> >> > > >> >> This clears the virtual memory map for the range given by `start` and
> >> > > >> >> `size`, dropping refcounts to memory held by the mappings in this range. That
> >> > > >> >> is, anonymous memory is completely freed, file-backed memory has its
> >> > > >> >> reference count on page cache folio's dropped, any dirty data will still
> >> > > >> >> be written back to disk as usual.
> >> > > >> >
> >> > > >> > Sorry I object to this, 'clears the virtual memory map' is really
> >> > > >> > vague. What is already there is better.
> >> > > >>
> >> > > >> Would you like the proposed paragraph if we replaced "virtual memory
> >> > > >> map" with "page table mappings", or do you object to the entirety of the
> >> > > >> new suggestion?
> >> > > >
> >> > > > I object to the suggestion in general. The description is fine as it is.
> >> > >
> >> > > Ok. I'm raising a flag because I had more questions after reading the
> >> > > docstring than before.
> >> >
> >> > Sure and so I think this is valuable information, and indicates it's
> >> > probably worthwhile adding a little extra information on mentioning page
> >> > tables.
> >>
> >> Sorry, I'm a bit lost. What would you like me to add? Perhaps there's
> >> an existing file in Documentation/ that I can link to?
> >
> > Sure no problem, I propose expanding:
> >
> > /// This clears page table mappings for the range at the leaf level, leaving all other page
> > /// tables intact,
> > /// anonymous memory is completely freed, file-backed memory has its reference count on page
> > /// cache folios dropped, any dirty data will still be written back to disk as usual.
> >
> > To include information on page tables. I suggest something like:
> >
> > /// It may seem odd that we clear at the leaf level; this is, however, a product
> > /// of the page table structure used to map physical memory into a virtual
> > /// address space - each virtual address actually encodes a series of array
> > /// indices into page tables, which form a hierarchical page table
> > /// structure.
> > ///
> > /// As a result, each page table level maps multiple page tables at the level
> > /// below, and thus spans ever-increasing ranges of pages. At the leaf or PTE
> > /// level, we map the actual physical memory.
> > ///
> > /// It is here that a zap operates, as it is the only place we can be certain of
> > /// clearing without impacting any other virtual mappings. It is an
> > /// implementation detail as to whether the kernel goes further in freeing
> > /// unused page tables, but for the purposes of this operation we must only
> > /// assume that the leaf level is cleared.
> >
> > Alice, Andreas - please let me know if this makes sense/is clear or needs
> > further clarification.
>
> Sounds good to me - thanks.
>
> @Alice - can we add PTE, PTE entry, PMD, PUD to the vocabulary at the
> top? Not sure if it should go here in virt.rs or in mm.rs. If you have
> no cycles I can try to add it down the road.
Sorry, I don't know these concepts well enough to write about them
right now. If there's a documentation page I can link to that's great,
but otherwise I think it will have to wait.
Alice
* [PATCH v11 3/8] mm: rust: add vm_insert_page
2024-12-11 10:37 ` [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
2024-12-11 10:37 ` [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct Alice Ryhl
2024-12-11 10:37 ` [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access Alice Ryhl
@ 2024-12-11 10:37 ` Alice Ryhl
2024-12-16 12:25 ` Andreas Hindborg
2024-12-11 10:37 ` [PATCH v11 4/8] mm: rust: add lock_vma_under_rcu Alice Ryhl
` (6 subsequent siblings)
9 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2024-12-11 10:37 UTC (permalink / raw)
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, linux-kernel,
linux-mm, rust-for-linux, Alice Ryhl
The vm_insert_page method is only usable on vmas with the VM_MIXEDMAP
flag, so we introduce a new type to keep track of such vmas.
The approach used in this patch assumes that we will not need to encode
many flag combinations in the type. I don't think we need to encode more
than VM_MIXEDMAP and VM_PFNMAP as things are now. However, if that
becomes necessary, using generic parameters in a single type would scale
better as the number of flags increases.
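As a sketch of that alternative (hypothetical names, not the API proposed in this patch), the flag state could be carried in generic marker parameters on a single wrapper type:

```rust
use std::marker::PhantomData;

// Type-level flag states: a flag is either of unknown state or known set.
struct Unknown;
struct Set;

// A single wrapper where each flag is tracked by its own generic
// parameter. In real code this would wrap a `vm_area_struct` pointer;
// here it is a marker-only sketch.
struct VmArea<MixedMap = Unknown, PfnMap = Unknown> {
    _marker: PhantomData<(MixedMap, PfnMap)>,
}

impl<M, P> VmArea<M, P> {
    fn new() -> Self {
        VmArea { _marker: PhantomData }
    }

    // Runtime flag check (elided in this sketch) promoting the value to
    // the state where VM_MIXEDMAP is known to be set.
    fn check_mixedmap(self) -> Option<VmArea<Set, P>> {
        Some(VmArea { _marker: PhantomData })
    }
}

impl<P> VmArea<Set, P> {
    // Only callable once VM_MIXEDMAP is known, at compile time, to be set.
    fn vm_insert_page(&self) -> bool {
        true // placeholder for the real insertion
    }
}
```

Adding another flag then means adding a type parameter rather than a new wrapper struct, at the cost of noisier signatures, which is why a dedicated type is simpler while only one or two flags matter.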
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/kernel/mm/virt.rs | 71 +++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 70 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
index 68c763169cf0..3a23854e14f4 100644
--- a/rust/kernel/mm/virt.rs
+++ b/rust/kernel/mm/virt.rs
@@ -4,7 +4,15 @@
//! Virtual memory.
-use crate::{bindings, mm::MmWithUser, types::Opaque};
+use crate::{
+ bindings,
+ error::{to_result, Result},
+ mm::MmWithUser,
+ page::Page,
+ types::Opaque,
+};
+
+use core::ops::Deref;
/// A wrapper for the kernel's `struct vm_area_struct` with read access.
///
@@ -100,6 +108,67 @@ pub fn zap_page_range_single(&self, address: usize, size: usize) {
)
};
}
+
+ /// Check whether the `VM_MIXEDMAP` flag is set.
+ ///
+ /// This can be used to access methods that require `VM_MIXEDMAP` to be set.
+ #[inline]
+ pub fn as_mixedmap_vma(&self) -> Option<&VmAreaMixedMap> {
+ if self.flags() & flags::MIXEDMAP != 0 {
+ // SAFETY: We just checked that `VM_MIXEDMAP` is set. All other requirements are
+ // satisfied by the type invariants of `VmAreaRef`.
+ Some(unsafe { VmAreaMixedMap::from_raw(self.as_ptr()) })
+ } else {
+ None
+ }
+ }
+}
+
+/// A wrapper for the kernel's `struct vm_area_struct` with read access and `VM_MIXEDMAP` set.
+///
+/// It represents an area of virtual memory.
+///
+/// # Invariants
+///
+/// The caller must hold the mmap read lock or the vma read lock. The `VM_MIXEDMAP` flag must be
+/// set.
+#[repr(transparent)]
+pub struct VmAreaMixedMap {
+ vma: VmAreaRef,
+}
+
+// Make all `VmAreaRef` methods available on `VmAreaMixedMap`.
+impl Deref for VmAreaMixedMap {
+ type Target = VmAreaRef;
+
+ #[inline]
+ fn deref(&self) -> &VmAreaRef {
+ &self.vma
+ }
+}
+
+impl VmAreaMixedMap {
+ /// Access a virtual memory area given a raw pointer.
+ ///
+ /// # Safety
+ ///
+ /// Callers must ensure that `vma` is valid for the duration of 'a, and that the mmap read lock
+ /// (or stronger) is held for at least the duration of 'a. The `VM_MIXEDMAP` flag must be set.
+ #[inline]
+ pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
+ // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
+ unsafe { &*vma.cast() }
+ }
+
+ /// Maps a single page at the given address within the virtual memory area.
+ ///
+ /// This operation does not take ownership of the page.
+ #[inline]
+ pub fn vm_insert_page(&self, address: usize, page: &Page) -> Result {
+ // SAFETY: The caller has read access and has verified that `VM_MIXEDMAP` is set. The page
+ // is order 0. The address is checked on the C side so it can take any value.
+ to_result(unsafe { bindings::vm_insert_page(self.as_ptr(), address as _, page.as_ptr()) })
+ }
}
/// The integer type used for vma flags.
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH v11 3/8] mm: rust: add vm_insert_page
2024-12-11 10:37 ` [PATCH v11 3/8] mm: rust: add vm_insert_page Alice Ryhl
@ 2024-12-16 12:25 ` Andreas Hindborg
2025-01-13 10:02 ` Alice Ryhl
0 siblings, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2024-12-16 12:25 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Trevor Gross,
linux-kernel, linux-mm, rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> The vm_insert_page method is only usable on vmas with the VM_MIXEDMAP
> flag, so we introduce a new type to keep track of such vmas.
>
> The approach used in this patch assumes that we will not need to encode
> many flag combinations in the type. I don't think we need to encode more
> than VM_MIXEDMAP and VM_PFNMAP as things are now. However, if that
> becomes necessary, using generic parameters in a single type would scale
> better as the number of flags increases.
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/kernel/mm/virt.rs | 71 +++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 70 insertions(+), 1 deletion(-)
>
> diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
> index 68c763169cf0..3a23854e14f4 100644
> --- a/rust/kernel/mm/virt.rs
> +++ b/rust/kernel/mm/virt.rs
> @@ -4,7 +4,15 @@
>
> //! Virtual memory.
>
> -use crate::{bindings, mm::MmWithUser, types::Opaque};
> +use crate::{
> + bindings,
> + error::{to_result, Result},
> + mm::MmWithUser,
> + page::Page,
> + types::Opaque,
> +};
> +
> +use core::ops::Deref;
>
> /// A wrapper for the kernel's `struct vm_area_struct` with read access.
> ///
> @@ -100,6 +108,67 @@ pub fn zap_page_range_single(&self, address: usize, size: usize) {
> )
> };
> }
> +
> + /// Check whether the `VM_MIXEDMAP` flag is set.
Perhaps "Check whether the `VM_MIXEDMAP` flag is set. If so, return
`Some`, otherwise `None`."?
> + ///
> + /// This can be used to access methods that require `VM_MIXEDMAP` to be set.
> + #[inline]
> + pub fn as_mixedmap_vma(&self) -> Option<&VmAreaMixedMap> {
> + if self.flags() & flags::MIXEDMAP != 0 {
> + // SAFETY: We just checked that `VM_MIXEDMAP` is set. All other requirements are
> + // satisfied by the type invariants of `VmAreaRef`.
> + Some(unsafe { VmAreaMixedMap::from_raw(self.as_ptr()) })
> + } else {
> + None
> + }
> + }
> +}
> +
> +/// A wrapper for the kernel's `struct vm_area_struct` with read access and `VM_MIXEDMAP` set.
> +///
> +/// It represents an area of virtual memory.
Could we have a link to `VmAreaRef` and explain that this is a
`VmAreaRef` with an additional requirement?
> +///
> +/// # Invariants
> +///
> +/// The caller must hold the mmap read lock or the vma read lock. The `VM_MIXEDMAP` flag must be
> +/// set.
> +#[repr(transparent)]
> +pub struct VmAreaMixedMap {
> + vma: VmAreaRef,
> +}
> +
> +// Make all `VmAreaRef` methods available on `VmAreaMixedMap`.
> +impl Deref for VmAreaMixedMap {
> + type Target = VmAreaRef;
> +
> + #[inline]
> + fn deref(&self) -> &VmAreaRef {
> + &self.vma
> + }
> +}
> +
> +impl VmAreaMixedMap {
> + /// Access a virtual memory area given a raw pointer.
> + ///
> + /// # Safety
> + ///
> + /// Callers must ensure that `vma` is valid for the duration of 'a, and that the mmap read lock
> + /// (or stronger) is held for at least the duration of 'a. The `VM_MIXEDMAP` flag must be set.
> + #[inline]
> + pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
> + // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
> + unsafe { &*vma.cast() }
> + }
> +
> + /// Maps a single page at the given address within the virtual memory area.
> + ///
> + /// This operation does not take ownership of the page.
> + #[inline]
> + pub fn vm_insert_page(&self, address: usize, page: &Page) -> Result {
> + // SAFETY: The caller has read access and has verified that `VM_MIXEDMAP` is set. The page
> + // is order 0. The address is checked on the C side so it can take any value.
Maybe something like this: "By the type invariant of `Self` caller has read
access and has verified that `VM_MIXEDMAP` is set. By invariant on
`Page` the page has order 0."
> + to_result(unsafe { bindings::vm_insert_page(self.as_ptr(), address as _, page.as_ptr()) })
> + }
> }
>
> /// The integer type used for vma flags.
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH v11 3/8] mm: rust: add vm_insert_page
2024-12-16 12:25 ` Andreas Hindborg
@ 2025-01-13 10:02 ` Alice Ryhl
2025-01-15 9:33 ` Andreas Hindborg
0 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2025-01-13 10:02 UTC (permalink / raw)
To: Andreas Hindborg
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> "Alice Ryhl" <aliceryhl@google.com> writes:
>
> > The vm_insert_page method is only usable on vmas with the VM_MIXEDMAP
> > flag, so we introduce a new type to keep track of such vmas.
> >
> > The approach used in this patch assumes that we will not need to encode
> > many flag combinations in the type. I don't think we need to encode more
> > than VM_MIXEDMAP and VM_PFNMAP as things are now. However, if that
> > becomes necessary, using generic parameters in a single type would scale
> > better as the number of flags increases.
> >
> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > ---
> > rust/kernel/mm/virt.rs | 71 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 70 insertions(+), 1 deletion(-)
> >
> > diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
> > index 68c763169cf0..3a23854e14f4 100644
> > --- a/rust/kernel/mm/virt.rs
> > +++ b/rust/kernel/mm/virt.rs
> > @@ -4,7 +4,15 @@
> >
> > //! Virtual memory.
> >
> > -use crate::{bindings, mm::MmWithUser, types::Opaque};
> > +use crate::{
> > + bindings,
> > + error::{to_result, Result},
> > + mm::MmWithUser,
> > + page::Page,
> > + types::Opaque,
> > +};
> > +
> > +use core::ops::Deref;
> >
> > /// A wrapper for the kernel's `struct vm_area_struct` with read access.
> > ///
> > @@ -100,6 +108,67 @@ pub fn zap_page_range_single(&self, address: usize, size: usize) {
> > )
> > };
> > }
> > +
> > + /// Check whether the `VM_MIXEDMAP` flag is set.
>
> Perhaps "Check whether the `VM_MIXEDMAP` flag is set. If so, return
> `Some`, otherwise `None` ?
How about
If the `VM_MIXEDMAP` flag is set, returns a `VmAreaMixedMap` to this
VMA, otherwise returns `None`.
This follows the example of slice::as_ascii
> > + ///
> > + /// This can be used to access methods that require `VM_MIXEDMAP` to be set.
> > + #[inline]
> > + pub fn as_mixedmap_vma(&self) -> Option<&VmAreaMixedMap> {
> > + if self.flags() & flags::MIXEDMAP != 0 {
> > + // SAFETY: We just checked that `VM_MIXEDMAP` is set. All other requirements are
> > + // satisfied by the type invariants of `VmAreaRef`.
> > + Some(unsafe { VmAreaMixedMap::from_raw(self.as_ptr()) })
> > + } else {
> > + None
> > + }
> > + }
> > +}
> > +
> > +/// A wrapper for the kernel's `struct vm_area_struct` with read access and `VM_MIXEDMAP` set.
> > +///
> > +/// It represents an area of virtual memory.
>
> Could we have a link to `VmAreaRef` and explain that this is a
> `VmAreaRef` with an additional requirement?
Ok.
> > +///
> > +/// # Invariants
> > +///
> > +/// The caller must hold the mmap read lock or the vma read lock. The `VM_MIXEDMAP` flag must be
> > +/// set.
> > +#[repr(transparent)]
> > +pub struct VmAreaMixedMap {
> > + vma: VmAreaRef,
> > +}
> > +
> > +// Make all `VmAreaRef` methods available on `VmAreaMixedMap`.
> > +impl Deref for VmAreaMixedMap {
> > + type Target = VmAreaRef;
> > +
> > + #[inline]
> > + fn deref(&self) -> &VmAreaRef {
> > + &self.vma
> > + }
> > +}
> > +
> > +impl VmAreaMixedMap {
> > + /// Access a virtual memory area given a raw pointer.
> > + ///
> > + /// # Safety
> > + ///
> > + /// Callers must ensure that `vma` is valid for the duration of 'a, and that the mmap read lock
> > + /// (or stronger) is held for at least the duration of 'a. The `VM_MIXEDMAP` flag must be set.
> > + #[inline]
> > + pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
> > + // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
> > + unsafe { &*vma.cast() }
> > + }
> > +
> > + /// Maps a single page at the given address within the virtual memory area.
> > + ///
> > + /// This operation does not take ownership of the page.
> > + #[inline]
> > + pub fn vm_insert_page(&self, address: usize, page: &Page) -> Result {
> > + // SAFETY: The caller has read access and has verified that `VM_MIXEDMAP` is set. The page
> > + // is order 0. The address is checked on the C side so it can take any value.
>
> Maybe something like this: "By the type invariant of `Self` caller has read
> access and has verified that `VM_MIXEDMAP` is set. By invariant on
> `Page` the page has order 0."
Ok.
Alice
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH v11 3/8] mm: rust: add vm_insert_page
2025-01-13 10:02 ` Alice Ryhl
@ 2025-01-15 9:33 ` Andreas Hindborg
0 siblings, 0 replies; 65+ messages in thread
From: Andreas Hindborg @ 2025-01-15 9:33 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>>
>> > The vm_insert_page method is only usable on vmas with the VM_MIXEDMAP
>> > flag, so we introduce a new type to keep track of such vmas.
>> >
>> > The approach used in this patch assumes that we will not need to encode
>> > many flag combinations in the type. I don't think we need to encode more
>> > than VM_MIXEDMAP and VM_PFNMAP as things are now. However, if that
>> > becomes necessary, using generic parameters in a single type would scale
>> > better as the number of flags increases.
>> >
>> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
>> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>> > ---
>> > rust/kernel/mm/virt.rs | 71 +++++++++++++++++++++++++++++++++++++++++++++++++-
>> > 1 file changed, 70 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
>> > index 68c763169cf0..3a23854e14f4 100644
>> > --- a/rust/kernel/mm/virt.rs
>> > +++ b/rust/kernel/mm/virt.rs
>> > @@ -4,7 +4,15 @@
>> >
>> > //! Virtual memory.
>> >
>> > -use crate::{bindings, mm::MmWithUser, types::Opaque};
>> > +use crate::{
>> > + bindings,
>> > + error::{to_result, Result},
>> > + mm::MmWithUser,
>> > + page::Page,
>> > + types::Opaque,
>> > +};
>> > +
>> > +use core::ops::Deref;
>> >
>> > /// A wrapper for the kernel's `struct vm_area_struct` with read access.
>> > ///
>> > @@ -100,6 +108,67 @@ pub fn zap_page_range_single(&self, address: usize, size: usize) {
>> > )
>> > };
>> > }
>> > +
>> > + /// Check whether the `VM_MIXEDMAP` flag is set.
>>
>> Perhaps "Check whether the `VM_MIXEDMAP` flag is set. If so, return
>> `Some`, otherwise `None` ?
>
> How about
>
> If the `VM_MIXEDMAP` flag is set, returns a `VmAreaMixedMap` to this
> VMA, otherwise returns `None`.
>
> This follows the example of slice::as_ascii
Sounds good 👍
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 65+ messages in thread
* [PATCH v11 4/8] mm: rust: add lock_vma_under_rcu
2024-12-11 10:37 ` [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
` (2 preceding siblings ...)
2024-12-11 10:37 ` [PATCH v11 3/8] mm: rust: add vm_insert_page Alice Ryhl
@ 2024-12-11 10:37 ` Alice Ryhl
2024-12-16 12:47 ` Andreas Hindborg
2024-12-11 10:37 ` [PATCH v11 5/8] mm: rust: add mmput_async support Alice Ryhl
` (5 subsequent siblings)
9 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2024-12-11 10:37 UTC (permalink / raw)
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, linux-kernel,
linux-mm, rust-for-linux, Alice Ryhl
Currently, the binder driver always uses the mmap lock to make changes
to its vma. Because the mmap lock is global to the process, this can
involve significant contention. However, the kernel has a feature called
per-vma locks, which can significantly reduce contention. For example,
you can take a vma lock in parallel with an mmap write lock. This is
important because contention on the mmap lock has been a long-term
recurring challenge for the Binder driver.
This patch introduces support for using `lock_vma_under_rcu` from Rust.
The Rust Binder driver will be able to use this to reduce contention on
the mmap lock.
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/helpers/mm.c | 5 +++++
rust/kernel/mm.rs | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 61 insertions(+)
diff --git a/rust/helpers/mm.c b/rust/helpers/mm.c
index 7b72eb065a3e..81b510c96fd2 100644
--- a/rust/helpers/mm.c
+++ b/rust/helpers/mm.c
@@ -43,3 +43,8 @@ struct vm_area_struct *rust_helper_vma_lookup(struct mm_struct *mm,
{
return vma_lookup(mm, addr);
}
+
+void rust_helper_vma_end_read(struct vm_area_struct *vma)
+{
+ vma_end_read(vma);
+}
diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
index ace8e7d57afe..425b73a9dfe6 100644
--- a/rust/kernel/mm.rs
+++ b/rust/kernel/mm.rs
@@ -13,6 +13,7 @@
use core::{ops::Deref, ptr::NonNull};
pub mod virt;
+use virt::VmAreaRef;
/// A wrapper for the kernel's `struct mm_struct`.
///
@@ -170,6 +171,32 @@ pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_struct) -> &'a MmWithUser {
unsafe { &*ptr.cast() }
}
+ /// Attempt to access a vma using the vma read lock.
+ ///
+ /// This is an optimistic trylock operation, so it may fail if there is contention. In that
+ /// case, you should fall back to taking the mmap read lock.
+ ///
+ /// When per-vma locks are disabled, this always returns `None`.
+ #[inline]
+ pub fn lock_vma_under_rcu(&self, vma_addr: usize) -> Option<VmaReadGuard<'_>> {
+ #[cfg(CONFIG_PER_VMA_LOCK)]
+ {
+ // SAFETY: Calling `bindings::lock_vma_under_rcu` is always okay given an mm where
+ // `mm_users` is non-zero.
+ let vma = unsafe { bindings::lock_vma_under_rcu(self.as_raw(), vma_addr as _) };
+ if !vma.is_null() {
+ return Some(VmaReadGuard {
+ // SAFETY: If `lock_vma_under_rcu` returns a non-null ptr, then it points at a
+ // valid vma. The vma is stable for as long as the vma read lock is held.
+ vma: unsafe { VmAreaRef::from_raw(vma) },
+ _nts: NotThreadSafe,
+ });
+ }
+ }
+
+ None
+ }
+
/// Lock the mmap read lock.
#[inline]
pub fn mmap_read_lock(&self) -> MmapReadGuard<'_> {
@@ -238,3 +265,32 @@ fn drop(&mut self) {
unsafe { bindings::mmap_read_unlock(self.mm.as_raw()) };
}
}
+
+/// A guard for the vma read lock.
+///
+/// # Invariants
+///
+/// This `VmaReadGuard` guard owns the vma read lock.
+pub struct VmaReadGuard<'a> {
+ vma: &'a VmAreaRef,
+ // `vma_end_read` must be called on the same thread as where the lock was taken
+ _nts: NotThreadSafe,
+}
+
+// Make all `VmAreaRef` methods available on `VmaReadGuard`.
+impl Deref for VmaReadGuard<'_> {
+ type Target = VmAreaRef;
+
+ #[inline]
+ fn deref(&self) -> &VmAreaRef {
+ self.vma
+ }
+}
+
+impl Drop for VmaReadGuard<'_> {
+ #[inline]
+ fn drop(&mut self) {
+ // SAFETY: We hold the read lock by the type invariants.
+ unsafe { bindings::vma_end_read(self.vma.as_ptr()) };
+ }
+}
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH v11 4/8] mm: rust: add lock_vma_under_rcu
2024-12-11 10:37 ` [PATCH v11 4/8] mm: rust: add lock_vma_under_rcu Alice Ryhl
@ 2024-12-16 12:47 ` Andreas Hindborg
2025-01-13 10:04 ` Alice Ryhl
0 siblings, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2024-12-16 12:47 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Trevor Gross,
linux-kernel, linux-mm, rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> Currently, the binder driver always uses the mmap lock to make changes
> to its vma. Because the mmap lock is global to the process, this can
> involve significant contention. However, the kernel has a feature called
> per-vma locks, which can significantly reduce contention. For example,
> you can take a vma lock in parallel with an mmap write lock. This is
> important because contention on the mmap lock has been a long-term
> recurring challenge for the Binder driver.
>
> This patch introduces support for using `lock_vma_under_rcu` from Rust.
> The Rust Binder driver will be able to use this to reduce contention on
> the mmap lock.
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> Reviewed-by: Jann Horn <jannh@google.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/helpers/mm.c | 5 +++++
> rust/kernel/mm.rs | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 61 insertions(+)
>
> diff --git a/rust/helpers/mm.c b/rust/helpers/mm.c
> index 7b72eb065a3e..81b510c96fd2 100644
> --- a/rust/helpers/mm.c
> +++ b/rust/helpers/mm.c
> @@ -43,3 +43,8 @@ struct vm_area_struct *rust_helper_vma_lookup(struct mm_struct *mm,
> {
> return vma_lookup(mm, addr);
> }
> +
> +void rust_helper_vma_end_read(struct vm_area_struct *vma)
> +{
> + vma_end_read(vma);
> +}
> diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> index ace8e7d57afe..425b73a9dfe6 100644
> --- a/rust/kernel/mm.rs
> +++ b/rust/kernel/mm.rs
> @@ -13,6 +13,7 @@
> use core::{ops::Deref, ptr::NonNull};
>
> pub mod virt;
> +use virt::VmAreaRef;
>
> /// A wrapper for the kernel's `struct mm_struct`.
> ///
> @@ -170,6 +171,32 @@ pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_struct) -> &'a MmWithUser {
> unsafe { &*ptr.cast() }
> }
>
> + /// Attempt to access a vma using the vma read lock.
> + ///
> + /// This is an optimistic trylock operation, so it may fail if there is contention. In that
> + /// case, you should fall back to taking the mmap read lock.
> + ///
> + /// When per-vma locks are disabled, this always returns `None`.
> + #[inline]
> + pub fn lock_vma_under_rcu(&self, vma_addr: usize) -> Option<VmaReadGuard<'_>> {
> + #[cfg(CONFIG_PER_VMA_LOCK)]
> + {
> + // SAFETY: Calling `bindings::lock_vma_under_rcu` is always okay given an mm where
> + // `mm_users` is non-zero.
> + let vma = unsafe { bindings::lock_vma_under_rcu(self.as_raw(), vma_addr as _) };
Is `as _` the right approach here?
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH v11 4/8] mm: rust: add lock_vma_under_rcu
2024-12-16 12:47 ` Andreas Hindborg
@ 2025-01-13 10:04 ` Alice Ryhl
2025-01-15 9:34 ` Andreas Hindborg
0 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2025-01-13 10:04 UTC (permalink / raw)
To: Andreas Hindborg
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Mon, Dec 16, 2024 at 3:50 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> "Alice Ryhl" <aliceryhl@google.com> writes:
>
> > Currently, the binder driver always uses the mmap lock to make changes
> > to its vma. Because the mmap lock is global to the process, this can
> > involve significant contention. However, the kernel has a feature called
> > per-vma locks, which can significantly reduce contention. For example,
> > you can take a vma lock in parallel with an mmap write lock. This is
> > important because contention on the mmap lock has been a long-term
> > recurring challenge for the Binder driver.
> >
> > This patch introduces support for using `lock_vma_under_rcu` from Rust.
> > The Rust Binder driver will be able to use this to reduce contention on
> > the mmap lock.
> >
> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > Reviewed-by: Jann Horn <jannh@google.com>
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > ---
> > rust/helpers/mm.c | 5 +++++
> > rust/kernel/mm.rs | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 61 insertions(+)
> >
> > diff --git a/rust/helpers/mm.c b/rust/helpers/mm.c
> > index 7b72eb065a3e..81b510c96fd2 100644
> > --- a/rust/helpers/mm.c
> > +++ b/rust/helpers/mm.c
> > @@ -43,3 +43,8 @@ struct vm_area_struct *rust_helper_vma_lookup(struct mm_struct *mm,
> > {
> > return vma_lookup(mm, addr);
> > }
> > +
> > +void rust_helper_vma_end_read(struct vm_area_struct *vma)
> > +{
> > + vma_end_read(vma);
> > +}
> > diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> > index ace8e7d57afe..425b73a9dfe6 100644
> > --- a/rust/kernel/mm.rs
> > +++ b/rust/kernel/mm.rs
> > @@ -13,6 +13,7 @@
> > use core::{ops::Deref, ptr::NonNull};
> >
> > pub mod virt;
> > +use virt::VmAreaRef;
> >
> > /// A wrapper for the kernel's `struct mm_struct`.
> > ///
> > @@ -170,6 +171,32 @@ pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_struct) -> &'a MmWithUser {
> > unsafe { &*ptr.cast() }
> > }
> >
> > + /// Attempt to access a vma using the vma read lock.
> > + ///
> > + /// This is an optimistic trylock operation, so it may fail if there is contention. In that
> > + /// case, you should fall back to taking the mmap read lock.
> > + ///
> > + /// When per-vma locks are disabled, this always returns `None`.
> > + #[inline]
> > + pub fn lock_vma_under_rcu(&self, vma_addr: usize) -> Option<VmaReadGuard<'_>> {
> > + #[cfg(CONFIG_PER_VMA_LOCK)]
> > + {
> > + // SAFETY: Calling `bindings::lock_vma_under_rcu` is always okay given an mm where
> > + // `mm_users` is non-zero.
> > + let vma = unsafe { bindings::lock_vma_under_rcu(self.as_raw(), vma_addr as _) };
>
> Is `as _` the right approach here?
We can drop it once the FFI integer types are fixed. It's late in the
cycle, so this patch probably won't make it for 6.14. This means I can
remove the casts entirely before this is merged. Otherwise we can
remove them in a follow-up.
Alice
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH v11 4/8] mm: rust: add lock_vma_under_rcu
2025-01-13 10:04 ` Alice Ryhl
@ 2025-01-15 9:34 ` Andreas Hindborg
0 siblings, 0 replies; 65+ messages in thread
From: Andreas Hindborg @ 2025-01-15 9:34 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> On Mon, Dec 16, 2024 at 3:50 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>>
>> > Currently, the binder driver always uses the mmap lock to make changes
>> > to its vma. Because the mmap lock is global to the process, this can
>> > involve significant contention. However, the kernel has a feature called
>> > per-vma locks, which can significantly reduce contention. For example,
>> > you can take a vma lock in parallel with an mmap write lock. This is
>> > important because contention on the mmap lock has been a long-term
>> > recurring challenge for the Binder driver.
>> >
>> > This patch introduces support for using `lock_vma_under_rcu` from Rust.
>> > The Rust Binder driver will be able to use this to reduce contention on
>> > the mmap lock.
>> >
>> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
>> > Reviewed-by: Jann Horn <jannh@google.com>
>> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>> > ---
>> > rust/helpers/mm.c | 5 +++++
>> > rust/kernel/mm.rs | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > 2 files changed, 61 insertions(+)
>> >
>> > diff --git a/rust/helpers/mm.c b/rust/helpers/mm.c
>> > index 7b72eb065a3e..81b510c96fd2 100644
>> > --- a/rust/helpers/mm.c
>> > +++ b/rust/helpers/mm.c
>> > @@ -43,3 +43,8 @@ struct vm_area_struct *rust_helper_vma_lookup(struct mm_struct *mm,
>> > {
>> > return vma_lookup(mm, addr);
>> > }
>> > +
>> > +void rust_helper_vma_end_read(struct vm_area_struct *vma)
>> > +{
>> > + vma_end_read(vma);
>> > +}
>> > diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
>> > index ace8e7d57afe..425b73a9dfe6 100644
>> > --- a/rust/kernel/mm.rs
>> > +++ b/rust/kernel/mm.rs
>> > @@ -13,6 +13,7 @@
>> > use core::{ops::Deref, ptr::NonNull};
>> >
>> > pub mod virt;
>> > +use virt::VmAreaRef;
>> >
>> > /// A wrapper for the kernel's `struct mm_struct`.
>> > ///
>> > @@ -170,6 +171,32 @@ pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_struct) -> &'a MmWithUser {
>> > unsafe { &*ptr.cast() }
>> > }
>> >
>> > + /// Attempt to access a vma using the vma read lock.
>> > + ///
>> > + /// This is an optimistic trylock operation, so it may fail if there is contention. In that
>> > + /// case, you should fall back to taking the mmap read lock.
>> > + ///
>> > + /// When per-vma locks are disabled, this always returns `None`.
>> > + #[inline]
>> > + pub fn lock_vma_under_rcu(&self, vma_addr: usize) -> Option<VmaReadGuard<'_>> {
>> > + #[cfg(CONFIG_PER_VMA_LOCK)]
>> > + {
>> > + // SAFETY: Calling `bindings::lock_vma_under_rcu` is always okay given an mm where
>> > + // `mm_users` is non-zero.
>> > + let vma = unsafe { bindings::lock_vma_under_rcu(self.as_raw(), vma_addr as _) };
>>
>> Is `as _` the right approach here?
>
> We can drop it once the FFI integer types are fixed. It's late in the
> cycle, so this patch probably won't make it for 6.14. This means I can
> remove the casts entirely before this is merged. Otherwise we can
> remove them in a follow-up.
Sounds good to me 👍
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 65+ messages in thread
* [PATCH v11 5/8] mm: rust: add mmput_async support
2024-12-11 10:37 ` [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
` (3 preceding siblings ...)
2024-12-11 10:37 ` [PATCH v11 4/8] mm: rust: add lock_vma_under_rcu Alice Ryhl
@ 2024-12-11 10:37 ` Alice Ryhl
2024-12-16 13:10 ` Andreas Hindborg
2024-12-11 10:37 ` [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap() Alice Ryhl
` (4 subsequent siblings)
9 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2024-12-11 10:37 UTC (permalink / raw)
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, linux-kernel,
linux-mm, rust-for-linux, Alice Ryhl
Adds an MmWithUserAsync type that uses mmput_async when dropped but is
otherwise identical to MmWithUser. This has to be done using a separate
type because the thing we are changing is the destructor.
Rust Binder needs this to avoid a certain deadlock. See commit
9a9ab0d96362 ("binder: fix race between mmput() and do_exit()") for
details. It's also needed in the shrinker to avoid cleaning up the mm in
the shrinker's context.
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/kernel/mm.rs | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
index 425b73a9dfe6..50f4861ae4b9 100644
--- a/rust/kernel/mm.rs
+++ b/rust/kernel/mm.rs
@@ -98,6 +98,48 @@ fn deref(&self) -> &Mm {
     }
 }
 
+/// A wrapper for the kernel's `struct mm_struct`.
+///
+/// This type is identical to `MmWithUser` except that it uses `mmput_async` when dropping a
+/// refcount. This means that the destructor of `ARef<MmWithUserAsync>` is safe to call in atomic
+/// context.
+///
+/// # Invariants
+///
+/// Values of this type are always refcounted using `mmget`. The value of `mm_users` is non-zero.
+#[repr(transparent)]
+pub struct MmWithUserAsync {
+    mm: MmWithUser,
+}
+
+// SAFETY: It is safe to call `mmput_async` on another thread than where `mmget` was called.
+unsafe impl Send for MmWithUserAsync {}
+// SAFETY: All methods on `MmWithUserAsync` can be called in parallel from several threads.
+unsafe impl Sync for MmWithUserAsync {}
+
+// SAFETY: By the type invariants, this type is always refcounted.
+unsafe impl AlwaysRefCounted for MmWithUserAsync {
+    fn inc_ref(&self) {
+        // SAFETY: The pointer is valid since self is a reference.
+        unsafe { bindings::mmget(self.as_raw()) };
+    }
+
+    unsafe fn dec_ref(obj: NonNull<Self>) {
+        // SAFETY: The caller is giving up their refcount.
+        unsafe { bindings::mmput_async(obj.cast().as_ptr()) };
+    }
+}
+
+// Make all `MmWithUser` methods available on `MmWithUserAsync`.
+impl Deref for MmWithUserAsync {
+    type Target = MmWithUser;
+
+    #[inline]
+    fn deref(&self) -> &MmWithUser {
+        &self.mm
+    }
+}
+
 // These methods are safe to call even if `mm_users` is zero.
 impl Mm {
     /// Call `mmgrab` on `current.mm`.
@@ -171,6 +213,13 @@ pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_struct) -> &'a MmWithUser {
         unsafe { &*ptr.cast() }
     }
 
+    /// Use `mmput_async` when dropping this refcount.
+    #[inline]
+    pub fn into_mmput_async(me: ARef<MmWithUser>) -> ARef<MmWithUserAsync> {
+        // SAFETY: The layouts and invariants are compatible.
+        unsafe { ARef::from_raw(ARef::into_raw(me).cast()) }
+    }
+
     /// Attempt to access a vma using the vma read lock.
     ///
     /// This is an optimistic trylock operation, so it may fail if there is contention. In that
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH v11 5/8] mm: rust: add mmput_async support
2024-12-11 10:37 ` [PATCH v11 5/8] mm: rust: add mmput_async support Alice Ryhl
@ 2024-12-16 13:10 ` Andreas Hindborg
0 siblings, 0 replies; 65+ messages in thread
From: Andreas Hindborg @ 2024-12-16 13:10 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> Adds an MmWithUserAsync type that uses mmput_async when dropped but is
> otherwise identical to MmWithUser. This has to be done using a separate
> type because the thing we are changing is the destructor.
>
> Rust Binder needs this to avoid a certain deadlock. See commit
> 9a9ab0d96362 ("binder: fix race between mmput() and do_exit()") for
> details. It's also needed in the shrinker to avoid cleaning up the mm in
> the shrinker's context.
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 65+ messages in thread
* [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2024-12-11 10:37 ` [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
` (4 preceding siblings ...)
2024-12-11 10:37 ` [PATCH v11 5/8] mm: rust: add mmput_async support Alice Ryhl
@ 2024-12-11 10:37 ` Alice Ryhl
2024-12-16 13:41 ` Andreas Hindborg
` (2 more replies)
2024-12-11 10:37 ` [PATCH v11 7/8] rust: miscdevice: add mmap support Alice Ryhl
` (3 subsequent siblings)
9 siblings, 3 replies; 65+ messages in thread
From: Alice Ryhl @ 2024-12-11 10:37 UTC (permalink / raw)
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, linux-kernel,
linux-mm, rust-for-linux, Alice Ryhl
This type will be used when setting up a new vma in an f_ops->mmap()
hook. Using a separate type from VmAreaRef allows us to have a separate
set of operations that you are only able to use during the mmap() hook.
For example, the VM_MIXEDMAP flag must not be changed after the initial
setup that happens during the f_ops->mmap() hook.
To avoid setting invalid flag values, the methods for clearing
VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
the return value results in a compilation error because the `Result`
type is marked #[must_use].
For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
we add a VM_PFNMAP method, we will need some way to prevent you from
setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
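The check-then-clear rule can be sketched standalone (hypothetical flag values and a plain struct instead of the real vma type, with `&mut self` standing in for the unsafe flag write that `update_flags` performs):

```rust
// Hypothetical flag bits; the real values come from the vm_flags bindings.
const VM_WRITE: u64 = 1 << 1;
const VM_MAYWRITE: u64 = 1 << 5;
const EINVAL: i32 = 22;

struct VmaFlags(u64);

impl VmaFlags {
    // Clearing VM_MAYWRITE is only valid while VM_WRITE is unset, so the
    // method checks first and reports failure through a Result.
    fn try_clear_maywrite(&mut self) -> Result<(), i32> {
        if self.0 & VM_WRITE != 0 {
            return Err(EINVAL);
        }
        self.0 &= !VM_MAYWRITE;
        Ok(())
    }
}

fn main() {
    // Already mapped writable: clearing VM_MAYWRITE must fail.
    let mut writable = VmaFlags(VM_WRITE | VM_MAYWRITE);
    assert!(writable.try_clear_maywrite().is_err());

    // Not mapped writable: the flag can be cleared.
    let mut readonly = VmaFlags(VM_MAYWRITE);
    assert!(readonly.try_clear_maywrite().is_ok());
    assert_eq!(readonly.0 & VM_MAYWRITE, 0);
}
```

Since `Result` is `#[must_use]`, a caller that ignores the return value trips the lint (a hard error in kernel builds), which is the property the paragraph above relies on.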
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/kernel/mm/virt.rs | 181 ++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 180 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
index 3a23854e14f4..6d9ba56d4f95 100644
--- a/rust/kernel/mm/virt.rs
+++ b/rust/kernel/mm/virt.rs
@@ -6,7 +6,7 @@
 
 use crate::{
     bindings,
-    error::{to_result, Result},
+    error::{code::EINVAL, to_result, Result},
     mm::MmWithUser,
     page::Page,
     types::Opaque,
@@ -171,6 +171,185 @@ pub fn vm_insert_page(&self, address: usize, page: &Page) -> Result {
     }
 }
 
+/// A builder for setting up a vma in an `f_ops->mmap()` hook.
+///
+/// # Invariants
+///
+/// For the duration of 'a, the referenced vma must be undergoing initialization in an
+/// `f_ops->mmap()` hook.
+#[repr(transparent)]
+pub struct VmAreaNew {
+    vma: VmAreaRef,
+}
+
+// Make all `VmAreaRef` methods available on `VmAreaNew`.
+impl Deref for VmAreaNew {
+    type Target = VmAreaRef;
+
+    #[inline]
+    fn deref(&self) -> &VmAreaRef {
+        &self.vma
+    }
+}
+
+impl VmAreaNew {
+    /// Access a virtual memory area given a raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that `vma` is undergoing initial vma setup for the duration of 'a.
+    #[inline]
+    pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
+        // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
+        unsafe { &*vma.cast() }
+    }
+
+    /// Internal method for updating the vma flags.
+    ///
+    /// # Safety
+    ///
+    /// This must not be used to set the flags to an invalid value.
+    #[inline]
+    unsafe fn update_flags(&self, set: vm_flags_t, unset: vm_flags_t) {
+        let mut flags = self.flags();
+        flags |= set;
+        flags &= !unset;
+
+        // SAFETY: This is not a data race: the vma is undergoing initial setup, so it's not yet
+        // shared. Additionally, `VmAreaNew` is `!Sync`, so it cannot be used to write in parallel.
+        // The caller promises that this does not set the flags to an invalid value.
+        unsafe { (*self.as_ptr()).__bindgen_anon_2.__vm_flags = flags };
+    }
+
+    /// Set the `VM_MIXEDMAP` flag on this vma.
+    ///
+    /// This enables the vma to contain both `struct page` and pure PFN pages. Returns a reference
+    /// that can be used to call `vm_insert_page` on the vma.
+    #[inline]
+    pub fn set_mixedmap(&self) -> &VmAreaMixedMap {
+        // SAFETY: We don't yet provide a way to set VM_PFNMAP, so this cannot put the flags in an
+        // invalid state.
+        unsafe { self.update_flags(flags::MIXEDMAP, 0) };
+
+        // SAFETY: We just set `VM_MIXEDMAP` on the vma.
+        unsafe { VmAreaMixedMap::from_raw(self.vma.as_ptr()) }
+    }
+
+    /// Set the `VM_IO` flag on this vma.
+    ///
+    /// This is used for memory mapped IO and similar. The flag tells other parts of the kernel to
+    /// avoid looking at the pages. For memory mapped IO this is useful as accesses to the pages
+    /// could have side effects.
+    #[inline]
+    pub fn set_io(&self) {
+        // SAFETY: Setting the VM_IO flag is always okay.
+        unsafe { self.update_flags(flags::IO, 0) };
+    }
+
+    /// Set the `VM_DONTEXPAND` flag on this vma.
+    ///
+    /// This prevents the vma from being expanded with `mremap()`.
+    #[inline]
+    pub fn set_dontexpand(&self) {
+        // SAFETY: Setting the VM_DONTEXPAND flag is always okay.
+        unsafe { self.update_flags(flags::DONTEXPAND, 0) };
+    }
+
+    /// Set the `VM_DONTCOPY` flag on this vma.
+    ///
+    /// This prevents the vma from being copied on fork. This option is only permanent if `VM_IO`
+    /// is set.
+    #[inline]
+    pub fn set_dontcopy(&self) {
+        // SAFETY: Setting the VM_DONTCOPY flag is always okay.
+        unsafe { self.update_flags(flags::DONTCOPY, 0) };
+    }
+
+    /// Set the `VM_DONTDUMP` flag on this vma.
+    ///
+    /// This prevents the vma from being included in core dumps. This option is only permanent if
+    /// `VM_IO` is set.
+    #[inline]
+    pub fn set_dontdump(&self) {
+        // SAFETY: Setting the VM_DONTDUMP flag is always okay.
+        unsafe { self.update_flags(flags::DONTDUMP, 0) };
+    }
+
+    /// Returns whether `VM_READ` is set.
+    ///
+    /// This flag indicates whether userspace is mapping this vma as readable.
+    #[inline]
+    pub fn get_read(&self) -> bool {
+        (self.flags() & flags::READ) != 0
+    }
+
+    /// Try to clear the `VM_MAYREAD` flag, failing if `VM_READ` is set.
+    ///
+    /// This flag indicates whether userspace is allowed to make this vma readable with
+    /// `mprotect()`.
+    ///
+    /// Note that this operation is irreversible. Once `VM_MAYREAD` has been cleared, it can never
+    /// be set again.
+    #[inline]
+    pub fn try_clear_mayread(&self) -> Result {
+        if self.get_read() {
+            return Err(EINVAL);
+        }
+        // SAFETY: Clearing `VM_MAYREAD` is okay when `VM_READ` is not set.
+        unsafe { self.update_flags(0, flags::MAYREAD) };
+        Ok(())
+    }
+
+    /// Returns whether `VM_WRITE` is set.
+    ///
+    /// This flag indicates whether userspace is mapping this vma as writable.
+    #[inline]
+    pub fn get_write(&self) -> bool {
+        (self.flags() & flags::WRITE) != 0
+    }
+
+    /// Try to clear the `VM_MAYWRITE` flag, failing if `VM_WRITE` is set.
+    ///
+    /// This flag indicates whether userspace is allowed to make this vma writable with
+    /// `mprotect()`.
+    ///
+    /// Note that this operation is irreversible. Once `VM_MAYWRITE` has been cleared, it can never
+    /// be set again.
+    #[inline]
+    pub fn try_clear_maywrite(&self) -> Result {
+        if self.get_write() {
+            return Err(EINVAL);
+        }
+        // SAFETY: Clearing `VM_MAYWRITE` is okay when `VM_WRITE` is not set.
+        unsafe { self.update_flags(0, flags::MAYWRITE) };
+        Ok(())
+    }
+
+    /// Returns whether `VM_EXEC` is set.
+    ///
+    /// This flag indicates whether userspace is mapping this vma as executable.
+    #[inline]
+    pub fn get_exec(&self) -> bool {
+        (self.flags() & flags::EXEC) != 0
+    }
+
+    /// Try to clear the `VM_MAYEXEC` flag, failing if `VM_EXEC` is set.
+    ///
+    /// This flag indicates whether userspace is allowed to make this vma executable with
+    /// `mprotect()`.
+    ///
+    /// Note that this operation is irreversible. Once `VM_MAYEXEC` has been cleared, it can never
+    /// be set again.
+    #[inline]
+    pub fn try_clear_mayexec(&self) -> Result {
+        if self.get_exec() {
+            return Err(EINVAL);
+        }
+        // SAFETY: Clearing `VM_MAYEXEC` is okay when `VM_EXEC` is not set.
+        unsafe { self.update_flags(0, flags::MAYEXEC) };
+        Ok(())
+    }
+}
+
 /// The integer type used for vma flags.
 #[doc(inline)]
 pub use bindings::vm_flags_t;
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2024-12-11 10:37 ` [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap() Alice Ryhl
@ 2024-12-16 13:41 ` Andreas Hindborg
2025-01-08 12:23 ` Alice Ryhl
2024-12-17 9:31 ` Andreas Hindborg
2025-01-10 13:34 ` Alice Ryhl
2 siblings, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2024-12-16 13:41 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> This type will be used when setting up a new vma in an f_ops->mmap()
> hook. Using a separate type from VmAreaRef allows us to have a separate
> set of operations that you are only able to use during the mmap() hook.
> For example, the VM_MIXEDMAP flag must not be changed after the initial
> setup that happens during the f_ops->mmap() hook.
>
> To avoid setting invalid flag values, the methods for clearing
> VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
> if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
> the return value results in a compilation error because the `Result`
> type is marked #[must_use].
>
> For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
> we add a VM_PFNMAP method, we will need some way to prevent you from
> setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> Reviewed-by: Jann Horn <jannh@google.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/kernel/mm/virt.rs | 181 ++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 180 insertions(+), 1 deletion(-)
>
> diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
> index 3a23854e14f4..6d9ba56d4f95 100644
> --- a/rust/kernel/mm/virt.rs
> +++ b/rust/kernel/mm/virt.rs
> @@ -6,7 +6,7 @@
>
> use crate::{
> bindings,
> - error::{to_result, Result},
> + error::{code::EINVAL, to_result, Result},
> mm::MmWithUser,
> page::Page,
> types::Opaque,
> @@ -171,6 +171,185 @@ pub fn vm_insert_page(&self, address: usize, page: &Page) -> Result {
> }
> }
>
> +/// A builder for setting up a vma in an `f_ops->mmap()` hook.
Reading this line, I would expect to be able to chain update methods as
in `Builder::new().prop_a().prop_b().build()`. Could/should this type
accommodate a proper builder pattern? Or is "builder" not the right word
to use here?
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 65+ messages in thread* Re: [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2024-12-16 13:41 ` Andreas Hindborg
@ 2025-01-08 12:23 ` Alice Ryhl
2025-01-09 8:19 ` Andreas Hindborg
0 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2025-01-08 12:23 UTC (permalink / raw)
To: Andreas Hindborg
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> "Alice Ryhl" <aliceryhl@google.com> writes:
>
> > This type will be used when setting up a new vma in an f_ops->mmap()
> > hook. Using a separate type from VmAreaRef allows us to have a separate
> > set of operations that you are only able to use during the mmap() hook.
> > For example, the VM_MIXEDMAP flag must not be changed after the initial
> > setup that happens during the f_ops->mmap() hook.
> >
> > To avoid setting invalid flag values, the methods for clearing
> > VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
> > if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
> > the return value results in a compilation error because the `Result`
> > type is marked #[must_use].
> >
> > For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
> > we add a VM_PFNMAP method, we will need some way to prevent you from
> > setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
> >
> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > Reviewed-by: Jann Horn <jannh@google.com>
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > ---
> > rust/kernel/mm/virt.rs | 181 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 180 insertions(+), 1 deletion(-)
> >
> > diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
> > index 3a23854e14f4..6d9ba56d4f95 100644
> > --- a/rust/kernel/mm/virt.rs
> > +++ b/rust/kernel/mm/virt.rs
> > @@ -6,7 +6,7 @@
> >
> > use crate::{
> > bindings,
> > - error::{to_result, Result},
> > + error::{code::EINVAL, to_result, Result},
> > mm::MmWithUser,
> > page::Page,
> > types::Opaque,
> > @@ -171,6 +171,185 @@ pub fn vm_insert_page(&self, address: usize, page: &Page) -> Result {
> > }
> > }
> >
> > +/// A builder for setting up a vma in an `f_ops->mmap()` hook.
>
> Reading this line, I would expect to be able to chain update methods as
> in `Builder::new().prop_a().prop_b().build()`. Could/should this type
> accommodate a proper builder pattern? Or is "builder" not the right word
> to use here?
You cannot create values of this type yourself. Only the C
infrastructure can do so.
What would you call it if not "builder"?
Alice
^ permalink raw reply [flat|nested] 65+ messages in thread* Re: [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2025-01-08 12:23 ` Alice Ryhl
@ 2025-01-09 8:19 ` Andreas Hindborg
2025-01-13 10:17 ` Alice Ryhl
0 siblings, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2025-01-09 8:19 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>>
>> > This type will be used when setting up a new vma in an f_ops->mmap()
>> > hook. Using a separate type from VmAreaRef allows us to have a separate
>> > set of operations that you are only able to use during the mmap() hook.
>> > For example, the VM_MIXEDMAP flag must not be changed after the initial
>> > setup that happens during the f_ops->mmap() hook.
>> >
>> > To avoid setting invalid flag values, the methods for clearing
>> > VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
>> > if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
>> > the return value results in a compilation error because the `Result`
>> > type is marked #[must_use].
>> >
>> > For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
>> > we add a VM_PFNMAP method, we will need some way to prevent you from
>> > setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
>> >
>> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
>> > Reviewed-by: Jann Horn <jannh@google.com>
>> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>> > ---
>> > rust/kernel/mm/virt.rs | 181 ++++++++++++++++++++++++++++++++++++++++++++++++-
>> > 1 file changed, 180 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
>> > index 3a23854e14f4..6d9ba56d4f95 100644
>> > --- a/rust/kernel/mm/virt.rs
>> > +++ b/rust/kernel/mm/virt.rs
>> > @@ -6,7 +6,7 @@
>> >
>> > use crate::{
>> > bindings,
>> > - error::{to_result, Result},
>> > + error::{code::EINVAL, to_result, Result},
>> > mm::MmWithUser,
>> > page::Page,
>> > types::Opaque,
>> > @@ -171,6 +171,185 @@ pub fn vm_insert_page(&self, address: usize, page: &Page) -> Result {
>> > }
>> > }
>> >
>> > +/// A builder for setting up a vma in an `f_ops->mmap()` hook.
>>
>> Reading this line, I would expect to be able to chain update methods as
>> in `Builder::new().prop_a().prop_b().build()`. Could/should this type
>> accommodate a proper builder pattern? Or is "builder" not the right word
>> to use here?
>
> You cannot create values of this type yourself. Only the C
> infrastructure can do so.
>
> What would you call it if not "builder"?
It looks more like a newtype with a bunch of setters and getters. It
also does not have a method to instantiate (`build()` or similar). So
how about newtype?
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 65+ messages in thread* Re: [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2025-01-09 8:19 ` Andreas Hindborg
@ 2025-01-13 10:17 ` Alice Ryhl
2025-01-15 9:57 ` Andreas Hindborg
0 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2025-01-13 10:17 UTC (permalink / raw)
To: Andreas Hindborg
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Thu, Jan 9, 2025 at 9:19 AM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> "Alice Ryhl" <aliceryhl@google.com> writes:
>
> > On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
> >>
> >> "Alice Ryhl" <aliceryhl@google.com> writes:
> >>
> >> > This type will be used when setting up a new vma in an f_ops->mmap()
> >> > hook. Using a separate type from VmAreaRef allows us to have a separate
> >> > set of operations that you are only able to use during the mmap() hook.
> >> > For example, the VM_MIXEDMAP flag must not be changed after the initial
> >> > setup that happens during the f_ops->mmap() hook.
> >> >
> >> > To avoid setting invalid flag values, the methods for clearing
> >> > VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
> >> > if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
> >> > the return value results in a compilation error because the `Result`
> >> > type is marked #[must_use].
> >> >
> >> > For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
> >> > we add a VM_PFNMAP method, we will need some way to prevent you from
> >> > setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
> >> >
> >> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> >> > Reviewed-by: Jann Horn <jannh@google.com>
> >> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> >> > ---
> >> > rust/kernel/mm/virt.rs | 181 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >> > 1 file changed, 180 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
> >> > index 3a23854e14f4..6d9ba56d4f95 100644
> >> > --- a/rust/kernel/mm/virt.rs
> >> > +++ b/rust/kernel/mm/virt.rs
> >> > @@ -6,7 +6,7 @@
> >> >
> >> > use crate::{
> >> > bindings,
> >> > - error::{to_result, Result},
> >> > + error::{code::EINVAL, to_result, Result},
> >> > mm::MmWithUser,
> >> > page::Page,
> >> > types::Opaque,
> >> > @@ -171,6 +171,185 @@ pub fn vm_insert_page(&self, address: usize, page: &Page) -> Result {
> >> > }
> >> > }
> >> >
> >> > +/// A builder for setting up a vma in an `f_ops->mmap()` hook.
> >>
> >> Reading this line, I would expect to be able to chain update methods as
> >> in `Builder::new().prop_a().prop_b().build()`. Could/should this type
> >> accommodate a proper builder pattern? Or is "builder" not the right word
> >> to use here?
> >
> > You cannot create values of this type yourself. Only the C
> > infrastructure can do so.
> >
> > What would you call it if not "builder"?
>
> It looks more like a newtype with a bunch of setters and getters. It
> also does not have a method to instantiate (`build()` or similar). So
> how about newtype?
I don't think newtype is helpful. Ultimately, the f_ops->mmap() hook
is a *constructor* for a VMA, and the VmAreaNew type represents a VMA
whose constructor is currently running. The "method to instantiate" is
called "return".
fn mmap(new_vma: &VmAreaNew) -> Result {
    // VMAs for this driver must not be mapped as executable
    new_vma.try_clear_mayexec()?;

    // we are done constructing the vma, so return
    Ok(())
}
Alice
^ permalink raw reply [flat|nested] 65+ messages in thread* Re: [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2025-01-13 10:17 ` Alice Ryhl
@ 2025-01-15 9:57 ` Andreas Hindborg
0 siblings, 0 replies; 65+ messages in thread
From: Andreas Hindborg @ 2025-01-15 9:57 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> On Thu, Jan 9, 2025 at 9:19 AM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>>
>> > On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>> >>
>> >> "Alice Ryhl" <aliceryhl@google.com> writes:
>> >>
>> >> > This type will be used when setting up a new vma in an f_ops->mmap()
>> >> > hook. Using a separate type from VmAreaRef allows us to have a separate
>> >> > set of operations that you are only able to use during the mmap() hook.
>> >> > For example, the VM_MIXEDMAP flag must not be changed after the initial
>> >> > setup that happens during the f_ops->mmap() hook.
>> >> >
>> >> > To avoid setting invalid flag values, the methods for clearing
>> >> > VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
>> >> > if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
>> >> > the return value results in a compilation error because the `Result`
>> >> > type is marked #[must_use].
>> >> >
>> >> > For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
>> >> > we add a VM_PFNMAP method, we will need some way to prevent you from
>> >> > setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
>> >> >
>> >> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
>> >> > Reviewed-by: Jann Horn <jannh@google.com>
>> >> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>> >> > ---
>> >> > rust/kernel/mm/virt.rs | 181 ++++++++++++++++++++++++++++++++++++++++++++++++-
>> >> > 1 file changed, 180 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
>> >> > index 3a23854e14f4..6d9ba56d4f95 100644
>> >> > --- a/rust/kernel/mm/virt.rs
>> >> > +++ b/rust/kernel/mm/virt.rs
>> >> > @@ -6,7 +6,7 @@
>> >> >
>> >> > use crate::{
>> >> > bindings,
>> >> > - error::{to_result, Result},
>> >> > + error::{code::EINVAL, to_result, Result},
>> >> > mm::MmWithUser,
>> >> > page::Page,
>> >> > types::Opaque,
>> >> > @@ -171,6 +171,185 @@ pub fn vm_insert_page(&self, address: usize, page: &Page) -> Result {
>> >> > }
>> >> > }
>> >> >
>> >> > +/// A builder for setting up a vma in an `f_ops->mmap()` hook.
>> >>
>> >> Reading this line, I would expect to be able to chain update methods as
>> >> in `Builder::new().prop_a().prop_b().build()`. Could/should this type
>> >> accommodate a proper builder pattern? Or is "builder" not the right word
>> >> to use here?
>> >
>> > You cannot create values of this type yourself. Only the C
>> > infrastructure can do so.
>> >
>> > What would you call it if not "builder"?
>>
>> It looks more like a newtype with a bunch of setters and getters. It
>> also does not have a method to instantiate (`build()` or similar). So
>> how about newtype?
>
> I don't think newtype is helpful. Ultimately, the f_ops->mmap() hook
> is a *constructor* for a VMA, and the VmAreaNew type represents a VMA
> whose constructor is currently running. The "method to instantiate" is
> called "return".
>
> fn mmap(new_vma: &VmAreaNew) -> Result {
>     // VMAs for this driver must not be mapped as executable
>     new_vma.try_clear_mayexec()?;
>
>     // we are done constructing the vma, so return
>     Ok(())
> }
>
> Alice
Right. Let's update the docs for the `mmap` hook then:
+    /// Handle for mmap.
+    fn mmap(
+        _device: <Self::Ptr as ForeignOwnable>::Borrowed<'_>,
+        _file: &File,
+        _vma: &VmAreaNew,
+    ) -> Result {
+        kernel::build_error!(VTABLE_DEFAULT_ERROR)
+    }
+
```
Handle for mmap.
This function is called when a user space process invokes the `mmap`
system call on `_file`. The function is a callback that is part of the
VMA initializer. The kernel will do initial setup of the VMA before
calling this function. The function can then interact with the VMA
initialization by calling methods of `_vma`. If the function does not
return an error, the kernel will complete initialization of the VMA
according to the properties of `_vma`.
```
But I still do not think "builder" is the right term. `VmAreaNew` is
more like a configuration object to pass initialization properties of
the VMA back to the kernel.
How about "configuration object"?
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2024-12-11 10:37 ` [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap() Alice Ryhl
2024-12-16 13:41 ` Andreas Hindborg
@ 2024-12-17 9:31 ` Andreas Hindborg
2025-01-08 12:24 ` Alice Ryhl
2025-01-10 13:34 ` Alice Ryhl
2 siblings, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2024-12-17 9:31 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> This type will be used when setting up a new vma in an f_ops->mmap()
> hook. Using a separate type from VmAreaRef allows us to have a separate
> set of operations that you are only able to use during the mmap() hook.
> For example, the VM_MIXEDMAP flag must not be changed after the initial
> setup that happens during the f_ops->mmap() hook.
>
> To avoid setting invalid flag values, the methods for clearing
> VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
> if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
> the return value results in a compilation error because the `Result`
> type is marked #[must_use].
>
> For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
> we add a VM_PFNMAP method, we will need some way to prevent you from
> setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> Reviewed-by: Jann Horn <jannh@google.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/kernel/mm/virt.rs | 181 ++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 180 insertions(+), 1 deletion(-)
>
> diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
> index 3a23854e14f4..6d9ba56d4f95 100644
> --- a/rust/kernel/mm/virt.rs
> +++ b/rust/kernel/mm/virt.rs
[cut]
> +    /// Returns whether `VM_READ` is set.
> +    ///
> +    /// This flag indicates whether userspace is mapping this vma as readable.
> +    #[inline]
> +    pub fn get_read(&self) -> bool {
> +        (self.flags() & flags::READ) != 0
> +    }
As an afterthought, should we name these getters according to RFC344 [1]
(remove get_ prefix)?
Best regards,
Andreas Hindborg
[1] https://github.com/rust-lang/rfcs/blob/master/text/0344-conventions-galore.md#gettersetter-apis
^ permalink raw reply [flat|nested] 65+ messages in thread* Re: [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2024-12-17 9:31 ` Andreas Hindborg
@ 2025-01-08 12:24 ` Alice Ryhl
2025-01-09 8:23 ` Andreas Hindborg
0 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2025-01-08 12:24 UTC (permalink / raw)
To: Andreas Hindborg
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Tue, Dec 17, 2024 at 10:31 AM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> "Alice Ryhl" <aliceryhl@google.com> writes:
>
> > This type will be used when setting up a new vma in an f_ops->mmap()
> > hook. Using a separate type from VmAreaRef allows us to have a separate
> > set of operations that you are only able to use during the mmap() hook.
> > For example, the VM_MIXEDMAP flag must not be changed after the initial
> > setup that happens during the f_ops->mmap() hook.
> >
> > To avoid setting invalid flag values, the methods for clearing
> > VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
> > if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
> > the return value results in a compilation error because the `Result`
> > type is marked #[must_use].
> >
> > For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
> > we add a VM_PFNMAP method, we will need some way to prevent you from
> > setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
> >
> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > Reviewed-by: Jann Horn <jannh@google.com>
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > ---
> > rust/kernel/mm/virt.rs | 181 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 180 insertions(+), 1 deletion(-)
> >
> > diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
> > index 3a23854e14f4..6d9ba56d4f95 100644
> > --- a/rust/kernel/mm/virt.rs
> > +++ b/rust/kernel/mm/virt.rs
>
> [cut]
>
> > + /// Returns whether `VM_READ` is set.
> > + ///
> > + /// This flag indicates whether userspace is mapping this vma as readable.
> > + #[inline]
> > + pub fn get_read(&self) -> bool {
> > + (self.flags() & flags::READ) != 0
> > + }
>
> As an afterthought, should we name these getters according to RFC344 [1]
> (remove get_ prefix)?
Well, perhaps is_readable?
Alice
* Re: [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2025-01-08 12:24 ` Alice Ryhl
@ 2025-01-09 8:23 ` Andreas Hindborg
2025-01-13 10:18 ` Alice Ryhl
0 siblings, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2025-01-09 8:23 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> On Tue, Dec 17, 2024 at 10:31 AM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>>
>> > This type will be used when setting up a new vma in an f_ops->mmap()
>> > hook. Using a separate type from VmAreaRef allows us to have a separate
>> > set of operations that you are only able to use during the mmap() hook.
>> > For example, the VM_MIXEDMAP flag must not be changed after the initial
>> > setup that happens during the f_ops->mmap() hook.
>> >
>> > To avoid setting invalid flag values, the methods for clearing
>> > VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
>> > if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
>> > the return value results in a compilation error because the `Result`
>> > type is marked #[must_use].
>> >
>> > For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
>> > we add a VM_PFNMAP method, we will need some way to prevent you from
>> > setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
>> >
>> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
>> > Reviewed-by: Jann Horn <jannh@google.com>
>> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>> > ---
>> > rust/kernel/mm/virt.rs | 181 ++++++++++++++++++++++++++++++++++++++++++++++++-
>> > 1 file changed, 180 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
>> > index 3a23854e14f4..6d9ba56d4f95 100644
>> > --- a/rust/kernel/mm/virt.rs
>> > +++ b/rust/kernel/mm/virt.rs
>>
>> [cut]
>>
>> > + /// Returns whether `VM_READ` is set.
>> > + ///
>> > + /// This flag indicates whether userspace is mapping this vma as readable.
>> > + #[inline]
>> > + pub fn get_read(&self) -> bool {
>> > + (self.flags() & flags::READ) != 0
>> > + }
>>
>> As an afterthought, should we name these getters according to RFC344 [1]
>> (remove get_ prefix)?
>
> Well, perhaps is_readable?
Why not just `readable() -> bool`? That would match the guidelines.
Best regards,
Andreas Hindborg
* Re: [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2025-01-09 8:23 ` Andreas Hindborg
@ 2025-01-13 10:18 ` Alice Ryhl
0 siblings, 0 replies; 65+ messages in thread
From: Alice Ryhl @ 2025-01-13 10:18 UTC (permalink / raw)
To: Andreas Hindborg
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Thu, Jan 9, 2025 at 9:23 AM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> "Alice Ryhl" <aliceryhl@google.com> writes:
>
> > On Tue, Dec 17, 2024 at 10:31 AM Andreas Hindborg <a.hindborg@kernel.org> wrote:
> >>
> >> "Alice Ryhl" <aliceryhl@google.com> writes:
> >>
> >> > This type will be used when setting up a new vma in an f_ops->mmap()
> >> > hook. Using a separate type from VmAreaRef allows us to have a separate
> >> > set of operations that you are only able to use during the mmap() hook.
> >> > For example, the VM_MIXEDMAP flag must not be changed after the initial
> >> > setup that happens during the f_ops->mmap() hook.
> >> >
> >> > To avoid setting invalid flag values, the methods for clearing
> >> > VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
> >> > if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
> >> > the return value results in a compilation error because the `Result`
> >> > type is marked #[must_use].
> >> >
> >> > For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
> >> > we add a VM_PFNMAP method, we will need some way to prevent you from
> >> > setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
> >> >
> >> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> >> > Reviewed-by: Jann Horn <jannh@google.com>
> >> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> >> > ---
> >> > rust/kernel/mm/virt.rs | 181 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >> > 1 file changed, 180 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
> >> > index 3a23854e14f4..6d9ba56d4f95 100644
> >> > --- a/rust/kernel/mm/virt.rs
> >> > +++ b/rust/kernel/mm/virt.rs
> >>
> >> [cut]
> >>
> >> > + /// Returns whether `VM_READ` is set.
> >> > + ///
> >> > + /// This flag indicates whether userspace is mapping this vma as readable.
> >> > + #[inline]
> >> > + pub fn get_read(&self) -> bool {
> >> > + (self.flags() & flags::READ) != 0
> >> > + }
> >>
> >> As an afterthought, should we name these getters according to RFC344 [1]
> >> (remove get_ prefix)?
> >
> > Well, perhaps is_readable?
>
> Why not just `readable() -> bool`? That would match the guidelines.
I guess that could work.
Alice
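[Editorial note: a standalone sketch of the naming outcome agreed above, i.e. an RFC 344-style getter named `readable()` rather than `get_read()`. The flag constant and struct below are hypothetical stand-ins.]

```rust
// Hypothetical stand-in for the VM_READ flag bit.
const VM_READ: u64 = 1 << 0;

struct VmAreaRef {
    flags: u64,
}

impl VmAreaRef {
    /// Returns whether `VM_READ` is set.
    ///
    /// Named per Rust RFC 344: getters drop the `get_` prefix.
    #[inline]
    fn readable(&self) -> bool {
        (self.flags & VM_READ) != 0
    }
}

fn main() {
    assert!(VmAreaRef { flags: VM_READ }.readable());
    assert!(!VmAreaRef { flags: 0 }.readable());
}
```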
* Re: [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2024-12-11 10:37 ` [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap() Alice Ryhl
2024-12-16 13:41 ` Andreas Hindborg
2024-12-17 9:31 ` Andreas Hindborg
@ 2025-01-10 13:34 ` Alice Ryhl
2025-01-10 16:09 ` Lorenzo Stoakes
2 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2025-01-10 13:34 UTC (permalink / raw)
To: Lorenzo Stoakes, Frederick Mayle
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, linux-kernel,
linux-mm, rust-for-linux, Miguel Ojeda, Matthew Wilcox,
Vlastimil Babka, John Hubbard, Liam R. Howlett, Andrew Morton,
Greg Kroah-Hartman, Arnd Bergmann, Christian Brauner, Jann Horn,
Suren Baghdasaryan
On Wed, Dec 11, 2024 at 11:37 AM Alice Ryhl <aliceryhl@google.com> wrote:
>
> This type will be used when setting up a new vma in an f_ops->mmap()
> hook. Using a separate type from VmAreaRef allows us to have a separate
> set of operations that you are only able to use during the mmap() hook.
> For example, the VM_MIXEDMAP flag must not be changed after the initial
> setup that happens during the f_ops->mmap() hook.
>
> To avoid setting invalid flag values, the methods for clearing
> VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
> if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
> the return value results in a compilation error because the `Result`
> type is marked #[must_use].
>
> For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
> we add a VM_PFNMAP method, we will need some way to prevent you from
> setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> Reviewed-by: Jann Horn <jannh@google.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
[...]
> +impl VmAreaNew {
> + /// Access a virtual memory area given a raw pointer.
> + ///
> + /// # Safety
> + ///
> + /// Callers must ensure that `vma` is undergoing initial vma setup for the duration of 'a.
> + #[inline]
> + pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
> + // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
> + unsafe { &*vma.cast() }
> + }
It was suggested at https://r.android.com/3389887 that this should
take a mutable raw pointer for better intent. That's fine with me
(Rust doesn't care). Lorenzo, what do you think?
Alice
* Re: [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
2025-01-10 13:34 ` Alice Ryhl
@ 2025-01-10 16:09 ` Lorenzo Stoakes
0 siblings, 0 replies; 65+ messages in thread
From: Lorenzo Stoakes @ 2025-01-10 16:09 UTC (permalink / raw)
To: Alice Ryhl
Cc: Frederick Mayle, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, linux-kernel, linux-mm, rust-for-linux,
Miguel Ojeda, Matthew Wilcox, Vlastimil Babka, John Hubbard,
Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
On Fri, Jan 10, 2025 at 02:34:48PM +0100, Alice Ryhl wrote:
> On Wed, Dec 11, 2024 at 11:37 AM Alice Ryhl <aliceryhl@google.com> wrote:
> >
> > This type will be used when setting up a new vma in an f_ops->mmap()
> > hook. Using a separate type from VmAreaRef allows us to have a separate
> > set of operations that you are only able to use during the mmap() hook.
> > For example, the VM_MIXEDMAP flag must not be changed after the initial
> > setup that happens during the f_ops->mmap() hook.
> >
> > To avoid setting invalid flag values, the methods for clearing
> > VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error
> > if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking
> > the return value results in a compilation error because the `Result`
> > type is marked #[must_use].
> >
> > For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When
> > we add a VM_PFNMAP method, we will need some way to prevent you from
> > setting both VM_MIXEDMAP and VM_PFNMAP on the same vma.
> >
> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > Reviewed-by: Jann Horn <jannh@google.com>
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>
> [...]
>
> > +impl VmAreaNew {
> > + /// Access a virtual memory area given a raw pointer.
> > + ///
> > + /// # Safety
> > + ///
> > + /// Callers must ensure that `vma` is undergoing initial vma setup for the duration of 'a.
> > + #[inline]
> > + pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
> > + // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
> > + unsafe { &*vma.cast() }
> > + }
>
> It was suggested at https://r.android.com/3389887 that this should
> take a mutable raw pointer for better intent. That's fine with me
> (Rust doesn't care). Lorenzo, what do you think?
Yeah sounds reasonable, in C it's mutable right up until... well ok it
never stops being that :P
>
> Alice
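[Editorial note: a standalone sketch of the `from_raw` pattern under discussion, written with a `*mut` pointer to signal intent as suggested. `repr(transparent)` is what makes the pointer cast sound; the struct and field names are illustrative stand-ins.]

```rust
#[allow(non_camel_case_types)]
#[repr(C)]
struct vm_area_struct {
    vm_flags: u64,
}

#[repr(transparent)]
struct VmAreaNew(vm_area_struct);

impl VmAreaNew {
    /// # Safety
    ///
    /// `vma` must point to a vma undergoing initial setup for the duration of 'a.
    unsafe fn from_raw<'a>(vma: *mut vm_area_struct) -> &'a Self {
        // SAFETY: repr(transparent) guarantees that VmAreaNew has the
        // same layout as vm_area_struct, so the cast is valid.
        unsafe { &*vma.cast() }
    }

    fn flags(&self) -> u64 {
        self.0.vm_flags
    }
}

fn main() {
    let mut raw = vm_area_struct { vm_flags: 0b101 };
    // SAFETY: `raw` is a valid, exclusively owned vma stand-in for this scope.
    let vma = unsafe { VmAreaNew::from_raw(&mut raw) };
    assert_eq!(vma.flags(), 0b101);
}
```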
* [PATCH v11 7/8] rust: miscdevice: add mmap support
2024-12-11 10:37 ` [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
` (5 preceding siblings ...)
2024-12-11 10:37 ` [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap() Alice Ryhl
@ 2024-12-11 10:37 ` Alice Ryhl
2024-12-16 13:53 ` Andreas Hindborg
2024-12-11 10:37 ` [PATCH v11 8/8] task: rust: rework how current is accessed Alice Ryhl
` (2 subsequent siblings)
9 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2024-12-11 10:37 UTC (permalink / raw)
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, linux-kernel,
linux-mm, rust-for-linux, Alice Ryhl
Add the ability to write a file_operations->mmap hook in Rust when using
the miscdevice abstraction. The `vma` argument to the `mmap` hook uses
the `VmAreaNew` type from the previous commit; this type provides the
correct set of operations for a file_operations->mmap hook.
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/kernel/miscdevice.rs | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs
index 7e2a79b3ae26..e5366f9c6d7f 100644
--- a/rust/kernel/miscdevice.rs
+++ b/rust/kernel/miscdevice.rs
@@ -11,6 +11,8 @@
use crate::{
bindings,
error::{to_result, Error, Result, VTABLE_DEFAULT_ERROR},
+ fs::File,
+ mm::virt::VmAreaNew,
prelude::*,
str::CStr,
types::{ForeignOwnable, Opaque},
@@ -110,6 +112,15 @@ fn release(device: Self::Ptr) {
drop(device);
}
+ /// Handler for mmap.
+ fn mmap(
+ _device: <Self::Ptr as ForeignOwnable>::Borrowed<'_>,
+ _file: &File,
+ _vma: &VmAreaNew,
+ ) -> Result {
+ kernel::build_error!(VTABLE_DEFAULT_ERROR)
+ }
+
/// Handler for ioctls.
///
> /// The `cmd` argument is usually manipulated using the utilities in [`kernel::ioctl`].
@@ -156,6 +167,7 @@ impl<T: MiscDevice> VtableHelper<T> {
const VTABLE: bindings::file_operations = bindings::file_operations {
open: Some(fops_open::<T>),
release: Some(fops_release::<T>),
+ mmap: maybe_fn(T::HAS_MMAP, fops_mmap::<T>),
unlocked_ioctl: maybe_fn(T::HAS_IOCTL, fops_ioctl::<T>),
#[cfg(CONFIG_COMPAT)]
compat_ioctl: if T::HAS_COMPAT_IOCTL {
@@ -216,6 +228,31 @@ impl<T: MiscDevice> VtableHelper<T> {
0
}
+/// # Safety
+///
+/// `file` must be a valid file that is associated with a `MiscDeviceRegistration<T>`.
+/// `vma` must be a vma that is currently being mmap'ed with this file.
+unsafe extern "C" fn fops_mmap<T: MiscDevice>(
+ file: *mut bindings::file,
+ vma: *mut bindings::vm_area_struct,
+) -> c_int {
+ // SAFETY: The mmap call of a file can access the private data.
+ let private = unsafe { (*file).private_data };
+ // SAFETY: Mmap calls can borrow the private data of the file.
+ let device = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) };
+ // SAFETY: The caller provides a vma that is undergoing initial VMA setup.
+ let area = unsafe { VmAreaNew::from_raw(vma) };
+ // SAFETY:
+ // * The file is valid for the duration of this call.
+ // * There is no active fdget_pos region on the file on this thread.
+ let file = unsafe { File::from_raw_file(file) };
+
+ match T::mmap(device, file, area) {
+ Ok(()) => 0,
+ Err(err) => err.to_errno() as c_int,
+ }
+}
+
/// # Safety
///
/// `file` must be a valid file that is associated with a `MiscDeviceRegistration<T>`.
--
2.47.1.613.gc27f4b7a9f-goog
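[Editorial note: a minimal userspace sketch of the Result-to-errno translation that `fops_mmap` performs at the C vtable boundary in the patch above. The `Error` type, errno value, and function signatures are illustrative stand-ins for the kernel's types.]

```rust
// Stand-in error type carrying a negative errno value.
struct Error(i32);

impl Error {
    fn to_errno(&self) -> i32 {
        self.0
    }
}

// Stand-in for the Rust-side mmap handler that returns a Result.
fn mmap_impl(ok: bool) -> Result<(), Error> {
    if ok {
        Ok(())
    } else {
        Err(Error(-22)) // -EINVAL as an example
    }
}

// The extern "C" shim translates the Rust Result into the 0 / -errno
// convention that a C file_operations vtable expects.
extern "C" fn fops_mmap(ok: bool) -> i32 {
    match mmap_impl(ok) {
        Ok(()) => 0,
        Err(err) => err.to_errno(),
    }
}

fn main() {
    assert_eq!(fops_mmap(true), 0);
    assert_eq!(fops_mmap(false), -22);
}
```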
* Re: [PATCH v11 7/8] rust: miscdevice: add mmap support
2024-12-11 10:37 ` [PATCH v11 7/8] rust: miscdevice: add mmap support Alice Ryhl
@ 2024-12-16 13:53 ` Andreas Hindborg
0 siblings, 0 replies; 65+ messages in thread
From: Andreas Hindborg @ 2024-12-16 13:53 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> Add the ability to write a file_operations->mmap hook in Rust when using
> the miscdevice abstraction. The `vma` argument to the `mmap` hook uses
> the `VmAreaNew` type from the previous commit; this type provides the
> correct set of operations for a file_operations->mmap hook.
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/kernel/miscdevice.rs | 37 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 37 insertions(+)
>
> diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs
> index 7e2a79b3ae26..e5366f9c6d7f 100644
> --- a/rust/kernel/miscdevice.rs
> +++ b/rust/kernel/miscdevice.rs
> @@ -11,6 +11,8 @@
> use crate::{
> bindings,
> error::{to_result, Error, Result, VTABLE_DEFAULT_ERROR},
> + fs::File,
> + mm::virt::VmAreaNew,
> prelude::*,
> str::CStr,
> types::{ForeignOwnable, Opaque},
> @@ -110,6 +112,15 @@ fn release(device: Self::Ptr) {
> drop(device);
> }
>
> + /// Handler for mmap.
> + fn mmap(
> + _device: <Self::Ptr as ForeignOwnable>::Borrowed<'_>,
> + _file: &File,
> + _vma: &VmAreaNew,
> + ) -> Result {
> + kernel::build_error!(VTABLE_DEFAULT_ERROR)
> + }
> +
> /// Handler for ioctls.
> ///
> /// The `cmd` argument is usually manipulated using the utilities in [`kernel::ioctl`].
> @@ -156,6 +167,7 @@ impl<T: MiscDevice> VtableHelper<T> {
> const VTABLE: bindings::file_operations = bindings::file_operations {
> open: Some(fops_open::<T>),
> release: Some(fops_release::<T>),
> + mmap: maybe_fn(T::HAS_MMAP, fops_mmap::<T>),
> unlocked_ioctl: maybe_fn(T::HAS_IOCTL, fops_ioctl::<T>),
> #[cfg(CONFIG_COMPAT)]
> compat_ioctl: if T::HAS_COMPAT_IOCTL {
> @@ -216,6 +228,31 @@ impl<T: MiscDevice> VtableHelper<T> {
> 0
> }
>
> +/// # Safety
> +///
> +/// `file` must be a valid file that is associated with a `MiscDeviceRegistration<T>`.
> +/// `vma` must be a vma that is currently being mmap'ed with this file.
> +unsafe extern "C" fn fops_mmap<T: MiscDevice>(
> + file: *mut bindings::file,
> + vma: *mut bindings::vm_area_struct,
> +) -> c_int {
> + // SAFETY: The mmap call of a file can access the private data.
> + let private = unsafe { (*file).private_data };
> + // SAFETY: Mmap calls can borrow the private data of the file.
This safety comment seems unrelated to the safety requirements of `ForeignOwnable::borrow`.
Best regards,
Andreas Hindborg
* [PATCH v11 8/8] task: rust: rework how current is accessed
2024-12-11 10:37 ` [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
` (6 preceding siblings ...)
2024-12-11 10:37 ` [PATCH v11 7/8] rust: miscdevice: add mmap support Alice Ryhl
@ 2024-12-11 10:37 ` Alice Ryhl
2024-12-16 14:47 ` Andreas Hindborg
2024-12-16 23:40 ` Boqun Feng
2024-12-11 10:47 ` [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
2024-12-16 11:04 ` Andreas Hindborg
9 siblings, 2 replies; 65+ messages in thread
From: Alice Ryhl @ 2024-12-11 10:37 UTC (permalink / raw)
To: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan
Cc: Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, linux-kernel,
linux-mm, rust-for-linux, Alice Ryhl
Introduce a new type called `CurrentTask` that lets you perform various
operations that are only safe on the `current` task. Use the new type to
provide a way to access the current mm without incrementing its
refcount.
With this change, you can write stuff such as
let vma = current!().mm().lock_vma_under_rcu(addr);
without incrementing any refcounts.
This replaces the existing abstractions for accessing the current pid
namespace. With the old approach, every field access to current involves
both a macro and an unsafe helper function. The new approach simplifies
that to a single safe function on the `CurrentTask` type. This makes it
less heavy-weight to add additional current accessors in the future.
That said, creating a `CurrentTask` type like the one in this patch
requires that we are careful to ensure that it cannot escape the current
task or otherwise access things after they are freed. To do this, I
declared that it cannot escape the current "task context" where I
defined a "task context" as essentially the region in which `current`
remains unchanged. So e.g., release_task() or begin_new_exec() would
leave the task context.
If a userspace thread returns to userspace and later makes another
syscall, then I consider the two syscalls to be different task contexts.
This allows values stored in that task to be modified between syscalls,
even if they're guaranteed to be immutable during a syscall.
Ensuring correctness of `CurrentTask` is slightly tricky if we also want
the ability to have a safe `kthread_use_mm()` implementation in Rust. To
support that safely, there are two patterns we need to ensure are safe:
// Case 1: current!() called inside the scope.
let mm;
kthread_use_mm(some_mm, || {
mm = current!().mm();
});
drop(some_mm);
mm.do_something(); // UAF
and:
// Case 2: current!() called before the scope.
let mm;
let task = current!();
kthread_use_mm(some_mm, || {
mm = task.mm();
});
drop(some_mm);
mm.do_something(); // UAF
The existing `current!()` abstraction already natively prevents the
first case: The `&CurrentTask` would be tied to the inner scope, so the
borrow-checker ensures that no reference derived from it can escape the
scope.
Fixing the second case is a bit more tricky. The solution is to
essentially pretend that the contents of the scope execute on a
different thread, which means that only thread-safe types can cross the
boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to
move it to another thread will fail, and this includes our fake pretend
thread boundary.
This has the disadvantage that other types that aren't thread-safe for
reasons unrelated to `current` also cannot be moved across the
`kthread_use_mm()` boundary. I consider this an acceptable tradeoff.
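[Editorial note: a standalone userspace illustration of the fake thread boundary described above. A `PhantomData<*mut ()>` field makes a type `!Send`, so a scope function whose closure must be `Send` rejects any attempt to move a `CurrentTask` across it. All names below are stand-ins, not the kernel's actual API.]

```rust
use std::marker::PhantomData;

// A zero-sized field that makes the containing type !Send and !Sync.
type NotThreadSafe = PhantomData<*mut ()>;

#[allow(dead_code)]
struct CurrentTask(NotThreadSafe);

// Stand-in for a kthread_use_mm()-style scope: requiring F: Send means
// the closure cannot capture a CurrentTask, in either direction.
fn use_mm_scope<T, F: FnOnce() -> T + Send>(f: F) -> T {
    f()
}

fn main() {
    let _task = CurrentTask(PhantomData);

    // Compiles: the closure captures nothing that is !Send.
    assert_eq!(use_mm_scope(|| 2 + 2), 4);

    // The following would fail to compile, as intended, because the
    // closure capturing `_task` is !Send:
    // use_mm_scope(move || { let _ = &_task; });
}
```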
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/kernel/mm.rs | 22 ----
rust/kernel/task.rs | 284 ++++++++++++++++++++++++++++++----------------------
2 files changed, 167 insertions(+), 139 deletions(-)
diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
index 50f4861ae4b9..f7d1079391ef 100644
--- a/rust/kernel/mm.rs
+++ b/rust/kernel/mm.rs
@@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
// These methods are safe to call even if `mm_users` is zero.
impl Mm {
- /// Call `mmgrab` on `current.mm`.
- #[inline]
- pub fn mmgrab_current() -> Option<ARef<Mm>> {
- // SAFETY: It's safe to get the `mm` field from current.
- let mm = unsafe {
- let current = bindings::get_current();
- (*current).mm
- };
-
- if mm.is_null() {
- return None;
- }
-
- // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
- // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
- // duration of this function, and `current->mm` will stay valid for that long.
- let mm = unsafe { Mm::from_raw(mm) };
-
- // This increments the refcount using `mmgrab`.
- Some(ARef::from(mm))
- }
-
/// Returns a raw pointer to the inner `mm_struct`.
#[inline]
pub fn as_raw(&self) -> *mut bindings::mm_struct {
diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
index 07bc22a7645c..8c1ee46c03eb 100644
--- a/rust/kernel/task.rs
+++ b/rust/kernel/task.rs
@@ -7,6 +7,7 @@
use crate::{
bindings,
ffi::{c_int, c_long, c_uint},
+ mm::MmWithUser,
pid_namespace::PidNamespace,
types::{ARef, NotThreadSafe, Opaque},
};
@@ -31,22 +32,20 @@
#[macro_export]
macro_rules! current {
() => {
- // SAFETY: Deref + addr-of below create a temporary `TaskRef` that cannot outlive the
- // caller.
+ // SAFETY: This expression creates a temporary value that is dropped at the end of the
+ // caller's scope. The following mechanisms ensure that the resulting `&CurrentTask` cannot
+ // leave current task context:
+ //
+ // * To return to userspace, the caller must leave the current scope.
+ // * Operations such as `begin_new_exec()` are necessarily unsafe and the caller of
+ // `begin_new_exec()` is responsible for safety.
+ // * Rust abstractions for things such as a `kthread_use_mm()` scope must require the
+ // closure to be `Send`, so the `NotThreadSafe` field of `CurrentTask` ensures that the
+ // `&CurrentTask` cannot cross the scope in either direction.
unsafe { &*$crate::task::Task::current() }
};
}
-/// Returns the currently running task's pid namespace.
-#[macro_export]
-macro_rules! current_pid_ns {
- () => {
- // SAFETY: Deref + addr-of below create a temporary `PidNamespaceRef` that cannot outlive
- // the caller.
- unsafe { &*$crate::task::Task::current_pid_ns() }
- };
-}
-
/// Wraps the kernel's `struct task_struct`.
///
/// # Invariants
@@ -105,6 +104,44 @@ unsafe impl Send for Task {}
// synchronised by C code (e.g., `signal_pending`).
unsafe impl Sync for Task {}
+/// Represents the [`Task`] in the `current` global.
+///
+/// This type exists to provide more efficient operations that are only valid on the current task.
+/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
+/// the current task.
+///
+/// # Invariants
+///
+/// Each value of this type must only be accessed from the task context it was created within.
+///
+/// Of course, every thread is in a different task context, but for the purposes of this invariant,
+/// these operations also permanently leave the task context:
+///
+/// * Returning to userspace from system call context.
+/// * Calling `release_task()`.
+/// * Calling `begin_new_exec()` in a binary format loader.
+///
+/// Other operations temporarily create a new sub-context:
+///
+/// * Calling `kthread_use_mm()` creates a new context, and `kthread_unuse_mm()` returns to the
+/// old context.
+///
+/// This means that a `CurrentTask` obtained before a `kthread_use_mm()` call may be used again
+/// once `kthread_unuse_mm()` is called, but it must not be used between these two calls.
+/// Conversely, a `CurrentTask` obtained between a `kthread_use_mm()`/`kthread_unuse_mm()` pair
+/// must not be used after `kthread_unuse_mm()`.
+#[repr(transparent)]
+pub struct CurrentTask(Task, NotThreadSafe);
+
+// Make all `Task` methods available on `CurrentTask`.
+impl Deref for CurrentTask {
+ type Target = Task;
+ #[inline]
+ fn deref(&self) -> &Task {
+ &self.0
+ }
+}
+
/// The type of process identifiers (PIDs).
type Pid = bindings::pid_t;
@@ -131,119 +168,29 @@ pub fn current_raw() -> *mut bindings::task_struct {
///
/// # Safety
///
- /// Callers must ensure that the returned object doesn't outlive the current task/thread.
- pub unsafe fn current() -> impl Deref<Target = Task> {
- struct TaskRef<'a> {
- task: &'a Task,
- _not_send: NotThreadSafe,
+ /// Callers must ensure that the returned object is only used to access a [`CurrentTask`]
+ /// within the task context that was active when this function was called. For more details,
+ /// see the invariants section for [`CurrentTask`].
+ pub unsafe fn current() -> impl Deref<Target = CurrentTask> {
+ struct TaskRef {
+ task: *const CurrentTask,
}
- impl Deref for TaskRef<'_> {
- type Target = Task;
+ impl Deref for TaskRef {
+ type Target = CurrentTask;
fn deref(&self) -> &Self::Target {
- self.task
+ // SAFETY: The returned reference borrows from this `TaskRef`, so it cannot outlive
+ // the `TaskRef`, which the caller of `Task::current()` has promised will not
+ // outlive the task/thread for which `self.task` is the `current` pointer. Thus, it
+ // is okay to return a `CurrentTask` reference here.
+ unsafe { &*self.task }
}
}
- let current = Task::current_raw();
TaskRef {
- // SAFETY: If the current thread is still running, the current task is valid. Given
- // that `TaskRef` is not `Send`, we know it cannot be transferred to another thread
- // (where it could potentially outlive the caller).
- task: unsafe { &*current.cast() },
- _not_send: NotThreadSafe,
- }
- }
-
- /// Returns a PidNamespace reference for the currently executing task's/thread's pid namespace.
- ///
- /// This function can be used to create an unbounded lifetime by e.g., storing the returned
- /// PidNamespace in a global variable which would be a bug. So the recommended way to get the
- /// current task's/thread's pid namespace is to use the [`current_pid_ns`] macro because it is
- /// safe.
- ///
- /// # Safety
- ///
- /// Callers must ensure that the returned object doesn't outlive the current task/thread.
- pub unsafe fn current_pid_ns() -> impl Deref<Target = PidNamespace> {
- struct PidNamespaceRef<'a> {
- task: &'a PidNamespace,
- _not_send: NotThreadSafe,
- }
-
- impl Deref for PidNamespaceRef<'_> {
- type Target = PidNamespace;
-
- fn deref(&self) -> &Self::Target {
- self.task
- }
- }
-
- // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
- //
- // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
- // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
- // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
- // created by the calling `Task`. This invariant guarantees that after having acquired a
- // reference to a `Task`'s pid namespace it will remain unchanged.
- //
- // When a task has exited and been reaped `release_task()` will be called. This will set
- // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
- // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
- // referencing count to
- // the `Task` will prevent `release_task()` being called.
- //
- // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
- // can be used. There are two cases to consider:
- //
- // (1) retrieving the `PidNamespace` of the `current` task
- // (2) retrieving the `PidNamespace` of a non-`current` task
- //
- // From system call context retrieving the `PidNamespace` for case (1) is always safe and
- // requires neither RCU locking nor a reference count to be held. Retrieving the
- // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
- // like that is exposed to Rust.
- //
- // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
- // Accessing `PidNamespace` outside of RCU protection requires a reference count that
- // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
- // task means `NULL` can be returned as the non-`current` task could have already passed
- // through `release_task()`.
- //
- // To retrieve (1) the `current_pid_ns!()` macro should be used which ensure that the
- // returned `PidNamespace` cannot outlive the calling scope. The associated
- // `current_pid_ns()` function should not be called directly as it could be abused to
- // created an unbounded lifetime for `PidNamespace`. The `current_pid_ns!()` macro allows
- // Rust to handle the common case of accessing `current`'s `PidNamespace` without RCU
- // protection and without having to acquire a reference count.
- //
- // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
- // reference on `PidNamespace` and will return an `Option` to force the caller to
- // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
- // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
- // difficult to perform operations that are otherwise safe without holding a reference
- // count as long as RCU protection is guaranteed. But it is not important currently. But we
- // do want it in the future.
- //
- // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
- // synchronizes against putting the last reference of the associated `struct pid` of
- // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
- // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
- // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
- // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
- // from `task->thread_pid` to finish.
- //
- // SAFETY: The current task's pid namespace is valid as long as the current task is running.
- let pidns = unsafe { bindings::task_active_pid_ns(Task::current_raw()) };
- PidNamespaceRef {
- // SAFETY: If the current thread is still running, the current task and its associated
- // pid namespace are valid. `PidNamespaceRef` is not `Send`, so we know it cannot be
- // transferred to another thread (where it could potentially outlive the current
- // `Task`). The caller needs to ensure that the PidNamespaceRef doesn't outlive the
- // current task/thread.
- task: unsafe { PidNamespace::from_ptr(pidns) },
- _not_send: NotThreadSafe,
+ // CAST: The layout of `struct task_struct` and `CurrentTask` is identical.
+ task: Task::current_raw().cast(),
}
}
@@ -326,6 +273,109 @@ pub fn wake_up(&self) {
}
}
+impl CurrentTask {
+ /// Access the address space of the current task.
+ ///
+ /// This function does not touch the refcount of the mm.
+ #[inline]
+ pub fn mm(&self) -> Option<&MmWithUser> {
+ // SAFETY: The `mm` field of `current` is not modified from other threads, so reading it is
+ // not a data race.
+ let mm = unsafe { (*self.as_ptr()).mm };
+
+ if mm.is_null() {
+ return None;
+ }
+
+ // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
+ // value of `mm_users`. Furthermore, the returned `&MmWithUser` borrows from this
+ // `CurrentTask`, so it cannot escape the scope in which the current pointer was obtained.
+ //
+ // This is safe even if `kthread_use_mm()`/`kthread_unuse_mm()` are used. There are two
+ // relevant cases:
+ // * If the `&CurrentTask` was created before `kthread_use_mm()`, then it cannot be
+ // accessed during the `kthread_use_mm()`/`kthread_unuse_mm()` scope due to the
+ // `NotThreadSafe` field of `CurrentTask`.
+ // * If the `&CurrentTask` was created within a `kthread_use_mm()`/`kthread_unuse_mm()`
+ // scope, then the `&CurrentTask` cannot escape that scope, so the returned `&MmWithUser`
+ // also cannot escape that scope.
+ // In either case, it's not possible to read `current->mm` and keep using it after the
+ // scope is ended with `kthread_unuse_mm()`.
+ Some(unsafe { MmWithUser::from_raw(mm) })
+ }
+
+ /// Access the pid namespace of the current task.
+ ///
+ /// This function does not touch the refcount of the namespace or use RCU protection.
+ #[doc(alias = "task_active_pid_ns")]
+ #[inline]
+ pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
+ // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
+ // on the current task.
+ let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
+
+ if active_ns.is_null() {
+ return None;
+ }
+
+ // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
+ //
+ // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
+ // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
+ // on the calling `Task`'s pid namespace. It will only affect the pid namespace of children
+ // created by the calling `Task`. This invariant guarantees that after having acquired a
+ // reference to a `Task`'s pid namespace it will remain unchanged.
+ //
+ // When a task has exited and been reaped `release_task()` will be called. This will set
+ // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
+ // that is dead will return `NULL`. Note that neither holding the RCU lock nor holding a
+ // reference count to the `Task` will prevent `release_task()` being called.
+ //
+ // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
+ // can be used. There are two cases to consider:
+ //
+ // (1) retrieving the `PidNamespace` of the `current` task
+ // (2) retrieving the `PidNamespace` of a non-`current` task
+ //
+ // From system call context retrieving the `PidNamespace` for case (1) is always safe and
+ // requires neither RCU locking nor a reference count to be held. Retrieving the
+ // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
+ // like that is exposed to Rust.
+ //
+ // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
+ // Accessing `PidNamespace` outside of RCU protection requires a reference count that
+ // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
+ // task means `NULL` can be returned as the non-`current` task could have already passed
+ // through `release_task()`.
+ //
+ // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
+ // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
+ // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
+ // without RCU protection and without having to acquire a reference count.
+ //
+ // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
+ // reference on `PidNamespace` and will return an `Option` to force the caller to
+ // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
+ // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
+ // difficult to perform operations that are otherwise safe without holding a reference
+ // count as long as RCU protection is guaranteed. This is not important currently, but we
+ // do want it in the future.
+ //
+ // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
+ // synchronizes against putting the last reference of the associated `struct pid` of
+ // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
+ // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
+ // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
+ // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
+ // from `task->thread_pid` to finish.
+ //
+ // SAFETY: If `current`'s pid ns is non-null, then it references a valid pid ns.
+ // Furthermore, the returned `&PidNamespace` borrows from this `CurrentTask`, so it cannot
+ // escape the scope in which the current pointer was obtained.
+ Some(unsafe { PidNamespace::from_ptr(active_ns) })
+ }
+}
+
// SAFETY: The type invariants guarantee that `Task` is always refcounted.
unsafe impl crate::types::AlwaysRefCounted for Task {
fn inc_ref(&self) {
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v11 8/8] task: rust: rework how current is accessed
2024-12-11 10:37 ` [PATCH v11 8/8] task: rust: rework how current is accessed Alice Ryhl
@ 2024-12-16 14:47 ` Andreas Hindborg
2025-01-08 12:32 ` Alice Ryhl
2024-12-16 23:40 ` Boqun Feng
1 sibling, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2024-12-16 14:47 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> Introduce a new type called `CurrentTask` that lets you perform various
> operations that are only safe on the `current` task. Use the new type to
> provide a way to access the current mm without incrementing its
> refcount.
>
> With this change, you can write stuff such as
>
> let vma = current!().mm().lock_vma_under_rcu(addr);
>
> without incrementing any refcounts.
>
> This replaces the existing abstractions for accessing the current pid
> namespace. With the old approach, every field access to current involves
> both a macro and an unsafe helper function. The new approach simplifies
> that to a single safe function on the `CurrentTask` type. This makes it
> less heavy-weight to add additional current accessors in the future.
>
> That said, creating a `CurrentTask` type like the one in this patch
> requires that we are careful to ensure that it cannot escape the current
> task or otherwise access things after they are freed. To do this, I
> declared that it cannot escape the current "task context" where I
> defined a "task context" as essentially the region in which `current`
> remains unchanged. So e.g., release_task() or begin_new_exec() would
> leave the task context.
>
> If a userspace thread returns to userspace and later makes another
> syscall, then I consider the two syscalls to be different task contexts.
> This allows values stored in that task to be modified between syscalls,
> even if they're guaranteed to be immutable during a syscall.
>
> Ensuring correctness of `CurrentTask` is slightly tricky if we also want
> the ability to have a safe `kthread_use_mm()` implementation in Rust. To
> support that safely, there are two patterns we need to ensure are safe:
>
> // Case 1: current!() called inside the scope.
> let mm;
> kthread_use_mm(some_mm, || {
> mm = current!().mm();
> });
> drop(some_mm);
> mm.do_something(); // UAF
>
> and:
>
> // Case 2: current!() called before the scope.
> let mm;
> let task = current!();
> kthread_use_mm(some_mm, || {
> mm = task.mm();
> });
> drop(some_mm);
> mm.do_something(); // UAF
>
> The existing `current!()` abstraction already natively prevents the
> first case: The `&CurrentTask` would be tied to the inner scope, so the
> borrow-checker ensures that no reference derived from it can escape the
> scope.
>
> Fixing the second case is a bit more tricky. The solution is to
> essentially pretend that the contents of the scope execute on a
> different thread, which means that only thread-safe types can cross the
> boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to
> move it to another thread will fail, and this includes our fake pretend
> thread boundary.
>
> This has the disadvantage that other types that aren't thread-safe for
> reasons unrelated to `current` also cannot be moved across the
> `kthread_use_mm()` boundary. I consider this an acceptable tradeoff.
>
> Cc: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/kernel/mm.rs | 22 ----
> rust/kernel/task.rs | 284 ++++++++++++++++++++++++++++++----------------------
> 2 files changed, 167 insertions(+), 139 deletions(-)
>
> diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> index 50f4861ae4b9..f7d1079391ef 100644
> --- a/rust/kernel/mm.rs
> +++ b/rust/kernel/mm.rs
> @@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
>
> // These methods are safe to call even if `mm_users` is zero.
> impl Mm {
> - /// Call `mmgrab` on `current.mm`.
> - #[inline]
> - pub fn mmgrab_current() -> Option<ARef<Mm>> {
> - // SAFETY: It's safe to get the `mm` field from current.
> - let mm = unsafe {
> - let current = bindings::get_current();
> - (*current).mm
> - };
> -
> - if mm.is_null() {
> - return None;
> - }
> -
> - // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
> - // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
> - // duration of this function, and `current->mm` will stay valid for that long.
> - let mm = unsafe { Mm::from_raw(mm) };
> -
> - // This increments the refcount using `mmgrab`.
> - Some(ARef::from(mm))
> - }
> -
> /// Returns a raw pointer to the inner `mm_struct`.
> #[inline]
> pub fn as_raw(&self) -> *mut bindings::mm_struct {
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index 07bc22a7645c..8c1ee46c03eb 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -7,6 +7,7 @@
> use crate::{
> bindings,
> ffi::{c_int, c_long, c_uint},
> + mm::MmWithUser,
> pid_namespace::PidNamespace,
> types::{ARef, NotThreadSafe, Opaque},
> };
> @@ -31,22 +32,20 @@
> #[macro_export]
> macro_rules! current {
> () => {
> - // SAFETY: Deref + addr-of below create a temporary `TaskRef` that cannot outlive the
> - // caller.
> + // SAFETY: This expression creates a temporary value that is dropped at the end of the
> + // caller's scope. The following mechanisms ensure that the resulting `&CurrentTask` cannot
> + // leave current task context:
> + //
> + // * To return to userspace, the caller must leave the current scope.
> + // * Operations such as `begin_new_exec()` are necessarily unsafe and the caller of
> + // `begin_new_exec()` is responsible for safety.
> + // * Rust abstractions for things such as a `kthread_use_mm()` scope must require the
> + // closure to be `Send`, so the `NotThreadSafe` field of `CurrentTask` ensures that the
> + // `&CurrentTask` cannot cross the scope in either direction.
> unsafe { &*$crate::task::Task::current() }
> };
> }
>
> -/// Returns the currently running task's pid namespace.
> -#[macro_export]
> -macro_rules! current_pid_ns {
> - () => {
> - // SAFETY: Deref + addr-of below create a temporary `PidNamespaceRef` that cannot outlive
> - // the caller.
> - unsafe { &*$crate::task::Task::current_pid_ns() }
> - };
> -}
> -
> /// Wraps the kernel's `struct task_struct`.
> ///
> /// # Invariants
> @@ -105,6 +104,44 @@ unsafe impl Send for Task {}
> // synchronised by C code (e.g., `signal_pending`).
> unsafe impl Sync for Task {}
>
> +/// Represents the [`Task`] in the `current` global.
> +///
> +/// This type exists to provide more efficient operations that are only valid on the current task.
> +/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
> +/// the current task.
> +///
> +/// # Invariants
> +///
> +/// Each value of this type must only be accessed from the task context it was created within.
> +///
> +/// Of course, every thread is in a different task context, but for the purposes of this invariant,
> +/// these operations also permanently leave the task context:
> +///
> +/// * Returning to userspace from system call context.
> +/// * Calling `release_task()`.
> +/// * Calling `begin_new_exec()` in a binary format loader.
> +///
> +/// Other operations temporarily create a new sub-context:
> +///
> +/// * Calling `kthread_use_mm()` creates a new context, and `kthread_unuse_mm()` returns to the
> +/// old context.
> +///
> +/// This means that a `CurrentTask` obtained before a `kthread_use_mm()` call may be used again
> +/// once `kthread_unuse_mm()` is called, but it must not be used between these two calls.
> +/// Conversely, a `CurrentTask` obtained between a `kthread_use_mm()`/`kthread_unuse_mm()` pair
> +/// must not be used after `kthread_unuse_mm()`.
> +#[repr(transparent)]
> +pub struct CurrentTask(Task, NotThreadSafe);
> +
> +// Make all `Task` methods available on `CurrentTask`.
> +impl Deref for CurrentTask {
> + type Target = Task;
> + #[inline]
> + fn deref(&self) -> &Task {
> + &self.0
> + }
> +}
> +
> /// The type of process identifiers (PIDs).
> type Pid = bindings::pid_t;
>
> @@ -131,119 +168,29 @@ pub fn current_raw() -> *mut bindings::task_struct {
> ///
> /// # Safety
> ///
> - /// Callers must ensure that the returned object doesn't outlive the current task/thread.
> - pub unsafe fn current() -> impl Deref<Target = Task> {
> - struct TaskRef<'a> {
> - task: &'a Task,
> - _not_send: NotThreadSafe,
> + /// Callers must ensure that the returned object is only used to access a [`CurrentTask`]
> + /// within the task context that was active when this function was called. For more details,
> + /// see the invariants section for [`CurrentTask`].
> + pub unsafe fn current() -> impl Deref<Target = CurrentTask> {
> + struct TaskRef {
> + task: *const CurrentTask,
> }
>
> - impl Deref for TaskRef<'_> {
> - type Target = Task;
> + impl Deref for TaskRef {
> + type Target = CurrentTask;
>
> fn deref(&self) -> &Self::Target {
> - self.task
> + // SAFETY: The returned reference borrows from this `TaskRef`, so it cannot outlive
> + // the `TaskRef`, which the caller of `Task::current()` has promised will not
> + // outlive the task/thread for which `self.task` is the `current` pointer. Thus, it
> + // is okay to return a `CurrentTask` reference here.
> + unsafe { &*self.task }
> }
> }
>
> - let current = Task::current_raw();
> TaskRef {
> - // SAFETY: If the current thread is still running, the current task is valid. Given
> - // that `TaskRef` is not `Send`, we know it cannot be transferred to another thread
> - // (where it could potentially outlive the caller).
> - task: unsafe { &*current.cast() },
> - _not_send: NotThreadSafe,
> - }
> - }
> -
> - /// Returns a PidNamespace reference for the currently executing task's/thread's pid namespace.
> - ///
> - /// This function can be used to create an unbounded lifetime by e.g., storing the returned
> - /// PidNamespace in a global variable which would be a bug. So the recommended way to get the
> - /// current task's/thread's pid namespace is to use the [`current_pid_ns`] macro because it is
> - /// safe.
> - ///
> - /// # Safety
> - ///
> - /// Callers must ensure that the returned object doesn't outlive the current task/thread.
> - pub unsafe fn current_pid_ns() -> impl Deref<Target = PidNamespace> {
> - struct PidNamespaceRef<'a> {
> - task: &'a PidNamespace,
> - _not_send: NotThreadSafe,
> - }
> -
> - impl Deref for PidNamespaceRef<'_> {
> - type Target = PidNamespace;
> -
> - fn deref(&self) -> &Self::Target {
> - self.task
> - }
> - }
> -
> - // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
> - //
> - // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
> - // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
> - // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
> - // created by the calling `Task`. This invariant guarantees that after having acquired a
> - // reference to a `Task`'s pid namespace it will remain unchanged.
> - //
> - // When a task has exited and been reaped `release_task()` will be called. This will set
> - // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
> - // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
> - // referencing count to
> - // the `Task` will prevent `release_task()` being called.
> - //
> - // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
> - // can be used. There are two cases to consider:
> - //
> - // (1) retrieving the `PidNamespace` of the `current` task
> - // (2) retrieving the `PidNamespace` of a non-`current` task
> - //
> - // From system call context retrieving the `PidNamespace` for case (1) is always safe and
> - // requires neither RCU locking nor a reference count to be held. Retrieving the
> - // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
> - // like that is exposed to Rust.
> - //
> - // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
> - // Accessing `PidNamespace` outside of RCU protection requires a reference count that
> - // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
> - // task means `NULL` can be returned as the non-`current` task could have already passed
> - // through `release_task()`.
> - //
> - // To retrieve (1) the `current_pid_ns!()` macro should be used which ensure that the
> - // returned `PidNamespace` cannot outlive the calling scope. The associated
> - // `current_pid_ns()` function should not be called directly as it could be abused to
> - // created an unbounded lifetime for `PidNamespace`. The `current_pid_ns!()` macro allows
> - // Rust to handle the common case of accessing `current`'s `PidNamespace` without RCU
> - // protection and without having to acquire a reference count.
> - //
> - // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
> - // reference on `PidNamespace` and will return an `Option` to force the caller to
> - // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
> - // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
> - // difficult to perform operations that are otherwise safe without holding a reference
> - // count as long as RCU protection is guaranteed. But it is not important currently. But we
> - // do want it in the future.
> - //
> - // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
> - // synchronizes against putting the last reference of the associated `struct pid` of
> - // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
> - // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
> - // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
> - // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
> - // from `task->thread_pid` to finish.
> - //
> - // SAFETY: The current task's pid namespace is valid as long as the current task is running.
> - let pidns = unsafe { bindings::task_active_pid_ns(Task::current_raw()) };
> - PidNamespaceRef {
> - // SAFETY: If the current thread is still running, the current task and its associated
> - // pid namespace are valid. `PidNamespaceRef` is not `Send`, so we know it cannot be
> - // transferred to another thread (where it could potentially outlive the current
> - // `Task`). The caller needs to ensure that the PidNamespaceRef doesn't outlive the
> - // current task/thread.
> - task: unsafe { PidNamespace::from_ptr(pidns) },
> - _not_send: NotThreadSafe,
> + // CAST: The layout of `struct task_struct` and `CurrentTask` is identical.
> + task: Task::current_raw().cast(),
> }
> }
>
> @@ -326,6 +273,109 @@ pub fn wake_up(&self) {
> }
> }
>
> +impl CurrentTask {
> + /// Access the address space of the current task.
> + ///
> + /// This function does not touch the refcount of the mm.
> + #[inline]
> + pub fn mm(&self) -> Option<&MmWithUser> {
> + // SAFETY: The `mm` field of `current` is not modified from other threads, so reading it is
> + // not a data race.
> + let mm = unsafe { (*self.as_ptr()).mm };
> +
> + if mm.is_null() {
> + return None;
> + }
> +
> + // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> + // value of `mm_users`. Furthermore, the returned `&MmWithUser` borrows from this
> + // `CurrentTask`, so it cannot escape the scope in which the current pointer was obtained.
> + //
> + // This is safe even if `kthread_use_mm()`/`kthread_unuse_mm()` are used. There are two
> + // relevant cases:
> + // * If the `&CurrentTask` was created before `kthread_use_mm()`, then it cannot be
> + // accessed during the `kthread_use_mm()`/`kthread_unuse_mm()` scope due to the
> + // `NotThreadSafe` field of `CurrentTask`.
> + // * If the `&CurrentTask` was created within a `kthread_use_mm()`/`kthread_unuse_mm()`
> + // scope, then the `&CurrentTask` cannot escape that scope, so the returned `&MmWithUser`
> + // also cannot escape that scope.
> + // In either case, it's not possible to read `current->mm` and keep using it after the
> + // scope is ended with `kthread_unuse_mm()`.
I guess we don't actually need the last section until we see
`kthread_use_mm` / `kthread_unuse_mm` abstractions in tree?
> + Some(unsafe { MmWithUser::from_raw(mm) })
> + }
> +
> + /// Access the pid namespace of the current task.
Is it an address space or a memory map(ping)? Can we use consistent vocabulary?
> + ///
> + /// This function does not touch the refcount of the namespace or use RCU protection.
> + #[doc(alias = "task_active_pid_ns")]
What is with the alias?
> + #[inline]
> + pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
> + // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
> + // on the current task.
> + let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
> +
> + if active_ns.is_null() {
> + return None;
> + }
> +
> + // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
> + //
> + // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
> + // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
> + // on the calling `Task`'s pid namespace. It will only affect the pid namespace of children
> + // created by the calling `Task`. This invariant guarantees that after having acquired a
> + // reference to a `Task`'s pid namespace it will remain unchanged.
> + //
> + // When a task has exited and been reaped `release_task()` will be called. This will set
> + // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
> + // that is dead will return `NULL`. Note that neither holding the RCU lock nor holding a
> + // reference count to the `Task` will prevent `release_task()` being called.
> + //
> + // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
> + // can be used. There are two cases to consider:
> + //
> + // (1) retrieving the `PidNamespace` of the `current` task
> + // (2) retrieving the `PidNamespace` of a non-`current` task
> + //
> + // From system call context retrieving the `PidNamespace` for case (1) is always safe and
> + // requires neither RCU locking nor a reference count to be held. Retrieving the
> + // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
> + // like that is exposed to Rust.
> + //
> + // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
> + // Accessing `PidNamespace` outside of RCU protection requires a reference count that
> + // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
> + // task means `NULL` can be returned as the non-`current` task could have already passed
> + // through `release_task()`.
> + //
> + // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
> + // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
> + // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
> + // without RCU protection and without having to acquire a reference count.
> + //
> + // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
> + // reference on `PidNamespace` and will return an `Option` to force the caller to
> + // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
> + // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
> + // difficult to perform operations that are otherwise safe without holding a reference
> + // count as long as RCU protection is guaranteed. But it is not important currently. But we
> + // do want it in the future.
> + //
> + // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
> + // synchronizes against putting the last reference of the associated `struct pid` of
> + // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
> + // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
> + // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
> + // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
> + // from `task->thread_pid` to finish.
While this comment is a nice piece of documentation, I think we should
move it elsewhere, or restrict it to paragraphs pertaining to (1), since
that is the only case we consider here?
> + //
> + // SAFETY: If `current`'s pid ns is non-null, then it references a valid pid ns.
> + // Furthermore, the returned `&PidNamespace` borrows from this `CurrentTask`, so it cannot
> + // escape the scope in which the current pointer was obtained.
> + Some(unsafe { PidNamespace::from_ptr(active_ns) })
> + }
Can we move the impl block and the struct definition next to each other?
Best regards,
Andreas Hindborg
^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v11 8/8] task: rust: rework how current is accessed
2024-12-16 14:47 ` Andreas Hindborg
@ 2025-01-08 12:32 ` Alice Ryhl
2025-01-09 8:42 ` Andreas Hindborg
0 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2025-01-08 12:32 UTC (permalink / raw)
To: Andreas Hindborg
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> "Alice Ryhl" <aliceryhl@google.com> writes:
> > +impl CurrentTask {
> > + /// Access the address space of the current task.
> > + ///
> > + /// This function does not touch the refcount of the mm.
> > + #[inline]
> > + pub fn mm(&self) -> Option<&MmWithUser> {
> > + // SAFETY: The `mm` field of `current` is not modified from other threads, so reading it is
> > + // not a data race.
> > + let mm = unsafe { (*self.as_ptr()).mm };
> > +
> > + if mm.is_null() {
> > + return None;
> > + }
> > +
> > + // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> > + // value of `mm_users`. Furthermore, the returned `&MmWithUser` borrows from this
> > + // `CurrentTask`, so it cannot escape the scope in which the current pointer was obtained.
> > + //
> > + // This is safe even if `kthread_use_mm()`/`kthread_unuse_mm()` are used. There are two
> > + // relevant cases:
> > + // * If the `&CurrentTask` was created before `kthread_use_mm()`, then it cannot be
> > + // accessed during the `kthread_use_mm()`/`kthread_unuse_mm()` scope due to the
> > + // `NotThreadSafe` field of `CurrentTask`.
> > + // * If the `&CurrentTask` was created within a `kthread_use_mm()`/`kthread_unuse_mm()`
> > + // scope, then the `&CurrentTask` cannot escape that scope, so the returned `&MmWithUser`
> > + // also cannot escape that scope.
> > + // In either case, it's not possible to read `current->mm` and keep using it after the
> > + // scope is ended with `kthread_unuse_mm()`.
>
> I guess we don't actually need the last section until we see
> `kthread_use_mm` / `kthread_unuse_mm` abstractions in tree?
I mean, there could be such a scope in C code that calls into Rust?
And I don't think there's anything wrong with future-proofing this
abstraction towards adding it in the future.
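For readers following along, the marker trick under discussion can be reproduced outside the kernel. The sketch below uses illustrative names only (it is not the kernel crate's actual API): a `PhantomData<*mut ()>` field makes a `CurrentTask`-style token neither `Send` nor `Sync`, so it cannot cross a thread (or pretend-thread) boundary, while borrows can still hang off it.

```rust
use std::marker::PhantomData;

// Stand-in for kernel::types::NotThreadSafe: a zero-sized field whose
// type is neither Send nor Sync, because raw pointers are not thread safe.
type NotThreadSafe = PhantomData<*mut ()>;

// Illustrative stand-in for `CurrentTask`: holding one of these pins the
// value to the current "task context".
struct CurrentTask {
    _not_thread_safe: NotThreadSafe,
}

impl CurrentTask {
    fn new() -> Self {
        CurrentTask {
            _not_thread_safe: PhantomData,
        }
    }

    // Borrowing from `self` ties the returned reference's lifetime to the
    // token, so it cannot outlive the scope the token was obtained in.
    fn mm(&self) -> Option<&str> {
        Some("mm")
    }
}

fn main() {
    let current = CurrentTask::new();
    assert_eq!(current.mm(), Some("mm"));
    // `std::thread::spawn(move || current.mm())` would fail to compile:
    // `*mut ()` is not `Send`, so `CurrentTask` cannot be moved across a
    // real or pretend thread boundary.
}
```

The non-compiling case is left as a comment; the point is only that the marker field is what the borrow checker and auto traits key off.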
> > + Some(unsafe { MmWithUser::from_raw(mm) })
> > + }
> > +
> > + /// Access the pid namespace of the current task.
>
> Is it an address space or a memory map(ping)? Can we use consistent vocabulary?
Neither. It's a pid namespace which has nothing to do with address
spaces or memory mappings. This part of this patch is moving an
existing abstraction to work with the reworked way to access current.
> > + ///
> > + /// This function does not touch the refcount of the namespace or use RCU protection.
> > + #[doc(alias = "task_active_pid_ns")]
>
> What is with the alias?
This is the Rust equivalent to the C function called
task_active_pid_ns. The alias makes it easier to find it.
> > + #[inline]
> > + pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
> > + // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
> > + // on the current task.
> > + let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
> > +
> > + if active_ns.is_null() {
> > + return None;
> > + }
> > +
> > + // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
> > + //
> > + // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
> > + // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
> > + // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
> > + // created by the calling `Task`. This invariant guarantees that after having acquired a
> > + // reference to a `Task`'s pid namespace it will remain unchanged.
> > + //
> > + // When a task has exited and been reaped `release_task()` will be called. This will set
> > + // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
> > + // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
> > + // referencing count to the `Task` will prevent `release_task()` being called.
> > + //
> > + // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
> > + // can be used. There are two cases to consider:
> > + //
> > + // (1) retrieving the `PidNamespace` of the `current` task
> > + // (2) retrieving the `PidNamespace` of a non-`current` task
> > + //
> > + // From system call context retrieving the `PidNamespace` for case (1) is always safe and
> > + // requires neither RCU locking nor a reference count to be held. Retrieving the
> > + // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
> > + // like that is exposed to Rust.
> > + //
> > + // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
> > + // Accessing `PidNamespace` outside of RCU protection requires a reference count that
> > + // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
> > + // task means `NULL` can be returned as the non-`current` task could have already passed
> > + // through `release_task()`.
> > + //
> > + // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
> > + // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
> > + // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
> > + // without RCU protection and without having to acquire a reference count.
> > + //
> > + // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
> > + // reference on `PidNamespace` and will return an `Option` to force the caller to
> > + // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
> > + // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
> > + // difficult to perform operations that are otherwise safe without holding a reference
> > + // count as long as RCU protection is guaranteed. But it is not important currently. But we
> > + // do want it in the future.
> > + //
> > + // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
> > + // synchronizes against putting the last reference of the associated `struct pid` of
> > + // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
> > + // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
> > + // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
> > + // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
> > + // from `task->thread_pid` to finish.
>
> While this comment is a nice piece of documentation, I think we should
> move it elsewhere, or restrict it to paragraphs pertaining to (1), since
> that is the only case we consider here?
Where would you move it?
> > + //
> > + // SAFETY: If `current`'s pid ns is non-null, then it references a valid pid ns.
> > + // Furthermore, the returned `&PidNamespace` borrows from this `CurrentTask`, so it cannot
> > + // escape the scope in which the current pointer was obtained.
> > + Some(unsafe { PidNamespace::from_ptr(active_ns) })
> > + }
>
> Can we move the impl block and the struct definition next to each other?
I could move the definition of CurrentTask down, but I'm not really
convinced that it's an improvement.
Alice
^ permalink raw reply [flat|nested] 65+ messages in thread

* Re: [PATCH v11 8/8] task: rust: rework how current is accessed
2025-01-08 12:32 ` Alice Ryhl
@ 2025-01-09 8:42 ` Andreas Hindborg
2025-01-13 10:26 ` Alice Ryhl
0 siblings, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2025-01-09 8:42 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>> > +impl CurrentTask {
>> > + /// Access the address space of the current task.
>> > + ///
>> > + /// This function does not touch the refcount of the mm.
>> > + #[inline]
>> > + pub fn mm(&self) -> Option<&MmWithUser> {
>> > + // SAFETY: The `mm` field of `current` is not modified from other threads, so reading it is
>> > + // not a data race.
>> > + let mm = unsafe { (*self.as_ptr()).mm };
>> > +
>> > + if mm.is_null() {
>> > + return None;
>> > + }
>> > +
>> > + // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
>> > + // value of `mm_users`. Furthermore, the returned `&MmWithUser` borrows from this
>> > + // `CurrentTask`, so it cannot escape the scope in which the current pointer was obtained.
>> > + //
>> > + // This is safe even if `kthread_use_mm()`/`kthread_unuse_mm()` are used. There are two
>> > + // relevant cases:
>> > + // * If the `&CurrentTask` was created before `kthread_use_mm()`, then it cannot be
>> > + // accessed during the `kthread_use_mm()`/`kthread_unuse_mm()` scope due to the
>> > + // `NotThreadSafe` field of `CurrentTask`.
>> > + // * If the `&CurrentTask` was created within a `kthread_use_mm()`/`kthread_unuse_mm()`
>> > + // scope, then the `&CurrentTask` cannot escape that scope, so the returned `&MmWithUser`
>> > + // also cannot escape that scope.
>> > + // In either case, it's not possible to read `current->mm` and keep using it after the
>> > + // scope is ended with `kthread_unuse_mm()`.
>>
>> I guess we don't actually need the last section until we see
>> `kthread_use_mm` / `kthread_unuse_mm` abstractions in tree?
>
> I mean, there could be such a scope in C code that called into Rust?
👍
>> > + Some(unsafe { MmWithUser::from_raw(mm) })
>> > + }
>> > +
>> > + /// Access the pid namespace of the current task.
>>
>> Is it an address space or a memory map(ping)? Can we use consistent vocabulary?
>
> Neither. It's a pid namespace which has nothing to do with address
> spaces or memory mappings. This part of this patch is moving an
> existing abstraction to work with the reworked way to access current.
Sorry, not sure what I was talking about here. I feel like this comment
landed in the wrong place 😬
I remember taking note of the use of VMA, memory map, address space all
over the place. I object to "VMA" and would rather have it spelled out
in documentation.
>
>> > + ///
>> > + /// This function does not touch the refcount of the namespace or use RCU protection.
>> > + #[doc(alias = "task_active_pid_ns")]
>>
>> What is with the alias?
>
> This is the Rust equivalent to the C function called
> task_active_pid_ns. The alias makes it easier to find it.
Cool.
>
>> > + #[inline]
>> > + pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
>> > + // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
>> > + // on the current task.
>> > + let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
>> > +
>> > + if active_ns.is_null() {
>> > + return None;
>> > + }
>> > +
>> > + // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
>> > + //
>> > + // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
>> > + // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
>> > + // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
>> > + // created by the calling `Task`. This invariant guarantees that after having acquired a
>> > + // reference to a `Task`'s pid namespace it will remain unchanged.
>> > + //
>> > + // When a task has exited and been reaped `release_task()` will be called. This will set
>> > + // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
>> > + // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
>> > + // referencing count to the `Task` will prevent `release_task()` being called.
>> > + //
>> > + // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
>> > + // can be used. There are two cases to consider:
>> > + //
>> > + // (1) retrieving the `PidNamespace` of the `current` task
>> > + // (2) retrieving the `PidNamespace` of a non-`current` task
>> > + //
>> > + // From system call context retrieving the `PidNamespace` for case (1) is always safe and
>> > + // requires neither RCU locking nor a reference count to be held. Retrieving the
>> > + // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
>> > + // like that is exposed to Rust.
>> > + //
>> > + // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
>> > + // Accessing `PidNamespace` outside of RCU protection requires a reference count that
>> > + // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
>> > + // task means `NULL` can be returned as the non-`current` task could have already passed
>> > + // through `release_task()`.
>> > + //
>> > + // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
>> > + // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
>> > + // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
>> > + // without RCU protection and without having to acquire a reference count.
>> > + //
>> > + // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
>> > + // reference on `PidNamespace` and will return an `Option` to force the caller to
>> > + // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
>> > + // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
>> > + // difficult to perform operations that are otherwise safe without holding a reference
>> > + // count as long as RCU protection is guaranteed. But it is not important currently. But we
>> > + // do want it in the future.
>> > + //
>> > + // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
>> > + // synchronizes against putting the last reference of the associated `struct pid` of
>> > + // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
>> > + // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
>> > + // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
>> > + // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
>> > + // from `task->thread_pid` to finish.
>>
>> While this comment is a nice piece of documentation, I think we should
>> move it elsewhere, or restrict it to paragraphs pertaining to (1), since
>> that is the only case we consider here?
>
> Where would you move it?
The info about (2) should probably be with the implementation for that
case, when it lands. Perhaps we can move it then?
>
>> > + //
>> > + // SAFETY: If `current`'s pid ns is non-null, then it references a valid pid ns.
>> > + // Furthermore, the returned `&PidNamespace` borrows from this `CurrentTask`, so it cannot
>> > + // escape the scope in which the current pointer was obtained.
>> > + Some(unsafe { PidNamespace::from_ptr(active_ns) })
>> > + }
>>
>> Can we move the impl block and the struct definition next to each other?
>
> I could move the definition of CurrentTask down, but I'm not really
> convinced that it's an improvement.
I would prefer that, but it's just personal preference. I think it makes
for a more comfortable ride when reading the code for the first time.
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 65+ messages in thread

* Re: [PATCH v11 8/8] task: rust: rework how current is accessed
2025-01-09 8:42 ` Andreas Hindborg
@ 2025-01-13 10:26 ` Alice Ryhl
2025-01-15 10:24 ` Andreas Hindborg
0 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2025-01-13 10:26 UTC (permalink / raw)
To: Andreas Hindborg
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Thu, Jan 9, 2025 at 9:42 AM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> "Alice Ryhl" <aliceryhl@google.com> writes:
>
> > On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
> >>
> >> "Alice Ryhl" <aliceryhl@google.com> writes:
> >> > + #[inline]
> >> > + pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
> >> > + // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
> >> > + // on the current task.
> >> > + let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
> >> > +
> >> > + if active_ns.is_null() {
> >> > + return None;
> >> > + }
> >> > +
> >> > + // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
> >> > + //
> >> > + // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
> >> > + // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
> >> > + // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
> >> > + // created by the calling `Task`. This invariant guarantees that after having acquired a
> >> > + // reference to a `Task`'s pid namespace it will remain unchanged.
> >> > + //
> >> > + // When a task has exited and been reaped `release_task()` will be called. This will set
> >> > + // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
> >> > + // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
> >> > + // referencing count to the `Task` will prevent `release_task()` being called.
> >> > + //
> >> > + // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
> >> > + // can be used. There are two cases to consider:
> >> > + //
> >> > + // (1) retrieving the `PidNamespace` of the `current` task
> >> > + // (2) retrieving the `PidNamespace` of a non-`current` task
> >> > + //
> >> > + // From system call context retrieving the `PidNamespace` for case (1) is always safe and
> >> > + // requires neither RCU locking nor a reference count to be held. Retrieving the
> >> > + // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
> >> > + // like that is exposed to Rust.
> >> > + //
> >> > + // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
> >> > + // Accessing `PidNamespace` outside of RCU protection requires a reference count that
> >> > + // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
> >> > + // task means `NULL` can be returned as the non-`current` task could have already passed
> >> > + // through `release_task()`.
> >> > + //
> >> > + // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
> >> > + // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
> >> > + // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
> >> > + // without RCU protection and without having to acquire a reference count.
> >> > + //
> >> > + // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
> >> > + // reference on `PidNamespace` and will return an `Option` to force the caller to
> >> > + // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
> >> > + // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
> >> > + // difficult to perform operations that are otherwise safe without holding a reference
> >> > + // count as long as RCU protection is guaranteed. But it is not important currently. But we
> >> > + // do want it in the future.
> >> > + //
> >> > + // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
> >> > + // synchronizes against putting the last reference of the associated `struct pid` of
> >> > + // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
> >> > + // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
> >> > + // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
> >> > + // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
> >> > + // from `task->thread_pid` to finish.
> >>
> >> While this comment is a nice piece of documentation, I think we should
> >> move it elsewhere, or restrict it to paragraphs pertaining to (1), since
> >> that is the only case we consider here?
> >
> > Where would you move it?
>
> The info about (2) should probably be with the implementation for that
> case, when it lands. Perhaps we can move it then?
The function already exists. It's called Task::get_pid_ns(). I think
the comment makes sense here: get_pid_ns() is the normal case where
you don't skip synchronization, and active_pid_ns() is the special
case where you can skip RCU because you are reading `current`'s own
pid namespace from its own task context. This comment explains that
normally you cannot skip RCU, but in this special case you can.
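As an aside, the `Option`-forcing part of this pattern (wrapping a nullable C-style pointer so the caller must handle the NULL case that is easy to forget in C) can be sketched in plain Rust. Names and types below are hypothetical; `u32` stands in for `struct pid_namespace`:

```rust
// Illustrative sketch, not the kernel's API: turn a possibly-null raw
// pointer into an `Option<&T>`, making the NULL case explicit.
fn active_ns_from_raw<'a>(ptr: *const u32) -> Option<&'a u32> {
    if ptr.is_null() {
        return None;
    }
    // SAFETY (by assumption in this sketch): a non-null `ptr` references
    // a valid value that lives at least as long as `'a`.
    Some(unsafe { &*ptr })
}

fn main() {
    let ns = 42u32;
    assert_eq!(active_ns_from_raw(&ns as *const u32), Some(&42));
    assert!(active_ns_from_raw(std::ptr::null()).is_none());
}
```

A `match` or `?` on the returned `Option` is then the only way to get at the reference, which is the "force the caller to explicitly handle `None`" property the patch comment describes.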
Alice
^ permalink raw reply [flat|nested] 65+ messages in thread

* Re: [PATCH v11 8/8] task: rust: rework how current is accessed
2025-01-13 10:26 ` Alice Ryhl
@ 2025-01-15 10:24 ` Andreas Hindborg
0 siblings, 0 replies; 65+ messages in thread
From: Andreas Hindborg @ 2025-01-15 10:24 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
"Alice Ryhl" <aliceryhl@google.com> writes:
> On Thu, Jan 9, 2025 at 9:42 AM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>>
>> > On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>> >>
>> >> "Alice Ryhl" <aliceryhl@google.com> writes:
>> >> > + #[inline]
>> >> > + pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
>> >> > + // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
>> >> > + // on the current task.
>> >> > + let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
>> >> > +
>> >> > + if active_ns.is_null() {
>> >> > + return None;
>> >> > + }
>> >> > +
>> >> > + // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
>> >> > + //
>> >> > + // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
>> >> > + // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
>> >> > + // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
>> >> > + // created by the calling `Task`. This invariant guarantees that after having acquired a
>> >> > + // reference to a `Task`'s pid namespace it will remain unchanged.
>> >> > + //
>> >> > + // When a task has exited and been reaped `release_task()` will be called. This will set
>> >> > + // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
>> >> > + // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
>> >> > + // referencing count to the `Task` will prevent `release_task()` being called.
>> >> > + //
>> >> > + // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
>> >> > + // can be used. There are two cases to consider:
>> >> > + //
>> >> > + // (1) retrieving the `PidNamespace` of the `current` task
>> >> > + // (2) retrieving the `PidNamespace` of a non-`current` task
>> >> > + //
>> >> > + // From system call context retrieving the `PidNamespace` for case (1) is always safe and
>> >> > + // requires neither RCU locking nor a reference count to be held. Retrieving the
>> >> > + // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
>> >> > + // like that is exposed to Rust.
>> >> > + //
>> >> > + // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
>> >> > + // Accessing `PidNamespace` outside of RCU protection requires a reference count that
>> >> > + // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
>> >> > + // task means `NULL` can be returned as the non-`current` task could have already passed
>> >> > + // through `release_task()`.
>> >> > + //
>> >> > + // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
>> >> > + // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
>> >> > + // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
>> >> > + // without RCU protection and without having to acquire a reference count.
>> >> > + //
>> >> > + // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
>> >> > + // reference on `PidNamespace` and will return an `Option` to force the caller to
>> >> > + // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
>> >> > + // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
>> >> > + // difficult to perform operations that are otherwise safe without holding a reference
>> >> > + // count as long as RCU protection is guaranteed. But it is not important currently. But we
>> >> > + // do want it in the future.
>> >> > + //
>> >> > + // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
>> >> > + // synchronizes against putting the last reference of the associated `struct pid` of
>> >> > + // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
>> >> > + // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
>> >> > + // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
>> >> > + // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
>> >> > + // from `task->thread_pid` to finish.
>> >>
>> >> While this comment is a nice piece of documentation, I think we should
>> >> move it elsewhere, or restrict it to paragraphs pertaining to (1), since
>> >> that is the only case we consider here?
>> >
>> > Where would you move it?
>>
>> The info about (2) should probably be with the implementation for that
>> case, when it lands. Perhaps we can move it then?
>
> The function already exists. It's called Task::get_pid_ns(). I think
> the comment makes sense here: get_pid_ns() is the normal case where
> you don't skip synchronization, and active_pid_ns() is the special
> case where you can skip RCU due to reasons. This comment explains that
> normally you cannot skip RCU, but in this special case you can.
Reading this again I think it should simply be cut down in size:
```
The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive.
From system call context retrieving the `PidNamespace` for the current
task is always safe and requires neither RCU locking nor a reference
count to be held. Retrieving the `PidNamespace` after `release_task()`
for current will return `NULL` but no codepath like that is exposed to
Rust.
```
The rest is not relevant to this function and it does not help
understanding the function.
Another thought - add a link to `get_pid_ns`:
@@ -307,6 +307,8 @@ pub fn mm(&self) -> Option<&MmWithUser> {
/// Access the pid namespace of the current task.
///
/// This function does not touch the refcount of the namespace or use RCU protection.
+ ///
+ /// To access the pid namespace of another task, see [`Task::get_pid_ns`].
#[doc(alias = "task_active_pid_ns")]
#[inline]
pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
Best regards,
Andreas Hindborg
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH v11 8/8] task: rust: rework how current is accessed
2024-12-11 10:37 ` [PATCH v11 8/8] task: rust: rework how current is accessed Alice Ryhl
2024-12-16 14:47 ` Andreas Hindborg
@ 2024-12-16 23:40 ` Boqun Feng
2025-01-13 10:30 ` Alice Ryhl
1 sibling, 1 reply; 65+ messages in thread
From: Boqun Feng @ 2024-12-16 23:40 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Wed, Dec 11, 2024 at 10:37:12AM +0000, Alice Ryhl wrote:
> Introduce a new type called `CurrentTask` that lets you perform various
> operations that are only safe on the `current` task. Use the new type to
> provide a way to access the current mm without incrementing its
> refcount.
>
> With this change, you can write stuff such as
>
> let vma = current!().mm().lock_vma_under_rcu(addr);
>
> without incrementing any refcounts.
>
> This replaces the existing abstractions for accessing the current pid
> namespace. With the old approach, every field access to current involves
> both a macro and an unsafe helper function. The new approach simplifies
> that to a single safe function on the `CurrentTask` type. This makes it
> less heavy-weight to add additional current accessors in the future.
>
> That said, creating a `CurrentTask` type like the one in this patch
> requires that we are careful to ensure that it cannot escape the current
> task or otherwise access things after they are freed. To do this, I
> declared that it cannot escape the current "task context" where I
> defined a "task context" as essentially the region in which `current`
> remains unchanged. So e.g., release_task() or begin_new_exec() would
> leave the task context.
>
> If a userspace thread returns to userspace and later makes another
> syscall, then I consider the two syscalls to be different task contexts.
> This allows values stored in that task to be modified between syscalls,
> even if they're guaranteed to be immutable during a syscall.
>
> Ensuring correctness of `CurrentTask` is slightly tricky if we also want
> the ability to have a safe `kthread_use_mm()` implementation in Rust. To
> support that safely, there are two patterns we need to ensure are safe:
>
> // Case 1: current!() called inside the scope.
> let mm;
> kthread_use_mm(some_mm, || {
> mm = current!().mm();
> });
> drop(some_mm);
> mm.do_something(); // UAF
>
> and:
>
> // Case 2: current!() called before the scope.
> let mm;
> let task = current!();
> kthread_use_mm(some_mm, || {
> mm = task.mm();
> });
> drop(some_mm);
> mm.do_something(); // UAF
>
> The existing `current!()` abstraction already natively prevents the
> first case: The `&CurrentTask` would be tied to the inner scope, so the
> borrow-checker ensures that no reference derived from it can escape the
> scope.
>
> Fixing the second case is a bit more tricky. The solution is to
> essentially pretend that the contents of the scope execute on an
> different thread, which means that only thread-safe types can cross the
> boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to
> move it to another thread will fail, and this includes our fake pretend
> thread boundary.
>
> This has the disadvantage that other types that aren't thread-safe for
> reasons unrelated to `current` also cannot be moved across the
> `kthread_use_mm()` boundary. I consider this an acceptable tradeoff.
>
> Cc: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/kernel/mm.rs | 22 ----
> rust/kernel/task.rs | 284 ++++++++++++++++++++++++++++++----------------------
> 2 files changed, 167 insertions(+), 139 deletions(-)
>
> diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> index 50f4861ae4b9..f7d1079391ef 100644
> --- a/rust/kernel/mm.rs
> +++ b/rust/kernel/mm.rs
> @@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
>
> // These methods are safe to call even if `mm_users` is zero.
> impl Mm {
> - /// Call `mmgrab` on `current.mm`.
> - #[inline]
> - pub fn mmgrab_current() -> Option<ARef<Mm>> {
> - // SAFETY: It's safe to get the `mm` field from current.
> - let mm = unsafe {
> - let current = bindings::get_current();
> - (*current).mm
> - };
> -
> - if mm.is_null() {
> - return None;
> - }
> -
> - // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
> - // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
> - // duration of this function, and `current->mm` will stay valid for that long.
> - let mm = unsafe { Mm::from_raw(mm) };
> -
> - // This increments the refcount using `mmgrab`.
> - Some(ARef::from(mm))
> - }
> -
This is removed because of no user? If so, maybe don't introduce this at
all in the earlier patch of this series? The rest looks good to me.
Regards,
Boqun
> /// Returns a raw pointer to the inner `mm_struct`.
> #[inline]
> pub fn as_raw(&self) -> *mut bindings::mm_struct {
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index 07bc22a7645c..8c1ee46c03eb 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -7,6 +7,7 @@
> use crate::{
> bindings,
> ffi::{c_int, c_long, c_uint},
> + mm::MmWithUser,
> pid_namespace::PidNamespace,
> types::{ARef, NotThreadSafe, Opaque},
> };
> @@ -31,22 +32,20 @@
[...]
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH v11 8/8] task: rust: rework how current is accessed
2024-12-16 23:40 ` Boqun Feng
@ 2025-01-13 10:30 ` Alice Ryhl
2025-01-14 15:30 ` Boqun Feng
0 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2025-01-13 10:30 UTC (permalink / raw)
To: Boqun Feng
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Tue, Dec 17, 2024 at 12:40 AM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> On Wed, Dec 11, 2024 at 10:37:12AM +0000, Alice Ryhl wrote:
> > Introduce a new type called `CurrentTask` that lets you perform various
> > operations that are only safe on the `current` task. Use the new type to
> > provide a way to access the current mm without incrementing its
> > refcount.
> >
> > With this change, you can write stuff such as
> >
> > let vma = current!().mm().lock_vma_under_rcu(addr);
> >
> > without incrementing any refcounts.
> >
> > This replaces the existing abstractions for accessing the current pid
> > namespace. With the old approach, every field access to current involves
> > both a macro and an unsafe helper function. The new approach simplifies
> > that to a single safe function on the `CurrentTask` type. This makes it
> > less heavy-weight to add additional current accessors in the future.
> >
> > That said, creating a `CurrentTask` type like the one in this patch
> > requires that we are careful to ensure that it cannot escape the current
> > task or otherwise access things after they are freed. To do this, I
> > declared that it cannot escape the current "task context" where I
> > defined a "task context" as essentially the region in which `current`
> > remains unchanged. So e.g., release_task() or begin_new_exec() would
> > leave the task context.
> >
> > If a userspace thread returns to userspace and later makes another
> > syscall, then I consider the two syscalls to be different task contexts.
> > This allows values stored in that task to be modified between syscalls,
> > even if they're guaranteed to be immutable during a syscall.
> >
> > Ensuring correctness of `CurrentTask` is slightly tricky if we also want
> > the ability to have a safe `kthread_use_mm()` implementation in Rust. To
> > support that safely, there are two patterns we need to ensure are safe:
> >
> > // Case 1: current!() called inside the scope.
> > let mm;
> > kthread_use_mm(some_mm, || {
> > mm = current!().mm();
> > });
> > drop(some_mm);
> > mm.do_something(); // UAF
> >
> > and:
> >
> > // Case 2: current!() called before the scope.
> > let mm;
> > let task = current!();
> > kthread_use_mm(some_mm, || {
> > mm = task.mm();
> > });
> > drop(some_mm);
> > mm.do_something(); // UAF
> >
> > The existing `current!()` abstraction already natively prevents the
> > first case: The `&CurrentTask` would be tied to the inner scope, so the
> > borrow-checker ensures that no reference derived from it can escape the
> > scope.
> >
> > Fixing the second case is a bit more tricky. The solution is to
> > essentially pretend that the contents of the scope execute on a
> > different thread, which means that only thread-safe types can cross the
> > boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to
> > move it to another thread will fail, and this includes our fake pretend
> > thread boundary.
> >
> > This has the disadvantage that other types that aren't thread-safe for
> > reasons unrelated to `current` also cannot be moved across the
> > `kthread_use_mm()` boundary. I consider this an acceptable tradeoff.
> >
> > Cc: Christian Brauner <brauner@kernel.org>
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > ---
> > rust/kernel/mm.rs | 22 ----
> > rust/kernel/task.rs | 284 ++++++++++++++++++++++++++++++----------------------
> > 2 files changed, 167 insertions(+), 139 deletions(-)
> >
> > diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> > index 50f4861ae4b9..f7d1079391ef 100644
> > --- a/rust/kernel/mm.rs
> > +++ b/rust/kernel/mm.rs
> > @@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
> >
> > // These methods are safe to call even if `mm_users` is zero.
> > impl Mm {
> > - /// Call `mmgrab` on `current.mm`.
> > - #[inline]
> > - pub fn mmgrab_current() -> Option<ARef<Mm>> {
> > - // SAFETY: It's safe to get the `mm` field from current.
> > - let mm = unsafe {
> > - let current = bindings::get_current();
> > - (*current).mm
> > - };
> > -
> > - if mm.is_null() {
> > - return None;
> > - }
> > -
> > - // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
> > - // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
> > - // duration of this function, and `current->mm` will stay valid for that long.
> > - let mm = unsafe { Mm::from_raw(mm) };
> > -
> > - // This increments the refcount using `mmgrab`.
> > - Some(ARef::from(mm))
> > - }
> > -
>
> This is removed because of no user? If so, maybe don't introduce this at
> all in the earlier patch of this series? The rest looks good to me.
I guess I can drop the temporary introduction of this. It's here due
to the history of this series where originally it only had
mmgrab_current, and Binder would use that. But with this patch, you
can use CurrentTask::mm() + ARef::from() to do the same thing. For
Binder, the difference doesn't matter, but the latter is more powerful
as you can access the current task's mm_struct without incrementing
refcounts.
Alice
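[Editorial note: the "`CurrentTask::mm()` + `ARef::from()`" replacement described above can be sketched outside the kernel. In the standalone model below, `Arc` stands in for `ARef` and the types are hypothetical; the point is that borrowing costs no refcount, and an owned handle is taken only when needed.]

```rust
use std::sync::Arc;

struct Mm { id: u32 }

struct CurrentTask { mm: Option<Arc<Mm>> }

impl CurrentTask {
    // Borrow the current mm without touching any refcount.
    fn mm(&self) -> Option<&Mm> { self.mm.as_deref() }
    // Take a refcount only when an owned handle is actually required,
    // replacing a dedicated mmgrab_current()-style helper.
    fn mm_owned(&self) -> Option<Arc<Mm>> { self.mm.clone() }
}

fn main() {
    let cur = CurrentTask { mm: Some(Arc::new(Mm { id: 7 })) };
    assert_eq!(cur.mm().unwrap().id, 7); // no refcount bump
    let owned = cur.mm_owned().unwrap(); // bump, like ARef::from(mm)
    assert_eq!(Arc::strong_count(&owned), 2);
}
```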
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH v11 8/8] task: rust: rework how current is accessed
2025-01-13 10:30 ` Alice Ryhl
@ 2025-01-14 15:30 ` Boqun Feng
0 siblings, 0 replies; 65+ messages in thread
From: Boqun Feng @ 2025-01-14 15:30 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On Mon, Jan 13, 2025 at 11:30:05AM +0100, Alice Ryhl wrote:
> On Tue, Dec 17, 2024 at 12:40 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > On Wed, Dec 11, 2024 at 10:37:12AM +0000, Alice Ryhl wrote:
> > > Introduce a new type called `CurrentTask` that lets you perform various
> > > operations that are only safe on the `current` task. Use the new type to
> > > provide a way to access the current mm without incrementing its
> > > refcount.
> > >
> > > With this change, you can write stuff such as
> > >
> > > let vma = current!().mm().lock_vma_under_rcu(addr);
> > >
> > > without incrementing any refcounts.
> > >
> > > This replaces the existing abstractions for accessing the current pid
> > > namespace. With the old approach, every field access to current involves
> > > both a macro and an unsafe helper function. The new approach simplifies
> > > that to a single safe function on the `CurrentTask` type. This makes it
> > > less heavy-weight to add additional current accessors in the future.
> > >
> > > That said, creating a `CurrentTask` type like the one in this patch
> > > requires that we are careful to ensure that it cannot escape the current
> > > task or otherwise access things after they are freed. To do this, I
> > > declared that it cannot escape the current "task context" where I
> > > defined a "task context" as essentially the region in which `current`
> > > remains unchanged. So e.g., release_task() or begin_new_exec() would
> > > leave the task context.
> > >
> > > If a userspace thread returns to userspace and later makes another
> > > syscall, then I consider the two syscalls to be different task contexts.
> > > This allows values stored in that task to be modified between syscalls,
> > > even if they're guaranteed to be immutable during a syscall.
> > >
> > > Ensuring correctness of `CurrentTask` is slightly tricky if we also want
> > > the ability to have a safe `kthread_use_mm()` implementation in Rust. To
> > > support that safely, there are two patterns we need to ensure are safe:
> > >
> > > // Case 1: current!() called inside the scope.
> > > let mm;
> > > kthread_use_mm(some_mm, || {
> > > mm = current!().mm();
> > > });
> > > drop(some_mm);
> > > mm.do_something(); // UAF
> > >
> > > and:
> > >
> > > // Case 2: current!() called before the scope.
> > > let mm;
> > > let task = current!();
> > > kthread_use_mm(some_mm, || {
> > > mm = task.mm();
> > > });
> > > drop(some_mm);
> > > mm.do_something(); // UAF
> > >
> > > The existing `current!()` abstraction already natively prevents the
> > > first case: The `&CurrentTask` would be tied to the inner scope, so the
> > > borrow-checker ensures that no reference derived from it can escape the
> > > scope.
> > >
> > > Fixing the second case is a bit more tricky. The solution is to
> > > essentially pretend that the contents of the scope execute on a
> > > different thread, which means that only thread-safe types can cross the
> > > boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to
> > > move it to another thread will fail, and this includes our fake pretend
> > > thread boundary.
> > >
> > > This has the disadvantage that other types that aren't thread-safe for
> > > reasons unrelated to `current` also cannot be moved across the
> > > `kthread_use_mm()` boundary. I consider this an acceptable tradeoff.
> > >
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > > ---
> > > rust/kernel/mm.rs | 22 ----
> > > rust/kernel/task.rs | 284 ++++++++++++++++++++++++++++++----------------------
> > > 2 files changed, 167 insertions(+), 139 deletions(-)
> > >
> > > diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> > > index 50f4861ae4b9..f7d1079391ef 100644
> > > --- a/rust/kernel/mm.rs
> > > +++ b/rust/kernel/mm.rs
> > > @@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
> > >
> > > // These methods are safe to call even if `mm_users` is zero.
> > > impl Mm {
> > > - /// Call `mmgrab` on `current.mm`.
> > > - #[inline]
> > > - pub fn mmgrab_current() -> Option<ARef<Mm>> {
> > > - // SAFETY: It's safe to get the `mm` field from current.
> > > - let mm = unsafe {
> > > - let current = bindings::get_current();
> > > - (*current).mm
> > > - };
> > > -
> > > - if mm.is_null() {
> > > - return None;
> > > - }
> > > -
> > > - // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
> > > - // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
> > > - // duration of this function, and `current->mm` will stay valid for that long.
> > > - let mm = unsafe { Mm::from_raw(mm) };
> > > -
> > > - // This increments the refcount using `mmgrab`.
> > > - Some(ARef::from(mm))
> > > - }
> > > -
> >
> > This is removed because of no user? If so, maybe don't introduce this at
> > all in the earlier patch of this series? The rest looks good to me.
>
> I guess I can drop the temporary introduction of this. It's here due
> to the history of this series where originally it only had
> mmgrab_current, and Binder would use that. But with this patch, you
As someone who usually digs through a lot of git history, I would like to see
the drop of the temporary introduction ;-) If you're going to rebase,
please see whether the drop can be done easily, thanks! Not a blocking
issue though.
With or without the change, feel free to add:
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Regards,
Boqun
> can use CurrentTask::mm() + ARef::from() to do the same thing. For
> Binder, the difference doesn't matter, but the latter is more powerful
> as you can access the current task's mm_struct without incrementing
> refcounts.
>
> Alice
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap
2024-12-11 10:37 ` [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
` (7 preceding siblings ...)
2024-12-11 10:37 ` [PATCH v11 8/8] task: rust: rework how current is accessed Alice Ryhl
@ 2024-12-11 10:47 ` Alice Ryhl
2024-12-12 14:47 ` Konstantin Ryabitsev
2024-12-16 11:04 ` Andreas Hindborg
9 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2024-12-11 10:47 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: linux-kernel, linux-mm, rust-for-linux
On Wed, Dec 11, 2024 at 11:37 AM Alice Ryhl <aliceryhl@google.com> wrote:
>
> This updates the vm_area_struct support to use the approach we discussed
> at LPC where there are several different Rust wrappers for
> vm_area_struct depending on the kind of access you have to the vma. Each
> case allows a different set of operations on the vma.
>
> Patch 8 in particular could use review.
>
> To: Miguel Ojeda <ojeda@kernel.org>
> To: Matthew Wilcox <willy@infradead.org>
> To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> To: Vlastimil Babka <vbabka@suse.cz>
> To: John Hubbard <jhubbard@nvidia.com>
> To: Liam R. Howlett <Liam.Howlett@oracle.com>
> To: Andrew Morton <akpm@linux-foundation.org>
> To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> To: Arnd Bergmann <arnd@arndb.de>
> To: Christian Brauner <brauner@kernel.org>
> To: Jann Horn <jannh@google.com>
> To: Suren Baghdasaryan <surenb@google.com>
> Cc: Alex Gaynor <alex.gaynor@gmail.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Gary Guo <gary@garyguo.net>
> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
> Cc: Benno Lossin <benno.lossin@proton.me>
> Cc: Andreas Hindborg <a.hindborg@kernel.org>
> Cc: Trevor Gross <tmgross@umich.edu>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: rust-for-linux@vger.kernel.org
> Cc: Alice Ryhl <aliceryhl@google.com>
When I sent this series, b4 put the changelog stub for v12 above the
cover letter for some reason. Also, I'm not sure why the list of
recipients was included in the cover letter. Any ideas what I'm doing
wrong?
This is what I sent:
https://github.com/Darksonn/linux/tree/b4/vma-v11
Alice
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap
2024-12-11 10:47 ` [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
@ 2024-12-12 14:47 ` Konstantin Ryabitsev
2024-12-13 14:42 ` Alice Ryhl
0 siblings, 1 reply; 65+ messages in thread
From: Konstantin Ryabitsev @ 2024-12-12 14:47 UTC (permalink / raw)
To: Alice Ryhl; +Cc: linux-kernel, linux-mm, rust-for-linux
On Wed, Dec 11, 2024 at 11:47:41AM +0100, Alice Ryhl wrote:
> When I sent this series, b4 put the changelog stub for v12 above the
> cover letter for some reason. Also, I'm not sure why the list of
> recipients was included in the cover letter. Any ideas what I'm doing
> wrong?
Yes, and it's a common gotcha that I don't know how to properly address. For
the moment, we use "---" lines to indicate the main sections of the cover
letter. There are three main sections:
The main message
---
Additional information
---
The basement
Looks like you removed the "---" between the changelog and the main message,
which causes b4 to stop properly parsing the cover letter.
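[Editorial note: concretely, a cover letter keeping all three sections separated might look like the following hypothetical layout; the section contents are illustrative, not Alice's actual cover letter.]

```text
Subject: [PATCH v12 0/8] Rust support for mm_struct, vm_area_struct, and mmap

This updates the vm_area_struct support to use the approach discussed
at LPC.                                   <- main message

---
Changes in v12:
- Dropped the temporary mmgrab_current() helper.
                                          <- additional information
---
base-commit: (some commit id)             <- the basement
```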
I'm open to suggestions on how to make this less fragile, short of "use AI to
figure out what part of the cover letter does what."
-K
^ permalink raw reply [flat|nested] 65+ messages in thread* Re: [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap
2024-12-12 14:47 ` Konstantin Ryabitsev
@ 2024-12-13 14:42 ` Alice Ryhl
2024-12-13 14:47 ` Konstantin Ryabitsev
0 siblings, 1 reply; 65+ messages in thread
From: Alice Ryhl @ 2024-12-13 14:42 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: linux-kernel, linux-mm, rust-for-linux
On Thu, Dec 12, 2024 at 3:47 PM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> On Wed, Dec 11, 2024 at 11:47:41AM +0100, Alice Ryhl wrote:
> > When I sent this series, b4 put the changelog stub for v12 above the
> > cover letter for some reason. Also, I'm not sure why the list of
> > recipients was included in the cover letter. Any ideas what I'm doing
> > wrong?
>
> Yes, and it's a common gotcha that I don't know how to properly address. For
> the moment, we use "---" lines to indicate the main sections of the cover
> letter. There are three main sections:
>
> The main message
>
> ---
>
> Additional information
>
> ---
>
> The basement
>
> Looks like you removed the "---" between the changelog and the main message,
> which causes b4 to stop properly parsing the cover letter.
>
> I'm open to suggestions on how to make this less fragile, short of "use AI to
> figure out what part of the cover letter does what."
Could you print an error if the --- is missing, that is, if the number
of sections is incorrect?
Alice
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap
2024-12-13 14:42 ` Alice Ryhl
@ 2024-12-13 14:47 ` Konstantin Ryabitsev
0 siblings, 0 replies; 65+ messages in thread
From: Konstantin Ryabitsev @ 2024-12-13 14:47 UTC (permalink / raw)
To: Alice Ryhl; +Cc: linux-kernel, linux-mm, rust-for-linux
On Fri, Dec 13, 2024 at 03:42:48PM +0100, Alice Ryhl wrote:
> Could you print an error if the --- is missing, that is, if the number
> of sections is incorrect?
I don't think that's the right way to go, either, because the number of "---"
sections can vary (including having none at all). Throwing an error when that
happens would just annoy a different set of people.
I'll think of something.
-K
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap
2024-12-11 10:37 ` [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
` (8 preceding siblings ...)
2024-12-11 10:47 ` [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap Alice Ryhl
@ 2024-12-16 11:04 ` Andreas Hindborg
2024-12-16 11:46 ` Alice Ryhl
9 siblings, 1 reply; 65+ messages in thread
From: Andreas Hindborg @ 2024-12-16 11:04 UTC (permalink / raw)
To: Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
Hi Alice,
Applied on top of v6.13-rc2 and tried to build:
error[E0277]: the trait bound `ARef<Task>: From<&CurrentTask>` is not satisfied
--> rust/doctests_kernel_generated.rs:6884:22
|
6884 | creator: current!().into(),
| ^^^^^^^^^^ ---- required by a bound introduced by this call
| |
| the trait `From<&CurrentTask>` is not implemented for `ARef<Task>`, which is required by `&CurrentTask: Into<_>`
| this tail expression is of type `&CurrentTask`
|
= help: the trait `From<&Task>` is implemented for `ARef<Task>`
= help: for that trait implementation, expected `Task`, found `CurrentTask`
= note: required for `&CurrentTask` to implement `Into<ARef<Task>>`
error: aborting due to 1 previous error
Best regards,
Andreas Hindborg
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH v11 0/8] Rust support for mm_struct, vm_area_struct, and mmap
2024-12-16 11:04 ` Andreas Hindborg
@ 2024-12-16 11:46 ` Alice Ryhl
0 siblings, 0 replies; 65+ messages in thread
From: Alice Ryhl @ 2024-12-16 11:46 UTC (permalink / raw)
To: Andreas Hindborg, Alice Ryhl
Cc: Miguel Ojeda, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka,
John Hubbard, Liam R. Howlett, Andrew Morton, Greg Kroah-Hartman,
Arnd Bergmann, Christian Brauner, Jann Horn, Suren Baghdasaryan,
Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Trevor Gross, linux-kernel, linux-mm,
rust-for-linux
On 12/16/24 12:04 PM, Andreas Hindborg wrote:
> Hi Alice,
>
> Applied on top of v6.13-rc2 and tried to build:
>
> error[E0277]: the trait bound `ARef<Task>: From<&CurrentTask>` is not satisfied
> --> rust/doctests_kernel_generated.rs:6884:22
> |
> 6884 | creator: current!().into(),
> | ^^^^^^^^^^ ---- required by a bound introduced by this call
> | |
> | the trait `From<&CurrentTask>` is not implemented for `ARef<Task>`, which is required by `&CurrentTask: Into<_>`
> | this tail expression is of type `&CurrentTask`
> |
> = help: the trait `From<&Task>` is implemented for `ARef<Task>`
> = help: for that trait implementation, expected `Task`, found `CurrentTask`
> = note: required for `&CurrentTask` to implement `Into<ARef<Task>>`
>
> error: aborting due to 1 previous error
Ah, thanks. Looks like a documentation test that needs to be adjusted.
Alice
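[Editorial note: the E0277 above can be reproduced and worked around in a standalone model. Presumably the doctest adjustment is to deref-coerce to `&Task` before converting; in the sketch below, `Arc` stands in for `ARef` and all names are hypothetical.]

```rust
use std::ops::Deref;
use std::sync::Arc;

struct Task(u32);
struct CurrentTask(Task);

// CurrentTask derefs to Task, mirroring the kernel wrapper.
impl Deref for CurrentTask {
    type Target = Task;
    fn deref(&self) -> &Task { &self.0 }
}

// Only a conversion from &Task exists, as in the error's help text.
impl From<&Task> for Arc<Task> {
    fn from(t: &Task) -> Arc<Task> { Arc::new(Task(t.0)) }
}

fn main() {
    let cur = CurrentTask(Task(7));
    // let a: Arc<Task> = (&cur).into(); // E0277, like the doctest failure
    let a: Arc<Task> = (&*cur).into();   // deref to &Task first, then convert
    assert_eq!(a.0, 7);
    println!("task id = {}", a.0);
}
```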
^ permalink raw reply [flat|nested] 65+ messages in thread