From: Yury Norov <ynorov@nvidia.com>
To: Mitchell Levy <levymitchell0@gmail.com>
Cc: "Miguel Ojeda" <ojeda@kernel.org>,
"Alex Gaynor" <alex.gaynor@gmail.com>,
"Gary Guo" <gary@garyguo.net>,
"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
"Andreas Hindborg" <a.hindborg@kernel.org>,
"Alice Ryhl" <aliceryhl@google.com>,
"Trevor Gross" <tmgross@umich.edu>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Dennis Zhou" <dennis@kernel.org>, "Tejun Heo" <tj@kernel.org>,
"Christoph Lameter" <cl@linux.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"Benno Lossin" <lossin@kernel.org>,
"Yury Norov" <yury.norov@gmail.com>,
"Viresh Kumar" <viresh.kumar@linaro.org>,
"Boqun Feng" <boqun@kernel.org>, "Tyler Hicks" <code@tyhicks.com>,
"Allen Pais" <apais@linux.microsoft.com>,
linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH v5 7/8] rust: percpu: Add pin-hole optimizations for numerics
Date: Fri, 10 Apr 2026 23:06:22 -0400
Message-ID: <adm6rmerdgVtclo5@yury>
In-Reply-To: <20260410-rust-percpu-v5-7-4292380d7a41@gmail.com>
On Fri, Apr 10, 2026 at 02:35:37PM -0700, Mitchell Levy wrote:
> The C implementations of `this_cpu_add`, `this_cpu_sub`, etc., are
> optimized to save an instruction by avoiding having to compute
> `this_cpu_ptr(&x)` for some per-CPU variable `x`. For example, rather
> than
>
> u64 *x_ptr = this_cpu_ptr(&x);
> *x_ptr += 5;
>
> the implementation of `this_cpu_add` is clever enough to make use of the
> fact that per-CPU variables are implemented on x86 via segment
> registers, and so we can use only a single instruction (where we assume
> `&x` is already in `rax`):
>
> add gs:[rax], 5
>
> Add this optimization via a `PerCpuNumeric` type to enable code reuse
> between `DynamicPerCpu` and `StaticPerCpu`.
>
> Signed-off-by: Mitchell Levy <levymitchell0@gmail.com>
> ---
> rust/kernel/percpu.rs | 1 +
> rust/kernel/percpu/dynamic.rs | 10 ++-
> rust/kernel/percpu/numeric.rs | 138 ++++++++++++++++++++++++++++++++++++++++++
> samples/rust/rust_percpu.rs | 36 +++++++++++
> 4 files changed, 184 insertions(+), 1 deletion(-)
>
> diff --git a/rust/kernel/percpu.rs b/rust/kernel/percpu.rs
> index 72c83fef68ee..ff04607ee047 100644
> --- a/rust/kernel/percpu.rs
> +++ b/rust/kernel/percpu.rs
> @@ -6,6 +6,7 @@
>
> pub mod cpu_guard;
> mod dynamic;
> +pub mod numeric;
> mod static_;
>
> #[doc(inline)]
> diff --git a/rust/kernel/percpu/dynamic.rs b/rust/kernel/percpu/dynamic.rs
> index 40514704b3d0..a717138b93dc 100644
> --- a/rust/kernel/percpu/dynamic.rs
> +++ b/rust/kernel/percpu/dynamic.rs
> @@ -28,7 +28,7 @@
> /// the memory location on any particular CPU has been initialized. This means that it cannot tell
> /// whether it should drop the *contents* of the allocation when it is dropped. It is up to the
> /// user to do this via something like [`core::ptr::drop_in_place`].
> -pub struct PerCpuAllocation<T>(PerCpuPtr<T>);
> +pub struct PerCpuAllocation<T>(pub(super) PerCpuPtr<T>);
>
> impl<T: Zeroable> PerCpuAllocation<T> {
> /// Dynamically allocates a space in the per-CPU area suitably sized and aligned to hold a `T`,
> @@ -162,6 +162,14 @@ pub fn new_from(mut initer: impl FnMut(CpuId) -> T, flags: Flags) -> Option<Self
> }
> }
>
> +impl<T> DynamicPerCpu<T> {
> + /// Gets the allocation backing this per-CPU variable.
> + pub(crate) fn alloc(&self) -> &Arc<PerCpuAllocation<T>> {
> + // SAFETY: This type's invariant ensures that `self.alloc` is `Some`.
> + unsafe { self.alloc.as_ref().unwrap_unchecked() }
> + }
> +}
> +
> impl<T> PerCpu<T> for DynamicPerCpu<T> {
> unsafe fn get_mut(&mut self, guard: CpuGuard) -> PerCpuToken<'_, T> {
> // SAFETY:
> diff --git a/rust/kernel/percpu/numeric.rs b/rust/kernel/percpu/numeric.rs
> new file mode 100644
> index 000000000000..13b4ab4a794d
> --- /dev/null
> +++ b/rust/kernel/percpu/numeric.rs
> @@ -0,0 +1,138 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//! Pin-hole optimizations for [`PerCpu<T>`] where T is a numeric type.
> +
> +use super::*;
> +use core::arch::asm;
> +
> +/// Represents a per-CPU variable that can be manipulated with machine-intrinsic numeric
> +/// operations.
> +pub struct PerCpuNumeric<'a, T> {
> + // INVARIANT: `ptr.0` is a valid offset into the per-CPU area and is initialized on all CPUs
> + // (since we don't have a CPU guard, we have to be pessimistic and assume we could be on any
> + // CPU).
> + ptr: &'a PerCpuPtr<T>,
> +}
> +
> +macro_rules! impl_ops {
> + ($ty:ty, $reg:tt) => {
> + impl DynamicPerCpu<$ty> {
> + /// Returns a [`PerCpuNumeric`] that can be used to manipulate the underlying per-CPU
> + /// variable.
> + #[inline]
> + pub fn num(&mut self) -> PerCpuNumeric<'_, $ty> {
> + // The invariant is satisfied because `DynamicPerCpu`'s invariant guarantees that
> + // this pointer is valid and initialized on all CPUs.
> + PerCpuNumeric { ptr: &self.alloc().0 }
> + }
> + }
> + impl StaticPerCpu<$ty> {
> + /// Returns a [`PerCpuNumeric`] that can be used to manipulate the underlying per-CPU
> + /// variable.
> + #[inline]
> + pub fn num(&mut self) -> PerCpuNumeric<'_, $ty> {
> + // The invariant is satisfied because `StaticPerCpu`'s invariant guarantees that
> + // this pointer is valid and initialized on all CPUs.
> + PerCpuNumeric { ptr: &self.0 }
> + }
> + }
> +
> + impl PerCpuNumeric<'_, $ty> {
> + /// Adds `rhs` to the per-CPU variable.
> + #[inline]
> + pub fn add(&mut self, rhs: $ty) {
> + // SAFETY: `self.ptr.0` is a valid offset into the per-CPU area (i.e., valid as a
> + // pointer relative to the `gs` segment register) by the invariants of this type.
> + unsafe {
> + asm!(
> + concat!("add gs:[{off}], {val:", $reg, "}"),
> + off = in(reg) self.ptr.0.cast::<$ty>(),
> + val = in(reg) rhs,

So every user of .add() will now only compile on x86_64? I don't
think that's right. Could you structure it more conveniently:
implement a generic version first, and then an x86_64-optimized one
on top of it?

Also, how much worse does the generic version look on x86_64
compared to the optimized one?

Thanks,
Yury

> + );
> + }
> + }
> + }
> + impl PerCpuNumeric<'_, $ty> {
> + /// Subtracts `rhs` from the per-CPU variable.
> + #[inline]
> + pub fn sub(&mut self, rhs: $ty) {
> + // SAFETY: `self.ptr.0` is a valid offset into the per-CPU area (i.e., valid as a
> + // pointer relative to the `gs` segment register) by the invariants of this type.
> + unsafe {
> + asm!(
> + concat!("sub gs:[{off}], {val:", $reg, "}"),
> + off = in(reg) self.ptr.0.cast::<$ty>(),
> + val = in(reg) rhs,
> + );
> + }
> + }
> + }
> + };
> +}
> +
> +macro_rules! impl_ops_byte {
> + ($ty:ty) => {
> + impl DynamicPerCpu<$ty> {
> + /// Returns a [`PerCpuNumeric`] that can be used to manipulate the underlying per-CPU
> + /// variable.
> + #[inline]
> + pub fn num(&mut self) -> PerCpuNumeric<'_, $ty> {
> + // The invariant is satisfied because `DynamicPerCpu`'s invariant guarantees that
> + // this pointer is valid and initialized on all CPUs.
> + PerCpuNumeric { ptr: &self.alloc().0 }
> + }
> + }
> + impl StaticPerCpu<$ty> {
> + /// Returns a [`PerCpuNumeric`] that can be used to manipulate the underlying per-CPU
> + /// variable.
> + #[inline]
> + pub fn num(&mut self) -> PerCpuNumeric<'_, $ty> {
> + // The invariant is satisfied because `StaticPerCpu`'s invariant guarantees that
> + // this pointer is valid and initialized on all CPUs.
> + PerCpuNumeric { ptr: &self.0 }
> + }
> + }
> +
> + impl PerCpuNumeric<'_, $ty> {
> + /// Adds `rhs` to the per-CPU variable.
> + #[inline]
> + pub fn add(&mut self, rhs: $ty) {
> + // SAFETY: `self.ptr.0` is a valid offset into the per-CPU area (i.e., valid as a
> + // pointer relative to the `gs` segment register) by the invariants of this type.
> + unsafe {
> + asm!(
> + "add gs:[{off}], {val}",
> + off = in(reg) self.ptr.0.cast::<$ty>(),
> + val = in(reg_byte) rhs,
> + );
> + }
> + }
> + }
> + impl PerCpuNumeric<'_, $ty> {
> + /// Subtracts `rhs` from the per-CPU variable.
> + #[inline]
> + pub fn sub(&mut self, rhs: $ty) {
> + // SAFETY: `self.ptr.0` is a valid offset into the per-CPU area (i.e., valid as a
> + // pointer relative to the `gs` segment register) by the invariants of this type.
> + unsafe {
> + asm!(
> + "sub gs:[{off}], {val}",
> + off = in(reg) self.ptr.0.cast::<$ty>(),
> + val = in(reg_byte) rhs,
> + );
> + }
> + }
> + }
> + };
> +}
> +
> +impl_ops_byte!(i8);
> +impl_ops!(i16, "x");
> +impl_ops!(i32, "e");
> +impl_ops!(i64, "r");
> +impl_ops!(isize, "r");
> +
> +impl_ops_byte!(u8);
> +impl_ops!(u16, "x");
> +impl_ops!(u32, "e");
> +impl_ops!(u64, "r");
> +impl_ops!(usize, "r");
> diff --git a/samples/rust/rust_percpu.rs b/samples/rust/rust_percpu.rs
> index 5adb30509bd4..90f5debd3c7a 100644
> --- a/samples/rust/rust_percpu.rs
> +++ b/samples/rust/rust_percpu.rs
> @@ -28,6 +28,26 @@
> define_per_cpu!(UPERCPU: u64 = 0);
> define_per_cpu!(CHECKED: RefCell<u64> = RefCell::new(0));
>
> +macro_rules! make_optimization_test {
> + ($ty:ty) => {
> + let mut test: DynamicPerCpu<$ty> = DynamicPerCpu::new_zero(GFP_KERNEL).unwrap();
> + {
> + let _guard = CpuGuard::new();
> + // SAFETY: No other usage of `test`
> + unsafe { test.get_mut(CpuGuard::new()) }.with(|val: &mut $ty| *val = 10);
> + test.num().add(1);
> + // SAFETY: No other usage of `test`
> + unsafe { test.get_mut(CpuGuard::new()) }.with(|val: &mut $ty| assert_eq!(*val, 11));
> + test.num().add(10);
> + // SAFETY: No other usage of `test`
> + unsafe { test.get_mut(CpuGuard::new()) }.with(|val: &mut $ty| assert_eq!(*val, 21));
> + test.num().sub(5);
> + // SAFETY: No other usage of `test`
> + unsafe { test.get_mut(CpuGuard::new()) }.with(|val: &mut $ty| assert_eq!(*val, 16));
> + }
> + };
> +}
> +
> impl kernel::Module for PerCpuMod {
> fn init(_module: &'static ThisModule) -> Result<Self, Error> {
> pr_info!("rust percpu test start\n");
> @@ -228,6 +248,22 @@ fn init(_module: &'static ThisModule) -> Result<Self, Error> {
>
> pr_info!("rust dynamic percpu test done\n");
>
> + pr_info!("rust numeric optimizations test start\n");
> +
> + make_optimization_test!(u8);
> + make_optimization_test!(u16);
> + make_optimization_test!(u32);
> + make_optimization_test!(u64);
> + make_optimization_test!(usize);
> +
> + make_optimization_test!(i8);
> + make_optimization_test!(i16);
> + make_optimization_test!(i32);
> + make_optimization_test!(i64);
> + make_optimization_test!(isize);
> +
> + pr_info!("rust numeric optimizations test done\n");
> +
> // Return Err to unload the module
> Result::Err(EINVAL)
> }
>
> --
> 2.34.1
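
For illustration only, a minimal, hypothetical sketch of the split requested in the review above: a generic implementation that builds everywhere, plus an x86_64-only fast path selected with `#[cfg(target_arch = "x86_64")]`. This is not the patch author's code. `PerCpuModel`, `this_cpu_slot()` and `add_generic()` are made-up names, and the per-CPU address is modeled as an ordinary memory location so the sketch runs in userspace; a kernel version would keep the `gs:[...]` addressing from the patch.

```rust
/// Hypothetical userspace model of one CPU's copy of a per-CPU numeric.
/// `this_cpu_slot()` stands in for `this_cpu_ptr(&x)`; in the kernel the
/// x86_64 path would address the variable as `gs:[off]` instead.
pub struct PerCpuModel<T> {
    val: T,
}

impl<T: Copy> PerCpuModel<T> {
    pub fn new(val: T) -> Self {
        Self { val }
    }

    /// Reads the current value (test helper).
    pub fn get(&self) -> T {
        self.val
    }

    /// Stand-in for resolving the current CPU's address (`this_cpu_ptr`).
    fn this_cpu_slot(&mut self) -> *mut T {
        &mut self.val
    }
}

macro_rules! impl_add {
    ($ty:ty, $reg:tt) => {
        impl PerCpuModel<$ty> {
            /// Generic path: resolve the address, then load, add, and store.
            /// `wrapping_add` mirrors the wrap-around of the hardware `add`.
            #[inline]
            pub fn add_generic(&mut self, rhs: $ty) {
                let p = self.this_cpu_slot();
                // SAFETY: `p` comes from `&mut self`, so it is valid, aligned,
                // initialized, and not aliased for the duration of this call.
                unsafe { *p = (*p).wrapping_add(rhs) };
            }

            /// x86_64: one read-modify-write `add` on the memory operand.
            #[cfg(target_arch = "x86_64")]
            #[inline]
            pub fn add(&mut self, rhs: $ty) {
                let p = self.this_cpu_slot();
                // SAFETY: same as above. The register width picked by `$reg`
                // fixes the operand size, so exactly `size_of::<$ty>()` bytes
                // are written.
                unsafe {
                    core::arch::asm!(
                        concat!("add [{p}], {v:", $reg, "}"),
                        p = in(reg) p,
                        v = in(reg) rhs,
                        options(nostack),
                    );
                }
            }

            /// Every other architecture: fall back to the generic path.
            #[cfg(not(target_arch = "x86_64"))]
            #[inline]
            pub fn add(&mut self, rhs: $ty) {
                self.add_generic(rhs)
            }
        }
    };
}

impl_add!(u16, "x");
impl_add!(u32, "e");
impl_add!(u64, "r");
impl_add!(i32, "e");
impl_add!(i64, "r");
```

Callers only ever use `add()`, so code built on it compiles on every architecture and only the body differs; `sub()` would follow the same pattern. One caveat, as I understand the C side: the generic `this_cpu_add()` fallback in include/asm-generic/percpu.h disables interrupts around its read-modify-write, so a kernel version of `add_generic()` would need equivalent protection, whereas the single x86_64 instruction cannot be interrupted midway on the local CPU. This model has no CPUs and omits that.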