linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yury Norov <ynorov@nvidia.com>
To: Mitchell Levy <levymitchell0@gmail.com>
Cc: "Miguel Ojeda" <ojeda@kernel.org>,
	"Alex Gaynor" <alex.gaynor@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Andreas Hindborg" <a.hindborg@kernel.org>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"Trevor Gross" <tmgross@umich.edu>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Dennis Zhou" <dennis@kernel.org>, "Tejun Heo" <tj@kernel.org>,
	"Christoph Lameter" <cl@linux.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"Benno Lossin" <lossin@kernel.org>,
	"Yury Norov" <yury.norov@gmail.com>,
	"Viresh Kumar" <viresh.kumar@linaro.org>,
	"Boqun Feng" <boqun@kernel.org>, "Tyler Hicks" <code@tyhicks.com>,
	"Allen Pais" <apais@linux.microsoft.com>,
	linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v5 7/8] rust: percpu: Add pin-hole optimizations for numerics
Date: Wed, 15 Apr 2026 17:57:36 -0400	[thread overview]
Message-ID: <aeAJ0Hja36TPWDS7@yury> (raw)
In-Reply-To: <ad_2VOc2hjhTYMQk@Cyndaquil.localdomain>

On Wed, Apr 15, 2026 at 01:34:28PM -0700, Mitchell Levy wrote:
> On Fri, Apr 10, 2026 at 11:06:22PM -0400, Yury Norov wrote:
> > On Fri, Apr 10, 2026 at 02:35:37PM -0700, Mitchell Levy wrote:
> > > The C implementations of `this_cpu_add`, `this_cpu_sub`, etc., are
> > > optimized to save an instruction by avoiding having to compute
> > > `this_cpu_ptr(&x)` for some per-CPU variable `x`. For example, rather
> > > than
> > > 
> > >     u64 *x_ptr = this_cpu_ptr(&x);
> > >     *x_ptr += 5;
> > > 
> > > the implementation of `this_cpu_add` is clever enough to make use of the
> > > fact that per-CPU variables are implemented on x86 via segment
> > > registers, and so we can use only a single instruction (where we assume
> > > `&x` is already in `rax`)
> > > 
> > >     add gs:[rax], 5
> > > 
> > > Add this optimization via a `PerCpuNumeric` type to enable code-reuse
> > > between `DynamicPerCpu` and `StaticPerCpu`.
> > > 
> > > Signed-off-by: Mitchell Levy <levymitchell0@gmail.com>
> > > ---
> > >  rust/kernel/percpu.rs         |   1 +
> > >  rust/kernel/percpu/dynamic.rs |  10 ++-
> > >  rust/kernel/percpu/numeric.rs | 138 ++++++++++++++++++++++++++++++++++++++++++
> > >  samples/rust/rust_percpu.rs   |  36 +++++++++++
> > >  4 files changed, 184 insertions(+), 1 deletion(-)
> > > 

...

> > > +        impl PerCpuNumeric<'_, $ty> {
> > > +            /// Adds `rhs` to the per-CPU variable.
> > > +            #[inline]
> > > +            pub fn add(&mut self, rhs: $ty) {
> > > +                // SAFETY: `self.ptr.0` is a valid offset into the per-CPU area (i.e., valid as a
> > > +                // pointer relative to the `gs` segment register) by the invariants of this type.
> > > +                unsafe {
> > > +                    asm!(
> > > +                        concat!("add gs:[{off}], {val:", $reg, "}"),
> > > +                        off = in(reg) self.ptr.0.cast::<$ty>(),
> > > +                        val = in(reg) rhs,
> > 
> > So, every user of .add() now will be only compilable against x86_64? 
> > I don't think it's right. Can you make it in a more convenient way:
> > implement a generic version, and then an x86_64-optimized.
> > 
> > How bad the generic x86_64 version looks comparing to the optimized
> > one?
> 
> Currently, all of `mod percpu` is behind `#[cfg(X86_64)]`, so usage of
> per-CPU variables in general is only compatible against x86_64.

Yes, and seemingly for no good reason. The only assembler function in
that patch (#4) is the get_ptr(), and to me it's quite easy to get it
implemented it in C.
 
> I believe a generic implementation would require implicitly creating a
> `CpuGuard` since in general you require two steps: computing the pointer
> to the per-CPU variable's slot in the current CPU's area and actually
> doing the write. On x86_64 we can get around this because segment
> register relative writes let us combine these two ops into one
> instruction which can't be torn across CPUs. But in the general case you
> could have the task get preempted between those two operations and end
> up with a data race.

Yes, you need CpuGuard to protect the .add(), and there's nothing
wrong with that. Especially if you provide better alternative for
the x86.

> As I understand it, x86 is the only arch where this is possible, so even
> once `mod percpu` supports more architectures, I think it'd still make
> some sense to have `PerCpuNumeric` specifically be x86 exclusive. This
> means that the user must always explicitly disable preemption rather
> than having a `PerCpuNumeric` type that sometimes does and sometimes
> doesn't.

GS register is the x86-only feature, but per-CPU variables are not. In
the mother kernel, we implement them for all architectures. For x86,
we've got the effective arch implementation where needed. But the
generic one is the first and foremost. Refer raw_cpu_ptr() for
example.

Have you checked performance difference of generic vs arch per-CPU
API on your workload? Is there any measurable difference? If not, I'd
rather focus on the generic version.

If you prefer the arch one, it's OK, after all. But can you please put
it under rust/arch/x86_64 please? And your driver, correspondingly,
would be hosted there as well.

Looking in the rust codebase, I see, it uses assembler inlines for
things like compile-time barriers, labels, sections. Those are
features of assembler language, a generic thing.

You're the first trying to add something really arch-specific, unless
I overlooked something. So please, either stay (happy) on the generic
ground, or create all the required infrastructure for architectures,
like the mother kernel does. Particularly, such a basic thing like
per-CPU API should be available for all architectures.

Thanks,
Yury


  reply	other threads:[~2026-04-15 21:57 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-10 21:35 [PATCH v5 0/8] rust: Add Per-CPU Variable API Mitchell Levy
2026-04-10 21:35 ` [PATCH v5 1/8] rust: cpumask: Add a `Cpumask` iterator Mitchell Levy
2026-04-10 21:35 ` [PATCH v5 2/8] rust: cpumask: Add getters for globally defined cpumasks Mitchell Levy
2026-04-10 21:35 ` [PATCH v5 3/8] rust: percpu: Add C bindings for per-CPU variable API Mitchell Levy
2026-04-10 21:35 ` [PATCH v5 4/8] rust: percpu: introduce a rust API for static per-CPU variables Mitchell Levy
2026-04-10 21:35 ` [PATCH v5 5/8] rust: percpu: introduce a rust API for dynamic " Mitchell Levy
2026-04-10 21:35 ` [PATCH v5 6/8] rust: percpu: add a rust per-CPU variable sample Mitchell Levy
2026-04-10 21:35 ` [PATCH v5 7/8] rust: percpu: Add pin-hole optimizations for numerics Mitchell Levy
2026-04-11  3:06   ` Yury Norov
2026-04-15 20:34     ` Mitchell Levy
2026-04-15 21:57       ` Yury Norov [this message]
2026-04-10 21:35 ` [PATCH v5 8/8] rust: percpu: cache per-CPU pointers in the dynamic case Mitchell Levy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aeAJ0Hja36TPWDS7@yury \
    --to=ynorov@nvidia.com \
    --cc=a.hindborg@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alex.gaynor@gmail.com \
    --cc=aliceryhl@google.com \
    --cc=apais@linux.microsoft.com \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun@kernel.org \
    --cc=cl@linux.com \
    --cc=code@tyhicks.com \
    --cc=dakr@kernel.org \
    --cc=dennis@kernel.org \
    --cc=gary@garyguo.net \
    --cc=levymitchell0@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lossin@kernel.org \
    --cc=ojeda@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=tmgross@umich.edu \
    --cc=viresh.kumar@linaro.org \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox