From: Ryan Roberts <ryan.roberts@arm.com>
To: Yang Shi <shy828301@gmail.com>,
lsf-pc@lists.linux-foundation.org, Linux MM <linux-mm@kvack.org>,
"Christoph Lameter (Ampere)" <cl@gentwo.org>,
dennis@kernel.org, Tejun Heo <tj@kernel.org>,
urezki@gmail.com, Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Subject: Re: [LSF/MM/BPF TOPIC] Improve this_cpu_ops performance for ARM64 (and potentially other architectures)
Date: Thu, 12 Feb 2026 18:41:57 +0000
Message-ID: <5a648f49-97b1-4195-a825-47f3261225eb@arm.com>
In-Reply-To: <CAHbLzkpcN-T8MH6=W3jCxcFj1gVZp8fRqe231yzZT-rV_E_org@mail.gmail.com>
On 11/02/2026 23:14, Yang Shi wrote:
> Background
> =========
> The this_cpu_*() APIs operate on the current processor's local copy of
> a percpu variable. In order to obtain the address of this cpu-specific
> variable, a cpu-specific offset has to be added to the variable's base
> address.
> On x86 this address calculation can be performed by prefixing an
> instruction with a segment register, so x86 can increment a percpu
> counter with a single instruction. Since the address calculation and
> the RMW operation occur within one instruction, the whole sequence is
> atomic vs the scheduler and preemption does not need to be disabled.
> e.g.:
> incq %gs:my_counter
> See https://www.kernel.org/doc/Documentation/this_cpu_ops.txt for more details.
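For anyone following along, the generic API in question looks roughly like
this (a minimal sketch; DEFINE_PER_CPU() and this_cpu_inc() are the real
kernel interfaces, my_counter is just an example name):

    DEFINE_PER_CPU(unsigned long, my_counter);

    /* on a hot path: */
    this_cpu_inc(my_counter);

On x86 that compiles down to the single gs-relative increment above, with
no preempt_disable()/preempt_enable() bracketing emitted around it.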
>
> ARM64 and some other non-x86 architectures don't have a segment
> register. The address of the current processor's percpu variable has to
> be calculated first, and only then can that address be used to operate
> on the percpu data. This whole sequence must be atomic vs the
> scheduler, so it is necessary to disable preemption, perform the
> address calculation and then the increment. On ARM64 the cpu-specific
> offset additionally lives in a system register that has to be read each
> time. The code flow looks like:
> Disable preemption
> Calculate the address of this CPU's copy using the offset
> Manipulate the counter
> Enable preemption
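Concretely, on arm64 today each this_cpu_add()/inc() expands to roughly
the following (a simplified sketch of the protected-op pattern; in the
kernel the percpu offset is read from tpidr_el1):

    preempt_disable();
    unsigned long *p = raw_cpu_ptr(&my_counter); /* base + percpu offset */
    *p += 1;                                     /* plain or LSE atomic add */
    preempt_enable();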
By massive coincidence, Dev Jain and I have been investigating a large
regression seen in a munmap micro-benchmark on 6.19, which we root-caused
to a change that ends up using this_cpu_*() much more heavily on that path.
We have concluded that we can simplify this_cpu_read() to not bother
disabling/enabling preemption: the operation is read-only, and a migration
between the address calculation and the load is indistinguishable from a
migration just after the load. I believe Dev is planning to post a patch
to the list soon. This will solve our immediate regression.
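Something along these lines (a sketch of the shape of the change, not
necessarily Dev's exact patch):

    /* today (protected): */
    preempt_disable();
    val = READ_ONCE(*raw_cpu_ptr(&my_counter));
    preempt_enable();

    /* simplified: for a pure read, migrating between the address
     * calculation and the load gives the same result as migrating
     * immediately after the load, so the bracketing buys nothing: */
    val = READ_ONCE(*raw_cpu_ptr(&my_counter));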
But we can't do the same trick for ops that write. See [1].
[1] https://lore.kernel.org/all/20190311164837.GD24275@lakrids.cambridge.arm.com/
>
> This process is inefficient relative to x86 and has to be repeated for
> every access to percpu data.
> ARM64 does have an atomic increment instruction, but it takes its
> address from a general purpose register; there is no segment-register
> style addressing as on x86. So a separate address calculation is always
> necessary, even when the atomic instruction is used.
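To make that concrete, even with LSE atomics the percpu offset has to be
fetched and applied first, e.g. (a sketch; tpidr_el1 is where the kernel
keeps the percpu offset):

    mrs   x1, tpidr_el1        // this CPU's percpu offset
    add   x0, x0, x1           // x0 = address of this CPU's copy
    mov   x2, #1
    stadd x2, [x0]             // atomic add, but only after the address calc

and the window from the tpidr_el1 read to the store completing is exactly
what the preemption bracketing has to cover.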
> A page table allows us to remap addresses. So if the atomic instruction
> used a fixed virtual address, and each processor's page tables mapped
> that area to its own local percpu data, then we could get down to a
> single instruction on ARM64 too (and hopefully on some other non-x86
> architectures) and be as efficient as x86.
>
> So, the code flow should just become:
> INC VIRTUAL_BASE + percpu_variable_offset
>
> In order to do that we need the same virtual address to be mapped
> differently on each processor, which means a separate set of page
> tables per processor. These page tables can map almost all of the
> address space identically; the only special area is the one starting
> at VIRTUAL_BASE.
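IIUC, the generated code could then collapse to something like this
(PCPU_WINDOW being a made-up name for the remapped area):

    adrp  x0, PCPU_WINDOW + my_counter_off   // fixed VA, same on every CPU
    add   x0, x0, :lo12:(PCPU_WINDOW + my_counter_off)
    mov   x1, #1
    stadd x1, [x0]

with no tpidr_el1 read and, crucially, no preemption bracketing: whichever
CPU ends up executing the stadd increments its own copy, because its own
page tables resolve the VA.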
This is an interesting idea. I'm keen to be involved in discussions.
My immediate concern is that this would not be compatible with FEAT_TTCNP,
which allows multiple PEs (ARM-speak for CPUs) to share TLB entries - e.g.
for SMT. I'm not sure whether that would be the end of the world; the perf
numbers below are compelling. I'll defer to others' opinions on that.
Thanks,
Ryan