Re: [LSF/MM/BPF TOPIC] Improve this_cpu_ops performance for ARM64 (and potentially other architectures)

From: Yang Shi <shy828301@gmail.com>
To: Tejun Heo <tj@kernel.org>
Cc: lsf-pc@lists.linux-foundation.org, Linux MM <linux-mm@kvack.org>,
	 "Christoph Lameter (Ampere)" <cl@gentwo.org>,
	dennis@kernel.org, urezki@gmail.com,
	 Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	 Ryan Roberts <ryan.roberts@arm.com>,
	Yang Shi <yang@os.amperecomputing.com>
Subject: Re: [LSF/MM/BPF TOPIC] Improve this_cpu_ops performance for ARM64 (and potentially other architectures)
Date: Wed, 11 Feb 2026 15:58:50 -0800
Message-ID: <CAHbLzkoE0UQLZSKQJttv1_XGT-6HPKdj5o7aYnpuiXEyvbAHxA@mail.gmail.com>
In-Reply-To: <aY0Q7QhGDKwKk2ie@slm.duckdns.org>

On Wed, Feb 11, 2026 at 3:29 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Wed, Feb 11, 2026 at 03:14:57PM -0800, Yang Shi wrote:
> ...
> > Overhead
> > ========
> > 1. Some extra virtual memory space. But it shouldn’t be too much. I
> > saw 960K with Fedora default kernel config. Given terabytes of virtual
> > memory space on a 64-bit machine, 960K is negligible.
> > 2. Some extra physical memory for percpu kernel page table. 4K *
> > (nr_cpus - 1) for PGD pages, plus the page tables used by percpu local
> > mapping area. A couple of megabytes with Fedora default kernel config
> > on AmpereOne with 160 cores.
> > 3. Percpu allocation and free will be slower due to extra virtual
> > memory allocation and page table manipulation. However, percpu is
> > allocated by chunk. One chunk typically holds a lot of percpu variables.
> > So the slowdown should be negligible. The test result below also
> > proved it.
>
> It will also add a bit of TLB pressure as a lot of percpu allocations are
> currently embedded in the linear address space backed by large page
> mappings. Likely immaterial compared to the reduced overhead of
> this_cpu_*().

Yes, this should not be noticeable. If it turns out to be a problem, it
can be optimized further by using cont PTEs on ARM64. The percpu area is
typically larger than 64K (the cont PTE size with 4K page size on
arm64).
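
For reference, the 64K figure is the span of one block of contiguous
PTEs. A minimal sketch of the arithmetic, assuming 16 contiguous PTEs
per block with a 4K page size; the macro and function names here are
hypothetical and are not the kernel's own identifiers:

```c
/* Assumed values for arm64 with 4K pages (hypothetical names). */
#define PAGE_SIZE_4K	4096UL	/* bytes per page */
#define CONT_PTES_4K	16UL	/* PTEs that share one contiguous hint */

/* Bytes covered by one contiguous-PTE block. */
static unsigned long cont_pte_size(void)
{
	return PAGE_SIZE_4K * CONT_PTES_4K;
}

/*
 * Number of whole cont-PTE blocks that fit in an area of `bytes` bytes,
 * assuming the area starts on a cont-PTE boundary.
 */
static unsigned long cont_pte_blocks(unsigned long bytes)
{
	return bytes / cont_pte_size();
}
```

Taking 960K purely as a sample size (the virtual memory figure quoted
above), cont_pte_blocks(960UL * 1024) gives 15 whole blocks.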

Also, the linear address space may not be backed by large page mappings
on ARM64. If rodata=on (the default; it was called "full" before) and the
machine doesn't support BBML2_NOABORT, the linear address space is backed
by PTEs.

>
> One property that this breaks is per_cpu_ptr() of a given CPU disagreeing
> with this_cpu_ptr(). E.g., if there are users that take this_cpu_ptr() and
> use it outside a preempt-disabled block (which is a bit odd but allowed),
> the end result would be surprising. Hmm... I wonder whether it'd be
> worthwhile to keep this_cpu_ptr() returning the global address - ie. make it
> access global offset from local mapping and then return the computed global
> address. This should still be pretty cheap and gets rid of surprising and
> potentially extremely subtle corner cases.

Yes, this is going to be a problem, so we don't change how
this_cpu_ptr() works and keep it returning the global address. I
noticed it can confuse the list APIs too. For example, when a list
embedded in a percpu variable is initialized via per_cpu_ptr(), ->next
and ->prev are set to global addresses. If the list were then accessed
via a this_cpu_ptr() that returned the local address, the list head
would be referenced by its local address, and list_empty(), which
compares the list head pointer with the ->next pointer, would see a
mismatch and report the list as non-empty even though it is empty.

So we only use the local address for this_cpu_add/sub/inc/dec and so
on, which just manipulate a scalar counter.
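
The aliasing effect can be reproduced in user space. The sketch below
is only an illustration, not kernel code: it maps the same page at two
virtual addresses, with one mapping standing in for the global
(per_cpu_ptr()) address and the other for the local mapping. The
struct and function names (pcpu_item, alias_pair, alias_pair_create and
so on) are hypothetical, and the list helpers are simplified stand-ins
for the kernel's list API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical percpu object: a scalar counter plus an embedded list. */
struct pcpu_item {
	long counter;
	struct list_head list;
};

/* Two virtual addresses backed by the same memory. */
struct alias_pair {
	struct pcpu_item *global;	/* stands in for the per_cpu_ptr() address */
	struct pcpu_item *local;	/* stands in for the local-mapping address */
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Same comparison as list_empty(): does ->next point back at the head? */
static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Map one page of a temporary file twice. Returns 0 on success. */
static int alias_pair_create(struct alias_pair *p)
{
	long page = sysconf(_SC_PAGESIZE);
	FILE *f = tmpfile();
	void *g, *l;

	if (!f || page <= 0)
		return -1;
	if (ftruncate(fileno(f), page) != 0) {
		fclose(f);
		return -1;
	}
	g = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
	l = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
	fclose(f);	/* the mappings remain valid after the file is closed */
	if (g == MAP_FAILED || l == MAP_FAILED)
		return -1;
	assert(g != l);	/* distinct virtual addresses, same backing page */
	p->global = g;
	p->local = l;
	return 0;
}
```

After init_list_head(&p.global->list), list_is_empty(&p.global->list)
returns 1 but list_is_empty(&p.local->list) returns 0, because ->next
holds the global address while the head is referenced by its local
address. A scalar update such as p.local->counter += 5 is visible
through p.global->counter, since no address is stored in the data.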

>
> Generally sounds like a great solution for !x86.

Thank you.

Yang

>
> Thanks.
>
> --
> tejun


Thread overview: 21+ messages
2026-02-11 23:14 Yang Shi
2026-02-11 23:29 ` Tejun Heo
2026-02-11 23:39   ` Christoph Lameter (Ampere)
2026-02-11 23:40     ` Tejun Heo
2026-02-12  0:05       ` Christoph Lameter (Ampere)
2026-02-11 23:58   ` Yang Shi [this message]
2026-02-12 17:54     ` Catalin Marinas
2026-02-12 18:43       ` Catalin Marinas
2026-02-13  0:23         ` Yang Shi
2026-02-12 18:45       ` Ryan Roberts
2026-02-12 19:36         ` Catalin Marinas
2026-02-12 21:12           ` Ryan Roberts
2026-02-16 10:37             ` Catalin Marinas
2026-02-18  8:59               ` Ryan Roberts
2026-02-12 18:41 ` Ryan Roberts
2026-02-12 18:55   ` Christoph Lameter (Ampere)
2026-02-12 18:58     ` Ryan Roberts
2026-02-13 18:42   ` Yang Shi
2026-02-16 11:39     ` Catalin Marinas
2026-02-17 17:28       ` Christoph Lameter (Ampere)
2026-02-18  9:18         ` Ryan Roberts