Re: [LSF/MM/BPF TOPIC] Improve this_cpu_ops performance for ARM64 (and potentially other architectures)

From: Catalin Marinas <catalin.marinas@arm.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>, Tejun Heo <tj@kernel.org>,
	lsf-pc@lists.linux-foundation.org, Linux MM <linux-mm@kvack.org>,
	"Christoph Lameter (Ampere)" <cl@gentwo.org>,
	dennis@kernel.org, urezki@gmail.com,
	Will Deacon <will@kernel.org>,
	Yang Shi <yang@os.amperecomputing.com>
Subject: Re: [LSF/MM/BPF TOPIC] Improve this_cpu_ops performance for ARM64 (and potentially other architectures)
Date: Thu, 12 Feb 2026 19:36:50 +0000
Message-ID: <aY4r0lL9oGE7tzJW@arm.com>
In-Reply-To: <82420c8c-d7b0-4ebf-870f-a6061fa4428f@arm.com>

On Thu, Feb 12, 2026 at 06:45:19PM +0000, Ryan Roberts wrote:
> On 12/02/2026 17:54, Catalin Marinas wrote:
> > On Wed, Feb 11, 2026 at 03:58:50PM -0800, Yang Shi wrote:
> >> On Wed, Feb 11, 2026 at 3:29 PM Tejun Heo <tj@kernel.org> wrote:
> >>> On Wed, Feb 11, 2026 at 03:14:57PM -0800, Yang Shi wrote:
> >>> ...
> >>>> Overhead
> >>>> ========
> >>>> 1. Some extra virtual memory space. But it shouldn’t be too much. I
> >>>> saw 960K with Fedora default kernel config. Given terabytes virtual
> >>>> memory space on 64 bit machine, 960K is negligible.
> >>>> 2. Some extra physical memory for percpu kernel page table. 4K *
> >>>> (nr_cpus - 1) for PGD pages, plus the page tables used by percpu local
> >>>> mapping area. A couple of megabytes with Fedora default kernel config
> >>>> on AmpereOne with 160 cores.
> >>>> 3. Percpu allocation and free will be slower due to extra virtual
> >>>> memory allocation and page table manipulation. However, percpu is
> >>>> allocated by chunk. One chunk typically holds a lot of percpu variables.
> >>>> So the slowdown should be negligible. The test result below also
> >>>> proved it.
> > [...]
> >>> One property that this breaks is per_cpu_ptr() of a given CPU disagreeing
> >>> with this_cpu_ptr(). e.g. If there are users that take this_cpu_ptr() and
> >>> use that outside a preempt disable block (which is a bit odd but allowed),
> >>> the end result would be surprising. Hmm... I wonder whether it'd be
> >>> worthwhile to keep this_cpu_ptr() returning the global address - ie. make it
> >>> access global offset from local mapping and then return the computed global
> >>> address. This should still be pretty cheap and gets rid of surprising and
> >>> potentially extremely subtle corner cases.
> >>
> >> Yes, this is going to be a problem. So we don't change how
> >> this_cpu_ptr() works and keep it returning the global address. Because
> >> I noticed this may cause confusion for list APIs too. For example,
> >> when initializing a list embedded into a percpu variable, the ->next
> >> and ->prev will be initialized to global addresses by using
> >> per_cpu_ptr(), but if the list is accessed via this_cpu_ptr(), the list
> >> head will be dereferenced using the local address, and list_empty(),
> >> which compares the list head pointer with the ->next pointer, will
> >> complain. This will cause some problems.
> >>
> >> So we just use the local address for this_cpu_add/sub/inc/dec and so
> >> on, which just manipulate a scalar counter.
> > 
> > I wonder how much overhead is caused by calling into the scheduler on
> > preempt_enable(). It would be good to get some numbers for something
> > like the patch below (also removing the preempt disabling for
> > this_cpu_read() as I don't think it matters - a thread cannot
> > distinguish whether it was preempted between TPIDR read and variable
> > read or immediately after the variable read; we can't do this for writes
> > as other threads may notice unexpected updates).
> > 
> > Another wild hack could be to read the kernel instruction at
> > (current_pt_regs()->pc - 4) in arch_irqentry_exit_need_resched() and
> > return false if it's a read from TPIDR_EL1/2, together with removing the
> > preempt disabling. Or some other lighter way of detecting this_cpu_*
> > constructs without full preemption disabling.
> 
> Could a sort of kernel version of restartable sequences help? i.e. detect
> preemption instead of preventing it?

Yes, in principle that's what we'd need but it's too expensive to check,
especially as those accessors are inlined.

For the write variants with LL/SC, we can check TPIDR_EL2 again
between the LDXR and STXR and bail out if it's different from the one
read outside the loop. An interrupt would clear the exclusive monitor
anyway and the STXR would fail. This won't work for the theoretical
this_cpu_read() case.

-- 
Catalin
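
The LDXR/STXR re-check described in the reply above can be modelled in
plain userspace C. This is only a rough sketch and not the arm64
implementation: sim_tpidr, counters, sim_migrate_to, sim_maybe_preempt()
and sim_this_cpu_add() are all hypothetical names invented for
illustration, the per-CPU offset register is replaced by a plain integer,
and the LDXR/STXR pair is approximated with a C11 load plus
compare-and-swap. Unlike real hardware, the compare-and-swap does not
fail just because an interrupt was taken, so the model only demonstrates
the "re-read TPIDR and bail out" step.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the per-CPU offset held in TPIDR_EL1/2. */
static int sim_tpidr;
/* One counter per simulated CPU. */
static _Atomic long counters[NR_CPUS];
/* Pending one-shot migration target; -1 means "no migration pending". */
static int sim_migrate_to = -1;

/* Simulates preemption plus migration between the load and the store. */
static void sim_maybe_preempt(void)
{
	if (sim_migrate_to >= 0) {
		sim_tpidr = sim_migrate_to;
		sim_migrate_to = -1;
	}
}

/*
 * Model of a this_cpu_add() write variant without preempt disabling.
 * Returns the number of times the sequence had to be restarted.
 */
static int sim_this_cpu_add(long val)
{
	int retries = 0;

	for (;;) {
		int cpu = sim_tpidr;	/* TPIDR read outside the LL/SC loop */
		long old;

		assert(cpu >= 0 && cpu < NR_CPUS);
		old = atomic_load(&counters[cpu]);	/* models LDXR */

		sim_maybe_preempt();	/* window where migration may occur */

		/* Re-check TPIDR between "LDXR" and "STXR"; bail out on change. */
		if (sim_tpidr != cpu) {
			retries++;
			continue;
		}

		/* Models STXR: only succeeds if the value is still unchanged. */
		if (atomic_compare_exchange_strong(&counters[cpu], &old,
						   old + val))
			return retries;
		retries++;
	}
}
```

With sim_tpidr set to 0, sim_this_cpu_add(5) updates counters[0] without
a restart. Setting sim_migrate_to to 2 before the next call makes the
re-check fail once, and the update then lands on counters[2] while
counters[0] is left untouched.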
