Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL

From: Jisheng Zhang <jszhang@kernel.org>
To: Dev Jain <dev.jain@arm.com>
Cc: Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@gentwo.org>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, maz@kernel.org
Subject: Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
Date: Fri, 20 Feb 2026 14:14:23 +0800
Message-ID: <aZf7v4Ix1ZWCXxPu@xhacker>
In-Reply-To: <89606308-3c03-4dcf-a89d-479258b710e4@arm.com>

On Mon, Feb 16, 2026 at 08:59:17PM +0530, Dev Jain wrote:
> 
> On 16/02/26 4:30 pm, Will Deacon wrote:
> > On Sun, Feb 15, 2026 at 11:39:44AM +0800, Jisheng Zhang wrote:
> >> It turns out that the generic this_cpu_cmpxchg implementation, which
> >> disables/enables IRQs, is faster than the LL/SC or LSE implementation.
> >> Remove HAVE_CMPXCHG_LOCAL for better performance on arm64.
> >>
> >> Tested on a quad-core 1.9GHz CA55 platform:
> >> average mod_node_page_state() cost decreases from 167ns to 103ns;
> >> the spawn (30 duration) benchmark in unixbench improves
> >> from 147494 lps to 150561 lps, i.e. by 2.1%
> >>
> >> Tested on a quad-core 2.1GHz CA73 platform:
> >> average mod_node_page_state() cost decreases from 113ns to 85ns;
> >> the spawn (30 duration) benchmark in unixbench improves
> >> from 209844 lps to 212581 lps, i.e. by 1.3%
> >>
> >> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> >> ---
> >>  arch/arm64/Kconfig              |  1 -
> >>  arch/arm64/include/asm/percpu.h | 24 ------------------------
> >>  2 files changed, 25 deletions(-)
> > That is _entirely_ dependent on the system, so this isn't the right
> > approach. I also don't think it's something we particularly want to
> > micro-optimise to accommodate systems that suck at atomics.

Hi Will,

I read this as implying that the cmpxchg_local version is better than
the generic disable/enable-IRQ version on newer arm64 systems. Is my
understanding correct?

> 
> Hi Will,
> 
> As I mentioned in the other email, the suspect is not the atomics, but
> preempt_disable(). On Apple M3, the regression reported in [1] resolves
> by removing preempt_disable/enable in _pcp_protect_return. To prove
> this another way, I disabled CONFIG_ARM64_HAS_LSE_ATOMICS and the
> regression worsened, indicating that at least on Apple M3 the
> atomics are faster.
> 
> It may help to confirm this hypothesis on other hardware - perhaps
> Jisheng can test with this change on his hardware and confirm
> whether he gets the same performance improvement.

Hi Dev,

Thanks for the hints. I tried removing the preempt_disable/enable from
_pcp_protect_return. That improves things, but the HAVE_CMPXCHG_LOCAL
version is still slower than the generic disable/enable-IRQ version on
both CA55 and CA73.

> 
> By coincidence, Yang Shi has been discussing the this_cpu_* overhead
> at [2].
> 
> [1] https://lore.kernel.org/all/1052a452-9ba3-4da7-be47-7d27d27b3d1d@arm.com/
> [2] https://lore.kernel.org/all/CAHbLzkpcN-T8MH6=W3jCxcFj1gVZp8fRqe231yzZT-rV_E_org@mail.gmail.com/
> 
> >
> > Will
> >
> >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >> index 38dba5f7e4d2..5e7e2e65d5a5 100644
> >> --- a/arch/arm64/Kconfig
> >> +++ b/arch/arm64/Kconfig
> >> @@ -205,7 +205,6 @@ config ARM64
> >>  	select HAVE_EBPF_JIT
> >>  	select HAVE_C_RECORDMCOUNT
> >>  	select HAVE_CMPXCHG_DOUBLE
> >> -	select HAVE_CMPXCHG_LOCAL
> >>  	select HAVE_CONTEXT_TRACKING_USER
> >>  	select HAVE_DEBUG_KMEMLEAK
> >>  	select HAVE_DMA_CONTIGUOUS
> >> diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h
> >> index b57b2bb00967..70ffe566cb4b 100644
> >> --- a/arch/arm64/include/asm/percpu.h
> >> +++ b/arch/arm64/include/asm/percpu.h
> >> @@ -232,30 +232,6 @@ PERCPU_RET_OP(add, add, ldadd)
> >>  #define this_cpu_xchg_8(pcp, val)	\
> >>  	_pcp_protect_return(xchg_relaxed, pcp, val)
> >>  
> >> -#define this_cpu_cmpxchg_1(pcp, o, n)	\
> >> -	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >> -#define this_cpu_cmpxchg_2(pcp, o, n)	\
> >> -	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >> -#define this_cpu_cmpxchg_4(pcp, o, n)	\
> >> -	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >> -#define this_cpu_cmpxchg_8(pcp, o, n)	\
> >> -	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >> -
> >> -#define this_cpu_cmpxchg64(pcp, o, n)	this_cpu_cmpxchg_8(pcp, o, n)
> >> -
> >> -#define this_cpu_cmpxchg128(pcp, o, n)					\
> >> -({									\
> >> -	typedef typeof(pcp) pcp_op_T__;					\
> >> -	u128 old__, new__, ret__;					\
> >> -	pcp_op_T__ *ptr__;						\
> >> -	old__ = o;							\
> >> -	new__ = n;							\
> >> -	preempt_disable_notrace();					\
> >> -	ptr__ = raw_cpu_ptr(&(pcp));					\
> >> -	ret__ = cmpxchg128_local((void *)ptr__, old__, new__);		\
> >> -	preempt_enable_notrace();					\
> >> -	ret__;								\
> >> -})
> >>  
> >>  #ifdef __KVM_NVHE_HYPERVISOR__
> >>  extern unsigned long __hyp_per_cpu_offset(unsigned int cpu);
> >> -- 
> >> 2.51.0
> >>

Thread overview: 15+ messages
2026-02-15  3:39 Jisheng Zhang
2026-02-16 10:59 ` Dev Jain
2026-02-16 11:00 ` Will Deacon
2026-02-16 15:29   ` Dev Jain
2026-02-17 13:53     ` Catalin Marinas
2026-02-17 15:00       ` Will Deacon
2026-02-17 16:48         ` Catalin Marinas
2026-02-18  4:01           ` K Prateek Nayak
2026-02-18  9:29             ` Catalin Marinas
2026-02-17 17:19     ` Christoph Lameter (Ampere)
2026-02-23  9:19       ` Heiko Carstens
2026-02-20  6:14     ` Jisheng Zhang [this message]
2026-02-18 22:07 ` Shakeel Butt
2026-02-20  6:20   ` Jisheng Zhang
2026-02-20 23:27     ` Shakeel Butt
