* [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
@ 2026-02-15 3:39 Jisheng Zhang
2026-02-16 10:59 ` Dev Jain
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Jisheng Zhang @ 2026-02-15 3:39 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Dennis Zhou, Tejun Heo, Christoph Lameter
Cc: linux-arm-kernel, linux-kernel, linux-mm

It turns out that the generic disable/enable-IRQ this_cpu_cmpxchg
implementation is faster than the LL/SC or LSE implementation. Remove
HAVE_CMPXCHG_LOCAL for better performance on arm64.

Tested on a quad-core 1.9GHz Cortex-A55 platform:
the average mod_node_page_state() cost decreases from 167ns to 103ns;
the spawn (30s duration) benchmark in UnixBench improves
from 147494 lps to 150561 lps, i.e. by 2.1%.

Tested on a quad-core 2.1GHz Cortex-A73 platform:
the average mod_node_page_state() cost decreases from 113ns to 85ns;
the spawn (30s duration) benchmark in UnixBench improves
from 209844 lps to 212581 lps, i.e. by 1.3%.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
arch/arm64/Kconfig | 1 -
arch/arm64/include/asm/percpu.h | 24 ------------------------
2 files changed, 25 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 38dba5f7e4d2..5e7e2e65d5a5 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -205,7 +205,6 @@ config ARM64
 	select HAVE_EBPF_JIT
 	select HAVE_C_RECORDMCOUNT
 	select HAVE_CMPXCHG_DOUBLE
-	select HAVE_CMPXCHG_LOCAL
 	select HAVE_CONTEXT_TRACKING_USER
 	select HAVE_DEBUG_KMEMLEAK
 	select HAVE_DMA_CONTIGUOUS
diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h
index b57b2bb00967..70ffe566cb4b 100644
--- a/arch/arm64/include/asm/percpu.h
+++ b/arch/arm64/include/asm/percpu.h
@@ -232,30 +232,6 @@ PERCPU_RET_OP(add, add, ldadd)
 #define this_cpu_xchg_8(pcp, val)	\
	_pcp_protect_return(xchg_relaxed, pcp, val)
 
-#define this_cpu_cmpxchg_1(pcp, o, n)	\
-	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
-#define this_cpu_cmpxchg_2(pcp, o, n)	\
-	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
-#define this_cpu_cmpxchg_4(pcp, o, n)	\
-	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
-#define this_cpu_cmpxchg_8(pcp, o, n)	\
-	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
-
-#define this_cpu_cmpxchg64(pcp, o, n)	this_cpu_cmpxchg_8(pcp, o, n)
-
-#define this_cpu_cmpxchg128(pcp, o, n)					\
-({									\
-	typedef typeof(pcp) pcp_op_T__;					\
-	u128 old__, new__, ret__;					\
-	pcp_op_T__ *ptr__;						\
-	old__ = o;							\
-	new__ = n;							\
-	preempt_disable_notrace();					\
-	ptr__ = raw_cpu_ptr(&(pcp));					\
-	ret__ = cmpxchg128_local((void *)ptr__, old__, new__);		\
-	preempt_enable_notrace();					\
-	ret__;								\
-})
 
 #ifdef __KVM_NVHE_HYPERVISOR__
 extern unsigned long __hyp_per_cpu_offset(unsigned int cpu);
--
2.51.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
  2026-02-15  3:39 [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL Jisheng Zhang
@ 2026-02-16 10:59 ` Dev Jain
  2026-02-16 11:00 ` Will Deacon
  2026-02-18 22:07 ` Shakeel Butt
  2 siblings, 0 replies; 14+ messages in thread
From: Dev Jain @ 2026-02-16 10:59 UTC (permalink / raw)
  To: Jisheng Zhang, Catalin Marinas, Will Deacon, Dennis Zhou, Tejun Heo,
	Christoph Lameter
  Cc: linux-arm-kernel, linux-kernel, linux-mm

On 15/02/26 9:09 am, Jisheng Zhang wrote:
> It turns out the generic disable/enable irq this_cpu_cmpxchg
> implementation is faster than LL/SC or lse implementation. Remove
> HAVE_CMPXCHG_LOCAL for better performance on arm64.
>
> Tested on Quad 1.9GHZ CA55 platform:
> average mod_node_page_state() cost decreases from 167ns to 103ns
> the spawn (30 duration) benchmark in unixbench is improved
> from 147494 lps to 150561 lps, improved by 2.1%
>
> Tested on Quad 2.1GHZ CA73 platform:
> average mod_node_page_state() cost decreases from 113ns to 85ns
> the spawn (30 duration) benchmark in unixbench is improved
> from 209844 lps to 212581 lps, improved by 1.3%
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> ---

Thanks. This concurs with my investigation on [1]. The problem isn't
really LL/SC/LSE but preempt_disable()/enable() in this_cpu_* [1, 2].

I think you should only remove the selection of the config, but keep
the code? We may want to switch this on again if the real issue gets
solved.

[1] https://lore.kernel.org/all/5a6782f3-d758-4d9c-975b-5ae4b5d80d4e@arm.com/
[2] https://lore.kernel.org/all/CAHbLzkpcN-T8MH6=W3jCxcFj1gVZp8fRqe231yzZT-rV_E_org@mail.gmail.com/

> arch/arm64/Kconfig | 1 -
> arch/arm64/include/asm/percpu.h | 24 ------------------------
> 2 files changed, 25 deletions(-)
[...]

^ permalink raw reply	[flat|nested] 14+ messages in thread
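
A paraphrased sketch of the arm64 fast path referred to above: the
this_cpu_cmpxchg_* wrappers built on _pcp_protect_return() boil down to
the pattern below, modelled on the removed this_cpu_cmpxchg128() macro.
The _sketch name and the exact expansion are illustrative, not the
literal kernel source:

/* Illustrative expansion of this_cpu_cmpxchg_8(pcp, o, n) on arm64 */
#define this_cpu_cmpxchg_8_sketch(pcp, o, n)				\
({									\
	typeof(pcp) ret__;						\
	preempt_disable_notrace();	/* suspected fast-path cost #1 */ \
	ret__ = cmpxchg_relaxed(raw_cpu_ptr(&(pcp)), o, n); /* LL/SC or LSE CAS */ \
	preempt_enable_notrace();	/* cost #2: may branch towards the scheduler */ \
	ret__;								\
})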

* Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
  2026-02-15  3:39 [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL Jisheng Zhang
  2026-02-16 10:59 ` Dev Jain
@ 2026-02-16 11:00 ` Will Deacon
  2026-02-16 15:29 ` Dev Jain
  2026-02-18 22:07 ` Shakeel Butt
  2 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2026-02-16 11:00 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Catalin Marinas, Dennis Zhou, Tejun Heo, Christoph Lameter,
	linux-arm-kernel, linux-kernel, linux-mm, maz

On Sun, Feb 15, 2026 at 11:39:44AM +0800, Jisheng Zhang wrote:
> It turns out the generic disable/enable irq this_cpu_cmpxchg
> implementation is faster than LL/SC or lse implementation. Remove
> HAVE_CMPXCHG_LOCAL for better performance on arm64.
>
> Tested on Quad 1.9GHZ CA55 platform:
> average mod_node_page_state() cost decreases from 167ns to 103ns
> the spawn (30 duration) benchmark in unixbench is improved
> from 147494 lps to 150561 lps, improved by 2.1%
>
> Tested on Quad 2.1GHZ CA73 platform:
> average mod_node_page_state() cost decreases from 113ns to 85ns
> the spawn (30 duration) benchmark in unixbench is improved
> from 209844 lps to 212581 lps, improved by 1.3%
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> ---
> arch/arm64/Kconfig | 1 -
> arch/arm64/include/asm/percpu.h | 24 ------------------------
> 2 files changed, 25 deletions(-)

That is _entirely_ dependent on the system, so this isn't the right
approach. I also don't think it's something we particularly want to
micro-optimise to accommodate systems that suck at atomics.

Will

[...]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
  2026-02-16 11:00 ` Will Deacon
@ 2026-02-16 15:29 ` Dev Jain
  2026-02-17 13:53 ` Catalin Marinas
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Dev Jain @ 2026-02-16 15:29 UTC (permalink / raw)
  To: Will Deacon, Jisheng Zhang
  Cc: Catalin Marinas, Dennis Zhou, Tejun Heo, Christoph Lameter,
	linux-arm-kernel, linux-kernel, linux-mm, maz

On 16/02/26 4:30 pm, Will Deacon wrote:
> On Sun, Feb 15, 2026 at 11:39:44AM +0800, Jisheng Zhang wrote:
>> It turns out the generic disable/enable irq this_cpu_cmpxchg
>> implementation is faster than LL/SC or lse implementation. Remove
>> HAVE_CMPXCHG_LOCAL for better performance on arm64.
>>
>> Tested on Quad 1.9GHZ CA55 platform:
>> average mod_node_page_state() cost decreases from 167ns to 103ns
>> the spawn (30 duration) benchmark in unixbench is improved
>> from 147494 lps to 150561 lps, improved by 2.1%
>>
>> Tested on Quad 2.1GHZ CA73 platform:
>> average mod_node_page_state() cost decreases from 113ns to 85ns
>> the spawn (30 duration) benchmark in unixbench is improved
>> from 209844 lps to 212581 lps, improved by 1.3%
>>
>> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
>> ---
>> arch/arm64/Kconfig | 1 -
>> arch/arm64/include/asm/percpu.h | 24 ------------------------
>> 2 files changed, 25 deletions(-)
> That is _entirely_ dependent on the system, so this isn't the right
> approach. I also don't think it's something we particularly want to
> micro-optimise to accomodate systems that suck at atomics.

Hi Will,

As I mention in the other email, the suspect is not the atomics, but
preempt_disable(). On Apple M3, the regression reported in [1] resolves
by removing preempt_disable/enable in _pcp_protect_return. To prove
this another way, I disabled CONFIG_ARM64_HAS_LSE_ATOMICS and the
regression worsened, indicating that at least on Apple M3 the
atomics are faster.

It may help to confirm this hypothesis on other hardware - perhaps
Jisheng can test with this change on his hardware and confirm
whether he gets the same performance improvement.

By coincidence, Yang Shi has been discussing the this_cpu_* overhead
at [2].

[1] https://lore.kernel.org/all/1052a452-9ba3-4da7-be47-7d27d27b3d1d@arm.com/
[2] https://lore.kernel.org/all/CAHbLzkpcN-T8MH6=W3jCxcFj1gVZp8fRqe231yzZT-rV_E_org@mail.gmail.com/

>
> Will
>
[...]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
  2026-02-16 15:29 ` Dev Jain
@ 2026-02-17 13:53 ` Catalin Marinas
  2026-02-17 15:00 ` Will Deacon
  0 siblings, 1 reply; 14+ messages in thread
From: Catalin Marinas @ 2026-02-17 13:53 UTC (permalink / raw)
  To: Dev Jain
  Cc: Will Deacon, Jisheng Zhang, Dennis Zhou, Tejun Heo,
	Christoph Lameter, linux-arm-kernel, linux-kernel, linux-mm, maz

On Mon, Feb 16, 2026 at 08:59:17PM +0530, Dev Jain wrote:
> On 16/02/26 4:30 pm, Will Deacon wrote:
> > On Sun, Feb 15, 2026 at 11:39:44AM +0800, Jisheng Zhang wrote:
> >> It turns out the generic disable/enable irq this_cpu_cmpxchg
> >> implementation is faster than LL/SC or lse implementation. Remove
> >> HAVE_CMPXCHG_LOCAL for better performance on arm64.
> >>
> >> Tested on Quad 1.9GHZ CA55 platform:
> >> average mod_node_page_state() cost decreases from 167ns to 103ns
> >> the spawn (30 duration) benchmark in unixbench is improved
> >> from 147494 lps to 150561 lps, improved by 2.1%
> >>
> >> Tested on Quad 2.1GHZ CA73 platform:
> >> average mod_node_page_state() cost decreases from 113ns to 85ns
> >> the spawn (30 duration) benchmark in unixbench is improved
> >> from 209844 lps to 212581 lps, improved by 1.3%
> >>
> >> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> >> ---
> >> arch/arm64/Kconfig | 1 -
> >> arch/arm64/include/asm/percpu.h | 24 ------------------------
> >> 2 files changed, 25 deletions(-)
> > That is _entirely_ dependent on the system, so this isn't the right
> > approach. I also don't think it's something we particularly want to
> > micro-optimise to accomodate systems that suck at atomics.
> 
> Hi Will,
> 
> As I mention in the other email, the suspect is not the atomics, but
> preempt_disable(). On Apple M3, the regression reported in [1] resolves
> by removing preempt_disable/enable in _pcp_protect_return. To prove
> this another way, I disabled CONFIG_ARM64_HAS_LSE_ATOMICS and the
> regression worsened, indicating that at least on Apple M3 the
> atomics are faster.

Then why don't we replace the preempt disabling with local_irq_save()
in the arm64 code and still use the LSE atomics?

IIUC (lots of macro indirection), the generic cmpxchg is not atomic, so
another CPU is allowed to mess this up if it accesses current CPU's
variable via per_cpu_ptr().

-- 
Catalin

^ permalink raw reply	[flat|nested] 14+ messages in thread
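
For comparison, the generic fallback that dropping HAVE_CMPXCHG_LOCAL
selects looks roughly like the sketch below (paraphrased from the
asm-generic per-cpu code; the exact helper names there may differ).
Interrupts are masked around a plain load/compare/store rather than a
single atomic instruction, which is the non-atomicity noted above:

/* Rough, illustrative shape of the generic this_cpu_cmpxchg() fallback */
#define this_cpu_cmpxchg_generic_sketch(pcp, oval, nval)		\
({									\
	typeof(pcp) *ptr__, ret__;					\
	unsigned long flags__;						\
	raw_local_irq_save(flags__);	/* masks IRQs on this CPU, not NMIs */ \
	ptr__ = raw_cpu_ptr(&(pcp));					\
	ret__ = *ptr__;			/* plain load ... */		\
	if (ret__ == (oval))						\
		*ptr__ = (nval);	/* ... plain store, not one atomic op */ \
	raw_local_irq_restore(flags__);					\
	ret__;								\
})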

* Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
  2026-02-17 13:53 ` Catalin Marinas
@ 2026-02-17 15:00 ` Will Deacon
  2026-02-17 16:48 ` Catalin Marinas
  0 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2026-02-17 15:00 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Dev Jain, Jisheng Zhang, Dennis Zhou, Tejun Heo, Christoph Lameter,
	linux-arm-kernel, linux-kernel, linux-mm, maz

On Tue, Feb 17, 2026 at 01:53:19PM +0000, Catalin Marinas wrote:
> On Mon, Feb 16, 2026 at 08:59:17PM +0530, Dev Jain wrote:
> > On 16/02/26 4:30 pm, Will Deacon wrote:
> > > On Sun, Feb 15, 2026 at 11:39:44AM +0800, Jisheng Zhang wrote:
> > >> It turns out the generic disable/enable irq this_cpu_cmpxchg
> > >> implementation is faster than LL/SC or lse implementation. Remove
> > >> HAVE_CMPXCHG_LOCAL for better performance on arm64.
> > >>
> > >> Tested on Quad 1.9GHZ CA55 platform:
> > >> average mod_node_page_state() cost decreases from 167ns to 103ns
> > >> the spawn (30 duration) benchmark in unixbench is improved
> > >> from 147494 lps to 150561 lps, improved by 2.1%
> > >>
> > >> Tested on Quad 2.1GHZ CA73 platform:
> > >> average mod_node_page_state() cost decreases from 113ns to 85ns
> > >> the spawn (30 duration) benchmark in unixbench is improved
> > >> from 209844 lps to 212581 lps, improved by 1.3%
> > >>
> > >> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > >> ---
> > >> arch/arm64/Kconfig | 1 -
> > >> arch/arm64/include/asm/percpu.h | 24 ------------------------
> > >> 2 files changed, 25 deletions(-)
> > > That is _entirely_ dependent on the system, so this isn't the right
> > > approach. I also don't think it's something we particularly want to
> > > micro-optimise to accomodate systems that suck at atomics.
> > 
> > Hi Will,
> > 
> > As I mention in the other email, the suspect is not the atomics, but
> > preempt_disable(). On Apple M3, the regression reported in [1] resolves
> > by removing preempt_disable/enable in _pcp_protect_return. To prove
> > this another way, I disabled CONFIG_ARM64_HAS_LSE_ATOMICS and the
> > regression worsened, indicating that at least on Apple M3 the
> > atomics are faster.
> 
> Then why don't we replace the preempt disabling with local_irq_save()
> in the arm64 code and still use the LSE atomics?

Even better, work on making preempt_disable() faster as it's used in many
other places. Of course, if people want to hack the .config, they could
also change the preemption mode...

Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
  2026-02-17 15:00 ` Will Deacon
@ 2026-02-17 16:48 ` Catalin Marinas
  2026-02-18  4:01 ` K Prateek Nayak
  0 siblings, 1 reply; 14+ messages in thread
From: Catalin Marinas @ 2026-02-17 16:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: Dev Jain, Jisheng Zhang, Dennis Zhou, Tejun Heo, Christoph Lameter,
	linux-arm-kernel, linux-kernel, linux-mm, maz

On Tue, Feb 17, 2026 at 03:00:22PM +0000, Will Deacon wrote:
> On Tue, Feb 17, 2026 at 01:53:19PM +0000, Catalin Marinas wrote:
> > On Mon, Feb 16, 2026 at 08:59:17PM +0530, Dev Jain wrote:
> > > On 16/02/26 4:30 pm, Will Deacon wrote:
> > > > On Sun, Feb 15, 2026 at 11:39:44AM +0800, Jisheng Zhang wrote:
> > > >> It turns out the generic disable/enable irq this_cpu_cmpxchg
> > > >> implementation is faster than LL/SC or lse implementation. Remove
> > > >> HAVE_CMPXCHG_LOCAL for better performance on arm64.
> > > >>
> > > >> Tested on Quad 1.9GHZ CA55 platform:
> > > >> average mod_node_page_state() cost decreases from 167ns to 103ns
> > > >> the spawn (30 duration) benchmark in unixbench is improved
> > > >> from 147494 lps to 150561 lps, improved by 2.1%
> > > >>
> > > >> Tested on Quad 2.1GHZ CA73 platform:
> > > >> average mod_node_page_state() cost decreases from 113ns to 85ns
> > > >> the spawn (30 duration) benchmark in unixbench is improved
> > > >> from 209844 lps to 212581 lps, improved by 1.3%
[...]
> > > > That is _entirely_ dependent on the system, so this isn't the right
> > > > approach. I also don't think it's something we particularly want to
> > > > micro-optimise to accomodate systems that suck at atomics.
> > > 
> > > As I mention in the other email, the suspect is not the atomics, but
> > > preempt_disable(). On Apple M3, the regression reported in [1] resolves
> > > by removing preempt_disable/enable in _pcp_protect_return. To prove
> > > this another way, I disabled CONFIG_ARM64_HAS_LSE_ATOMICS and the
> > > regression worsened, indicating that at least on Apple M3 the
> > > atomics are faster.
> > 
> > Then why don't we replace the preempt disabling with local_irq_save()
> > in the arm64 code and still use the LSE atomics?
> 
> Even better, work on making preempt_disable() faster as it's used in many
> other places.

Yes, that would be good. It's the preempt_enable_notrace() path that
ends up calling preempt_schedule_notrace() -> __schedule() pretty much
unconditionally. Not sure what would go wrong but some simple change
like this (can be done at a higher level in the preempt macros to even
avoid getting here):

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 854984967fe2..d9a5d6438303 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7119,7 +7119,7 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
 	if (likely(!preemptible()))
 		return;
 
-	do {
+	while (need_resched()) {
 		/*
 		 * Because the function tracer can trace preempt_count_sub()
 		 * and it also uses preempt_enable/disable_notrace(), if
@@ -7146,7 +7146,7 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
 
 		preempt_latency_stop(1);
 		preempt_enable_no_resched_notrace();
-	} while (need_resched());
+	}
 }
 EXPORT_SYMBOL_GPL(preempt_schedule_notrace);

Of course, changing the preemption model solves this by making the
macros no-ops but I assume people want to keep preemption on.

-- 
Catalin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
  2026-02-17 16:48 ` Catalin Marinas
@ 2026-02-18  4:01 ` K Prateek Nayak
  2026-02-18  9:29 ` Catalin Marinas
  0 siblings, 1 reply; 14+ messages in thread
From: K Prateek Nayak @ 2026-02-18 4:01 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Dev Jain, Jisheng Zhang, Dennis Zhou, Tejun Heo, Christoph Lameter,
	linux-arm-kernel, linux-kernel, linux-mm, maz

Hello Catalin,

On 2/17/2026 10:18 PM, Catalin Marinas wrote:
> Yes, that would be good. It's the preempt_enable_notrace() path that
> ends up calling preempt_schedule_notrace() -> __schedule() pretty much
> unconditionally.

What do you mean by unconditionally? We always check
__preempt_count_dec_and_test() before calling into __schedule().

On x86, we use the MSB of preempt_count to indicate a resched, and
set_preempt_need_resched() would just clear this MSB.

If the preempt_count() turns 0, we immediately go into schedule, or the
next preempt_enable() -> __preempt_count_dec_and_test() would see the
entire preempt_count being clear and will call into schedule.

The arm64 implementation seems to be doing something similar too
with a separate "ti->preempt.need_resched" bit which is part of
the "ti->preempt_count"'s union so it isn't really unconditional.

> Not sure what would go wrong but some simple change
> like this (can be done at a higher in the preempt macros to even avoid
> getting here):
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 854984967fe2..d9a5d6438303 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7119,7 +7119,7 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
>  	if (likely(!preemptible()))
>  		return;
>  
> -	do {
> +	while (need_resched()) {

Essentially you are simply checking it twice now on entry since
need_resched() state would have already been communicated by
__preempt_count_dec_and_test().

>  		/*
>  		 * Because the function tracer can trace preempt_count_sub()
>  		 * and it also uses preempt_enable/disable_notrace(), if
> @@ -7146,7 +7146,7 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
>  
>  		preempt_latency_stop(1);
>  		preempt_enable_no_resched_notrace();
> -	} while (need_resched());
> +	}
>  }
>  EXPORT_SYMBOL_GPL(preempt_schedule_notrace);

-- 
Thanks and Regards,
Prateek

^ permalink raw reply	[flat|nested] 14+ messages in thread
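
A condensed, hypothetical illustration of the counting scheme described
above, using simplified kernel-style types; field order, endianness and
the notrace variants are glossed over, so this is not the real x86 or
arm64 definition:

/* need_resched folded into the preempt count (simplified sketch) */
union preempt_sketch {
	u64	preempt_count;		/* read/written as one word on the fast path */
	struct {
		u32	count;		/* preemption-disable depth */
		u32	need_resched;	/* inverted polarity: 0 means "resched needed" */
	};
};

/* roughly the decision __preempt_count_dec_and_test() has to make */
static inline bool preempt_dec_and_test_sketch(union preempt_sketch *p)
{
	p->count--;
	/*
	 * The combined word is zero only when the disable depth hits 0
	 * *and* a reschedule is pending, so a single test decides
	 * whether preempt_enable() falls through or calls __schedule().
	 */
	return p->preempt_count == 0;
}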

* Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
  2026-02-18  4:01 ` K Prateek Nayak
@ 2026-02-18  9:29 ` Catalin Marinas
  0 siblings, 0 replies; 14+ messages in thread
From: Catalin Marinas @ 2026-02-18 9:29 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Will Deacon, Dev Jain, Jisheng Zhang, Dennis Zhou, Tejun Heo,
	Christoph Lameter, linux-arm-kernel, linux-kernel, linux-mm, maz

Hi Prateek,

On Wed, Feb 18, 2026 at 09:31:19AM +0530, K Prateek Nayak wrote:
> On 2/17/2026 10:18 PM, Catalin Marinas wrote:
> > Yes, that would be good. It's the preempt_enable_notrace() path that
> > ends up calling preempt_schedule_notrace() -> __schedule() pretty much
> > unconditionally.
> 
> What do you mean by unconditionally? We always check
> __preempt_count_dec_and_test() before calling into __schedule().
> 
> On x86, We use MSB of preempt_count to indicate a resched and
> set_preempt_need_resched() would just clear this MSB.
> 
> If the preempt_count() turns 0, we immediately go into schedule
> or the next preempt_enable() -> __preempt_count_dec_and_test()
> would see the entire preempt_count being clear and will call into
> schedule.
> 
> The arm64 implementation seems to be doing something similar too
> with a separate "ti->preempt.need_resched" bit which is part of
> the "ti->preempt_count"'s union so it isn't really unconditional.

Ah, yes, you are right. I got the polarity of need_resched in
thread_info wrong (we should have named it no_need_to_resched).

So in the common case, the overhead is caused by the additional pointer
chase and preempt_count update, on top of the cpu offset read. Not sure
we can squeeze any more cycles out of these without some large overhaul
like:

https://git.kernel.org/mark/c/84ee5f23f93d4a650e828f831da9ed29c54623c5

or Yang's per-CPU page tables. Well, there are more ideas like in-kernel
restartable sequences but they move the overhead elsewhere.

Thanks.

-- 
Catalin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
  2026-02-16 15:29 ` Dev Jain
  2026-02-17 13:53 ` Catalin Marinas
@ 2026-02-17 17:19 ` Christoph Lameter (Ampere)
  2026-02-20  6:14 ` Jisheng Zhang
  2 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter (Ampere) @ 2026-02-17 17:19 UTC (permalink / raw)
  To: Dev Jain
  Cc: Will Deacon, Jisheng Zhang, Catalin Marinas, Dennis Zhou, Tejun Heo,
	linux-arm-kernel, linux-kernel, linux-mm, maz

On Mon, 16 Feb 2026, Dev Jain wrote:

> By coincidence, Yang Shi has been discussing the this_cpu_* overhead
> at [2].

Yang Shi is on vacation but we have a patchset that removes
preempt_enable/disable from this_cpu operations on ARM64.

The performance of cmpxchg varies by platform in use and with the
kernel config. The measurements that I did 2 years ago indicated that
the cmpxchg use with Ampere processors did not cause a regression.

Note that distro kernels often do not enable PREEMPT_FULL and therefore
preempt_disable/enable overhead is not incurred in production systems.
PREEMPT_VOLUNTARY does not use preemption for this_cpu ops.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
  2026-02-16 15:29 ` Dev Jain
  2026-02-17 13:53 ` Catalin Marinas
  2026-02-17 17:19 ` Christoph Lameter (Ampere)
@ 2026-02-20  6:14 ` Jisheng Zhang
  2 siblings, 0 replies; 14+ messages in thread
From: Jisheng Zhang @ 2026-02-20 6:14 UTC (permalink / raw)
  To: Dev Jain
  Cc: Will Deacon, Catalin Marinas, Dennis Zhou, Tejun Heo,
	Christoph Lameter, linux-arm-kernel, linux-kernel, linux-mm, maz

On Mon, Feb 16, 2026 at 08:59:17PM +0530, Dev Jain wrote:
> 
> On 16/02/26 4:30 pm, Will Deacon wrote:
> > On Sun, Feb 15, 2026 at 11:39:44AM +0800, Jisheng Zhang wrote:
> >> It turns out the generic disable/enable irq this_cpu_cmpxchg
> >> implementation is faster than LL/SC or lse implementation. Remove
> >> HAVE_CMPXCHG_LOCAL for better performance on arm64.
> >>
> >> Tested on Quad 1.9GHZ CA55 platform:
> >> average mod_node_page_state() cost decreases from 167ns to 103ns
> >> the spawn (30 duration) benchmark in unixbench is improved
> >> from 147494 lps to 150561 lps, improved by 2.1%
> >>
> >> Tested on Quad 2.1GHZ CA73 platform:
> >> average mod_node_page_state() cost decreases from 113ns to 85ns
> >> the spawn (30 duration) benchmark in unixbench is improved
> >> from 209844 lps to 212581 lps, improved by 1.3%
> >>
> >> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> >> ---
> >> arch/arm64/Kconfig | 1 -
> >> arch/arm64/include/asm/percpu.h | 24 ------------------------
> >> 2 files changed, 25 deletions(-)
> > That is _entirely_ dependent on the system, so this isn't the right
> > approach. I also don't think it's something we particularly want to
> > micro-optimise to accomodate systems that suck at atomics.

Hi Will,

I read this as an implication that the cmpxchg_local version is better
than the generic disable/enable irq version on newer arm64 systems. Is
my understanding correct?

> 
> Hi Will,
> 
> As I mention in the other email, the suspect is not the atomics, but
> preempt_disable(). On Apple M3, the regression reported in [1] resolves
> by removing preempt_disable/enable in _pcp_protect_return. To prove
> this another way, I disabled CONFIG_ARM64_HAS_LSE_ATOMICS and the
> regression worsened, indicating that at least on Apple M3 the
> atomics are faster.
> 
> It may help to confirm this hypothesis on other hardware - perhaps
> Jisheng can test with this change on his hardware and confirm
> whether he gets the same performance improvement.

Hi Dev,

Thanks for the hints. I tried removing the preempt_disable/enable from
_pcp_protect_return; it helps, but the HAVE_CMPXCHG_LOCAL version is
still worse than the generic disable/enable irq version on CA55 and
CA73.

> 
> By coincidence, Yang Shi has been discussing the this_cpu_* overhead
> at [2].
> 
> [1] https://lore.kernel.org/all/1052a452-9ba3-4da7-be47-7d27d27b3d1d@arm.com/
> [2] https://lore.kernel.org/all/CAHbLzkpcN-T8MH6=W3jCxcFj1gVZp8fRqe231yzZT-rV_E_org@mail.gmail.com/
> 
> > 
> > Will
> > 
[...]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
  2026-02-15  3:39 [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL Jisheng Zhang
  2026-02-16 10:59 ` Dev Jain
  2026-02-16 11:00 ` Will Deacon
@ 2026-02-18 22:07 ` Shakeel Butt
  2026-02-20  6:20 ` Jisheng Zhang
  2 siblings, 1 reply; 14+ messages in thread
From: Shakeel Butt @ 2026-02-18 22:07 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Catalin Marinas, Will Deacon, Dennis Zhou, Tejun Heo,
	Christoph Lameter, linux-arm-kernel, linux-kernel, linux-mm

On Sun, Feb 15, 2026 at 11:39:44AM +0800, Jisheng Zhang wrote:
> It turns out the generic disable/enable irq this_cpu_cmpxchg
> implementation is faster than LL/SC or lse implementation. Remove
> HAVE_CMPXCHG_LOCAL for better performance on arm64.
>
> Tested on Quad 1.9GHZ CA55 platform:
> average mod_node_page_state() cost decreases from 167ns to 103ns
> the spawn (30 duration) benchmark in unixbench is improved
> from 147494 lps to 150561 lps, improved by 2.1%
>
> Tested on Quad 2.1GHZ CA73 platform:
> average mod_node_page_state() cost decreases from 113ns to 85ns
> the spawn (30 duration) benchmark in unixbench is improved
> from 209844 lps to 212581 lps, improved by 1.3%
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>

Please note that mod_node_page_state() can be called in NMI context,
and the generic disable/enable irq implementation is not safe against
NMIs (newer Arm architectures support NMIs).

^ permalink raw reply	[flat|nested] 14+ messages in thread
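
To make the NMI hazard concrete, a hypothetical interleaving with an
IRQ-disable based read-modify-write of a per-CPU counter (made-up
values; the actual vmstat code path is not shown):

/*
 *   task context on CPU0                    NMI handler on CPU0
 *   --------------------                    -------------------
 *   raw_local_irq_save()    <- masks IRQs, but NMIs are still taken
 *   old = *counter          (reads 5)
 *                                           *counter += 1  (counter is now 6)
 *   if (old == expected)
 *           *counter = new  (derived from 5: the NMI's update is lost)
 *   raw_local_irq_restore()
 */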

* Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
  2026-02-18 22:07 ` Shakeel Butt
@ 2026-02-20  6:20 ` Jisheng Zhang
  2026-02-20 23:27 ` Shakeel Butt
  0 siblings, 1 reply; 14+ messages in thread
From: Jisheng Zhang @ 2026-02-20 6:20 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Catalin Marinas, Will Deacon, Dennis Zhou, Tejun Heo,
	Christoph Lameter, linux-arm-kernel, linux-kernel, linux-mm

On Wed, Feb 18, 2026 at 02:07:57PM -0800, Shakeel Butt wrote:
> On Sun, Feb 15, 2026 at 11:39:44AM +0800, Jisheng Zhang wrote:
> > It turns out the generic disable/enable irq this_cpu_cmpxchg
> > implementation is faster than LL/SC or lse implementation. Remove
> > HAVE_CMPXCHG_LOCAL for better performance on arm64.
> >
> > Tested on Quad 1.9GHZ CA55 platform:
> > average mod_node_page_state() cost decreases from 167ns to 103ns
> > the spawn (30 duration) benchmark in unixbench is improved
> > from 147494 lps to 150561 lps, improved by 2.1%
> >
> > Tested on Quad 2.1GHZ CA73 platform:
> > average mod_node_page_state() cost decreases from 113ns to 85ns
> > the spawn (30 duration) benchmark in unixbench is improved
> > from 209844 lps to 212581 lps, improved by 1.3%
> >
> > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> 
> Please note that mod_node_page_state() can be called in NMI context and
> generic disable/enable irq are not safe against NMIs (newer arm arch supports
> NMI).

hmm, interesting...

fgrep HAVE_NMI arch/*/Kconfig
then
fgrep HAVE_CMPXCHG_LOCAL arch/*/Kconfig

shows that only x86, arm64, s390 and loongarch are safe, while arm,
powerpc and mips enable HAVE_NMI but are missing HAVE_CMPXCHG_LOCAL, so
they rely on the generic disable/enable irq version. So you are implying
that these three architectures are not safe for mod_node_page_state()
in NMI context.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] arm64: remove HAVE_CMPXCHG_LOCAL
  2026-02-20  6:20 ` Jisheng Zhang
@ 2026-02-20 23:27 ` Shakeel Butt
  0 siblings, 0 replies; 14+ messages in thread
From: Shakeel Butt @ 2026-02-20 23:27 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Catalin Marinas, Will Deacon, Dennis Zhou, Tejun Heo,
	Christoph Lameter, linux-arm-kernel, linux-kernel, linux-mm

On Fri, Feb 20, 2026 at 02:20:54PM +0800, Jisheng Zhang wrote:
> On Wed, Feb 18, 2026 at 02:07:57PM -0800, Shakeel Butt wrote:
> > On Sun, Feb 15, 2026 at 11:39:44AM +0800, Jisheng Zhang wrote:
> > > It turns out the generic disable/enable irq this_cpu_cmpxchg
> > > implementation is faster than LL/SC or lse implementation. Remove
> > > HAVE_CMPXCHG_LOCAL for better performance on arm64.
> > >
> > > Tested on Quad 1.9GHZ CA55 platform:
> > > average mod_node_page_state() cost decreases from 167ns to 103ns
> > > the spawn (30 duration) benchmark in unixbench is improved
> > > from 147494 lps to 150561 lps, improved by 2.1%
> > >
> > > Tested on Quad 2.1GHZ CA73 platform:
> > > average mod_node_page_state() cost decreases from 113ns to 85ns
> > > the spawn (30 duration) benchmark in unixbench is improved
> > > from 209844 lps to 212581 lps, improved by 1.3%
> > >
> > > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > 
> > Please note that mod_node_page_state() can be called in NMI context and
> > generic disable/enable irq are not safe against NMIs (newer arm arch supports
> > NMI).
> 
> hmm, interesting...
> 
> fgrep HAVE_NMI arch/*/Kconfig
> then
> fgrep HAVE_CMPXCHG_LOCAL arch/*/Kconfig
> 
> shows that only x86, arm64, s390 and loongarch are safe, while arm,
> powerpc and mips enable HAVE_NMI but missing HAVE_CMPXCHG_LOCAL, so
> they rely on the generic disable/enable irq version, so you imply
> that these three arch are not safe considering mod_node_page_state()
> in NMI context.

Yes, it seems like it. For memcg stats, we use the
ARCH_HAVE_NMI_SAFE_CMPXCHG and ARCH_HAS_NMI_SAFE_THIS_CPU_OPS config
options to correctly handle the updates from NMI context. Maybe we need
something similar for vmstat as well.

So arm, powerpc and mips do not have ARCH_HAS_NMI_SAFE_THIS_CPU_OPS, but
powerpc does have ARCH_HAVE_NMI_SAFE_CMPXCHG and arm has it for CPU_V7,
CPU_V7M & CPU_V6K models. I wonder if we need to add complexity for
these archs.

^ permalink raw reply	[flat|nested] 14+ messages in thread