linux-mm.kvack.org archive mirror
* [PATCH v2 0/3] RISC-V: add percpu.h to include/asm
@ 2025-12-08  3:49 Yunhui Cui
  2025-12-08  3:49 ` [PATCH v2 1/3] riscv: remove irqflags.h inclusion in asm/bitops.h Yunhui Cui
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Yunhui Cui @ 2025-12-08  3:49 UTC (permalink / raw)
  To: aou, alex, andii, andybnac, apatel, ast, ben.dooks, bjorn, bpf,
	charlie, cl, conor.dooley, cuiyunhui, cyrilbur, daniel, debug,
	dennis, eddyz87, haoluo, john.fastabend, jolsa, kpsingh,
	linux-kernel, linux-mm, linux-riscv, linux, martin.lau, palmer,
	pjw, puranjay, pulehui, ruanjinjie, rkrcmar, samuel.holland, sdf,
	song, tglx, tj, thuth, yonghong.song, yury.norov, zong.li

v1->v2:
1. Support percpu add/and/or operations on systems without ZABHA.
2. Optimization: store the percpu offset in thread_info.

Yunhui Cui (3):
  riscv: remove irqflags.h inclusion in asm/bitops.h
  riscv: introduce percpu.h into include/asm
  riscv: store percpu offset into thread_info

 arch/riscv/include/asm/asm.h         |   6 +-
 arch/riscv/include/asm/bitops.h      |   1 -
 arch/riscv/include/asm/percpu.h      | 242 +++++++++++++++++++++++++++
 arch/riscv/include/asm/switch_to.h   |   8 +
 arch/riscv/include/asm/thread_info.h |   5 +-
 arch/riscv/kernel/asm-offsets.c      |   1 +
 arch/riscv/kernel/smpboot.c          |   7 +
 arch/riscv/net/bpf_jit_comp64.c      |   9 +-
 8 files changed, 263 insertions(+), 16 deletions(-)
 create mode 100644 arch/riscv/include/asm/percpu.h

-- 
2.39.5



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/3] riscv: remove irqflags.h inclusion in asm/bitops.h
  2025-12-08  3:49 [PATCH v2 0/3] RISC-V: add percpu.h to include/asm Yunhui Cui
@ 2025-12-08  3:49 ` Yunhui Cui
  2025-12-08  3:49 ` [PATCH v2 2/3] riscv: introduce percpu.h into include/asm Yunhui Cui
  2025-12-08  3:49 ` [PATCH v2 3/3] riscv: store percpu offset into thread_info Yunhui Cui
  2 siblings, 0 replies; 7+ messages in thread
From: Yunhui Cui @ 2025-12-08  3:49 UTC (permalink / raw)
  To: aou, alex, andii, andybnac, apatel, ast, ben.dooks, bjorn, bpf,
	charlie, cl, conor.dooley, cuiyunhui, cyrilbur, daniel, debug,
	dennis, eddyz87, haoluo, john.fastabend, jolsa, kpsingh,
	linux-kernel, linux-mm, linux-riscv, linux, martin.lau, palmer,
	pjw, puranjay, pulehui, ruanjinjie, rkrcmar, samuel.holland, sdf,
	song, tglx, tj, thuth, yonghong.song, yury.norov, zong.li

arch/riscv/include/asm/bitops.h does not functionally require
including linux/irqflags.h. Additionally, once the new
arch/riscv/include/asm/percpu.h is added, the inclusion becomes
circular:

kernel/bounds.c
-> include/linux/log2.h
-> include/linux/bitops.h
-> arch/riscv/include/asm/bitops.h
-> include/linux/irqflags.h (which now pulls in asm/percpu.h)
-> include/linux/find.h
-> "return val ? __ffs(val) : size;"
-> arch/riscv/include/asm/bitops.h

Because asm/bitops.h is still being processed when find.h is reached,
its include guard is already set and __ffs() has not been declared yet,
so find.h fails with an implicit-declaration error.

The compilation log is as follows:
CC      kernel/bounds.s
In file included from ./include/linux/bitmap.h:11,
               from ./include/linux/cpumask.h:12,
               from ./arch/riscv/include/asm/processor.h:55,
               from ./arch/riscv/include/asm/thread_info.h:42,
               from ./include/linux/thread_info.h:60,
               from ./include/asm-generic/preempt.h:5,
               from ./arch/riscv/include/generated/asm/preempt.h:1,
               from ./include/linux/preempt.h:79,
               from ./arch/riscv/include/asm/percpu.h:8,
               from ./include/linux/irqflags.h:19,
               from ./arch/riscv/include/asm/bitops.h:14,
               from ./include/linux/bitops.h:68,
               from ./include/linux/log2.h:12,
               from kernel/bounds.c:13:
./include/linux/find.h: In function 'find_next_bit':
./include/linux/find.h:66:30: error: implicit declaration of function '__ffs' [-Wimplicit-function-declaration]
   66 |                 return val ? __ffs(val) : size;
      |                              ^~~~~
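
One practical consequence worth noting (an illustration only, not part of
this series): any file that used local_irq_save()/restore() but only got
the declarations through this transitive include now has to include the
header it actually uses. A hypothetical example of such a follow-up fix:

/* hypothetical-user.c: previously compiled only because <linux/bitops.h>
 * dragged in <linux/irqflags.h> via the riscv asm/bitops.h; after this
 * patch the dependency must be spelled out.
 */
#include <linux/bitops.h>
#include <linux/irqflags.h>	/* now required explicitly */

static void example_toggle_bit(unsigned long *word, unsigned int nr)
{
	unsigned long flags;

	local_irq_save(flags);
	__change_bit(nr, word);
	local_irq_restore(flags);
}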

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
 arch/riscv/include/asm/bitops.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h
index 238092125c118..3c1a15be54d80 100644
--- a/arch/riscv/include/asm/bitops.h
+++ b/arch/riscv/include/asm/bitops.h
@@ -11,7 +11,6 @@
 #endif /* _LINUX_BITOPS_H */
 
 #include <linux/compiler.h>
-#include <linux/irqflags.h>
 #include <asm/barrier.h>
 #include <asm/bitsperlong.h>
 
-- 
2.39.5



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 2/3] riscv: introduce percpu.h into include/asm
  2025-12-08  3:49 [PATCH v2 0/3] RISC-V: add percpu.h to include/asm Yunhui Cui
  2025-12-08  3:49 ` [PATCH v2 1/3] riscv: remove irqflags.h inclusion in asm/bitops.h Yunhui Cui
@ 2025-12-08  3:49 ` Yunhui Cui
  2025-12-09  2:12   ` kernel test robot
                     ` (2 more replies)
  2025-12-08  3:49 ` [PATCH v2 3/3] riscv: store percpu offset into thread_info Yunhui Cui
  2 siblings, 3 replies; 7+ messages in thread
From: Yunhui Cui @ 2025-12-08  3:49 UTC (permalink / raw)
  To: aou, alex, andii, andybnac, apatel, ast, ben.dooks, bjorn, bpf,
	charlie, cl, conor.dooley, cuiyunhui, cyrilbur, daniel, debug,
	dennis, eddyz87, haoluo, john.fastabend, jolsa, kpsingh,
	linux-kernel, linux-mm, linux-riscv, linux, martin.lau, palmer,
	pjw, puranjay, pulehui, ruanjinjie, rkrcmar, samuel.holland, sdf,
	song, tglx, tj, thuth, yonghong.song, yury.norov, zong.li

The percpu operations currently fall back to the generic
implementations, whose raw_local_irq_save()/restore() pairs introduce
substantial overhead. Optimize them by using AMO instructions and only
disabling preemption around the access.

Since RISC-V provides no lr.b/lr.h or sc.b/sc.h, 8-bit and 16-bit
operations on systems without ZABHA fall back to lr.w/sc.w on the
containing aligned 32-bit word, which requires additional shift and
mask operations.
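
For illustration, here is a plain C model (a userspace sketch, not kernel
code; the function name is made up) of what the lr.w/sc.w fallback computes
for an 8-bit add: locate the byte inside its naturally aligned 32-bit word,
update only that byte, and leave the neighbouring bytes untouched. The patch
wraps the same shift and mask arithmetic in an lr.w/sc.w retry loop so the
update is atomic.

#include <stdint.h>
#include <stdio.h>

static void byte_add_via_word(uint8_t *ptr, uint8_t val)
{
	uint32_t *word = (uint32_t *)((uintptr_t)ptr & ~(uintptr_t)0x3);
	unsigned int shift = (unsigned int)((uintptr_t)ptr & 0x3) * 8;	/* little-endian byte lanes */
	uint32_t mask = (uint32_t)0xff << shift;
	uint32_t old = *word;						/* lr.w in the real code */
	uint8_t byte = (uint8_t)((old & mask) >> shift);		/* pick out the target byte */
	uint32_t new = (old & ~mask) | ((uint32_t)(uint8_t)(byte + val) << shift);

	*word = new;							/* sc.w + retry in the real code */
}

int main(void)
{
	uint32_t word = 0x11223344;

	byte_add_via_word((uint8_t *)&word + 1, 0xf0);	/* 0x33 + 0xf0 truncates to 0x23 */
	printf("0x%08x\n", word);			/* prints 0x11222344 on little-endian */
	return 0;
}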

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
 arch/riscv/include/asm/percpu.h | 238 ++++++++++++++++++++++++++++++++
 1 file changed, 238 insertions(+)
 create mode 100644 arch/riscv/include/asm/percpu.h

diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
new file mode 100644
index 0000000000000..b173729926126
--- /dev/null
+++ b/arch/riscv/include/asm/percpu.h
@@ -0,0 +1,238 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef __ASM_PERCPU_H
+#define __ASM_PERCPU_H
+
+#include <linux/preempt.h>
+
+#include <asm/alternative-macros.h>
+#include <asm/cpufeature-macros.h>
+#include <asm/hwcap.h>
+
+#define PERCPU_RW_OPS(sz)						\
+static inline unsigned long __percpu_read_##sz(void *ptr)		\
+{									\
+	return READ_ONCE(*(u##sz *)ptr);				\
+}									\
+									\
+static inline void __percpu_write_##sz(void *ptr, unsigned long val)	\
+{									\
+	WRITE_ONCE(*(u##sz *)ptr, (u##sz)val);				\
+}
+
+PERCPU_RW_OPS(8)
+PERCPU_RW_OPS(16)
+PERCPU_RW_OPS(32)
+PERCPU_RW_OPS(64)
+
+#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn)			\
+static inline void							\
+__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val)		\
+{									\
+	asm volatile (							\
+		"amo" #amo_insn #sfx " zero, %[val], %[ptr]"		\
+		: [ptr] "+A" (*(u##sz *)ptr)				\
+		: [val] "r" ((u##sz)(val))				\
+		: "memory");						\
+}
+
+#define PERCPU_OP(name, amo_insn)					\
+	__PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn)			\
+	__PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn)
+
+PERCPU_OP(add, add)
+PERCPU_OP(andnot, and)
+PERCPU_OP(or, or)
+
+/*
+ * Currently, only this_cpu_add_return_xxx() requires a return value,
+ * and the PERCPU_RET_OP() does not account for other operations.
+ */
+#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn)		\
+static inline u##sz							\
+__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long val)	\
+{									\
+	register u##sz ret;						\
+									\
+	asm volatile (							\
+		"amo" #amo_insn #sfx " %[ret], %[val], %[ptr]"		\
+		: [ptr] "+A" (*(u##sz *)ptr), [ret] "=r" (ret)		\
+		: [val] "r" ((u##sz)(val))				\
+		: "memory");						\
+									\
+	return ret + val;						\
+}
+
+#define PERCPU_RET_OP(name, amo_insn)					\
+	__PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn)		\
+	__PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn)
+
+PERCPU_RET_OP(add, add)
+
+#define PERCPU_8_16_GET_SHIFT(ptr)	(((unsigned long)(ptr) & 0x3) * BITS_PER_BYTE)
+#define PERCPU_8_16_GET_MASK(sz)	GENMASK((sz)-1, 0)
+#define PERCPU_8_16_GET_PTR32(ptr)	((u32 *)((unsigned long)(ptr) & ~0x3))
+
+#define PERCPU_8_16_OP(name, amo_insn, sz, sfx, val_type, new_val_expr, asm_op)			\
+static inline void __percpu_##name##_amo_case_##sz(void *ptr, unsigned long val)		\
+{												\
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&						\
+		riscv_has_extension_unlikely(RISCV_ISA_EXT_ZABHA)) {				\
+		asm volatile ("amo" #amo_insn #sfx " zero, %[val], %[ptr]"			\
+			: [ptr] "+A"(*(val_type *)ptr)						\
+			: [val] "r"((val_type)(new_val_expr))					\
+			: "memory");								\
+	} else {										\
+		u32 *ptr32 = PERCPU_8_16_GET_PTR32(ptr);					\
+		const unsigned long shift = PERCPU_8_16_GET_SHIFT(ptr);				\
+		const u32 mask = PERCPU_8_16_GET_MASK(sz) << shift;				\
+		const val_type val_trunc = (val_type)(new_val_expr);				\
+		u32 retx, rc;									\
+		val_type new_val_type;								\
+												\
+		asm volatile (									\
+			"0: lr.w %0, %2\n"							\
+			"and %3, %0, %4\n"							\
+			"srl %3, %3, %5\n"							\
+			#asm_op " %3, %3, %6\n"							\
+			"sll %3, %3, %5\n"							\
+			"and %1, %0, %7\n"							\
+			"or %1, %1, %3\n"							\
+			"sc.w %1, %1, %2\n"							\
+			"bnez %1, 0b\n"								\
+			: "=&r"(retx), "=&r"(rc), "+A"(*ptr32), "=&r"(new_val_type)		\
+			: "r"(mask), "r"(shift), "r"(val_trunc), "r"(~mask)			\
+			: "memory");								\
+		}										\
+}
+
+#define PERCPU_OP_8_16(op_name, op, expr, final_op)			\
+	PERCPU_8_16_OP(op_name, op, 8, .b, u8, expr, final_op);		\
+	PERCPU_8_16_OP(op_name, op, 16, .h, u16, expr, final_op)
+
+PERCPU_OP_8_16(add, add, val, add)
+PERCPU_OP_8_16(andnot, and, ~val, and)
+PERCPU_OP_8_16(or, or, val, or)
+
+#define PERCPU_8_16_RET_OP(name, amo_insn, sz, sfx, val_type, new_val_expr)			\
+static inline val_type __percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long val)	\
+{												\
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&						\
+		riscv_has_extension_unlikely(RISCV_ISA_EXT_ZABHA)) {				\
+		register val_type ret;								\
+		asm volatile ("amo" #amo_insn #sfx " %[ret], %[val], %[ptr]"			\
+			: [ptr] "+A"(*(val_type *)ptr), [ret] "=r"(ret)				\
+			: [val] "r"((val_type)(new_val_expr))					\
+			: "memory");								\
+		return ret + (val_type)(new_val_expr);						\
+	} else {										\
+		u32 *ptr32 = PERCPU_8_16_GET_PTR32(ptr);					\
+		const unsigned long shift = PERCPU_8_16_GET_SHIFT(ptr);				\
+		const u32 mask = (PERCPU_8_16_GET_MASK(sz) << shift);				\
+		const u32 inv_mask = ~mask;							\
+		const val_type val_trunc = (val_type)(new_val_expr);				\
+		u32 old, new, tmp;								\
+												\
+		asm volatile (									\
+			"0: lr.w %0, %3\n"							\
+			"and %1, %0, %4\n"							\
+			"srl %1, %1, %5\n"							\
+			"add %1, %1, %6\n"							\
+			"and %1, %1, %7\n"							\
+			"sll %1, %1, %5\n"							\
+			"and %2, %0, %8\n"							\
+			"or %2, %2, %1\n"							\
+			"sc.w %2, %2, %3\n"							\
+			"bnez %2, 0b\n"								\
+			: "=r"(old), "=r"(tmp), "=&r"(new), "+A"(*ptr32)			\
+			: "r"(mask), "r"(shift), "r"(val_trunc), "r"(PERCPU_8_16_GET_MASK(sz)), \
+			"r"(inv_mask)								\
+			: "memory");								\
+		return (val_type)(tmp);								\
+	}											\
+}
+
+PERCPU_8_16_RET_OP(add, add, 8, .b, u8, val)
+PERCPU_8_16_RET_OP(add, add, 16, .h, u16, val)
+
+#define _pcp_protect(op, pcp, ...)					\
+({									\
+	preempt_disable_notrace();					\
+	op(raw_cpu_ptr(&(pcp)), __VA_ARGS__);				\
+	preempt_enable_notrace();					\
+})
+
+#define _pcp_protect_return(op, pcp, args...)				\
+({									\
+	typeof(pcp) __retval;						\
+	preempt_disable_notrace();					\
+	__retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args);	\
+	preempt_enable_notrace();					\
+	__retval;							\
+})
+
+#define this_cpu_read_1(pcp)		_pcp_protect_return(__percpu_read_8, pcp)
+#define this_cpu_read_2(pcp)		_pcp_protect_return(__percpu_read_16, pcp)
+#define this_cpu_read_4(pcp)		_pcp_protect_return(__percpu_read_32, pcp)
+#define this_cpu_read_8(pcp)		_pcp_protect_return(__percpu_read_64, pcp)
+
+#define this_cpu_write_1(pcp, val)	_pcp_protect(__percpu_write_8, pcp, (unsigned long)val)
+#define this_cpu_write_2(pcp, val)	_pcp_protect(__percpu_write_16, pcp, (unsigned long)val)
+#define this_cpu_write_4(pcp, val)	_pcp_protect(__percpu_write_32, pcp, (unsigned long)val)
+#define this_cpu_write_8(pcp, val)	_pcp_protect(__percpu_write_64, pcp, (unsigned long)val)
+
+#define this_cpu_add_1(pcp, val)	_pcp_protect(__percpu_add_amo_case_8, pcp, val)
+#define this_cpu_add_2(pcp, val)	_pcp_protect(__percpu_add_amo_case_16, pcp, val)
+#define this_cpu_add_4(pcp, val)	_pcp_protect(__percpu_add_amo_case_32, pcp, val)
+#define this_cpu_add_8(pcp, val)	_pcp_protect(__percpu_add_amo_case_64, pcp, val)
+
+#define this_cpu_add_return_1(pcp, val)		\
+_pcp_protect_return(__percpu_add_return_amo_case_8, pcp, val)
+
+#define this_cpu_add_return_2(pcp, val)		\
+_pcp_protect_return(__percpu_add_return_amo_case_16, pcp, val)
+
+#define this_cpu_add_return_4(pcp, val)		\
+_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val)
+
+#define this_cpu_add_return_8(pcp, val)		\
+_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
+
+#define this_cpu_and_1(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_8, pcp, ~val)
+#define this_cpu_and_2(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_16, pcp, ~val)
+#define this_cpu_and_4(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_32, pcp, ~val)
+#define this_cpu_and_8(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_64, pcp, ~val)
+
+#define this_cpu_or_1(pcp, val)	_pcp_protect(__percpu_or_amo_case_8, pcp, val)
+#define this_cpu_or_2(pcp, val)	_pcp_protect(__percpu_or_amo_case_16, pcp, val)
+#define this_cpu_or_4(pcp, val)	_pcp_protect(__percpu_or_amo_case_32, pcp, val)
+#define this_cpu_or_8(pcp, val)	_pcp_protect(__percpu_or_amo_case_64, pcp, val)
+
+#define this_cpu_xchg_1(pcp, val)	_pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_2(pcp, val)	_pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_4(pcp, val)	_pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_8(pcp, val)	_pcp_protect_return(xchg_relaxed, pcp, val)
+
+#define this_cpu_cmpxchg_1(pcp, o, n)	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_2(pcp, o, n)	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_4(pcp, o, n)	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_8(pcp, o, n)	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+
+#define this_cpu_cmpxchg64(pcp, o, n)	this_cpu_cmpxchg_8(pcp, o, n)
+
+#define this_cpu_cmpxchg128(pcp, o, n)					\
+({									\
+	u128 old__, new__, ret__;					\
+	typeof(pcp) *ptr__;						\
+	old__ = o;							\
+	new__ = n;							\
+	preempt_disable_notrace();					\
+	ptr__ = raw_cpu_ptr(&(pcp));					\
+	ret__ = cmpxchg128_local(ptr__, old__, new__);			\
+	preempt_enable_notrace();					\
+	ret__;								\
+})
+
+#include <asm-generic/percpu.h>
+
+#endif /* __ASM_PERCPU_H */
-- 
2.39.5



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 3/3] riscv: store percpu offset into thread_info
  2025-12-08  3:49 [PATCH v2 0/3] RISC-V: add percpu.h to include/asm Yunhui Cui
  2025-12-08  3:49 ` [PATCH v2 1/3] riscv: remove irqflags.h inclusion in asm/bitops.h Yunhui Cui
  2025-12-08  3:49 ` [PATCH v2 2/3] riscv: introduce percpu.h into include/asm Yunhui Cui
@ 2025-12-08  3:49 ` Yunhui Cui
  2 siblings, 0 replies; 7+ messages in thread
From: Yunhui Cui @ 2025-12-08  3:49 UTC (permalink / raw)
  To: aou, alex, andii, andybnac, apatel, ast, ben.dooks, bjorn, bpf,
	charlie, cl, conor.dooley, cuiyunhui, cyrilbur, daniel, debug,
	dennis, eddyz87, haoluo, john.fastabend, jolsa, kpsingh,
	linux-kernel, linux-mm, linux-riscv, linux, martin.lau, palmer,
	pjw, puranjay, pulehui, ruanjinjie, rkrcmar, samuel.holland, sdf,
	song, tglx, tj, thuth, yonghong.song, yury.norov, zong.li

Originally we planned to dedicate a register to the percpu offset,
which would speed up percpu variable reads/writes and reduce the number
of access instructions. After discussion [1], the offset is now stored
in thread_info instead.

[1] https://lists.riscv.org/g/tech-privileged/topic/risc_v_tech_arch_review/113437553?page=2
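
A rough userspace model of the scheme (an illustration only; the names and
toy offsets below are made up): each task's thread_info caches the per-CPU
offset of the CPU it is currently running on, every this_cpu access adds
that cached offset with a single load, and the switch_to path refreshes the
cache for the incoming task so the value cannot go stale while it runs.

#include <stdio.h>

#define NR_CPUS 2

/* One "percpu" slot per CPU, addressed as base + cached per-CPU offset. */
static long percpu_area[NR_CPUS];
static long per_cpu_offset[NR_CPUS] = { 0, 1 };	/* array indices in this toy model */

struct thread_info { long pcpu_offset; };
struct task { struct thread_info ti; };

/* __switch_to_pcpu_offset() analogue: the incoming task inherits the
 * offset of the CPU it is being switched in on. */
static void toy_switch_to(struct task *next, int cpu)
{
	next->ti.pcpu_offset = per_cpu_offset[cpu];
}

/* this_cpu_add() analogue: one load of the cached offset, then the access. */
static void toy_this_cpu_add(struct task *curr, long val)
{
	percpu_area[curr->ti.pcpu_offset] += val;
}

int main(void)
{
	struct task t = { { 0 } };

	toy_switch_to(&t, 1);	/* task starts running on CPU 1 */
	toy_this_cpu_add(&t, 5);
	toy_switch_to(&t, 0);	/* migrated: cache refreshed on the way in */
	toy_this_cpu_add(&t, 7);
	printf("cpu0=%ld cpu1=%ld\n", percpu_area[0], percpu_area[1]);	/* cpu0=7 cpu1=5 */
	return 0;
}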

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
 arch/riscv/include/asm/asm.h         | 6 +-----
 arch/riscv/include/asm/percpu.h      | 4 ++++
 arch/riscv/include/asm/switch_to.h   | 8 ++++++++
 arch/riscv/include/asm/thread_info.h | 5 +++--
 arch/riscv/kernel/asm-offsets.c      | 1 +
 arch/riscv/kernel/smpboot.c          | 7 +++++++
 arch/riscv/net/bpf_jit_comp64.c      | 9 +--------
 7 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
index e9e8ba83e632f..137a49488325e 100644
--- a/arch/riscv/include/asm/asm.h
+++ b/arch/riscv/include/asm/asm.h
@@ -91,11 +91,7 @@
 
 #ifdef CONFIG_SMP
 .macro asm_per_cpu dst sym tmp
-	lw    \tmp, TASK_TI_CPU_NUM(tp)
-	slli  \tmp, \tmp, RISCV_LGPTR
-	la    \dst, __per_cpu_offset
-	add   \dst, \dst, \tmp
-	REG_L \tmp, 0(\dst)
+	REG_L \tmp, TASK_TI_PCPU_OFFSET(tp)
 	la    \dst, \sym
 	add   \dst, \dst, \tmp
 .endm
diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
index b173729926126..18e282dded626 100644
--- a/arch/riscv/include/asm/percpu.h
+++ b/arch/riscv/include/asm/percpu.h
@@ -7,7 +7,9 @@
 
 #include <asm/alternative-macros.h>
 #include <asm/cpufeature-macros.h>
+#include <asm/current.h>
 #include <asm/hwcap.h>
+#include <asm/thread_info.h>
 
 #define PERCPU_RW_OPS(sz)						\
 static inline unsigned long __percpu_read_##sz(void *ptr)		\
@@ -233,6 +235,8 @@ _pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
 	ret__;								\
 })
 
+#define __my_cpu_offset (((struct thread_info *)current)->pcpu_offset)
+
 #include <asm-generic/percpu.h>
 
 #endif /* __ASM_PERCPU_H */
diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 0e71eb82f920c..733b6cd306e40 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -88,6 +88,13 @@ static inline void __switch_to_envcfg(struct task_struct *next)
 			:: "r" (next->thread.envcfg) : "memory");
 }
 
+static inline void __switch_to_pcpu_offset(struct task_struct *next)
+{
+#ifdef CONFIG_SMP
+	next->thread_info.pcpu_offset = __my_cpu_offset;
+#endif
+}
+
 extern struct task_struct *__switch_to(struct task_struct *,
 				       struct task_struct *);
 
@@ -122,6 +129,7 @@ do {							\
 	if (switch_to_should_flush_icache(__next))	\
 		local_flush_icache_all();		\
 	__switch_to_envcfg(__next);			\
+	__switch_to_pcpu_offset(__next);		\
 	((last) = __switch_to(__prev, __next));		\
 } while (0)
 
diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
index 36918c9200c92..8d7d43cc9c405 100644
--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -52,7 +52,8 @@
  */
 struct thread_info {
 	unsigned long		flags;		/* low level flags */
-	int                     preempt_count;  /* 0=>preemptible, <0=>BUG */
+	int			preempt_count;	/* 0=>preemptible, <0=>BUG */
+	int			cpu;
 	/*
 	 * These stack pointers are overwritten on every system call or
 	 * exception.  SP is also saved to the stack it can be recovered when
@@ -60,8 +61,8 @@ struct thread_info {
 	 */
 	long			kernel_sp;	/* Kernel stack pointer */
 	long			user_sp;	/* User stack pointer */
-	int			cpu;
 	unsigned long		syscall_work;	/* SYSCALL_WORK_ flags */
+	unsigned long		pcpu_offset;
 #ifdef CONFIG_SHADOW_CALL_STACK
 	void			*scs_base;
 	void			*scs_sp;
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index af827448a609e..fbf53b66b0e06 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -38,6 +38,7 @@ void asm_offsets(void)
 	OFFSET(TASK_THREAD_SUM, task_struct, thread.sum);
 
 	OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
+	OFFSET(TASK_TI_PCPU_OFFSET, task_struct, thread_info.pcpu_offset);
 	OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
 	OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
 	OFFSET(TASK_TI_USER_SP, task_struct, thread_info.user_sp);
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index d85916a3660c3..9e95c068b966b 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -209,6 +209,11 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 }
 #endif
 
+void __init smp_prepare_boot_cpu(void)
+{
+	__my_cpu_offset = per_cpu_offset(smp_processor_id());
+}
+
 void __init smp_cpus_done(unsigned int max_cpus)
 {
 }
@@ -234,6 +239,8 @@ asmlinkage __visible void smp_callin(void)
 	mmgrab(mm);
 	current->active_mm = mm;
 
+	__my_cpu_offset = per_cpu_offset(smp_processor_id());
+
 #ifdef CONFIG_HOTPLUG_PARALLEL
 	cpuhp_ap_sync_alive();
 #endif
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 5f9457e910e87..4a492a6a1cc1e 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -1345,15 +1345,8 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			if (rd != rs)
 				emit_mv(rd, rs, ctx);
 #ifdef CONFIG_SMP
-			/* Load current CPU number in T1 */
-			emit_lw(RV_REG_T1, offsetof(struct thread_info, cpu),
+			emit_lw(RV_REG_T1, offsetof(struct thread_info, pcpu_offset),
 				RV_REG_TP, ctx);
-			/* Load address of __per_cpu_offset array in T2 */
-			emit_addr(RV_REG_T2, (u64)&__per_cpu_offset, extra_pass, ctx);
-			/* Get address of __per_cpu_offset[cpu] in T1 */
-			emit_sh3add(RV_REG_T1, RV_REG_T1, RV_REG_T2, ctx);
-			/* Load __per_cpu_offset[cpu] in T1 */
-			emit_ld(RV_REG_T1, 0, RV_REG_T1, ctx);
 			/* Add the offset to Rd */
 			emit_add(rd, rd, RV_REG_T1, ctx);
 #endif
-- 
2.39.5



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/3] riscv: introduce percpu.h into include/asm
  2025-12-08  3:49 ` [PATCH v2 2/3] riscv: introduce percpu.h into include/asm Yunhui Cui
@ 2025-12-09  2:12   ` kernel test robot
  2025-12-09  3:55   ` kernel test robot
  2025-12-09 18:05   ` kernel test robot
  2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-12-09  2:12 UTC (permalink / raw)
  To: Yunhui Cui, aou, alex, andii, andybnac, apatel, ast, ben.dooks,
	bjorn, bpf, charlie, cl, conor.dooley, cyrilbur, daniel, debug,
	dennis, eddyz87, haoluo, john.fastabend, jolsa, kpsingh,
	linux-kernel, linux-mm, linux-riscv, linux, martin.lau, palmer,
	pjw, puranjay
  Cc: oe-kbuild-all

Hi Yunhui,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.18 next-20251208]
[cannot apply to bpf-next/net bpf-next/master bpf/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yunhui-Cui/riscv-remove-irqflags-h-inclusion-in-asm-bitops-h/20251208-115407
base:   linus/master
patch link:    https://lore.kernel.org/r/20251208034944.73113-3-cuiyunhui%40bytedance.com
patch subject: [PATCH v2 2/3] riscv: introduce percpu.h into include/asm
config: riscv-allnoconfig (https://download.01.org/0day-ci/archive/20251209/202512090907.agDhp0Nd-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251209/202512090907.agDhp0Nd-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512090907.agDhp0Nd-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/atomic.h:80,
                    from include/linux/cpumask.h:10,
                    from include/linux/smp.h:13,
                    from include/linux/lockdep.h:14,
                    from include/linux/spinlock.h:63,
                    from include/linux/mmzone.h:8,
                    from include/linux/gfp.h:7,
                    from include/linux/mm.h:7,
                    from mm/slub.c:13:
   mm/slub.c: In function '__update_cpu_freelist_fast':
>> include/linux/atomic/atomic-arch-fallback.h:414:30: error: implicit declaration of function 'arch_cmpxchg128_local'; did you mean 'arch_cmpxchg64_local'? [-Wimplicit-function-declaration]
     414 | #define raw_cmpxchg128_local arch_cmpxchg128_local
         |                              ^~~~~~~~~~~~~~~~~~~~~
   include/linux/atomic/atomic-instrumented.h:5005:9: note: in expansion of macro 'raw_cmpxchg128_local'
    5005 |         raw_cmpxchg128_local(__ai_ptr, __VA_ARGS__); \
         |         ^~~~~~~~~~~~~~~~~~~~
   arch/riscv/include/asm/percpu.h:231:17: note: in expansion of macro 'cmpxchg128_local'
     231 |         ret__ = cmpxchg128_local(ptr__, old__, new__);                  \
         |                 ^~~~~~~~~~~~~~~~
   include/asm-generic/percpu.h:110:17: note: in expansion of macro 'this_cpu_cmpxchg128'
     110 |         __val = _cmpxchg(pcp, __old, nval);                             \
         |                 ^~~~~~~~
   include/asm-generic/percpu.h:529:9: note: in expansion of macro '__cpu_fallback_try_cmpxchg'
     529 |         __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, this_cpu_cmpxchg128)
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/slab.h:24:41: note: in expansion of macro 'this_cpu_try_cmpxchg128'
      24 | #define this_cpu_try_cmpxchg_freelist   this_cpu_try_cmpxchg128
         |                                         ^~~~~~~~~~~~~~~~~~~~~~~
   mm/slub.c:4380:16: note: in expansion of macro 'this_cpu_try_cmpxchg_freelist'
    4380 |         return this_cpu_try_cmpxchg_freelist(s->cpu_slab->freelist_tid,
         |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +414 include/linux/atomic/atomic-arch-fallback.h

9257959a6e5b4f Mark Rutland 2023-06-05  413  
9257959a6e5b4f Mark Rutland 2023-06-05 @414  #define raw_cmpxchg128_local arch_cmpxchg128_local
e6ce9d741163af Uros Bizjak  2023-04-05  415  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/3] riscv: introduce percpu.h into include/asm
  2025-12-08  3:49 ` [PATCH v2 2/3] riscv: introduce percpu.h into include/asm Yunhui Cui
  2025-12-09  2:12   ` kernel test robot
@ 2025-12-09  3:55   ` kernel test robot
  2025-12-09 18:05   ` kernel test robot
  2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-12-09  3:55 UTC (permalink / raw)
  To: Yunhui Cui, aou, alex, andii, andybnac, apatel, ast, ben.dooks,
	bjorn, bpf, charlie, cl, conor.dooley, cyrilbur, daniel, debug,
	dennis, eddyz87, haoluo, john.fastabend, jolsa, kpsingh,
	linux-kernel, linux-mm, linux-riscv, linux, martin.lau, palmer,
	pjw, puranjay
  Cc: llvm, oe-kbuild-all

Hi Yunhui,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.18 next-20251209]
[cannot apply to bpf-next/net bpf-next/master bpf/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yunhui-Cui/riscv-remove-irqflags-h-inclusion-in-asm-bitops-h/20251208-115407
base:   linus/master
patch link:    https://lore.kernel.org/r/20251208034944.73113-3-cuiyunhui%40bytedance.com
patch subject: [PATCH v2 2/3] riscv: introduce percpu.h into include/asm
config: riscv-randconfig-002-20251209 (https://download.01.org/0day-ci/archive/20251209/202512091137.3Qw1dX94-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251209/202512091137.3Qw1dX94-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512091137.3Qw1dX94-lkp@intel.com/

All errors (new ones prefixed by >>):

>> mm/slub.c:4380:9: error: call to undeclared function 'arch_cmpxchg128_local'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    4380 |         return this_cpu_try_cmpxchg_freelist(s->cpu_slab->freelist_tid,
         |                ^
   mm/slab.h:24:39: note: expanded from macro 'this_cpu_try_cmpxchg_freelist'
      24 | #define this_cpu_try_cmpxchg_freelist   this_cpu_try_cmpxchg128
         |                                         ^
   include/asm-generic/percpu.h:529:47: note: expanded from macro 'this_cpu_try_cmpxchg128'
     529 |         __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, this_cpu_cmpxchg128)
         |                                                      ^
   1 error generated.


vim +/arch_cmpxchg128_local +4380 mm/slub.c

0b303fb402862d Vlastimil Babka 2021-05-08  4371  
6801be4f2653e5 Peter Zijlstra  2023-05-31  4372  static inline bool
6801be4f2653e5 Peter Zijlstra  2023-05-31  4373  __update_cpu_freelist_fast(struct kmem_cache *s,
6801be4f2653e5 Peter Zijlstra  2023-05-31  4374  			   void *freelist_old, void *freelist_new,
6801be4f2653e5 Peter Zijlstra  2023-05-31  4375  			   unsigned long tid)
6801be4f2653e5 Peter Zijlstra  2023-05-31  4376  {
b244358e9a1cd6 Vlastimil Babka 2025-11-07  4377  	struct freelist_tid old = { .freelist = freelist_old, .tid = tid };
b244358e9a1cd6 Vlastimil Babka 2025-11-07  4378  	struct freelist_tid new = { .freelist = freelist_new, .tid = next_tid(tid) };
6801be4f2653e5 Peter Zijlstra  2023-05-31  4379  
b244358e9a1cd6 Vlastimil Babka 2025-11-07 @4380  	return this_cpu_try_cmpxchg_freelist(s->cpu_slab->freelist_tid,
b244358e9a1cd6 Vlastimil Babka 2025-11-07  4381  					     &old.freelist_tid, new.freelist_tid);
6801be4f2653e5 Peter Zijlstra  2023-05-31  4382  }
6801be4f2653e5 Peter Zijlstra  2023-05-31  4383  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/3] riscv: introduce percpu.h into include/asm
  2025-12-08  3:49 ` [PATCH v2 2/3] riscv: introduce percpu.h into include/asm Yunhui Cui
  2025-12-09  2:12   ` kernel test robot
  2025-12-09  3:55   ` kernel test robot
@ 2025-12-09 18:05   ` kernel test robot
  2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-12-09 18:05 UTC (permalink / raw)
  To: Yunhui Cui, aou, alex, andii, andybnac, apatel, ast, ben.dooks,
	bjorn, bpf, charlie, cl, conor.dooley, cyrilbur, daniel, debug,
	dennis, eddyz87, haoluo, john.fastabend, jolsa, kpsingh,
	linux-kernel, linux-mm, linux-riscv, linux, martin.lau, palmer,
	pjw, puranjay
  Cc: oe-kbuild-all

Hi Yunhui,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.18 next-20251209]
[cannot apply to bpf-next/net bpf-next/master bpf/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yunhui-Cui/riscv-remove-irqflags-h-inclusion-in-asm-bitops-h/20251208-115407
base:   linus/master
patch link:    https://lore.kernel.org/r/20251208034944.73113-3-cuiyunhui%40bytedance.com
patch subject: [PATCH v2 2/3] riscv: introduce percpu.h into include/asm
config: riscv-randconfig-r132-20251209 (https://download.01.org/0day-ci/archive/20251210/202512100134.TRTNjFGL-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251210/202512100134.TRTNjFGL-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512100134.TRTNjFGL-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
   fs/gfs2/file.c: note: in included file (through include/linux/irqflags.h, include/linux/spinlock.h, include/linux/mmzone.h, ...):
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
--
   fs/gfs2/trans.c: note: in included file (through include/linux/irqflags.h, include/linux/spinlock.h, include/linux/sched.h):
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> arch/riscv/include/asm/percpu.h:113:1: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)

vim +113 arch/riscv/include/asm/percpu.h

   108	
   109	#define PERCPU_OP_8_16(op_name, op, expr, final_op)			\
   110		PERCPU_8_16_OP(op_name, op, 8, .b, u8, expr, final_op);		\
   111		PERCPU_8_16_OP(op_name, op, 16, .h, u16, expr, final_op)
   112	
 > 113	PERCPU_OP_8_16(add, add, val, add)
   114	PERCPU_OP_8_16(andnot, and, ~val, and)
   115	PERCPU_OP_8_16(or, or, val, or)
   116	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-12-09 18:06 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-08  3:49 [PATCH v2 0/3] RISC-V: add percpu.h to include/asm Yunhui Cui
2025-12-08  3:49 ` [PATCH v2 1/3] riscv: remove irqflags.h inclusion in asm/bitops.h Yunhui Cui
2025-12-08  3:49 ` [PATCH v2 2/3] riscv: introduce percpu.h into include/asm Yunhui Cui
2025-12-09  2:12   ` kernel test robot
2025-12-09  3:55   ` kernel test robot
2025-12-09 18:05   ` kernel test robot
2025-12-08  3:49 ` [PATCH v2 3/3] riscv: store percpu offset into thread_info Yunhui Cui

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox