From: yunhui cui <cuiyunhui@bytedance.com>
Date: Fri, 18 Jul 2025 14:40:47 +0800
Subject: Re: [External] Re: [PATCH RFC 2/2] riscv: introduce percpu.h into include/asm
To: Alexandre Ghiti
Cc: yury.norov@gmail.com, linux@rasmusvillemoes.dk, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, dennis@kernel.org, tj@kernel.org, cl@gentwo.org, linux-mm@kvack.org
References: <20250618034328.21904-1-cuiyunhui@bytedance.com> <20250618034328.21904-2-cuiyunhui@bytedance.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Hi Alex,

On Thu, Jul 17, 2025 at 9:06 PM Alexandre Ghiti wrote:
>
> On 7/17/25 15:04, Alexandre Ghiti wrote:
> > Hi Yunhui,
> >
> > On 6/18/25 05:43, Yunhui Cui wrote:
> >> Current percpu operations rely on generic implementations, where
> >> raw_local_irq_save() introduces substantial overhead. Optimization
> >> is achieved through atomic operations and preemption disabling.
> >>
> >> Signed-off-by: Yunhui Cui
> >> ---
> >> arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
> >> 1 file changed, 138 insertions(+)
> >> create mode 100644 arch/riscv/include/asm/percpu.h
> >>
> >> diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
> >> new file mode 100644
> >> index 0000000000000..423c0d01f874c
> >> --- /dev/null
> >> +++ b/arch/riscv/include/asm/percpu.h
> >> @@ -0,0 +1,138 @@
> >> +/* SPDX-License-Identifier: GPL-2.0-only */
> >> +
> >> +#ifndef __ASM_PERCPU_H
> >> +#define __ASM_PERCPU_H
> >> +
> >> +#include
> >> +
> >> +#define PERCPU_RW_OPS(sz) \
> >> +static inline unsigned long __percpu_read_##sz(void *ptr) \
> >> +{ \
> >> + return READ_ONCE(*(u##sz *)ptr); \
> >> +} \
> >> + \
> >> +static inline void __percpu_write_##sz(void *ptr, unsigned long val) \
> >> +{ \
> >> + WRITE_ONCE(*(u##sz *)ptr, (u##sz)val); \
> >> +}
> >> +
> >> +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
> >> +static inline void \
> >> +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
> >> +{ \
> >> + asm volatile ( \
> >> + "amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
> >> + : [ptr] "+A" (*(u##sz *)ptr) \
> >> + : [val] "r" ((u##sz)(val)) \
> >> + : "memory"); \
> >> +}
> >> +
> >> +#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn) \
> >> +static inline u##sz \
> >> +__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long val) \
> >> +{ \
> >> + register u##sz ret; \
> >> + \
> >> + asm volatile ( \
> >> + "amo" #amo_insn #sfx " %[ret], %[val], %[ptr]" \
> >> + : [ptr] "+A" (*(u##sz *)ptr), [ret] "=r" (ret) \
> >> + : [val] "r" ((u##sz)(val)) \
> >> + : "memory"); \
> >> + \
> >> + return ret + val; \
> >> +}
> >> +
> >> +#define PERCPU_OP(name, amo_insn) \
> >> + __PERCPU_AMO_OP_CASE(.b, name, 8, amo_insn) \
> >> + __PERCPU_AMO_OP_CASE(.h, name, 16, amo_insn) \
> >> + __PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn) \
> >> + __PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn) \
> >> +
> >> +#define PERCPU_RET_OP(name, amo_insn) \
> >> + __PERCPU_AMO_RET_OP_CASE(.b, name, 8, amo_insn) \
> >> + __PERCPU_AMO_RET_OP_CASE(.h, name, 16, amo_insn) \
> >> + __PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn) \
> >> + __PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn)
> >> +
> >> +PERCPU_RW_OPS(8)
> >> +PERCPU_RW_OPS(16)
> >> +PERCPU_RW_OPS(32)
> >> +PERCPU_RW_OPS(64)
> >> +
> >> +PERCPU_OP(add, add)
> >> +PERCPU_OP(andnot, and)
> >> +PERCPU_OP(or, or)
> >> +PERCPU_RET_OP(add, add)
> >> +
> >> +#undef PERCPU_RW_OPS
> >> +#undef __PERCPU_AMO_OP_CASE
> >> +#undef __PERCPU_AMO_RET_OP_CASE
> >> +#undef PERCPU_OP
> >> +#undef PERCPU_RET_OP
> >> +
> >> +#define _pcp_protect(op, pcp, ...) \
> >> +({ \
> >> + preempt_disable_notrace(); \
> >> + op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
> >> + preempt_enable_notrace(); \
> >> +})
> >> +
> >> +#define _pcp_protect_return(op, pcp, args...) \
> >> +({ \
> >> + typeof(pcp) __retval; \
> >> + preempt_disable_notrace(); \
> >> + __retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
> >> + preempt_enable_notrace(); \
> >> + __retval; \
> >> +})
> >> +
> >> +#define this_cpu_read_1(pcp) _pcp_protect_return(__percpu_read_8, pcp)
> >> +#define this_cpu_read_2(pcp) _pcp_protect_return(__percpu_read_16, pcp)
> >> +#define this_cpu_read_4(pcp) _pcp_protect_return(__percpu_read_32, pcp)
> >> +#define this_cpu_read_8(pcp) _pcp_protect_return(__percpu_read_64, pcp)
> >> +
> >> +#define this_cpu_write_1(pcp, val) _pcp_protect(__percpu_write_8, pcp, (unsigned long)val)
> >> +#define this_cpu_write_2(pcp, val) _pcp_protect(__percpu_write_16, pcp, (unsigned long)val)
> >> +#define this_cpu_write_4(pcp, val) _pcp_protect(__percpu_write_32, pcp, (unsigned long)val)
> >> +#define this_cpu_write_8(pcp, val) _pcp_protect(__percpu_write_64, pcp, (unsigned long)val)
> >> +
> >> +#define this_cpu_add_1(pcp, val) _pcp_protect(__percpu_add_amo_case_8, pcp, val)
> >> +#define this_cpu_add_2(pcp, val) _pcp_protect(__percpu_add_amo_case_16, pcp, val)
> >> +#define this_cpu_add_4(pcp, val) _pcp_protect(__percpu_add_amo_case_32, pcp, val)
> >> +#define this_cpu_add_8(pcp, val) _pcp_protect(__percpu_add_amo_case_64, pcp, val)
> >> +
> >> +#define this_cpu_add_return_1(pcp, val) \
> >> +_pcp_protect_return(__percpu_add_return_amo_case_8, pcp, val)
> >> +
> >> +#define this_cpu_add_return_2(pcp, val) \
> >> +_pcp_protect_return(__percpu_add_return_amo_case_16, pcp, val)
> >> +
> >> +#define this_cpu_add_return_4(pcp, val) \
> >> +_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val)
> >> +
> >> +#define this_cpu_add_return_8(pcp, val) \
> >> +_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
> >> +
> >> +#define this_cpu_and_1(pcp, val) _pcp_protect(__percpu_andnot_amo_case_8, pcp, ~val)
> >> +#define this_cpu_and_2(pcp, val) _pcp_protect(__percpu_andnot_amo_case_16, pcp, ~val)
> >> +#define this_cpu_and_4(pcp, val) _pcp_protect(__percpu_andnot_amo_case_32, pcp, ~val)
> >> +#define this_cpu_and_8(pcp, val) _pcp_protect(__percpu_andnot_amo_case_64, pcp, ~val)
> >
> >
> > Why do we define __percpu_andnot based on amoand, and use
> > __percpu_andnot with ~val here? Can't we just define __percpu_and?
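
A direct definition might look roughly like this (untested sketch reusing
the macros above, not part of the posted patch):

PERCPU_OP(and, and)

#define this_cpu_and_1(pcp, val) _pcp_protect(__percpu_and_amo_case_8, pcp, val)
#define this_cpu_and_2(pcp, val) _pcp_protect(__percpu_and_amo_case_16, pcp, val)
#define this_cpu_and_4(pcp, val) _pcp_protect(__percpu_and_amo_case_32, pcp, val)
#define this_cpu_and_8(pcp, val) _pcp_protect(__percpu_and_amo_case_64, pcp, val)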
> >
> >
> >> +
> >> +#define this_cpu_or_1(pcp, val) _pcp_protect(__percpu_or_amo_case_8, pcp, val)
> >> +#define this_cpu_or_2(pcp, val) _pcp_protect(__percpu_or_amo_case_16, pcp, val)
> >> +#define this_cpu_or_4(pcp, val) _pcp_protect(__percpu_or_amo_case_32, pcp, val)
> >> +#define this_cpu_or_8(pcp, val) _pcp_protect(__percpu_or_amo_case_64, pcp, val)
> >> +
> >> +#define this_cpu_xchg_1(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
> >> +#define this_cpu_xchg_2(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
> >> +#define this_cpu_xchg_4(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
> >> +#define this_cpu_xchg_8(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
> >> +
> >> +#define this_cpu_cmpxchg_1(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >> +#define this_cpu_cmpxchg_2(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >> +#define this_cpu_cmpxchg_4(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >> +#define this_cpu_cmpxchg_8(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >> +
> >> +#include
> >> +
> >> +#endif /* __ASM_PERCPU_H */
> >
> >
> > It all looks good to me, just one thing, can you also implement
> > this_cpu_cmpxchg64/128()?
> >
>
> One last thing sorry, can you add a cover letter too?

Okay.

>
> Thanks!
>
> Alex
>
>
> > And since this is almost a copy/paste from arm64, either mention it at
> > the top of the file or (better) merge both implementations somewhere
> > to avoid redefining existing code :) But up to you.

Actually, there's a concern here. We should account for scenarios where
ZABHA isn't supported. Given that xxx_8() and xxx_16() are rarely used in
practice, could we initially support only xxx_32() and xxx_64()? For
xxx_8() and xxx_16(), we could default to the generic implementation.

> >
> > Reviewed-by: Alexandre Ghiti
> >
> > Thanks,
> >
> > Alex
> >
> >

Thanks,
Yunhui
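
P.S. A rough, untested sketch of the fallback idea above (not part of the
posted patch), assuming the existing _pcp_protect()/__PERCPU_AMO_OP_CASE()
helpers are kept: only generate the .w/.d AMO cases and wire up the 32/64-bit
this_cpu_*() hooks, so no Zabha byte/halfword AMOs are required:

/* Only word/doubleword AMO helpers; no amo*.b / amo*.h instructions needed. */
#define PERCPU_OP(name, amo_insn) \
	__PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn) \
	__PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn)

/*
 * Define only the 32/64-bit hooks; asm-generic/percpu.h wraps each
 * this_cpu_*_1()/_2() definition in #ifndef, so the 8/16-bit variants
 * automatically fall back to the generic preempt-based implementation.
 */
#define this_cpu_add_4(pcp, val) _pcp_protect(__percpu_add_amo_case_32, pcp, val)
#define this_cpu_add_8(pcp, val) _pcp_protect(__percpu_add_amo_case_64, pcp, val)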