From mboxrd@z Thu Jan 1 00:00:00 1970
From: yunhui cui <cuiyunhui@bytedance.com>
Date: Thu, 10 Jul 2025 19:47:27 +0800
Subject: Re: [External] [PATCH] RISC-V: store percpu offset in CSR_SCRATCH
To: Radim Krčmář
Cc: masahiroy@kernel.org, nathan@kernel.org, nicolas.schier@linux.dev,
 dennis@kernel.org, tj@kernel.org, cl@gentwo.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, alex@ghiti.fr, andybnac@gmail.com,
 bjorn@rivosinc.com, cyrilbur@tenstorrent.com, rostedt@goodmis.org,
 puranjay@kernel.org, ben.dooks@codethink.co.uk, zhangchunyan@iscas.ac.cn,
 ruanjinjie@huawei.com, jszhang@kernel.org, charlie@rivosinc.com,
 cleger@rivosinc.com, antonb@tenstorrent.com, ajones@ventanamicro.com,
 debug@rivosinc.com, haibo1.xu@intel.com, samuel.holland@sifive.com,
 linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-riscv@lists.infradead.org, linux-riscv,
 wangziang.ok@bytedance.com
References: <20250704084500.62688-1-cuiyunhui@bytedance.com>

Hi Radim,

On Thu, Jul 10, 2025 at 2:35 PM Radim Krčmář wrote:
>
> 2025-07-10T11:45:06+08:00, yunhui cui :
> > On Wed, Jul 9, 2025 at 10:20 PM Radim Krčmář wrote:
> >> Is the overhead above with this patch?  And when we then use the
> >> CSR_SCRATCH for percpu, does it degrade even further?
> >
> > We can see that the percpu optimization is around 2.5% through the
> > method of fixing registers, and we can consider that the percpu
> > optimization can bring a 2.5% gain. Is there no need to add the percpu
> > optimization logic on the basis of the scratch patch for testing?
> >
> > Reference: https://lists.riscv.org/g/tech-privileged/message/2485
>
> That is when the value is in a GPR, though, and we don't know the
> performance of a CSR_SCRATCH access.
> We can hope that it's not much worse than a GPR, but an implementation
> might choose to be very slow with CSR_SCRATCH.
>
> I have in mind another method where we can use the current CSR_SCRATCH
> without changing CSR_TVAL, but I don't really want to spend time on it
> if reading the CSR doesn't give any benefit.
>
> It would be to store the percpu offset in CSR_SCRATCH permanently, do
> the early exception register shuffling with a percpu area storage, and
> load the thread pointer from there as well.
> That method would also eliminate writing CSR_SCRATCH on every exception
> entry+exit, so maybe it makes sense to try it even if CSRs are slow...
>
> Thanks.

With the percpu offset optimization added on top of the patch, I get the
following data:

6.989
7.046
6.976
6.986
7.001
7.017
7.007
7.064
7.008
7.039

Geometric mean: 7.013248303

Compared to reusing the scratch register, performance improves by
approximately 0.7%. If the scratch register handling can be optimized
further, there should be additional performance gains.

Patch:
---
 arch/riscv/include/asm/percpu.h | 14 ++++++++++++++
 arch/riscv/kernel/asm-offsets.c |  1 +
 arch/riscv/kernel/entry.S       |  7 +++++++
 arch/riscv/kernel/smpboot.c     |  3 +++
 4 files changed, 25 insertions(+)
 create mode 100644 arch/riscv/include/asm/percpu.h

diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
new file mode 100644
index 000000000000..1fbfcb108f84
--- /dev/null
+++ b/arch/riscv/include/asm/percpu.h
@@ -0,0 +1,14 @@
+#ifndef __ASM_PERCPU_H
+#define __ASM_PERCPU_H
+
+static inline void set_my_cpu_offset(unsigned long off)
+{
+	csr_write(CSR_SCRATCH, off);
+}
+
+#define __my_cpu_offset csr_read(CSR_SCRATCH)
+
+#include
+
+#endif
+
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index a03129f40c46..0ce96f30bf32 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -35,6 +35,7 @@ void asm_offsets(void)
 	OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
 	OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
 	OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
+	OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
 	OFFSET(TASK_TI_FLAGS, task_struct, thread_info.flags);
 	OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
 	OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index cc2fd4cd54a0..82caeee91c15 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -75,6 +75,13 @@ SYM_CODE_START_NOALIGN(handle_exception)
 	REG_S s4, PT_CAUSE(sp)
 	REG_S s5, PT_TP(sp)
 
+	REG_L s0, TASK_TI_CPU(tp)
+	slli s0, s0, 3
+	la s1, __per_cpu_offset
+	add s1, s1, s0
+	REG_L s1, 0(s1)
+	csrw CSR_SCRATCH, s1
+
 	la s1, handle_kernel_exception
 	csrw CSR_TVEC, s1
 
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index fb6ab7f8bfbd..6fa12cc84523 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -43,6 +43,7 @@ static DECLARE_COMPLETION(cpu_running);
 
 void __init smp_prepare_boot_cpu(void)
 {
+	set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
 }
 
 void __init smp_prepare_cpus(unsigned int max_cpus)
@@ -240,6 +241,8 @@ asmlinkage __visible void smp_callin(void)
 	mmgrab(mm);
 	current->active_mm = mm;
 
+	set_my_cpu_offset(per_cpu_offset(curr_cpuid));
+
 	store_cpu_topology(curr_cpuid);
 	notify_cpu_starting(curr_cpuid);
 
-- 
2.43.0

Thanks,
Yunhui
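
P.S. For readers following the entry.S hunk, the sketch below is a user-space model of what that sequence computes. It is not kernel code: model_per_cpu_offset[], model_scratch[] and the model_* helpers are hypothetical stand-ins for __per_cpu_offset, CSR_SCRATCH and __my_cpu_offset. It assumes 64-bit table entries, which is what the shift by 3 in the hunk corresponds to.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count, for the model only */

/* Stand-in for the kernel's __per_cpu_offset[] table (8-byte entries). */
static uint64_t model_per_cpu_offset[MODEL_NR_CPUS];

/* Stand-in for CSR_SCRATCH: one value per hart. */
static uint64_t model_scratch[MODEL_NR_CPUS];

/*
 * Models the added entry.S sequence:
 *   REG_L s0, TASK_TI_CPU(tp)   cpu index
 *   slli  s0, s0, 3             byte offset into the table (cpu * 8)
 *   la    s1, __per_cpu_offset  table base
 *   add   s1, s1, s0            address of this CPU's entry
 *   REG_L s1, 0(s1)             load the offset
 *   csrw  CSR_SCRATCH, s1       cache it in the scratch CSR
 */
static void model_exception_entry(unsigned int cpu)
{
	uintptr_t base = (uintptr_t)model_per_cpu_offset;
	uintptr_t slot;
	uint64_t off;

	assert(cpu < MODEL_NR_CPUS);
	slot = base + ((uintptr_t)cpu << 3);
	memcpy(&off, (const void *)slot, sizeof(off));
	model_scratch[cpu] = off;
}

/* Models __my_cpu_offset: a single read of the cached value. */
static uint64_t model_my_cpu_offset(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return model_scratch[cpu];
}

/* Models a per-CPU pointer: address of the variable plus this CPU's offset. */
static uint64_t model_this_cpu_addr(uint64_t var_addr, unsigned int cpu)
{
	return var_addr + model_my_cpu_offset(cpu);
}
```

The point of the model is only that, once the offset is cached, each per-CPU access needs one read of the cached value instead of an index computation and a table load.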