From mboxrd@z Thu Jan 1 00:00:00 1970
From: yunhui cui <cuiyunhui@bytedance.com>
Date: Wed, 9 Jul 2025 19:42:26 +0800
Subject: Re: [External] [PATCH] RISC-V: store percpu offset in CSR_SCRATCH
To: Radim Krčmář
Cc: masahiroy@kernel.org, nathan@kernel.org, nicolas.schier@linux.dev, dennis@kernel.org,
	tj@kernel.org, cl@gentwo.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, alex@ghiti.fr, andybnac@gmail.com, bjorn@rivosinc.com,
	cyrilbur@tenstorrent.com, rostedt@goodmis.org, puranjay@kernel.org,
	ben.dooks@codethink.co.uk, zhangchunyan@iscas.ac.cn, ruanjinjie@huawei.com,
	jszhang@kernel.org, charlie@rivosinc.com, cleger@rivosinc.com, antonb@tenstorrent.com,
	ajones@ventanamicro.com, debug@rivosinc.com, haibo1.xu@intel.com,
	samuel.holland@sifive.com, linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-riscv@lists.infradead.org, linux-riscv, wangziang.ok@bytedance.com
References: <20250704084500.62688-1-cuiyunhui@bytedance.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Hi Radim,

On Tue, Jul 8, 2025 at 7:10 PM Radim Krčmář wrote:
>
> 2025-07-08T18:07:27+08:00, yunhui cui:
> > This patch cleverly differentiates whether an exception originates
> > from user mode or kernel mode. However, there's still an issue with
> > using CSR_SCRATCH: each time handle_exception() is called, the
> > following instructions must be executed:
> >
> > REG_L s0, TASK_TI_CPU(tp)
> > slli s0, s0, 3
> > la s1, __per_cpu_offset
> > add s1, s1, s0
> > REG_L s1, 0(s1)
> > csrw CSR_SCRATCH, s1
>
> We can minimize the cost at exception entry by storing the precomputed
> offset in thread_info, which bloats the struct, and also incurs update
> cost on cpu migration, but should still be a net performance gain.
>
> The minimal code at exception entry would be:
>
> REG_L s0, TASK_TI_PERCPU_OFFSET(tp)
> csrw CSR_SCRATCH, s0
>
> > Should we consider adding a dedicated CSR (e.g., CSR_SCRATCH2) to
> > store the percpu offset instead?
>
> See: https://lists.riscv.org/g/tech-privileged/topic/113437553#msg2506
>
> It would be nice to gather more data on the CSR_SCRATCH approach.
> Basically, the overhead of "REG_L s0, TASK_TI_PERCPU_OFFSET(tp)".
> (Or the longer sequence if we think it is worth it.)
>
> Can you benchmark the patch after reverting percpu.h, so we include the
> overhead of switching CSR_SCRATCH, but without any benefits provided by
> the per-cpu offset?
> The baseline would be the patch with reverted percpu.h, and reverted the
> sequence that sets the CSR_SCRATCH in handle_exception, so we roughly
> estimate the benefit of adding CSR_SCRATCH2.
>
> The CSR_SCRATCH2 does add overhead to hardware, and to domain context
> switches, and we also have to do something else for a few years anyway,
> because it's not even ratified... It's possible we might not benefit
> enough from CSR_SCRATCH2 to make a good case for it.
>
> Thanks.
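For reference, I understand the thread_info approach you describe as roughly the
sketch below. It is untested and only meant to pin down the idea: the
percpu_offset field, the TASK_TI_PERCPU_OFFSET asm-offset and the
riscv_update_percpu_offset() helper are hypothetical names, and the helper would
have to run wherever a task can be (re)bound to a CPU, e.g. on the migration path:

    /*
     * Sketch only (kernel context assumed, as in <linux/sched.h> and
     * <linux/percpu.h>): cache __per_cpu_offset[cpu] per task so that
     * exception entry only needs
     *
     *     REG_L s0, TASK_TI_PERCPU_OFFSET(tp)
     *     csrw  CSR_SCRATCH, s0
     */

    /* arch/riscv/include/asm/thread_info.h */
    struct thread_info {
            /* ... existing fields ... */
            unsigned long   percpu_offset;  /* cached __per_cpu_offset[cpu] */
    };

    /* Hypothetical helper, called whenever the task is moved to a CPU. */
    static inline void riscv_update_percpu_offset(struct task_struct *p, int cpu)
    {
            task_thread_info(p)->percpu_offset = __per_cpu_offset[cpu];
    }

One extra word per thread_info plus an update on migration does look like a
reasonable price for shrinking the entry sequence to a single load and csrw.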
Benchmark platform: Spacemit(R) X60

No changes:
6.77, 6.791, 6.792, 6.826, 6.784, 6.839, 6.776, 6.733, 6.795, 6.763
Geometric mean: 6.786839305

Reusing the current scratch:
7.085, 7.09, 7.021, 7.089, 7.068, 7.034, 7.06, 7.062, 7.065, 7.051
Geometric mean: 7.062466876

That is a degradation of approximately 4.06% (7.062466876 / 6.786839305 ≈ 1.0406;
the numbers can be reproduced with the small helper at the end of this mail).
The likely cause of the degradation is that CSR_TVEC is rewritten on every
kernel/user exception.

The following is the patch without the percpu optimization; it only measures the
overhead of splitting exception entry into kernel and user paths.

---
 arch/riscv/kernel/entry.S | 39 ++++++++++++++++++++++-----------------
 arch/riscv/kernel/head.S  |  7 +------
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 9d1a305d5508..cc2fd4cd54a0 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -19,17 +19,8 @@
 
 	.section .irqentry.text, "ax"
 
-SYM_CODE_START(handle_exception)
-	/*
-	 * If coming from userspace, preserve the user thread pointer and load
-	 * the kernel thread pointer. If we came from the kernel, the scratch
-	 * register will contain 0, and we should continue on the current TP.
-	 */
-	csrrw tp, CSR_SCRATCH, tp
-	bnez tp, .Lsave_context
-
-.Lrestore_kernel_tpsp:
-	csrr tp, CSR_SCRATCH
+SYM_CODE_START(handle_kernel_exception)
+	csrw CSR_SCRATCH, tp
 	REG_S sp, TASK_TI_KERNEL_SP(tp)
 
 #ifdef CONFIG_VMAP_STACK
@@ -40,7 +31,20 @@ SYM_CODE_START(handle_exception)
 	REG_L sp, TASK_TI_KERNEL_SP(tp)
 #endif
 
-.Lsave_context:
+	j handle_exception
+SYM_CODE_END(handle_kernel_exception)
+
+SYM_CODE_START(handle_user_exception)
+	/*
+	 * If coming from userspace, preserve the user thread pointer and load
+	 * the kernel thread pointer.
+	 */
+	csrrw tp, CSR_SCRATCH, tp
+	j handle_exception
+
+SYM_CODE_END(handle_user_exception)
+
+SYM_CODE_START_NOALIGN(handle_exception)
 	REG_S sp, TASK_TI_USER_SP(tp)
 	REG_L sp, TASK_TI_KERNEL_SP(tp)
 	addi sp, sp, -(PT_SIZE_ON_STACK)
@@ -71,11 +75,8 @@ SYM_CODE_START(handle_exception)
 	REG_S s4, PT_CAUSE(sp)
 	REG_S s5, PT_TP(sp)
 
-	/*
-	 * Set the scratch register to 0, so that if a recursive exception
-	 * occurs, the exception vector knows it came from the kernel
-	 */
-	csrw CSR_SCRATCH, x0
+	la s1, handle_kernel_exception
+	csrw CSR_TVEC, s1
 
 	/* Load the global pointer */
 	load_global_pointer
@@ -141,6 +142,10 @@ SYM_CODE_START_NOALIGN(ret_from_exception)
 	 * structures again.
 	 */
 	csrw CSR_SCRATCH, tp
+
+	la a0, handle_user_exception
+	csrw CSR_TVEC, a0
+
 1:
 #ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
 	move a0, sp
diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index a2e2f0dd3899..992acec3bc87 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -172,14 +172,9 @@ secondary_start_sbi:
 .align 2
 .Lsetup_trap_vector:
 	/* Set trap vector to exception handler */
-	la a0, handle_exception
+	la a0, handle_kernel_exception
 	csrw CSR_TVEC, a0
 
-	/*
-	 * Set sup0 scratch register to 0, indicating to exception vector that
-	 * we are presently executing in kernel.
-	 */
-	csrw CSR_SCRATCH, zero
 	ret
 
 .align 2
-- 
2.43.0

Thanks,
Yunhui
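P.S. The geometric means and the ~4.06% figure above can be reproduced from the
raw samples with a small standalone helper like this (plain userspace C, not part
of the patch; build with "cc geomean.c -lm"):

    /* Geometric means and relative degradation of the two sample sets above. */
    #include <math.h>
    #include <stdio.h>

    static double geomean(const double *v, int n)
    {
            double log_sum = 0.0;

            for (int i = 0; i < n; i++)
                    log_sum += log(v[i]);
            return exp(log_sum / n);
    }

    int main(void)
    {
            const double baseline[] = { 6.77, 6.791, 6.792, 6.826, 6.784,
                                        6.839, 6.776, 6.733, 6.795, 6.763 };
            const double scratch[]  = { 7.085, 7.09, 7.021, 7.089, 7.068,
                                        7.034, 7.06, 7.062, 7.065, 7.051 };
            double g0 = geomean(baseline, 10);
            double g1 = geomean(scratch, 10);

            /* Prints roughly 6.7868, 7.0625 and a 4.06% degradation. */
            printf("%.4f %.4f %.2f%%\n", g0, g1, (g1 / g0 - 1.0) * 100.0);
            return 0;
    }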