Date: Mon, 07 Jul 2025 14:50:39 +0200
Subject: [PATCH] RISC-V: store percpu offset in CSR_SCRATCH
From: Radim Krčmář
To: "Yunhui Cui"
Cc: "linux-riscv"
References: <20250704084500.62688-1-cuiyunhui@bytedance.com>
In-Reply-To: <20250704084500.62688-1-cuiyunhui@bytedance.com>

2025-07-04T16:45:00+08:00, Yunhui Cui:
> The following data was collected from tests conducted on the
> Spacemit(R) X60 using the fixed register method:
> [...]
> The fixed register method reduced performance by 5.29%.
> The per-CPU offset optimization improved performance by 2.52%.

What is the performance if you use the scratch register?

The patch below is completely unoptimized as I didn't want to shuffle
code around too much, but it could give a rough idea.

Thanks.

---8<---
The scratch register currently denotes the mode before exception, but we
can just use two different exception entry points to provide the same
information, which frees the scratch register for the percpu offset.

The user/kernel entry paths need a more thorough rewrite, because they
are terribly wasteful right now.
---
Applies on top of d7b8f8e20813f0179d8ef519541a3527e7661d3a (v6.16-rc5)

 arch/riscv/include/asm/percpu.h | 13 ++++++++++
 arch/riscv/kernel/entry.S       | 46 ++++++++++++++++++++-------------
 arch/riscv/kernel/head.S        |  7 +----
 arch/riscv/kernel/smpboot.c     |  7 +++++
 arch/riscv/kernel/stacktrace.c  |  4 +--
 5 files changed, 51 insertions(+), 26 deletions(-)
 create mode 100644 arch/riscv/include/asm/percpu.h

diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
new file mode 100644
index 000000000000..2c838514e3ea
--- /dev/null
+++ b/arch/riscv/include/asm/percpu.h
@@ -0,0 +1,13 @@
+#ifndef __ASM_PERCPU_H
+#define __ASM_PERCPU_H
+
+static inline void set_my_cpu_offset(unsigned long off)
+{
+	csr_write(CSR_SCRATCH, off);
+}
+
+#define __my_cpu_offset csr_read(CSR_SCRATCH)
+
+#include
+
+#endif
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 75656afa2d6b..e48c553d6779 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -91,18 +91,8 @@
 	REG_L a0, TASK_TI_A0(tp)
 .endm
 
-
-SYM_CODE_START(handle_exception)
-	/*
-	 * If coming from userspace, preserve the user thread pointer and load
-	 * the kernel thread pointer. If we came from the kernel, the scratch
-	 * register will contain 0, and we should continue on the current TP.
-	 */
-	csrrw tp, CSR_SCRATCH, tp
-	bnez tp, .Lsave_context
-
-.Lrestore_kernel_tpsp:
-	csrr tp, CSR_SCRATCH
+SYM_CODE_START(handle_kernel_exception)
+	csrw CSR_SCRATCH, tp
 
 #ifdef CONFIG_64BIT
 	/*
@@ -126,8 +116,22 @@ SYM_CODE_START(handle_exception)
 	bnez sp, handle_kernel_stack_overflow
 	REG_L sp, TASK_TI_KERNEL_SP(tp)
 #endif
+	j handle_exception
+ASM_NOKPROBE(handle_kernel_exception)
+SYM_CODE_END(handle_kernel_exception)
 
-.Lsave_context:
+SYM_CODE_START(handle_user_exception)
+	/*
+	 * If coming from userspace, preserve the user thread pointer and load
+	 * the kernel thread pointer.
+	 */
+	csrrw tp, CSR_SCRATCH, tp
+	j handle_exception
+
+SYM_CODE_END(handle_user_exception)
+ASM_NOKPROBE(handle_user_exception)
+
+SYM_CODE_START_NOALIGN(handle_exception)
 	REG_S sp, TASK_TI_USER_SP(tp)
 	REG_L sp, TASK_TI_KERNEL_SP(tp)
 	addi sp, sp, -(PT_SIZE_ON_STACK)
@@ -158,11 +162,15 @@ SYM_CODE_START(handle_exception)
 	REG_S s4, PT_CAUSE(sp)
 	REG_S s5, PT_TP(sp)
 
-	/*
-	 * Set the scratch register to 0, so that if a recursive exception
-	 * occurs, the exception vector knows it came from the kernel
-	 */
-	csrw CSR_SCRATCH, x0
+	REG_L s0, TASK_TI_CPU(tp)
+	slli s0, s0, 3
+	la s1, __per_cpu_offset
+	add s1, s1, s0
+	REG_L s1, 0(s1)
+
+	csrw CSR_SCRATCH, s1
+	la s1, handle_kernel_exception
+	csrw CSR_TVEC, s1
 
 	/* Load the global pointer */
 	load_global_pointer
@@ -236,6 +244,8 @@ SYM_CODE_START_NOALIGN(ret_from_exception)
 	 * structures again.
 	 */
 	csrw CSR_SCRATCH, tp
+	la a0, handle_user_exception
+	csrw CSR_TVEC, a0
 1:
 #ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
 	move a0, sp
diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index bdf3352acf4c..d8858334af2d 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -188,14 +188,9 @@ secondary_start_sbi:
 .align 2
 .Lsetup_trap_vector:
 	/* Set trap vector to exception handler */
-	la a0, handle_exception
+	la a0, handle_kernel_exception
 	csrw CSR_TVEC, a0
 
-	/*
-	 * Set sup0 scratch register to 0, indicating to exception vector that
-	 * we are presently executing in kernel.
-	 */
-	csrw CSR_SCRATCH, zero
 	ret
 
 SYM_CODE_END(_start)
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index 601a321e0f17..2db44b10bedb 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -41,6 +41,11 @@
 
 static DECLARE_COMPLETION(cpu_running);
 
+void __init smp_prepare_boot_cpu(void)
+{
+	set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
+}
+
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
 	int cpuid;
@@ -225,6 +230,8 @@ asmlinkage __visible void smp_callin(void)
 	mmgrab(mm);
 	current->active_mm = mm;
 
+	set_my_cpu_offset(per_cpu_offset(curr_cpuid));
+
 	store_cpu_topology(curr_cpuid);
 	notify_cpu_starting(curr_cpuid);
 
diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
index 3fe9e6edef8f..69b2f390a2d4 100644
--- a/arch/riscv/kernel/stacktrace.c
+++ b/arch/riscv/kernel/stacktrace.c
@@ -16,7 +16,7 @@
 
 #ifdef CONFIG_FRAME_POINTER
 
-extern asmlinkage void handle_exception(void);
+extern asmlinkage void handle_kernel_exception(void);
 extern unsigned long ret_from_exception_end;
 
 static inline int fp_is_valid(unsigned long fp, unsigned long sp)
@@ -72,7 +72,7 @@ void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
 			fp = frame->fp;
 			pc = ftrace_graph_ret_addr(current, &graph_idx, frame->ra,
 						   &frame->ra);
-			if (pc >= (unsigned long)handle_exception &&
+			if (pc >= (unsigned long)handle_kernel_exception &&
 			    pc < (unsigned long)&ret_from_exception_end) {
 				if (unlikely(!fn(arg, pc)))
 					break;
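
For readers who would rather follow the bookkeeping in C than in assembly,
here is a small user-space model of the scheme in the patch above. It is
only a sketch under assumptions: struct hart_model, struct task_model,
model_per_cpu_offset[], model_trap() and model_ret_to_user() are
hypothetical names that do not exist in the kernel, the two enum values
stand in for the two CSR_TVEC targets, and the step that picks up the
interrupted tp from the scratch register reflects my reading of the
unchanged part of entry.S rather than anything shown in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum trap_vector { VEC_USER_ENTRY, VEC_KERNEL_ENTRY };

struct task_model {
	unsigned int cpu;		/* plays the role of thread_info.cpu */
};

struct hart_model {
	uintptr_t scratch;		/* models CSR_SCRATCH */
	enum trap_vector tvec;		/* models CSR_TVEC */
	uintptr_t tp;			/* models the tp register */
};

/* Made-up offsets standing in for __per_cpu_offset[]. */
static const uintptr_t model_per_cpu_offset[] = { 0x1000, 0x2000 };

/*
 * Trap entry. The previous mode is implied by which vector was armed,
 * so the scratch register no longer has to encode it.
 * Returns 1 if the trap came from user mode, 0 otherwise.
 */
static int model_trap(struct hart_model *h, uintptr_t *saved_tp)
{
	int from_user = (h->tvec == VEC_USER_ENTRY);

	if (from_user) {
		/* handle_user_exception: csrrw tp, CSR_SCRATCH, tp */
		uintptr_t kernel_tp = h->scratch;

		h->scratch = h->tp;
		h->tp = kernel_tp;
	} else {
		/* handle_kernel_exception: csrw CSR_SCRATCH, tp */
		h->scratch = h->tp;
	}

	/* Common path: the interrupted tp sits in scratch either way. */
	*saved_tp = h->scratch;

	/* Reload scratch with this CPU's offset and arm the kernel entry. */
	const struct task_model *t = (const struct task_model *)h->tp;

	h->scratch = model_per_cpu_offset[t->cpu];
	h->tvec = VEC_KERNEL_ENTRY;
	return from_user;
}

/* Return to user mode: scratch carries the kernel tp again. */
static void model_ret_to_user(struct hart_model *h, uintptr_t user_tp)
{
	h->scratch = h->tp;		/* csrw CSR_SCRATCH, tp */
	h->tvec = VEC_USER_ENTRY;	/* csrw CSR_TVEC, a0 */
	h->tp = user_tp;		/* restored from the saved registers */
}
```

In this model the scratch field holds the per-CPU offset for the whole
time the hart runs kernel code, which is what lets __my_cpu_offset in the
patch be a single CSR read. The price is the extra CSR_TVEC write on each
user/kernel transition.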