Date: Thu, 10 Jul 2025 18:40:02 +0200
From: Radim Krčmář
Subject: [PATCH] RISC-V: store precomputed percpu_offset in the task struct
To: "yunhui cui"
Cc: "linux-riscv"
References: <20250704084500.62688-1-cuiyunhui@bytedance.com>

2025-07-10T19:47:27+08:00, yunhui cui:
> On Thu, Jul 10, 2025 at 2:35 PM Radim
Krčmář wrote:
>> It would be to store the percpu offset in CSR_SCRATCH permanently, do
>> the early exception register shuffling with a percpu area storage, and
>> load the thread pointer from there as well.
>> That method would also eliminate writing CSR_SCRATCH on every exception
>> entry+exit, so maybe it makes sense to try it even if CSRs are slow...
>
> Based on the patch, optimizations for percpu offset have been added,
> with the following data:
> 6.989 7.046 6.976 6.986 7.001 7.017 7.007 7.064 7.008 7.039
> Geometric mean: 7.013248303
> Compared to reusing the scratch register, the performance has improved
> by approximately 0.7%.

Nice, thanks.  The CSR_SCRATCH accesses seem much slower than GPRs, and
possibly even slower than an L1 hit -- we might gain more by storing the
precomputed offset in the task struct.

Can you check this patch as well?

(It should be compared against a variant of CSR_SCRATCH that uses the
 TASK_TI_PERCPU_OFFSET optimizations, but we can try to interpolate. :])

---8<---
RISC-V: store precomputed percpu_offset in the task struct

Exploring the memoization trade-off... hoping that __set_task_cpu covers
everything. :)

I didn't put any thought into where the percpu_offset should live, and
the naive approach is to put it next to cpu.
This needs more work to not break the build on other arches, because I
directly added RISC-V specific code to __set_task_cpu, to save time
figuring out where else it could be.
---
 arch/riscv/include/asm/asm.h         | 6 +-----
 arch/riscv/include/asm/percpu.h      | 8 ++++++++
 arch/riscv/include/asm/thread_info.h | 3 ++-
 arch/riscv/kernel/asm-offsets.c      | 1 +
 arch/riscv/kernel/smpboot.c          | 6 ++++++
 kernel/sched/sched.h                 | 1 +
 6 files changed, 19 insertions(+), 6 deletions(-)
 create mode 100644 arch/riscv/include/asm/percpu.h

diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
index a8a2af6dfe9d..2a6b831d9cdf 100644
--- a/arch/riscv/include/asm/asm.h
+++ b/arch/riscv/include/asm/asm.h
@@ -91,11 +91,7 @@
 #endif
 
 .macro asm_per_cpu dst sym tmp
-	REG_L \tmp, TASK_TI_CPU_NUM(tp)
-	slli \tmp, \tmp, PER_CPU_OFFSET_SHIFT
-	la \dst, __per_cpu_offset
-	add \dst, \dst, \tmp
-	REG_L \tmp, 0(\dst)
+	REG_L \tmp, TASK_TI_PERCPU_OFFSET(tp)
 	la \dst, \sym
 	add \dst, \dst, \tmp
 .endm
diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
new file mode 100644
index 000000000000..c37a0fce6ebc
--- /dev/null
+++ b/arch/riscv/include/asm/percpu.h
@@ -0,0 +1,8 @@
+#ifndef __ASM_PERCPU_H
+#define __ASM_PERCPU_H
+
+#define __my_cpu_offset (current_thread_info()->percpu_offset)
+
+#include
+
+#endif
diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
index f5916a70879a..da776b7a1d02 100644
--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -60,8 +60,9 @@ struct thread_info {
 	 */
 	long kernel_sp; /* Kernel stack pointer */
 	long user_sp; /* User stack pointer */
-	int cpu;
+	int cpu; // TODO: could be packed better
 	unsigned long syscall_work; /* SYSCALL_WORK_ flags */
+	unsigned long percpu_offset; // XXX: randomly placed here
 #ifdef CONFIG_SHADOW_CALL_STACK
 	void *scs_base;
 	void *scs_sp;
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 6e8c0d6feae9..9c7bb4d7e3b3 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -50,6 +50,7 @@ void asm_offsets(void)
 #endif
 
 	OFFSET(TASK_TI_CPU_NUM, task_struct, thread_info.cpu);
+	OFFSET(TASK_TI_PERCPU_OFFSET, task_struct, thread_info.percpu_offset);
 	OFFSET(TASK_THREAD_F0, task_struct, thread.fstate.f[0]);
 	OFFSET(TASK_THREAD_F1, task_struct, thread.fstate.f[1]);
 	OFFSET(TASK_THREAD_F2, task_struct, thread.fstate.f[2]);
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index 601a321e0f17..3c09c8f3e30c 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -41,6 +41,11 @@
 
 static DECLARE_COMPLETION(cpu_running);
 
+void __init smp_prepare_boot_cpu(void)
+{
+	current_thread_info()->percpu_offset = per_cpu_offset(smp_processor_id());
+}
+
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
 	int cpuid;
@@ -183,6 +188,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
 	int ret = 0;
 	tidle->thread_info.cpu = cpu;
+	tidle->thread_info.percpu_offset = per_cpu_offset(cpu);
 
 	ret = start_secondary_cpu(cpu, tidle);
 	if (!ret) {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 475bb5998295..2180a85b1403 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2199,6 +2199,7 @@ static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 	 */
 	smp_wmb();
 	WRITE_ONCE(task_thread_info(p)->cpu, cpu);
+	WRITE_ONCE(task_thread_info(p)->percpu_offset, per_cpu_offset(cpu));
 	p->wake_cpu = cpu;
 #endif
 }
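
For readers following along, the memoization trade-off in the patch can be
sketched in plain user-space C.  This is only an illustration, not kernel
code: NR_CPUS_SKETCH, per_cpu_offset_table, struct thread_info_sketch and
the three helpers are hypothetical stand-ins for __per_cpu_offset[],
struct thread_info, the asm_per_cpu lookup and __set_task_cpu.  It shows
the two dependent loads of the old lookup, the single load of the cached
one, and the invariant that every writer of ->cpu must keep.

```c
#include <assert.h>

/* Hypothetical CPU count; the real kernel uses NR_CPUS. */
#define NR_CPUS_SKETCH 4

/* Stand-in for the kernel's __per_cpu_offset[] table (made-up values). */
static unsigned long per_cpu_offset_table[NR_CPUS_SKETCH] = {
	0x0000, 0x1000, 0x2000, 0x3000,
};

/* Reduced stand-in for struct thread_info with the new field. */
struct thread_info_sketch {
	int cpu;
	unsigned long percpu_offset; /* memoized per_cpu_offset_table[cpu] */
};

/* Old lookup: load ->cpu, then index the table (two dependent loads). */
static unsigned long offset_via_table(const struct thread_info_sketch *ti)
{
	return per_cpu_offset_table[ti->cpu];
}

/* New lookup: a single load of the precomputed value. */
static unsigned long offset_via_cache(const struct thread_info_sketch *ti)
{
	return ti->percpu_offset;
}

/*
 * The cost of memoization: whoever changes ->cpu must also refresh the
 * cached offset, otherwise the two lookups disagree after a migration.
 */
static void set_task_cpu_sketch(struct thread_info_sketch *ti, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	ti->cpu = cpu;
	ti->percpu_offset = per_cpu_offset_table[cpu];
}
```

As long as every update goes through set_task_cpu_sketch(), both lookups
return the same value; a direct write to ->cpu that skips the helper would
leave a stale offset behind, which is the failure mode the "hoping that
__set_task_cpu covers everything" remark is about.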