From: Bo Li <libo.gcs85@bytedance.com>
To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, luto@kernel.org,
	kees@kernel.org, akpm@linux-foundation.org, david@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org, peterz@infradead.org
Cc: dietmar.eggemann@arm.com, hpa@zytor.com, acme@kernel.org,
	namhyung@kernel.org, mark.rutland@arm.com,
	alexander.shishkin@linux.intel.com, jolsa@kernel.org,
	irogers@google.com, adrian.hunter@intel.com,
	kan.liang@linux.intel.com, viro@zeniv.linux.org.uk,
	brauner@kernel.org, jack@suse.cz, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
	jannh@google.com, pfalcato@suse.de, riel@surriel.com,
	harry.yoo@oracle.com, linux-kernel@vger.kernel.org,
	linux-perf-users@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, duanxiongchun@bytedance.com,
	yinhongbo@bytedance.com, dengliang.1214@bytedance.com,
	xieyongji@bytedance.com, chaiwen.cc@bytedance.com,
	songmuchun@bytedance.com, yuanzhu@bytedance.com,
	chengguozhu@bytedance.com, sunjiadong.lff@bytedance.com,
	Bo Li <libo.gcs85@bytedance.com>
Subject: [RFC v2 16/35] RPAL: add cpu lock interface
Date: Fri, 30 May 2025 17:27:44 +0800
Message-Id: <8ff6cea94a6438a0856c86a11d56be462314b1f8.1748594841.git.libo.gcs85@bytedance.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

A lazy switch enables the kernel to switch from one task to another so
that the kernel context and the user context stay matched. For the
scheduler, both tasks involved in the switch must reside on the same
run queue (rq). Therefore, before a lazy switch can occur, the kernel
must first bind both tasks to the same CPU to make the subsequent
context switch possible.

This patch introduces the rpal_lock_cpu() interface, which binds two
tasks to the same CPU while bypassing cpumask restrictions, and
rpal_unlock_cpu() as its inverse operation, which releases the binding
and restores the task's previous cpumask.

To keep the binding consistent, the kernel must prevent other threads
from modifying the CPU affinity of a task locked by rpal_lock_cpu().
A thread that changes such a task's CPU affinity through
set_cpus_allowed_ptr() therefore has to wait until the binding
established by rpal_lock_cpu() is released before its modification can
proceed.
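For illustration, a lazy switch call site is expected to use the pair
roughly as follows. This is only a sketch: rpal_do_lazy_switch() is a
hypothetical name (the real call sites presumably arrive in later
patches of this series), and the interrupt handling merely mirrors the
irqs_disabled() checks inside the helpers:

  static void rpal_do_lazy_switch(struct task_struct *peer)
  {
          /* The helpers expect to run with interrupts disabled. */
          local_irq_disable();

          /*
           * Pin 'peer' to the CPU the caller is running on so that
           * both tasks sit on the same rq; the peer's previous
           * cpumask is saved in rpal_common_data->old_mask.
           */
          rpal_lock_cpu(peer);

          /* ... lazy context switch to 'peer' happens here ... */

          /* Release the binding and restore the saved cpumask. */
          rpal_unlock_cpu(peer);

          local_irq_enable();
  }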
Signed-off-by: Bo Li <libo.gcs85@bytedance.com>
---
 arch/x86/rpal/core.c   |  18 +++++++
 arch/x86/rpal/thread.c |  14 ++++++
 include/linux/rpal.h   |   8 +++
 kernel/sched/core.c    | 109 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 149 insertions(+)

diff --git a/arch/x86/rpal/core.c b/arch/x86/rpal/core.c
index 61f5d40b0157..c185a453c1b2 100644
--- a/arch/x86/rpal/core.c
+++ b/arch/x86/rpal/core.c
@@ -15,6 +15,24 @@ int __init rpal_init(void);
 bool rpal_inited;
 unsigned long rpal_cap;
 
+static inline void rpal_lock_cpu(struct task_struct *tsk)
+{
+	rpal_set_cpus_allowed_ptr(tsk, true);
+	if (unlikely(!irqs_disabled())) {
+		local_irq_disable();
+		rpal_err("%s: irq is enabled\n", __func__);
+	}
+}
+
+static inline void rpal_unlock_cpu(struct task_struct *tsk)
+{
+	rpal_set_cpus_allowed_ptr(tsk, false);
+	if (unlikely(!irqs_disabled())) {
+		local_irq_disable();
+		rpal_err("%s: irq is enabled\n", __func__);
+	}
+}
+
 int __init rpal_init(void)
 {
 	int ret = 0;
diff --git a/arch/x86/rpal/thread.c b/arch/x86/rpal/thread.c
index e50a4c865ff8..bc203e9c6e5e 100644
--- a/arch/x86/rpal/thread.c
+++ b/arch/x86/rpal/thread.c
@@ -47,6 +47,10 @@ int rpal_register_sender(unsigned long addr)
 	}
 
 	rpal_common_data_init(&rsd->rcd);
+	if (rpal_init_thread_pending(&rsd->rcd)) {
+		ret = -ENOMEM;
+		goto free_rsd;
+	}
 	rsd->rsp = rsp;
 	rsd->scc = (struct rpal_sender_call_context *)(addr -
 			rsp->user_start + rsp->kernel_start);
@@ -58,6 +62,8 @@ int rpal_register_sender(unsigned long addr)
 
 	return 0;
 
+free_rsd:
+	kfree(rsd);
 put_shared_page:
 	rpal_put_shared_page(rsp);
 out:
@@ -77,6 +83,7 @@ int rpal_unregister_sender(void)
 
 	rpal_put_shared_page(rsd->rsp);
 	rpal_clear_current_thread_flag(RPAL_SENDER_BIT);
+	rpal_free_thread_pending(&rsd->rcd);
 	kfree(rsd);
 	atomic_dec(&cur->thread_cnt);
 
@@ -116,6 +123,10 @@ int rpal_register_receiver(unsigned long addr)
 	}
 
 	rpal_common_data_init(&rrd->rcd);
+	if (rpal_init_thread_pending(&rrd->rcd)) {
+		ret = -ENOMEM;
+		goto free_rrd;
+	}
 	rrd->rsp = rsp;
 	rrd->rcc = (struct rpal_receiver_call_context *)(addr -
 			rsp->user_start +
@@ -128,6 +139,8 @@ int rpal_register_receiver(unsigned long addr)
 
 	return 0;
 
+free_rrd:
+	kfree(rrd);
 put_shared_page:
 	rpal_put_shared_page(rsp);
 out:
@@ -147,6 +160,7 @@ int rpal_unregister_receiver(void)
 
 	rpal_put_shared_page(rrd->rsp);
 	rpal_clear_current_thread_flag(RPAL_RECEIVER_BIT);
+	rpal_free_thread_pending(&rrd->rcd);
 	kfree(rrd);
 	atomic_dec(&cur->thread_cnt);
 
diff --git a/include/linux/rpal.h b/include/linux/rpal.h
index 4f4719bb7eae..5b115be14a55 100644
--- a/include/linux/rpal.h
+++ b/include/linux/rpal.h
@@ -99,6 +99,7 @@ extern unsigned long rpal_cap;
 enum rpal_task_flag_bits {
 	RPAL_SENDER_BIT,
 	RPAL_RECEIVER_BIT,
+	RPAL_CPU_LOCKED_BIT,
 };
 
 enum rpal_receiver_state {
@@ -270,8 +271,12 @@ struct rpal_shared_page {
 struct rpal_common_data {
 	/* back pointer to task_struct */
 	struct task_struct *bp_task;
+	/* pending struct for cpu locking */
+	void *pending;
 	/* service id of rpal_service */
 	int service_id;
+	/* cpumask before locked */
+	cpumask_t old_mask;
 };
 
 struct rpal_receiver_data {
@@ -464,4 +469,7 @@ struct mm_struct *rpal_pf_get_real_mm(unsigned long address, int *rebuild);
 extern void rpal_pick_mmap_base(struct mm_struct *mm,
 				struct rlimit *rlim_stack);
 int rpal_try_to_wake_up(struct task_struct *p);
+int rpal_init_thread_pending(struct rpal_common_data *rcd);
+void rpal_free_thread_pending(struct rpal_common_data *rcd);
+int rpal_set_cpus_allowed_ptr(struct task_struct *p, bool is_lock);
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 045e92ee2e3b..a862bf4a0161 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3155,6 +3155,104 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
 	return ret;
 }
 
+#ifdef CONFIG_RPAL
+int rpal_init_thread_pending(struct rpal_common_data *rcd)
+{
+	struct set_affinity_pending *pending;
+
+	pending = kzalloc(sizeof(*pending), GFP_KERNEL);
+	if (!pending)
+		return -ENOMEM;
+	pending->stop_pending = 0;
+	pending->arg = (struct migration_arg){
+		.task = current,
+		.pending = NULL,
+	};
+	rcd->pending = pending;
+	return 0;
+}
+
+void rpal_free_thread_pending(struct rpal_common_data *rcd)
+{
+	if (rcd->pending != NULL)
+		kfree(rcd->pending);
+}
+
+/*
+ * CPU lock is forced and all cpumasks will be ignored by RPAL temporarily.
+ */
+int rpal_set_cpus_allowed_ptr(struct task_struct *p, bool is_lock)
+{
+	const struct cpumask *cpu_valid_mask = cpu_active_mask;
+	struct set_affinity_pending *pending = p->rpal_cd->pending;
+	struct cpumask mask;
+	unsigned int dest_cpu;
+	struct rq_flags rf;
+	struct rq *rq;
+	int ret = 0;
+	struct affinity_context ac = {
+		.new_mask = &mask,
+		.flags = 0,
+	};
+
+	if (unlikely(p->flags & PF_KTHREAD))
+		rpal_err("p: %d, p->flags & PF_KTHREAD\n", p->pid);
+
+	rq = task_rq_lock(p, &rf);
+
+	if (is_lock) {
+		cpumask_copy(&p->rpal_cd->old_mask, &p->cpus_mask);
+		cpumask_clear(&mask);
+		cpumask_set_cpu(smp_processor_id(), &mask);
+		rpal_set_task_thread_flag(p, RPAL_CPU_LOCKED_BIT);
+	} else {
+		cpumask_copy(&mask, &p->rpal_cd->old_mask);
+		rpal_clear_task_thread_flag(p, RPAL_CPU_LOCKED_BIT);
+	}
+
+	update_rq_clock(rq);
+
+	if (cpumask_equal(&p->cpus_mask, ac.new_mask))
+		goto out;
+	/*
+	 * Picking a ~random cpu helps in cases where we are changing affinity
+	 * for groups of tasks (ie. cpuset), so that load balancing is not
+	 * immediately required to distribute the tasks within their new mask.
+	 */
+	dest_cpu = cpumask_any_and_distribute(cpu_valid_mask, ac.new_mask);
+	if (dest_cpu >= nr_cpu_ids) {
+		ret = -EINVAL;
+		goto out;
+	}
+	__do_set_cpus_allowed(p, &ac);
+	if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
+		preempt_disable();
+		task_rq_unlock(rq, p, &rf);
+		preempt_enable();
+	} else {
+		pending->arg.dest_cpu = dest_cpu;
+
+		if (task_on_cpu(rq, p) ||
+		    READ_ONCE(p->__state) == TASK_WAKING) {
+			preempt_disable();
+			task_rq_unlock(rq, p, &rf);
+			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
+					    &pending->arg, &pending->stop_work);
+		} else {
+			if (task_on_rq_queued(p))
+				rq = move_queued_task(rq, &rf, p, dest_cpu);
+			task_rq_unlock(rq, p, &rf);
+		}
+	}
+
+	return 0;
+
+out:
+	task_rq_unlock(rq, p, &rf);
+	return ret;
+}
+#endif
+
 /*
  * Change a given task's CPU affinity. Migrate the thread to a
  * proper CPU and schedule it away if the CPU it's executing on
@@ -3169,7 +3267,18 @@ int __set_cpus_allowed_ptr(struct task_struct *p, struct affinity_context *ctx)
 	struct rq_flags rf;
 	struct rq *rq;
 
+#ifdef CONFIG_RPAL
+retry:
+	rq = task_rq_lock(p, &rf);
+	if (rpal_test_task_thread_flag(p, RPAL_CPU_LOCKED_BIT)) {
+		update_rq_clock(rq);
+		task_rq_unlock(rq, p, &rf);
+		schedule();
+		goto retry;
+	}
+#else
 	rq = task_rq_lock(p, &rf);
+#endif
 	/*
 	 * Masking should be skipped if SCA_USER or any of the SCA_MIGRATE_*
 	 * flags are set.
-- 
2.20.1