From: Bo Li <libo.gcs85@bytedance.com>
To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, luto@kernel.org,
	kees@kernel.org, akpm@linux-foundation.org, david@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org, peterz@infradead.org
Cc: dietmar.eggemann@arm.com, hpa@zytor.com, acme@kernel.org,
	namhyung@kernel.org, mark.rutland@arm.com,
	alexander.shishkin@linux.intel.com, jolsa@kernel.org,
	irogers@google.com, adrian.hunter@intel.com, kan.liang@linux.intel.com,
	viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, jannh@google.com, pfalcato@suse.de,
	riel@surriel.com, harry.yoo@oracle.com, linux-kernel@vger.kernel.org,
	linux-perf-users@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, duanxiongchun@bytedance.com,
	yinhongbo@bytedance.com, dengliang.1214@bytedance.com,
	xieyongji@bytedance.com, chaiwen.cc@bytedance.com,
	songmuchun@bytedance.com, yuanzhu@bytedance.com,
	chengguozhu@bytedance.com, sunjiadong.lff@bytedance.com,
	Bo Li <libo.gcs85@bytedance.com>
Subject: [RFC v2 19/35] RPAL: add lazy switch main logic
Date: Fri, 30 May 2025 17:27:47 +0800
Message-Id: <91e9db5ad4a3e1e58a666bd496e55d8f8db2c63c.1748594841.git.libo.gcs85@bytedance.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)

The implementation of a lazy switch differs from a regular schedule()
in three key aspects:

1. It occurs at kernel entry with IRQs disabled.
2. The next task is explicitly pre-determined rather than selected by
   the scheduler.
3. User-space context (excluding general-purpose registers) remains
   unchanged across the switch.

This patch introduces the rpal_schedule() interface to address these
requirements. First, rpal_schedule() skips the IRQ enabling normally
performed in finish_lock_switch(), preserving the IRQ-disabled state
required at kernel entry. Second, the rpal_pick_next_task() interface
explicitly specifies the target task, bypassing the scheduler's default
decision-making. Third, non-general-purpose registers (e.g., the FPU
and vector units) are not restored during the switch, leaving the
user-space context intact. General-purpose registers are saved by RPAL
before rpal_schedule() is invoked; their handling is covered in a
subsequent patch.
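To make the expected calling convention concrete, the following is a
minimal, hypothetical sketch of a lazy-switch entry point. It is not
part of this patch (the real call site is added later in the series),
and rpal_lookup_target() and rpal_save_gprs() are placeholder names
invented for illustration:

/* Hypothetical caller of rpal_schedule(), for illustration only. */
static void rpal_lazy_switch_entry(void)
{
        struct task_struct *target;

        /* (1) reached from kernel entry code with irqs disabled */
        lockdep_assert_irqs_disabled();

        /* (2) the target task is pre-determined, not scheduler-picked */
        target = rpal_lookup_target();          /* placeholder */

        /* general-purpose registers are saved by RPAL beforehand */
        rpal_save_gprs();                       /* placeholder */

        /*
         * (3) rpal_schedule() leaves FPU/vector state untouched and
         * returns with irqs still disabled.
         */
        rpal_schedule(target);
}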
Signed-off-by: Bo Li <libo.gcs85@bytedance.com>
---
 arch/x86/kernel/process_64.c |  75 +++++++++++++++++++++
 include/linux/rpal.h         |   3 +
 kernel/sched/core.c          | 126 +++++++++++++++++++++++++++++++++++
 3 files changed, 204 insertions(+)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 4830e9215de7..efc3f238c486 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -753,6 +753,81 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	return prev_p;
 }
 
+#ifdef CONFIG_RPAL
+__no_kmsan_checks
+__visible __notrace_funcgraph struct task_struct *
+__rpal_switch_to(struct task_struct *prev_p, struct task_struct *next_p)
+{
+	struct thread_struct *prev = &prev_p->thread;
+	struct thread_struct *next = &next_p->thread;
+	int cpu = smp_processor_id();
+
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) &&
+		     this_cpu_read(hardirq_stack_inuse));
+
+	/* no need to switch fpu */
+	/* __fpu_invalidate_fpregs_state() */
+	x86_task_fpu(prev_p)->last_cpu = -1;
+	/* fpregs_activate() */
+	__this_cpu_write(fpu_fpregs_owner_ctx, x86_task_fpu(next_p));
+	trace_x86_fpu_regs_activated(x86_task_fpu(next_p));
+	x86_task_fpu(next_p)->last_cpu = cpu;
+	set_tsk_thread_flag(prev_p, TIF_NEED_FPU_LOAD);
+	clear_tsk_thread_flag(next_p, TIF_NEED_FPU_LOAD);
+
+	/* no need to save fs */
+	savesegment(gs, prev_p->thread.gsindex);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE))
+		prev_p->thread.gsbase = __rdgsbase_inactive();
+	else
+		save_base_legacy(prev_p, prev_p->thread.gsindex, GS);
+
+	load_TLS(next, cpu);
+
+	arch_end_context_switch(next_p);
+
+	savesegment(es, prev->es);
+	if (unlikely(next->es | prev->es))
+		loadsegment(es, next->es);
+
+	savesegment(ds, prev->ds);
+	if (unlikely(next->ds | prev->ds))
+		loadsegment(ds, next->ds);
+
+	/* no need to load fs */
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		if (unlikely(prev->gsindex || next->gsindex))
+			loadseg(GS, next->gsindex);
+
+		__wrgsbase_inactive(next->gsbase);
+	} else {
+		load_seg_legacy(prev->gsindex, prev->gsbase, next->gsindex,
+				next->gsbase, GS);
+	}
+
+	/* skip pkru load as we will use pkru in RPAL */
+
+	this_cpu_write(current_task, next_p);
+	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
+
+	/* no need to load fpu */
+
+	update_task_stack(next_p);
+	switch_to_extra(prev_p, next_p);
+
+	if (static_cpu_has_bug(X86_BUG_SYSRET_SS_ATTRS)) {
+		unsigned short ss_sel;
+
+		savesegment(ss, ss_sel);
+		if (ss_sel != __KERNEL_DS)
+			loadsegment(ss, __KERNEL_DS);
+	}
+	resctrl_sched_in(next_p);
+
+	return prev_p;
+}
+#endif
+
 void set_personality_64bit(void)
 {
 	/* inherit personality from parent */
diff --git a/include/linux/rpal.h b/include/linux/rpal.h
index 45137770fac6..0813db4552c0 100644
--- a/include/linux/rpal.h
+++ b/include/linux/rpal.h
@@ -487,4 +487,7 @@ int rpal_try_to_wake_up(struct task_struct *p);
 int rpal_init_thread_pending(struct rpal_common_data *rcd);
 void rpal_free_thread_pending(struct rpal_common_data *rcd);
 int rpal_set_cpus_allowed_ptr(struct task_struct *p, bool is_lock);
+void rpal_schedule(struct task_struct *next);
+asmlinkage struct task_struct *
+__rpal_switch_to(struct task_struct *prev_p, struct task_struct *next_p);
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2e76376c5172..760d88458b39 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6827,6 +6827,12 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
 	if (unlikely(is_special_task_state(task_state)))
 		flags |= DEQUEUE_SPECIAL;
 
+#ifdef CONFIG_RPAL
+	/* DELAY_DEQUEUE will cause CPU stalls after lazy switch, skip it */
+	if (rpal_test_current_thread_flag(RPAL_RECEIVER_BIT))
+		flags |= DEQUEUE_SPECIAL;
+#endif
+
 	/*
 	 * __schedule()		ttwu()
 	 *   prev_state = prev->state;	  if (p->on_rq && ...)
@@ -11005,6 +11011,62 @@ void sched_enq_and_set_task(struct sched_enq_and_set_ctx *ctx)
 #endif /* CONFIG_SCHED_CLASS_EXT */
 
 #ifdef CONFIG_RPAL
+static struct rq *rpal_finish_task_switch(struct task_struct *prev)
+	__releases(rq->lock)
+{
+	struct rq *rq = this_rq();
+	struct mm_struct *mm = rq->prev_mm;
+
+	if (WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,
+		      "corrupted preempt_count: %s/%d/0x%x\n",
+		      current->comm, current->pid, preempt_count()))
+		preempt_count_set(FORK_PREEMPT_COUNT);
+
+	rq->prev_mm = NULL;
+	vtime_task_switch(prev);
+	perf_event_task_sched_in(prev, current);
+	finish_task(prev);
+	tick_nohz_task_switch();
+
+	/* finish_lock_switch, not enable irq */
+	spin_acquire(&__rq_lockp(rq)->dep_map, 0, 0, _THIS_IP_);
+	__balance_callbacks(rq);
+	raw_spin_rq_unlock(rq);
+
+	finish_arch_post_lock_switch();
+	kcov_finish_switch(current);
+	kmap_local_sched_in();
+
+	fire_sched_in_preempt_notifiers(current);
+	if (mm) {
+		membarrier_mm_sync_core_before_usermode(mm);
+		mmdrop(mm);
+	}
+
+	return rq;
+}
+
+static __always_inline struct rq *rpal_context_switch(struct rq *rq,
+						      struct task_struct *prev,
+						      struct task_struct *next,
+						      struct rq_flags *rf)
+{
+	/* irq is off */
+	prepare_task_switch(rq, prev, next);
+	arch_start_context_switch(prev);
+
+	membarrier_switch_mm(rq, prev->active_mm, next->mm);
+	switch_mm_irqs_off(prev->active_mm, next->mm, next);
+	lru_gen_use_mm(next->mm);
+
+	switch_mm_cid(rq, prev, next);
+
+	prepare_lock_switch(rq, next, rf);
+	__rpal_switch_to(prev, next);
+	barrier();
+	return rpal_finish_task_switch(prev);
+}
+
 #ifdef CONFIG_SCHED_CORE
 static inline struct task_struct *
 __rpal_pick_next_task(struct rq *rq, struct task_struct *prev,
@@ -11214,4 +11276,68 @@ rpal_pick_next_task(struct rq *rq, struct task_struct *prev,
 	BUG();
 }
 #endif
+
+/* enter and exit with irqs disabled */
+void __sched notrace rpal_schedule(struct task_struct *next)
+{
+	struct task_struct *prev, *picked;
+	bool preempt = false;
+	unsigned long *switch_count;
+	unsigned long prev_state;
+	struct rq_flags rf;
+	struct rq *rq;
+	int cpu;
+
+	/* sched_mode = SM_NONE */
+
+	preempt_disable();
+
+	trace_sched_entry_tp(preempt, CALLER_ADDR0);
+
+	cpu = smp_processor_id();
+	rq = cpu_rq(cpu);
+	prev = rq->curr;
+
+	schedule_debug(prev, preempt);
+
+	if (sched_feat(HRTICK) || sched_feat(HRTICK_DL))
+		hrtick_clear(rq);
+
+	rcu_note_context_switch(preempt);
+	rq_lock(rq, &rf);
+	smp_mb__after_spinlock();
+
+	rq->clock_update_flags <<= 1;
+	update_rq_clock(rq);
+	rq->clock_update_flags = RQCF_UPDATED;
+
+	switch_count = &prev->nivcsw;
+
+	prev_state = READ_ONCE(prev->__state);
+	if (prev_state) {
+		try_to_block_task(rq, prev, &prev_state);
+		switch_count = &prev->nvcsw;
+	}
+
+	picked = rpal_pick_next_task(rq, prev, next, &rf);
+	rq_set_donor(rq, next);
+	if (unlikely(next != picked))
+		panic("rpal error: next != picked\n");
+
+	clear_tsk_need_resched(prev);
+	clear_preempt_need_resched();
+	rq->last_seen_need_resched_ns = 0;
+
+	rq->nr_switches++;
+	RCU_INIT_POINTER(rq->curr, next);
+	++*switch_count;
+	migrate_disable_switch(rq, prev);
+	psi_account_irqtime(rq, prev, next);
+	psi_sched_switch(prev, next, !task_on_rq_queued(prev) ||
+				     prev->se.sched_delayed);
+	trace_sched_switch(preempt, prev, next, prev_state);
+	rq = rpal_context_switch(rq, prev, next, &rf);
+
+	trace_sched_exit_tp(true, CALLER_ADDR0);
+	preempt_enable_no_resched();
+}
 #endif
-- 
2.20.1