From: Z qiang <qiang.zhang1211@gmail.com>
Date: Tue, 21 Nov 2023 14:13:53 +0800
Subject: Re: [RFC PATCH 48/86] rcu: handle quiescent states for PREEMPT_RCU=n
To: paulmck@kernel.org
Cc: Ankur Arora <ankur.a.arora@oracle.com>, linux-kernel@vger.kernel.org,
	tglx@linutronix.de, peterz@infradead.org, torvalds@linux-foundation.org,
	linux-mm@kvack.org, x86@kernel.org, akpm@linux-foundation.org,
	luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, mingo@redhat.com, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, willy@infradead.org, mgorman@suse.de,
	jon.grimm@amd.com, bharata@amd.com, raghavendra.kt@amd.com,
	boris.ostrovsky@oracle.com, konrad.wilk@oracle.com, jgross@suse.com,
	andrew.cooper3@citrix.com, mingo@kernel.org, bristot@kernel.org,
	mathieu.desnoyers@efficios.com, geert@linux-m68k.org,
	glaubitz@physik.fu-berlin.de, anton.ivanov@cambridgegreys.com,
	mattst88@gmail.com, krypton@ulrich-teichert.org, rostedt@goodmis.org,
	David.Laight@aculab.com, richard@nod.at, mjguzik@gmail.com
In-Reply-To: <31d50051-e42c-4ef2-a1ac-e45370c3752e@paulmck-laptop>
References: <20231107215742.363031-1-ankur.a.arora@oracle.com>
	<20231107215742.363031-49-ankur.a.arora@oracle.com>
	<2027da00-273d-41cf-b9e7-460776181083@paulmck-laptop>
	<87lear4wj6.fsf@oracle.com>
	<46a4c47a-ba1c-4776-a6f8-6c2146cbdd0d@paulmck-laptop>
	<31d50051-e42c-4ef2-a1ac-e45370c3752e@paulmck-laptop>

>
> On Mon, Nov 20, 2023 at 09:17:57PM -0800, Paul E. McKenney wrote:
> > On Mon, Nov 20, 2023 at 07:26:05PM -0800, Ankur Arora wrote:
> > >
> > > Paul E. McKenney <paulmck@kernel.org> writes:
> > > > On Tue, Nov 07, 2023 at 01:57:34PM -0800, Ankur Arora wrote:
> > > >> cond_resched() is used to provide urgent quiescent states for
> > > >> read-side critical sections on PREEMPT_RCU=n configurations.
> > > >> This was necessary because, lacking a preempt count, there was no
> > > >> way for the tick handler to know whether we were executing in an
> > > >> RCU read-side critical section.
> > > >>
> > > >> An always-on CONFIG_PREEMPT_COUNT, however, allows the tick to
> > > >> reliably report quiescent states.
> > > >>
> > > >> Accordingly, evaluate preempt_count()-based quiescence in
> > > >> rcu_flavor_sched_clock_irq().
> > > >>
> > > >> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> > > >> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> > > >> ---
> > > >>  kernel/rcu/tree_plugin.h |  3 ++-
> > > >>  kernel/sched/core.c      | 15 +--------------
> > > >>  2 files changed, 3 insertions(+), 15 deletions(-)
> > > >>
> > > >> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > >> index f87191e008ff..618f055f8028 100644
> > > >> --- a/kernel/rcu/tree_plugin.h
> > > >> +++ b/kernel/rcu/tree_plugin.h
> > > >> @@ -963,7 +963,8 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
> > > >>   */
> > > >>  static void rcu_flavor_sched_clock_irq(int user)
> > > >>  {
> > > >> -	if (user || rcu_is_cpu_rrupt_from_idle()) {
> > > >> +	if (user || rcu_is_cpu_rrupt_from_idle() ||
> > > >> +	    !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
> > > >
> > > > This looks good.
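
As an aside, the added condition can be read as a standalone predicate:
with CONFIG_PREEMPT_COUNT=y, rcu_read_lock() in PREEMPT_RCU=n kernels
maps to preempt_disable(), so any reader leaves a mark in the task or
softirq bits of preempt_count().  A minimal sketch of that reading (the
helper name is hypothetical, not part of the patch):

static inline bool tick_sees_preempt_count_qs(void)
{
	/*
	 * Quiescent iff no preempt_disable() nesting and not in softirq.
	 * The hardirq/NMI bits are deliberately ignored: this is called
	 * from the tick interrupt itself.
	 */
	return !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
}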
> > > >
> > > >>
> > > >>  	/*
> > > >>  	 * Get here if this CPU took its interrupt from user
> > > >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > >> index bf5df2b866df..15db5fb7acc7 100644
> > > >> --- a/kernel/sched/core.c
> > > >> +++ b/kernel/sched/core.c
> > > >> @@ -8588,20 +8588,7 @@ int __sched _cond_resched(void)
> > > >>  		preempt_schedule_common();
> > > >>  		return 1;
> > > >>  	}
> > > >> -	/*
> > > >> -	 * In preemptible kernels, ->rcu_read_lock_nesting tells the tick
> > > >> -	 * whether the current CPU is in an RCU read-side critical section,
> > > >> -	 * so the tick can report quiescent states even for CPUs looping
> > > >> -	 * in kernel context.  In contrast, in non-preemptible kernels,
> > > >> -	 * RCU readers leave no in-memory hints, which means that CPU-bound
> > > >> -	 * processes executing in kernel context might never report an
> > > >> -	 * RCU quiescent state.  Therefore, the following code causes
> > > >> -	 * cond_resched() to report a quiescent state, but only when RCU
> > > >> -	 * is in urgent need of one.
> > > >> -	 */
> > > >> -#ifndef CONFIG_PREEMPT_RCU
> > > >> -	rcu_all_qs();
> > > >> -#endif
> > > >
> > > > But...
> > > >
> > > > Suppose we have a long-running loop in the kernel that regularly
> > > > enables preemption, but only momentarily.  Then the added
> > > > rcu_flavor_sched_clock_irq() check would almost always fail, making
> > > > for extremely long grace periods.
> > >
> > > So, my thinking was that if RCU wants to end a grace period, it would
> > > force a context switch by setting TIF_NEED_RESCHED (and, as patch 38
> > > mentions, RCU always uses the eager version), causing __schedule() to
> > > call rcu_note_context_switch().
> > > That's similar to the preempt_schedule_common() case in the
> > > _cond_resched() above.
> >
> > But that requires IPIing that CPU, correct?
> >
> > > But, if I see your point correctly, RCU might just want to register a
> > > quiescent state, and for this long-running loop
> > > rcu_flavor_sched_clock_irq() does seem to fall down.
> > >
> > > > Or did I miss a change that causes preempt_enable() to help RCU out?
> > >
> > > Something like this?
> > >
> > > diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> > > index dc5125b9c36b..e50f358f1548 100644
> > > --- a/include/linux/preempt.h
> > > +++ b/include/linux/preempt.h
> > > @@ -222,6 +222,8 @@ do { \
> > >  	barrier(); \
> > >  	if (unlikely(preempt_count_dec_and_test())) \
> > >  		__preempt_schedule(); \
> > > +	if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) \
> > > +		rcu_all_qs(); \
> > >  } while (0)
> >
> > Or maybe something like this to lighten the load a bit:
> >
> > #define preempt_enable() \
> > do { \
> > 	barrier(); \
> > 	if (unlikely(preempt_count_dec_and_test())) { \
> > 		__preempt_schedule(); \
> > 		if (raw_cpu_read(rcu_data.rcu_urgent_qs) && \
> > 		    !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) \
> > 			rcu_all_qs(); \
> > 	} \
> > } while (0)
> >
> > And at that point, we should be able to drop the PREEMPT_MASK, not
> > that it makes any difference that I am aware of:
> >
> > #define preempt_enable() \
> > do { \
> > 	barrier(); \
> > 	if (unlikely(preempt_count_dec_and_test())) { \
> > 		__preempt_schedule(); \
> > 		if (raw_cpu_read(rcu_data.rcu_urgent_qs) && \
> > 		    !(preempt_count() & SOFTIRQ_MASK)) \
> > 			rcu_all_qs(); \
> > 	} \
> > } while (0)
> >
> > Except that we can migrate as soon as that preempt_count_dec_and_test()
> > returns.  And that rcu_all_qs() disables and re-enables preemption,
> > which will result in undesired recursion.  Sigh.
> >
> > So maybe something like this:
> >
> > #define preempt_enable() \
> > do { \
> > 	if (raw_cpu_read(rcu_data.rcu_urgent_qs) && \
> > 	    !(preempt_count() & SOFTIRQ_MASK)) \
>
> Sigh.  This needs to include (PREEMPT_MASK | SOFTIRQ_MASK),
> but check for equality to something like (1UL << PREEMPT_SHIFT).
>
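Spelling that correction out against the proposal above, the
pre-decrement check might look like this (an untested sketch, not code
from the thread; the equality rather than a zero test is because
preempt_enable() still holds its own PREEMPT_OFFSET at this point):

#define preempt_enable() \
do { \
	/* Our own preempt_disable() is still counted here. */ \
	if (raw_cpu_read(rcu_data.rcu_urgent_qs) && \
	    (preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) == \
			(1UL << PREEMPT_SHIFT)) \
		rcu_all_qs(); \
	barrier(); \
	if (unlikely(preempt_count_dec_and_test())) \
		__preempt_schedule(); \
} while (0)

Because the check runs before the decrement, the task cannot migrate in
the meantime, and if rcu_all_qs() were to disable and re-enable
preemption internally, the nested preempt_enable() would see a count of
at least two, failing the equality test and avoiding the recursion
noted above.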
For PREEMPT_RCU=n and CONFIG_PREEMPT_COUNT=y kernels, to report a QS in
preempt_enable() we can refer to this:

void rcu_read_unlock_strict(void)
{
	struct rcu_data *rdp;

	if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
		return;
	rdp = this_cpu_ptr(&rcu_data);
	rdp->cpu_no_qs.b.norm = false;
	rcu_report_qs_rdp(rdp);
	udelay(rcu_unlock_delay);
}

The case where the RCU critical section is in an NMI handler also needs
to be considered.

Thanks
Zqiang

>
> Clearly time to sleep.  :-/
>
> 						Thanx, Paul
>
> > 			rcu_all_qs(); \
> > 	barrier(); \
> > 	if (unlikely(preempt_count_dec_and_test())) { \
> > 		__preempt_schedule(); \
> > 	} \
> > } while (0)
> >
> > Then rcu_all_qs() becomes something like this:
> >
> > void rcu_all_qs(void)
> > {
> > 	unsigned long flags;
> >
> > 	/* Load rcu_urgent_qs before other flags. */
> > 	if (!smp_load_acquire(this_cpu_ptr(&rcu_data.rcu_urgent_qs)))
> > 		return;
> > 	this_cpu_write(rcu_data.rcu_urgent_qs, false);
> > 	if (unlikely(raw_cpu_read(rcu_data.rcu_need_heavy_qs))) {
> > 		local_irq_save(flags);
> > 		rcu_momentary_dyntick_idle();
> > 		local_irq_restore(flags);
> > 	}
> > 	rcu_qs();
> > }
> > EXPORT_SYMBOL_GPL(rcu_all_qs);
> >
> > > Though I do wonder about the likelihood of hitting the case you
> > > describe, and maybe, instead of adding the check on every
> > > preempt_enable(), it might be better to force a context switch in
> > > rcu_flavor_sched_clock_irq() (as we do in the PREEMPT_RCU=y case).
> >
> > Maybe.  But rcu_all_qs() is way lighter weight than a context switch.
> >
> > 						Thanx, Paul
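
As a sanity check on the mask arithmetic above, here is a small
self-contained user-space program (the mask and shift values are
transcribed from include/linux/preempt.h; the helper name and sample
counts are illustrative only):

#include <stdio.h>

/* Field layout as in include/linux/preempt.h:
 * bits 0-7 hold the preemption count, bits 8-15 the softirq count. */
#define PREEMPT_SHIFT	0
#define PREEMPT_BITS	8
#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
#define SOFTIRQ_BITS	8
#define PREEMPT_MASK	(((1UL << PREEMPT_BITS) - 1) << PREEMPT_SHIFT)
#define SOFTIRQ_MASK	(((1UL << SOFTIRQ_BITS) - 1) << SOFTIRQ_SHIFT)
#define PREEMPT_OFFSET	(1UL << PREEMPT_SHIFT)
#define SOFTIRQ_OFFSET	(1UL << SOFTIRQ_SHIFT)

/* The corrected quiescence test, parameterized on a sample count. */
static int qs_before_dec(unsigned long pc)
{
	return (pc & (PREEMPT_MASK | SOFTIRQ_MASK)) == (1UL << PREEMPT_SHIFT);
}

int main(void)
{
	/* Only our own preempt_disable() held: quiescent. */
	printf("pc=1         -> %d (expect 1)\n", qs_before_dec(PREEMPT_OFFSET));
	/* Nested preempt_disable(): not quiescent. */
	printf("pc=2         -> %d (expect 0)\n", qs_before_dec(2 * PREEMPT_OFFSET));
	/* In softirq context: not quiescent. */
	printf("pc=1+softirq -> %d (expect 0)\n",
	       qs_before_dec(PREEMPT_OFFSET + SOFTIRQ_OFFSET));
	return 0;
}

Compiled with, e.g., "cc -Wall qs_demo.c", this prints 1, 0, 0,
matching the intent of the equality form discussed above.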