Date: Mon, 20 Nov 2023 21:17:57 -0800
From: "Paul E. McKenney"
To: Ankur Arora
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de, peterz@infradead.org,
	torvalds@linux-foundation.org, linux-mm@kvack.org, x86@kernel.org,
	akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org, willy@infradead.org,
	mgorman@suse.de, jon.grimm@amd.com, bharata@amd.com,
	raghavendra.kt@amd.com, boris.ostrovsky@oracle.com,
	konrad.wilk@oracle.com, jgross@suse.com, andrew.cooper3@citrix.com,
	mingo@kernel.org, bristot@kernel.org, mathieu.desnoyers@efficios.com,
	geert@linux-m68k.org, glaubitz@physik.fu-berlin.de,
	anton.ivanov@cambridgegreys.com, mattst88@gmail.com,
	krypton@ulrich-teichert.org, rostedt@goodmis.org,
	David.Laight@aculab.com, richard@nod.at, mjguzik@gmail.com
Subject: Re: [RFC PATCH 48/86] rcu: handle quiescent states for PREEMPT_RCU=n
Message-ID: <46a4c47a-ba1c-4776-a6f8-6c2146cbdd0d@paulmck-laptop>
Reply-To: paulmck@kernel.org
References: <20231107215742.363031-1-ankur.a.arora@oracle.com>
	<20231107215742.363031-49-ankur.a.arora@oracle.com>
	<2027da00-273d-41cf-b9e7-460776181083@paulmck-laptop>
	<87lear4wj6.fsf@oracle.com>
In-Reply-To: <87lear4wj6.fsf@oracle.com>

On Mon, Nov 20, 2023 at 07:26:05PM -0800, Ankur Arora wrote:
>
> Paul E. McKenney writes:
>
> > On Tue, Nov 07, 2023 at 01:57:34PM -0800, Ankur Arora wrote:
> >> cond_resched() is used to provide urgent quiescent states for
> >> read-side critical sections on PREEMPT_RCU=n configurations.
> >> This was necessary because lacking preempt_count, there was no
> >> way for the tick handler to know if we were executing in RCU
> >> read-side critical section or not.
> >>
> >> An always-on CONFIG_PREEMPT_COUNT, however, allows the tick to
> >> reliably report quiescent states.
> >>
> >> Accordingly, evaluate preempt_count() based quiescence in
> >> rcu_flavor_sched_clock_irq().
> >>
> >> Suggested-by: Paul E. McKenney
> >> Signed-off-by: Ankur Arora
> >> ---
> >>  kernel/rcu/tree_plugin.h |  3 ++-
> >>  kernel/sched/core.c      | 15 +--------------
> >>  2 files changed, 3 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> >> index f87191e008ff..618f055f8028 100644
> >> --- a/kernel/rcu/tree_plugin.h
> >> +++ b/kernel/rcu/tree_plugin.h
> >> @@ -963,7 +963,8 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
> >>   */
> >>  static void rcu_flavor_sched_clock_irq(int user)
> >>  {
> >> -	if (user || rcu_is_cpu_rrupt_from_idle()) {
> >> +	if (user || rcu_is_cpu_rrupt_from_idle() ||
> >> +	    !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
> >
> > This looks good.
> >
> >>  		/*
> >>  		 * Get here if this CPU took its interrupt from user
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index bf5df2b866df..15db5fb7acc7 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -8588,20 +8588,7 @@ int __sched _cond_resched(void)
> >>  		preempt_schedule_common();
> >>  		return 1;
> >>  	}
> >> -	/*
> >> -	 * In preemptible kernels, ->rcu_read_lock_nesting tells the tick
> >> -	 * whether the current CPU is in an RCU read-side critical section,
> >> -	 * so the tick can report quiescent states even for CPUs looping
> >> -	 * in kernel context.  In contrast, in non-preemptible kernels,
> >> -	 * RCU readers leave no in-memory hints, which means that CPU-bound
> >> -	 * processes executing in kernel context might never report an
> >> -	 * RCU quiescent state.  Therefore, the following code causes
> >> -	 * cond_resched() to report a quiescent state, but only when RCU
> >> -	 * is in urgent need of one.
> >> -	 */
> >> -#ifndef CONFIG_PREEMPT_RCU
> >> -	rcu_all_qs();
> >> -#endif
> >
> > But...
> >
> > Suppose we have a long-running loop in the kernel that regularly
> > enables preemption, but only momentarily.  Then the added
> > rcu_flavor_sched_clock_irq() check would almost always fail, making
> > for extremely long grace periods.
>
> So, my thinking was that if RCU wants to end a grace period, it would
> force a context switch by setting TIF_NEED_RESCHED (and as patch 38 mentions
> RCU always uses the eager version) causing __schedule() to call
> rcu_note_context_switch().
> That's similar to the preempt_schedule_common() case in the
> _cond_resched() above.

But that requires IPIing that CPU, correct?

> But if I see your point, RCU might just want to register a quiescent
> state and for this long-running loop rcu_flavor_sched_clock_irq() does
> seem to fall down.
>
> > Or did I miss a change that causes preempt_enable() to help RCU out?
>
> Something like this?
>
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index dc5125b9c36b..e50f358f1548 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -222,6 +222,8 @@ do { \
>  	barrier(); \
>  	if (unlikely(preempt_count_dec_and_test())) \
>  		__preempt_schedule(); \
> +	if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) \
> +		rcu_all_qs(); \
>  } while (0)

Or maybe something like this to lighten the load a bit:

#define preempt_enable() \
do { \
	barrier(); \
	if (unlikely(preempt_count_dec_and_test())) { \
		__preempt_schedule(); \
		if (raw_cpu_read(rcu_data.rcu_urgent_qs) && \
		    !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) \
			rcu_all_qs(); \
	} \
} while (0)

And at that point, we should be able to drop the PREEMPT_MASK, not
that it makes any difference that I am aware of:

#define preempt_enable() \
do { \
	barrier(); \
	if (unlikely(preempt_count_dec_and_test())) { \
		__preempt_schedule(); \
		if (raw_cpu_read(rcu_data.rcu_urgent_qs) && \
		    !(preempt_count() & SOFTIRQ_MASK)) \
			rcu_all_qs(); \
	} \
} while (0)

Except that we can migrate as soon as that preempt_count_dec_and_test()
returns.  And that rcu_all_qs() disables and re-enables preemption,
which will result in undesired recursion.  Sigh.

So maybe something like this:

#define preempt_enable() \
do { \
	if (raw_cpu_read(rcu_data.rcu_urgent_qs) && \
	    !(preempt_count() & SOFTIRQ_MASK)) \
		rcu_all_qs(); \
	barrier(); \
	if (unlikely(preempt_count_dec_and_test())) { \
		__preempt_schedule(); \
	} \
} while (0)

Then rcu_all_qs() becomes something like this:

void rcu_all_qs(void)
{
	unsigned long flags;

	/* Load rcu_urgent_qs before other flags. */
	if (!smp_load_acquire(this_cpu_ptr(&rcu_data.rcu_urgent_qs)))
		return;
	this_cpu_write(rcu_data.rcu_urgent_qs, false);
	if (unlikely(raw_cpu_read(rcu_data.rcu_need_heavy_qs))) {
		local_irq_save(flags);
		rcu_momentary_dyntick_idle();
		local_irq_restore(flags);
	}
	rcu_qs();
}
EXPORT_SYMBOL_GPL(rcu_all_qs);

> Though I do wonder about the likelihood of hitting the case you describe
> and maybe instead of adding the check on every preempt_enable()
> it might be better to instead force a context switch in the
> rcu_flavor_sched_clock_irq() (as we do in the PREEMPT_RCU=y case.)

Maybe.  But rcu_all_qs() is way lighter weight than a context switch.

							Thanx, Paul
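
For anyone following along, the concern about a loop that enables
preemption only momentarily can be illustrated outside the kernel.  What
follows is a minimal userspace sketch, not kernel code.  The two mask
values mirror the kernel's preempt_count layout, but tick_sees_qs(),
count_qs_ticks(), and the whole timing model (fixed-length iterations,
preemption enabled only in the last time unit of each one, a strictly
periodic tick) are hypothetical and assumed purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Mirrors the kernel's preempt_count layout: the low 8 bits count
 * preempt_disable() nesting, the next 8 bits count softirq nesting.
 */
#define PREEMPT_MASK 0x000000ffU
#define SOFTIRQ_MASK 0x0000ff00U

/*
 * Hypothetical stand-in for the condition the patch adds to
 * rcu_flavor_sched_clock_irq(): the tick may report a quiescent
 * state only if neither preemption nor softirqs are disabled.
 */
static bool tick_sees_qs(unsigned int preempt_count)
{
	return !(preempt_count & (PREEMPT_MASK | SOFTIRQ_MASK));
}

/*
 * Hypothetical timing model (an assumption, not kernel behavior):
 * each loop iteration lasts iter_len time units, and preemption is
 * enabled only during the final unit of each iteration.  A tick
 * samples the count every tick_period units, up to and including
 * time "total".  Returns how many ticks observe a quiescent state.
 */
static unsigned int count_qs_ticks(unsigned int total,
				   unsigned int iter_len,
				   unsigned int tick_period)
{
	unsigned int t, hits = 0;

	assert(iter_len > 0 && tick_period > 0);

	for (t = tick_period; t <= total; t += tick_period) {
		/* Nesting depth is 1 except in the last unit of an iteration. */
		unsigned int count = (t % iter_len == iter_len - 1) ? 0 : 1;

		if (tick_sees_qs(count))
			hits++;
	}
	return hits;
}
```

Under these assumptions, count_qs_ticks(1000, 100, 100) finds no
quiescent ticks at all, because every tick lands while the modeled
count is nonzero, whereas an iter_len of 1 makes every tick quiescent.
A real tick is not phase-locked to a loop like this, so the sketch only
shows why a sampled check can miss brief preemption-enabled windows.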