Date: Wed, 18 Oct 2023 13:42:21 -0700
From: "Paul E. McKenney"
To: Ankur Arora
Cc: Thomas Gleixner, Linus Torvalds, Peter Zijlstra,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
 akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
 dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
 juri.lelli@redhat.com, vincent.guittot@linaro.org, willy@infradead.org,
 mgorman@suse.de, rostedt@goodmis.org, jon.grimm@amd.com, bharata@amd.com,
 raghavendra.kt@amd.com, boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
 jgross@suse.com, andrew.cooper3@citrix.com, Frederic Weisbecker
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
Reply-To: paulmck@kernel.org
References: <87ttrngmq0.ffs@tglx> <87jzshhexi.ffs@tglx> <87pm1c3wbn.ffs@tglx>
 <61bb51f7-99ed-45bf-8c3e-f1d65137c894@paulmck-laptop>
 <87r0lroffj.fsf@oracle.com>
In-Reply-To: <87r0lroffj.fsf@oracle.com>

On Wed, Oct 18, 2023 at 01:15:28PM -0700, Ankur Arora wrote:
>
> Paul E. McKenney writes:
>
> > On Wed, Oct 18, 2023 at 03:16:12PM +0200, Thomas Gleixner wrote:
> >> Paul!
> >>
> >> On Tue, Oct 17 2023 at 18:03, Paul E. McKenney wrote:
> >> > Belatedly calling out some RCU issues. Nothing fatal, just a
> >> > (surprisingly) few adjustments that will need to be made. The key thing
> >> > to note is that from RCU's viewpoint, with this change, all kernels
> >> > are preemptible, though rcu_read_lock() readers remain
> >> > non-preemptible.
> >>
> >> Why? Either I'm confused or you or both of us :)
> >
> > Isn't rcu_read_lock() defined as preempt_disable() and rcu_read_unlock()
> > as preempt_enable() in this approach? I certainly hope so, as RCU
> > priority boosting would be a most unwelcome addition to many datacenter
> > workloads.
>
> No, in this approach, PREEMPT_AUTO selects PREEMPTION and thus
> PREEMPT_RCU so rcu_read_lock/unlock() would touch the
> rcu_read_lock_nesting. Which is identical to what PREEMPT_DYNAMIC does.

Understood. And we need some way to build a kernel such that RCU
read-side critical sections are non-preemptible. This is a hard
requirement that is not going away anytime soon.

> >> With this approach the kernel is by definition fully preemptible, which
> >> means rcu_read_lock() is preemptible too. That's pretty much the
> >> same situation as with PREEMPT_DYNAMIC.
> >
> > Please, just no!!!
> >
> > Please note that the current use of PREEMPT_DYNAMIC with preempt=none
> > avoids preempting RCU read-side critical sections. This means that the
> > distro use of PREEMPT_DYNAMIC has most definitely *not* tested preemption
> > of RCU readers in environments expecting no preemption.
>
> Ah. So, though PREEMPT_DYNAMIC with preempt=none runs with PREEMPT_RCU,
> preempt=none stubs out the actual preemption via __preempt_schedule.
>
> Okay, I see what you are saying.

More to the point, currently, you can build with CONFIG_PREEMPT_DYNAMIC=n
and CONFIG_PREEMPT_NONE=y and have non-preemptible RCU read-side critical
sections.

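For concreteness, here is a rough userspace sketch of the two reader
flavors being contrasted above. This is a toy model, not kernel code:
struct task_model and every model_* function are made-up names for
illustration, and the real implementations do considerably more (barriers,
deferred quiescent states, and so on). The only point is that the
non-preemptible flavor makes the reader itself block preemption, while the
PREEMPT_RCU flavor merely counts nesting and leaves the reader preemptible.

```c
/* Toy userspace model; all names here are hypothetical, not kernel APIs. */
#include <assert.h>

struct task_model {
	int preempt_count;		/* nonzero: task may not be preempted */
	int rcu_read_lock_nesting;	/* reader depth tracked by PREEMPT_RCU */
};

/* Non-preemptible flavor: the reader maps onto preempt_disable(). */
void model_rcu_read_lock_nonpreempt(struct task_model *t)
{
	t->preempt_count++;
}

/* Non-preemptible flavor: the matching preempt_enable(). */
void model_rcu_read_unlock_nonpreempt(struct task_model *t)
{
	assert(t->preempt_count > 0);
	t->preempt_count--;
}

/* Preemptible flavor: only the nesting depth is recorded. */
void model_rcu_read_lock_preempt(struct task_model *t)
{
	t->rcu_read_lock_nesting++;
}

/* Preemptible flavor: drop one level of nesting. */
void model_rcu_read_unlock_preempt(struct task_model *t)
{
	assert(t->rcu_read_lock_nesting > 0);
	t->rcu_read_lock_nesting--;
}

/* Would the scheduler be allowed to preempt this task right now? */
int model_preemptible(const struct task_model *t)
{
	return t->preempt_count == 0;
}
```

In this model, a task inside the non-preemptible reader reports
model_preemptible() == 0, whereas a task inside the preemptible reader
still reports 1, which is exactly the exposure that the less-preemptible
configurations need to avoid.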
> (Side issue: but this means that even for PREEMPT_DYNAMIC preempt=none,
> _cond_resched() doesn't call rcu_all_qs().)

I have no idea if anyone runs with CONFIG_PREEMPT_DYNAMIC=y and
preempt=none. We don't do so. ;-)

> >> For throughput sake this fully preemptible kernel provides a mechanism
> >> to delay preemption for SCHED_OTHER tasks, i.e. instead of setting
> >> NEED_RESCHED the scheduler sets NEED_RESCHED_LAZY.
> >>
> >> That means the preemption points in preempt_enable() and return from
> >> interrupt to kernel will not see NEED_RESCHED and the tasks can run to
> >> completion either to the point where they call schedule() or when they
> >> return to user space. That's pretty much what PREEMPT_NONE does today.
> >>
> >> The difference to NONE/VOLUNTARY is that the explicit cond_resched()
> >> points are no longer required because the scheduler can preempt the
> >> long running task by setting NEED_RESCHED instead.
> >>
> >> That preemption might be suboptimal in some cases compared to
> >> cond_resched(), but from my initial experimentation that's not really an
> >> issue.
> >
> > I am not (repeat NOT) arguing for keeping cond_resched(). I am instead
> > arguing that the less-preemptible variants of the kernel should continue
> > to avoid preempting RCU read-side critical sections.
>
> [ snip ]
>
> >> In the end there is no CONFIG_PREEMPT_XXX anymore. The only knob
> >> remaining would be CONFIG_PREEMPT_RT, which should be renamed to
> >> CONFIG_RT or such as it does not really change the preemption
> >> model itself. RT just reduces the preemption disabled sections with the
> >> lock conversions, forced interrupt threading and some more.
> >
> > Again, please, no.
> >
> > There are situations where we still need rcu_read_lock() and
> > rcu_read_unlock() to be preempt_disable() and preempt_enable(),
> > respectively. Those can be cases selected only by Kconfig option, not
> > available in kernels compiled with CONFIG_PREEMPT_DYNAMIC=y.
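
As an aside, the NEED_RESCHED_LAZY scheme quoted above can be modeled in a
few lines of userspace C. Again a sketch under stated assumptions and not
the patch itself: struct lazy_task, the MODEL_* flags, and the model_*
functions are invented for illustration, and having a tick-like hook do the
upgrade from lazy to immediate is my assumption about where "the scheduler
can preempt the long running task by setting NEED_RESCHED" would happen.

```c
/* Toy userspace model; all names here are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>

enum {
	MODEL_NEED_RESCHED	= 1 << 0,	/* preempt at the next preemption point */
	MODEL_NEED_RESCHED_LAZY	= 1 << 1,	/* defer until schedule() or user return */
};

struct lazy_task {
	unsigned int flags;
};

/* Scheduler side: request a reschedule, lazily for SCHED_OTHER-like tasks. */
void model_request_resched(struct lazy_task *t, bool lazy)
{
	t->flags |= lazy ? MODEL_NEED_RESCHED_LAZY : MODEL_NEED_RESCHED;
}

/* preempt_enable() and interrupt return to kernel only see NEED_RESCHED. */
bool model_kernel_preempt_point(const struct lazy_task *t)
{
	return (t->flags & MODEL_NEED_RESCHED) != 0;
}

/* Return to user space honors either flag. */
bool model_return_to_user(const struct lazy_task *t)
{
	return (t->flags & (MODEL_NEED_RESCHED | MODEL_NEED_RESCHED_LAZY)) != 0;
}

/* Assumed escalation hook: a still-pending lazy request becomes immediate. */
void model_tick_escalate(struct lazy_task *t)
{
	if (t->flags & MODEL_NEED_RESCHED_LAZY) {
		t->flags |= MODEL_NEED_RESCHED;
		assert(model_kernel_preempt_point(t));
	}
}
```

A lazily flagged task in this model sails through kernel preemption points
until either it returns to user space or the escalation hook runs, which is
the PREEMPT_NONE-like behavior described in the quoted text.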
>
> As far as non-preemptible RCU read-side critical sections are concerned,
> are the current
>  - PREEMPT_DYNAMIC=y, PREEMPT_RCU, preempt=none config
>    (rcu_read_lock/unlock() do not manipulate preempt_count, but do
>    stub out preempt_schedule())
>  - and PREEMPT_NONE=y, TREE_RCU config (rcu_read_lock/unlock() manipulate
>    preempt_count)?
>
> roughly similar or no?

No. There is still considerable exposure to preemptible-RCU code paths,
for example, when current->rcu_read_unlock_special.b.blocked is set.

> >> > I am sure that I am missing something, but I have not yet seen any
> >> > show-stoppers. Just some needed adjustments.
> >>
> >> Right. If it works out as I think it can work out the main adjustments
> >> are to remove a large amount of #ifdef maze and related gunk :)
> >
> > Just please don't remove the #ifdef gunk that is still needed!
>
> Always the hard part :).

Hey, we wouldn't want to insult your intelligence by letting you work
on too easy of a problem! ;-)

Thanx, Paul