From: Linus Torvalds
Date: Mon, 18 Sep 2023 18:57:50 -0700
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
To: Thomas Gleixner
Cc: Peter Zijlstra, Ankur Arora, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, x86@kernel.org, akpm@linux-foundation.org,
 luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com,
 hpa@zytor.com, mingo@redhat.com, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, willy@infradead.org, mgorman@suse.de,
 rostedt@goodmis.org, jon.grimm@amd.com, bharata@amd.com,
 raghavendra.kt@amd.com, boris.ostrovsky@oracle.com,
 konrad.wilk@oracle.com, jgross@suse.com, andrew.cooper3@citrix.com
In-Reply-To: <87cyyfxd4k.ffs@tglx>
References: <20230830184958.2333078-8-ankur.a.arora@oracle.com>
 <20230908070258.GA19320@noisy.programming.kicks-ass.net>
 <87zg1v3xxh.fsf@oracle.com> <87edj64rj1.fsf@oracle.com>
 <87zg1u1h5t.fsf@oracle.com>
 <20230911150410.GC9098@noisy.programming.kicks-ass.net>
 <87h6o01w1a.fsf@oracle.com>
 <20230912082606.GB35261@noisy.programming.kicks-ass.net>
 <87cyyfxd4k.ffs@tglx>

On Mon, 18 Sept 2023 at 16:42, Thomas Gleixner wrote:
>
> What about the following:
>
>    1) Keep preemption count and the real preemption points enabled
>       unconditionally.

Well, it's certainly the simplest solution, and gets rid of not just
the 'rep string' issue, but gets rid of all the cond_resched() hackery
entirely.

>       20 years ago this was a real issue because we did not have:
>
>        - the folding of NEED_RESCHED into the preempt count
>
>        - the cacheline optimizations which make the preempt count cache
>          pretty much always cache hot
>
>        - the hardware was way less capable
>
>       I'm not saying that preempt_count is completely free today as it
>       obviously adds more text and affects branch predictors, but as the
>       major distros ship with DYNAMIC_PREEMPT enabled it is obviously an
>       acceptable and tolerable tradeoff.

Yeah, the fact that we do presumably have PREEMPT_COUNT enabled in most
distros does speak for just admitting that the PREEMPT_NONE / VOLUNTARY
approach isn't actually used, and is only causing pain.

>    2) When the scheduler wants to set NEED_RESCHED due it sets
>       NEED_RESCHED_LAZY instead which is only evaluated in the return to
>       user space preemption points.

Is this just to try to emulate the existing PREEMPT_NONE behavior?

If the new world order is that the time slice is always honored, then
the "this might be a latency issue" goes away. Good.

And we'd also get better coverage for the *debug* aim of
"might_sleep()" and CONFIG_DEBUG_ATOMIC_SLEEP, since we'd rely on
PREEMPT_COUNT always existing.

But because the latency argument is gone, the "might_resched()" should
then just be removed entirely from "might_sleep()", so that
might_sleep() would *only* be that DEBUG_ATOMIC_SLEEP thing.

That argues for your suggestion too, since we had a performance issue
due to "might_sleep()" _not_ being just a debug thing, and pointlessly
causing a reschedule in a place where reschedules were _allowed_, but
certainly much less than optimal.

Which then caused that fairly recent commit 4542057e18ca ("mm: avoid
'might_sleep()' in get_mmap_lock_carefully()").

However, that does bring up an issue: even with full preemption, there
are certainly places where we are *allowed* to schedule (when the
preempt count is zero), but there are also some places that are
*better* than other places to schedule (for example, when we don't
hold any other locks).

So, I do think that if we just decide to go "let's just always be
preemptible", we might still have points in the kernel where
preemption might be *better* than in other points.

But none of might_resched(), might_sleep() _or_ cond_resched() are
necessarily that kind of "this is a good point" thing. They come from
a different background.

So what I think you are saying is that we'd have the following
situation:

 - scheduling at "return to user space" is presumably always a good
   thing.

   A non-preempt-count bit NEED_RESCHED_LAZY (or TIF_RESCHED, or
   whatever) would cover that, and would give us basically the existing
   CONFIG_PREEMPT_NONE behavior.

   So a config variable (either compile-time with PREEMPT_NONE or a
   dynamic one with DYNAMIC_PREEMPT set to none) would make any
   external wakeup only set that bit.

   And then a "fully preemptible low-latency desktop" would set the
   preempt-count bit too.

 - but the "timeslice over" case would always set the
   preempt-count-bit, regardless of any config, and would guarantee
   that we have reasonable latencies.

This all makes cond_resched() (and might_resched()) pointless, and
they can just go away.

Then the question becomes whether we'd want to introduce a *new*
concept, which is a "if you are going to schedule, do it now rather
than later, because I'm taking a lock, and while it's a preemptible
lock, I'd rather not sleep while holding this resource".

I suspect we want to avoid that for now, on the assumption that it's
hopefully not a problem in practice (the recently addressed problem
with might_sleep() was that it actively *moved* the scheduling point
to a bad place, not that scheduling could happen there, so instead of
optimizing scheduling, it actively pessimized it). But I thought I'd
mention it.

Anyway, I'm definitely not opposed. We'd get rid of a config option
that is presumably not very widely used, and we'd simplify a lot of
issues, and get rid of all these badly defined "cond_preempt()"
things.

                Linus