Date: Tue, 24 Oct 2023 10:34:26 -0400
From: Steven Rostedt
To: Thomas Gleixner
Cc: Peter Zijlstra, Ankur Arora, Linus Torvalds,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
 akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
 dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
 juri.lelli@redhat.com, vincent.guittot@linaro.org, willy@infradead.org,
 mgorman@suse.de, jon.grimm@amd.com, bharata@amd.com,
 raghavendra.kt@amd.com, boris.ostrovsky@oracle.com,
 konrad.wilk@oracle.com, jgross@suse.com, andrew.cooper3@citrix.com,
 Joel Fernandes, Youssef Esmat, Vineeth Pillai, Suleiman Souhlal
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
Message-ID: <20231024103426.4074d319@gandalf.local.home>
In-Reply-To: <87cyyfxd4k.ffs@tglx>
References: <20230830184958.2333078-8-ankur.a.arora@oracle.com>
 <20230908070258.GA19320@noisy.programming.kicks-ass.net>
 <87zg1v3xxh.fsf@oracle.com>
 <87edj64rj1.fsf@oracle.com>
 <87zg1u1h5t.fsf@oracle.com>
 <20230911150410.GC9098@noisy.programming.kicks-ass.net>
 <87h6o01w1a.fsf@oracle.com>
 <20230912082606.GB35261@noisy.programming.kicks-ass.net>
 <87cyyfxd4k.ffs@tglx>

On Tue, 19 Sep 2023 01:42:03
+0200 Thomas Gleixner wrote:

>  2) When the scheduler wants to set NEED_RESCHED due it sets
>     NEED_RESCHED_LAZY instead which is only evaluated in the return to
>     user space preemption points.
>
>     As NEED_RESCHED_LAZY is not folded into the preemption count the
>     preemption count won't become zero, so the task can continue until
>     it hits return to user space.
>
>     That preserves the existing behaviour.

I'm looking into extending this concept to user space and to VMs.

I'm calling this the "extended scheduler time slice" (ESTS, pronounced
"estis").

The idea is this. Have VMs/user space share a memory region with the
kernel that is per thread/vCPU. This would be registered via a syscall or
ioctl on some defined file or whatever. Then, when entering user space /
VM, if NEED_RESCHED_LAZY (or whatever it's eventually called) is set, the
kernel checks if the thread has this memory region and a special bit in it
is set, and if so, it does not schedule. It will treat it like a long
kernel system call.

The kernel will then set another bit in the shared memory region that will
tell user space / VM that the kernel wanted to schedule, but is allowing it
to finish its critical section. When user space / VM is done with the
critical section, it will check the bit that may be set by the kernel and
if it is set, it should do a sched_yield() or VMEXIT so that the kernel can
now schedule it.

What about DoS, you say? It's no different than running a long system call.
No task can run forever. It's not a "preempt disable", it's just "give me
some more time". A "NEED_RESCHED" will always schedule, just like a kernel
system call that takes a long time. The goal is to allow user space to get
out of critical sections that we know can cause problems if they get
preempted. Usually that's when a user space / VM lock is held, or maybe a
VM interrupt handler that needs to wake up a task on another vCPU.
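To make the handshake described above concrete, here is a rough sketch in C of how the two bits might be used. None of this is an existing kernel interface: the struct layout, the flag names (ESTS_IN_CRITICAL, ESTS_RESCHED_PENDING) and every function name are made up for illustration, the region is an ordinary struct rather than memory registered with the kernel, and ests_model_defer() is only a user-space model of the decision the kernel would make on return to user space.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bits in the per-thread region shared with the kernel. */
enum {
	ESTS_IN_CRITICAL     = 1u << 0, /* set by user space: "give me more time" */
	ESTS_RESCHED_PENDING = 1u << 1, /* set by kernel: "I held off a reschedule" */
};

/* Hypothetical layout; the real thing would be registered via syscall/ioctl. */
struct ests_region {
	atomic_uint flags;
};

/* User space: announce the start of a critical section. */
static inline void ests_enter(struct ests_region *r)
{
	assert(r != NULL);
	atomic_fetch_or(&r->flags, ESTS_IN_CRITICAL);
}

/*
 * User space: leave the critical section. Clears both bits in one atomic
 * step and reports whether the kernel had deferred a reschedule.
 */
static inline bool ests_exit(struct ests_region *r)
{
	unsigned int old = atomic_exchange(&r->flags, 0);

	return (old & ESTS_RESCHED_PENDING) != 0;
}

/* User space: leave the critical section and give the CPU back if asked. */
static inline void ests_exit_and_yield(struct ests_region *r)
{
	if (ests_exit(r))
		sched_yield();
}

/*
 * Model of the kernel side, run when a lazy reschedule is pending on the
 * way back to user space. Returns true if the reschedule is deferred.
 * need_resched_hard stands in for NEED_RESCHED, which always schedules.
 */
static inline bool ests_model_defer(struct ests_region *r, bool need_resched_hard)
{
	if (need_resched_hard)
		return false;
	if (!(atomic_load(&r->flags) & ESTS_IN_CRITICAL))
		return false;
	atomic_fetch_or(&r->flags, ESTS_RESCHED_PENDING);
	return true;
}
```

A lock implementation would call ests_enter() right before taking the lock and ests_exit_and_yield() right after dropping it. The model assumes the kernel-side check runs in the context of the thread that owns the region, so it does not race with that thread's own ests_exit().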
If we are worried about abuse, we could even punish tasks that don't call
sched_yield() by the time their extended time slice is taken. Even without
that punishment, if we have EEVDF, this extension will make the task less
eligible the next time around.

The goal is to prevent a thread / vCPU being preempted while holding a lock
or resource that other threads / vCPUs will want. That is, prevent
contention, as that's usually the biggest issue with performance in user
space and VMs.

I'm going to work on a POC, and see if I can get some benchmarks on how
much this could help tasks like databases and VMs in general.

-- Steve