From: Steven Rostedt
Date: Thu, 26 Oct 2023 09:16:58 -0400
To: Peter Zijlstra
Cc: LKML, Thomas Gleixner, Ankur Arora, Linus Torvalds, linux-mm@kvack.org,
    x86@kernel.org, akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
    dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
    juri.lelli@redhat.com, vincent.guittot@linaro.org, willy@infradead.org,
    mgorman@suse.de, jon.grimm@amd.com, bharata@amd.com,
    raghavendra.kt@amd.com, boris.ostrovsky@oracle.com,
    konrad.wilk@oracle.com, jgross@suse.com, andrew.cooper3@citrix.com,
    Joel Fernandes, Youssef Esmat, Vineeth Pillai,
    Suleiman Souhlal, Ingo Molnar, Daniel Bristot de Oliveira,
    Mathieu Desnoyers
Subject: Re: [POC][RFC][PATCH] sched: Extended Scheduler Time Slice
Message-ID: <20231026091658.1dcf2106@gandalf.local.home>
In-Reply-To: <20231026084402.GK31411@noisy.programming.kicks-ass.net>
References: <20231025054219.1acaa3dd@gandalf.local.home>
    <20231025102952.GG37471@noisy.programming.kicks-ass.net>
    <20231025085434.35d5f9e0@gandalf.local.home>
    <20231025135545.GG31201@noisy.programming.kicks-ass.net>
    <20231025103105.5ec64b89@gandalf.local.home>
    <20231026084402.GK31411@noisy.programming.kicks-ass.net>

On Thu, 26 Oct 2023 10:44:02 +0200
Peter Zijlstra wrote:

> > Actually, it works with *any* system call, not just sched_yield(). I
> > just used that one because it was the best one to annotate "the kernel
> > asked me to schedule, I'm going to schedule". If you noticed, I did not
> > modify sched_yield() in the patch. NEED_RESCHED_LAZY is still set, and
> > without the extend bit set, the task will schedule on its return to
> > user space.
>
> So I fundamentally *HATE* that you tie this whole thing to the
> NEED_RESCHED_LAZY thing; that's 100% the wrong layer to be doing this
> at.
>
> It very much means you're creating an interface that won't work for a
> significant number of setups -- those that use the FULL preempt setting.

And why can't the FULL preempt setting still use NEED_RESCHED_LAZY?
PREEMPT_RT does.

The beauty of NEED_RESCHED_LAZY is that it tells you whether you *should*
schedule (NEED_RESCHED_LAZY) or you *must* schedule (NEED_RESCHED).
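
To make that distinction concrete, here's a rough sketch of what the
exit-to-user-space decision could look like (pseudo-C, not the actual
PREEMPT_AUTO code; extend_bit_set() is a hypothetical stand-in for
whatever reads the word shared between user space and the kernel):

	/* Sketch of the return-to-user-space path */
	if (test_thread_flag(TIF_NEED_RESCHED)) {
		/* "must" schedule: the extend bit is never honored here */
		schedule();
	} else if (test_thread_flag(TIF_NEED_RESCHED_LAZY)) {
		if (!extend_bit_set(current))
			schedule();	/* "should" schedule */
		/*
		 * Otherwise, give the task a little more time to finish
		 * its critical section, and flag it to sched_yield() soon.
		 */
	}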
> > > > set this bit and leave it there for as long as you want, and it
> > > > should not affect anything.
> > >
> > > It would affect the worst case interference terms of the system at
> > > the very least.
> >
> > If you are worried about that, it can easily be made configurable so
> > it can be turned off. Seriously, I highly doubt that this would even be
> > measurable as interference. I could be wrong; I haven't tested that.
> > It's something we can look at, but until it's shown to be a problem it
> > should not be a show stopper.
>
> If everybody sets the thing and leaves it on, you basically double the
> worst case latency, no? And weren't you involved in a thread only last
> week where the complaint was that Chrome was a pig^W^W^W latency was too
> high?

In my first email about this:

  https://lore.kernel.org/all/20231024103426.4074d319@gandalf.local.home/

I said:

  If we are worried about abuse, we could even punish a task that doesn't
  call sched_yield() by the time its extended time slice is taken.

To elaborate further on this punishment: if it does become an issue that
a bunch of tasks always have this bit set and do not give up the CPU in a
timely manner, such a task could be flagged to have its bit ignored and/or
to lose some of its eligibility. That is, it wouldn't take long before an
abuser gets whacked and is no longer able to abuse. But I figured we would
look into that only if EEVDF doesn't naturally take care of it.

> > > > If you look at what Thomas's PREEMPT_AUTO.patch
> > >
> > > I know what it does, it also means your thing doesn't work the moment
> > > you set things up to have the old full-preempt semantics back. It
> > > doesn't work in the presence of RT/DL tasks, etc..
> >
> > Note, I am looking at ways to make this work with full preempt
> > semantics.
>
> By not relying on the PREEMPT_AUTO stuff. If you noodle with the code
> that actually sets preempt it should also work with preempt, but you're
> working at the wrong layer.

My guess is that NEED_RESCHED_LAZY will work with PREEMPT as well. That
code is still a work in progress, and this code depends on it. Right now
it depends on PREEMPT_AUTO because that's the only option that currently
gives us NEED_RESCHED_LAZY. From reading Thomas's discussions, it looks
like NEED_RESCHED_LAZY will eventually be available in CONFIG_PREEMPT as
well.

> Also see that old Oracle thread that got dug up.

I'll go back and read that.

> > > More importantly, it doesn't work for RT/DL tasks, so having the bit
> > > set and not having the OTHER policy is an error.
> >
> > It would basically be a nop.
>
> Well yes, but that is not a nice interface is it, run your task as RT/DL
> and suddenly it behaves differently.

User space spin locks would most definitely run differently under RT/DL
today! That could easily cause them to deadlock. User space spin locks
only make sense with SCHED_OTHER; otherwise great care needs to be taken
to avoid unbounded priority inversion, especially with FIFO.

> > This is because these critical sections run in much less time than 8
> > atomic ops. And when you are executing these critical sections millions
> > of times a second, that adds up quickly.
>
> But you wouldn't be doing syscalls on every section either. If syscalls
> were free (0 cycles) and you could hand-wave any syscall you pleased,
> how would you do this?
>
> The typical futex like setup is you only syscall on contention, when
> userspace is going to be spinning and wasting cycles anyhow. The current
> problem is that futex_wait will instantly schedule-out / block, even if
> the lock owner is currently one instruction away from releasing the lock.

And that is what user space adaptive spin locks are meant to solve, and
I'm 100% for them! (I'm the one who talked André Almeida into working on
this.)
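
For reference, the usual shape of the acquire side of such an adaptive
lock is: spin while the lock owner is running on a CPU, and only call
futex_wait() once it is not (a simplified sketch; owner_is_running() is
a hypothetical stand-in for however the owner's state gets tracked, e.g.
via the lock word, and all error handling is omitted):

	#include <linux/futex.h>
	#include <stdatomic.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static void adaptive_lock(atomic_int *l)
	{
		for (;;) {
			int zero = 0;

			if (atomic_compare_exchange_strong(l, &zero, 1))
				return;			/* acquired */
			if (owner_is_running(l))	/* hypothetical */
				continue;		/* keep spinning */
			/* Owner is off CPU; sleep until the word changes */
			syscall(SYS_futex, l, FUTEX_WAIT, 1, NULL, NULL, 0);
		}
	}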
But as my tests show, the speedup comes from keeping the lock holder from
being preempted. The same reasoning is why Thomas created NEED_RESCHED_LAZY
for PREEMPT_RT even though PREEMPT_RT already had adaptive spin locks.

-- Steve