Date: Thu, 26 Oct 2023 10:44:02 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: Steven Rostedt
Cc: LKML, Thomas Gleixner, Ankur Arora, Linus Torvalds, linux-mm@kvack.org,
 x86@kernel.org, akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
 dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
 juri.lelli@redhat.com, vincent.guittot@linaro.org, willy@infradead.org,
 mgorman@suse.de, jon.grimm@amd.com, bharata@amd.com, raghavendra.kt@amd.com,
 boris.ostrovsky@oracle.com, konrad.wilk@oracle.com, jgross@suse.com,
 andrew.cooper3@citrix.com, Joel Fernandes, Youssef Esmat, Vineeth Pillai,
 Suleiman Souhlal, Ingo Molnar, Daniel Bristot de Oliveira, Mathieu Desnoyers
Subject: Re: [POC][RFC][PATCH] sched: Extended Scheduler Time Slice
Message-ID: <20231026084402.GK31411@noisy.programming.kicks-ass.net>
References: <20231025054219.1acaa3dd@gandalf.local.home>
 <20231025102952.GG37471@noisy.programming.kicks-ass.net>
 <20231025085434.35d5f9e0@gandalf.local.home>
 <20231025135545.GG31201@noisy.programming.kicks-ass.net>
 <20231025103105.5ec64b89@gandalf.local.home>
In-Reply-To: <20231025103105.5ec64b89@gandalf.local.home>
On Wed, Oct 25, 2023 at 10:31:05AM -0400, Steven Rostedt wrote:

> > > > So what if it doesn't ? Can we kill it for not playing nice ?
> > >
> > > No, it's no different than a system call running for a long time. You could
> >
> > Then why ask for it? What's the point. Also, did you define
> > sched_yield() semantics for OTHER to something useful? Because if you
> > didn't you just invoked UB :-) We could be setting your pets on fire.
>
> Actually, it works with *any* system call. Not just sched_yield(). I just
> used that as it was the best one to annotate "the kernel asked me to
> schedule, I'm going to schedule".
> If you noticed, I did not modify
> sched_yield() in the patch. The NEED_RESCHED_LAZY is still set, and without
> the extend bit set, on return back to user space it will schedule.

So I fundamentally *HATE* that you tie this whole thing to the
NEED_RESCHED_LAZY thing; that's 100% the wrong layer to be doing this at.
It very much means you're creating an interface that won't work for a
significant number of setups -- those that use the FULL preempt setting.

> > > set this bit and leave it there for as long as you want, and it should not
> > > affect anything.
> >
> > It would affect the worst case interference terms of the system at the
> > very least.
>
> If you are worried about that, it can easily be configurable to be turned
> off. Seriously, I highly doubt that this would be even measurable as
> interference. I could be wrong, I haven't tested that. It's something we
> can look at, but until it's considered a problem it should not be a show
> blocker.

If everybody sets the thing and leaves it on, you basically double the
worst case latency, no?

And weren't you involved in a thread only last week where the complaint
was that Chrome was a pig^W^W^W latency was too high?

> > > If you look at what Thomas's PREEMPT_AUTO.patch
> >
> > I know what it does, it also means your thing doesn't work the moment
> > you set things up to have the old full-preempt semantics back. It
> > doesn't work in the presence of RT/DL tasks, etc..
>
> Note, I am looking at ways to make this work with full preempt semantics.

By not relying on the PREEMPT_AUTO stuff. If you noodle with the code
that actually sets preempt it should also work with preempt, but you're
working at the wrong layer. Also see that old Oracle thread that got
dug up.

> > More importantly, it doesn't work for RT/DL tasks, so having the bit set
> > and not having OTHER policy is an error.
>
> It would basically be a nop.
Well yes, but that is not a nice interface, is it: run your task as RT/DL
and suddenly it behaves differently.

> Remember, RT and DL are about deterministic behavior, SCHED_OTHER is about
> performance. This is a performance patch, not a deterministic one.

Yeah, but performance means something different depending on who and when
you talk to someone.

> > But even today (on good hardware or with mitigations=off):
> >
> >   gettid-1m:  179,650,423 cycles
> >   xadd-1m:     23,036,564 cycles
> >
> > syscall is the cost of roughly 8 atomic ops. More expensive, sure. But
> > not insanely so. I've seen atomic ops go up to >1000 cycles if you
> > contend them hard enough.
>
> This has been your argument for over a decade, and the real world has seen
> it differently. Performance matters significantly for user applications, and
> if system calls didn't have performance issues, I'm sure the performance
> centric applications would have used them.
>
> This is because these critical sections run much less than 8 atomic ops. And
> when you are executing these critical sections millions of times a second,
> that adds up quickly.

But you wouldn't be doing syscalls on every section either. If syscalls
were free (0 cycles) and you could hand-wave any syscall you pleased, how
would you do this?

The typical futex-like setup is that you only syscall on contention, when
userspace is going to be spinning and wasting cycles anyhow.

The current problem is that futex_wait will instantly schedule-out / block,
even if the lock owner is currently one instruction away from releasing
the lock.