Date: Wed, 25 Oct 2023 18:24:35 +0200
From: Mateusz Guzik
To: Mathieu Desnoyers
Cc: Steven Rostedt, Peter Zijlstra, LKML, Thomas Gleixner, Ankur Arora, Linus Torvalds, linux-mm@kvack.org, x86@kernel.org, akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, willy@infradead.org, mgorman@suse.de, jon.grimm@amd.com, bharata@amd.com, raghavendra.kt@amd.com, boris.ostrovsky@oracle.com, konrad.wilk@oracle.com, jgross@suse.com, andrew.cooper3@citrix.com, Joel Fernandes, Youssef Esmat, Vineeth Pillai, Suleiman Souhlal, Ingo Molnar, Daniel Bristot de Oliveira
Subject: Re: [POC][RFC][PATCH] sched: Extended Scheduler Time Slice
Message-ID: <20231025162435.ibhdktcshhzltr3r@f>
In-Reply-To: <884e4603-4d29-41ae-8715-a070c43482c4@efficios.com>

On Wed, Oct 25, 2023 at 11:42:34AM -0400, Mathieu Desnoyers wrote:
> On 2023-10-25 10:31, Steven Rostedt wrote:
> > On Wed, 25 Oct 2023 15:55:45 +0200
> > Peter Zijlstra wrote:
> >
> [...]
>
> After digging lore for context, here are some thoughts about the actual
> proposal: AFAIU the intent here is to boost the scheduling slice for a
> userspace thread running with a mutex held so it can complete faster,
> and therefore reduce contention.
>
> I suspect this is not completely unrelated to priority inheritance
> futexes, except that one goal stated by Steven is to increase the
> owner slice without requiring to call a system call on the fast-path.
>
> Compared to PI futexes, I think Steven's proposal misses the part
> where a thread waiting on a futex boosts the lock owner's priority
> so it can complete faster. By making the lock owner selfishly claim
> that it needs a larger scheduling slice, it opens the door to
> scheduler disruption, and it's hard to come up with upper-bounds
> that work for all cases.
>
> Hopefully I'm not oversimplifying if I state that we have mainly two
> actors to consider:
>
> [A] the lock owner thread
>
> [B] threads that block trying to acquire the lock
>
> The fast-path here is [A]. [B] can go through a system call, I don't
> think it matters at all.
>
> So perhaps we can extend the rseq per-thread area with a field that
> implements a "held locks" list that allows [A] to let the kernel know
> that it is currently holding a set of locks (those can be chained when
> locks are nested). It would be updated on lock/unlock with just a few
> stores in userspace.
>
> Those lock addresses could then be used as keys for private locks,
> or transformed into inode/offset keys for shared-memory locks.
> Threads [B] blocking trying to acquire the lock can call a system call which
> would boost the lock owner's slice and/or priority for a given lock key.
>
> When the scheduler preempts [A], it would check whether the rseq
> per-thread area has a "held locks" field set and use this information
> to find the slice/priority boost which are currently active for each
> lock, and use this information to boost the task slice/priority
> accordingly.
>
> A scheme like this should allow lock priority inheritance without
> requiring system calls on the userspace lock/unlock fast path.
>

I think both this proposal and the original one in this thread are
opening a can of worms, and I don't think going down that road was
properly justified.

A proper justification would demonstrate a big enough(tm) improvement
over a locking primitive with adaptive spinning.

It is well known that what mostly shafts the performance of regular
userspace locking is all the nasty going off-CPU to wait.

The original benchmark with slice extension disabled keeps the CPUs busy,
virtually guaranteeing that these threads will keep getting preempted,
some of the time while holding the lock. Should that happen, all the
other threads waiting on that lock actively waste time.

Adaptive spinning was already mentioned elsewhere in the thread, and the
idea itself is at least two decades old. If anything, I find it strange
that it did not land years ago.

I see there is a preliminary patch by you which exports the scheduler
state of a thread, so one can spin nicely without even entering the
kernel:
https://lore.kernel.org/lkml/20230529191416.53955-1-mathieu.desnoyers@efficios.com/

To be clear, I think a locking primitive which can do adaptive spinning
*and* futexes *and* not get preempted while holding locks is the fastest
option. What is not clear to me is whether it is sufficiently faster than
adaptive spinning and futexes alone.

tl;dr perhaps someone(tm) could carry the above to a state where it can
be benchmarked against the original patch
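For concreteness, the userspace side of the "held locks" list quoted
above might look roughly like the C sketch below. It is a sketch under
stated assumptions, not the proposal's actual design: struct held_locks,
held_locks_push(), held_locks_pop(), held_locks_innermost() and
HELD_LOCKS_MAX are hypothetical names, and a plain thread-local variable
stands in for the rseq per-thread area (this is not part of the real rseq
ABI). It only illustrates why lock/unlock stays syscall-free: recording a
held lock costs a couple of plain stores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical capacity; deeper nesting is counted but not recorded. */
#define HELD_LOCKS_MAX 8

/*
 * Hypothetical per-thread "held locks" area. Under the proposal it would
 * live in the rseq per-thread area so the scheduler could read it when
 * preempting the thread; here it is an ordinary thread-local variable.
 */
struct held_locks {
	uint32_t nr;                    /* number of locks currently held */
	uintptr_t lock[HELD_LOCKS_MAX]; /* lock addresses, used as keys */
};

static _Thread_local struct held_locks held_locks;

/* Call right after acquiring @lock: a couple of plain stores, no syscall. */
static inline void held_locks_push(const void *lock)
{
	uint32_t nr = held_locks.nr;

	if (nr < HELD_LOCKS_MAX)
		held_locks.lock[nr] = (uintptr_t)lock;
	/*
	 * A reader interrupting this thread (e.g. the kernel on preemption)
	 * must not see the new count before the entry. It runs on the same
	 * CPU, so a compiler barrier is assumed to be enough here.
	 */
	atomic_signal_fence(memory_order_release);
	held_locks.nr = nr + 1;
}

/* Call right before releasing @lock; locks are assumed to nest (LIFO). */
static inline void held_locks_pop(const void *lock)
{
	uint32_t nr = held_locks.nr;

	assert(nr > 0);
	if (nr <= HELD_LOCKS_MAX)
		assert(held_locks.lock[nr - 1] == (uintptr_t)lock);
	held_locks.nr = nr - 1;
}

/* Innermost recorded lock, or 0 if none: what a reader would inspect. */
static inline uintptr_t held_locks_innermost(void)
{
	uint32_t nr = held_locks.nr;

	if (nr == 0)
		return 0;
	return held_locks.lock[(nr < HELD_LOCKS_MAX ? nr : HELD_LOCKS_MAX) - 1];
}
```

The kernel side (reading this list on preemption and applying boosts
requested by blocked [B] threads through a system call) is not modeled
here.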
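As a starting point for the baseline suggested above, here is a minimal
C sketch of a lock that spins first and then sleeps on a futex, assuming
Linux and glibc. The futex part follows the well-known three-state mutex
from Ulrich Drepper's "Futexes Are Tricky". The spin part swaps in a fixed
budget (the hypothetical SPIN_LIMIT) for true adaptive spinning, because
deciding whether the owner is still on a CPU needs scheduler state that
mainline does not export to userspace. All names (spin_futex_lock,
lock_acquire, lock_release, run_counter_demo) are illustrative only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical fixed spin budget standing in for adaptive spinning. */
#define SPIN_LIMIT 100

/* 0 = unlocked, 1 = locked, 2 = locked and waiters may be sleeping. */
struct spin_futex_lock {
	atomic_int state;
};

static long futex_op(atomic_int *uaddr, int op, int val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void lock_acquire(struct spin_futex_lock *l)
{
	int c = 0;

	/* Fast path plus bounded spinning: stay on the CPU, no syscall. */
	for (int i = 0; i < SPIN_LIMIT; i++) {
		c = 0;
		if (atomic_compare_exchange_strong(&l->state, &c, 1))
			return;
		if (c == 2)
			break; /* others are already sleeping, stop spinning */
	}

	/* Slow path: mark the lock contended and sleep in the kernel. */
	if (c != 2)
		c = atomic_exchange(&l->state, 2);
	while (c != 0) {
		futex_op(&l->state, FUTEX_WAIT_PRIVATE, 2);
		c = atomic_exchange(&l->state, 2);
	}
}

static void lock_release(struct spin_futex_lock *l)
{
	/* 1 -> 0 means nobody was waiting; otherwise wake one sleeper. */
	if (atomic_fetch_sub(&l->state, 1) != 1) {
		atomic_store(&l->state, 0);
		futex_op(&l->state, FUTEX_WAKE_PRIVATE, 1);
	}
}

/* Usage demo: nthreads workers each increment a shared counter. */
struct demo {
	struct spin_futex_lock lock;
	long counter;
	long iters;
};

static void *demo_worker(void *arg)
{
	struct demo *d = arg;

	for (long i = 0; i < d->iters; i++) {
		lock_acquire(&d->lock);
		d->counter++;
		lock_release(&d->lock);
	}
	return NULL;
}

/* Runs up to 16 workers and returns the final counter value. */
static long run_counter_demo(int nthreads, long iters)
{
	struct demo d = { .counter = 0, .iters = iters };
	pthread_t tid[16];

	atomic_init(&d.lock.state, 0);
	if (nthreads > 16)
		nthreads = 16;
	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tid[i], NULL, demo_worker, &d);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	return d.counter;
}
```

Turning this into real adaptive spinning would mean replacing the
SPIN_LIMIT loop with "keep spinning while the owner is running", which
requires recording the owner in the lock and reading its scheduling
state, along the lines of the patch linked above.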