Date: Thu, 26 Oct 2023 16:50:16 +0900
From: Sergey Senozhatsky
To: Steven Rostedt
Cc: Thomas Gleixner, Peter Zijlstra, Ankur Arora, Linus Torvalds,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	willy@infradead.org, mgorman@suse.de, jon.grimm@amd.com,
	bharata@amd.com, raghavendra.kt@amd.com,
	boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
	jgross@suse.com, andrew.cooper3@citrix.com, Joel Fernandes,
	Youssef Esmat, Vineeth Pillai, Suleiman Souhlal
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
Message-ID: <20231026075016.GC15694@google.com>
References: <87edj64rj1.fsf@oracle.com> <87zg1u1h5t.fsf@oracle.com>
	<20230911150410.GC9098@noisy.programming.kicks-ass.net>
	<87h6o01w1a.fsf@oracle.com>
	<20230912082606.GB35261@noisy.programming.kicks-ass.net>
	<87cyyfxd4k.ffs@tglx> <20231024103426.4074d319@gandalf.local.home>
In-Reply-To: <20231024103426.4074d319@gandalf.local.home>

On (23/10/24 10:34), Steven Rostedt wrote:
> On Tue, 19 Sep 2023 01:42:03 +0200
> Thomas Gleixner wrote:
>
> >    2) When the scheduler wants to set NEED_RESCHED due it sets
> >       NEED_RESCHED_LAZY instead which is only evaluated in the return to
> >       user space preemption points.
> >
> >       As NEED_RESCHED_LAZY is not folded into the preemption count the
> >       preemption count won't become zero, so the task can continue until
> >       it hits return to user space.
> >
> >       That preserves the existing behaviour.
>
> I'm looking into extending this concept to user space and to VMs.
>
> I'm calling this the "extended scheduler time slice" (ESTS pronounced "estis")
>
> The ideas is this. Have VMs/user space share a memory region with the
> kernel that is per thread/vCPU. This would be registered via a syscall or
> ioctl on some defined file or whatever. Then, when entering user space /
> VM, if NEED_RESCHED_LAZY (or whatever it's eventually called) is set, it
> checks if the thread has this memory region and a special bit in it is
> set, and if it does, it does not schedule. It will treat it like a long
> kernel system call.
>
> The kernel will then set another bit in the shared memory region that will
> tell user space / VM that the kernel wanted to schedule, but is allowing it
> to finish its critical section. When user space / VM is done with the
> critical section, it will check the bit that may be set by the kernel and
> if it is set, it should do a sched_yield() or VMEXIT so that the kernel can
> now schedule it.
>
> What about DOS you say? It's no different than running a long system call.
> No task can run forever. It's not a "preempt disable", it's just "give me
> some more time". A "NEED_RESCHED" will always schedule, just like a kernel
> system call that takes a long time. The goal is to allow user space to get
> out of critical sections that we know can cause problems if they get
> preempted. Usually it's a user space / VM lock is held or maybe a VM
> interrupt handler that needs to wake up a task on another vCPU.
>
> If we are worried about abuse, we could even punish tasks that don't call
> sched_yield() by the time its extended time slice is taken. Even without
> that punishment, if we have EEVDF, this extension will make it less
> eligible the next time around.
>
> The goal is to prevent a thread / vCPU being preempted while holding a lock
> or resource that other threads / vCPUs will want. That is, prevent
> contention, as that's usually the biggest issue with performance in user
> space and VMs.

I think some time ago we tried checking the guest's preempt count on
each vm-exit: if the guest had exited from within a critical section
(one of those that bump the preempt count), we'd vm-enter again so that
it could hopefully finish whatever it was going to do and vm-exit again.
We didn't look into covering the guest's RCU read-side critical sections.

Can you educate me: is your PoC significantly different from the guest
preempt count check?
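
For reference, the handshake described in the quoted text can be sketched
in plain user-space C. This is only an illustration under stated
assumptions, not code from the PoC: the flag names, the struct layout and
the helpers ests_enter(), ests_exit() and sim_kernel_return_to_user() are
all hypothetical, and the "kernel" side is simulated by an ordinary
function rather than a real syscall or ioctl registration.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout; the thread does not define one. */
#define ESTS_WANT_EXTEND     (1u << 0) /* user space: "in a critical section" */
#define ESTS_RESCHED_PENDING (1u << 1) /* kernel: "I deferred a reschedule" */

/* Stand-in for the per-thread / per-vCPU region shared with the kernel. */
struct ests_region {
	_Atomic unsigned int flags;
};

/* User space: ask for an extended time slice before a critical section. */
static void ests_enter(struct ests_region *r)
{
	atomic_fetch_or(&r->flags, ESTS_WANT_EXTEND);
}

/*
 * User space: leave the critical section. Clears both bits and, if the
 * kernel had deferred a reschedule, yields the CPU right away.
 * Returns true if it yielded.
 */
static bool ests_exit(struct ests_region *r)
{
	unsigned int old = atomic_fetch_and(&r->flags,
			~(ESTS_WANT_EXTEND | ESTS_RESCHED_PENDING));

	assert(old & ESTS_WANT_EXTEND); /* exit must pair with enter */

	if (old & ESTS_RESCHED_PENDING) {
		sched_yield();
		return true;
	}
	return false;
}

/*
 * Simulated kernel side, modelling the return-to-user path with a lazy
 * reschedule pending. Returns true if the task would be scheduled out
 * now, false if it is granted more time.
 */
static bool sim_kernel_return_to_user(struct ests_region *r)
{
	if (atomic_load(&r->flags) & ESTS_WANT_EXTEND) {
		atomic_fetch_or(&r->flags, ESTS_RESCHED_PENDING);
		return false;
	}
	return true;
}
```

A lock implementation would call ests_enter() before taking the lock and
ests_exit() after dropping it. The sketch does not model the non-lazy
NEED_RESCHED case, which per the quoted text always schedules regardless
of the extension bit.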