Date: Tue, 12 Sep 2023 10:26:06 +0200
From: Peter Zijlstra
To: Ankur Arora
Cc: Linus Torvalds, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	x86@kernel.org, akpm@linux-foundation.org, luto@kernel.org,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org,
	willy@infradead.org, mgorman@suse.de, rostedt@goodmis.org,
	tglx@linutronix.de, jon.grimm@amd.com, bharata@amd.com,
	raghavendra.kt@amd.com, boris.ostrovsky@oracle.com,
	konrad.wilk@oracle.com, jgross@suse.com, andrew.cooper3@citrix.com
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
Message-ID: <20230912082606.GB35261@noisy.programming.kicks-ass.net>
References: <20230830184958.2333078-8-ankur.a.arora@oracle.com>
	<20230908070258.GA19320@noisy.programming.kicks-ass.net>
	<87zg1v3xxh.fsf@oracle.com>
	<87edj64rj1.fsf@oracle.com>
	<87zg1u1h5t.fsf@oracle.com>
	<20230911150410.GC9098@noisy.programming.kicks-ass.net>
	<87h6o01w1a.fsf@oracle.com>
In-Reply-To: <87h6o01w1a.fsf@oracle.com>

On Mon, Sep 11, 2023 at 10:04:17AM -0700, Ankur Arora wrote:
>
> Peter Zijlstra writes:
>
> > On Sun, Sep 10, 2023 at 11:32:32AM -0700, Linus Torvalds wrote:
> >
> >> I was hoping that we'd have some generic way to deal with this where
> >> we could just say "this thing is reschedulable", and get rid of - or
> >> at least not increasingly add to - the cond_resched() mess.
> >
> > Isn't that called PREEMPT=y ? That tracks precisely all the constraints
> > required to know when/if we can preempt.
> >
> > The whole voluntary preempt model is basically the traditional
> > co-operative preemption model and that fully relies on manual yields.
>
> Yeah, but as Linus says, this means a lot of code is just full of
> cond_resched(). For instance, a loop in process_huge_page() uses
> this pattern:
>
>    for (...) {
>        cond_resched();
>        clear_page(i);
>
>        cond_resched();
>        clear_page(j);
>    }

Yeah, that's what co-operative preemption gets you.

> > The problem with the REP prefix (and Xen hypercalls) is that
> > they're long running instructions and it becomes fundamentally
> > impossible to put a cond_resched() in.
> >
> >> Yes. I'm starting to think that the only sane solution is to
> >> limit cases that can do this a lot, and the "instruction pointer
> >> region" approach would certainly work.
> >
> > From a code locality / I-cache POV, I think a sorted list of
> > (non overlapping) ranges might be best.
>
> Yeah, agreed. There are a few problems with doing that though.
>
> I was thinking of using a check of this kind to schedule out when
> it is executing in this "reschedulable" section:
>         !preempt_count() && in_resched_function(regs->rip);
>
> For preemption=full, this should mostly work.
> For preemption=voluntary, though, this'll only work with out-of-line
> locks, not if the lock is inlined.
>
> (Both should have problems with __this_cpu_* and the like, but
> maybe we can handwave that away with sparse/objtool etc.)

So one thing we can do is combine the TIF_ALLOW_RESCHED with the ranges
thing, and then only search the ranges when the TIF flag is set.

And I'm thinking it might be a good idea to have objtool validate that
the range only contains simple instructions; the moment it contains
control flow I'm thinking it's too complicated.

> How expensive would be always having PREEMPT_COUNT=y?

Effectively I think that is true today. At the very least Debian and
SuSE (I can't find a RHEL .config in a hurry but I would think they too)
ship with PREEMPT_DYNAMIC=y.

Mel, I'm sure you ran numbers at the time (you always do); what, if any,
was the measured overhead from PREEMPT_DYNAMIC vs 'regular' voluntary
preemption?
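
For illustration only, here is a minimal userspace C sketch of the
combination discussed above: a cheap flag test gates a binary search over
a sorted list of non-overlapping instruction-pointer ranges. It is not
kernel code and not the patch under review. The struct, the table
contents, and the names resched_range, in_resched_range() and
can_resched_here() are all hypothetical, and the boolean flag and counter
merely stand in for TIF_ALLOW_RESCHED and preempt_count().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: half-open [start, end) range of "reschedulable" code. */
struct resched_range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Hypothetical table with made-up addresses. The lookup below relies on
 * it being sorted by start address with no overlapping entries.
 */
static const struct resched_range resched_ranges[] = {
	{ 0x1000, 0x1040 },
	{ 0x2000, 0x2100 },
	{ 0x8000, 0x8010 },
};

/* Binary search: is ip inside any of the n sorted, disjoint ranges? */
static bool in_resched_range(const struct resched_range *tbl, size_t n,
			     uintptr_t ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ip < tbl[mid].start)
			hi = mid;		/* ip lies below this range */
		else if (ip >= tbl[mid].end)
			lo = mid + 1;		/* ip lies above this range */
		else
			return true;		/* start <= ip < end */
	}
	return false;
}

/*
 * Stand-in for the combined check. allow_flag models the TIF flag and
 * preempt_cnt models preempt_count(); both are plain parameters here.
 * The range search only runs once the two cheap tests have passed.
 */
static bool can_resched_here(bool allow_flag, unsigned int preempt_cnt,
			     uintptr_t ip)
{
	if (!allow_flag || preempt_cnt)
		return false;

	return in_resched_range(resched_ranges,
				sizeof(resched_ranges) / sizeof(resched_ranges[0]),
				ip);
}
```

Testing the flag first keeps the common case (flag clear) down to a single
comparison, and the sorted, disjoint layout keeps the lookup logarithmic
in the number of ranges while touching one contiguous table.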