From: Mark Rutland <mark.rutland@arm.com>
Date: Wed, 17 May 2023 15:43:36 +0100
To: Thomas Gleixner
Cc: Nadav Amit, Uladzislau Rezki, "Russell King (Oracle)", Andrew Morton,
 linux-mm <linux-mm@kvack.org>, Christoph Hellwig, Lorenzo Stoakes,
 Peter Zijlstra, Baoquan He, John Ogness,
 linux-arm-kernel@lists.infradead.org, Marc Zyngier, x86@kernel.org
Subject: Re: Excessive TLB flush ranges
In-Reply-To: <87ttwb5jx3.ffs@tglx>
References: <87cz308y3s.ffs@tglx> <87y1lo7a0z.ffs@tglx> <87o7mk733x.ffs@tglx>
 <7ED917BC-420F-47D4-8956-8984205A75F0@gmail.com> <87bkik6pin.ffs@tglx>
 <87353v7qms.ffs@tglx> <87ttwb5jx3.ffs@tglx>

On Wed, May 17, 2023 at 12:31:04PM +0200, Thomas Gleixner wrote:
> On Tue, May 16 2023 at 18:23, Nadav Amit wrote:
> >> On May 16, 2023, at 5:23 PM, Thomas Gleixner wrote:
> > My experience with non-IPI based TLB invalidations is more limited.
> > IIUC the usage model is that the TLB shootdowns should be invoked
> > ASAP (perhaps each range can be batched, but there is no sense of
> > batching multiple ranges), and then later you would issue some
> > barrier to ensure prior TLB shootdown invocations have been
> > completed.
> >
> > If that is the (use) case, I am not sure the abstraction you used
> > in your prototype is the best one.
>
> The way arm/arm64 implement that in software is:
>
>	magic_barrier1();
>	flush_range_with_magic_opcodes();
>	magic_barrier2();

FWIW, on arm64 that sequence (for leaf entries only) is:

	/*
	 * Make sure prior writes to the page table entries are visible
	 * to all CPUs, so that *subsequent* page table walks will see
	 * the latest values.
	 *
	 * This is roughly __smp_wmb().
	 */
	dsb(ishst)	// AKA magic_barrier1()

	/*
	 * The "TLBI *IS, <addr>" instructions send a message to all
	 * other CPUs, essentially saying "please start invalidating
	 * entries for <addr>".
	 *
	 * The "TLBI *ALL*IS" instructions send a message to all other
	 * CPUs, essentially saying "please start invalidating all
	 * entries".
	 *
	 * In theory, this could be for discontiguous ranges.
	 */
	flush_range_with_magic_opcodes()

	/*
	 * Wait for acknowledgement that all prior TLBIs have
	 * completed. This also ensures that all accesses using those
	 * translations have also completed.
	 *
	 * This waits for all relevant CPUs to acknowledge completion
	 * of any prior TLBIs sent by this CPU.
	 */
	dsb(ish)	// AKA magic_barrier2()
	isb()
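
To make that concrete, here is a stripped-down sketch in the style of
arm64's flush_tlb_kernel_range() in arch/arm64/include/asm/tlbflush.h.
The sketch_ name is illustrative, the TLBI operand encoding is
simplified, and the range-vs-all fallback is left out (see below), so
read it as a sketch of the pattern rather than the kernel's actual
code:

	/*
	 * Sketch only: flush a range of kernel VAs, sending one "TLBI
	 * VAALE1IS" message per page, bracketed by the two barriers
	 * described above.
	 */
	static inline void sketch_flush_tlb_kernel_range(unsigned long start,
							 unsigned long end)
	{
		unsigned long addr;

		dsb(ishst);		/* magic_barrier1() */

		/* TLBI takes VA[55:12] in its low bits, hence the shift. */
		for (addr = start; addr < end; addr += PAGE_SIZE)
			__tlbi(vaale1is, addr >> 12);

		dsb(ish);		/* magic_barrier2() */
		isb();
	}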

So you can batch a bunch of "TLBI *IS, <addr>" operations with a single
barrier for completion, or you can use a single "TLBI *ALL*IS" to
invalidate everything. It can still be worth using the latter, as arm64
has done since commit:

  05ac65305437e8ef ("arm64: fix soft lockup due to large tlb flush range")

... as for a large range, issuing a bunch of "TLBI *IS, <addr>"
operations can take a while, and can require the recipient CPUs to do
more work than they might have to do for a single "TLBI *ALL*IS".

The point at which invalidating everything is better depends on a
number of factors (e.g. the impact of all CPUs needing to make new page
table walks), and currently we have an arbitrary boundary where we
choose to invalidate everything (which has been tweaked a bit over
time); there isn't really a one-size-fits-all best answer.

Thanks,
Mark.
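
P.S. That "arbitrary boundary" is a guard at the top of the range
flush, roughly like the sketch below. MAX_TLBI_OPS is the name current
kernels use for the bound, but the exact condition and value have been
retuned over time, so treat this as illustrative:

	/*
	 * Sketch only: past some number of per-page operations, fall
	 * back to invalidating everything ("TLBI VMALLE1IS" via
	 * flush_tlb_all()) instead of issuing per-page TLBIs.
	 */
	if ((end - start) > (MAX_TLBI_OPS * PAGE_SIZE)) {
		flush_tlb_all();
		return;
	}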