From: Thomas Gleixner <tglx@linutronix.de>
To: Mark Rutland
Cc: Nadav Amit, Uladzislau Rezki, "Russell King (Oracle)", Andrew Morton,
 linux-mm, Christoph Hellwig, Lorenzo Stoakes, Peter Zijlstra, Baoquan He,
 John Ogness, linux-arm-kernel@lists.infradead.org, Marc Zyngier,
 x86@kernel.org
Subject: Re: Excessive TLB flush ranges
Date: Wed, 17 May 2023 18:41:44 +0200
Message-ID: <87bkii6hbr.ffs@tglx>
References: <87cz308y3s.ffs@tglx> <87y1lo7a0z.ffs@tglx> <87o7mk733x.ffs@tglx>
 <7ED917BC-420F-47D4-8956-8984205A75F0@gmail.com> <87bkik6pin.ffs@tglx>
 <87353v7qms.ffs@tglx> <87ttwb5jx3.ffs@tglx>
On Wed, May 17 2023 at 15:43, Mark Rutland wrote:
> On Wed, May 17, 2023 at 12:31:04PM +0200, Thomas Gleixner wrote:
>> The way how arm/arm64 implement that in software is:
>>
>> magic_barrier1();
>> flush_range_with_magic_opcodes();
>> magic_barrier2();
>
> FWIW, on arm64 that sequence (for leaf entries only) is:
>
> /*
>  * Make sure prior writes to the page table entries are visible to all
>  * CPUs, so that *subsequent* page table walks will see the latest
>  * values.
>  *
>  * This is roughly __smp_wmb().
>  */
> dsb(ishst) // AKA magic_barrier1()
>
> /*
>  * The "TLBI *IS, " instructions send a message to all other
>  * CPUs, essentially saying "please start invalidating entries for
>  * "
>  *
>  * The "TLBI *ALL*IS" instructions send a message to all other CPUs,
>  * essentially saying "please start invalidating all entries".
>  *
>  * In theory, this could be for discontiguous ranges.
>  */
> flush_range_with_magic_opcodes()
>
> /*
>  * Wait for acknowledgement that all prior TLBIs have completed. This
>  * also ensures that all accesses using those translations have also
>  * completed.
>  *
>  * This waits for all relevant CPUs to acknowledge completion of any
>  * prior TLBIs sent by this CPU.
>  */
> dsb(ish) // AKA magic_barrier2()
> isb()
>
> So you can batch a bunch of "TLBI *IS, " with a single barrier for
> completion, or you can use a single "TLBI *ALL*IS" to invalidate
> everything.
>
> It can still be worth using the latter, as arm64 has done since commit:
>
>   05ac65305437e8ef ("arm64: fix soft lockup due to large tlb flush range")
>
> ... as for a large range, issuing a bunch of "TLBI *IS, " can take a
> while, and can require the recipient CPUs to do more work than they
> might have to do for a single "TLBI *ALL*IS".
And looking at the changelog and backtrace:

  PC is at __cpu_flush_kern_tlb_range+0xc/0x40
  LR is at __purge_vmap_area_lazy+0x28c/0x3ac

I'm willing to bet that this is exactly the same scenario of a direct
map + module area flush. That's the only one we found so far which
creates insanely large ranges.

The other effects of coalescing can still result in seriously oversized
flushes for just a couple of pages. The worst I've seen aside of that
BPF muck was a 'flush 2 pages' with a resulting range of ~3.8MB.

> The point at which invalidating everything is better depends on a
> number of factors (e.g. the impact of all CPUs needing to make new page
> table walks), and currently we have an arbitrary boundary where we
> choose to invalidate everything (which has been tweaked a bit over
> time); there isn't really a one-size-fits-all best answer.

I'm well aware of that :)

Thanks,

        tglx