Date: Sat, 13 Apr 2024 10:56:25 +0100
Message-ID: <86h6g5si0m.wl-maz@kernel.org>
From: Marc Zyngier
To: Sean Christopherson
Cc: Will Deacon, Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Oliver Upton, Tianrui Zhao, Bibo Mao, Thomas Bogendoerfer, Nicholas Piggin, Anup Patel, Atish Patra, Andrew Morton, David Hildenbrand, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, loongarch@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

On Fri, 12 Apr 2024 15:54:22 +0100,
Sean Christopherson wrote:
> 
> On Fri, Apr 12, 2024, Marc Zyngier wrote:
> > On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon wrote:
> > > On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> > > Also, if you're in the business of hacking the MMU notifier code, it
> > > would be really great to change the .clear_flush_young() callback so
> > > that the architecture could handle the TLB invalidation. At the moment,
> > > the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
> > > being set by kvm_handle_hva_range(), whereas we could do a much
> > > lighter-weight and targeted TLBI in the architecture page-table code
> > > when we actually update the ptes for small ranges.
> > 
> > Indeed, and I was looking at this earlier this week as it has a pretty
> > devastating effect with NV (it blows the shadow S2 for that VMID, with
> > costly consequences).
> > 
> > In general, it feels like the TLB invalidation should stay with the
> > code that deals with the page tables, as it has a pretty good idea of
> > what needs to be invalidated and how -- especially on architectures
> > that have a HW-broadcast facility like arm64.
> 
> Would this be roughly on par with an in-line flush on arm64? The simpler, more
> straightforward solution would be to let architectures override flush_on_ret,
> but I would prefer something like the below as x86 can also utilize a range-based
> flush when running as a nested hypervisor.
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ff0a20565f90..b65116294efe 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -601,6 +601,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>  	struct kvm_gfn_range gfn_range;
>  	struct kvm_memory_slot *slot;
>  	struct kvm_memslots *slots;
> +	bool need_flush = false;
>  	int i, idx;
>  
>  	if (WARN_ON_ONCE(range->end <= range->start))
> @@ -653,10 +654,22 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>  				break;
>  			}
>  			r.ret |= range->handler(kvm, &gfn_range);
> +
> +			/*
> +			 * Use a precise gfn-based TLB flush when possible, as
> +			 * most mmu_notifier events affect a small-ish range.
> +			 * Fall back to a full TLB flush if the gfn-based flush
> +			 * fails, and don't bother trying the gfn-based flush
> +			 * if a full flush is already pending.
> +			 */
> +			if (range->flush_on_ret && !need_flush && r.ret &&
> +			    kvm_arch_flush_remote_tlbs_range(kvm, gfn_range.start,
> +							     gfn_range.end - gfn_range.start + 1))
> +				need_flush = true;
>  		}
>  	}
>  
> -	if (range->flush_on_ret && r.ret)
> +	if (need_flush)
>  		kvm_flush_remote_tlbs(kvm);
>  
>  	if (r.found_memslot)

I think this works for us on HW that has range invalidation, which
would already be a positive move.

For the lesser HW that isn't range capable, it also gives the
opportunity to perform the iteration ourselves or go for the nuclear
option if the range is larger than some arbitrary constant (though
this is additional work).

But this still considers the whole range as being affected by
range->handler(). It'd be interesting to try and see whether more
precise tracking is (or isn't) generally beneficial.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
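The fallback discussed above for HW without range invalidation (invalidate page by page, or go for the nuclear option past some arbitrary size) can be layered under the flow of the quoted diff. Below is a minimal user-space C sketch of that combination, assuming half-open gfn ranges [start, end) and that flush_on_ret is set. TLB maintenance is modeled with counters, and every name in it (`struct tlb_stats`, `struct slot_result`, `sketch_flush_range()`, `sketch_handle_range()`, `SKETCH_MAX_TLBI_PAGES`) is hypothetical rather than a KVM API; the 512-page cut-off is an arbitrary placeholder.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical cut-off (in pages) above which iterating per-page
 * invalidations is assumed to cost more than one full flush.
 * The value is an arbitrary placeholder.
 */
#define SKETCH_MAX_TLBI_PAGES 512u

/* Counters standing in for real TLB maintenance instructions. */
struct tlb_stats {
	unsigned long range_ops;    /* single range-based invalidations */
	unsigned long page_ops;     /* per-page invalidations */
	unsigned long full_flushes; /* whole-VMID ("nuclear") flushes */
};

/* Hypothetical per-memslot record: gfns [start, end) and the value
 * the notifier handler returned for that slot. */
struct slot_result {
	uint64_t start;
	uint64_t end;
	bool changed;
};

/*
 * Hypothetical arch hook, following the convention of the quoted diff:
 * return 0 if the range was invalidated precisely, non-zero to make the
 * caller fall back to a full flush.
 */
static int sketch_flush_range(struct tlb_stats *st, bool hw_has_range,
			      uint64_t gfn, uint64_t nr_pages)
{
	(void)gfn; /* real code would encode the address in each TLBI */

	if (nr_pages == 0)
		return 0;
	if (hw_has_range) {
		st->range_ops++; /* one operation covers the whole range */
		return 0;
	}
	if (nr_pages > SKETCH_MAX_TLBI_PAGES)
		return -1; /* too large to iterate: ask for a full flush */
	st->page_ops += nr_pages; /* iterate one page at a time */
	return 0;
}

/* Core-side loop, mirroring the quoted diff with flush_on_ret assumed set. */
static void sketch_handle_range(struct tlb_stats *st, bool hw_has_range,
				const struct slot_result *slots, size_t n)
{
	bool ret = false;        /* accumulated like r.ret in the diff */
	bool need_flush = false; /* a full flush is already pending */

	for (size_t i = 0; i < n; i++) {
		ret |= slots[i].changed;
		if (ret && !need_flush &&
		    sketch_flush_range(st, hw_has_range, slots[i].start,
				       slots[i].end - slots[i].start))
			need_flush = true;
	}

	if (need_flush)
		st->full_flushes++;
}
```

As a usage example: with `hw_has_range` false, a single changed slot covering gfns [0, 1024) exceeds the cut-off, so `sketch_handle_range()` records one full flush and no per-page operations, whereas a changed 16-page slot costs 16 per-page operations and no full flush. Because `ret` is accumulated the way `r.ret` is in the diff, a later slot whose handler returned false is still flushed once an earlier slot returned true, which is one place where more precise tracking might pay off.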