Date: Tue, 18 Mar 2025 19:27:27 +0000
From: Catalin Marinas
To: Jason Gunthorpe
Cc: Marc Zyngier, Ankit Agrawal, "oliver.upton@linux.dev",
 "joey.gouly@arm.com", "suzuki.poulose@arm.com", "yuzenghui@huawei.com",
 "will@kernel.org", "ryan.roberts@arm.com", "shahuang@redhat.com",
 "lpieralisi@kernel.org", "david@redhat.com", Aniket Agashe, Neo Jia,
 Kirti Wankhede, "Tarun Gupta (SW-GPU)", Vikram Sethi, Andy Currid,
 Alistair Popple, John Hubbard, Dan Williams, Zhi Wang, Matt Ochs,
 Uday Dhoke, Dheeraj Nigam, Krishnakant Jaju,
 "alex.williamson@redhat.com", "sebastianene@google.com",
 "coltonlewis@google.com", "kevin.tian@intel.com", "yi.l.liu@intel.com",
 "ardb@kernel.org", "akpm@linux-foundation.org", "gshan@redhat.com",
 "linux-mm@kvack.org", "ddutile@redhat.com", "tabba@google.com",
 "qperret@google.com", "seanjc@google.com", "kvmarm@lists.linux.dev",
 "linux-kernel@vger.kernel.org", "linux-arm-kernel@lists.infradead.org"
Subject: Re: [PATCH v3 1/1] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
References: <861pv5p0c3.wl-maz@kernel.org> <86r033olwv.wl-maz@kernel.org>
 <87tt7y7j6r.wl-maz@kernel.org> <8634fcnh0n.wl-maz@kernel.org>
 <86wmcmn0dp.wl-maz@kernel.org> <20250318125527.GP9311@nvidia.com>
In-Reply-To: <20250318125527.GP9311@nvidia.com>

On Tue, Mar 18, 2025 at 09:55:27AM -0300, Jason Gunthorpe wrote:
> On Tue, Mar 18, 2025 at 09:39:30AM +0000, Marc Zyngier wrote:
> > The memslot must also be created with a new flag ((2c) in the taxonomy
> > above) that carries the "Please map VM_PFNMAP VMAs as cacheable". This
> > flag is only allowed if (1) is valid.
> >
> > This results in the following behaviours:
> >
> > - If the VMM creates the memslot with the cacheable attribute without
> >   (1) being advertised, we fail.
> >
> > - If the VMM creates the memslot without the cacheable attribute, we
> >   map as NC, as it is today.
>
> Is that OK though?
>
> Now we have the MM page tables mapping this memory as cachable but KVM
> and the guest is accessing it as non-cached.

I don't think we should allow this.

> I thought ARM tried hard to avoid creating such mismatches? This is
> why the pgprot flags were used to drive this, not an opt-in flag. To
> prevent userspace from forcing a mismatch.

We have the vma->vm_page_prot when the memslot is added, so we could use
this instead of additional KVM flags. If it's Normal Cacheable and the
platform does not support FWB, reject it. If the prot bits say
cacheable, it means that the driver was ok with such mapping. Some extra
checks for !MTE or MTE_PERM.

As additional safety, we could check this again in user_mem_abort() in
case the driver played with the vm_page_prot field in the meantime (e.g.
in the .fault() callback).

I'm not particularly keen on using the vm_page_prot but we probably need
to do this anyway to avoid aliases as we can't fully trust the VMM. The
alternative is a VM_* flag that says "cacheable everywhere" and we avoid
the low-level attributes checking.

> > What this doesn't do is *automatically* decide for the VMM what
> > attributes to use. The VMM must know what it is doing, and only
> > provide the memslot flag when appropriate. Doing otherwise may eat
> > your data and/or take the machine down (cacheable mapping on a device
> > can be great fun).
>
> Again, this is why we followed the VMA flags. The thing creating the
> VMA already made this safety determination when it set pgprot
> cachable. We should not allow KVM to randomly make any PGPROT
> cachable!

Can this be moved to kvm_arch_prepare_memory_region() and maybe an
additional check in user_mem_abort()?

Thinking some more about a KVM capability that the VMM can check, I'm
not sure what it can do with this. The VMM simply maps something from a
device and cannot probe the cacheability - that's a property of the
device that's not usually exposed to user by the driver. The VMM just
passes this vma to KVM. As with the Normal NC, we tried to avoid
building device knowledge into the VMM (and ended up with
VM_ALLOW_ANY_UNCACHED since the VFIO driver did not allow such user
mapping and probably wasn't entirely safe either). I assume with the
cacheable pfn mapping, the whole range covered by the vma is entirely
safe to be mapped as such in user space.

-- 
Catalin
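
For illustration only, the vm_page_prot-driven check discussed in this
message could be modelled roughly as below. This is an untested,
standalone sketch in plain C, not KVM code: the types, enum values and
the function s2_cacheable_decision() are all hypothetical stand-ins, and
the MTE condition is just one possible reading of "!MTE or MTE_PERM".
In the kernel the inputs would come from vma->vm_page_prot and the CPU
feature checks, and the same decision would be re-evaluated at fault
time in case vm_page_prot changed after the memslot was added.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the memory type encoded in vma->vm_page_prot. */
enum prot_attr { PROT_ATTR_DEVICE, PROT_ATTR_NORMAL_NC, PROT_ATTR_NORMAL_CACHEABLE };

/* Hypothetical model of the platform features mentioned in the thread. */
struct platform_model {
	bool has_fwb;      /* stage 2 FWB is supported */
	bool has_mte;      /* MTE is in use */
	bool has_mte_perm; /* MTE_PERM is available */
};

enum s2_decision {
	S2_REJECT,         /* refuse the memslot (or the fault) */
	S2_KEEP_UNCACHED,  /* keep today's non-cacheable behaviour */
	S2_MAP_CACHEABLE,  /* stage 2 follows the cacheable user mapping */
};

/*
 * Decide how a PFN-mapped range would be handled, driven only by the
 * attribute the driver put in vm_page_prot (no VMM-provided flag).
 */
static enum s2_decision s2_cacheable_decision(enum prot_attr attr,
					      const struct platform_model *plat)
{
	assert(plat);

	/* Not cacheable in the user mapping: nothing changes. */
	if (attr != PROT_ATTR_NORMAL_CACHEABLE)
		return S2_KEEP_UNCACHED;

	/* Cacheable user mapping but no FWB: reject rather than alias. */
	if (!plat->has_fwb)
		return S2_REJECT;

	/* Assumed reading of "!MTE or MTE_PERM". */
	if (plat->has_mte && !plat->has_mte_perm)
		return S2_REJECT;

	return S2_MAP_CACHEABLE;
}
```

In this model a caller would run the check once when the memslot is
created and again on each stage 2 fault, treating S2_REJECT as an error
in both places.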