Date: Thu, 14 Dec 2023 16:48:15 +0100
From: Lorenzo Pieralisi
To: Oliver Upton
Cc: Jason Gunthorpe, Catalin Marinas, ankita@nvidia.com, maz@kernel.org,
    suzuki.poulose@arm.com, yuzenghui@huawei.com, will@kernel.org,
    alex.williamson@redhat.com, kevin.tian@intel.com, yi.l.liu@intel.com,
    ardb@kernel.org, akpm@linux-foundation.org, gshan@redhat.com,
    linux-mm@kvack.org, aniketa@nvidia.com, cjia@nvidia.com,
    kwankhede@nvidia.com, targupta@nvidia.com, vsethi@nvidia.com,
    acurrid@nvidia.com, apopple@nvidia.com, jhubbard@nvidia.com,
    danw@nvidia.com, mochs@nvidia.com, kvmarm@lists.linux.dev,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, james.morse@arm.com
Subject: Re: [PATCH v3 2/2] kvm: arm64: set io memory s2 pte as normalnc for vfio pci devices
References: <20231208164709.23101-1-ankita@nvidia.com>
    <20231208164709.23101-3-ankita@nvidia.com>
    <20231212181156.GO3014157@nvidia.com>

[+James]

On Wed, Dec 13, 2023 at 08:05:29PM +0000, Oliver Upton wrote:
> Hi,
>
> Sorry, a bit late to the discussion :)
>
> On Tue, Dec 12, 2023 at 02:11:56PM -0400, Jason Gunthorpe wrote:
> > On Tue, Dec 12, 2023 at 05:46:34PM +0000, Catalin Marinas wrote:
> > > should know the implications. There's also an expectation that the
> > > actual driver (KVM guests) or maybe later DPDK can choose the safe
> > > non-cacheable or write-combine (Linux terminology) attributes for the
> > > BAR.
> >
> > DPDK won't rely on this interface
>
> Wait, so what's the expected interface for determining the memory
> attributes at stage-1? I'm somewhat concerned that we're conflating two
> things here:
>
>  1) KVM needs to know the memory attributes to use at stage-2, which
>     isn't fundamentally different from what's needed for userspace
>     stage-1 mappings.
>
>  2) KVM additionally needs a hint that the device / VFIO can handle
>     mismatched aliases w/o the machine exploding. This goes beyond
>     supporting Normal-NC mappings at stage-2 and is really a bug
>     with our current scheme (nGnRnE at stage-1, nGnRE at stage-2).
>
> I was hoping that (1) could be some 'common' plumbing for both userspace
> and KVM mappings. And for (2), any case where a device is intolerant of
> mismatches && KVM cannot force the memory attributes should be rejected.
>
> AFAICT, the only reason PCI devices can get the blanket treatment of
> Normal-NC at stage-2 is because userspace has a Device-* mapping and can't
> speculatively load from the alias. This feels a bit hacky, and maybe we
> should prioritize an interface for mapping a device into a VM w/o a
> valid userspace mapping.

FWIW, I have tried to summarize the reasoning behind the safety of a
Normal-NC stage-2 default for PCIe devices in a document that, I have
just realized, has become this series' cover letter. I don't think the
PCI blanket treatment is related *only* to the current user space
mappings. BTW, AFAICS it is also *possible* at present to map a
prefetchable BAR through sysfs with Normal-NC memory attributes in the
host while the same PCI device is passed through to a guest with VFIO,
for which we therefore also have a Device-nGnRnE stage-1 mapping. I
don't think anyone does that (what for?), but it is possible and KVM
would not know about it.

Again, FWIW, we were told (source: Arm ARM) that mismatched aliases
between Device-* and Normal-NC are not problematic as long as the
transactions issued through the related mappings are independent (and
none of the mappings is cacheable).

I appreciate this is not enough to give everyone full confidence in the
robustness of this solution. That's why I wrote it up, so that we know
what we are up against and can write KVM interfaces accordingly.

> I very much understand that this has been going on for a while, and we
> need to do *something* to get passthrough working well for devices that
> like 'WC'. I just want to make sure we don't paint ourselves into a corner
> that's hard to get out of in the future.

That makes perfect sense, see above. If there is anything we can do to
clarify, we will, in whatever shape is preferred.

Thanks,
Lorenzo