Date: Thu, 10 Jul 2025 18:50:09 +0800
From: Xu Yilun
To: Vishal Annapurve
Cc: Jason Gunthorpe, Yan Zhao, Alexey Kardashevskiy, Fuad Tabba, Ackerley Tng, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org, ajones@ventanamicro.com, akpm@linux-foundation.org, amoorthy@google.com, anthony.yznaga@oracle.com, anup@brainfault.org, aou@eecs.berkeley.edu, bfoster@redhat.com, binbin.wu@linux.intel.com, brauner@kernel.org, catalin.marinas@arm.com, chao.p.peng@intel.com, chenhuacai@kernel.org, dave.hansen@intel.com, david@redhat.com, dmatlack@google.com, dwmw@amazon.co.uk, erdemaktas@google.com, fan.du@intel.com, fvdl@google.com, graf@amazon.com, haibo1.xu@intel.com, hch@infradead.org, hughd@google.com, ira.weiny@intel.com, isaku.yamahata@intel.com, jack@suse.cz, james.morse@arm.com, jarkko@kernel.org, jgowans@amazon.com, jhubbard@nvidia.com, jroedel@suse.de, jthoughton@google.com, jun.miao@intel.com, kai.huang@intel.com, keirf@google.com, kent.overstreet@linux.dev, kirill.shutemov@intel.com, liam.merwick@oracle.com, maciej.wieczor-retman@intel.com, mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net, michael.roth@amd.com, mpe@ellerman.id.au, muchun.song@linux.dev, nikunj@amd.com, nsaenz@amazon.es, oliver.upton@linux.dev, palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com, pbonzini@redhat.com, pdurrant@amazon.co.uk, peterx@redhat.com, pgonda@google.com, pvorel@suse.cz, qperret@google.com, quic_cvanscha@quicinc.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com, richard.weiyang@gmail.com, rick.p.edgecombe@intel.com, rientjes@google.com, roypat@amazon.co.uk, rppt@kernel.org, seanjc@google.com, shuah@kernel.org, steven.price@arm.com, steven.sistare@oracle.com, suzuki.poulose@arm.com, thomas.lendacky@amd.com, usama.arif@bytedance.com, vbabka@suse.cz, viro@zeniv.linux.org.uk, vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org, willy@infradead.org, xiaoyao.li@intel.com, yilun.xu@intel.com, yuzenghui@huawei.com, zhiquan1.li@intel.com
Subject: Re: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls

On Wed, Jul 02, 2025 at 07:32:36AM -0700, Vishal Annapurve wrote:
> On Wed, Jul 2, 2025 at 7:13 AM Jason Gunthorpe wrote:
> >
> > On Wed, Jul 02, 2025 at 06:54:10AM -0700, Vishal Annapurve wrote:
> > > On Wed, Jul 2, 2025 at 1:38 AM Yan Zhao wrote:
> > > >
> > > > On Tue, Jun 24, 2025 at 07:10:38AM -0700, Vishal Annapurve wrote:
> > > > > On Tue, Jun 24, 2025 at 6:08 AM Jason Gunthorpe wrote:
> > > > > >
> > > > > > On Tue, Jun 24, 2025 at 06:23:54PM +1000, Alexey Kardashevskiy wrote:
> > > > > >
> > > > > > > Now, I am rebasing my RFC on top of this patchset and it fails in
> > > > > > > kvm_gmem_has_safe_refcount() as IOMMU holds references to all these
> > > > > > > folios in my RFC.
> > > > > > >
> > > > > > > So what is the expected sequence here? The userspace unmaps a DMA
> > > > > > > page and maps it back right away, all from the userspace? The end
> > > > > > > result will be exactly the same, which seems useless. And IOMMU TLB
> > > > >
> > > > > As Jason described, ideally IOMMU just like KVM, should just:
> > > > > 1) Directly rely on guest_memfd for pinning -> no page refcounts taken
> > > > > by IOMMU stack
> > > > In TDX connect, TDX module and TDs do not trust VMM. So, it's the TDs to inform
> > > > So, if a page is regarded as pinned by TDs for DMA, the TDX module will fail the > > > > unmap of the pages from S-EPT. > > > > I don't see this as having much to do with iommufd. > > > > iommufd will somehow support the T=1 iommu inside the TDX module but > > it won't have an IOAS for it since the VMM does not control the > > translation. I partially agree with this. This is still the DMA Silent drop issue for security. The HW (Also applicable to AMD/ARM) screams out if the trusted DMA path (IOMMU mapping, or access control table like RMP) is changed out of TD's expectation. So from HW POV, it is the iommu problem. For SW, if we don't blame iommu, maybe we rephrase as gmemfd can't invalidate private pages unless TD agrees. > > > > The discussion here is for the T=0 iommu which is controlled by > > iommufd and does have an IOAS. It should be popoulated with all the > > shared pages from the guestmemfd. > > > > > > If IOMMU side does not increase refcount, IMHO, some way to indicate that > > > > certain PFNs are used by TDs for DMA is still required, so guest_memfd can > > > > reject the request before attempting the actual unmap. > > > > This has to be delt with between the TDX module and KVM. When KVM > > gives pages to become secure it may not be able to get them back.. Just to be clear. With In-place conversion, it is not KVM gives pages to become secure, it is gmemfd. Or maybe you mean gmemfd is part of KVM. https://lore.kernel.org/all/aC86OsU2HSFZkJP6@google.com/ > > > > This problem has nothing to do with iommufd. > > > > But generally I expect that the T=1 iommu follows the S-EPT entirely > > and there is no notion of pages "locked for dma". If DMA is ongoing > > and a page is made non-secure then the DMA fails. > > > > Obviously in a mode where there is a vPCI device we will need all the > > pages to be pinned in the guestmemfd to prevent any kind of > > migrations. Only shared/private conversions should change the page > > around. 

Only a *guest-permitted* conversion should change the page, i.e. only when
the VMM is handling the KVM_HC_MAP_GPA_RANGE hypercall. I'm not sure whether
we can just let QEMU ensure this, or whether KVM/guestmemfd should enforce it.

Thanks,
Yilun

>
> Yes, guest_memfd ensures that all the faulted-in pages (irrespective
> of shared or private ranges) are not migratable. We already have a
> similar restriction with CPU accesses to encrypted memory ranges that
> need arch specific protocols to migrate memory contents.
> >
> > > Maybe this needs to be an integral functionality in guestmemfd?
> >
> > Jason
>