From: Fuad Tabba
Date: Wed, 21 May 2025 19:27:59 +0100
Subject: Re: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls
To: Vishal Annapurve
Cc: Ackerley Tng, kvm@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org,
    aik@amd.com, ajones@ventanamicro.com, akpm@linux-foundation.org,
    amoorthy@google.com, anthony.yznaga@oracle.com, anup@brainfault.org,
    aou@eecs.berkeley.edu, bfoster@redhat.com, binbin.wu@linux.intel.com,
    brauner@kernel.org, catalin.marinas@arm.com, chao.p.peng@intel.com,
    chenhuacai@kernel.org, dave.hansen@intel.com, david@redhat.com,
    dmatlack@google.com, dwmw@amazon.co.uk, erdemaktas@google.com,
    fan.du@intel.com, fvdl@google.com, graf@amazon.com, haibo1.xu@intel.com,
    hch@infradead.org, hughd@google.com, ira.weiny@intel.com,
    isaku.yamahata@intel.com, jack@suse.cz, james.morse@arm.com,
    jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com, jhubbard@nvidia.com,
    jroedel@suse.de, jthoughton@google.com, jun.miao@intel.com,
    kai.huang@intel.com, keirf@google.com, kent.overstreet@linux.dev,
    kirill.shutemov@intel.com, liam.merwick@oracle.com,
    maciej.wieczor-retman@intel.com, mail@maciej.szmigiero.name, maz@kernel.org,
    mic@digikod.net, michael.roth@amd.com, mpe@ellerman.id.au,
    muchun.song@linux.dev, nikunj@amd.com, nsaenz@amazon.es,
    oliver.upton@linux.dev, palmer@dabbelt.com, pankaj.gupta@amd.com,
    paul.walmsley@sifive.com, pbonzini@redhat.com, pdurrant@amazon.co.uk,
    peterx@redhat.com, pgonda@google.com, pvorel@suse.cz, qperret@google.com,
    quic_cvanscha@quicinc.com, quic_eberman@quicinc.com,
    quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com,
    quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com,
    richard.weiyang@gmail.com, rick.p.edgecombe@intel.com, rientjes@google.com,
    roypat@amazon.co.uk, rppt@kernel.org, seanjc@google.com, shuah@kernel.org,
    steven.price@arm.com, steven.sistare@oracle.com, suzuki.poulose@arm.com,
    thomas.lendacky@amd.com, usama.arif@bytedance.com, vbabka@suse.cz,
    viro@zeniv.linux.org.uk, vkuznets@redhat.com, wei.w.wang@intel.com,
    will@kernel.org, willy@infradead.org, xiaoyao.li@intel.com,
    yan.y.zhao@intel.com, yilun.xu@intel.com, yuzenghui@huawei.com,
    zhiquan1.li@intel.com

Hi Vishal,

On Wed, 21 May 2025 at 16:51, Vishal Annapurve wrote:
>
> On Wed, May 21, 2025 at 8:22 AM Fuad Tabba wrote:
> >
> > Hi Vishal,
> >
> > On Wed, 21 May 2025 at 15:42, Vishal Annapurve wrote:
> > >
> > > On Wed, May 21, 2025 at 5:36 AM Fuad Tabba wrote:
> > > > ....
> > > > > When rebooting, the memslots may not yet be bound to the guest_memfd,
> > > > > but we want to reset the guest_memfd's to private.
> > > > > If we use KVM_SET_MEMORY_ATTRIBUTES to convert, we'd be forced to
> > > > > first bind, then convert. If we had a direct ioctl, we don't have
> > > > > this restriction.
> > > > >
> > > > > If we do the conversion via vcpu_run() we would be forced to handle
> > > > > conversions only with a vcpu_run() and only the guest can initiate a
> > > > > conversion.
> > > > >
> > > > > On a guest boot for TDX, the memory is assumed to be private. If the we
> > > > > gave it memory set as shared, we'd just have a bunch of
> > > > > KVM_EXIT_MEMORY_FAULTs that slow down boot. Hence on a guest reboot, we
> > > > > will want to reset the guest memory to private.
> > > > >
> > > > > We could say the firmware should reset memory to private on guest
> > > > > reboot, but we can't force all guests to update firmware.
> > > >
> > > > Here is where I disagree. I do think that this is the CoCo guest's
> > > > responsibility (and by guest I include its firmware) to fix its own
> > > > state after a reboot. How would the host even know that a guest is
> > > > rebooting if it's a CoCo guest?
> > >
> > > There are a bunch of complexities here, reboot sequence on x86 can be
> > > triggered using multiple ways that I don't fully understand, but few
> > > of them include reading/writing to "reset register" in MMIO/PCI config
> > > space that are emulated by the host userspace directly. Host has to
> > > know when the guest is shutting down to manage it's lifecycle.
> >
> > In that case, I think we need to fully understand these complexities
> > before adding new IOCTLs. It could be that once we understand these
> > issues, we find that we don't need these IOCTLs. It's hard to justify
> > adding an IOCTL for something we don't understand.
> >
>
> I don't understand all the ways x86 guest can trigger reboot but I do
> know that x86 CoCo linux guest kernel triggers reset using MMIO/PCI
> config register write that is emulated by host userspace.
>
> > > x86 CoCo VM firmwares don't support warm/soft reboot and even if it
> > > does in future, guest kernel can choose a different reboot mechanism.
> > > So guest reboot needs to be emulated by always starting from scratch.
> > > This sequence needs initial guest firmware payload to be installed
> > > into private ranges of guest_memfd.
> > >
> > > >
> > > > Either the host doesn't (or cannot even) know that the guest is
> > > > rebooting, in which case I don't see how having an IOCTL would help.
> > > >
> > > Host does know that the guest is rebooting.
> >
> > In that case, that (i.e., the host finding out that the guest is
> > rebooting) could trigger the conversion back to private. No need for
> > an IOCTL.
>
> In the reboot scenarios, it's the host userspace finding out that the
> guest kernel wants to reboot.

How does the host userspace find that out? If the host userspace is
capable of finding that out, then surely KVM is also capable of
finding out the same.

> >
> > > > Or somehow the host does know that, i.e., via a hypercall that
> > > > indicates that. In which case, we could have it so that for that type
> > > > of VM, we would reconvert its pages to private on a reboot.
> > > >
> > > This possibly could be solved by resetting the ranges to private when
> > > binding with a memslot of certain VM type. But then Google also has a
> > > usecase to support intrahost migration where a live VM and associated
> > > guest_memfd files are bound to new KVM VM and memslots.
> > >
> > > Otherwise, we need an additional contract between userspace/KVM to
> > > intercept/handle guest_memfd range reset.
> >
> > Then this becomes a migration issue to be solved then, not a huge page
> > support issue. If such IOCTLs are needed for migration, it's too early
> > to add them now.
>
> The guest_memfd ioctl is not needed for migration but to change/reset
> guest_memfd range attributes.
> I am saying that migration usecase can
> conflict with some ways that we can solve resetting guest_memfd range
> attributes without adding a new IOCTL as migration closely resembles
> reboot scenario as both of them can/need reusing the same guest memory
> files but one needs to preserve guest memory state.
>
> Reiterating my understanding here, guest memfd ioctl can be used by
> host userspace to -
> 1) Change guest memfd range attributes during memory conversion
>      - This can be handled by KVM hypercall exits in theory as you are
>        suggesting but Ackerley and me are still thinking that this is a
>        memory operation that goes beyond vcpu scope and will involve
>        interaction with IOMMU backend as well, it's cleaner to have a
>        separate guest memfd specific ioctl for this operation as the impact
>        is even beyond KVM.

The IOMMU backend needs to know about the sharing/unsharing, not
trigger it. The memory is the guest's. We already have a mechanism for
informing userspace of these kinds of events with KVM exits. This
doesn't justify adding a new IOCTL.

> 2) Reset guest memfd range attributes during guest reboot to allow
>    reusing the same guest memfd files.
>      - This helps reset the range state to private as needed inline
>        with initial shared/private configuration chosen at the guest memfd
>        creation.
>      - This also helps reconstitute all the huge pages back to their
>        original state that may have gotten split during the runtime of the
>        guest.
>    This is a host initiated request for guest memfd memory conversion
>    that we should not be overloading with other KVM interactions in my
>    opinion.

Then, we could argue about whether we need a "reset" IOCTL (not that I
am arguing for that). But still, like I said, if the host becomes
aware that the confidential guest is rebooting, then surely KVM can be
made aware.

I wonder if this might be better suited for the biweekly guest_memfd sync.

Cheers,
/fuad

> >
> > Cheers,
> > /fuad