From: Vishal Annapurve
Date: Fri, 11 Jul 2025 14:18:03 -0700
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
To: Sean Christopherson
Cc: Rick P Edgecombe, pvorel@suse.cz, kvm@vger.kernel.org, catalin.marinas@arm.com, Jun Miao, palmer@dabbelt.com, pdurrant@amazon.co.uk, vbabka@suse.cz, peterx@redhat.com, x86@kernel.org, amoorthy@google.com, tabba@google.com, quic_svaddagi@quicinc.com, maz@kernel.org, vkuznets@redhat.com, anthony.yznaga@oracle.com, mail@maciej.szmigiero.name, quic_eberman@quicinc.com, Wei W Wang, Fan Du, "Wieczor-Retman, Maciej", Yan Y Zhao, ajones@ventanamicro.com, Dave Hansen, paul.walmsley@sifive.com, quic_mnalajal@quicinc.com, aik@amd.com, usama.arif@bytedance.com, fvdl@google.com, jack@suse.cz, quic_cvanscha@quicinc.com, Kirill Shutemov, willy@infradead.org, steven.price@arm.com, anup@brainfault.org, thomas.lendacky@amd.com, keirf@google.com, mic@digikod.net, linux-kernel@vger.kernel.org, nsaenz@amazon.es, akpm@linux-foundation.org, oliver.upton@linux.dev, binbin.wu@linux.intel.com, muchun.song@linux.dev, Zhiquan1 Li, rientjes@google.com, Erdem Aktas, mpe@ellerman.id.au, david@redhat.com, jgg@ziepe.ca, hughd@google.com, jhubbard@nvidia.com, Haibo1 Xu, Isaku Yamahata, jthoughton@google.com, rppt@kernel.org, steven.sistare@oracle.com, jarkko@kernel.org, quic_pheragu@quicinc.com, chenhuacai@kernel.org, Kai Huang, shuah@kernel.org, bfoster@redhat.com, dwmw@amazon.co.uk, Chao P Peng, pankaj.gupta@amd.com, Alexander Graf, nikunj@amd.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com, yuzenghui@huawei.com, jroedel@suse.de, suzuki.poulose@arm.com, jgowans@amazon.com, Yilun Xu, liam.merwick@oracle.com, michael.roth@amd.com, quic_tsoni@quicinc.com, Xiaoyao Li, aou@eecs.berkeley.edu, Ira Weiny, richard.weiyang@gmail.com, kent.overstreet@linux.dev, qperret@google.com, dmatlack@google.com, james.morse@arm.com, brauner@kernel.org, linux-fsdevel@vger.kernel.org, ackerleytng@google.com, pgonda@google.com, quic_pderrin@quicinc.com, roypat@amazon.co.uk, hch@infradead.org, will@kernel.org, linux-mm@kvack.org

On Wed, Jul 9, 2025 at 6:30 PM Vishal Annapurve wrote:
> > > 3) KVM should ideally associate the lifetime of backing
> > > pagetables/protection tables/RMP tables with the lifetime of the
> > > binding of memslots with guest_memfd.
> >
> > Again, please align your indentation.
> >
> > > - Today KVM SNP logic ties RMP table entry lifetimes with how
> > >   long the folios are mapped in guest_memfd, which I think should be
> > >   revisited.
> >
> > Why? Memslots are ephemeral per-"struct kvm" mappings. RMP entries and guest_memfd
> > inodes are tied to the Virtual Machine, not to the "struct kvm" instance.
>
> IIUC guest_memfd can only be accessed through the window of memslots
> and if there are no memslots I don't see the reason for memory still
> being associated with "virtual machine". Likely because I am yet to
> completely wrap my head around 'guest_memfd inodes are tied to the
> Virtual Machine, not to the "struct kvm" instance', I need to spend
> more time on this one.
>

I see the benefits of tying inodes to the virtual machine and different
guest_memfd files to different KVM instances. This allows us to exercise
intra-host migration use cases for TDX/SNP. But I think this model doesn't
allow us to reuse guest_memfd files for SNP VMs during reboot.

Reboot scenario, assuming the existing guest_memfd inode is reused for the
next instance:
1) Create a VM.
2) Create guest_memfd files that pin the KVM instance.
3) Create memslots.
4) Start the VM.
5) For reboot/shutdown, execute VM-specific termination (e.g.
   KVM_TDX_TERMINATE_VM).
6) If allowed, delete the memslots.
7) Create a new VM instance.
8) Link the existing guest_memfd files to the new VM, which creates new
   files for the same inode.
9) Close the existing guest_memfd files and the existing VM.
10) Jump to step 3.

The difference between SNP and TDX is that TDX memory ownership is limited
to the duration the pages are mapped in the second-stage secure EPT tables,
whereas SNP/RMP memory ownership lasts beyond memslots and effectively
remains until the folios are punched out of the guest_memfd filemap. IIUC,
CCA might follow suit with SNP in this regard, with the pfns populated in
GPT entries.

I don't have a sense of how critical this problem could be, but it would
mean that on every reboot, all large memory allocations have to be released
and reallocated. For 1G support, guest_memfd pages will be freed by a
background thread, which may add delays before the memory is actually
freed.

Instead, what if we did this:
1) Support creating guest_memfd files for a certain VM type, which allows
   KVM to dictate the behavior of the guest_memfd.
2) Tie the lifetime of KVM SNP/TDX memory ownership to guest_memfd and
   memslot bindings.
   - Each binding will increase a refcount on both the guest_memfd file
     and KVM, so neither can go away while the binding exists.
3) For SNP/CCA, pfns are invalidated from the RMP/GPT tables during unbind
   operations, while for TDX, KVM will invalidate the secure EPT entries.

This would allow us to decouple the memory lifecycle from the VM lifecycle
and match the behavior of non-confidential VMs, where memory can outlast
VMs. This approach would, however, require changing the intra-host
migration implementation, since we would no longer need to differentiate
between guest_memfd files and inodes.

That being said, I might be missing something here, and I don't have any
data to back up how critical this use case is for SNP and possibly CCA VMs.