From: Fuad Tabba <tabba@google.com>
Date: Mon, 16 Jun 2025 07:52:43 +0100
Subject: Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
To: Sean Christopherson
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
    kvmarm@lists.linux.dev, pbonzini@redhat.com, chenhuacai@kernel.org,
    mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
    palmer@dabbelt.com, aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
    brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
    xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
    jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
    isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
    vannapurve@google.com, ackerleytng@google.com, mail@maciej.szmigiero.name,
    david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
    liam.merwick@oracle.com, isaku.yamahata@gmail.com,
    kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
    steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
    quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
    quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
    quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
    yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
    will@kernel.org, qperret@google.com, keirf@google.com,
    roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
    rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
    hughd@google.com, jthoughton@google.com, peterx@redhat.com,
    pankaj.gupta@amd.com, ira.weiny@intel.com
References: <20250611133330.1514028-1-tabba@google.com> <20250611133330.1514028-9-tabba@google.com>
Hi Sean,

On Fri, 13 Jun 2025 at 22:03, Sean Christopherson wrote:
>
> On Wed, Jun 11, 2025, Fuad Tabba wrote:
> > This patch enables support for shared memory in guest_memfd, including
>
> Please don't lead with "This patch", simply state what changes are being
> made as a command.

Ack.

> > mapping that memory from host userspace.
> >
> > This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> > and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> > flag at creation time.
>
> Why? I can see that from the patch.

It's in the patch series, not this patch. Would it help if I rephrase it
along the lines of:

  This functionality isn't enabled until the introduction of the
  KVM_GMEM_SHARED_MEM Kconfig option, and is enabled for a given instance
  by the GUEST_MEMFD_FLAG_SUPPORT_SHARED flag at creation time. Both of
  these are introduced in a subsequent patch.

> This changelog is way, way, waaay too light on details. Sorry for jumping in at
> the 11th hour, but we've spent what, 2 years working on this?

I'll expand this. Just to make sure that I include the right details, are you
looking for implementation details, motivation, use cases?
> > Reviewed-by: Gavin Shan
> > Acked-by: David Hildenbrand
> > Co-developed-by: Ackerley Tng
> > Signed-off-by: Ackerley Tng
> > Signed-off-by: Fuad Tabba
> > ---
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index d00b85cb168c..cb19150fd595 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1570,6 +1570,7 @@ struct kvm_memory_attributes {
> >  #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
> >
> >  #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
> > +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED (1ULL << 0)
>
> I find the SUPPORT_SHARED terminology to be super confusing. I had to dig quite
> deep to understand that "support shared" actually means "userspace explicitly
> enabled sharing on _this_ guest_memfd instance". E.g. I was surprised to see
>
> IMO, GUEST_MEMFD_FLAG_SHAREABLE would be more appropriate. But even that is
> weird to me. For non-CoCo VMs, there is no concept of shared vs. private. What's
> novel and notable is that the memory is _mappable_. Yeah, yeah, pKVM's use case
> is to share memory, but that's a _use case_, not the property of guest_memfd
> that is being controlled by userspace.
>
> And kvm_gmem_memslot_supports_shared() is even worse. It's not simply that the
> memslot is bound to a mappable guest_memfd instance; it's that the guest_memfd
> instance is the _only_ entry point to the memslot.
>
> So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
> KVM_MEMSLOT_GUEST_MEMFD_ONLY.
> That will make code like this:
>
>         if (kvm_slot_has_gmem(slot) &&
>             (kvm_gmem_memslot_supports_shared(slot) ||
>              kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
>                 return kvm_gmem_max_mapping_level(slot, gfn, max_level);
>         }
>
> much more intuitive:
>
>         if (kvm_is_memslot_gmem_only(slot) ||
>             kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)
>                 return kvm_gmem_max_mapping_level(slot, gfn, max_level);
>
> And then have kvm_gmem_mapping_order() do:
>
>         WARN_ON_ONCE(!kvm_slot_has_gmem(slot));
>         return 0;

I have no preference really. To me this was intuitive, but I guess I have been
staring at this way too long. If you and all the stakeholders are happy with
your suggested changes, then I am happy to make them :)

> >  struct kvm_create_guest_memfd {
> >         __u64 size;
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index 559c93ad90be..e90884f74404 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -128,3 +128,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
> >  config HAVE_KVM_ARCH_GMEM_INVALIDATE
> >         bool
> >         depends on KVM_GMEM
> > +
> > +config KVM_GMEM_SHARED_MEM
> > +       select KVM_GMEM
> > +       bool
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index 6db515833f61..06616b6b493b 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -312,7 +312,77 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
> >         return gfn - slot->base_gfn + slot->gmem.pgoff;
> >  }
> >
> > +static bool kvm_gmem_supports_shared(struct inode *inode)
> > +{
> > +       const u64 flags = (u64)inode->i_private;
> > +
> > +       if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
> > +               return false;
> > +
> > +       return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +}
> > +
> > +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
>
> And to my point about "shared", this is also very confusing, because there are
> zero checks in here about shared vs. private.
As you noted in a later email, it was you who suggested this name, but like I
said, I am happy to change it.

> > +{
> > +       struct inode *inode = file_inode(vmf->vma->vm_file);
> > +       struct folio *folio;
> > +       vm_fault_t ret = VM_FAULT_LOCKED;
> > +
> > +       if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> > +               return VM_FAULT_SIGBUS;
> > +
> > +       folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> > +       if (IS_ERR(folio)) {
> > +               int err = PTR_ERR(folio);
> > +
> > +               if (err == -EAGAIN)
> > +                       return VM_FAULT_RETRY;
> > +
> > +               return vmf_error(err);
> > +       }
> > +
> > +       if (WARN_ON_ONCE(folio_test_large(folio))) {
> > +               ret = VM_FAULT_SIGBUS;
> > +               goto out_folio;
> > +       }
> > +
> > +       if (!folio_test_uptodate(folio)) {
> > +               clear_highpage(folio_page(folio, 0));
> > +               kvm_gmem_mark_prepared(folio);
> > +       }
> > +
> > +       vmf->page = folio_file_page(folio, vmf->pgoff);
> > +
> > +out_folio:
> > +       if (ret != VM_FAULT_LOCKED) {
> > +               folio_unlock(folio);
> > +               folio_put(folio);
> > +       }
> > +
> > +       return ret;
> > +}
> > +
> > +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> > +       .fault = kvm_gmem_fault_shared,
> > +};
> > +
> > +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +       if (!kvm_gmem_supports_shared(file_inode(file)))
> > +               return -ENODEV;
> > +
> > +       if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> > +           (VM_SHARED | VM_MAYSHARE)) {
>
> And the SHARED terminology gets really confusing here, due to colliding with the
> existing notion of SHARED file mappings.

Ack. Before I respin, let's make sure we're all on the same page in terms of
terminology.
Hopefully David can chime in again now that he's had the weekend to ponder over
the latest exchange :)

Thanks,
/fuad

> > +               return -EINVAL;
> > +       }
> > +
> > +       vma->vm_ops = &kvm_gmem_vm_ops;
> > +
> > +       return 0;
> > +}
> > +
> >  static struct file_operations kvm_gmem_fops = {
> > +       .mmap           = kvm_gmem_mmap,
> >         .open           = generic_file_open,
> >         .release        = kvm_gmem_release,
> >         .fallocate      = kvm_gmem_fallocate,
> > @@ -463,6 +533,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
> >         u64 flags = args->flags;
> >         u64 valid_flags = 0;
> >
> > +       if (kvm_arch_supports_gmem_shared_mem(kvm))
> > +               valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +
> >         if (flags & ~valid_flags)
> >                 return -EINVAL;
> >
> > --
> > 2.50.0.rc0.642.g800a2b2222-goog