Date: Fri, 13 Jun 2025 14:03:11 -0700
In-Reply-To: <20250611133330.1514028-9-tabba@google.com>
References: <20250611133330.1514028-1-tabba@google.com>
	<20250611133330.1514028-9-tabba@google.com>
Subject: Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
From: Sean Christopherson
To: Fuad Tabba
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
	kvmarm@lists.linux.dev, pbonzini@redhat.com, chenhuacai@kernel.org,
	mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
	palmer@dabbelt.com, aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
	brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
	xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
	jarkko@kernel.org,
	amoorthy@google.com, dmatlack@google.com, isaku.yamahata@intel.com,
	mic@digikod.net, vbabka@suse.cz, vannapurve@google.com,
	ackerleytng@google.com, mail@maciej.szmigiero.name, david@redhat.com,
	michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com,
	isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
	suzuki.poulose@arm.com, steven.price@arm.com, quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com,
	oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
	qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
	shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
	jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
	jthoughton@google.com, peterx@redhat.com, pankaj.gupta@amd.com,
	ira.weiny@intel.com

On Wed, Jun 11, 2025, Fuad Tabba wrote:
> This patch enables support for shared memory in guest_memfd, including

Please don't lead with "This patch", simply state what changes are being made
as a command.

> mapping that memory from host userspace.
> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> flag at creation time.

Why?  I can see that from the patch.

This changelog is way, way, waaay too light on details.  Sorry for jumping in at
the 11th hour, but we've spent what, 2 years working on this?

> Reviewed-by: Gavin Shan
> Acked-by: David Hildenbrand
> Co-developed-by: Ackerley Tng
> Signed-off-by: Ackerley Tng
> Signed-off-by: Fuad Tabba
> ---
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d00b85cb168c..cb19150fd595 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1570,6 +1570,7 @@ struct kvm_memory_attributes {
>  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>  
>  #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
> +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED	(1ULL << 0)

I find the SUPPORT_SHARED terminology to be super confusing.  I had to dig quite
deep to understand that "support shared" actually means "userspace explicitly
enabled sharing on _this_ guest_memfd instance".  E.g. I was surprised to see

IMO, GUEST_MEMFD_FLAG_SHAREABLE would be more appropriate.  But even that is
weird to me.  For non-CoCo VMs, there is no concept of shared vs. private.  What's
novel and notable is that the memory is _mappable_.  Yeah, yeah, pKVM's use case
is to share memory, but that's a _use case_, not the property of guest_memfd that
is being controlled by userspace.

And kvm_gmem_memslot_supports_shared() is even worse.  It's not simply that the
memslot is bound to a mappable guest_memfd instance, it's that the guest_memfd
instance is the _only_ entry point to the memslot.

So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
KVM_MEMSLOT_GUEST_MEMFD_ONLY.
That will make code like this:

	if (kvm_slot_has_gmem(slot) &&
	    (kvm_gmem_memslot_supports_shared(slot) ||
	     kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
		return kvm_gmem_max_mapping_level(slot, gfn, max_level);
	}

much more intuitive:

	if (kvm_is_memslot_gmem_only(slot) ||
	    kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)
		return kvm_gmem_max_mapping_level(slot, gfn, max_level);

And then have kvm_gmem_mapping_order() do:

	WARN_ON_ONCE(!kvm_slot_has_gmem(slot));
	return 0;

>  struct kvm_create_guest_memfd {
>  	__u64 size;
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 559c93ad90be..e90884f74404 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -128,3 +128,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
>  config HAVE_KVM_ARCH_GMEM_INVALIDATE
>         bool
>         depends on KVM_GMEM
> +
> +config KVM_GMEM_SHARED_MEM
> +       select KVM_GMEM
> +       bool
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 6db515833f61..06616b6b493b 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -312,7 +312,77 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>  	return gfn - slot->base_gfn + slot->gmem.pgoff;
>  }
>  
> +static bool kvm_gmem_supports_shared(struct inode *inode)
> +{
> +	const u64 flags = (u64)inode->i_private;
> +
> +	if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
> +		return false;
> +
> +	return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +}
> +
> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)

And to my point about "shared", this is also very confusing, because there are
zero checks in here about shared vs. private.

> +{
> +	struct inode *inode = file_inode(vmf->vma->vm_file);
> +	struct folio *folio;
> +	vm_fault_t ret = VM_FAULT_LOCKED;
> +
> +	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> +		return VM_FAULT_SIGBUS;
> +
> +	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> +	if (IS_ERR(folio)) {
> +		int err = PTR_ERR(folio);
> +
> +		if (err == -EAGAIN)
> +			return VM_FAULT_RETRY;
> +
> +		return vmf_error(err);
> +	}
> +
> +	if (WARN_ON_ONCE(folio_test_large(folio))) {
> +		ret = VM_FAULT_SIGBUS;
> +		goto out_folio;
> +	}
> +
> +	if (!folio_test_uptodate(folio)) {
> +		clear_highpage(folio_page(folio, 0));
> +		kvm_gmem_mark_prepared(folio);
> +	}
> +
> +	vmf->page = folio_file_page(folio, vmf->pgoff);
> +
> +out_folio:
> +	if (ret != VM_FAULT_LOCKED) {
> +		folio_unlock(folio);
> +		folio_put(folio);
> +	}
> +
> +	return ret;
> +}
> +
> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> +	.fault = kvm_gmem_fault_shared,
> +};
> +
> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	if (!kvm_gmem_supports_shared(file_inode(file)))
> +		return -ENODEV;
> +
> +	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> +	    (VM_SHARED | VM_MAYSHARE)) {

And the SHARED terminology gets really confusing here, due to colliding with the
existing notion of SHARED file mappings.

> +		return -EINVAL;
> +	}
> +
> +	vma->vm_ops = &kvm_gmem_vm_ops;
> +
> +	return 0;
> +}
> +
>  static struct file_operations kvm_gmem_fops = {
> +	.mmap		= kvm_gmem_mmap,
>  	.open		= generic_file_open,
>  	.release	= kvm_gmem_release,
>  	.fallocate	= kvm_gmem_fallocate,
> @@ -463,6 +533,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>  	u64 flags = args->flags;
>  	u64 valid_flags = 0;
>  
> +	if (kvm_arch_supports_gmem_shared_mem(kvm))
> +		valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +
>  	if (flags & ~valid_flags)
>  		return -EINVAL;
>  
> -- 
> 2.50.0.rc0.642.g800a2b2222-goog
> 
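
For illustration only, here is a rough userspace model of the two flag checks
in the quoted patch, written with the MAPPABLE naming suggested above.  It is a
sketch under loud assumptions, not kernel code: GUEST_MEMFD_FLAG_MAPPABLE is
only a proposed name, the MODEL_VM_* bit values are made up, and the model_*()
helpers are hypothetical stand-ins for the logic in kvm_gmem_create() and
kvm_gmem_mmap().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Proposed name from the review; the quoted patch uses SUPPORT_SHARED. */
#define GUEST_MEMFD_FLAG_MAPPABLE	(1ULL << 0)

/* Made-up bit values standing in for the kernel's VM_SHARED/VM_MAYSHARE. */
#define MODEL_VM_SHARED		0x1UL
#define MODEL_VM_MAYSHARE	0x2UL

static_assert((MODEL_VM_SHARED & MODEL_VM_MAYSHARE) == 0,
	      "model VMA flags must be distinct bits");

/*
 * Hypothetical model of the flag validation at creation time: the
 * mappable flag is accepted only if the architecture opts in, and any
 * unknown flag is rejected.
 */
static int model_gmem_check_flags(uint64_t flags, bool arch_supports_mappable)
{
	uint64_t valid_flags = 0;

	if (arch_supports_mappable)
		valid_flags |= GUEST_MEMFD_FLAG_MAPPABLE;

	return (flags & ~valid_flags) ? -EINVAL : 0;
}

/*
 * Hypothetical model of the mmap() checks: the instance must have been
 * created as mappable, and the VMA must be a shared file mapping, i.e.
 * "shared" in the VM_SHARED sense, not the CoCo shared/private sense.
 */
static int model_gmem_mmap(uint64_t gmem_flags, unsigned long vm_flags)
{
	const unsigned long want = MODEL_VM_SHARED | MODEL_VM_MAYSHARE;

	if (!(gmem_flags & GUEST_MEMFD_FLAG_MAPPABLE))
		return -ENODEV;

	if ((vm_flags & want) != want)
		return -EINVAL;

	return 0;
}
```

With this naming the two meanings of "shared" no longer collide: the
guest_memfd flag says the instance can be mapped, and the VMA check is the only
place that talks about shared file mappings.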