From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 12 Feb 2025 05:07:52 +0000
In-Reply-To: <20250211121128.703390-4-tabba@google.com> (message from Fuad Tabba on Tue, 11 Feb 2025 12:11:19 +0000)
Subject: Re: [PATCH v3 03/11] KVM: guest_memfd: Allow host to map guest_memfd() pages
From: Ackerley Tng
To: Fuad Tabba
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
    pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
    anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
    aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
    brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
    xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
    jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
    yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net,
    vbabka@suse.cz, vannapurve@google.com, mail@maciej.szmigiero.name,
    david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
    liam.merwick@oracle.com, isaku.yamahata@gmail.com,
    kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
    steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
    quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
    quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
    quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
    yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
    will@kernel.org, qperret@google.com, keirf@google.com,
    roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
    rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
    hughd@google.com, jthoughton@google.com, tabba@google.com
Content-Type: text/plain; charset="UTF-8"
Fuad Tabba writes:

> Add support for mmap() and fault() for guest_memfd backed memory
> in the host for VMs that support in-place conversion between
> shared and private (shared memory). To that end, this patch adds
> the ability to check whether the VM type has that support, and
> only allows mapping its memory if that's the case.
>
> Additionally, this behavior is gated with a new configuration
> option, CONFIG_KVM_GMEM_SHARED_MEM.
>
> Signed-off-by: Fuad Tabba
>
> ---
>
> This patch series will allow shared memory support for software
> VMs in x86. It will also introduce a similar VM type for arm64
> and allow shared memory support for that. In the future, pKVM
> will also support shared memory.

Thanks, I agree that introducing mmap this way could help in having it
merged earlier, independently of conversion support, to support testing.
I'll adopt this patch in the next revision of 1G page support for
guest_memfd.
> ---
>  include/linux/kvm_host.h | 11 +++++
>  virt/kvm/Kconfig         |  4 ++
>  virt/kvm/guest_memfd.c   | 93 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 108 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 8b5f28f6efff..438aa3df3175 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -728,6 +728,17 @@ static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
>  }
>  #endif
>
> +/*
> + * Arch code must define kvm_arch_gmem_supports_shared_mem if support for
> + * private memory is enabled and it supports in-place shared/private conversion.
> + */
> +#if !defined(kvm_arch_gmem_supports_shared_mem) && !IS_ENABLED(CONFIG_KVM_PRIVATE_MEM)
> +static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
> +{
> +	return false;
> +}
> +#endif

Perhaps this could be declared in the #ifdef CONFIG_KVM_PRIVATE_MEM
block?

Could this be defined as a __weak symbol for architectures to override?
Or perhaps that can be done once guest_memfd gets refactored separately,
since for now the whole of guest_memfd.c isn't even compiled if
CONFIG_KVM_PRIVATE_MEM is not set.

> +
>  #ifndef kvm_arch_has_readonly_mem
>  static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
>  {
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 54e959e7d68f..4e759e8020c5 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -124,3 +124,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
>  config HAVE_KVM_ARCH_GMEM_INVALIDATE
>  	bool
>  	depends on KVM_PRIVATE_MEM
> +
> +config KVM_GMEM_SHARED_MEM
> +	select KVM_PRIVATE_MEM
> +	bool
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index c6f6792bec2a..85467a3ef8ea 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -317,9 +317,102 @@ void kvm_gmem_handle_folio_put(struct folio *folio)
>  {
>  	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
>  }
> +
> +static bool kvm_gmem_offset_is_shared(struct file *file, pgoff_t index)
> +{
> +	struct kvm_gmem *gmem = file->private_data;
> +
> +	/* For now, VMs that support shared memory share all their memory. */
> +	return kvm_arch_gmem_supports_shared_mem(gmem->kvm);
> +}
> +
> +static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
> +{
> +	struct inode *inode = file_inode(vmf->vma->vm_file);
> +	struct folio *folio;
> +	vm_fault_t ret = VM_FAULT_LOCKED;
> +
> +	filemap_invalidate_lock_shared(inode->i_mapping);
> +
> +	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> +	if (IS_ERR(folio)) {
> +		ret = VM_FAULT_SIGBUS;

Will it always be a SIGBUS if there is some error getting a folio?

> +		goto out_filemap;
> +	}
> +
> +	if (folio_test_hwpoison(folio)) {
> +		ret = VM_FAULT_HWPOISON;
> +		goto out_folio;
> +	}
> +
> +	/* Must be called with folio lock held, i.e., after kvm_gmem_get_folio() */
> +	if (!kvm_gmem_offset_is_shared(vmf->vma->vm_file, vmf->pgoff)) {
> +		ret = VM_FAULT_SIGBUS;
> +		goto out_folio;
> +	}
> +
> +	/*
> +	 * Only private folios are marked as "guestmem" so far, and we never
> +	 * expect private folios at this point.
> +	 */

Proposal: rephrase this comment as: before typed folios can be mapped,
PGTY_guestmem is only tagged on folios so that guest_memfd will receive
the kvm_gmem_handle_folio_put() callback. The tag is definitely not
expected when a folio is about to be faulted in.

I propose the above because I think that, technically, when mappability
is NONE the folio isn't private? Not sure if others see this
differently.

> +	if (WARN_ON_ONCE(folio_test_guestmem(folio))) {
> +		ret = VM_FAULT_SIGBUS;
> +		goto out_folio;
> +	}
> +
> +	/* No support for huge pages. */
> +	if (WARN_ON_ONCE(folio_test_large(folio))) {
> +		ret = VM_FAULT_SIGBUS;
> +		goto out_folio;
> +	}
> +
> +	if (!folio_test_uptodate(folio)) {
> +		clear_highpage(folio_page(folio, 0));
> +		kvm_gmem_mark_prepared(folio);
> +	}
> +
> +	vmf->page = folio_file_page(folio, vmf->pgoff);
> +
> +out_folio:
> +	if (ret != VM_FAULT_LOCKED) {
> +		folio_unlock(folio);
> +		folio_put(folio);
> +	}
> +
> +out_filemap:
> +	filemap_invalidate_unlock_shared(inode->i_mapping);
> +
> +	return ret;
> +}
> +
> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> +	.fault = kvm_gmem_fault,
> +};
> +
> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct kvm_gmem *gmem = file->private_data;
> +
> +	if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
> +		return -ENODEV;
> +
> +	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> +	    (VM_SHARED | VM_MAYSHARE)) {
> +		return -EINVAL;
> +	}
> +
> +	file_accessed(file);
> +	vm_flags_set(vma, VM_DONTDUMP);
> +	vma->vm_ops = &kvm_gmem_vm_ops;
> +
> +	return 0;
> +}
> +#else
> +#define kvm_gmem_mmap NULL
>  #endif /* CONFIG_KVM_GMEM_SHARED_MEM */
>
>  static struct file_operations kvm_gmem_fops = {
> +	.mmap = kvm_gmem_mmap,

I think it's better to surround this with #ifdef CONFIG_KVM_GMEM_SHARED_MEM
so that, when more code gets inserted between the struct declaration and
the definition of kvm_gmem_mmap(), it is more obvious that .mmap is only
overridden when CONFIG_KVM_GMEM_SHARED_MEM is set.

>  	.open = generic_file_open,
>  	.release = kvm_gmem_release,
>  	.fallocate = kvm_gmem_fallocate,