MIME-Version: 1.0
References: <20250211121128.703390-4-tabba@google.com>
From: Fuad Tabba <tabba@google.com>
Date: Wed, 12 Feb 2025 09:21:27 +0000
Subject: Re: [PATCH v3 03/11] KVM: guest_memfd: Allow host to map guest_memfd() pages
To: Ackerley Tng
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
    pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
    anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
    aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
    brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
    xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
    jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
    yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net,
    vbabka@suse.cz, vannapurve@google.com, mail@maciej.szmigiero.name,
    david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
    liam.merwick@oracle.com, isaku.yamahata@gmail.com,
    kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
    steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
    quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
    quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
    quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
    yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
    will@kernel.org, qperret@google.com, keirf@google.com,
    roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
    rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
    hughd@google.com, jthoughton@google.com
Content-Type: text/plain; charset="UTF-8"

Hi Ackerley,

On Wed, 12 Feb 2025 at 05:07, Ackerley Tng wrote:
>
> Fuad Tabba writes:
>
> > Add support for mmap() and fault() for guest_memfd backed memory
> > in the host for VMs that support in-place conversion between
> > shared and private (shared memory). To that end, this patch adds
> > the ability to check whether the VM type has that support, and
> > only allows mapping its memory if that's the case.
> >
> > Additionally, this behavior is gated with a new configuration
> > option, CONFIG_KVM_GMEM_SHARED_MEM.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> >
> > ---
> >
> > This patch series will allow shared memory support for software
> > VMs in x86. It will also introduce a similar VM type for arm64
> > and allow shared memory support for that. In the future, pKVM
> > will also support shared memory.
>
> Thanks, I agree that introducing mmap this way could help in having it
> merged earlier, independently of conversion support, to support testing.
>
> I'll adopt this patch in the next revision of 1G page support for
> guest_memfd.
> > ---
> >  include/linux/kvm_host.h | 11 +++++
> >  virt/kvm/Kconfig         |  4 ++
> >  virt/kvm/guest_memfd.c   | 93 ++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 108 insertions(+)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 8b5f28f6efff..438aa3df3175 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -728,6 +728,17 @@ static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
> >  }
> >  #endif
> >
> > +/*
> > + * Arch code must define kvm_arch_gmem_supports_shared_mem if support for
> > + * private memory is enabled and it supports in-place shared/private conversion.
> > + */
> > +#if !defined(kvm_arch_gmem_supports_shared_mem) && !IS_ENABLED(CONFIG_KVM_PRIVATE_MEM)
> > +static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
> > +{
> > +	return false;
> > +}
> > +#endif
>
> Perhaps this could be declared in the #ifdef CONFIG_KVM_PRIVATE_MEM
> block?
>
> Could this be defined as a __weak symbol for architectures to override?
> Or perhaps that can be done once guest_memfd gets refactored separately
> since now the entire guest_memfd.c isn't even compiled if
> CONFIG_KVM_PRIVATE_MEM is not set.

I don't have a strong opinion about this. I did it this way because
that's what the existing code for kvm_arch_has_private_mem() is like.
It seemed logical to follow that pattern.
> > +
> >  #ifndef kvm_arch_has_readonly_mem
> >  static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
> >  {
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index 54e959e7d68f..4e759e8020c5 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -124,3 +124,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
> >  config HAVE_KVM_ARCH_GMEM_INVALIDATE
> >  	bool
> >  	depends on KVM_PRIVATE_MEM
> > +
> > +config KVM_GMEM_SHARED_MEM
> > +	select KVM_PRIVATE_MEM
> > +	bool
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index c6f6792bec2a..85467a3ef8ea 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -317,9 +317,102 @@ void kvm_gmem_handle_folio_put(struct folio *folio)
> >  {
> >  	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
> >  }
> > +
> > +static bool kvm_gmem_offset_is_shared(struct file *file, pgoff_t index)
> > +{
> > +	struct kvm_gmem *gmem = file->private_data;
> > +
> > +	/* For now, VMs that support shared memory share all their memory. */
> > +	return kvm_arch_gmem_supports_shared_mem(gmem->kvm);
> > +}
> > +
> > +static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
> > +{
> > +	struct inode *inode = file_inode(vmf->vma->vm_file);
> > +	struct folio *folio;
> > +	vm_fault_t ret = VM_FAULT_LOCKED;
> > +
> > +	filemap_invalidate_lock_shared(inode->i_mapping);
> > +
> > +	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> > +	if (IS_ERR(folio)) {
> > +		ret = VM_FAULT_SIGBUS;
>
> Will it always be a SIGBUS if there is some error getting a folio?

I could try to expand it, and map more of the VM_FAULT_* codes to the
potential return values from __filemap_get_folio(), i.e.,

  -EAGAIN -> VM_FAULT_RETRY
  -ENOMEM -> VM_FAULT_OOM

but some don't really map 1:1 if you read the description. For example,
is it correct to map

  -ENOENT -> VM_FAULT_NOPAGE /* fault installed the pte, not return page */

That said, it might be useful to map at least EAGAIN and ENOMEM.
> > +		goto out_filemap;
> > +	}
> > +
> > +	if (folio_test_hwpoison(folio)) {
> > +		ret = VM_FAULT_HWPOISON;
> > +		goto out_folio;
> > +	}
> > +
> > +	/* Must be called with folio lock held, i.e., after kvm_gmem_get_folio() */
> > +	if (!kvm_gmem_offset_is_shared(vmf->vma->vm_file, vmf->pgoff)) {
> > +		ret = VM_FAULT_SIGBUS;
> > +		goto out_folio;
> > +	}
> > +
> > +	/*
> > +	 * Only private folios are marked as "guestmem" so far, and we never
> > +	 * expect private folios at this point.
> > +	 */
>
> Proposal - rephrase this comment as: before typed folios can be mapped,
> PGTY_guestmem is only tagged on folios so that guest_memfd will receive
> the kvm_gmem_handle_folio_put() callback. The tag is definitely not
> expected when a folio is about to be faulted in.
>
> I propose the above because I think technically when mappability is NONE
> the folio isn't private? Not sure if others see this differently.

A folio whose mappability is NONE is private as far as the host is
concerned. That said, I can rephrase this for clarity.

> > +	if (WARN_ON_ONCE(folio_test_guestmem(folio))) {
> > +		ret = VM_FAULT_SIGBUS;
> > +		goto out_folio;
> > +	}
> > +
> > +	/* No support for huge pages. */
> > +	if (WARN_ON_ONCE(folio_test_large(folio))) {
> > +		ret = VM_FAULT_SIGBUS;
> > +		goto out_folio;
> > +	}
> > +
> > +	if (!folio_test_uptodate(folio)) {
> > +		clear_highpage(folio_page(folio, 0));
> > +		kvm_gmem_mark_prepared(folio);
> > +	}
> > +
> > +	vmf->page = folio_file_page(folio, vmf->pgoff);
> > +
> > +out_folio:
> > +	if (ret != VM_FAULT_LOCKED) {
> > +		folio_unlock(folio);
> > +		folio_put(folio);
> > +	}
> > +
> > +out_filemap:
> > +	filemap_invalidate_unlock_shared(inode->i_mapping);
> > +
> > +	return ret;
> > +}
> > +
> > +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> > +	.fault = kvm_gmem_fault,
> > +};
> > +
> > +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +	struct kvm_gmem *gmem = file->private_data;
> > +
> > +	if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
> > +		return -ENODEV;
> > +
> > +	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> > +	    (VM_SHARED | VM_MAYSHARE)) {
> > +		return -EINVAL;
> > +	}
> > +
> > +	file_accessed(file);
> > +	vm_flags_set(vma, VM_DONTDUMP);
> > +	vma->vm_ops = &kvm_gmem_vm_ops;
> > +
> > +	return 0;
> > +}
> > +#else
> > +#define kvm_gmem_mmap NULL
> >  #endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> >
> >  static struct file_operations kvm_gmem_fops = {
> > +	.mmap = kvm_gmem_mmap,
>
> I think it's better to surround this with #ifdef
> CONFIG_KVM_GMEM_SHARED_MEM so that when more code gets inserted between
> the struct declaration and the definition of kvm_gmem_mmap() it is more
> obvious that .mmap is only overridden when CONFIG_KVM_GMEM_SHARED_MEM is
> set.

This is a question of style, but I prefer this because it avoids having
more ifdefs. I think that I've seen this pattern elsewhere in the
kernel. That said, if others disagree, I'm happy to change this.

Thank you,
/fuad

> >  	.open = generic_file_open,
> >  	.release = kvm_gmem_release,
> >  	.fallocate = kvm_gmem_fallocate,