From: Fuad Tabba <tabba@google.com>
Date: Tue, 15 Oct 2024 11:27:48 +0100
Subject: Re: [PATCH v3 04/11] KVM: guest_memfd: Allow host to mmap guest_memfd() pages when shared
To: Elliot Berman
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
    pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
    anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
    aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
    brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
    xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
    jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
    yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net,
    vbabka@suse.cz, vannapurve@google.com, ackerleytng@google.com,
    mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
    wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
    kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
    steven.price@arm.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
    quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
    quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
    catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com,
    oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
    qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
    shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
    jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
    jthoughton@google.com
In-Reply-To: <20241011102208348-0700.eberman@hu-eberman-lv.qualcomm.com>
Content-Type: text/plain; charset="UTF-8"

Hi Elliot,

On Mon, 14 Oct 2024 at 17:53, Elliot Berman wrote:
>
> On Thu, Oct 10, 2024 at 09:59:23AM +0100, Fuad Tabba wrote:
> > Add support for mmap() and fault() for guest_memfd in the host.
> > The ability to fault in a guest page is contingent on that page
> > being shared with the host.
> >
> > The guest_memfd PRIVATE memory attribute is not used for two
> > reasons. First because it reflects the userspace expectation for
> > that memory location, and therefore can be toggled by userspace.
> > The second is, although each guest_memfd file has a 1:1 binding
> > with a KVM instance, the plan is to allow multiple files per
> > inode, e.g. to allow intra-host migration to a new KVM instance,
> > without destroying guest_memfd.
> >
> > The mapping is restricted to only memory explicitly shared with
> > the host. KVM checks that the host doesn't have any mappings for
> > private memory via the folio's refcount. To avoid races between
> > paths that check mappability and paths that check whether the
> > host has any mappings (via the refcount), the folio lock is held
> > while either check is being performed.
> >
> > This new feature is gated with a new configuration option,
> > CONFIG_KVM_GMEM_MAPPABLE.
> >
> > Co-developed-by: Ackerley Tng
> > Signed-off-by: Ackerley Tng
> > Co-developed-by: Elliot Berman
> > Signed-off-by: Elliot Berman
> > Signed-off-by: Fuad Tabba
> >
> > ---
> >
> > Note that the functions kvm_gmem_is_mapped(),
> > kvm_gmem_set_mappable(), and kvm_gmem_clear_mappable() are
> > not used in this patch series. They are intended to be used in
> > future patches [*], which check and toggle mappability when the
> > guest shares/unshares pages with the host.
> >
> > [*] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.12-v3-pkvm
> >
> > ---
> >  include/linux/kvm_host.h |  52 +++++++++++
> >  virt/kvm/Kconfig         |   4 +
> >  virt/kvm/guest_memfd.c   | 185 +++++++++++++++++++++++++++++++++++++++
> >  virt/kvm/kvm_main.c      | 138 +++++++++++++++++++++++++++++
> >  4 files changed, 379 insertions(+)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index acf85995b582..bda7fda9945e 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2527,4 +2527,56 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> >  				    struct kvm_pre_fault_memory *range);
> >  #endif
> >
> > +#ifdef CONFIG_KVM_GMEM_MAPPABLE
> > +bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end);
> > +bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end);
> > +int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
> > +int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
> > +int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start,
> > +			       gfn_t end);
> > +int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start,
> > +				 gfn_t end);
> > +bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
> > +#else
> > +static inline bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end)
> > +{
> > +	WARN_ON_ONCE(1);
> > +	return false;
> > +}
> > +static inline bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +	WARN_ON_ONCE(1);
> > +	return false;
> > +}
> > +static inline int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +	WARN_ON_ONCE(1);
> > +	return -EINVAL;
> > +}
> > +static inline int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start,
> > +					  gfn_t end)
> > +{
> > +	WARN_ON_ONCE(1);
> > +	return -EINVAL;
> > +}
> > +static inline int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot,
> > +					     gfn_t start, gfn_t end)
> > +{
> > +	WARN_ON_ONCE(1);
> > +	return -EINVAL;
> > +}
> > +static inline int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot,
> > +					       gfn_t start, gfn_t end)
> > +{
> > +	WARN_ON_ONCE(1);
> > +	return -EINVAL;
> > +}
> > +static inline bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot,
> > +					     gfn_t gfn)
> > +{
> > +	WARN_ON_ONCE(1);
> > +	return false;
> > +}
> > +#endif /* CONFIG_KVM_GMEM_MAPPABLE */
> > +
> >  #endif
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index fd6a3010afa8..2cfcb0848e37 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -120,3 +120,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
> >  config HAVE_KVM_ARCH_GMEM_INVALIDATE
> >         bool
> >         depends on KVM_PRIVATE_MEM
> > +
> > +config KVM_GMEM_MAPPABLE
> > +       select KVM_PRIVATE_MEM
> > +       bool
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index f414646c475b..df3a6f05a16e 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -370,7 +370,184 @@ static void kvm_gmem_init_mount(void)
> >  	kvm_gmem_mnt->mnt_flags |= MNT_NOEXEC;
> >  }
> >
> > +#ifdef CONFIG_KVM_GMEM_MAPPABLE
> > +static struct folio *
> > +__kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
> > +		   gfn_t gfn, kvm_pfn_t *pfn, bool *is_prepared,
> > +		   int *max_order);
> > +
> > +static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
> > +{
> > +	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> > +	void *xval = xa_mk_value(true);
> > +	pgoff_t i;
> > +	bool r;
> > +
> > +	filemap_invalidate_lock(inode->i_mapping);
> > +	for (i = start; i < end; i++) {
> > +		r = xa_err(xa_store(mappable_offsets, i, xval, GFP_KERNEL));
>
> I think it might not be strictly necessary,

Sorry, but I don't quite get what isn't strictly necessary. Is it the
checking for an error?

> > +		if (r)
> > +			break;
> > +	}
> > +	filemap_invalidate_unlock(inode->i_mapping);
> > +
> > +	return r;
> > +}
> > +
> > +static int gmem_clear_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
> > +{
> > +	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> > +	pgoff_t i;
> > +	int r = 0;
> > +
> > +	filemap_invalidate_lock(inode->i_mapping);
> > +	for (i = start; i < end; i++) {
> > +		struct folio *folio;
> > +
> > +		/*
> > +		 * Holds the folio lock until after checking its refcount,
> > +		 * to avoid races with paths that fault in the folio.
> > +		 */
> > +		folio = kvm_gmem_get_folio(inode, i);
>
> We don't need to allocate the folio here. I think we can use
>
> folio = filemap_lock_folio(inode, i);
> if (!folio || WARN_ON_ONCE(IS_ERR(folio)))
>        continue;

Good point (it takes an inode->i_mapping though).

folio = filemap_lock_folio(inode->i_mapping, i);

> > +		if (WARN_ON_ONCE(IS_ERR(folio)))
> > +			continue;
> > +
> > +		/*
> > +		 * Check that the host doesn't have any mappings on clearing
> > +		 * the mappable flag, because clearing the flag implies that the
> > +		 * memory will be unshared from the host. Therefore, to maintain
> > +		 * the invariant that the host cannot access private memory, we
> > +		 * need to check that it doesn't have any mappings to that
> > +		 * memory before making it private.
> > +		 *
> > +		 * Two references are expected because of kvm_gmem_get_folio().
> > +		 */
> > +		if (folio_ref_count(folio) > 2)
>
> If we'd like to be prepared for large folios, it should be
> folio_nr_pages(folio) + 1.

Will do that. Thanks!
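To make sure I've understood both points, the loop body I have in mind
for the respin is roughly the below (untested sketch; same locking as
before, i.e. the folio lock is held across the refcount check):

		struct folio *folio;

		/* Don't allocate; only look at what is already in the cache. */
		folio = filemap_lock_folio(inode->i_mapping, i);
		if (IS_ERR(folio))
			continue;

		/*
		 * The filemap holds folio_nr_pages() references and
		 * filemap_lock_folio() took one more; anything beyond
		 * that means the host still has the folio mapped.
		 */
		if (folio_ref_count(folio) > folio_nr_pages(folio) + 1)
			r = -EPERM;
		else
			xa_erase(mappable_offsets, i);

		folio_unlock(folio);
		folio_put(folio);

		if (r)
			break;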
/fuad

> > +			r = -EPERM;
> > +		else
> > +			xa_erase(mappable_offsets, i);
> > +
> > +		folio_put(folio);
> > +		folio_unlock(folio);
> > +
> > +		if (r)
> > +			break;
> > +	}
> > +	filemap_invalidate_unlock(inode->i_mapping);
> > +
> > +	return r;
> > +}
> > +
> > +static bool gmem_is_mappable(struct inode *inode, pgoff_t pgoff)
> > +{
> > +	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> > +	bool r;
> > +
> > +	filemap_invalidate_lock_shared(inode->i_mapping);
> > +	r = xa_find(mappable_offsets, &pgoff, pgoff, XA_PRESENT);
> > +	filemap_invalidate_unlock_shared(inode->i_mapping);
> > +
> > +	return r;
> > +}
> > +
> > +int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
> > +{
> > +	struct inode *inode = file_inode(slot->gmem.file);
> > +	pgoff_t start_off = slot->gmem.pgoff + start - slot->base_gfn;
> > +	pgoff_t end_off = start_off + end - start;
> > +
> > +	return gmem_set_mappable(inode, start_off, end_off);
> > +}
> > +
> > +int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
> > +{
> > +	struct inode *inode = file_inode(slot->gmem.file);
> > +	pgoff_t start_off = slot->gmem.pgoff + start - slot->base_gfn;
> > +	pgoff_t end_off = start_off + end - start;
> > +
> > +	return gmem_clear_mappable(inode, start_off, end_off);
> > +}
> > +
> > +bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn)
> > +{
> > +	struct inode *inode = file_inode(slot->gmem.file);
> > +	unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
> > +
> > +	return gmem_is_mappable(inode, pgoff);
> > +}
> > +
> > +static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
> > +{
> > +	struct inode *inode = file_inode(vmf->vma->vm_file);
> > +	struct folio *folio;
> > +	vm_fault_t ret = VM_FAULT_LOCKED;
> > +
> > +	/*
> > +	 * Holds the folio lock until after checking whether it can be faulted
> > +	 * in, to avoid races with paths that change a folio's mappability.
> > +	 */
> > +	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> > +	if (!folio)
> > +		return VM_FAULT_SIGBUS;
> > +
> > +	if (folio_test_hwpoison(folio)) {
> > +		ret = VM_FAULT_HWPOISON;
> > +		goto out;
> > +	}
> > +
> > +	if (!gmem_is_mappable(inode, vmf->pgoff)) {
> > +		ret = VM_FAULT_SIGBUS;
> > +		goto out;
> > +	}
> > +
> > +	if (!folio_test_uptodate(folio)) {
> > +		unsigned long nr_pages = folio_nr_pages(folio);
> > +		unsigned long i;
> > +
> > +		for (i = 0; i < nr_pages; i++)
> > +			clear_highpage(folio_page(folio, i));
> > +
> > +		folio_mark_uptodate(folio);
> > +	}
> > +
> > +	vmf->page = folio_file_page(folio, vmf->pgoff);
> > +out:
> > +	if (ret != VM_FAULT_LOCKED) {
> > +		folio_put(folio);
> > +		folio_unlock(folio);
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> > +	.fault = kvm_gmem_fault,
> > +};
> > +
> > +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> > +	    (VM_SHARED | VM_MAYSHARE)) {
> > +		return -EINVAL;
> > +	}
> > +
> > +	file_accessed(file);
> > +	vm_flags_set(vma, VM_DONTDUMP);
> > +	vma->vm_ops = &kvm_gmem_vm_ops;
> > +
> > +	return 0;
> > +}
> > +#else
> > +static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
> > +{
> > +	WARN_ON_ONCE(1);
> > +	return -EINVAL;
> > +}
> > +#define kvm_gmem_mmap NULL
> > +#endif /* CONFIG_KVM_GMEM_MAPPABLE */
> > +
> >  static struct file_operations kvm_gmem_fops = {
> > +	.mmap		= kvm_gmem_mmap,
> >  	.open		= generic_file_open,
> >  	.release	= kvm_gmem_release,
> >  	.fallocate	= kvm_gmem_fallocate,
> > @@ -557,6 +734,14 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
> >  		goto err_gmem;
> >  	}
> >
> > +	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE)) {
> > +		err = gmem_set_mappable(file_inode(file), 0, size >> PAGE_SHIFT);
> > +		if (err) {
> > +			fput(file);
> > +			goto err_gmem;
> > +		}
> > +	}
> > +
> >  	kvm_get_kvm(kvm);
> >  	gmem->kvm = kvm;
> >  	xa_init(&gmem->bindings);
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 05cbb2548d99..aed9cf2f1685 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3263,6 +3263,144 @@ static int next_segment(unsigned long len, int offset)
> >  	return len;
> >  }
> >
> > +#ifdef CONFIG_KVM_GMEM_MAPPABLE
> > +static bool __kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +	struct kvm_memslot_iter iter;
> > +
> > +	lockdep_assert_held(&kvm->slots_lock);
> > +
> > +	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
> > +		struct kvm_memory_slot *memslot = iter.slot;
> > +		gfn_t gfn_start, gfn_end, i;
> > +
> > +		gfn_start = max(start, memslot->base_gfn);
> > +		gfn_end = min(end, memslot->base_gfn + memslot->npages);
> > +		if (WARN_ON_ONCE(gfn_start >= gfn_end))
> > +			continue;
> > +
> > +		for (i = gfn_start; i < gfn_end; i++) {
> > +			if (!kvm_slot_gmem_is_mappable(memslot, i))
> > +				return false;
> > +		}
> > +	}
> > +
> > +	return true;
> > +}
> > +
> > +bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +	bool r;
> > +
> > +	mutex_lock(&kvm->slots_lock);
> > +	r = __kvm_gmem_is_mappable(kvm, start, end);
> > +	mutex_unlock(&kvm->slots_lock);
> > +
> > +	return r;
> > +}
> > +
> > +static bool kvm_gmem_is_pfn_mapped(struct kvm *kvm, struct kvm_memory_slot *memslot, gfn_t gfn_idx)
> > +{
> > +	struct page *page;
> > +	bool is_mapped;
> > +	kvm_pfn_t pfn;
> > +
> > +	/*
> > +	 * Holds the folio lock until after checking its refcount,
> > +	 * to avoid races with paths that fault in the folio.
> > +	 */
> > +	if (WARN_ON_ONCE(kvm_gmem_get_pfn_locked(kvm, memslot, gfn_idx, &pfn, NULL)))
> > +		return false;
> > +
> > +	page = pfn_to_page(pfn);
> > +
> > +	/* Two references are expected because of kvm_gmem_get_pfn_locked(). */
> > +	is_mapped = page_ref_count(page) > 2;
> > +
> > +	put_page(page);
> > +	unlock_page(page);
> > +
> > +	return is_mapped;
> > +}
> > +
> > +static bool __kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +	struct kvm_memslot_iter iter;
> > +
> > +	lockdep_assert_held(&kvm->slots_lock);
> > +
> > +	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
> > +		struct kvm_memory_slot *memslot = iter.slot;
> > +		gfn_t gfn_start, gfn_end, i;
> > +
> > +		gfn_start = max(start, memslot->base_gfn);
> > +		gfn_end = min(end, memslot->base_gfn + memslot->npages);
> > +		if (WARN_ON_ONCE(gfn_start >= gfn_end))
> > +			continue;
> > +
> > +		for (i = gfn_start; i < gfn_end; i++) {
> > +			if (kvm_gmem_is_pfn_mapped(kvm, memslot, i))
> > +				return true;
> > +		}
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +	bool r;
> > +
> > +	mutex_lock(&kvm->slots_lock);
> > +	r = __kvm_gmem_is_mapped(kvm, start, end);
> > +	mutex_unlock(&kvm->slots_lock);
> > +
> > +	return r;
> > +}
> > +
> > +static int kvm_gmem_toggle_mappable(struct kvm *kvm, gfn_t start, gfn_t end,
> > +				    bool is_mappable)
> > +{
> > +	struct kvm_memslot_iter iter;
> > +	int r = 0;
> > +
> > +	mutex_lock(&kvm->slots_lock);
> > +
> > +	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
> > +		struct kvm_memory_slot *memslot = iter.slot;
> > +		gfn_t gfn_start, gfn_end;
> > +
> > +		gfn_start = max(start, memslot->base_gfn);
> > +		gfn_end = min(end, memslot->base_gfn + memslot->npages);
> > +		if (WARN_ON_ONCE(start >= end))
> > +			continue;
> > +
> > +		if (is_mappable)
> > +			r = kvm_slot_gmem_set_mappable(memslot, gfn_start, gfn_end);
> > +		else
> > +			r = kvm_slot_gmem_clear_mappable(memslot, gfn_start, gfn_end);
> > +
> > +		if (WARN_ON_ONCE(r))
> > +			break;
> > +	}
> > +
> > +	mutex_unlock(&kvm->slots_lock);
> > +
> > +	return r;
> > +}
> > +
> > +int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +	return kvm_gmem_toggle_mappable(kvm, start, end, true);
> > +}
> > +
> > +int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +	return kvm_gmem_toggle_mappable(kvm, start, end, false);
> > +}
> > +
> > +#endif /* CONFIG_KVM_GMEM_MAPPABLE */
> > +
> >  /* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
> >  static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
> >  				 void *data, int offset, int len)
> > --
> > 2.47.0.rc0.187.ge670bccf7e-goog
> >
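P.S. For anyone trying the series out: the userspace flow this patch is
aimed at is roughly the below (untested sketch, error handling elided,
and the map_gmem() helper is just for illustration). The ioctl and
struct kvm_create_guest_memfd come from the existing guest_memfd UAPI;
the mmap() of the returned fd is what this patch adds, and faults on
that mapping only succeed for ranges currently marked mappable/shared.

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <linux/kvm.h>

	static void *map_gmem(int vm_fd, uint64_t size)
	{
		struct kvm_create_guest_memfd gmem = {
			.size = size,
			.flags = 0,
		};
		int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

		if (gmem_fd < 0)
			return MAP_FAILED;

		/* Must be MAP_SHARED; kvm_gmem_mmap() rejects private mappings. */
		return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
			    gmem_fd, 0);
	}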