From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Cc: pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
anup@brainfault.org, paul.walmsley@sifive.com,
palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
viro@zeniv.linux.org.uk, brauner@kernel.org,
willy@infradead.org, akpm@linux-foundation.org,
xiaoyao.li@intel.com, yilun.xu@intel.com,
chao.p.peng@linux.intel.com, jarkko@kernel.org,
amoorthy@google.com, dmatlack@google.com,
yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
mic@digikod.net, vbabka@suse.cz, vannapurve@google.com,
ackerleytng@google.com, mail@maciej.szmigiero.name,
david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
liam.merwick@oracle.com, isaku.yamahata@gmail.com,
kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
steven.price@arm.com, quic_eberman@quicinc.com,
quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
catalin.marinas@arm.com, james.morse@arm.com,
yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
will@kernel.org, qperret@google.com, keirf@google.com,
roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com,
fvdl@google.com, hughd@google.com, jthoughton@google.com,
tabba@google.com
Subject: [RFC PATCH v4 05/14] KVM: guest_memfd: Folio mappability states and functions that manage their transition
Date: Fri, 13 Dec 2024 16:48:01 +0000
Message-ID: <20241213164811.2006197-6-tabba@google.com>
In-Reply-To: <20241213164811.2006197-1-tabba@google.com>
To allow restricted mapping of guest_memfd folios by the host,
guest_memfd needs to track whether they can be mapped and by whom,
since the mapping will only be allowed under conditions where it
is safe to access these folios. These conditions depend on the
folios being explicitly shared with the host, or not yet exposed
to the guest (e.g., at initialization).

This patch introduces states that determine whether the host and
the guest can fault in the folios, as well as the functions that
manage transitioning between those states.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
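Note: the state encoding and the unshare decision described above can be
modelled in user space. The following is only a sketch under stated
assumptions, not the kernel code: the enum values mirror the ones added in
virt/kvm/guest_memfd.c, but the model_*() helpers are hypothetical names
introduced for illustration, and the refcount and page count are passed in
as plain integers instead of being read from a folio.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Same encoding as the patch:
 * bit 0 set: guest cannot map, bit 1 set: host cannot map.
 */
enum folio_mappability {
	KVM_GMEM_ALL_MAPPABLE   = 0x0, /* Mappable by host and guest. */
	KVM_GMEM_GUEST_MAPPABLE = 0x2, /* Mappable only by guest. */
	KVM_GMEM_NONE_MAPPABLE  = 0x3, /* Not mappable, transient state. */
};

/* Hypothetical model of gmem_is_mappable(): may the host fault it in? */
static bool model_is_mappable(enum folio_mappability s)
{
	return s == KVM_GMEM_ALL_MAPPABLE;
}

/* Hypothetical model of gmem_is_guest_mappable(). */
static bool model_is_guest_mappable(enum folio_mappability s)
{
	return s == KVM_GMEM_ALL_MAPPABLE || s == KVM_GMEM_GUEST_MAPPABLE;
}

/*
 * Hypothetical model of the per-offset decision in gmem_clear_mappable().
 * refcount is assumed to include the one reference taken by the lookup,
 * which is why the threshold is nr_pages + 1.
 */
static enum folio_mappability
model_state_after_clear(bool folio_present, long refcount, long nr_pages)
{
	assert(!folio_present || nr_pages > 0);

	if (folio_present && refcount > nr_pages + 1)
		return KVM_GMEM_NONE_MAPPABLE;	/* host references must drain */

	return KVM_GMEM_GUEST_MAPPABLE;		/* safe for the guest right away */
}
```

In this model an offset with no folio in the page cache goes straight to
guest-mappable, which matches the -ENOENT path in the patch.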
include/linux/kvm_host.h | 53 ++++++++++++++
virt/kvm/guest_memfd.c | 153 +++++++++++++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 92 +++++++++++++++++++++++
3 files changed, 298 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cda3ed4c3c27..84aa7908a5dd 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2564,4 +2564,57 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
struct kvm_pre_fault_memory *range);
#endif
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
+int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
+int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
+int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start,
+ gfn_t end);
+int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start,
+ gfn_t end);
+bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
+bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
+#else
+static inline bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+ WARN_ON_ONCE(1);
+ return false;
+}
+static inline int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+static inline int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start,
+ gfn_t end)
+{
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+static inline int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot,
+ gfn_t start, gfn_t end)
+{
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+static inline int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot,
+ gfn_t start, gfn_t end)
+{
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+static inline bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot,
+ gfn_t gfn)
+{
+ WARN_ON_ONCE(1);
+ return false;
+}
+static inline bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot,
+ gfn_t gfn)
+{
+ WARN_ON_ONCE(1);
+ return false;
+}
+#endif /* CONFIG_KVM_GMEM_MAPPABLE */
+
#endif
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 0a7b6cf8bd8f..d1c192927cf7 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -375,6 +375,159 @@ static void kvm_gmem_init_mount(void)
kvm_gmem_mnt->mnt_flags |= MNT_NOEXEC;
}
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+/*
+ * An enum of the valid states that describe who can map a folio.
+ * Bit 0: if set guest cannot map the page
+ * Bit 1: if set host cannot map the page
+ */
+enum folio_mappability {
+ KVM_GMEM_ALL_MAPPABLE = 0b00, /* Mappable by host and guest. */
+ KVM_GMEM_GUEST_MAPPABLE = 0b10, /* Mappable only by guest. */
+ KVM_GMEM_NONE_MAPPABLE = 0b11, /* Not mappable, transient state. */
+};
+
+/*
+ * Marks the range [start, end) as mappable by both the host and the guest.
+ * Usually called when guest shares memory with the host.
+ */
+static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
+{
+ struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+ void *xval = xa_mk_value(KVM_GMEM_ALL_MAPPABLE);
+ pgoff_t i;
+ int r = 0;
+
+ filemap_invalidate_lock(inode->i_mapping);
+ for (i = start; i < end; i++) {
+ r = xa_err(xa_store(mappable_offsets, i, xval, GFP_KERNEL));
+ if (r)
+ break;
+ }
+ filemap_invalidate_unlock(inode->i_mapping);
+
+ return r;
+}
+
+/*
+ * Marks the range [start, end) as not mappable by the host. If the host doesn't
+ * have any references to a particular folio, then that folio is marked as
+ * mappable by the guest.
+ *
+ * However, if the host still has references to the folio, then the folio is
+ * marked as not mappable by anyone. Marking it as not mappable allows the
+ * host's references to drain, and ensures that the hypervisor does not
+ * transition the folio to private while the host might still access it.
+ *
+ * Usually called when guest unshares memory with the host.
+ */
+static int gmem_clear_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
+{
+ struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+ void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
+ void *xval_none = xa_mk_value(KVM_GMEM_NONE_MAPPABLE);
+ pgoff_t i;
+ int r = 0;
+
+ filemap_invalidate_lock(inode->i_mapping);
+ for (i = start; i < end; i++) {
+ struct folio *folio;
+ int refcount = 0;
+
+ folio = filemap_lock_folio(inode->i_mapping, i);
+ if (!IS_ERR(folio)) {
+ refcount = folio_ref_count(folio);
+ } else {
+ r = PTR_ERR(folio);
+ if (WARN_ON_ONCE(r != -ENOENT))
+ break;
+
+ folio = NULL;
+ }
+
+ /* +1 references are expected because of filemap_lock_folio(). */
+ if (folio && refcount > folio_nr_pages(folio) + 1) {
+ /*
+ * Outstanding references, the folio cannot be faulted
+ * in by anyone until they're dropped.
+ */
+ r = xa_err(xa_store(mappable_offsets, i, xval_none, GFP_KERNEL));
+ } else {
+ /*
+ * No outstanding references. Transition the folio to
+ * guest mappable immediately.
+ */
+ r = xa_err(xa_store(mappable_offsets, i, xval_guest, GFP_KERNEL));
+ }
+
+ if (folio) {
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+
+ if (WARN_ON_ONCE(r))
+ break;
+ }
+ filemap_invalidate_unlock(inode->i_mapping);
+
+ return r;
+}
+
+static bool gmem_is_mappable(struct inode *inode, pgoff_t pgoff)
+{
+ struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+ unsigned long r;
+
+ r = xa_to_value(xa_load(mappable_offsets, pgoff));
+
+ return (r == KVM_GMEM_ALL_MAPPABLE);
+}
+
+static bool gmem_is_guest_mappable(struct inode *inode, pgoff_t pgoff)
+{
+ struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+ unsigned long r;
+
+ r = xa_to_value(xa_load(mappable_offsets, pgoff));
+
+ return (r == KVM_GMEM_ALL_MAPPABLE || r == KVM_GMEM_GUEST_MAPPABLE);
+}
+
+int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+ struct inode *inode = file_inode(slot->gmem.file);
+ pgoff_t start_off = slot->gmem.pgoff + start - slot->base_gfn;
+ pgoff_t end_off = start_off + end - start;
+
+ return gmem_set_mappable(inode, start_off, end_off);
+}
+
+int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+ struct inode *inode = file_inode(slot->gmem.file);
+ pgoff_t start_off = slot->gmem.pgoff + start - slot->base_gfn;
+ pgoff_t end_off = start_off + end - start;
+
+ return gmem_clear_mappable(inode, start_off, end_off);
+}
+
+bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+ struct inode *inode = file_inode(slot->gmem.file);
+ unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
+
+ return gmem_is_mappable(inode, pgoff);
+}
+
+bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+ struct inode *inode = file_inode(slot->gmem.file);
+ unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
+
+ return gmem_is_guest_mappable(inode, pgoff);
+}
+#endif /* CONFIG_KVM_GMEM_MAPPABLE */
+
static struct file_operations kvm_gmem_fops = {
.open = generic_file_open,
.release = kvm_gmem_release,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index de2c11dae231..fffff01cebe7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3094,6 +3094,98 @@ static int next_segment(unsigned long len, int offset)
return len;
}
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+ struct kvm_memslot_iter iter;
+ bool r = true;
+
+ mutex_lock(&kvm->slots_lock);
+
+ kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+ struct kvm_memory_slot *memslot = iter.slot;
+ gfn_t gfn_start, gfn_end, i;
+
+ if (!kvm_slot_can_be_private(memslot))
+ continue;
+
+ gfn_start = max(start, memslot->base_gfn);
+ gfn_end = min(end, memslot->base_gfn + memslot->npages);
+ if (WARN_ON_ONCE(gfn_start >= gfn_end))
+ continue;
+
+ for (i = gfn_start; i < gfn_end; i++) {
+ r = kvm_slot_gmem_is_mappable(memslot, i);
+ if (!r)
+ goto out;
+ }
+ }
+out:
+ mutex_unlock(&kvm->slots_lock);
+
+ return r;
+}
+
+int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+ struct kvm_memslot_iter iter;
+ int r = 0;
+
+ mutex_lock(&kvm->slots_lock);
+
+ kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+ struct kvm_memory_slot *memslot = iter.slot;
+ gfn_t gfn_start, gfn_end;
+
+ if (!kvm_slot_can_be_private(memslot))
+ continue;
+
+ gfn_start = max(start, memslot->base_gfn);
+ gfn_end = min(end, memslot->base_gfn + memslot->npages);
+ if (WARN_ON_ONCE(gfn_start >= gfn_end))
+ continue;
+
+ r = kvm_slot_gmem_set_mappable(memslot, gfn_start, gfn_end);
+ if (WARN_ON_ONCE(r))
+ break;
+ }
+
+ mutex_unlock(&kvm->slots_lock);
+
+ return r;
+}
+
+int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+ struct kvm_memslot_iter iter;
+ int r = 0;
+
+ mutex_lock(&kvm->slots_lock);
+
+ kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+ struct kvm_memory_slot *memslot = iter.slot;
+ gfn_t gfn_start, gfn_end;
+
+ if (!kvm_slot_can_be_private(memslot))
+ continue;
+
+ gfn_start = max(start, memslot->base_gfn);
+ gfn_end = min(end, memslot->base_gfn + memslot->npages);
+ if (WARN_ON_ONCE(gfn_start >= gfn_end))
+ continue;
+
+ r = kvm_slot_gmem_clear_mappable(memslot, gfn_start, gfn_end);
+ if (WARN_ON_ONCE(r))
+ break;
+ }
+
+ mutex_unlock(&kvm->slots_lock);
+
+ return r;
+}
+
+#endif /* CONFIG_KVM_GMEM_MAPPABLE */
+
/* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
void *data, int offset, int len)
--
2.47.1.613.gc27f4b7a9f-goog
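
Note: the per-slot range clamping in the kvm_gmem_*_mappable() helpers and
the gfn-to-file-offset arithmetic in the kvm_slot_gmem_*() helpers can be
checked in isolation. The sketch below is a user-space model under stated
assumptions: struct slot_model, slot_clamp_range() and slot_gfn_to_pgoff()
are hypothetical names that do not exist in the kernel, and the slot is
reduced to the three fields the arithmetic needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;
typedef uint64_t pgoff_t;	/* unsigned long in the kernel */

/* Hypothetical, trimmed-down stand-in for struct kvm_memory_slot. */
struct slot_model {
	gfn_t base_gfn;		/* first gfn covered by the slot */
	uint64_t npages;	/* number of pages in the slot */
	pgoff_t gmem_pgoff;	/* offset of the slot within the guest_memfd */
};

/*
 * Clamp the half-open range [start, end) to the slot. Returns false when
 * the intersection is empty, in which case the outputs must not be used.
 */
static bool slot_clamp_range(const struct slot_model *slot,
			     gfn_t start, gfn_t end,
			     gfn_t *gfn_start, gfn_t *gfn_end)
{
	gfn_t slot_end = slot->base_gfn + slot->npages;

	*gfn_start = start > slot->base_gfn ? start : slot->base_gfn;
	*gfn_end = end < slot_end ? end : slot_end;

	return *gfn_start < *gfn_end;
}

/* Translate a gfn inside the slot to a page offset in the backing file. */
static pgoff_t slot_gfn_to_pgoff(const struct slot_model *slot, gfn_t gfn)
{
	assert(gfn >= slot->base_gfn &&
	       gfn < slot->base_gfn + slot->npages);

	return slot->gmem_pgoff + (gfn - slot->base_gfn);
}
```

The end offset of a clamped range is then the start offset plus
(gfn_end - gfn_start), which is the same computation as end_off in
kvm_slot_gmem_set_mappable() and kvm_slot_gmem_clear_mappable().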